Springer Geophysics
For further volumes: http://www.springer.com/series/10173
Erik W. Grafarend
Joseph L. Awange
Applications of Linear and Nonlinear Models: Fixed Effects, Random Effects, and Total Least Squares
Prof. Dr.-Ing. Erik W. Grafarend, University of Stuttgart, Geodetic Institute, Geschwister-Scholl-Strasse 24D, 70174 Stuttgart, Germany
[email protected]
Prof. Dr.-Ing. Joseph L. Awange, Professor of Environment and Earth Sciences, Maseno University, Maseno, Kenya; Curtin University, Perth, Australia; Karlsruhe Institute of Technology, Karlsruhe, Germany; Kyoto University, Kyoto, Japan
ISBN 978-3-642-22240-5    ISBN 978-3-642-22241-2 (eBook)
DOI 10.1007/978-3-642-22241-2
Springer Heidelberg New York Dordrecht London

Library of Congress Control Number: 2011945776

© Springer-Verlag Berlin Heidelberg 2012

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
Foreword
This book is a source of knowledge and inspiration not only for geodesists and mathematicians, but also for engineers in general, as well as natural scientists and economists. Inference on effects which result in observations via linear and nonlinear functions is a general task in science. The authors provide a comprehensive in-depth treatise on the analysis and solution of such problems.

They start with the treatment of underdetermined and/or overdetermined linear systems by methods of minimum norm, weighted least squares and regularization, respectively. Considering the observations as realizations of random variables leads to the question of how to balance bias and variance in linear stochastic regression estimation. In addition, the causal effects are allowed to be random (general linear Gauss–Markov model of estimation and prediction), as is the model function; conditions are allowed, too (Gauss–Helmert model). Nonlinear functions are comprehended via Taylor expansion or by their special structures, e.g., within the context of directional observations.

The self-contained monograph is comprehensive in its coverage, from the more fundamental elements of the topic to the state of the art, complemented by an extensive list of references. All chapters have an introduction, often with an indication of fast track reading and with an explanation addressed to nonmathematicians, which is of interest also to me as a mathematician. The book contains a great variety of case studies and numerical examples ranging from classical geodesy to dynamic systems theory. The background and much more is provided by appendices dealing with linear algebra, probability, statistics and stochastic processes as well as numerical methods.

The presentation is lucid and elegant, with historical remarks and humorous interpretations, such that I took great pleasure in reading the individual chapters. I wish all readers of this brilliant encyclopaedic book this pleasure and much benefit.

Prof. Dr. Harro Walk
Preface
"Probability does not exist": such a statement stands at the beginning of the book by B. de Finetti dedicated to the foundations of probability, nowadays called "probability theory". We believe that such a statement is overdone. The target of the reference is made very clear, namely the analysis of random effects or of a random experiment. The result of an experiment like throwing dice or a LOTTO outcome is not predictable. Such indeterminism is characteristic of the analysis of a random experiment.

In general, we analyze here nonlinear relations between effects, be they stochastic or not. Obviously, the center of interest is on linear models, which are able to approximate nonlinear relations. There are typical nonlinear models which cannot be linearized. Our experimental world is divided into linear or linearized models and nonlinear models. Sometimes, nonlinear models can be transformed into polynomial equations, also called algebraic, which allow a rigorous solution set. A prominent example is the resection problem of overdetermined type. If we are able to express the characteristic rotation matrix between reference frames by quaternions, an algebraic equation of known high order arises. But, in contrast to linear models, there exists a whole world of solutions which have to be discussed. P. Lohse presented in his Ph.D. thesis wonderful examples, to which we refer.

A.N. Kolmogorov founded the axiomatic (mathematical) theory of probability. Typical for the axiomatic treatment, there are special rules relating to measure theory. Here, we only refer to excellent introductions to measure theory, for instance H. Bauer (1992), J. Elstrodt (2005) or W. Rudin (1987). Of course, there are very good textbooks on the theory of probability, for instance H. Bauer (2002), K.L. Chung (2001), O. Kallenberg (2002), M. Loeve (1977), and A.N. Shiryaev (1996).

It is important to inform our readers about the many subjects which are not treated here: robust estimation, Bayes statistics (only examples), stochastic integration, ITÔ stochastic processes, stochastic signal processing, martingales, Markov processes, limit theorems for associated random fields, modeling, measuring, and managing risks, genetic algorithms, chaos, turbulence, fractals, neural networks, wavelets, Gibbs fields, Monte Carlo methods, and many more, indeed very important subjects. This statement refers to an overcritical reviewer who complained about the subjects which we did not treat. Unfortunately, we had to strongly restrict our subjects. Our contribution
is only an atom among the $10^8$ atoms. What will be treated? It is the task of the observer or the designer to decide what is the best fit between a linear or nonlinear model and the "real world" of the observer. Both types of analysis have to make a priori statements about a number of facts.

• "Are the observations random or not, a stochastic effect or a fixed effect?"
• "What are the parameters in the model space, or would you prefer a 'model free' concept?"
• "What is the nature of the parameters, random effects or fixed effects?"
• "Could you find the conditions between the observations or between the parameters? What is the nature of the conditions, linear or nonlinear, random or not?"
• "What is the bias between the model and its estimation or prediction?"

There are various criteria which characterize the observation space or the parameter space: linear or nonlinear estimation theory, linear and nonlinear distributions, $\ell_r$ optimality (LESS: $\ell_2$), restrictions, mixed models of fixed and random effects, zero-, first-, and second-order design, nonlinear statistics on curved manifolds, minimal bias estimation, translational invariance, total least squares, generalized least squares, minimal distance estimation, optimal design, and optimal prediction. Of key interest is to find equivalence lemmas, for instance: assume direct observations; then a least squares fit (LESS) is equivalent to a best linear uniformly unbiased estimation (BLUUE).

Chapter 1 deals with the first problem of algebraic regression, the consistent system of linear observational equations or a system of underdetermined linear equations. From the front page example onwards, we treat MINOS and the "horizontal rank partitioning". In detail, we give the eigenvalue decomposition of the weighted minimum norm solution. Examples are FOURIER series and FOURIER–LEGENDRE series, including the NYQUIST frequency for special data. Special nonlinear models with datum defects in terms of Taylor polynomials and the generalized Newton iteration are introduced.

Chapter 2 is based on the special first problem, the bias problem, expressed by the Equivalence Theorem of the adjusted minimum norm solution ($G_x$-MINOS) and the linear uniformly minimum bias estimator (S-LUMBE).

The second problem of algebraic regression, namely solving an inconsistent system of linear observational equations as an overdetermined system of linear equations, is extensively treated in Chap. 3. We start with a front page example for the Least Squares Solution (LESS). Indeed, we discuss alternative solutions depending on the metric of the observation space, like Second Order Design, the Taylor–Karman structure, an optimal choice of the weight matrix, and Fuzzy sets.
By an eigenvalue decomposition of $G_y$-LESS, we introduce "canonical LESS". Our case studies range from partial redundancies, latent conditions, and the theory of high leverage points versus breaking points, to direct and inverse Grassmann coordinates and PLÜCKER coordinates. We conclude with historical notes on C.F. Gauss, A.M. Legendre and generalizations.

In another extensive review, we concentrate on the second problem of probabilistic regression, namely the special Gauss–Markov model without datum defect, in Chap. 4. We set up the best, linear, uniformly unbiased estimator (BLUUE) for the moments of first order and the best, quadratically uniformly unbiased estimator for the central moments of second order (BIQUUE). We depart again from a detailed example for BLUUE and BIQUUE. We set up $\Sigma_y$-BLUUE and note the Equivalence Theorem of $G_y$-LESS and $\Sigma_y$-BLUUE. For BIQUUE, we study the block partitioning of the dispersion matrix, the invariant quadratic estimation of types IQE, IQUUE, HIQUUE versus HIQE according to F.R. Helmert, variance-component estimation, simultaneous estimation of the first moments and the second central moments, the so-called E–D correspondence, inhomogeneous multilinear estimation, and BAYES design of moments estimation.

The third problem of algebraic regression of Chap. 5 is defined as the problem of solving an inconsistent system of linear observation equations with datum defect, an overdetermined as well as underdetermined system of linear equations. In the introduction, we begin with the front page example, subject to the minimum norm–least squares solution, namely by the technique of additive rank partitioning as well as multiplicative rank partitioning. In more detail, we present MINOS in the second section, including the weights of MINOS as well as of LESS. Especially, we interpret $(G_y, G_x)$-MINOS in terms of the generalized inverse. By eigenvalue decomposition, we find the solution to $(G_y, G_x)$-MINOS. A special section is devoted to total least squares in terms of the $\alpha$-weighted hybrid approximate solution ($\alpha$-HAPS) within Tykhonov–Phillips regularization.

The third problem of probabilistic regression, a special Gauss–Markov model with datum problem, is treated in Chap. 6. We set up best linear minimum bias estimators of type BLUMBE and best linear estimators of type hom BLE, hom S-BLE and hom $\alpha$-BLE, depending on the weight assumption for the bias. As examples, we introduce continuous networks of first and second derivatives. Finally, we discuss the limit process from a discrete network into a continuum.

The overdetermined system of nonlinear equations on curved manifolds of Chap. 7 is presented for an inconsistent system of directional observation equations, namely by the minimal geodesic distance (MINGEODISC). A special example is the directional observation from a circular normal distribution of von MISES–FISHER type, with a note on the angular metric.

Chapter 8 relates to the fourth problem of probabilistic regression as a setup of type BLIP and VIP for the central moments of first order. We begin with the definition of a random effect model according to our magic triangle. Three examples relate to (i) nonlinear error propagation with random effect components, (ii) nonlinear vector-valued error propagation with random effects in Geoinformatics, and (iii) nonlinear vector-valued error propagation for distance measurements including HESSIANS.

The fifth problem of algebraic regression in Chap. 9 is defined as solving conditional equations of homogeneous and inhomogeneous type. Alternatively, the fifth problem of probabilistic regression in Chap. 10 is treated as the inhomogeneous general linear GAUSS–MARKOV model including fixed and random effects. Within an explicit representation of the errors in the general GAUSS–MARKOV model with mixed effects, we present an example, "collocation". Our comments relate to the KOLMOGOROV–WIENER prediction and more up-to-date literature.

The sixth problem of probabilistic regression in Chap. 11 is defined as a special random effect model called "errors-in-variables". Here we assume, with respect to the second order statistics, the first order design matrix to be random. Another name for this is Total Least Squares, its algebraic analogue. At first, we relate Total Least Squares to the nonlinear system "errors-in-variables". Secondly, we introduce the models SIMEX and SYMEX following the approach of Carroll–Cook–Stefanski–Polzehl–Zwanzig, depending on the variance-covariance matrix of the random effects.

As a special problem of nonlinearity, we treat the popular 3d datum transformation and the PROCRUSTES Algorithm in Chap. 12.

As the sixth problem of generalized algebraic regression, we finally introduce the GAUSS–HELMERT problem as a system of conditional equations with unknowns. One part is the variance-covariance estimation $D\{y\}$, a second the variance-covariance model of conditional equations, and finally the complete conditional equations with unknowns in the third model $Ax + By = c$. As a result of Chap. 13, we present the block structure of the dispersion matrix.

As special problems of algebraic regression as well as stochastic estimation, we treat in Chap. 14 the multivariate GAUSS–MARKOV model, especially the n-way classification model, and dynamical systems. We conclude with the algebraic solution of systems of equations of linear, bilinear, quadratic and higher type, referring to combinatorial subsets, the GROEBNER basis method, and multipolynomial resultants, in Chap. 15.

A very important part of our contribution is included in the appendices. Appendix A is an introduction to tensor algebra, linear algebra, matrix algebra as well as multilinear algebra. Topics are multilinear functions and their decomposition, matrix algebra and the Hodge star operator including self duality. The major parts in linear algebra are the diagrams "Ass", "Uni", and "Comm", rings, ringed spaces, division algebras, Lie algebras, the Witt algebra, composition algebras, generalized inverses, special matrices like the Helmert, Hankel and Vandermonde matrices, scalar measures of matrices, complex algebra, quaternion algebra, octonion algebra, and Clifford algebra.

A short introduction is given to sampling distributions and their use in confidence intervals and confidence regions in Appendix B. Sampling distributions for the GAUSS–LAPLACE normal distribution and its generalization to the multidimensional GAUSS–LAPLACE normal distribution, as well as the distribution of the
sample mean and the variance-covariance matrix, including the correlation coefficients, are reviewed.

An introduction to statistical notions, random events and stochastic processes in Appendix C ranges from the moment representation of a probability distribution, the GAUSS–LAPLACE and other normal distributions, to error propagation and scalar-, vector- and tensor-valued stochastic processes of one- and multiparameter systems. Simple examples of one-parameter processes include (i) cosine oscillations with random amplitudes and phases, (ii) the superposition of two uncorrelated random functions, (iii) pulse modulation, (iv) random sequences with constant and moving averages of type ARMA(r, s), (v) WIENER processes, and (vi) the special analysis of one-parameter stationary and non-stationary stochastic processes with discrete and continuous spectrum, white and coloured noise with band limitation. For (vii) multiparameter systems, homogeneous and isotropic multipoint systems are given. Examples are (i) two-dimensional EUCLIDEAN networks, (ii) criterion matrices, (iii) space gravity spectroscopy based on TAYLOR–KARMAN structures, (iv) nonlinear prediction and (v) nonlocal time series analysis.

Appendix D is devoted to GROEBNER basis algebra, the BUCHBERGER Algorithm, and C.F. GAUSS' combinatorial formulation.
Acknowledgements

E.W. Grafarend wants to thank the colleagues B.G. Blaha (Bedford, Mass., USA), J. Cai (Stuttgart, Germany), A. Dermanis (Thessaloniki, Greece), T. Krarup (Copenhagen, Denmark), K.R. Koch (Bonn, Germany), G. Kampmann (Uebach-Palenberg, Germany), L. Kubacek (Olomouc, Czech Republic), P. Lohse (Paderborn, Germany), B. Palancz (Budapest, Hungary), F. Pukelsheim (Augsburg, Germany), R. Rummel (Munich, Germany), F. Sanso (Como, Italy), L.E. Sjoberg (Stockholm, Sweden), P. Teunissen (Delft, Netherlands), H. Walk (Stuttgart, Germany), P. Xu (Kyoto, Japan), P. Zaletnyik (Budapest, Hungary) and S. Zwanzig (Uppsala, Sweden).

J.L. Awange wishes to express his sincere thanks to B. Heck of the Department of Physical Geodesy (Karlsruhe Institute of Technology, Germany) for hosting him during his Alexander von Humboldt Fellowship in the period 2008–2011, when part of the book was written. He is further grateful to B. Veenendaal (Head of Department, Spatial Sciences, Curtin University, Australia) and F.N. Onyango (Vice Chancellor, Maseno University, Kenya) for their support and motivation. Last, but not least, he acknowledges the support of a Curtin Research Fellowship, while his stay at the Karlsruhe Institute of Technology (KIT) was supported by an Alexander von Humboldt Ludwig Leichhardt Memorial Fellowship. He is very grateful for the financial support.

For the transfer of copyrights of our contributions, we thank the Springer Publishing House (Heidelberg, Berlin, New York) as well as Spectrum Verlag (Heidelberg) and the European Geosciences Union. A proper reference is added to the
place we quote from. During the making of this edition of the book, more than 3000 references have been cited. Finally, special thanks go to I. Anjasmara and B. Arora of Curtin University, who typeset parts of the book. This is The Institute for Geoscience Research (TIGeR) publication No. 260.

Stuttgart and Karlsruhe (Germany), Perth (Australia)
Erik W. Grafarend Joseph L. Awange
Contents
1 The First Problem of Algebraic Regression: $\{Ax = y \mid A \in \mathbb{R}^{n \times m},\ y \in \mathcal{R}(A),\ \mathrm{rk}\,A = n = \dim Y\}$, MINOS ... 1
  1-1 Introduction ... 4
    1-11 The Front Page Example ... 5
    1-12 The Front Page Example: Matrix Algebra ... 5
    1-13 The Front Page Example: MINOS, Horizontal Rank Partitioning ... 8
    1-14 The Range $\mathcal{R}(f)$ and the Kernel $\mathcal{N}(f)$ ... 10
    1-15 The Interpretation of MINOS ... 12
  1-2 Minimum Norm Solution (MINOS) ... 16
    1-21 A Discussion of the Metric of the Parameter Space X ... 21
    1-22 An Alternative Choice of the Metric of the Parameter Space X ... 22
    1-23 $G_x$-MINOS and Its Generalized Inverse ... 22
    1-24 Eigenvalue Decomposition of $G_x$-MINOS: Canonical MINOS ... 24
  1-3 Case Study ... 37
    1-31 Fourier Series ... 38
    1-32 Fourier–Legendre Series ... 49
    1-33 Nyquist Frequency for Spherical Data ... 62
  1-4 Special Nonlinear Models ... 63
    1-41 Taylor Polynomials, Generalized Newton Iteration ... 63
    1-42 Linearized Models with Datum Defect ... 69
  1-5 Notes ... 78

2 The First Problem of Probabilistic Regression, the Bias Problem: Special Gauss–Markov Model with Datum Defects, LUMBE ... 81
  2-1 Linear Uniformly Minimum Bias Estimator (LUMBE) ... 84
  2-2 The Equivalence Theorem of $G_x$-MINOS and S-LUMBE ... 87
  2-3 Example ... 88

3 The Second Problem of Algebraic Regression: Inconsistent System of Linear Observational Equations ... 89
  3-1 Introduction ... 92
    3-11 The Front Page Example ... 92
    3-12 The Front Page Example in Matrix Algebra ... 93
    3-13 Least Squares Solution of the Front Page Example by Means of Vertical Rank Partitioning ... 95
    3-14 The Range $\mathcal{R}(f)$ and the Kernel $\mathcal{N}(f)$, Interpretation of "LESS" by Three Partitionings ... 98
  3-2 The Least Squares Solution: "LESS" ... 105
    3-21 A Discussion of the Metric of the Parameter Space X ... 112
    3-22 Alternative Choices of the Metric of the Observation Space Y ... 113
    3-23 $G_x$-LESS and Its Generalized Inverse ... 129
    3-24 Eigenvalue Decomposition of $G_y$-LESS: Canonical LESS ... 130
  3-3 Case Study ... 141
    3-31 Canonical Analysis of the Hat Matrix, Partial Redundancies, High Leverage Points ... 142
    3-32 Multilinear Algebra, "Join" and "Meet", the Hodge Star Operator ... 150
    3-33 From A to B: Latent Restrictions, Grassmann Coordinates, Plücker Coordinates ... 156
    3-34 From B to A: Latent Parametric Equations, Dual Grassmann Coordinates, Dual Plücker Coordinates ... 168
    3-35 Break Points ... 172
  3-4 Special Linear and Nonlinear Models: A Family of Means for Direct Observations ... 180
  3-5 A Historical Note on C.F. Gauss and A.M. Legendre ... 180

4 The Second Problem of Probabilistic Regression: Special Gauss–Markov Model Without Datum Defect ... 183
  4-1 Introduction ... 187
    4-11 The Front Page Example ... 188
    4-12 Estimators of Type BLUUE and BIQUUE of the Front Page Example ... 189
    4-13 BLUUE and BIQUUE of the Front Page Example, Sample Median, Median Absolute Deviation ... 198
    4-14 Alternative Estimation Maximum Likelihood (MALE) ... 202
  4-2 Setup of the Best Linear Uniformly Unbiased Estimator ... 205
    4-21 The Best Linear Uniformly Unbiased Estimation $\hat{\xi}$ of $\xi$: $\Sigma_y$-BLUUE ... 206
    4-22 The Equivalence Theorem of $G_y$-LESS and $\Sigma_y$-BLUUE ... 213
  4-3 Setup of the Best Invariant Quadratic Uniformly Unbiased Estimator ... 214
    4-31 Block Partitioning of the Dispersion Matrix and Linear Space Generated by Variance-Covariance Components ... 215
    4-32 Invariant Quadratic Estimation of Variance-Covariance Components of Type IQE ... 220
    4-33 Invariant Quadratic Uniformly Unbiased Estimations of Variance-Covariance Components of Type IQUUE ... 224
    4-34 Invariant Quadratic Uniformly Unbiased Estimations of One Variance Component (IQUUE) from $\Sigma_y$-BLUUE: HIQUUE ... 228
    4-35 Invariant Quadratic Uniformly Unbiased Estimators of Variance-Covariance Components of Helmert Type: HIQUUE Versus HIQE ... 230
    4-36 Best Quadratic Uniformly Unbiased Estimations of One Variance Component: BIQUUE ... 234
    4-37 Simultaneous Determination of First Moment and the Second Central Moment, Inhomogeneous Multilinear Estimation, the E–D Correspondence, Bayes Design with Moment Estimations ... 241

5 The Third Problem of Algebraic Regression: $\{Ax + i = y \mid A \in \mathbb{R}^{n \times m},\ y \notin \mathcal{R}(A),\ \mathrm{rk}\,A < \min\{m, n\}\}$ ... 263
  5-1 Introduction ... 265
    5-11 The Front Page Example ... 266
    5-12 The Front Page Example in Matrix Algebra ... 266
    5-13 Minimum Norm, Least Squares Solution of the Front Page Example by Means of Additive Rank Partitioning ... 268
    5-14 Minimum Norm, Least Squares Solution of the Front Page Example by Means of Multiplicative Rank Partitioning ... 272
    5-15 The Range $\mathcal{R}(f)$ and the Kernel $\mathcal{N}(f)$: Interpretation of "MINOLESS" by Three Partitionings ... 276
  5-2 MINOLESS and Related Solutions Like Weighted Minimum Norm, Weighted Least Squares Solutions ... 283
    5-21 The Minimum Norm, Least Squares Solution: "MINOLESS" ... 283
    5-22 $(G_x, G_y)$-MINOS and Its Generalized Inverse ... 293
    5-23 Eigenvalue Decomposition of $(G_x, G_y)$-MINOLESS ... 297
    5-24 Notes ... 301
  5-3 The Hybrid Approximation Solution: $\alpha$-HAPS and Tykhonov–Phillips Regularization ... 302

6 The Third Problem of Probabilistic Regression: Special Gauss–Markov Model with Datum Defect ... 305
  6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE ... 308
    6-11 Definitions, Lemmas and Theorems ... 310
    6-12 The First Example: BLUMBE Versus BLE, BIQUUE Versus BIQE, Triangular Leveling Network ... 317
  6-2 Setup of the Best Linear Estimators of Type hom BLE, hom S-BLE and hom $\alpha$-BLE for Fixed Effects ... 332
  6-3 Continuous Networks ... 345
    6-31 Continuous Networks of Second Derivatives Type ... 346
    6-32 Discrete Versus Continuous Geodetic Networks ... 357

7 Overdetermined System of Nonlinear Equations on Curved Manifolds: Inconsistent System of Directional Observational Equations ... 361
  7-1 Introduction ... 362
  7-2 Minimal Geodesic Distance: MINGEODISC ... 365
  7-3 Special Models: From the Circular Normal Distribution to the Oblique Normal Distribution ... 370
    7-31 A Historical Note of the von Mises Distribution ... 370
    7-32 Oblique Map Projection ... 372
    7-33 A Note on the Angular Metric ... 375
  7-4 Case Study ... 376

8 The Fourth Problem of Probabilistic Regression: Special Gauss–Markov Model with Random Effects ... 383
  8-1 The Random Effect Model ... 384
  8-2 Examples ... 399

9 The Fifth Problem of Algebraic Regression, the System of Conditional Equations: Homogeneous and Inhomogeneous Equations $\{By = Bi$ versus $-c + By = Bi\}$ ... 411
  9-1 $G_y$-LESS of a System of Inconsistent Homogeneous Conditional Equations ... 411
  9-2 Solving a System of Inconsistent Inhomogeneous Conditional Equations ... 415
  9-3 Examples ... 416

10 The Fifth Problem of Probabilistic Regression: General Gauss–Markov Model with Mixed Effects ... 419
  10-1 Inhomogeneous General Linear Gauss–Markov Model (Fixed Effects and Random Effects) ... 421
  10-2 Explicit Representations of Errors in the General Gauss–Markov Model with Mixed Effects ... 426
  10-3 An Example for Collocation ... 428
  10-4 Comments ... 438

11 The Sixth Problem of Probabilistic Regression: The Random Effect Model, "errors-in-variables" ... 443
  11-1 The Model of Error-in-Variables or Total Least Squares ... 447
  11-2 Algebraic Total Least Squares for the Nonlinear System of the Model "Error-in-Variables" ... 448
  11-3 Example: The Straight Line Fit ... 450
  11-4 The Models SIMEX and SYMEX ... 453
  11-5 References ... 459

12 The Nonlinear Problem of the 3d Datum Transformation and the Procrustes Algorithm ... 461
  12-1 The 3d Datum Transformation and the Procrustes Algorithm ... 463
  12-2 The Variance-Covariance Matrix of the Error Matrix E ... 470
    12-21 Case Studies: The 3d Datum Transformation and the Procrustes Algorithm ... 471
  12-3 References ... 474

13 The Sixth Problem of Generalized Algebraic Regression: The System of Conditional Equations with Unknowns (Gauss–Helmert Model) ... 477
  13-1 Variance-Covariance-Component Estimation in the Linear Model $Ax + \varepsilon = y$, $y \notin \mathcal{R}(A)$ ... 479
  13-2 Variance-Covariance-Component Estimation in the Linear Model $B\varepsilon = By - c$, $By \notin \mathcal{R}(A) + c$ ... 482
  13-3 Variance-Covariance-Component Estimation in the Linear Model $Ax + \varepsilon + B\varepsilon = By - c$, $By \notin \mathcal{R}(A) + c$ ... 485
  13-4 The Block Structure of the Dispersion Matrix $D\{y\}$ ... 489

14 Special Problems of Algebraic Regression and Stochastic Estimation ... 493
  14-1 The Multivariate Gauss–Markov Model: A Special Problem of Probabilistic Regression ... 493
  14-2 n-Way Classification Models ... 498
    14-21 A First Example: 1-Way Classification ... 499
    14-22 A Second Example: 2-Way Classification Without Interaction ... 503
    14-23 A Third Example: 2-Way Classification with Interaction ... 509
    14-24 Higher Classifications with Interaction ... 514
  14-3 Dynamical Systems ... 517

15 Algebraic Solutions of Systems of Equations: Linear and Nonlinear Systems of Equations ... 527
  15-1 Introductory Remarks ... 527
  15-2 Background to Algebraic Solutions ... 528
  15-3 Algebraic Methods for Solving Nonlinear Systems of Equations ... 532
    15-31 Solution of Nonlinear Gauss–Markov Model ... 532
    15-32 Adjustment of the Combinatorial Subsets ... 552
  15-4 Examples ... 556
  15-5 Notes ... 563

A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra ... 571
  A-1 Multilinear Functions and the Tensor Space $T_q^p$ ... 572
  A-2 Decomposition of Multilinear Functions into Symmetric Multilinear Functions, Antisymmetric Multilinear Functions and Residual Multilinear Functions $T_q^p = S_q^p \oplus A_q^p \oplus R_q^p$ ... 578
  A-3 Matrix Algebra, Array Algebra, Matrix Norm and Inner Product ... 584
  A-4 The Hodge Star Operator, Self Duality ... 587
  A-5 Linear Algebra ... 592
    A-51 Definition of a Linear Algebra ... 593
    A-52 The Diagrams "Ass", "Uni" and "Comm" ... 595
    A-53 Ringed Spaces: The Subalgebra "Ring with Identity" ... 597
    A-54 Definition of a Division Algebra and Non-Associative Algebra ... 598
    A-55 Lie Algebra, Witt Algebra ... 598
    A-56 Definition of a Composition Algebra ... 599
  A-6 Matrix Algebra Revisited, Generalized Inverses ... 602
    A-61 Special Matrices: Helmert Matrix, Hankel Matrix, Vandermonde Matrix ... 606
    A-62 Scalar Measures of Matrices ... 612
    A-63 Three Basic Types of Generalized Inverses ... 618
  A-7 Complex Algebra, Quaternion Algebra, Octonion Algebra, Clifford Algebra, Hurwitz Theorem ... 619
    A-71 Complex Algebra as a Division Algebra as well as a Composition Algebra, Clifford Algebra Cl(0, 1) ... 620
    A-72 Quaternion Algebra as a Division Algebra as well as a Composition Algebra, Clifford Algebra Cl(0, 2) ... 622
    A-73 Octonion Algebra as a Non-Associative Algebra as well as a Composition Algebra, Clifford Algebra with Respect to $\mathbb{H} \oplus \mathbb{H}$ ... 629
    A-74 Clifford Algebra ... 633

B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions ... 637
  B-1 A First Vehicle: Transformation of Random Variables ... 638
  B-2 A Second Vehicle: Transformation of Random Variables ... 642
  B-3 A First Confidence Interval of Gauss–Laplace Normally Distributed Observations, $\mu$ and $\sigma^2$ Known: The Three Sigma Rule ... 648
    B-31 The Forward Computation of a First Confidence Interval of Gauss–Laplace Normally Distributed Observations: $\mu$, $\sigma^2$ Known ... 653
    B-32 The Backward Computation of a First Confidence Interval of Gauss–Laplace Normally Distributed Observations: $\mu$, $\sigma^2$ Known ... 659
  B-4 Sampling from the Gauss–Laplace Normal Distribution: A Second Confidence Interval for the Mean, Variance Known ... 662
    B-41 Sampling Distributions of the Sample Mean $\hat{\mu}$, $\sigma^2$ Known, and of the Sample Variance $\hat{\sigma}^2$ ... 677
    B-42 The Confidence Interval for the Sample Mean, Variance Known ... 688
  B-5 Sampling from the Gauss–Laplace Normal Distribution: A Third Confidence Interval for the Mean, Variance Unknown ... 692
    B-51 Student's Sampling Distribution of the Random Variable $(\hat{\mu} - \mu)/\hat{\sigma}$ ... 692
    B-52 The Confidence Interval for the Mean, Variance Unknown ... 701
    B-53 The Uncertainty Principle ... 707
  B-6 Sampling from the Gauss–Laplace Normal Distribution: A Fourth Confidence Interval for the Variance ... 708
    B-61 The Confidence Interval for the Variance ... 709
    B-62 The Uncertainty Principle ... 715
  B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution: The Confidence Region for the Fixed Parameters in the Linear Gauss–Markov Model ... 717
  B-8 Multidimensional Variance Analysis, Sampling from the Multivariate Gauss–Laplace Normal Distribution ... 739
    B-81 Distribution of Sample Mean and Variance-Covariance ... 740
    B-82 Distribution Related to Correlation Coefficients ... 744

C Statistical Notions, Random Events and Stochastic Processes ... 753
  C-1 Moments of a Probability Distribution, the Gauss–Laplace Normal Distribution and the Quasi-Normal Distribution ... 754
  C-2 Error Propagation ... 757
  C-3 Useful Identities ... 760
  C-4 Scalar-Valued Stochastic Processes of One Parameter ... 762
  C-5 Characteristic of One Parameter Stochastic Processes ... 765
  C-6 Simple Examples of One Parameter Stochastic Processes ... 769
  C-7 Wiener Processes ... 781
    C-71 Definition of the Wiener Processes ... 781
    C-72 Special Wiener Processes: Ornstein–Uhlenbeck, Wiener Processes with Drift, Integral Wiener Processes ... 785
  C-8 Special Analysis of One Parameter Stationary Stochastic Processes ... 793
    C-81 Foundations: Ergodic and Stationary Processes ... 793
    C-82 Processes with Discrete Spectrum ... 795
    C-83 Processes with Continuous Spectrum ... 798
    C-84 Spectral Decomposition of the Mean and Variance-Covariance Function ... 808
  C-9 Scalar-, Vector-, and Tensor-Valued Stochastic Processes of Multi-Parameter Systems ... 811
    C-91 Characteristic Functional ... 812
    C-92 The Moment Representation of Stochastic Processes for Scalar-Valued and Vector-Valued Quantities ... 814
    C-93 Tensor-Valued Statistical Homogeneous and Isotropic Field of Multi-Point Systems ... 818

D Basics of Groebner Basis Algebra ... 887
  D-1 Definitions ... 887
  D-2 Buchberger Algorithm ... 889
    D-21 Mathematica Computation of Groebner Basis ... 889
    D-22 Maple Computation of Groebner Basis ... 891
  D-3 Gauss Combinatorial Formulation ... 892

References ... 895

Index ... 1011
Chapter 1
The First Problem of Algebraic Regression

Consistent system of linear observational equations, underdetermined system of linear equations: $\{Ax = y \mid A \in \mathbb{R}^{n \times m},\ y \in \mathcal{R}(A),\ \mathrm{rk}\,A = n,\ n = \dim Y\}$.

The optimisation problem which appears in treating underdetermined linear systems of equations and weakly nonlinear systems of equations is a standard topic in many textbooks on optimisation. Here we define a weakly nonlinear system of equations as a problem which allows a Taylor expansion. We start in the first section with a front page example, the transformation of a three-dimensional parameter space into a two-dimensional observation space. The Minimum Norm Solution, in short MINOS, leads to the solutions (1.21) and (1.22). We determine as an example the range and the kernel of the mapping, namely by "horizontal rank partitioning". In particular, we refer to three types of analysis of underdetermined linear systems: (i) algebraic, (ii) geometric and (iii) set-theoretical.

The second section is based on a review of the Minimum Norm Solution of an underdetermined linear system, for instance using the reflexive generalized inverse, called RAO's Pandora Box (C.R. Rao and S.K. Mitra 1971: pp. 44-47, "three basic g-inverses", Sect. 3.1: minimum norm solution). We apply the method of Lagrange multipliers, the Implicit Function Theorem, or the theory of extrema with side conditions. The numerical example is the eigenspace-eigenvector decomposition, namely the left and right eigenspace elements.

While the theory of solving rank deficient linear equations has been well known since the seventies of the 20th century, it is not so for many important applications. Besides economic model theory, we present applications in (i) Fourier series and (ii) Fourier-Legendre series, enriched by a NYQUIST frequency analysis of spherical data. For instance, for data on a grid of 5 minutes by 5 minutes of arc, resulting in n = 9,331,200 observations, the NYQUIST frequency N(f) = 2,160 acts as a limit when we develop the series by degree and order. Example (iii) introduces Taylor polynomials and the generalized Newton iteration, namely applied within two-dimensional network analysis, in particular linearized models with datum defects. Such a problem arises when we have measured distances and want to derive absolute coordinates of the network stations. Finally, as example (iv) we take reference to the General Datum Problem in $\mathbb{R}^n$, including coordinate differences, distances, angles, angle ratios, cross-ratios and area elements, presented in an extended reference list.
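The counting argument behind the NYQUIST limit quoted above can be reproduced in a few lines. The sketch below (Python; the code is an illustration only, while the numbers come from the text) derives n = 9,331,200 from the 5-arc-minute grid and the limiting degree 2,160 as half the number of samples around a full parallel.

```python
# Illustrative check of the grid and Nyquist numbers quoted in the text.
cells_per_degree = 60 // 5                  # 5 arc-minute spacing
n_lon = 360 * cells_per_degree              # 4,320 samples around a parallel
n_lat = 180 * cells_per_degree              # 2,160 samples pole to pole
n_observations = n_lon * n_lat              # 9,331,200 observations
nyquist_degree = n_lon // 2                 # 2,160: half the samples per circle

print(n_observations, nyquist_degree)       # 9331200 2160
```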
Fast track reading: read only Lemma 1.3. Compare with the guideline of Chap. 1.
The minimum norm solution of a system of consistent linear equations $Ax = y$ subject to $A \in \mathbb{R}^{n \times m}$, $\mathrm{rk}\,A = n$, $n < m$, is presented by Definition 1.1, Lemma 1.2, and Lemma 1.3. Lemma 1.4 characterizes the solution of the quadratic optimization problem in terms of the (1, 2, 4)-generalized inverse, in particular the right inverse.

The system of consistent nonlinear equations $Y = F(X)$ is solved by means of two examples. Both examples are based on distance measurements in a planar network, namely a planar triangle. In the first example, $Y = F(X)$ is linearized at the point x, which is given by prior information, and solved by means of Newton iteration. The minimum norm solution is applied to the consistent system of linear equations $\Delta y = A\,\Delta x$ and interpreted by means of first and second moments of the nodal points. In contrast, the second example aims at solving the consistent system of nonlinear equations $Y = F(X)$ in closed form. Since distance measurements as Euclidean distance functions are left equivariant under the action of the translation group as well as the rotation group (they are invariant under translation and rotation of the Cartesian coordinate system), at first a TR-basis (translation-rotation basis) is established; namely, the origin and the axes of the coordinate system are fixed. With respect to the TR-basis (a set of "free parameters" has been fixed), the "bounded parameters" are analytically fixed. Since no prior information is built in, we prove that two solutions of the consistent system of nonlinear equations $Y = F(X)$ exist: the mapping $y \mapsto X$ is $1 \mapsto 2$. In the chosen TR-basis, the solution vector X is not of minimum norm. Accordingly, we apply a datum transformation $X \mapsto x$ of type "group of motion" (decomposed into the translation group and the rotation group). The parameters of the "group of motion" (2 for translation, 1 for rotation) are determined under the condition of minimum norm of the unknown vector x, namely by means of a special Procrustes algorithm. As soon as the optimal datum parameters are determined, we are able to compute the unknown vector x, which is of minimum norm.

Finally, the notes are an attempt to explain the origin of the injectivity rank deficiency, namely the dimension of the null space $\mathcal{N}(A)$, $m - \mathrm{rk}\,A$, of the consistent system of linear equations subject to $A \in \mathbb{R}^{n \times m}$ and $\mathrm{rk}\,A = n$, $n < m$, as well as of the consistent system of nonlinear equations subject to a Jacobi matrix $J \in \mathbb{R}^{n \times m}$, $\mathrm{rk}\,J = n$, $n < m = \dim X$. The fundamental relation to the datum transformation, also called transformation groups (conformal group, dilatation group (scale), translation group, rotation group, and projective group), as well as to the "soft" Implicit Function Theorem is outlined.

By means of certain algebraic objective functions, which geometrically are called minimum distance functions, we solve the first inverse problem of linear and nonlinear equations, in particular of algebraic type, which relate observations to parameters. The system of linear or nonlinear equations we are solving here is classified as underdetermined. The observations, also called measurements, are elements of a certain observation space Y of integer dimension $\dim Y = n$, which may be metrical, especially Euclidean, pseudo-Euclidean, or in general a differentiable manifold. In contrast, the parameter space X of integer dimension $\dim X = m$ is metrical as well, especially Euclidean, pseudo-Euclidean, in general a differentiable manifold, but its metric is unknown. A typical feature of algebraic regression is the fact that the unknown metric of the parameter space X is induced by the functional relation between observations and parameters. We shall outline three aspects of any discrete inverse problem: (i) set-theoretic (fibering), (ii) algebraic (rank partitioning, IPM, the Implicit Function Theorem) and (iii) geometrical (slicing). Here, we treat the first problem of algebraic regression.
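To make the first example concrete, the following sketch (Python) implements the generalized Newton iteration with the minimum norm update $\Delta x = J'(JJ')^{-1}\Delta y$ for a consistent underdetermined system; F, its Jacobi matrix and the starting point are hypothetical stand-ins, not the book's planar triangle data.

```python
# Generalized Newton iteration for a consistent system Y = F(X), n < m.
# F, J and the prior point are hypothetical stand-ins for illustration only.
import numpy as np

def F(x):
    # two distance-like observables of three unknowns (n = 2 < m = 3)
    return np.array([x[0]**2 + x[1]**2, (x[0] - x[2])**2])

def J(x):
    # Jacobi matrix of F, a 2 x 3 array of full row rank near the solution
    return np.array([[2 * x[0], 2 * x[1], 0.0],
                     [2 * (x[0] - x[2]), 0.0, -2 * (x[0] - x[2])]])

y = np.array([2.0, 1.0])                    # observed values
x = np.array([1.2, 0.8, 0.3])               # prior information x_0
for _ in range(25):
    Jx = J(x)
    dy = y - F(x)
    dx = Jx.T @ np.linalg.solve(Jx @ Jx.T, dy)   # minimum norm (MINOS) update
    x = x + dx

print(x, F(x))                              # F(x) reproduces y
```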
A consistent system of linear observational equations, also called an underdetermined system of linear equations ("more unknowns than equations"), is solved by means of an optimization problem. The following introduction presents a front page example of two inhomogeneous linear equations with three unknowns. In terms of five boxes and five figures, we review the minimum norm solution of such a consistent system of linear equations, which is based upon the trinity shown in Fig. 1.1.

Fig. 1.1 The trinity
1-1 Introduction

With the introductory paragraph, we explain the fundamental concepts and basic notions of this section. For you, the analyst, who has the difficult task of dealing with measurements, observational data, modeling, and modeling equations, we present numerical examples and graphical illustrations of all abstract notions. The elementary introduction is written not for a mathematician, but for you, the analyst, with limited remote control of the notions given hereafter. May we gain your interest.

Assume an n-dimensional observation space Y, here a linear space parameterized by n observations (finite, discrete) as coordinates $y = [y_1, \ldots, y_n]' \in \mathbb{R}^n$, in which an r-dimensional model manifold is embedded (immersed). The model manifold is described as the range of a linear operator $\hat{f}$ from an m-dimensional parameter space X into the observation space Y. The mapping f is established by the mathematical equations which relate all observables to the unknown parameters. Here the parameter space X, the domain of the linear operator $\hat{f}$, will also be restricted to a linear space, parameterized by coordinates $x = [x_1, \ldots, x_m]' \in \mathbb{R}^m$. In this manner, the linear operator $\hat{f}$ can be understood as a coordinate mapping of type $A: x \mapsto y = Ax$. The linear mapping $f: X \rightarrow Y$ is geometrically characterized by its range $\mathcal{R}(f)$, namely $\mathcal{R}(A)$, defined by $\mathcal{R}(f) := \{y \in Y \mid y = f(x),\ x \in X\}$, which in general is a linear subspace of Y, and its null space defined by $\mathcal{N}(f) := \{x \in X \mid f(x) = 0\}$. Here, we restrict the range $\mathcal{R}(f)$, namely $\mathcal{R}(A)$, to coincide with the $n = r$-dimensional observation space Y, such that $y \in \mathcal{R}(f)$, namely $y \in \mathcal{R}(A)$.

Note that Example 1.1 will therefore demonstrate the range space $\mathcal{R}(f)$, namely $\mathcal{R}(A)$, which coincides here with the observation space Y (f is surjective or "onto"), as well as the null space $\mathcal{N}(f)$, namely $\mathcal{N}(A)$, which is not empty (f is not injective or "one-to-one"). Furthermore, note that Box 1.1 will introduce the special linear model of interest. By means of Box 1.2, it will be interpreted as a polynomial of degree two based upon two observations and three unknowns, namely as an underdetermined system of consistent linear equations. Box 1.3 reviews the formal procedure for solving such a system of linear equations by means of "horizontal" rank partitioning and the postulate of the minimum norm solution of the unknown vector. In order to identify the range space $\mathcal{R}(A)$, the null space $\mathcal{N}(A)$, and its orthogonal complement $\mathcal{N}^{\perp}(A)$, Box 1.4, by means of algebraic partitioning ("horizontal" rank partitioning), outlines the general solution of the homogeneous system of linear equations $Ax = 0$. Against this background, Box 1.5 presents the diagnostic algorithm for solving an underdetermined system of linear equations. In contrast, Box 1.6 is a geometric interpretation of a special solution of a consistent system of inhomogeneous linear equations of type MINOS (minimum norm solution). The g-inverse $A_m^-$ of type MINOS is characterized by three conditions, collected in Box 1.7. Moreover, note that Fig. 1.2 illustrates the range space $\mathcal{R}(A)$, while Figs. 1.3 and 1.4 demonstrate the null space $\mathcal{N}(A)$ as well as its orthogonal complement $\mathcal{N}^{\perp}(A)$. Figure 1.5 illustrates the orthogonal projection of an element of the null space $\mathcal{N}(A)$ onto the range space $\mathcal{R}(A^-)$. In terms of fibering the set of points of the parameter space as well as of the observation space, Fig. 1.6 introduces the related Venn diagrams.
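The range and null space just introduced can also be exhibited numerically. The sketch below (Python with numpy, an illustration rather than the book's own algorithm) extracts orthonormal bases of $\mathcal{R}(A)$ and $\mathcal{N}(A)$ from the singular value decomposition of the front page operator:

```python
# Range space R(A) and null space N(A) of the front page operator via SVD.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))             # rank r = 2 = dim Y: f is surjective

range_basis = U[:, :r]                 # R(A) coincides with R^2
null_basis = Vt[r:, :].T               # N(A) is one-dimensional: d = m - r = 1

assert np.allclose(A @ null_basis, 0)  # null space vectors are annihilated
print(r, null_basis.ravel())           # direction proportional to (2, -3, 1)
```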
1-11 The Front Page Example

Example 1.1 (Polynomial of degree two, consistent system of linear equations, i.e. $\{Ax = y,\ x \in X = \mathbb{R}^m,\ \dim X = m;\ y \in Y = \mathbb{R}^n,\ \dim Y = n,\ r = \mathrm{rk}\,A = \dim Y\}$).

As a front page example, consider the following consistent system of linear equations:

$$x_1 + x_2 + x_3 = 2, \qquad x_1 + 2x_2 + 4x_3 = 3. \tag{1.1}$$

Obviously, in general, it deals with the linear space $X = \mathbb{R}^m \ni x$, $\dim X = m$, here $m = 3$, called the parameter space, and the linear space $Y = \mathbb{R}^n \ni y$, $\dim Y = n$, here $n = 2$, called the observation space.
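As a preview of the minimum norm solution derived in Sect. 1-2, a direct evaluation of the right inverse $A'(AA')^{-1}$ for the system (1.1) gives

$$AA' = \begin{bmatrix} 3 & 7 \\ 7 & 21 \end{bmatrix}, \qquad (AA')^{-1} = \frac{1}{14}\begin{bmatrix} 21 & -7 \\ -7 & 3 \end{bmatrix}, \qquad x_m = A'(AA')^{-1}\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \frac{1}{14}\begin{bmatrix} 16 \\ 11 \\ 1 \end{bmatrix},$$

which indeed reproduces both equations: $x_1 + x_2 + x_3 = 28/14 = 2$ and $x_1 + 2x_2 + 4x_3 = 42/14 = 3$.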
1-12 The Front Page Example: Matrix Algebra

By means of Box 1.1 and according to A. Cayley's doctrine, let us specify (1.1) in terms of matrix algebra.
Box 1.1 (A special linear model: polynomial of degree two, two observations, and three unknowns).

$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \tag{1.2}$$

$$\Leftrightarrow \quad \begin{bmatrix} 2 \\ 3 \end{bmatrix} = y = Ax := \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \tag{1.3}$$

$$x' = [x_1, x_2, x_3],\quad y' = [y_1, y_2] = [2, 3],\quad x \in \mathbb{R}^{3 \times 1},\quad y \in \mathbb{Z}_+^{2 \times 1} \subset \mathbb{R}^{2 \times 1},$$
$$A := \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix} \in \mathbb{Z}_+^{2 \times 3} \subset \mathbb{R}^{2 \times 3},\qquad r = \mathrm{rk}\,A = \dim Y = n = 2. \tag{1.4}$$
The matrix $A \in \mathbb{R}^{n \times m}$ is an element of $\mathbb{R}^{n \times m}$, the $n \times m$ array of real numbers. $\dim\mathbb{X} = m$ defines the number of unknowns (here $m = 3$) and $\dim\mathbb{Y} = n$ defines the number of observations (here $n = 2$). A mapping $f$ is called linear if $f(x_1 + x_2) = f(x_1) + f(x_2)$ and $f(\lambda x) = \lambda f(x)$ hold. Beside the range $\mathcal{R}(f)$, the range space $\mathcal{R}(A)$, the linear mapping is characterized by the kernel $\mathcal{N}(f) := \{x \in \mathbb{R}^m \mid f(x) = 0\}$, the null space $\mathcal{N}(A) := \{x \in \mathbb{R}^m \mid Ax = 0\}$, to be specified later on. Why is the front page consistent system of linear equations considered here called underdetermined? Just observe that we are left with only two linear equations for three unknowns $[x_1, x_2, x_3]$. Indeed, the system of inhomogeneous linear equations is underdetermined. Without any additional postulate, we shall be unable to invert those equations for $[x_1, x_2, x_3]$. In particular, we shall outline how to find such an additional postulate. Beforehand, we have to introduce some special notions that are known from operator theory. Within matrix algebra, the index of the linear operator $A$ is the rank $r = \mathrm{rk}A$, here $r = 2$, which coincides with the dimension of the observation space, here $n = \dim\mathbb{Y} = 2$. A system of linear equations is called consistent if $\mathrm{rk}A = \dim\mathbb{Y}$. Alternatively, we say that the mapping $f: x \mapsto y = f(x) \in \mathcal{R}(f)$ or $A: x \mapsto Ax = y \in \mathcal{R}(A)$ takes an element $x \in \mathbb{X}$ into the range $\mathcal{R}(f)$ or $\mathcal{R}(A)$, also called the column space of the matrix $A$: compare with (1.5). Here, the column space is spanned by the first column $c_1$ and the second column $c_2$ of the matrix $A$, the $2 \times 3$ array: compare with (1.6). The right complementary index of the linear operator $A \in \mathbb{R}^{n \times m}$ accounts for the injectivity defect that is given by $d = m - \mathrm{rk}A$ (here $d = m - \mathrm{rk}A = 1$). Injectivity relates to the kernel $\mathcal{N}(f)$ or the null space we
shall constructively introduce later on.

$f: x \mapsto y = f(x),\ f(x) \in \mathcal{R}(f);\quad A: x \mapsto Ax = y,\ y \in \mathcal{R}(A)$,   (1.5)

$\mathcal{R}(A) = \mathrm{span}\left\{\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix}\right\}$.   (1.6)
How can such a linear model of interest, namely a system of consistent linear equations, be generated? Let us assume that we have observed a dynamical system $y(t)$ which is represented by a polynomial of degree two with respect to time $t \in \mathbb{R}$, namely $y(t) = x_1 + x_2t + x_3t^2$. Due to $\ddot{y} = 2x_3$, it is a dynamical system with constant acceleration or constant second derivative with respect to time $t$. The unknown polynomial coefficients are collected in the column array $x = [x_1, x_2, x_3]'$, $x \in \mathbb{X} = \mathbb{R}^3$, $\dim\mathbb{X} = 3$. They constitute the coordinates of the three-dimensional parameter space $\mathbb{X}$. If the dynamical system $y(t)$ is observed at two instants, say $y(t_1) = y_1 = 2$ and $y(t_2) = y_2 = 3$ at $t_1 = 1$ and $t_2 = 2$, respectively, and if we collect the observations in the column array $y = [y_1, y_2]' = [2, 3]'$, $y \in \mathbb{Y} = \mathbb{R}^2$, $\dim\mathbb{Y} = 2$, they constitute the coordinates of the two-dimensional observation space $\mathbb{Y}$. Thus, we are left with the special linear model interpreted in Box 1.2.

Box 1.2. (A special linear model: polynomial of degree two, two observations, and three unknowns).
$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$   (1.7)

$t_1 = 1,\ y_1 = 2;\ t_2 = 2,\ y_2 = 3:\quad \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$   (1.8)

$y = Ax,\quad r = \mathrm{rk}A = \dim\mathbb{Y} = n = 2$   (1.9)
In the next section, let us begin with a more detailed analysis of the linear mapping $f: x \mapsto Ax = y$, namely of the linear operator $A \in \mathbb{R}^{n \times m}$, $r = \mathrm{rk}A = \dim\mathbb{Y} = n$. We shall pay special attention to the three fundamental partitionings, namely (i) algebraic partitioning, called rank partitioning, of the matrix $A$, (ii) geometric
partitioning, called slicing, of the linear space $\mathbb{X}$, and (iii) set-theoretical partitioning, called fibering, of the domain $\mathcal{D}(f)$.
1-13 The Front Page Example: MINOS, Horizontal Rank Partitioning

Let us go back to the front page consistent system of linear equations, namely the problem to determine three unknown polynomial coefficients from two sampling points, which we classified as an underdetermined one. Nevertheless, we are able to compute a unique solution of the underdetermined system of inhomogeneous linear equations $Ax = y$, $y \in \mathcal{R}(A)$ or $\mathrm{rk}A = \dim\mathbb{Y}$, here $A \in \mathbb{R}^{2 \times 3}$, $x \in \mathbb{R}^{3 \times 1}$, $y \in \mathbb{R}^{2 \times 1}$, if we determine the coordinates of the unknown vector $x$ of minimum norm (minimal Euclidean length, $\ell_2$-norm), here $\|x\|_I^2 = x'x = x_1^2 + x_2^2 + x_3^2 = \min$. Box 1.3 outlines the solution of the related optimization problem.

Box 1.3. (MINOS (minimum norm solution) of the front page consistent system of inhomogeneous linear equations, horizontal rank partitioning).

The solution of the optimization problem $\{\|x\|_I^2 = \min_x \mid Ax = y,\ \mathrm{rk}A = \dim\mathbb{Y}\}$ is directly based upon the horizontal rank partitioning of the linear mapping $f: x \mapsto y = Ax$, $\mathrm{rk}A = \dim\mathbb{Y}$, which we already have introduced. As soon as $x_1 = -A_1^{-1}A_2x_2 + A_1^{-1}y$ is implemented in the norm $\|x\|_I^2$, we are prepared to compute the first derivatives of the unconstrained Lagrangean (1.10), which constitute the necessary conditions (1.11). (The theory of vector derivatives is presented in Appendix B.) Following Appendix A, which is devoted to matrix algebra, namely applying $(I + AB)^{-1}A = A(I + BA)^{-1}$ and $(BA)^{-1} = A^{-1}B^{-1}$ for appropriate dimensions of the involved matrices such that the identities (1.12) hold, we finally derive (1.13).

$\mathcal{L}(x_1, x_2) = \|x\|_I^2 = x_1^2 + x_2^2 + x_3^2 = (y - A_2x_2)'(A_1A_1')^{-1}(y - A_2x_2) + x_2'x_2$
$= y'(A_1A_1')^{-1}y - 2x_2'A_2'(A_1A_1')^{-1}y + x_2'A_2'(A_1A_1')^{-1}A_2x_2 + x_2'x_2 = \min_{x_2}$   (1.10)

$\dfrac{\partial \mathcal{L}}{\partial x_2}(x_{2m}) = 0 \Leftrightarrow -A_2'(A_1A_1')^{-1}y + [A_2'(A_1A_1')^{-1}A_2 + I]x_{2m} = 0$
$\Leftrightarrow x_{2m} = [A_2'(A_1A_1')^{-1}A_2 + I]^{-1}A_2'(A_1A_1')^{-1}y$   (1.11)

$x_{2m} = [A_2'(A_1A_1')^{-1}A_2 + I]^{-1}A_2'(A_1A_1')^{-1}y = A_2'[A_2A_2'(A_1A_1')^{-1} + I]^{-1}(A_1A_1')^{-1}y$   (1.12)

$x_{2m} = A_2'(A_1A_1' + A_2A_2')^{-1}y$   (1.13)
The second derivatives, given by (1.14), due to the positive-definiteness of the matrix $A_2'(A_1A_1')^{-1}A_2 + I$, generate the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. Carrying out the backward transform $x_{2m} \mapsto x_{1m} = -A_1^{-1}A_2x_{2m} + A_1^{-1}y$, we obtain (1.15).

$\dfrac{\partial^2 \mathcal{L}}{\partial x_2 \partial x_2'}(x_{2m}) = 2[A_2'(A_1A_1')^{-1}A_2 + I] > 0$,   (1.14)

$x_{1m} = -A_1^{-1}A_2A_2'(A_1A_1' + A_2A_2')^{-1}y + A_1^{-1}y$.   (1.15)
Let us additionally right multiply the identity $A_1A_1' = -A_2A_2' + (A_1A_1' + A_2A_2')$ by $(A_1A_1' + A_2A_2')^{-1}$, such that (1.16) holds, and left multiply by $A_1^{-1}$, such that (1.17) holds. Obviously, we have generated the linear forms (1.18)–(1.20).

$A_1A_1'(A_1A_1' + A_2A_2')^{-1} = -A_2A_2'(A_1A_1' + A_2A_2')^{-1} + I$,   (1.16)

$A_1'(A_1A_1' + A_2A_2')^{-1} = -A_1^{-1}A_2A_2'(A_1A_1' + A_2A_2')^{-1} + A_1^{-1}$,   (1.17)

$x_{1m} = A_1'(A_1A_1' + A_2A_2')^{-1}y,\quad x_{2m} = A_2'(A_1A_1' + A_2A_2')^{-1}y$,   (1.18)

$\begin{bmatrix} x_{1m} \\ x_{2m} \end{bmatrix} = \begin{bmatrix} A_1' \\ A_2' \end{bmatrix}(A_1A_1' + A_2A_2')^{-1}y$,   (1.19)

$x_m = A'(AA')^{-1}y$.   (1.20)
The following relations supply us with a numerical computation with respect to the introductory example.
$A_1A_1' + A_2A_2' = AA' = \begin{bmatrix} 3 & 7 \\ 7 & 21 \end{bmatrix},\quad (A_1A_1' + A_2A_2')^{-1} = \frac{1}{14}\begin{bmatrix} 21 & -7 \\ -7 & 3 \end{bmatrix}$,

$A_1'(A_1A_1' + A_2A_2')^{-1} = \frac{1}{14}\begin{bmatrix} 14 & -4 \\ 7 & -1 \end{bmatrix},\quad A_2'(A_1A_1' + A_2A_2')^{-1} = \frac{1}{14}[-7, 5]$,   (1.21)

$x_{1m} = \begin{bmatrix} 8/7 \\ 11/14 \end{bmatrix},\quad x_{2m} = 1/14,\quad \|x_m\|_I = 3\sqrt{42}/14$,   (1.22)

$\dfrac{\partial^2 \mathcal{L}}{\partial x_2 \partial x_2'}(x_{2m}) = 28 > 0,\quad y(t) = \frac{8}{7} + \frac{11}{14}t + \frac{1}{14}t^2$.   (1.23)
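For readers who want to reproduce these numbers, the following short script is a minimal numerical check of I-MINOS for the front page example; it is not part of the book's text, and NumPy is used purely for illustration.

```python
# Check of I-MINOS, x_m = A'(AA')^{-1} y, cf. (1.20)-(1.23).
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])   # front page design matrix, rk A = 2
y = np.array([2.0, 3.0])          # observations y(t1) = 2, y(t2) = 3

x_m = A.T @ np.linalg.solve(A @ A.T, y)

print(x_m)                  # [8/7, 11/14, 1/14] = [1.1429..., 0.7857..., 0.0714...]
print(np.linalg.norm(x_m))  # 3*sqrt(42)/14 = 1.3887...
print(A @ x_m)              # reproduces the observations [2, 3]
```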
In the next section, let us go into the detailed analysis of $\mathcal{R}(f)$, $\mathcal{N}(f)$, and $\mathcal{N}^\perp(f)$ with respect to the front page example. Actually, how can we identify the range space $\mathcal{R}(A)$, the null space $\mathcal{N}(A)$, or its orthogonal complement $\mathcal{N}^\perp(A)$?
1-14 The Range $\mathcal{R}(f)$ and the Kernel $\mathcal{N}(f)$

The range space $\mathcal{R}(A)$ is conveniently described by the first column $c_1 = [1, 1]'$ and the second column $c_2 = [1, 2]'$ of the matrix $A$, namely as a 2-leg with respect to the orthogonal base vectors $e_1$ and $e_2$, respectively, attached to the origin $\mathcal{O}$. Symbolically, we write the 2-leg in the form (1.24); symbolically, we write $\mathcal{R}(A)$ in the form (1.25). As a linear space, $\mathcal{R}(A) \ni y$ is illustrated by Fig. 1.2.

$\{e_1 + e_2,\ e_1 + 2e_2 \mid \mathcal{O}\}$ or $\{e_{c_1}, e_{c_2} \mid \mathcal{O}\}$,   (1.24)

$\mathcal{R}(A) = \mathrm{span}\{e_1 + e_2,\ e_1 + 2e_2 \mid \mathcal{O}\}$.   (1.25)
By means of Box 1.4, we identify $\mathcal{N}(f)$ or the null space $\mathcal{N}(A)$ and give its illustration by Fig. 1.2. Such a result has paved the way to the diagnostic algorithm for solving an underdetermined system of linear equations by means of rank partitioning presented in Box 1.5.

Fig. 1.2 Range $\mathcal{R}(f)$, range space $\mathcal{R}(A)$, $y \in \mathcal{R}(A)$: the 2-leg $\{e_{c_1}, e_{c_2} \mid \mathcal{O}\}$

Box 1.4. (General solution of a system of homogeneous linear equations $Ax = 0$, horizontal rank partitioning).
The matrix $A$ is called horizontally rank partitioned if (1.26) holds. (In the introductory example, $A \in \mathbb{R}^{2 \times 3}$, $A_1 \in \mathbb{R}^{2 \times 2}$, $A_2 \in \mathbb{R}^{2 \times 1}$, and $d(A) = 1$ applies.) A consistent system of linear equations $Ax = y$, $\mathrm{rk}A = \dim\mathbb{Y}$, is horizontally rank partitioned if (1.27) for a partitioned unknown vector (1.28) applies.

$A \in \mathbb{R}^{n \times m} \wedge A = [A_1, A_2],\ A_1 \in \mathbb{R}^{n \times r},\ A_2 \in \mathbb{R}^{n \times d}$, subject to $r = \mathrm{rk}A = \mathrm{rk}A_1 = n$, $d = d(A) = m - \mathrm{rk}A$,   (1.26)

$Ax = y,\ \mathrm{rk}A = \dim\mathbb{Y} \Leftrightarrow A_1x_1 + A_2x_2 = y$,   (1.27)

$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^m \mid x_1 \in \mathbb{R}^{r \times 1},\ x_2 \in \mathbb{R}^{d \times 1}$.   (1.28)

The horizontal rank partitioning of the matrix $A$ as well as the horizontally rank partitioned consistent system of linear equations $Ax = y$, $\mathrm{rk}A = \dim\mathbb{Y}$, of the introductory example is

$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix},\quad A_1 = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix},\quad A_2 = \begin{bmatrix} 1 \\ 4 \end{bmatrix}$,   (1.29)

$Ax = y \Leftrightarrow A_1x_1 + A_2x_2 = y,\ x_1 = [x_1, x_2]' \in \mathbb{R}^{2 \times 1},\ x_2 = [x_3] \in \mathbb{R}$: $\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 1 \\ 4 \end{bmatrix}x_3 = y$.   (1.30)

By means of the horizontal rank partitioning of the system of homogeneous linear equations, an identification of the null space (1.31) is provided by (1.32). In particular, in the introductory example, we are led to (1.33) and (1.34).

$\mathcal{N}(A) = \{x \in \mathbb{R}^m \mid Ax = A_1x_1 + A_2x_2 = 0\}$,   (1.31)

$A_1x_1 + A_2x_2 = 0 \Leftrightarrow x_1 = -A_1^{-1}A_2x_2$,   (1.32)

$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = -\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{-1}\begin{bmatrix} 1 \\ 4 \end{bmatrix}x_3 = \begin{bmatrix} 2 \\ -3 \end{bmatrix}x_3$,   (1.33)

$x_1 = 2x_3 = 2\tau,\quad x_2 = -3x_3 = -3\tau,\quad x_3 = \tau$.   (1.34)
Here, the two equations $Ax = 0$ constitute the linear space $\mathcal{N}(A)$, $\dim\mathcal{N}(A) = 1$, a one-dimensional subspace of $\mathbb{X} = \mathbb{R}^3$. For instance, if we introduce the parameter $x_3 = \tau$, the other coordinates of the parameter space amount to $x_1 = 2x_3 = 2\tau$, $x_2 = -3x_3 = -3\tau$. In geometric language, the linear space $\mathcal{N}(A)$ is a parameterized straight line $\mathbb{L}^1_0$ through the origin $\mathcal{O}$, illustrated by Fig. 1.3. The parameter space $\mathbb{X} = \mathbb{R}^3$ is sliced by the subspace, the
Fig. 1.3 The parameter space $\mathbb{X} = \mathbb{R}^3$ ($x_3$ is not displayed) sliced by the null space, the linear manifold $\mathcal{N}(A) = \mathbb{L}^1_0 \subset \mathbb{R}^2$
linear space $\mathcal{N}(A)$, also called a linear manifold, with $\dim\mathcal{N}(A) = d(A) = d$ (here a straight line $\mathbb{L}^1_0$ through the origin $\mathcal{O}$). In the following section, let us consider the interpretation of MINOS by the three fundamental partitionings: algebraic (rank partitioning), geometric (slicing), and set-theoretical (fibering).
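As a small numerical aside (not part of the original text; NumPy is assumed), the null space direction of the front page matrix can be checked directly from the rank partitioning (1.32):

```python
# Null space of A via horizontal rank partitioning, cf. (1.31)-(1.34).
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
A1, A2 = A[:, :2], A[:, 2:]        # A = [A1, A2], A1 regular

# x1 = -A1^{-1} A2 x2 with x2 = x3 = 1 gives the direction of N(A)
direction = np.concatenate([(-np.linalg.solve(A1, A2)).ravel(), [1.0]])

print(direction)      # [2, -3, 1]: the line x1 = 2*tau, x2 = -3*tau, x3 = tau
print(A @ direction)  # [0, 0], confirming A x = 0
```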
1-15 The Interpretation of MINOS

The diagnostic algorithm for solving an underdetermined system of linear equations $Ax = y$, $\mathrm{rk}A = \dim\mathbb{Y} = n$, $n < m = \dim\mathbb{X}$, $y \in \mathcal{R}(A)$, by means of rank partitioning is presented to you in Box 1.5. While we have characterized the general solution of the system of homogeneous linear equations $Ax = 0$, we are left with the problem of solving the consistent system of inhomogeneous linear equations. Again, we take advantage of the rank partitioning of the matrix $A$, summarized in Box 1.6.

Box 1.5. (A diagnostic algorithm for solving an underdetermined system of linear equations by means of rank partitioning).

A diagnostic algorithm for solving an underdetermined system of linear equations $Ax = y$, $\mathrm{rk}A = \dim\mathbb{Y}$, $y \in \mathcal{R}(A)$, by means of rank partitioning is immediately introduced by the following chain of statements.

Determine the rank of the matrix $A$: $\mathrm{rk}A = \dim\mathbb{Y} = n$.
⇓
Compute the horizontal rank partitioning: $A = [A_1, A_2]$, $A_1 \in \mathbb{R}^{r \times r} = \mathbb{R}^{n \times n}$, $A_2 \in \mathbb{R}^{n \times (m-r)} = \mathbb{R}^{n \times (m-n)}$; $m - r = m - n$ is called the right complementary index. $A$ as a linear operator is not injective, but surjective.
⇓
Compute the null space $\mathcal{N}(A)$: $\mathcal{N}(A) = \{x \in \mathbb{R}^m \mid Ax = 0\} = \{x \in \mathbb{R}^m \mid x_1 + A_1^{-1}A_2x_2 = 0\}$.
⇓
Compute the unknown parameter vector of type MINOS: $x_m = A'(AA')^{-1}y$.

Box 1.6. (A special solution of a consistent system of inhomogeneous linear equations $Ax = y$, horizontal rank partitioning).

$Ax = y,\ \mathrm{rk}A = \dim\mathbb{Y},\ y \in \mathcal{R}(A) \Leftrightarrow A_1x_1 + A_2x_2 = y$   (1.35)

Since the matrix $A_1$ is of full rank, it can be regularly inverted (Cayley inverse). In particular, we solve for $x_1$:

$x_1 = -A_1^{-1}A_2x_2 + A_1^{-1}y$   (1.36)

or

$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ -3 \end{bmatrix}x_3 + \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$,   (1.37)

$x_1 = 2x_3 + 2y_1 - y_2,\quad x_2 = -3x_3 - y_1 + y_2$.   (1.38)
For instance, if we introduce the parameter $x_3 = \tau$, the other coordinates of the parameter space amount to (1.39). In geometric language, the admissible parameter space is a family of one-dimensional linear spaces, a family of one-dimensional parallel straight lines dependent on $y = [y_1, y_2]'$, here $[2, 3]'$, in particular the family (1.40), including the null space $\mathbb{L}^1_{0,0} = \mathcal{N}(A)$.

$x_1 = 2\tau + 2y_1 - y_2,\quad x_2 = -3\tau - y_1 + y_2$,   (1.39)

$\mathbb{L}^1_{y_1,y_2} = \{x \in \mathbb{R}^3 \mid x_1 = 2x_3 + 2y_1 - y_2,\ x_2 = -3x_3 - y_1 + y_2\}$.   (1.40)
Figure 1.4 illustrates (i) the admissible parameter space $\mathbb{L}^1_{(y_1,y_2)}$, (ii) the line $\mathbb{L}^1_\perp$ which is orthogonal to the null space, called $\mathcal{N}^\perp(A)$, and (iii) the intersection $\mathbb{L}^1_{(y_1,y_2)} \cap \mathcal{N}^\perp(A)$, generating the solution point $x_m$, as will be proven now. The geometric interpretation of the minimum norm solution is the following: With reference to Fig. 1.5, we decompose the vector $x = x_{\mathcal{N}(A)} + x_{\mathcal{N}^\perp(A)}$, where $x_{\mathcal{N}(A)}$ is an element of the null space $\mathcal{N}(A)$ (here the straight line $\mathbb{L}^1_{0,0}$), while $x_{\mathcal{N}^\perp(A)} = x_m$ is an element of the range space $\mathcal{R}(A^-)$ (here the above outlined
Fig. 1.4 The range space $\mathcal{R}(A^-)$ (the admissible parameter space), parallel straight lines $\mathbb{L}^1_{(y_1,y_2)}$, namely $\mathbb{L}^1_{(2,3)}$, the null space $\mathbb{L}^1_{(0,0)}$ and its orthogonal complement $\mathcal{N}^\perp(A)$

Fig. 1.5 Orthogonal projection of an element of $\mathcal{N}(A)$ onto the range space $\mathcal{R}(A^-)$
straight line $\mathbb{L}^1_{y_1,y_2}$, namely $\mathbb{L}^1_{2,3}$) of the generalized inverse matrix $A^-$ of type minimum norm solution, which in this book mostly is abbreviated as "MINOS". With $x = x_{\mathcal{N}(A)} + x_{\mathcal{N}^\perp(A)} =: i + x_m$, the norm $\|x\|_I^2 = \|x_m\|_I^2 + 2\langle x_m \mid i\rangle + \|i\|^2$ is minimal if and only if the inner product $\langle x_m \mid i\rangle = 0$, that is, if $x_m$ and $i \in \mathcal{N}(A)$ are orthogonal. The solution point $x_m$ is the orthogonal projection of the null point onto $\mathcal{R}(A^-)$: $P_{\mathcal{R}(A^-)}x = A^-Ax = A^-y$ for all $x \in \mathcal{D}(A)$. Alternatively, if the vector $x_m$ of minimal length is orthogonal to the null space $\mathcal{N}(A)$, being an element of $\mathcal{N}^\perp(A)$ (here the line $\mathbb{L}^1_\perp$), we may say that $\mathcal{N}^\perp(A)$ intersects $\mathcal{R}(A^-)$ in the solution point $x_m$. Or we may say that the normal space $N\mathbb{L}^1_0$ with respect to the tangent space $T\mathbb{L}^1_0$ – which in linear models is identical to $\mathbb{L}^1_0$, the null space $\mathcal{N}(A)$ – intersects the tangent space, the range space $\mathcal{R}(A^-)$, in the solution point $x_m$. In summary, $x_m \in \mathcal{N}^\perp(A) \cap \mathcal{R}(A^-)$. Let the algebraic partitioning and the geometric partitioning be merged to interpret the minimum norm solution of the consistent system of linear equations of
type underdetermined MINOS. As a summary of such a merger, we take reference to Box 1.7.

Box 1.7. (General solution of a consistent system of inhomogeneous linear equations $Ax = y$).

Consistent system of equations $f: x \mapsto y = Ax$, $x \in \mathbb{X} = \mathbb{R}^m$ (parameter space), $y \in \mathbb{Y} = \mathbb{R}^n$ (observation space), $r = \mathrm{rk}A = \dim\mathbb{Y}$; $A^-$ generalized inverse of type MINOS.

Condition #1: $f(x) = f(g(y)) \Leftrightarrow f = f \circ g \circ f$;
$Ax = AA^-Ax \Leftrightarrow AA^-A = A$.   (1.41)

Condition #2 (reflexive g-inverse mapping): $x = g(y) = g(f(x))$;
$x = A^-y = A^-AA^-y \Leftrightarrow A^-AA^- = A^-$.   (1.42)

Condition #3: $g(f(x)) = x_{\mathcal{R}(A^-)} \Leftrightarrow g \circ f = \mathrm{Proj}_{\mathcal{R}(A^-)}$;
$A^-Ax = x_{\mathcal{R}(A^-)} \Leftrightarrow A^-A = \mathrm{Proj}_{\mathcal{R}(A^-)}$.   (1.43)

Regarding the above first condition $AA^-A = A$, let us here depart from MINOS of $y = Ax$, $x \in \mathbb{X} = \mathbb{R}^m$, $y \in \mathbb{Y} = \mathbb{R}^n$, $r = \mathrm{rk}A = n$, i.e. $x_m = A_m^-y = A'(AA')^{-1}y$: see (1.44). Concerning the above second condition $A^-AA^- = A^-$: see (1.45). Concerning the above third condition $A^-A = \mathrm{Proj}_{\mathcal{R}(A^-)}$: see (1.46).

$Ax_m = AA_m^-y = AA_m^-Ax \Rightarrow AA_m^-A = A$,   (1.44)

$x_m = A'(AA')^{-1}y = A_m^-y \Rightarrow x_m = A_m^-y = A_m^-AA_m^-y \Rightarrow A_m^-AA_m^- = A_m^-$,   (1.45)

$x_m = A_m^-y = A_m^-Ax_m \Leftrightarrow A_m^-A = \mathrm{Proj}_{\mathcal{R}(A_m^-)}$.   (1.46)
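A quick numerical confirmation of the three conditions (1.41)–(1.43) for the front page example may be helpful; the following sketch is illustrative only and assumes NumPy.

```python
# Verify the MINOS g-inverse conditions (1.41)-(1.43) for A_m^- = A'(AA')^{-1}.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
Am = A.T @ np.linalg.inv(A @ A.T)          # minimum norm g-inverse A_m^-

print(np.allclose(A @ Am @ A, A))          # condition #1: A A_m^- A = A
print(np.allclose(Am @ A @ Am, Am))        # condition #2: A_m^- A A_m^- = A_m^-
P = Am @ A                                 # condition #3: projector onto R(A_m^-)
print(np.allclose(P @ P, P), np.allclose(P, P.T))  # idempotent and symmetric
```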
How are $\mathrm{rk}A_m^- = \mathrm{rk}A$ and $A_m^-A$ interpreted? On the interpretation of $\mathrm{rk}A_m^- = \mathrm{rk}A$: the g-inverse of type MINOS is a generalized inverse of minimal rank since in general $\mathrm{rk}A^- \geq \mathrm{rk}A$ holds. On the interpretation of $A_m^-A$: $A_m^-A$ is an orthogonal projection onto $\mathcal{R}(A_m^-)$, while $I - A_m^-A$ projects onto its orthogonal complement $\mathcal{R}^\perp(A_m^-)$.
Fig. 1.6 Venn diagram, trivial fibering of the domain $\mathcal{D}(f)$, trivial fibers $\mathcal{N}(f)$ and $\mathcal{N}^\perp(f)$, $f: \mathbb{R}^m = \mathbb{X} \to \mathbb{Y} = \mathbb{R}^n$, $\mathbb{Y} = \mathcal{R}(f)$; $\mathbb{X}$ is the set system of the parameter space, $\mathbb{Y}$ is the set system of the observation space
If the linear mapping $f: x \mapsto y = f(x)$, $y \in \mathcal{R}(f)$, is given, we are aiming at a generalized inverse (linear) mapping $g: y \mapsto x = g(y)$ in such a manner that $y = f(x) = f(g(y)) = f(g(f(x)))$ or $f = f \circ g \circ f$ as a first condition is fulfilled. Alternatively, we are going to construct a generalized inverse $A^-: y \mapsto A^-y = x$ in such a manner that the first condition $y = Ax = AA^-Ax$ or $A = AA^-A$ holds. Though the linear mapping $f: x \mapsto y = f(x)$, $y \in \mathcal{R}(f)$, or the system of linear equations $Ax = y$, $\mathrm{rk}A = \dim\mathbb{Y}$, is consistent, it suffers from the (injectivity) deficiency of the linear mapping $f(x)$ or of the matrix $A$. Indeed, it recovers from the (injectivity) deficiency if we introduce the projection $x \mapsto g(f(x)) = q \in \mathcal{R}(g)$ or $x \mapsto A^-Ax = q \in \mathcal{R}(A^-)$ as the second condition. The projection matrix $A^-A$ is idempotent, which follows from $P^2 = P$ or $(A^-A)(A^-A) = A^-AA^-A = A^-A$. Let us here finally outline the set-theoretical partitioning, the fibering of the set system of points which constitute the parameter space $\mathbb{X}$, the domain $\mathcal{D}(f)$. Since the set system $\mathbb{X}$ (the parameter space) is $\mathbb{R}^m$, the fibering is called trivial. Nontrivial fibering is reserved for nonlinear models, in which case we are dealing with a parameter space $\mathbb{X}$ which is a differentiable manifold. Here, the fibering $\mathcal{D}(f) = \mathcal{N}(f) \cup \mathcal{N}^\perp(f)$ produces the trivial fibers $\mathcal{N}(f)$ and $\mathcal{N}^\perp(f)$, where the trivial fiber $\mathcal{N}^\perp(f)$ is the quotient set $\mathbb{R}^m / \mathcal{N}(f)$. By means of a Venn diagram (John Venn 1834–1928), also called Euler circles (Leonhard Euler 1707–1783), Fig. 1.6 illustrates the trivial fibers of the set system $\mathbb{X} = \mathbb{R}^m$ that is generated by $\mathcal{N}(f)$ and $\mathcal{N}^\perp(f)$. Of course, the set system of points which constitute the observation space $\mathbb{Y}$ is not subject to fibering since all points of the set system $\mathcal{D}(f)$ are mapped into the range $\mathcal{R}(f)$.
1-2 Minimum Norm Solution (MINOS)

The system of consistent linear equations $Ax = y$ ($A \in \mathbb{R}^{n \times m}$, $\mathrm{rk}A = n < m$) allows certain solutions which we introduce by means of Definition 1.1 as a solution of a certain optimization problem. Lemma 1.2 contains the normal equations of the optimization problem. The solution of such a system of normal equations is presented in Lemma 1.3 as the minimum norm solution (MINOS) with respect to the $G_x$-seminorm. Finally, we discuss the metric of the parameter space and alternative choices of its metric before we identify, by Lemma 1.4, the solution of the quadratic optimization problem in terms of the (1, 2, 4)-generalized inverse.
By Definition 1.1, we characterize $G_x$-MINOS of the consistent system of linear equations $Ax = y$ subject to $A \in \mathbb{R}^{n \times m}$, $\mathrm{rk}A = n < m$ (algebraic condition) or $y \in \mathcal{R}(A)$ (geometric condition). Loosely speaking, we are confronted with a system of linear equations with more unknowns $m$ than equations $n$, namely $n < m$. $G_x$-MINOS will enable us to find a solution of this underdetermined problem. By means of Lemma 1.2, we shall write down the "normal equations" of $G_x$-MINOS.

Definition 1.1. (The minimum norm solution (MINOS) with respect to the $G_x$-seminorm).

A vector $x_m$ is called $G_x$-MINOS (minimum norm solution with respect to the $G_x$-seminorm) of the consistent system of linear equations (1.47) if both $Ax_m = y$ and – in comparison to all other solution vectors $x \in \mathbb{X} \subseteq \mathbb{R}^m$ – the inequality $\|x_m\|^2_{G_x} := x_m'G_xx_m \leq x'G_xx =: \|x\|^2_{G_x}$ holds. The system of inverse linear equations $A^-y + i = x$, $\mathrm{rk}A^- \neq m$ or $x \notin \mathcal{R}(A^-)$, is inconsistent.

$Ax = y,\ y \in \mathbb{Y} \subseteq \mathbb{R}^n,\ A \in \mathbb{R}^{n \times m}$, subject to $\mathrm{rk}A = \mathrm{rk}[A, y] = n < m$, i.e. $y \in \mathcal{R}(A)$.   (1.47)
Lemma 1.2. (The minimum norm solution (MINOS) with respect to the $G_x$-seminorm).

A vector $x_m \in \mathbb{X} \subseteq \mathbb{R}^m$ is $G_x$-MINOS of (1.47) if and only if the system of normal equations (1.48), with the vector $\lambda_m \in \mathbb{R}^{n \times 1}$ of Lagrange multipliers, is fulfilled. $x_m$ always exists and is, in particular, unique if $\mathrm{rk}\begin{bmatrix} G_x \\ A \end{bmatrix} = m$ holds or, equivalently, if the matrix $G_x + A'A$ is regular.

$\begin{bmatrix} G_x & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$   (1.48)
Proof. $G_x$-MINOS is based on the constrained Lagrangean (1.49) such that the first derivatives (1.50) constitute the necessary conditions. The second derivatives (1.51), according to the positive semi-definiteness of the matrix $G_x$, generate the sufficiency condition for obtaining the minimum of the constrained Lagrangean. According to the assumption $\mathrm{rk}A = \mathrm{rk}[A, y] = n$ or $y \in \mathcal{R}(A)$, the existence of $G_x$-MINOS $x_m$ is guaranteed. In order to prove uniqueness of $G_x$-MINOS $x_m$, we have to consider case (i) ($G_x$ positive-definite) and case (ii) ($G_x$ positive semi-definite).

$\mathcal{L}(x, \lambda) = x'G_xx + 2\lambda'(Ax - y) = \min_{x, \lambda}$   (1.49)
$\frac{1}{2}\dfrac{\partial \mathcal{L}}{\partial x}(x_m, \lambda_m) = G_xx_m + A'\lambda_m = 0$ and $\frac{1}{2}\dfrac{\partial \mathcal{L}}{\partial \lambda}(x_m, \lambda_m) = Ax_m - y = 0 \Leftrightarrow \begin{bmatrix} G_x & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$,   (1.50)

$\frac{1}{2}\dfrac{\partial^2 \mathcal{L}}{\partial x \partial x'}(x_m, \lambda_m) = G_x \geq 0$.   (1.51)
Case (i) ($G_x$ positive-definite): Due to $\mathrm{rk}G_x = m$ and $|G_x| \neq 0$, the partitioned system of normal equations (1.52) is uniquely solved. The theory of inverse partitioned matrices (IPM) is presented in Appendix A.

$|G_x| \neq 0:\quad \begin{bmatrix} G_x & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$.   (1.52)

Case (ii) ($G_x$ positive semi-definite): Follow these algorithmic steps. Multiply the second normal equation by $A'$ in order to produce $A'Ax_m - A'y = 0$ or $A'Ax_m = A'y$ and add the result to the first normal equation, which generates $G_xx_m + A'Ax_m + A'\lambda_m = A'y$ and therefore $(G_x + A'A)x_m + A'\lambda_m = A'y$. The augmented first normal equation and the original second normal equation build up the equivalent system of normal equations (1.53), which is uniquely solved due to $\mathrm{rk}[G_x + A'A] = m$, $|G_x + A'A| \neq 0$.

$|G_x + A'A| \neq 0:\quad \begin{bmatrix} G_x + A'A & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} A' \\ I_n \end{bmatrix}y$.   (1.53)
The solution of the system of normal equations directly leads to the linear form $x_m = Ly$, which is $G_x$-MINOS of (1.47) and can be represented as follows.

Lemma 1.3. (The minimum norm solution (MINOS) with respect to the $G_x$-seminorm).

$x_m = Ly$ is $G_x$-MINOS of the consistent system of linear equations (1.47), i.e. $Ax = y$ and $\mathrm{rk}A = \mathrm{rk}[A, y] = n$ or $y \in \mathcal{R}(A)$, if and only if $L \in \mathbb{R}^{m \times n}$ is represented by the following relations.

Case (i) ($G_x = I_m$):

$L = A_g^- = A'(AA')^{-1}$ (right inverse),   (1.54)

$x_m = A_g^-y = A'(AA')^{-1}y,\quad x = x_m + i_m$.   (1.55)

$x = x_m + i_m$ is an orthogonal decomposition of the unknown vector $x \in \mathbb{X} \subseteq \mathbb{R}^m$ into the $I$-MINOS vector $x_m \in \mathbb{L}^n$ and the $I$-MINOS vector of inconsistency $i_m \in \mathbb{L}^d$ subject to $x_m = A'(AA')^{-1}Ax$ and $i_m = x - x_m = [I_m - A'(AA')^{-1}A]x$. (According to $x_m = A'(AA')^{-1}Ax$, $I$-MINOS has the reproducing property. As projection
matrices, the matrix $A'(AA')^{-1}A$ with $\mathrm{rk}[A'(AA')^{-1}A] = \mathrm{rk}A = n$ and the matrix $[I_m - A'(AA')^{-1}A]$ with $\mathrm{rk}[I_m - A'(AA')^{-1}A] = m - \mathrm{rk}A = d$ are independent.) Their corresponding norms are positive semi-definite, namely $\|x_m\|^2_{I_m}$ and $\|i_m\|^2_{I_m}$ yield $\|x_m\|^2_{I_m} = y'(AA')^{-1}y = x'A'(AA')^{-1}Ax = x'G_mx$ and $\|i_m\|^2_{I_m} = x'[I_m - A'(AA')^{-1}A]x$. In particular, note that

$x_m = A'(AA')^{-1}Ax,\quad i_m = x - x_m = [I_m - A'(AA')^{-1}A]x$.   (1.56)

Case (ii) ($G_x$ positive-definite):

$L = G_x^{-1}A'(AG_x^{-1}A')^{-1}$ (weighted right inverse),   (1.57)

$x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}y,\quad x = x_m + i_m$.   (1.58)

$x = x_m + i_m$ is an orthogonal decomposition of the unknown vector $x \in \mathbb{X} \subseteq \mathbb{R}^m$ into the $G_x$-MINOS vector $x_m \in \mathbb{L}^n$ and the $G_x$-MINOS vector of inconsistency $i_m \in \mathbb{L}^d$, with $x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}Ax$ and $i_m = [I_m - G_x^{-1}A'(AG_x^{-1}A')^{-1}A]x$. (Due to $x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}Ax$, $G_x$-MINOS has the reproducing property. As projection matrices, the matrix $G_x^{-1}A'(AG_x^{-1}A')^{-1}A$ with $\mathrm{rk}[G_x^{-1}A'(AG_x^{-1}A')^{-1}A] = \mathrm{rk}A = n$ and $[I_m - G_x^{-1}A'(AG_x^{-1}A')^{-1}A]$ with $\mathrm{rk}[I_m - G_x^{-1}A'(AG_x^{-1}A')^{-1}A] = m - \mathrm{rk}A = d$ are independent.) Their corresponding norms are positive semi-definite, namely $\|x_m\|^2_{G_x} = y'(AG_x^{-1}A')^{-1}y = x'A'(AG_x^{-1}A')^{-1}Ax = x'G_mx$ and $\|i_m\|^2_{G_x} = x'[G_x - A'(AG_x^{-1}A')^{-1}A]x$. In particular, note that

$x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}Ax,\quad i_m = [I_m - G_x^{-1}A'(AG_x^{-1}A')^{-1}A]x$.   (1.59)
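As an illustrative numerical sketch of case (ii) – not taken from the book, and with an arbitrarily assumed positive-definite weight matrix $G_x$ – the weighted right inverse can be evaluated as follows.

```python
# Gx-MINOS, case (ii) of Lemma 1.3: x_m = Gx^{-1} A' (A Gx^{-1} A')^{-1} y.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])
Gx = np.diag([1.0, 2.0, 4.0])      # assumed positive-definite metric of X

GxInvAt = np.linalg.solve(Gx, A.T)
x_m = GxInvAt @ np.linalg.solve(A @ GxInvAt, y)

print(A @ x_m)          # reproduces y = [2, 3]
print(x_m @ Gx @ x_m)   # the minimal Gx-weighted squared norm
```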
Case (iii) ($G_x$ positive semi-definite):

$L = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}$,   (1.60)

$x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}y$,   (1.61)

$x = x_m + i_m$.   (1.62)

$x = x_m + i_m$ is an orthogonal decomposition of the unknown vector $x \in \mathbb{X} \subseteq \mathbb{R}^m$ into the $(G_x + A'A)$-MINOS vector $x_m \in \mathbb{L}^n$ and the $(G_x + A'A)$-MINOS vector of inconsistency $i_m \in \mathbb{L}^d$ subject to

$x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}Ax$,   (1.63)

$i_m = (I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A)x$.   (1.64)

Due to $x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}Ax$, $(G_x + A'A)$-MINOS has the reproducing property. As projection matrices, the two matrices (1.65) and (1.66),

$(G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A$, with $\mathrm{rk}[(G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A] = \mathrm{rk}A = n$,   (1.65)

$I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A$, with $\mathrm{rk}[I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A] = m - \mathrm{rk}A = d$,   (1.66)

are independent. Their corresponding norms are positive semi-definite, namely

$\|x_m\|^2_{G_x + A'A} = y'[A(G_x + A'A)^{-1}A']^{-1}y = x'A'[A(G_x + A'A)^{-1}A']^{-1}Ax = x'G_mx$,   (1.67)

$\|i_m\|^2_{G_x + A'A} = x'(I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A)x$.   (1.68)
Proof. A basis of the proof could be C.R. Rao's Pandora Box, the theory of inverse partitioned matrices (Appendix A, Fact: Inverse Partitioned Matrix (IPM) of a symmetric matrix). Due to the rank identity $\mathrm{rk}A = \mathrm{rk}(AG_x^{-1}A') = n < m$, the normal equations of case (i) and case (ii) (→ (1.69)) can be directly solved by Gauss elimination. For case (iii), we add to the first normal equation the term $A'Ax_m = A'y$. Due to the rank identity $\mathrm{rk}A = \mathrm{rk}(A(G_x + A'A)^{-1}A') = n < m$, the modified normal equations (→ (1.70)) can be directly solved by Gauss elimination.

$\begin{bmatrix} G_x & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$,   (1.69)

$\begin{bmatrix} G_x + A'A & A' \\ A & 0 \end{bmatrix}\begin{bmatrix} x_m \\ \lambda_m \end{bmatrix} = \begin{bmatrix} A' \\ I_n \end{bmatrix}y$.   (1.70)

Multiply the first equation of (1.69) by $AG_x^{-1}$ and subtract the second equation of (1.69) from the modified first one to obtain (1.71). The forward reduction step is followed by the backward reduction step. Implement $\lambda_m$ into the first equation of (1.69) and solve for $x_m$ to obtain (1.72). Thus, $G_x$-MINOS $x_m$ and $\lambda_m$ are represented by (1.73).

$Ax_m + AG_x^{-1}A'\lambda_m = 0,\ Ax_m = y \Rightarrow \lambda_m = -(AG_x^{-1}A')^{-1}y$,   (1.71)

$G_xx_m - A'(AG_x^{-1}A')^{-1}y = 0 \Rightarrow x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}y$,   (1.72)

$x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}y,\quad \lambda_m = -(AG_x^{-1}A')^{-1}y$.   (1.73)
Multiply the first equation of (1.70) by $A(G_x + A'A)^{-1}$ and subtract the second equation of (1.70) from the modified first one to obtain (1.74). The forward reduction step is followed by the backward reduction step. Implement $\lambda_m$ into the first equation of (1.70) and solve for $x_m$ to obtain (1.75). Thus, $G_x$-MINOS $x_m$ and $\lambda_m$ are represented by (1.76).

$Ax_m + A(G_x + A'A)^{-1}A'\lambda_m = A(G_x + A'A)^{-1}A'y,\quad Ax_m = y$,   (1.74)

$\lambda_m = [A(G_x + A'A)^{-1}A']^{-1}[A(G_x + A'A)^{-1}A' - I_n]y = (I_n - [A(G_x + A'A)^{-1}A']^{-1})y$,
$(G_x + A'A)x_m - A'[A(G_x + A'A)^{-1}A']^{-1}y = 0,\quad x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}y$,   (1.75)

$x_m = (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}y,\quad \lambda_m = (I_n - [A(G_x + A'A)^{-1}A']^{-1})y$.   (1.76)
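The bordered normal equations (1.48) can also be solved directly, which gives a quick consistency check of Lemma 1.3; the following sketch is illustrative, again with an assumed weight matrix $G_x$.

```python
# Solve [[Gx, A'], [A, 0]] [x_m; lambda_m] = [0; y], cf. (1.48)/(1.69),
# and compare with the closed form x_m = Gx^{-1} A'(A Gx^{-1} A')^{-1} y.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
y = np.array([2.0, 3.0])
Gx = np.diag([1.0, 2.0, 4.0])
n, m = A.shape

K = np.block([[Gx, A.T], [A, np.zeros((n, n))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(m), y]))
x_m, lam_m = sol[:m], sol[m:]

GxInvAt = np.linalg.solve(Gx, A.T)
print(np.allclose(x_m, GxInvAt @ np.linalg.solve(A @ GxInvAt, y)))  # True
```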
1-21 A Discussion of the Metric of the Parameter Space X

With the completion of the proof, we have to discuss in more detail the basic results of Lemma 1.3. At first, we have to observe that the matrix $G_x$ of the metric of the parameter space $\mathbb{X}$ has to be given a priori. We classified MINOS according to (i) $G_x = I_m$, (ii) $G_x$ positive-definite, and (iii) $G_x$ positive semi-definite. But how do we know the metric of the parameter space? Obviously, we need prior information about the geometry of the parameter space $\mathbb{X}$, namely from the empirical sciences like physics, chemistry, biology, geosciences, social sciences. If the parameter space $\mathbb{X} \subseteq \mathbb{R}^m$ is equipped with an inner product $\langle x_1 \mid x_2\rangle = x_1'G_xx_2$, with $x_1 \in \mathbb{X}$ and $x_2 \in \mathbb{X}$, where the matrix $G_x$ of the metric $\|x\|^2 = x'G_xx$ is positive-definite, we refer to the metric space $\mathbb{X} \subseteq \mathbb{R}^m$ as Euclidean, $\mathbb{E}^m$. In contrast, if the parameter space $\mathbb{X} \subseteq \mathbb{R}^m$ is restricted to a metric space with a matrix $G_x$ of the metric which is positive semi-definite, we call the parameter space semi-Euclidean, $\mathbb{E}^{m_1, m_2}$; $m_1$ is the number of positive eigenvalues, $m_2$ is the number of zero eigenvalues of the positive semi-definite matrix $G_x$ of the metric ($m = m_1 + m_2$). In various applications, namely in the adjustment of observations which refer to Special Relativity or General Relativity, we have to generalize the metric structure of the parameter space $\mathbb{X}$: if the matrix $G_x$ of the pseudometric $\|x\|^2 = x'G_xx$ is built on $m_1$ positive eigenvalues (signature $+$), $m_2$ zero eigenvalues, and $m_3$ negative eigenvalues (signature $-$), we call the pseudometric parameter space pseudo-Euclidean, $\mathbb{E}^{m_1, m_2, m_3}$, $m = m_1 + m_2 + m_3$. For such a parameter space, MINOS has to be generalized to $\|x\|^2 = \mathrm{extr}$, for instance a maximum norm solution.
1-22 An Alternative Choice of the Metric of the Parameter Space X

Another problem that is associated with the parameter space $\mathbb{X}$ is the so-called norm choice problem. Here, we have used the $\ell_2$-norm, namely the $\ell_2$-norm (1.77). An alternative choice is the $\ell_p$-norm (1.78) with $1 \leq p < \infty$. Beside the choice of the matrix $G_x$ of the metric within the $\ell_2$-norm and of the $\ell_p$-norm itself, we like to discuss the resulting MINOS matrix $G_m$ of the metric. Indeed, we have constructed MINOS from an a priori choice of the metric, called $G_x$, and were led to the a posteriori choice of the metric $G_m$ of the types (1.79). The matrices (1.79) are (i) idempotent, (ii) $G_x^{-1}$-idempotent, and (iii) $(G_x + A'A)^{-1}$-idempotent, namely projection matrices. Similarly, the norms $\|i_m\|^2$ of the types (1.80) measure the distance of the solution point $x_m \in \mathbb{X}$ from the null space $\mathcal{N}(A)$. The matrices (1.80) are (i) idempotent, (ii) $G_x^{-1}$-idempotent, and (iii) $(G_x + A'A)^{-1}$-idempotent, namely projection matrices.

$\|x\|_2 := \sqrt{x'x} = \sqrt{x_1^2 + x_2^2 + \dots + x_{m-1}^2 + x_m^2}$,   (1.77)

$\|x\|_p := \sqrt[p]{|x_1|^p + \dots + |x_{m-1}|^p + |x_m|^p}$,   (1.78)

(i) $G_m = A'(AA')^{-1}A$; (ii) $G_m = A'(AG_x^{-1}A')^{-1}A$; (iii) $G_m = A'[A(G_x + A'A)^{-1}A']^{-1}A$;   (1.79)

(i) $I_m - A'(AA')^{-1}A$; (ii) $G_x - A'(AG_x^{-1}A')^{-1}A$; (iii) $I_m - (G_x + A'A)^{-1}A'[A(G_x + A'A)^{-1}A']^{-1}A$.   (1.80)
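The idempotency claims are easy to check numerically; the following sketch (illustrative only) verifies case (i) of (1.79) and (1.80).

```python
# G_m = A'(AA')^{-1}A of (1.79, i) and its complement I - G_m of (1.80, i)
# are both idempotent, i.e. projection matrices.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
Gm = A.T @ np.linalg.solve(A @ A.T, A)
C = np.eye(3) - Gm

print(np.allclose(Gm @ Gm, Gm))   # True: idempotent
print(np.allclose(C @ C, C))      # True: complementary projector
```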
1-23 Gx-MINOS and Its Generalized Inverse

By Lemma 1.4, a more formal version of the generalized inverse which is characteristic for $G_x$-MINOS is presented.

Lemma 1.4. (Characterization of $G_x$-MINOS).

$x_m = Ly$ is $I$-MINOS of the consistent system of linear equations (1.47) (i.e. $Ax = y$ and $\mathrm{rk}A = \mathrm{rk}(A, y)$ or $y \in \mathcal{R}(A)$) if and only if the matrix $L \in \mathbb{R}^{m \times n}$ fulfils the two conditions (1.81). The matrix $L$ is reflexive and is the $A^{1,2,4}$ generalized inverse.
$ALA = A,\quad LA = (LA)'$.   (1.81)

$x_m = Ly$ is $G_x$-MINOS of the consistent system of linear equations (1.47) (i.e. $Ax = y$ and $\mathrm{rk}A = \mathrm{rk}(A, y)$ or $y \in \mathcal{R}(A)$) if and only if the matrix $L \in \mathbb{R}^{m \times n}$ fulfils the two conditions (1.82). The reflexive matrix $L$ is the $G_x$-weighted $A^{1,2,4}$ generalized inverse.

$ALA = A,\quad G_xLA = (G_xLA)'$.   (1.82)
Proof. According to the theory of the general solution of a system of linear equations which is presented in Appendix A, the conditions $ALA = A$ or $L = A^-$ guarantee the solution $x = Ly$ of (1.47) ($\mathrm{rk}A = \mathrm{rk}(A, y)$). The general solution (1.83) with an arbitrary vector $z \in \mathbb{R}^{m \times 1}$ leads to the representation (1.84) of the $G_x$-seminorm. $\|x_m\|^2_{G_x} \leq \|x\|^2_{G_x}$ for arbitrary $y \in \mathbb{Y} \subseteq \mathbb{R}^n$ holds if and only if $y'L'G_x(I - LA)z = 0$ for all $z \in \mathbb{R}^{m \times 1}$, or $A'L'G_x(I - LA) = 0$, or $A'L'G_x = A'L'G_xLA$. The right-hand side is a symmetric matrix. Accordingly, the left-hand side must have this property, too, namely $G_xLA = (G_xLA)'$, which had to be shown.

$x = x_m + (I - LA)z$,   (1.83)

$\|x\|^2_{G_x} = \|x_m + (I - LA)z\|^2_{G_x} = \|x_m\|^2_{G_x} + 2x_m'G_x(I - LA)z + \|(I - LA)z\|^2_{G_x}$
$= y'L'G_xLy + 2y'L'G_x(I - LA)z + z'(I - A'L')G_x(I - LA)z$.   (1.84)

Reflexivity of the matrix $L$ originates from the consistency condition, namely $(I - AL)y = 0$ for all $y \in \mathcal{R}(A)$; the reflexive condition of the $G_x$-weighted minimum norm generalized inverse, $G_xLAL = G_xL$, is a direct consequence. Consistency of the normal equations (1.48) or, equivalently, the uniqueness of $G_xx_m$ follows from (1.85) for arbitrary matrices $L_1 \in \mathbb{R}^{m \times n}$ and $L_2 \in \mathbb{R}^{m \times n}$ which satisfy the two conditions (1.82).

$G_xL_1y = G_xL_1AL_2y = A'L_1'G_xL_2y = A'L_1'G_xL_2AL_2y = A'L_1'A'L_2'G_xL_2y$
$= A'L_2'G_xL_2y = G_xL_2AL_2y = G_xL_2y$.   (1.85)
1-24 Eigenvalue Decomposition of Gx-MINOS: Canonical MINOS

In the empirical sciences, we quite often meet the inverse problem to determine the infinite set of coefficients of a series expansion of a function or a functional (Taylor polynomials) from a finite set of observations. Mind the following examples. (i) Determine the Fourier coefficients (discrete Fourier transform, trigonometric polynomials) of a harmonic function with circular support from observations in a one-dimensional lattice. (ii) Determine the spherical harmonic coefficients (discrete Fourier–Legendre transform) of a harmonic function with spherical support from observations in a two-dimensional lattice. End of examples.

Typically, such expansions generate an infinite-dimensional linear model based upon orthogonal (orthonormal) functions. Naturally, such a linear model is underdetermined since a finite set of observations is confronted with an infinite set of unknown parameters. In order to make such an infinite-dimensional linear model accessible to the computer, the expansion into orthogonal (orthonormal) functions is truncated or band-limited. Note that the observables $y \in \mathbb{Y}$, $\dim\mathbb{Y} = n$, are related to the parameters $x \in \mathbb{X}$, $\dim\mathbb{X} = m \geq n = \dim\mathbb{Y}$ (namely the unknown coefficients), by a linear operator $A \in \mathbb{R}^{n \times m}$, which is given in the form of an eigenvalue decomposition. Furthermore, note that we are confronted with the problem to construct canonical MINOS, also called the eigenvalue decomposition of $G_x$-MINOS.

First, we intend to canonically represent the parameter space $\mathbb{X}$ as well as the observation space $\mathbb{Y}$. Here, we shall assume that both spaces are Euclidean, equipped with symmetric, positive-definite matrices of the metric $G_x$ and $G_y$, respectively. Enjoy the diagonalization procedure of both matrices that is reviewed in Box 1.8. The inner products $aa'$ and $bb'$, respectively, constitute the matrices of the metric $G_x$ and $G_y$, respectively. The base vectors $\{a_1, \dots, a_m \mid \mathcal{O}\}$ span the parameter space $\mathbb{X}$, $\dim\mathbb{X} = m$, and the base vectors $\{b_1, \dots, b_n \mid \mathcal{O}\}$ span the observation space $\mathbb{Y}$, $\dim\mathbb{Y} = n$. Note the rank identities $\mathrm{rk}G_x = m$ and $\mathrm{rk}G_y = n$, respectively. The left norm $\|x\|^2_{G_x} = x'G_xx$ is taken with respect to the left matrix of the metric $G_x$. In contrast, the right norm $\|y\|^2_{G_y} = y'G_yy$ refers to the right matrix of the metric $G_y$. In order to diagonalize the left quadratic form as well as the right quadratic form, we transform $G_x \mapsto G_x^* = \mathrm{diag}[\lambda_1^x, \dots, \lambda_m^x] = V'G_xV$ as well as $G_y \mapsto G_y^* = \mathrm{diag}[\lambda_1^y, \dots, \lambda_n^y] = U'G_yU$ into the canonical form by means of the left orthonormal matrix $V$ and the right orthonormal matrix $U$. Such a procedure is called eigenspace analysis of the matrix $G_x$ as well as of the matrix $G_y$. $\Lambda_x$ constitutes the diagonal matrix of the left positive eigenvalues $(\lambda_1^x, \dots, \lambda_m^x)$, the $m$-dimensional left spectrum, while $\Lambda_y$ constitutes the diagonal matrix of the right positive eigenvalues $(\lambda_1^y, \dots, \lambda_n^y)$, the $n$-dimensional right spectrum. The inverse transformation $G_x^* = \Lambda_x \mapsto G_x$
as well as $G_y^* = \Lambda_y \mapsto G_y$ is denoted as left as well as right eigenspace synthesis.

Box 1.8. (The canonical representation of the matrix of the metric, parameter space versus observation space).

Parameter space: $\mathrm{span}[a_1, \dots, a_m] = \mathbb{X}$, $\langle a_{j_1} \mid a_{j_2}\rangle = g_{j_1j_2}\ \forall\ j_1, j_2 \in \{1, \dots, m\}$, $aa' = G_x$, $\mathrm{rk}G_x = m$; observation space: $\mathrm{span}[b_1, \dots, b_n] = \mathbb{Y}$, $\langle b_{k_1} \mid b_{k_2}\rangle = g_{k_1k_2}\ \forall\ k_1, k_2 \in \{1, \dots, n\}$, $bb' = G_y$, $\mathrm{rk}G_y = n$.   (1.86)

Left norm: $\|x\|^2_{G_x} = x'G_xx$ versus right norm: $\|y\|^2_{G_y} = y'G_yy$.   (1.87)

Eigenspace analysis ($G_x$): $G_x^* = V'G_xV = \mathrm{diag}[\lambda_1^x, \dots, \lambda_m^x] = \Lambda_x$, subject to $VV' = V'V = I_m$, $(G_x - \lambda_j^xI_m)v_j = 0$.   (1.88)

Eigenspace analysis ($G_y$): $G_y^* = U'G_yU = \mathrm{diag}[\lambda_1^y, \dots, \lambda_n^y] = \Lambda_y$, subject to $UU' = U'U = I_n$, $(G_y - \lambda_k^yI_n)u_k = 0$.   (1.89)

Eigenspace synthesis: $G_x = VG_x^*V' = V\Lambda_xV'$ versus $G_y = UG_y^*U' = U\Lambda_yU'$.   (1.90)
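A compact numerical sketch of this eigenspace analysis and synthesis (with an arbitrarily assumed positive-definite $G_x$, not a value from the book) reads:

```python
# Eigenspace analysis V'GxV = Lambda_x and synthesis Gx = V Lambda_x V'.
import numpy as np

Gx = np.array([[2.0, 0.5, 0.0],
               [0.5, 1.0, 0.2],
               [0.0, 0.2, 3.0]])   # assumed symmetric positive-definite metric

lam, V = np.linalg.eigh(Gx)        # analysis: orthonormal V, spectrum lam
print(np.allclose(V.T @ Gx @ V, np.diag(lam)))   # True
print(np.allclose(V @ np.diag(lam) @ V.T, Gx))   # True: synthesis
```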
Second, let us study the impact of the left diagonalization of the matrix of the metric $G_x$ as well as the right diagonalization of the matrix of the metric $G_y$ on the coordinates $x \in \mathbb{X}$ and $y \in \mathbb{Y}$, the parameter systems of the left Euclidean space $\mathbb{X}$, $\dim\mathbb{X} = m$, and of the right Euclidean space $\mathbb{Y}$. Enjoy the way how we have established in Box 1.9 the canonical coordinates $x^* := [x_1^*, \dots, x_m^*]'$ of $\mathbb{X}$ as well as the canonical coordinates $y^* := [y_1^*, \dots, y_n^*]'$ of $\mathbb{Y}$, which are called the left and right star coordinates of $\mathbb{X}$ and $\mathbb{Y}$, respectively. In terms of the star coordinates (1.92), the left norm $\|x^*\|^2$ of type (1.91, left) as well as the right norm $\|y^*\|^2$ of type (1.91, right) take the canonical left and right quadratic form. The transformations $x \mapsto x^*$ as well as $y \mapsto y^*$ of type (1.92) are special versions of the left and right polar decomposition: a rotation constituted by the matrices $U, V$ is followed by a stretch constituted by the diagonal matrices $\{\Lambda_x^{1/2}, \Lambda_y^{1/2}\}$. The backward transformations $x^* \mapsto x$ and $y^* \mapsto y$ are given by (1.93); $\Lambda_x^{-1/2}$ and $\Lambda_y^{-1/2}$, respectively, denote those diagonal matrices which are generated by the inverse positive roots of the left and right eigenvalues, respectively (1.94). We conclude with the proof (1.95) that the ansatz (1.92) indeed leads to the canonical representation (1.91) of the left and right norms.

Box 1.9. (The canonical coordinates $x^* \in \mathbb{X}$ and $y^* \in \mathbb{Y}$, parameter space versus observation space).

Canonical coordinates: $\|x^*\|^2 = x^{*\prime}x^*$ (parameter space) versus $\|y^*\|^2 = y^{*\prime}y^*$ (observation space).   (1.91)

Ansatz: $x^* = \Lambda_x^{1/2}V'x$ versus $y^* = \Lambda_y^{1/2}U'y$,   (1.92)

backward: $x = V\Lambda_x^{-1/2}x^*$ versus $y = U\Lambda_y^{-1/2}y^*$;   (1.93)

$\Lambda_x^{+1/2} := \mathrm{diag}\left[\sqrt{\lambda_1^x}, \dots, \sqrt{\lambda_m^x}\right],\ \Lambda_x^{-1/2} := \mathrm{diag}\left[1/\sqrt{\lambda_1^x}, \dots, 1/\sqrt{\lambda_m^x}\right]$; $\Lambda_y^{+1/2} := \mathrm{diag}\left[\sqrt{\lambda_1^y}, \dots, \sqrt{\lambda_n^y}\right],\ \Lambda_y^{-1/2} := \mathrm{diag}\left[1/\sqrt{\lambda_1^y}, \dots, 1/\sqrt{\lambda_n^y}\right]$.   (1.94)

Proof of the ansatz: $G_x = V\Lambda_xV'$, hence $\|x\|^2_{G_x} = x'G_xx = x'V\Lambda_x^{1/2}\Lambda_x^{1/2}V'x = (\Lambda_x^{1/2}V'x)'(\Lambda_x^{1/2}V'x) = x^{*\prime}x^* = \|x^*\|^2$, and analogously $\|y\|^2_{G_y} = y^{*\prime}y^* = \|y^*\|^2$.   (1.95)
Third, let us study the dual operations of the coordinate transformations $x \mapsto x^*$ and $y \mapsto y^*$, namely the behavior of canonical bases, which are also called orthonormal bases $e^x$ and $e^y$, or Cartan frames of reference, $\{e_1^x, \dots, e_m^x \mid \mathcal{O}\}$ spanning the parameter space $\mathbb{X}$ and $\{e_1^y, \dots, e_n^y \mid \mathcal{O}\}$ spanning the observation space $\mathbb{Y}$; here $a \mapsto e^x$ and $b \mapsto e^y$. In terms of the orthonormal bases $e^x$ and $e^y$, as outlined in Box 1.10, the matrices of the metric, $e^xe^{x\prime} = I_m$ and $e^ye^{y\prime} = I_n$, take the canonical form ("modular"); compare the relations (1.99, left) and (1.99, right) with (1.97, left) and (1.97, right). They are achieved by the changes of bases ("CBS") of type left $e^x \mapsto a$, $a \mapsto e^x$, compare with (1.100, left) and (1.101, left), and of type right $e^y \mapsto b$, $b \mapsto e^y$, compare with (1.100, right) and (1.101, right). Indeed, the transformations $x \mapsto x^*$, $a \mapsto e^x$, and $y \mapsto y^*$, $b \mapsto e^y$ are dual or inverse. Fourth, let us begin the eigenspace analysis versus
eigenspace synthesis of the rectangular matrix $A \in \mathbb{R}^{n \times m}$, $r := \mathrm{rk}A = n$, $n < m$. Indeed, the eigenspace of a rectangular matrix looks different when compared to the eigenspace of the quadratic, symmetric, positive-definite matrices $G_x \in \mathbb{R}^{m \times m}$, $\mathrm{rk}G_x = m$, and $G_y \in \mathbb{R}^{n \times n}$, $\mathrm{rk}G_y = n$, of the left and the right metric. At first, we have to generalize the transpose of a rectangular matrix by introducing the adjoint operator $A^\#$, which takes into account the matrices $\{G_x, G_y\}$ of the left and the right metric. The definition of the adjoint operator $A^\#$, compare with Definition 1.5, is followed by its representation, compare with Lemma 1.6.

Box 1.10. (The general bases versus the orthonormal bases spanning the parameter space as well as the observation space).

Left (parameter space $\mathbb{X}$, general left base): $\mathrm{span}[a_1, \dots, a_m] = \mathbb{X}$ versus right (observation space $\mathbb{Y}$, general right base): $\mathrm{span}[b_1, \dots, b_n] = \mathbb{Y}$.   (1.96)

Matrix of the metric: $aa' = G_x$ versus $bb' = G_y$.   (1.97)

Orthonormal left base: $\mathrm{span}[e_1^x, \dots, e_m^x] = \mathbb{X}$ versus orthonormal right base: $\mathrm{span}[e_1^y, \dots, e_n^y] = \mathbb{Y}$.   (1.98)

Matrix of the metric: $e^xe^{x\prime} = I_m$ versus $e^ye^{y\prime} = I_n$.   (1.99)

Base transformation: $a = \Lambda_x^{1/2}Ve^x$ versus $b = \Lambda_y^{1/2}Ue^y$,   (1.100)

inverse base transformation: $e^x = V'\Lambda_x^{-1/2}a$ versus $e^y = U'\Lambda_y^{-1/2}b$.   (1.101)

Definition 1.5. (Adjoint operator $A^\#$).
The adjoint operator $A^\# \in \mathbb{R}^{m \times n}$ of the matrix $A \in \mathbb{R}^{n \times m}$ is defined by the inner product identity

$\langle y \mid Ax\rangle_{G_y} = \langle x \mid A^\#y\rangle_{G_x}$,   (1.102)

where the left inner product operates on the symmetric, full rank matrix $G_y$ of the observation space $\mathbb{Y}$, while the right inner product is taken with respect to the symmetric, full rank matrix $G_x$ of the parameter space $\mathbb{X}$.
Lemma 1.6. (Adjoint operator $A^\#$).

A representation of the adjoint operator $A^\# \in \mathbb{R}^{m \times n}$ of the matrix $A \in \mathbb{R}^{n \times m}$ is provided by

$A^\# = G_x^{-1}A'G_y$.   (1.103)

Proof. For the proof, we take advantage of the symmetry of the inner product, namely

$\langle y \mid Ax\rangle_{G_y} = y'G_yAx$ versus $\langle x \mid A^\#y\rangle_{G_x} = x'G_xA^\#y$,   (1.104)

$y'G_yAx = x'A'G_yy = x'G_xA^\#y \Leftrightarrow A'G_y = G_xA^\# \Leftrightarrow G_x^{-1}A'G_y = A^\#$.   (1.105)
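The defining identity (1.102) is easy to test numerically; the following sketch uses assumed diagonal metrics for illustration.

```python
# Check the adjoint A# = Gx^{-1} A' Gy of (1.103) against identity (1.102).
import numpy as np

rng = np.random.default_rng(1)
A  = np.array([[1.0, 1.0, 1.0],
               [1.0, 2.0, 4.0]])
Gx = np.diag([1.0, 2.0, 4.0])     # assumed metric of the parameter space
Gy = np.diag([1.0, 3.0])          # assumed metric of the observation space

Asharp = np.linalg.solve(Gx, A.T @ Gy)
x, y = rng.normal(size=3), rng.normal(size=2)
print(np.isclose(y @ Gy @ (A @ x), x @ Gx @ (Asharp @ y)))   # True
```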
Fifth, we additionally solve the underdetermined system of linear equations $\{Ax = y \mid A \in \mathbb{R}^{n \times m},\ \mathrm{rk}A = n,\ n < m\}$ by introducing the eigenspace of the rectangular matrix $A \in \mathbb{R}^{n \times m}$ of rank $r := \mathrm{rk}A = n$, $n < m$ ($A \mapsto A^*$), and the left and right canonical coordinates $x \mapsto x^*$ and $y \mapsto y^*$. Compare with the relations that are supported by Box 1.11.

Box 1.11. (Canonical representation, underdetermined system of linear equations).
Left (parameter space $\mathbb{X}$): $x^* = V'G_x^{1/2}x$, $x = G_x^{-1/2}Vx^*$; right (observation space $\mathbb{Y}$): $y^* = U'G_y^{1/2}y$, $y = G_y^{-1/2}Uy^*$.   (1.106)

Underdetermined system of linear equations $\{Ax = y \mid A \in \mathbb{R}^{n \times m},\ \mathrm{rk}A = n,\ n < m\}$:

$y = Ax \Leftrightarrow G_y^{-1/2}Uy^* = AG_x^{-1/2}Vx^* \Leftrightarrow y^* = (U'G_y^{1/2}AG_x^{-1/2}V)x^*$, subject to $U'U = UU' = I_n$;   (1.107)

$y^* = A^*x^*$, $A^* := U'G_y^{1/2}AG_x^{-1/2}V$, i.e. $A = G_y^{-1/2}UA^*V'G_x^{1/2}$, subject to $V'V = VV' = I_m$.   (1.108)
Left and right eigenspace:

Left eigenspace analysis: $A^* = U'G_y^{1/2}AG_x^{-1/2}[V_1, V_2] = [\Lambda, 0]$ versus right eigenspace synthesis: $A = G_y^{-1/2}U[\Lambda, 0]\begin{bmatrix} V_1' \\ V_2' \end{bmatrix}G_x^{1/2}$.   (1.109)

Dimension identities: $\Lambda \in \mathbb{R}^{r \times r}$, $0 \in \mathbb{R}^{r \times (m-r)}$, $r := \mathrm{rk}A = n$, $n < m$; $V_1 \in \mathbb{R}^{m \times r}$, $V_2 \in \mathbb{R}^{m \times (m-r)}$, $U \in \mathbb{R}^{r \times r}$.

Left eigenspace: $L := G_y^{-1/2}U \Rightarrow L^{-1} = U'G_y^{1/2}$ versus right eigenspace: $R := G_x^{-1/2}V \Rightarrow R^{-1} = V'G_x^{1/2}$;   (1.110)

$R_1 := G_x^{-1/2}V_1$, $R_2 := G_x^{-1/2}V_2$, $R_1^- := V_1'G_x^{1/2}$, $R_2^- := V_2'G_x^{1/2}$;
$LL' = G_y^{-1} \Rightarrow (L^{-1})'L^{-1} = G_y$, $RR' = G_x^{-1} \Rightarrow (R^{-1})'R^{-1} = G_x$.   (1.111)

$A^* = L^{-1}AR$ versus $A = LA^*R^{-1}$,   (1.112)

$A^* = [\Lambda, 0] = L^{-1}A[R_1, R_2]$ versus $A = L[\Lambda, 0]\begin{bmatrix} R_1^- \\ R_2^- \end{bmatrix}$,   (1.113)

$A^\#AR_1 = R_1\Lambda^2$, $A^\#AR_2 = 0$ versus $AA^\#L = L\Lambda^2$.   (1.114)

Underdetermined system of linear equations solved in canonical coordinates:

$y^* = A^*x^* = [\Lambda, 0]\begin{bmatrix} x_1^* \\ x_2^* \end{bmatrix} = \Lambda x_1^*$, $x_1^* \in \mathbb{R}^{r \times 1}$, $x_2^* \in \mathbb{R}^{(m-r) \times 1} \Rightarrow x_1^* = \Lambda^{-1}y^*$;
if $x^*$ is MINOS, then $x_2^* = 0$: $x_{1m}^* = \Lambda^{-1}y^*$.   (1.115)
The transformations $x \mapsto x^*$, $y \mapsto y^*$ from the original coordinates $[x_1, \dots, x_m]'$, the parameters of the parameter space $\mathbb{X}$, to the canonical coordinates $[x_1^*, \dots, x_m^*]'$, the left star coordinates, as well as from the original coordinates $[y_1, \dots, y_n]'$, the parameters of the observation space $\mathbb{Y}$, to the canonical coordinates $[y_1^*, \dots, y_n^*]'$, the right star coordinates, are polar decompositions: a rotation $\{U, V\}$ is followed by a general stretch $\{G_y^{1/2}, G_x^{1/2}\}$. The matrices $G_y^{1/2}$ as well as $G_x^{1/2}$ are product decompositions of type $G_y = S_y'S_y$ and $G_x = S_x'S_x$. If we substitute $S_y = G_y^{1/2}$ or $S_x = G_x^{1/2}$ symbolically, we are led to the general stretches $G_y^{1/2}$ and $G_x^{1/2}$, respectively. Let us substitute the inverse transformations $x^* \mapsto x = G_x^{-1/2}Vx^*$ and $y^* \mapsto y = G_y^{-1/2}Uy^*$ into our system of linear equations $y = Ax$ or its dual $y^* = A^*x^*$. Such an operation leads us to $y^* = f(x^*)$ as well as $y = f(x)$. Subject to the orthonormality conditions $U'U = I_n$ and $V'V = I_m$, we have generated the matrix $A^*$ of left–right eigenspace analysis (1.109, left), namely (1.116), subject to the horizontal rank partitioning of the matrix $V = [V_1, V_2]$. Alternatively, the left–right eigenspace synthesis (1.109, right), namely (1.117), is based upon the left matrix $L := G_y^{-1/2}U$ and the right matrix $R := G_x^{-1/2}V$.

$A^* = [\Lambda, 0]$,   (1.116)

$A = G_y^{-1/2}U[\Lambda, 0]\begin{bmatrix} V_1' \\ V_2' \end{bmatrix}G_x^{1/2}$.   (1.117)
Indeed, the left matrix $L$ by means of $LL' = G_y^{-1}$ reconstructs the inverse matrix of the metric of the observation space $\mathbb{Y}$. Similarly, the right matrix $R$ by means of $RR' = G_x^{-1}$ generates the inverse matrix of the metric of the parameter space $\mathbb{X}$. In terms of $L$ and $R$, we have summarized the eigenvalue decompositions (1.112)–(1.114). Such an eigenvalue decomposition helps us to canonically invert $y^* = A^*x^*$ by means of (1.115), namely by the rank partitioning of the canonical unknown vector $x^*$ into $x_1^* \in \mathbb{R}^r$ and $x_2^* \in \mathbb{R}^{m-r}$, to determine $x_1^* = \Lambda^{-1}y^*$, but leaving $x_2^*$ underdetermined. Consult the commutative diagram shown in Fig. 1.7 for a shorthand summary of the introduced transformations of the coordinates of both the parameter space $\mathbb{X}$ and the observation space $\mathbb{Y}$. Next, we shall prove that $x_2^* = 0$ if $x^*$ is MINOS.

Sixth, we prepare ourselves for MINOS of the underdetermined system of linear equations $\{y = Ax \mid A \in \mathbb{R}^{n \times m},\ \mathrm{rk}A = n,\ n < m,\ \|x\|^2_{G_x} = \min\}$ by introducing Lemma 1.7, namely the eigenvalue–eigencolumn equations of the matrices $A^\#A$ and $AA^\#$, respectively, as well as Lemma 1.8, our basic result on canonical MINOS, subsequently completed by proofs.

Lemma 1.7. (Eigenspace analysis versus eigenspace synthesis of the matrix $\{A \in \mathbb{R}^{n \times m},\ r := \mathrm{rk}A = n < m\}$).

The pair of matrices $\{L, R\}$ for the eigenspace analysis and the eigenspace synthesis of the rectangular matrix $A \in \mathbb{R}^{n \times m}$ of rank $r := \mathrm{rk}A = n < m$, namely
Fig. 1.7 Commutative diagram of coordinate transformations: $x \mapsto y = Ax$, $x \mapsto x^* = V'G_x^{1/2}x$, $y \mapsto y^* = U'G_y^{1/2}y$, $x^* \mapsto y^* = A^*x^*$
$A^* = L^{-1}AR$ versus $A = LA^*R^{-1}$   (1.118)

or

$A^* = [\Lambda, 0] = L^{-1}A[R_1, R_2]$ versus $A = L[\Lambda, 0]\begin{bmatrix} R_1^- \\ R_2^- \end{bmatrix}$,   (1.119)

is determined by the eigenvalue–eigencolumn equations (eigenspace equations) of the matrices $A^\#A$ and $AA^\#$, respectively, namely

$A^\#AR_1 = R_1\Lambda^2$ versus $AA^\#L = L\Lambda^2$,   (1.120)

subject to

$\Lambda^2 = \begin{bmatrix} \lambda_1^2 & & 0 \\ & \ddots & \\ 0 & & \lambda_r^2 \end{bmatrix},\quad \Lambda = \mathrm{Diag}\left(+\sqrt{\lambda_1^2}, \dots, +\sqrt{\lambda_r^2}\right)$.   (1.121)
2r
Proof. Let us prove first AA# L D Lƒ2 , second A# AR1 D R1 ƒ2 . D L Œƒ; 0
V01 V02
0 AA# L D AG1 yL x A G 1 1 1 ƒ 1=2 1 2 0 Gx Gx .Gx / ŒV1 ; V2 0 U0 .Gy 2 /0 Gy Gy 2 U; 0
V01 V1 V01 V2 AA L D L Œƒ; 0 V02 V1 V02 V2 #
ƒ Ir 0 ƒ D L Œƒ; 0 : 00 00 0 Imr
(1.122)
(1.123)
32
1 The First Problem of Algebraic Regression 0 A# AR D G1 x A Gy AR 1 1 1 1 0 ƒ 1=2 0 0 2 2 2 2 G V D G1 x x 0 U .Gy / Gy Gy U Œƒ; 0 V Gx Gx V; (1.124) 0
A AR D #
1 ƒ 2 Gx V 0 Œƒ; 0 0
1 2
A# A ŒR1 ; R2 D Gx
D
1 Gx 2
ƒ2 0 ŒV1 ; V2 ; 0 0
V1 ƒ2 ; 0 ; A# AR1 D R1 ƒ2 :
(1.125)
Lemma 1.8. (The canonical MINOS).

Let $y^* = A^*x^*$ be the canonical representation of the underdetermined system of linear equations $\{y = Ax \mid A \in \mathbb{R}^{n \times m},\ r := \mathrm{rk}A = n,\ n < m\}$. Then the rank partitioning (1.126) of $x_m^*$ is $G_x$-MINOS. In terms of the original coordinates $[x_1, \dots, x_m]'$ of the parameter space $\mathbb{X}$, a canonical representation of $G_x$-MINOS is provided by (1.127). The $G_x$-MINOS solution $x_m$ is built on the canonical $\{G_x, G_y\}$-weighted reflexive inverse of $A$ provided by (1.128).

$x_m^* = \begin{bmatrix} x_{1m}^* \\ x_{2m}^* \end{bmatrix} = \begin{bmatrix} \Lambda^{-1}y^* \\ 0 \end{bmatrix},\quad x_{1m}^* \in \mathbb{R}^{r \times 1},\ x_{2m}^* \in \mathbb{R}^{(m-r) \times 1}$,   (1.126)

$x_m = G_x^{-1/2}[V_1, V_2]\begin{bmatrix} \Lambda^{-1} \\ 0 \end{bmatrix}U'G_y^{1/2}y = G_x^{-1/2}V_1\Lambda^{-1}U'G_y^{1/2}y = R_1\Lambda^{-1}L^{-1}y = A_m^-y$,   (1.127)

$A_m^- = G_x^{-1/2}V_1\Lambda^{-1}U'G_y^{1/2}$.   (1.128)
Proof. For the proof, we depart from $G_x$-MINOS (1.58) and replace the matrix $A \in \mathbb{R}^{n \times m}$ by its canonical representation, namely the eigenspace synthesis.

$x_m = G_x^{-1}A'(AG_x^{-1}A')^{-1}y,\quad A = G_y^{-1/2}U[\Lambda, 0]\begin{bmatrix} V_1' \\ V_2' \end{bmatrix}G_x^{1/2}$,   (1.129)

$G_x^{-1}A' = G_x^{-1/2}[V_1, V_2]\begin{bmatrix} \Lambda \\ 0 \end{bmatrix}U'G_y^{-1/2} = G_x^{-1/2}V_1\Lambda U'G_y^{-1/2}$,   (1.130)

$AG_x^{-1}A' = G_y^{-1/2}U\Lambda^2U'G_y^{-1/2} \Rightarrow (AG_x^{-1}A')^{-1} = G_y^{1/2}U\Lambda^{-2}U'G_y^{1/2}$,   (1.131)

$x_m = G_x^{-1/2}V_1\Lambda U'G_y^{-1/2}G_y^{1/2}U\Lambda^{-2}U'G_y^{1/2}y = G_x^{-1/2}V_1\Lambda^{-1}U'G_y^{1/2}y = A_m^-y$,

$A_m^- = G_x^{-1/2}V_1\Lambda^{-1}U'G_y^{1/2} = A^{1,2,4}_{G_x}$ ($G_x$-weighted reflexive inverse of $A$),   (1.132)

$x_m^* = \begin{bmatrix} x_{1m}^* \\ x_{2m}^* \end{bmatrix} = V'G_x^{1/2}x_m = \begin{bmatrix} \Lambda^{-1}U'G_y^{1/2}y \\ 0 \end{bmatrix} = \begin{bmatrix} \Lambda^{-1}y^* \\ 0 \end{bmatrix}$.   (1.133)
The pair of eigensystems $AA^\#L = L\Lambda^2$, $A^\#AR_1 = R_1\Lambda^2$ is unfortunately based upon the non-symmetric matrices $AA^\# = AG_x^{-1}A'G_y$ and $A^\#A = G_x^{-1}A'G_yA$, which makes the left and right eigenspace analysis numerically more complex. It appears that we are forced to use the Arnoldi method rather than the more efficient Lanczos method used for symmetric matrices. In this situation we look out for an alternative. Indeed, when we substitute $\{L, R\}$ by $\{G_y^{-1/2}U, G_x^{-1/2}V\}$ into the pair of eigensystems and consequently left multiply the equations by $G_y^{1/2}$ and $G_x^{1/2}$, respectively, we achieve a pair of eigensystems identified in Corollary 1.9 relying on symmetric matrices. In addition, such a symmetric pair of eigensystems produces the canonical base, namely orthonormal eigencolumns. Such a procedure requires two factorizations, namely $G_x = (G_x^{1/2})'G_x^{1/2}$, $G_x^{-1} = G_x^{-1/2}(G_x^{-1/2})'$ and $G_y = (G_y^{1/2})'G_y^{1/2}$, $G_y^{-1} = G_y^{-1/2}(G_y^{-1/2})'$, via Cholesky factorization or eigenvalue decomposition of the matrices $G_x$ and $G_y$.

Corollary 1.9. (Symmetric pair of eigensystems).

The pair of eigensystems (1.134) is based upon symmetric matrices. The left and right eigencolumns are orthogonal.

$G_y^{1/2}AG_x^{-1}A'(G_y^{1/2})'U = U\Lambda^2$ versus $(G_x^{-1/2})'A'G_yAG_x^{-1/2}V_1 = V_1\Lambda^2$,   (1.134)

$\left|G_y^{1/2}AG_x^{-1}A'(G_y^{1/2})' - \lambda_i^2I_n\right| = 0,\quad \left|(G_x^{-1/2})'A'G_yAG_x^{-1/2} - \lambda_j^2I_m\right| = 0$.   (1.135)
The important result on $x_m$ based on canonical $G_x$-MINOS needs a short comment. The rank partitioning of the canonical unknown vector $x^*$, namely $x_1^* \in \mathbb{R}^r$, $x_2^* \in \mathbb{R}^{m-r}$, again paved the way for an interpretation. First, we acknowledge the "direct inversion" (1.137), for instance (1.138). Second, $x_2^* = 0$, for instance (1.139), introduces a fixed datum for the canonical coordinates $[x_{r+1}^*, \dots, x_m^*]$.

$y^* = A^*x^* \mid A \in \mathbb{R}^{n \times m},\ \mathrm{rk}A^* = \mathrm{rk}A = n,\ n < m$,   (1.136)
Fig. 1.8 Commutative diagram of inverse coordinate transformations: $y \mapsto x_m = A_m^-y$, $y \mapsto y^* = U'G_y^{1/2}y$, $x_m \mapsto x_m^* = V'G_x^{1/2}x_m$, $y^* \mapsto x_m^* = (A^*)_m^-y^*$
$x_1^* = \Lambda^{-1}y^*,\quad \Lambda = \mathrm{Diag}\left(+\sqrt{\lambda_1^2}, \dots, +\sqrt{\lambda_r^2}\right)$,   (1.137)

$[x_1^*, \dots, x_r^*]' = [\lambda_1^{-1}y_1^*, \dots, \lambda_r^{-1}y_r^*]'$,   (1.138)

$[x_{r+1}^*, \dots, x_m^*]' = [0, \dots, 0]'$.   (1.139)
Beyond it, enjoy the commutative diagram of Fig. 1.8, illustrating our previously introduced transformations of type MINOS and canonical MINOS, by means of $A_m^-$ and $(A^*)_m^-$. Finally, in Box 1.12, let us compute canonical MINOS for the front page example, specialized by $G_x = I_3$ and $G_y = I_2$. Note that the eigenspace analysis, in summary, leads to the result
$\Lambda = \mathrm{Diag}\left(\sqrt{12 + \sqrt{130}},\ \sqrt{12 - \sqrt{130}}\right)$,   (1.140)

$U = \begin{bmatrix} \dfrac{7}{\sqrt{260 + 18\sqrt{130}}} & \dfrac{7}{\sqrt{260 - 18\sqrt{130}}} \\ \dfrac{\sqrt{211 + 18\sqrt{130}}}{\sqrt{260 + 18\sqrt{130}}} & -\dfrac{\sqrt{211 - 18\sqrt{130}}}{\sqrt{260 - 18\sqrt{130}}} \end{bmatrix}$,   (1.141)

$V = \begin{bmatrix} \dfrac{62 + 5\sqrt{130}}{\sqrt{102700 + 9004\sqrt{130}}} & \dfrac{62 - 5\sqrt{130}}{\sqrt{102700 - 9004\sqrt{130}}} & \dfrac{2}{\sqrt{14}} \\ \dfrac{105 + 9\sqrt{130}}{\sqrt{102700 + 9004\sqrt{130}}} & \dfrac{105 - 9\sqrt{130}}{\sqrt{102700 - 9004\sqrt{130}}} & -\dfrac{3}{\sqrt{14}} \\ \dfrac{191 + 17\sqrt{130}}{\sqrt{102700 + 9004\sqrt{130}}} & \dfrac{191 - 17\sqrt{130}}{\sqrt{102700 - 9004\sqrt{130}}} & \dfrac{1}{\sqrt{14}} \end{bmatrix} = [V_1, V_2]$.   (1.142)
Box 1.12. (The computation of canonical MINOS for the front page example).

$y = Ax:\quad \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix},\quad r := \mathrm{rk}A = 2$.   (1.143)

Left eigenspace: $AA^\#U = AA'U = U\Lambda^2$ versus right eigenspace: $A^\#AV_1 = A'AV_1 = V_1\Lambda^2$, $A^\#AV_2 = A'AV_2 = 0$;

$AA' = \begin{bmatrix} 3 & 7 \\ 7 & 21 \end{bmatrix},\quad A'A = \begin{bmatrix} 2 & 3 & 5 \\ 3 & 5 & 9 \\ 5 & 9 & 17 \end{bmatrix}$.   (1.144)

Eigenvalues: $|AA' - \lambda_i^2I_2| = 0$ versus $|A'A - \lambda_j^2I_3| = 0$
$\Leftrightarrow \lambda_1^2 = 12 + \sqrt{130},\quad \lambda_2^2 = 12 - \sqrt{130},\quad \lambda_3^2 = 0$.   (1.145)

Left eigencolumns (1st): $\begin{bmatrix} 3 - \lambda_1^2 & 7 \\ 7 & 21 - \lambda_1^2 \end{bmatrix}\begin{bmatrix} u_{11} \\ u_{21} \end{bmatrix} = 0$ (subject to $u_{11}^2 + u_{21}^2 = 1$) versus right eigencolumns (1st): $\begin{bmatrix} 2 - \lambda_1^2 & 3 & 5 \\ 3 & 5 - \lambda_1^2 & 9 \\ 5 & 9 & 17 - \lambda_1^2 \end{bmatrix}\begin{bmatrix} v_{11} \\ v_{21} \\ v_{31} \end{bmatrix} = 0$ (subject to $v_{11}^2 + v_{21}^2 + v_{31}^2 = 1$),   (1.146)

$(3 - \lambda_1^2)u_{11} + 7u_{21} = 0$ versus $(2 - \lambda_1^2)v_{11} + 3v_{21} + 5v_{31} = 0$, $3v_{11} + (5 - \lambda_1^2)v_{21} + 9v_{31} = 0$;   (1.147)

$u_{11}^2 = \dfrac{49}{49 + (3 - \lambda_1^2)^2} = \dfrac{49}{260 + 18\sqrt{130}},\quad u_{21}^2 = \dfrac{(3 - \lambda_1^2)^2}{49 + (3 - \lambda_1^2)^2} = \dfrac{211 + 18\sqrt{130}}{260 + 18\sqrt{130}}$,   (1.148)

$\begin{bmatrix} v_{11}^2 \\ v_{21}^2 \\ v_{31}^2 \end{bmatrix} = \dfrac{1}{(2 + 5\lambda_1^2)^2 + (3 - 9\lambda_1^2)^2 + (1 - 7\lambda_1^2 + \lambda_1^4)^2}\begin{bmatrix} (2 + 5\lambda_1^2)^2 \\ (3 - 9\lambda_1^2)^2 \\ (1 - 7\lambda_1^2 + \lambda_1^4)^2 \end{bmatrix}$,   (1.149)
$\begin{bmatrix} v_{11}^2 \\ v_{21}^2 \\ v_{31}^2 \end{bmatrix} = \dfrac{1}{102700 + 9004\sqrt{130}}\begin{bmatrix} (62 + 5\sqrt{130})^2 \\ (105 + 9\sqrt{130})^2 \\ (191 + 17\sqrt{130})^2 \end{bmatrix}$.

Left eigencolumns (2nd): $\begin{bmatrix} 3 - \lambda_2^2 & 7 \\ 7 & 21 - \lambda_2^2 \end{bmatrix}\begin{bmatrix} u_{12} \\ u_{22} \end{bmatrix} = 0$ (subject to $u_{12}^2 + u_{22}^2 = 1$) versus right eigencolumns (2nd): $\begin{bmatrix} 2 - \lambda_2^2 & 3 & 5 \\ 3 & 5 - \lambda_2^2 & 9 \\ 5 & 9 & 17 - \lambda_2^2 \end{bmatrix}\begin{bmatrix} v_{12} \\ v_{22} \\ v_{32} \end{bmatrix} = 0$ (subject to $v_{12}^2 + v_{22}^2 + v_{32}^2 = 1$),   (1.150)

$(3 - \lambda_2^2)u_{12} + 7u_{22} = 0$ versus $(2 - \lambda_2^2)v_{12} + 3v_{22} + 5v_{32} = 0$, $3v_{12} + (5 - \lambda_2^2)v_{22} + 9v_{32} = 0$;   (1.151)

$u_{12}^2 = \dfrac{49}{49 + (3 - \lambda_2^2)^2} = \dfrac{49}{260 - 18\sqrt{130}},\quad u_{22}^2 = \dfrac{(3 - \lambda_2^2)^2}{49 + (3 - \lambda_2^2)^2} = \dfrac{211 - 18\sqrt{130}}{260 - 18\sqrt{130}}$,   (1.152)

$\begin{bmatrix} v_{12}^2 \\ v_{22}^2 \\ v_{32}^2 \end{bmatrix} = \dfrac{1}{(2 + 5\lambda_2^2)^2 + (3 - 9\lambda_2^2)^2 + (1 - 7\lambda_2^2 + \lambda_2^4)^2}\begin{bmatrix} (2 + 5\lambda_2^2)^2 \\ (3 - 9\lambda_2^2)^2 \\ (1 - 7\lambda_2^2 + \lambda_2^4)^2 \end{bmatrix} = \dfrac{1}{102700 - 9004\sqrt{130}}\begin{bmatrix} (62 - 5\sqrt{130})^2 \\ (105 - 9\sqrt{130})^2 \\ (191 - 17\sqrt{130})^2 \end{bmatrix}$.   (1.153)

Right eigencolumns (3rd): $\begin{bmatrix} 2 & 3 & 5 \\ 3 & 5 & 9 \\ 5 & 9 & 17 \end{bmatrix}\begin{bmatrix} v_{13} \\ v_{23} \\ v_{33} \end{bmatrix} = 0$ (subject to $v_{13}^2 + v_{23}^2 + v_{33}^2 = 1$),   (1.154)

$2v_{13} + 3v_{23} + 5v_{33} = 0,\quad 3v_{13} + 5v_{23} + 9v_{33} = 0$,   (1.155)

$v_{13} = 2v_{33},\quad v_{23} = -3v_{33}$,   (1.156)

$v_{13}^2 = \dfrac{4}{14} = \dfrac{2}{7},\quad v_{23}^2 = \dfrac{9}{14},\quad v_{33}^2 = \dfrac{1}{14}$.   (1.157)

There are four combinatorial solutions to generate the square roots:
$\begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} \pm\sqrt{u_{11}^2} & \pm\sqrt{u_{12}^2} \\ \pm\sqrt{u_{21}^2} & \pm\sqrt{u_{22}^2} \end{bmatrix}$,   (1.158)

$\begin{bmatrix} v_{11} & v_{12} & v_{13} \\ v_{21} & v_{22} & v_{23} \\ v_{31} & v_{32} & v_{33} \end{bmatrix} = \begin{bmatrix} \pm\sqrt{v_{11}^2} & \pm\sqrt{v_{12}^2} & \pm\sqrt{v_{13}^2} \\ \pm\sqrt{v_{21}^2} & \pm\sqrt{v_{22}^2} & \pm\sqrt{v_{23}^2} \\ \pm\sqrt{v_{31}^2} & \pm\sqrt{v_{32}^2} & \pm\sqrt{v_{33}^2} \end{bmatrix}$.   (1.159)
Here we have chosen the positive root wherever the eigencolumn equations and the orthonormality conditions permit:

$\Lambda = \mathrm{Diag}\left(\sqrt{12 + \sqrt{130}},\ \sqrt{12 - \sqrt{130}}\right)$,   (1.160)

$U = \begin{bmatrix} \dfrac{7}{\sqrt{260 + 18\sqrt{130}}} & \dfrac{7}{\sqrt{260 - 18\sqrt{130}}} \\ \dfrac{\sqrt{211 + 18\sqrt{130}}}{\sqrt{260 + 18\sqrt{130}}} & -\dfrac{\sqrt{211 - 18\sqrt{130}}}{\sqrt{260 - 18\sqrt{130}}} \end{bmatrix}$,   (1.161)

$V = \begin{bmatrix} \dfrac{62 + 5\sqrt{130}}{\sqrt{102700 + 9004\sqrt{130}}} & \dfrac{62 - 5\sqrt{130}}{\sqrt{102700 - 9004\sqrt{130}}} & \dfrac{2}{\sqrt{14}} \\ \dfrac{105 + 9\sqrt{130}}{\sqrt{102700 + 9004\sqrt{130}}} & \dfrac{105 - 9\sqrt{130}}{\sqrt{102700 - 9004\sqrt{130}}} & -\dfrac{3}{\sqrt{14}} \\ \dfrac{191 + 17\sqrt{130}}{\sqrt{102700 + 9004\sqrt{130}}} & \dfrac{191 - 17\sqrt{130}}{\sqrt{102700 - 9004\sqrt{130}}} & \dfrac{1}{\sqrt{14}} \end{bmatrix} = [V_1, V_2]$.   (1.162)
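A brief numerical cross-check of the closed-form spectrum (1.160) against a library SVD may reassure the reader; the sketch below is illustrative only.

```python
# The singular values of A should equal sqrt(12 + sqrt(130)) and
# sqrt(12 - sqrt(130)), cf. (1.160).
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 4.0]])
s = np.linalg.svd(A, compute_uv=False)

print(s)                                                 # [4.8378..., 0.7734...]
print(np.sqrt([12 + np.sqrt(130), 12 - np.sqrt(130)]))   # same values
```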
1-3 Case Study

Let us continue this chapter by considering some important case studies. Let us consider orthogonal functions, and let us study Fourier series versus Fourier–Legendre series, that is, circular harmonic versus spherical harmonic regression. In the empirical sciences, we continuously meet the problem of underdetermined linear equations. Typically, we develop a characteristic field variable into orthogonal series, for instance into circular harmonic functions (discrete Fourier transform) or into spherical harmonics (discrete Fourier–Legendre transform) with respect to a reference sphere. We are left with the problem of algebraic regression: to determine an infinite set of coefficients of the series expansion from a finite set of observations, the values of the function at sample points. An infinite set of coefficients, the coordinates in an infinite-dimensional Hilbert space, cannot be determined by finite computer manipulations. Instead, band-limited functions are introduced. Only a finite set of coefficients of a circular harmonic expansion or of a spherical harmonic expansion can be determined. It is the art of the analyst to fix the degree/order of
the expansion properly. In a peculiar way, the choice of the highest degree/order of the expansion is related to the Uncertainty Principle, namely to the width of the lattice of the sampling points. Another aspect of any series expansion is the choice of the function space. For instance, if we develop scalar-valued, vector-valued or tensor-valued functions into scalar-valued, vector-valued or tensor-valued circular or spherical harmonics, we generate orthogonal functions with respect to a special inner product, also called scalar product, on the circle or on the sphere. Circular or spherical harmonics are eigenfunctions of the circular or spherical Laplace–Beltrami operator. Under the postulate of the Sturm–Liouville boundary conditions, the spectrum (namely the eigenvalues) of the Laplace–Beltrami operator is positive and integer. We note here that the eigenvalues of the circular Laplace–Beltrami operator are $l^2$ for integer values $l \in \{0, 1, \dots, \infty\}$, and of the spherical Laplace–Beltrami operator $k(k + 1)$ for integer values $k \in \{0, 1, \dots, \infty\}$, $l \in \{-k, -k+1, \dots, -1, 0, 1, \dots, k-1, k\}$. Thanks to such a structure of the infinite-dimensional eigenspace of the Laplace–Beltrami operator, we discuss the solutions of the underdetermined regression problem (linear algebraic regression) in the context of "canonical MINOS". We solve the system of linear equations $\{Ax = y \mid A \in \mathbb{R}^{n \times m},\ \mathrm{rk}A = n,\ n \leq m\}$ by singular value decomposition, as shortly outlined in Appendix A.
1-31 Fourier Series

Fourier series represent the periodic behavior of a function $x(\lambda)$ on a circle $S^1$. Fourier series are also called trigonometric series since the trigonometric functions $\{1, \sin\lambda, \cos\lambda, \sin 2\lambda, \cos 2\lambda, \dots, \sin l\lambda, \cos l\lambda\}$ represent such a periodic signal. Here we have chosen the parameter "longitude $\lambda$" to locate a point on $S^1$. Instead we could exchange the parameter $\lambda$ by time $t$, if clock readings would substitute longitude, a conventional technique in classical navigation. In such a setting,
\[ \lambda = \omega t = \frac{2\pi}{T}\, t = 2\pi\nu t,\qquad l\lambda = l\omega t = 2\pi\frac{l}{T}\, t = 2\pi l\nu t. \tag{1.163} \]
(In such a setting, the longitude $\lambda$ would be exchanged by $2\pi$ multiplied by the ratio of time $t$ to the ground period $T$, or by $2\pi\nu t$, the product of the ground frequency $\nu$ and time $t$. In contrast, $l\lambda$ for all $l \in \{0, 1, \dots, L\}$ would be substituted by $2\pi$ times the product of the overtones $l/T$, or $l\nu$, and time $t$, where $l$ is integer, i.e., $l \in \mathbb Z$. According to classical navigation, $\omega$ would represent the rotational speed of the Earth.)
Let us refer to Box 1.13 as a summary of the Fourier representation of a function $x(\lambda)$, $\lambda \in S^1$. Note that a Fourier series (1.164) or (1.165) can be understood as an infinite-dimensional vector space (linear space, Hilbert space) since the base functions $e_l(\lambda)$ generate a complete orthogonal (orthonormal) system based on trigonometric functions. The countable base, namely the base functions $e_l(\lambda)$ or $\{1, \sin\lambda, \cos\lambda, \sin 2\lambda, \cos 2\lambda, \dots, \sin l\lambda, \cos l\lambda\}$, spans the Fourier space $L^2[0, 2\pi]$. According to the ordering by means of positive
and negative indices $\{-L, -L+1, \dots, -1, 0, +1, \dots, L-1, L\}$ (0.4), $x^\wedge(\lambda)$ is an approximation of the function $x(\lambda)$ up to order three, also denoted by $x_L$.

Box 1.13 (Fourier series):
\[ x(\lambda) = x_1 + (\sin\lambda)x_2 + (\cos\lambda)x_3 + (\sin 2\lambda)x_4 + (\cos 2\lambda)x_5 + \dots + O_3(\sin l\lambda, \cos l\lambda), \tag{1.164} \]
\[ x(\lambda) = \lim_{L\to\infty} \sum_{l=-L}^{+L} e_l(\lambda)\, x_l,\qquad e_l(\lambda) := \begin{cases}\cos l\lambda & \forall\, l > 0\\ 1 & \forall\, l = 0\\ \sin|l|\lambda & \forall\, l < 0.\end{cases} \tag{1.165} \]
Example (approximation of order three):
\[ x^\wedge(\lambda) = e_0 x_1 + e_{-1} x_2 + e_{+1} x_3 + e_{-2} x_4 + e_{+2} x_5 + O_3. \tag{1.166} \]
What is an infinite-dimensional vector space? What is a Hilbert space? What makes up the Fourier space? An infinite-dimensional vector space (linear space) is similar to a finite-dimensional vector space: as in a Euclidean space, an inner product and a norm are defined. While the inner product and the norm in a finite-dimensional vector space require summation of their components, the inner product (1.167), (1.168) and the norm (1.170) in an infinite-dimensional vector space force us to integrate. Indeed the inner products (scalar products) (1.167) and (1.168) are integrals over the line element of $S^1_r$ applied to the vectors $x(\lambda)$, $y(\lambda)$ or $e_{l_1}$, $e_{l_2}$, respectively. Those integrals are divided by the length $s$ of the total arc of $S^1_r$. Alternative representations of $\langle x \mid y\rangle$ and $\langle e_{l_1} \mid e_{l_2}\rangle$ (Dirac's notation of brackets, decomposed into "bra" and "ket"), based upon $ds = r\,d\lambda$, $s = 2\pi r$, lead us directly to the integration over $S^1$, the unit circle. Let us refer to Box 1.14 as a summary of the most essential features of the Fourier space.

Box 1.14 (Fourier space): The base functions $e_l(\lambda)$, $l \in \{-L, -L+1, \dots, -1, 0, +1, \dots, L-1, L\}$, span the Fourier space $L^2[0, 2\pi]$: they generate a complete orthogonal (orthonormal) system of trigonometric functions.

Inner product $(x \in \text{FOURIER},\ y \in \text{FOURIER})$:
\[ \langle x \mid y\rangle := \frac{1}{s}\int_0^s ds\; x(s)\,y(s) = \frac{1}{2\pi}\int_0^{2\pi} d\lambda\; x(\lambda)\,y(\lambda). \tag{1.167} \]
Normalization:
\[ \langle e_{l_1}(\lambda) \mid e_{l_2}(\lambda)\rangle := \frac{1}{2\pi}\int_0^{2\pi} d\lambda\; e_{l_1}(\lambda)\, e_{l_2}(\lambda) = \gamma_{l_1}\,\delta_{l_1 l_2} \tag{1.168} \]
subject to
\[ \gamma_{l_1} = 1\ \forall\, l_1 = 0,\qquad \gamma_{l_1} = \tfrac12\ \forall\, l_1 \neq 0. \tag{1.169} \]
Norms, convergence:
\[ \|x\|^2 = \frac{1}{2\pi}\int_0^{2\pi} d\lambda\; x^2(\lambda) = \lim_{L\to\infty}\sum_{l=-L}^{+L} \gamma_l\, x_l^2 < \infty,\qquad \lim_{L\to\infty}\|x - x^\wedge_L\|^2 = 0\ \text{(convergence in the mean)}. \tag{1.170} \]
Synthesis versus analysis:
\[ x = \lim_{L\to\infty}\sum_{l=-L}^{+L} e_l\, x_l \quad\text{versus}\quad x_l = \frac{1}{\gamma_l}\langle e_l \mid x\rangle := \frac{1}{2\pi\gamma_l}\int_0^{2\pi} d\lambda\; e_l(\lambda)\, x(\lambda), \tag{1.171} \]
\[ x = \lim_{L\to\infty}\sum_{l=-L}^{+L} \frac{1}{\gamma_l}\, e_l\,\langle x \mid e_l\rangle. \tag{1.172} \]
Canonical basis of the Hilbert space "FOURIER" (orthonormal basis):
\[ e^*_l := \begin{cases}\sqrt2\,\cos l\lambda & \forall\, l > 0\\ 1 & \forall\, l = 0\\ \sqrt2\,\sin|l|\lambda & \forall\, l < 0,\end{cases} \tag{1.173} \]
\[ e^*_l = \frac{1}{\sqrt{\gamma_l}}\, e_l \ \text{versus}\ e_l = \sqrt{\gamma_l}\, e^*_l,\qquad x^*_l = \sqrt{\gamma_l}\, x_l \ \text{versus}\ x_l = \frac{1}{\sqrt{\gamma_l}}\, x^*_l, \tag{1.174} \]
\[ x = \lim_{L\to\infty}\sum_{l=-L}^{+L} e^*_l\,\langle x \mid e^*_l\rangle. \tag{1.175} \]
Orthonormality:
\[ \langle e^*_{l_1}(\lambda) \mid e^*_{l_2}(\lambda)\rangle = \delta_{l_1 l_2}. \tag{1.176} \]
Fourier space $(L \to \infty)$:
\[ \text{FOURIER} = \mathrm{span}\,\{e^*_{-L}, e^*_{-L+1}, \dots, e^*_{-1}, e^*_0, e^*_1, \dots, e^*_{L-1}, e^*_{L}\},\qquad \dim \text{FOURIER} = \lim_{L\to\infty}(2L+1) = \infty, \tag{1.177} \]
\[ \text{"FOURIER"} = \text{"HARM"}\ L^2(S^1). \tag{1.178} \]
A further comment has to be made on the normalization (1.168). Thanks to $\langle e_{l_1}(\lambda) \mid e_{l_2}(\lambda)\rangle = 0$ for all $l_1 \neq l_2$, for instance $\langle e_1(\lambda) \mid e_{-1}(\lambda)\rangle = 0$, the base functions $e_l(\lambda)$ are called orthogonal. But according to $\langle e_l(\lambda) \mid e_l(\lambda)\rangle = \tfrac12$ for $l \neq 0$, for instance $\langle e_1(\lambda) \mid e_1(\lambda)\rangle = \|e_1(\lambda)\|^2 = \tfrac12$ and $\langle e_2(\lambda) \mid e_2(\lambda)\rangle = \|e_2(\lambda)\|^2 = \tfrac12$, they are not normalized to
1. Note that a canonical basis of the Hilbert space "FOURIER" has been introduced by $e^*_l$. Indeed the base functions $e^*_l(\lambda)$ fulfil the condition of orthonormality. The crucial point of an infinite-dimensional vector space is convergence. When we write $x(\lambda)$ ($\to$ (1.164)) as an identity of infinite series, we must be sure that the series converges. In an infinite-dimensional vector space no pointwise convergence is required. In contrast, convergence in the mean ($\to$ (1.170)) is postulated. The norm $\|x\|^2$ ($\to$ (1.170)) equals the limit of the infinite sum of the $\gamma_l$-weighted, squared coordinates $x_l$, the coefficients in the trigonometric series (1.164), $\|x\|^2 = \lim_{L\to\infty}\sum_{l=-L}^{+L}\gamma_l x_l^2 < \infty$, which must be finite. As soon as convergence in the mean is guaranteed, we move from a pre-Fourier space of trigonometric functions to a Fourier space, which we shall define more precisely later on.
(i) Fourier analysis as well as Fourier synthesis, represented by (1.171, left) versus (1.171, right), is meanwhile well prepared. First, given the Fourier coefficients $x_l$, we are able to systematically represent the vector $x \in \text{FOURIER}$ in the orthogonal base $e_l(\lambda)$. Second, the projection of the vector $x \in \text{FOURIER}$ onto the base vectors $e_l(\lambda)$ agrees analytically with the Fourier coefficients as soon as we take into account the proper matrix of the metric of the Fourier space. Note the reproducing representation "from $x$ to $x$" ($\to$ (1.172)).
(ii) The transformation from the orthogonal base $e_l(\lambda)$ to the orthonormal base $e^*_l$, also called canonical or modular, as well as its inverse, is summarized by (1.174). The dual transformation from Fourier coefficients $x_l$ to canonical Fourier coefficients $x^*_l$, as well as its inverse, is also summarized by (1.174). Note the canonical reproducing representation "from $x$ to $x$" ($\to$ (1.175)).
(iii) The space (1.177) has the dimension of the hyperreal number $\infty$. As already mentioned in the introduction, (1.178) is identical with the Hilbert space $L^2(S^1)$ of harmonic functions on the circle $S^1$. What is a harmonic function which has the unit circle $S^1$ as support? A harmonic function "on the unit circle" $S^1$ is a function $x(\lambda)$, $\lambda \in S^1$, which on the one hand fulfills the one-dimensional Laplace equation (the differential equation of a harmonic oscillator) and on the other hand fulfills a special Sturm–Liouville boundary condition. The special Sturm–Liouville equation forces the frequency to be integer, which is proven subsequently.
\[ \text{(1st)}\quad \Delta_\lambda x(\lambda) = 0 \;\Leftrightarrow\; \left(\frac{d^2}{d\lambda^2} + \omega^2\right) x(\lambda) = 0, \tag{1.179} \]
\[ \text{(2nd)}\quad x(0) = x(2\pi),\qquad \frac{dx}{d\lambda}(0) = \frac{dx}{d\lambda}(2\pi). \tag{1.180} \]
Proof. "Ansatz":
\[ x(\lambda) = c_\omega \cos\omega\lambda + s_\omega \sin\omega\lambda. \tag{1.181} \]
"Initial conditions":
\[ x(0) = x(2\pi) \;\Leftrightarrow\; c_\omega = c_\omega \cos 2\pi\omega + s_\omega \sin 2\pi\omega, \tag{1.182} \]
\[ \frac{dx}{d\lambda}(0) = \frac{dx}{d\lambda}(2\pi) \;\Leftrightarrow\; s_\omega\omega = -c_\omega\omega \sin 2\pi\omega + s_\omega\omega \cos 2\pi\omega. \tag{1.183} \]
Consequences:
\[ \cos 2\pi\omega = 1,\ \sin 2\pi\omega = 0 \;\Rightarrow\; \omega = l\ \forall\, l \in \{0, 1, \dots, L-1, L\}. \tag{1.184} \]
$\omega = l$, $l \in \{0, 1, \dots, L-1, L\}$, concludes the proof.
How can we set up a linear model for Fourier analysis? The linear model of Fourier analysis, which relates the elements $x \in \mathbb X$ of the parameter space $\mathbb X$ to the elements $y \in \mathbb Y$ of the observation space $\mathbb Y$, is set up in Box 1.15. Here we shall assume that the observed data have been made available on an equidistant angular grid, in short, an equidistant lattice of the unit circle, parameterized by $(\lambda_1, \dots, \lambda_n)$. For the optimal design of the Fourier linear model it has been proven that the equidistant lattice $\lambda_i = (i-1)\,2\pi/I\ \forall\, i \in \{1, \dots, I\}$ is "D-optimal". Box 1.15 contains four examples of such a lattice; Fig. 1.9 shows such a lattice. In summary, the finite-dimensional observation space $\mathbb Y$, $\dim\mathbb Y = n$, $n = I$, has integer dimension $I$. In contrast, the parameter space $\mathbb X$, $\dim\mathbb X = \infty$, is infinite-dimensional. The unknown Fourier coefficients, conventionally collected in a Pascal triangular graph shown in Fig. 1.10, are vectorized by (1.188) in a peculiar order (1.190). Indeed, the linear model (1.189) contains $m = 2L+1$, $L \to \infty$, $m \to \infty$, unknowns, a hyperreal number. The linear operator $A: \mathbb X \to \mathbb Y$ is generated by the base functions at lattice points. Equation (1.191) is a representation of the linear observational equations (1.189) in Ricci calculus, which is characteristic for Fourier analysis.
Fig. 1.9 Equidistant lattice on $S^1$, $I = 2$ or 3 or 4 or 5: sample points at $\lambda_i = (i-1)\cdot 360^\circ/I$ ($I=2$: $0^\circ, 180^\circ$; $I=3$: $0^\circ, 120^\circ, 240^\circ$; $I=4$: $0^\circ, 90^\circ, 180^\circ, 270^\circ$; $I=5$: $0^\circ, 72^\circ, 144^\circ, 216^\circ, 288^\circ$)
Fig. 1.10 Fourier series. Pascal triangular graph (levels $L = 0, 1, 2, 3$); weights of the graph: unknown coefficients of the Fourier series
Box 1.15 (Fourier analysis as an underdetermined linear model):
The observation space $\mathbb Y$:
\[ \mathbf y = [y_1, y_2, \dots, y_{n-1}, y_n]' := [x(\lambda_1), x(\lambda_2), \dots, x(\lambda_{n-1}), x(\lambda_n)]' = [x(\lambda_i)]\ \forall\, i \in \{1, \dots, I\},\ \lambda \in [0, 2\pi]. \tag{1.185} \]
Equidistant lattice on $S^1$:
\[ \lambda_i = (i-1)\frac{2\pi}{I}\ \forall\, i \in \{1, \dots, I\}. \tag{1.186} \]
Examples:
\[ I = 2:\ \lambda_1 = 0,\ \lambda_2 = 180^\circ;\qquad I = 3:\ \lambda_1 = 0,\ \lambda_2 = \tfrac{2\pi}{3} = 120^\circ,\ \lambda_3 = \tfrac{4\pi}{3} = 240^\circ; \]
\[ I = 4:\ \lambda_1 = 0,\ \lambda_2 = \tfrac{\pi}{2} = 90^\circ,\ \lambda_3 = \pi = 180^\circ,\ \lambda_4 = \tfrac{3\pi}{2} = 270^\circ; \]
\[ I = 5:\ \lambda_1 = 0,\ \lambda_2 = \tfrac{2\pi}{5} = 72^\circ,\ \lambda_3 = \tfrac{4\pi}{5} = 144^\circ,\ \lambda_4 = \tfrac{6\pi}{5} = 216^\circ,\ \lambda_5 = \tfrac{8\pi}{5} = 288^\circ. \tag{1.187} \]
The parameter space $\mathbb X$:
\[ x_1 = x_0,\ x_2 = x_{-1},\ x_3 = x_{+1},\ x_4 = x_{-2},\ x_5 = x_{+2},\ \dots,\ x_{m-1} = x_{-L},\ x_m = x_{+L};\qquad \dim\mathbb X = m = 2L+1. \tag{1.188} \]
The underdetermined linear model $(n < m \Leftrightarrow I < 2L+1)$:
\[ \mathbf y := \begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_{n-1}\\ y_n\end{bmatrix} = \begin{bmatrix}1 & \sin\lambda_1 & \cos\lambda_1 & \dots & \sin L\lambda_1 & \cos L\lambda_1\\ 1 & \sin\lambda_2 & \cos\lambda_2 & \dots & \sin L\lambda_2 & \cos L\lambda_2\\ \vdots & & & & & \vdots\\ 1 & \sin\lambda_{n-1} & \cos\lambda_{n-1} & \dots & \sin L\lambda_{n-1} & \cos L\lambda_{n-1}\\ 1 & \sin\lambda_n & \cos\lambda_n & \dots & \sin L\lambda_n & \cos L\lambda_n\end{bmatrix} \begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_{m-1}\\ x_m\end{bmatrix}, \tag{1.189} \]
\[ \mathbb X = \mathrm{span}\,\{x_0, x_{-1}, x_{+1}, \dots, x_{-L}, x_{+L}\},\qquad \dim\mathbb X = m \underset{L\to\infty}{=} \infty, \tag{1.190} \]
\[ y_i = y(\lambda_i) = \lim_{L\to\infty}\sum_{l=-L}^{+L} e_l(\lambda_i)\, x_l\ \forall\, i \in \{1, \dots, n\}. \tag{1.191} \]
The portrait of Fourier analysis, which for the reader's convenience is outlined in Box 1.16, summarizes its peculiarities effectively. A finite number of observations is confronted with an infinite number of unknowns. Such a linear model of type "underdetermined of power 2" cannot be solved in finite computer time. Instead, one has to truncate the Fourier series, a technique or approximation to make the Fourier series "finite" or "bandlimited". We have to consider three cases that are shown in Box 1.16.
Box 1.16 (The portrait of Fourier analysis):
Number of observed data at lattice points: $n = I$ (finite), versus number of unknown Fourier coefficients: $m = 2L+1 \to \infty$ (infinite).
The three cases: $n > m$ (overdetermined case), $n = m$ (regular case), $n < m$ (underdetermined case).
First, we can truncate the infinite Fourier series such that $n > m$ holds. In this case of an overdetermined problem, we have more observations than unknowns. Second, we alternatively balance the number of unknown Fourier coefficients such that $n = m$ holds. Such a model choice assures a regular linear system. Both linear Fourier models which are tuned to the number of observations suffer from a typical uncertainty. What is the effect of the forgotten unknown Fourier coefficients $m > n$? Indeed, a significance test has to decide whether any truncation is admissible. We are in need of an objective criterion to decide upon the degree $m$ of the bandlimit. Third, in order to be as objective as possible, we follow the third case of "fewer observations than unknowns" such that $n < m$ holds. Such a Fourier linear model, which generates an underdetermined system of linear equations, is explicitly considered in Box 1.17 and Box 1.18. Note that the first example presented in Box 1.17 ($m - n = 1$) as well as the second example presented in Box 1.18 ($m - n = 2$) demonstrate MINOS of the Fourier linear model.

Box 1.17 (The first example). Fourier analysis as an underdetermined linear model:
\[ m - \mathrm{rk}\,A = m - n = 1,\ L = 1,\qquad \dim\mathbb Y = n = 2,\ \dim\mathbb X = m = 3, \tag{1.192} \]
\[ \begin{bmatrix}y_1\\ y_2\end{bmatrix} = \begin{bmatrix}1 & \sin\lambda_1 & \cos\lambda_1\\ 1 & \sin\lambda_2 & \cos\lambda_2\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix},\qquad \mathbf y = A\mathbf x. \tag{1.193} \]
Example $(I = 2)$: $\lambda_1 = 0^\circ$, $\lambda_2 = 180^\circ$:
\[ \sin\lambda_1 = 0,\ \cos\lambda_1 = 1,\ \sin\lambda_2 = 0,\ \cos\lambda_2 = -1, \tag{1.194} \]
\[ A := \begin{bmatrix}1 & \sin\lambda_1 & \cos\lambda_1\\ 1 & \sin\lambda_2 & \cos\lambda_2\end{bmatrix} = \begin{bmatrix}1 & 0 & 1\\ 1 & 0 & -1\end{bmatrix} \in \mathbb R^{2\times3}, \tag{1.195} \]
\[ AA' = 2I_2 \;\Leftrightarrow\; (AA')^{-1} = \tfrac12 I_2,\qquad AA' = \begin{bmatrix}2 & 1 + \sin\lambda_1\sin\lambda_2 + \cos\lambda_1\cos\lambda_2\\ 1 + \sin\lambda_2\sin\lambda_1 + \cos\lambda_2\cos\lambda_1 & 2\end{bmatrix}; \tag{1.196} \]
if $\lambda_i = (i-1)\,2\pi/I$, then
\[ 1 + \sin\lambda_1\sin\lambda_2 + \cos\lambda_1\cos\lambda_2 = 0, \tag{1.197} \]
or
\[ L = 1:\quad \sum_{l=-L}^{+L} e_l(\lambda_{i_1})\, e_l(\lambda_{i_2}) = 0\ \forall\, i_1 \neq i_2,\qquad \sum_{l=-L}^{+L} e_l(\lambda_{i_1})\, e_l(\lambda_{i_2}) = L+1\ \forall\, i_1 = i_2, \tag{1.198} \]
\[ x_l = \begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix} = A'(AA')^{-1}\mathbf y = \frac12\begin{bmatrix}1 & 1\\ 0 & 0\\ 1 & -1\end{bmatrix}\mathbf y = \frac12\begin{bmatrix}y_1 + y_2\\ 0\\ y_1 - y_2\end{bmatrix},\qquad \|x_l\|^2 = \frac12\,\mathbf y'\mathbf y. \tag{1.199} \]
Box 1.18 (The second example). Fourier analysis as an underdetermined linear model:
\[ m - \mathrm{rk}\,A = m - n = 2,\ L = 2,\qquad \dim\mathbb Y = n = 3,\ \dim\mathbb X = m = 5, \]
\[ \begin{bmatrix}y_1\\ y_2\\ y_3\end{bmatrix} = \begin{bmatrix}1 & \sin\lambda_1 & \cos\lambda_1 & \sin 2\lambda_1 & \cos 2\lambda_1\\ 1 & \sin\lambda_2 & \cos\lambda_2 & \sin 2\lambda_2 & \cos 2\lambda_2\\ 1 & \sin\lambda_3 & \cos\lambda_3 & \sin 2\lambda_3 & \cos 2\lambda_3\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\\ x_5\end{bmatrix}. \tag{1.200} \]
Example $(I = 3)$: $\lambda_1 = 0^\circ$, $\lambda_2 = 120^\circ$, $\lambda_3 = 240^\circ$, (1.201)
\[ \sin\lambda_1 = 0,\ \sin\lambda_2 = \tfrac12\sqrt3,\ \sin\lambda_3 = -\tfrac12\sqrt3;\quad \cos\lambda_1 = 1,\ \cos\lambda_2 = \cos\lambda_3 = -\tfrac12; \]
\[ \sin 2\lambda_1 = 0,\ \sin 2\lambda_2 = -\tfrac12\sqrt3,\ \sin 2\lambda_3 = \tfrac12\sqrt3;\quad \cos 2\lambda_1 = 1,\ \cos 2\lambda_2 = \cos 2\lambda_3 = -\tfrac12, \tag{1.202} \]
\[ A := \begin{bmatrix}1 & 0 & 1 & 0 & 1\\ 1 & \tfrac12\sqrt3 & -\tfrac12 & -\tfrac12\sqrt3 & -\tfrac12\\ 1 & -\tfrac12\sqrt3 & -\tfrac12 & \tfrac12\sqrt3 & -\tfrac12\end{bmatrix}, \tag{1.203} \]
\[ AA' = 3I_3 \;\Leftrightarrow\; (AA')^{-1} = \tfrac13 I_3, \tag{1.204} \]
with, in general,
\[ AA' = \begin{bmatrix}3 & (AA')_{12} & (AA')_{13}\\ (AA')_{21} & 3 & (AA')_{23}\\ (AA')_{31} & (AA')_{32} & 3\end{bmatrix}, \tag{1.205} \]
\[ (AA')_{i_1 i_2} = 1 + \sin\lambda_{i_1}\sin\lambda_{i_2} + \cos\lambda_{i_1}\cos\lambda_{i_2} + \sin 2\lambda_{i_1}\sin 2\lambda_{i_2} + \cos 2\lambda_{i_1}\cos 2\lambda_{i_2}\quad \forall\, i_1 \neq i_2. \tag{1.206} \]
If $\lambda_i = (i-1)\,2\pi/I$, then
\[ 1 + \sin\lambda_1\sin\lambda_2 + \cos\lambda_1\cos\lambda_2 + \sin 2\lambda_1\sin 2\lambda_2 + \cos 2\lambda_1\cos 2\lambda_2 = 1 - \tfrac12 - \tfrac12 = 0, \]
\[ 1 + \sin\lambda_1\sin\lambda_3 + \cos\lambda_1\cos\lambda_3 + \sin 2\lambda_1\sin 2\lambda_3 + \cos 2\lambda_1\cos 2\lambda_3 = 1 - \tfrac12 - \tfrac12 = 0, \tag{1.207} \]
\[ 1 + \sin\lambda_2\sin\lambda_3 + \cos\lambda_2\cos\lambda_3 + \sin 2\lambda_2\sin 2\lambda_3 + \cos 2\lambda_2\cos 2\lambda_3 = 1 - \tfrac34 + \tfrac14 - \tfrac34 + \tfrac14 = 0, \]
\[ L = 2:\quad \sum_{l=-L}^{+L} e_l(\lambda_{i_1})\, e_l(\lambda_{i_2}) = 0\ \forall\, i_1 \neq i_2,\qquad \sum_{l=-L}^{+L} e_l(\lambda_{i_1})\, e_l(\lambda_{i_2}) = L+1\ \forall\, i_1 = i_2, \tag{1.208} \]
\[ x_l = A'(AA')^{-1}\mathbf y = \frac13 A'\mathbf y = \frac13\begin{bmatrix}y_1 + y_2 + y_3\\ \tfrac12\sqrt3\,(y_2 - y_3)\\ y_1 - \tfrac12 y_2 - \tfrac12 y_3\\ \tfrac12\sqrt3\,(-y_2 + y_3)\\ y_1 - \tfrac12 y_2 - \tfrac12 y_3\end{bmatrix},\qquad \|x\|^2 = \frac13\,\mathbf y'\mathbf y. \tag{1.209} \]
Lemma 1.9 (Fourier analysis). If finite Fourier series (1.210) or (1.211) are sampled at observation points $\lambda_i \in S^1$ on an equidistant lattice (equiangular lattice) (1.212), then discrete orthogonality (1.213) applies:
\[ y_i = y(\lambda_i) = [1, \sin\lambda_i, \cos\lambda_i, \dots, \cos(L-1)\lambda_i, \sin L\lambda_i, \cos L\lambda_i]\,[x_1, x_2, x_3, \dots, x_{m-2}, x_{m-1}, x_m]', \tag{1.210} \]
\[ \mathbf y = A\mathbf x,\ A \in \mathbb R^{n\times m},\ \mathrm{rk}\,A = n,\ I = n < m = 2L+1,\qquad A \in \mathbb O(n) := \{A \in \mathbb R^{n\times m} \mid AA' = (L+1)I_n\}, \tag{1.211} \]
\[ \lambda_i = (i-1)\frac{2\pi}{I}\ \forall\, i, i_1, i_2 \in \{1, \dots, I\}, \tag{1.212} \]
\[ AA' = (L+1)I_n \;\Leftrightarrow\; \sum_{l=-L}^{+L} e_l(\lambda_{i_1})\, e_l(\lambda_{i_2}) = \begin{cases}0 & \forall\, i_1 \neq i_2\\ L+1 & \forall\, i_1 = i_2.\end{cases} \tag{1.213} \]
$A$ is an element of the orthogonal group $\mathbb O(n)$. MINOS of the underdetermined system of linear equations is
\[ x_m = \frac{1}{L+1}\, A'\mathbf y,\qquad \|x_m\|^2 = \frac{1}{L+1}\,\mathbf y'\mathbf y. \tag{1.214} \]
1-32 Fourier–Legendre Series

Fourier–Legendre series represent the periodic behavior of a function $x(\lambda, \phi)$ on a sphere $S^2$. They are called spherical harmonic functions since $\{1, P_{11}(\sin\phi)\sin\lambda, P_{10}(\sin\phi), P_{11}(\sin\phi)\cos\lambda, \dots, P_{kk}(\sin\phi)\cos k\lambda\}$ represent such a periodic signal. Indeed they are a peculiar combination of Fourier's trigonometric polynomials $\{\sin l\lambda, \cos l\lambda\}$ and Ferrers' associated Legendre polynomials $P_{kl}(\sin\phi)$. Here we have chosen the parameters longitude $\lambda$ and latitude $\phi$ to locate a point on $S^2$. Instead we could exchange the parameter $\lambda$ by time $t$, if clock readings would substitute longitude, a conventional technique in classical navigation. In such a setting,
\[ \lambda = \omega t = \frac{2\pi}{T}\, t = 2\pi\nu t,\qquad l\lambda = l\omega t = 2\pi\frac{l}{T}\, t = 2\pi l\nu t. \tag{1.215} \]
In such a setting, longitude would be exchanged by $2\pi$ multiplied by time $t$ and divided by the ground period $T$, or by the product of $2\pi$, the ground frequency $\nu$, and time $t$. In contrast, $l\lambda$ for all $l \in \{-k, -k+1, \dots, -1, 0, 1, \dots, k-1, k\}$ would be substituted by $2\pi$ times the product of the overtones $l/T$, or $l\nu$, and time $t$, where $k, l$ are integer, i.e., $k, l \in \mathbb Z$. According to classical navigation, $\omega$ would represent the rotational speed of the Earth. Let us refer to Box 1.19 as a summary of the Fourier–Legendre representation of a function $x(\lambda, \phi)$, where $\lambda \in [0, 2\pi]$ and $\phi \in [-\pi/2, +\pi/2]$.

Box 1.19 (Fourier–Legendre series):
\[ x(\lambda, \phi) = P_{00}(\sin\phi)x_1 + P_{11}(\sin\phi)\sin\lambda\; x_2 + P_{10}(\sin\phi)x_3 + P_{11}(\sin\phi)\cos\lambda\; x_4 \]
\[ + P_{22}(\sin\phi)\sin 2\lambda\; x_5 + P_{21}(\sin\phi)\sin\lambda\; x_6 + P_{20}(\sin\phi)x_7 + P_{21}(\sin\phi)\cos\lambda\; x_8 + P_{22}(\sin\phi)\cos 2\lambda\; x_9 \]
\[ + O_3[P_{kl}(\sin\phi)\{\cos l\lambda, \sin l\lambda\}]. \tag{1.216} \]
\[ x(\lambda, \phi) = \lim_{K\to\infty}\sum_{k=0}^{K}\sum_{l=-k}^{+k} e_{kl}(\lambda, \phi)\, x_{kl}, \tag{1.217} \]
\[ e_{kl}(\lambda, \phi) := \begin{cases}P_{kl}(\sin\phi)\cos l\lambda & \forall\, l > 0\\ P_{k0}(\sin\phi) & \forall\, l = 0\\ P_{k|l|}(\sin\phi)\sin|l|\lambda & \forall\, l < 0,\end{cases} \tag{1.218} \]
\[ x(\lambda, \phi) = \lim_{K\to\infty}\sum_{k=0}^{K}\sum_{l=-k}^{+k} P_{k|l|}(\sin\phi)\,(c_{kl}\cos l\lambda + s_{kl}\sin l\lambda). \tag{1.219} \]
"Legendre polynomials of the first kind" (recurrence relation):
\[ k P_k(t) = (2k-1)\, t\, P_{k-1}(t) - (k-1)\, P_{k-2}(t),\qquad \text{initial data:}\ P_0(t) = 1,\ P_1(t) = t. \tag{1.220} \]
Example:
\[ 2P_2(t) = 3tP_1(t) - P_0(t) = 3t^2 - 1 \;\Rightarrow\; P_2(t) = \tfrac32 t^2 - \tfrac12;\qquad \text{if } t = \sin\phi, \text{ then } P_2(\sin\phi) = \tfrac32\sin^2\phi - \tfrac12. \tag{1.221} \]
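The recurrence relation (1.220) translates directly into a few lines of code; the following sketch (plain Python, illustrative only) evaluates $P_k(t)$:

```python
def legendre(k, t):
    """P_k(t) via the recurrence k P_k = (2k-1) t P_{k-1} - (k-1) P_{k-2}."""
    p_prev, p = 1.0, t                    # initial data P_0(t), P_1(t)
    if k == 0:
        return p_prev
    for j in range(2, k + 1):
        p_prev, p = p, ((2 * j - 1) * t * p - (j - 1) * p_prev) / j
    return p

print(legendre(2, 0.5), 1.5 * 0.5**2 - 0.5)   # both: P_2(0.5) = -0.125
```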
Ferrers' associated Legendre polynomials of the first kind:
\[ P_{kl}(t) := (1 - t^2)^{l/2}\,\frac{d^l P_k(t)}{dt^l}. \tag{1.222} \]
Example:
\[ P_{11}(t) = \sqrt{1-t^2}\,\frac{d}{dt}P_1(t) = \sqrt{1-t^2};\qquad \text{if } t = \sin\phi, \text{ then } P_{11}(\sin\phi) = \cos\phi. \tag{1.223} \]
Example:
\[ P_{21}(t) = \sqrt{1-t^2}\,\frac{d}{dt}P_2(t) = \sqrt{1-t^2}\,\frac{d}{dt}\left(\tfrac32 t^2 - \tfrac12\right) = 3t\sqrt{1-t^2};\quad \text{if } t = \sin\phi, \text{ then } P_{21}(\sin\phi) = 3\sin\phi\cos\phi. \tag{1.224} \]
Example:
\[ P_{22}(t) = (1-t^2)\,\frac{d^2}{dt^2}P_2(t) = 3(1-t^2);\qquad \text{if } t = \sin\phi, \text{ then } P_{22}(\sin\phi) = 3\cos^2\phi. \tag{1.225} \]
Fig. 1.11 A formal representation of the vertical recurrence relation: nodes $(k-1, l-1)$, $(k-1, l)$, $(k-1, l+1)$ feed node $(k, l)$

Example (approximation of order three):
\[ x^\wedge(\lambda, \phi) = e_{00}x_1 + e_{1,-1}x_2 + e_{10}x_3 + e_{11}x_4 + e_{2,-2}x_5 + e_{2,-1}x_6 + e_{20}x_7 + e_{21}x_8 + e_{22}x_9 + O_3. \tag{1.226} \]
Vertical recurrence relation:
\[ P_{kl}(\sin\phi) = \sin\phi\, P_{k-1,l}(\sin\phi) - \cos\phi\,[P_{k-1,l+1}(\sin\phi) - P_{k-1,l-1}(\sin\phi)], \tag{1.227} \]
with initial data $P_{00}(\sin\phi) = 1$, $P_{k,-|l|} = P_{k,|l|}\ \forall\, l < 0$. Compare with Fig. 1.11.
We note that the Fourier–Legendre series (1.216) can be understood as an infinite-dimensional vector space (linear space, Hilbert space) since the base functions $e_{kl}(\lambda, \phi)$ generate a complete orthogonal (orthonormal) system based on surface spherical functions. The countable base, i.e., the base functions $e_{kl}(\lambda, \phi)$ or $\{1, \cos\phi\sin\lambda, \sin\phi, \cos\phi\cos\lambda, \dots, P_{k|l|}(\sin\phi)\sin l\lambda\}$, spans the Fourier–Legendre space $L^2\{[0, 2\pi]\times[-\pi/2, +\pi/2]\}$. According to our ordering, $x^\wedge(\lambda, \phi)$ is an approximation of the function $x(\lambda, \phi)$ up to order $P_{kl}(\sin\phi)\{\cos l\lambda, \sin|l|\lambda\}$ for all $l > 0$, $l = 0$ and $l < 0$, respectively. Let us refer to Box 1.20 as a summary of the essential features of the Fourier–Legendre space.

Box 1.20 (The Fourier–Legendre space). The base functions $e_{kl}(\lambda, \phi)$, $k \in \{0, 1, \dots, K\}$, $l \in \{-k, \dots, +k\}$, span the Fourier–Legendre space $L^2\{[0, 2\pi]\times[-\pi/2, +\pi/2]\}$. The base functions generate a complete orthogonal (orthonormal) system of surface spherical functions.

Inner product ($x \in$ FOURIER–LEGENDRE and $y \in$ FOURIER–LEGENDRE):
\[ \langle x \mid y\rangle = \frac{1}{S}\int dS\; x(\lambda, \phi)\, y(\lambda, \phi) = \frac{1}{4\pi}\int_0^{2\pi} d\lambda \int_{-\pi/2}^{+\pi/2} d\phi\,\cos\phi\; x(\lambda, \phi)\, y(\lambda, \phi). \tag{1.228} \]
Normalization:
\[ \langle e_{k_1 l_1}(\lambda, \phi) \mid e_{k_2 l_2}(\lambda, \phi)\rangle = \frac{1}{4\pi}\int_0^{2\pi} d\lambda \int_{-\pi/2}^{+\pi/2} d\phi\,\cos\phi\; e_{k_1 l_1}(\lambda, \phi)\, e_{k_2 l_2}(\lambda, \phi) = \gamma_{k_1 l_1}\,\delta_{k_1 k_2}\,\delta_{l_1 l_2} \]
\[ \forall\, k_1, k_2 \in \{0, \dots, K\},\ l_1, l_2 \in \{-k, \dots, +k\},\qquad \gamma_{k_1 l_1} = \frac{1}{2k_1+1}\,\frac{(k_1-|l_1|)!}{(k_1+|l_1|)!}. \tag{1.229} \]
Norms, convergence:
\[ \|x\|^2 = \frac{1}{4\pi}\int_0^{2\pi} d\lambda \int_{-\pi/2}^{+\pi/2} d\phi\,\cos\phi\; x^2(\lambda, \phi) = \lim_{K\to\infty}\sum_{k=0}^{K}\sum_{l=-k}^{+k} \gamma_{kl}\, x_{kl}^2 < \infty, \tag{1.230} \]
\[ \lim_{K\to\infty}\|x - x^\wedge_K\|^2 = 0\quad \text{(convergence in the mean)}. \tag{1.231} \]
Synthesis versus analysis:
\[ x = \lim_{K\to\infty}\sum_{k=0}^{K}\sum_{l=-k}^{+k} e_{kl}\, x_{kl} \quad\text{versus}\quad x_{kl} = \frac{1}{\gamma_{kl}}\langle e_{kl} \mid x\rangle := \frac{1}{4\pi\gamma_{kl}}\int_0^{2\pi} d\lambda \int_{-\pi/2}^{+\pi/2} d\phi\,\cos\phi\; e_{kl}(\lambda, \phi)\, x(\lambda, \phi), \tag{1.232} \]
\[ x = \lim_{K\to\infty}\sum_{k=0}^{K}\sum_{l=-k}^{+k} \frac{1}{\gamma_{kl}}\, e_{kl}\,\langle x \mid e_{kl}\rangle. \tag{1.233} \]
Canonical basis of the Hilbert space FOURIER–LEGENDRE (orthonormal basis):
\[ e^*_{kl}(\lambda, \phi) := \sqrt{2k+1}\,\sqrt{\frac{(k-|l|)!}{(k+|l|)!}}\, P_{k|l|}(\sin\phi)\,\begin{cases}\sqrt2\,\cos l\lambda & \forall\, l > 0\\ 1 & \forall\, l = 0\\ \sqrt2\,\sin|l|\lambda & \forall\, l < 0,\end{cases} \tag{1.234} \]
\[ e^*_{kl} = \frac{1}{\sqrt{\gamma_{kl}}}\, e_{kl} \ \text{versus}\ e_{kl} = \sqrt{\gamma_{kl}}\, e^*_{kl};\qquad x^*_{kl} = \sqrt{\gamma_{kl}}\, x_{kl} \ \text{versus}\ x_{kl} = \frac{1}{\sqrt{\gamma_{kl}}}\, x^*_{kl}. \tag{1.235} \]
\[ x = \lim_{K\to\infty}\sum_{k=0}^{K}\sum_{l=-k}^{+k} e^*_{kl}\,\langle x \mid e^*_{kl}\rangle. \tag{1.236} \]
Orthonormality:
\[ \langle e^*_{k_1 l_1}(\lambda, \phi) \mid e^*_{k_2 l_2}(\lambda, \phi)\rangle = \delta_{k_1 k_2}\,\delta_{l_1 l_2}. \tag{1.237} \]
Fourier–Legendre space $(K \to \infty)$:
\[ \text{"FOURIER–LEGENDRE"} = \mathrm{span}\,\{e^*_{kl} \mid k \in \{0, \dots, K\},\ l \in \{-k, \dots, +k\}\},\quad \dim \text{"FOURIER–LEGENDRE"} = \lim_{K\to\infty}(K+1)^2 = \infty, \tag{1.238} \]
\[ \text{"FOURIER–LEGENDRE"} = \text{"HARM"}\ L^2(S^2). \]
We note that an infinite-dimensional vector space (linear space) is similar to a finite-dimensional vector space: as in a Euclidean space, an inner product and a norm are defined. While the inner product and the norm in a finite-dimensional vector space require summation of their components, the inner products (1.228) and (1.229) and the norm (1.230) in an infinite-dimensional vector space force us to integrate. Indeed the inner products (scalar products) (1.228) and (1.229) are integrals over the surface element of $S^2_r$ applied to the vectors $x(\lambda, \phi)$, $y(\lambda, \phi)$ or $e_{k_1 l_1}$, $e_{k_2 l_2}$, respectively. Those integrals are divided by the size $4\pi$ of the surface of the unit sphere. Alternative representations of $\langle x \mid y\rangle$ and $\langle e_{k_1 l_1} \mid e_{k_2 l_2}\rangle$ (Dirac's notation of a bracket decomposed into "bra" and "ket"), based upon $dS = r^2\, d\lambda\, d\phi\cos\phi$ and $S = 4\pi r^2$, lead us directly to the integration over $S^2$, the unit sphere.
Let us adopt the definitions of Fourier–Legendre analysis as well as Fourier–Legendre synthesis following (1.228)-(1.237). Here we concentrate on the key problem: What is a harmonic function which has the sphere as support? A harmonic function "on the unit sphere $S^2_r$" is a function $x(\lambda, \phi)$, $(\lambda, \phi) \in [0, 2\pi]\times[-\pi/2, +\pi/2]$, which fulfils (i) the two-dimensional Laplace equation (the differential equation of a two-dimensional harmonic oscillator) and (ii) a special Sturm–Liouville boundary condition (1.239) plus the harmonicity condition (1.240). Note that the special Sturm–Liouville equation forces the frequency to be integer!
\[ \text{(1st)}\quad \Delta x(\lambda, \phi) = 0 \;\Leftrightarrow\; \left(\frac{d^2}{d\lambda^2} + \omega^2\right) x(\lambda) = 0, \tag{1.239} \]
\[ \text{(2nd)}\quad x(0) = x(2\pi),\qquad \frac{dx}{d\lambda}(0) = \frac{dx}{d\lambda}(2\pi). \tag{1.240} \]
How can we set up a linear model for Fourier–Legendre analysis? The linear model of Fourier–Legendre analysis, which relates the elements $x \in \mathbb X$ of the parameter space $\mathbb X$ to the elements $y \in \mathbb Y$ of the observation space $\mathbb Y$, is again set up in Box 1.21. Here we shall assume that the observed data have been made available on a special grid which extends over $\lambda \in [0, 2\pi]$, $\phi \in [-\pi/2, +\pi/2]$, with $I = 2J$ longitudes $\lambda_i = (i-1)\,2\pi/I\ \forall\, i \in \{1, \dots, I\}$ and $J$ latitudes $\phi_j\ \forall\, j \in \{1, \dots, J\}$. Longitudinal interval: $\Delta\lambda := \lambda_{i+1} - \lambda_i = 2\pi/I$. Lateral interval: $J$ even: $\Delta\phi := \phi_{j+1} - \phi_j = \pi/J$; $J$ odd: $\Delta\phi := \phi_{j+1} - \phi_j = \pi/(J+1)$. In addition, we shall review the data sets for $J$ even as well as for $J$ odd. Examples are given for (i) $J = 1, I = 2$ and (ii) $J = 2, I = 4$. The numbers of observations which correspond to these data sets are (i) $n = 2$ and (ii) $n = 8$.

Box 1.21 (The Fourier–Legendre analysis as an underdetermined linear model).
The observation space $\mathbb Y$. Equidistant lattice on $S^2$ (equiangular):
\[ \lambda \in [0, 2\pi],\quad \phi \in \left[-\frac{\pi}{2}, +\frac{\pi}{2}\right], \tag{1.241} \]
\[ \lambda_i = (i-1)\frac{2\pi}{I}\ \forall\, i \in \{1, \dots, I\},\qquad I = 2J, \tag{1.242} \]
\[ J\ \text{even}:\ \phi_k = \frac{\pi}{2J} + (k-1)\frac{\pi}{J}\ \forall\, k \in \left\{1, \dots, \frac{J}{2}\right\},\quad \phi_k = -\frac{\pi}{2J} - \left(k - \frac{J}{2} - 1\right)\frac{\pi}{J}\ \forall\, k \in \left\{\frac{J+2}{2}, \dots, J\right\}, \]
\[ J\ \text{odd}:\ \phi_k = (k-1)\frac{\pi}{J+1}\ \forall\, k \in \left\{1, \dots, \frac{J+1}{2}\right\},\quad \phi_k = -\left(k - \frac{J+1}{2}\right)\frac{\pi}{J+1}\ \forall\, k \in \left\{\frac{J+3}{2}, \dots, J\right\}. \tag{1.243} \]
Longitudinal interval:
\[ \Delta\lambda := \lambda_{i+1} - \lambda_i = \frac{2\pi}{I}. \tag{1.244} \]
Lateral interval:
\[ J\ \text{even}:\ \Delta\phi := \phi_{j+1} - \phi_j = \frac{\pi}{J};\qquad J\ \text{odd}:\ \Delta\phi := \phi_{j+1} - \phi_j = \frac{\pi}{J+1}. \tag{1.245} \]
Initiation: choose $J$, derive $I = 2J$ and the lattice (1.242), (1.243) ((1.246), (1.247) repeat these relations).
Multivariate setup of the observation space:
\[ y_{ij} = x(\lambda_i, \phi_j). \tag{1.248} \]
Vectorization of the matrix of observations. Example $(J = 1, I = 2)$: sample points $(\lambda_1, \phi_1)$, $(\lambda_2, \phi_1)$; observation vector $\mathbf y = [\gamma(\lambda_1, \phi_1), \gamma(\lambda_2, \phi_1)]' \in \mathbb R^{2\times1}$. Example $(J = 2, I = 4)$: sample points $(\lambda_1, \phi_1), \dots, (\lambda_4, \phi_2)$; observation vector $\mathbf y = [\gamma(\lambda_1, \phi_1), \gamma(\lambda_2, \phi_1), \gamma(\lambda_3, \phi_1), \gamma(\lambda_4, \phi_1), \gamma(\lambda_1, \phi_2), \gamma(\lambda_2, \phi_2), \gamma(\lambda_3, \phi_2), \gamma(\lambda_4, \phi_2)]' \in \mathbb R^{8\times1}$.
Number of observations:
\[ n = IJ = 2J^2.\qquad \text{Example:}\ J = 1 \Rightarrow n = 2;\ J = 2 \Rightarrow n = 8;\ J = 3 \Rightarrow n = 18;\ J = 4 \Rightarrow n = 32. \tag{1.249} \]
Table 1.1. Equidistant lattice on $S^2$. The lateral lattice

J  | Delta phi | lateral grid
1  | -         | 0deg
2  | 90deg     | 45deg, -45deg
3  | 45deg     | 0deg, 45deg, -45deg
4  | 45deg     | 22.5deg, 67.5deg, -22.5deg, -67.5deg
5  | 30deg     | 0deg, 30deg, 60deg, -30deg, -60deg
6  | 30deg     | 15deg, 45deg, 75deg, -15deg, -45deg, -75deg
7  | 22.5deg   | 0deg, 22.5deg, 45deg, 67.5deg, -22.5deg, -45deg, -67.5deg
8  | 22.5deg   | 11.25deg, 33.75deg, 56.25deg, 78.75deg, -11.25deg, -33.75deg, -56.25deg, -78.75deg
9  | 18deg     | 0deg, 18deg, 36deg, 54deg, 72deg, -18deg, -36deg, -54deg, -72deg
10 | 18deg     | 9deg, 27deg, 45deg, 63deg, 81deg, -9deg, -27deg, -45deg, -63deg, -81deg

Table 1.2. Equidistant lattice on $S^2$. The longitudinal lattice

J | I = 2J | Delta lambda | longitudinal grid
1 | 2      | 180deg       | 0deg, 180deg
2 | 4      | 90deg        | 0deg, 90deg, 180deg, 270deg
3 | 6      | 60deg        | 0deg, 60deg, 120deg, 180deg, 240deg, 300deg
4 | 8      | 45deg        | 0deg, 45deg, 90deg, 135deg, 180deg, 225deg, 270deg, 315deg
5 | 10     | 36deg        | 0deg, 36deg, 72deg, 108deg, 144deg, 180deg, 216deg, 252deg, 288deg, 324deg
For the optimal design of the Fourier–Legendre linear model it has been shown that the equidistant lattice (1.242) and (1.243) is "D-optimal". Tables 1.1 and 1.2 contain samples of an equidistant lattice on $S^2$, namely the lateral and the longitudinal lattice. As samples, via Fig. 1.12, we have computed various horizontal and vertical sections of spherical lattices, for instance $\{J=1, I=2\}$, $\{J=2, I=4\}$, $\{J=3, I=6\}$, $\{J=4, I=8\}$, and $\{J=5, I=10\}$. By means of Fig. 1.13 we have added the corresponding Plate Carree maps. The unknown Fourier–Legendre coefficients, collected in a Pascal triangular graph, are vectorized by (1.250). Indeed, the linear model contains $n = IJ = 2J^2$ observations and $m \to \infty$ unknowns, a hyperreal number. The linear operator $A: \mathbb X \to \mathbb Y$ is generated by the base functions at lattice points (1.251). Equation (1.251) is a representation of the linear observational equations in Ricci calculus, which is characteristic for Fourier–Legendre analysis.
Fig. 1.12 The spherical lattice. Left column: vertical section, trace of parallel circles. Right column: horizontal section, trace of meridians. Panels: $J=1, I=2$; $J=2, I=4$; $J=3, I=6$; $J=4, I=8$; $J=5, I=10$
\[ \mathbb X = \mathrm{span}\,\{x_{00}, x_{1,-1}, x_{10}, x_{11}, \dots, x_{k|l|}\},\quad k \to \infty,\ k = 0, \dots, K,\ |l| = 0, \dots, k,\qquad \dim\mathbb X = m = \infty, \tag{1.250} \]
\[ y_{ij} = y(\lambda_i, \phi_j) = \lim_{K\to\infty}\sum_{k=0}^{K}\sum_{l=-k}^{+k} e_{kl}(\lambda_i, \phi_j)\, x_{kl}\quad \forall\, i \in \{1, \dots, I\},\ j \in \{1, \dots, J\}. \tag{1.251} \]
Fig. 1.13 Plate Carree map of the longitude-latitude lattice ($\lambda$ from $0$ to $2\pi$ on the horizontal axis, $\phi$ from $-\pi/2$ to $+\pi/2$ on the vertical axis) for the lattices $J = 1, \dots, 5$ of Fig. 1.12
Note that the finite-dimensional observation space $\mathbb Y$ has integer dimension, i.e., $\dim\mathbb Y = n = IJ$, $I = 2J$, and the parameter space $\mathbb X$ is infinite-dimensional, i.e., $\dim\mathbb X = m = \infty$. The portrait of Fourier–Legendre analysis, which for the reader's convenience is outlined in Box 1.22, summarizes its peculiarities effectively. A finite number of observations is confronted with an infinite number of unknowns. Such a linear model of type "underdetermined of power 2" cannot be solved in finite computer time. Instead, one has to truncate the Fourier–Legendre series, leaving the series "bandlimited". We consider the three cases shown in Box 1.22.
Box 1.22 (The portrait of Fourier–Legendre analysis):
Number of observed data at lattice points: $n = IJ = 2J^2$ (finite), versus number of unknown Fourier–Legendre coefficients: $m = \lim_{K\to\infty}\sum_{k=0}^{K}(2k+1) = \lim_{K\to\infty}(K+1)^2 = \infty$ (infinite).
The three cases: $n > m$ (overdetermined case), $n = m$ (regular case), $n < m$ (underdetermined case).
First, we could truncate the infinite Fourier–Legendre series such that $n > m$ holds. In this case of an overdetermined problem, we have more observations than unknowns. Second, we alternatively balance the number of unknown Fourier–Legendre coefficients such that $n = m$ holds. Such a model choice assures a regular linear system. Both linear Fourier–Legendre models which are tuned to the number of observations suffer from a typical uncertainty. What is the effect of the forgotten unknown Fourier–Legendre coefficients $m > n$? Indeed, a significance test has to decide whether any truncation is admissible. We need an objective criterion to decide upon the degree $m$ of the bandlimit. Third, in order to be as objective as possible, we again follow the third case of "fewer observations than unknowns" such that $n < m$ holds. Such a Fourier–Legendre linear model generating an underdetermined system of linear equations will consequently be considered in Boxes 1.23 and 1.24. Note that the first example presented in Box 1.23 demonstrates MINOS of the Fourier–Legendre linear model for $n = IJ = 2J^2 = 2$ observations and $K = 1$, $m = (K+1)^2 = 4$ unknowns. Furthermore, note that the second example presented in Box 1.24 refers to MINOS of the Fourier–Legendre linear model for $n = 8$ and $m = 9$. In the framework of the second example, we have computed the design matrix $A \in \mathbb R^{8\times9}$.

Box 1.23 (Example). Fourier–Legendre analysis as an underdetermined linear model: $m - \mathrm{rk}\,A = m - n = 2$, $\dim\mathbb Y = n = 2$ versus $\dim\mathbb X = m = 4$; $J = 1$, $I = 2J = 2 \Rightarrow n = IJ = 2J^2 = 2$, versus $K = 1 \Rightarrow m = (K+1)^2 = 4$:
\[ \begin{bmatrix}y_1\\ y_2\end{bmatrix} = \begin{bmatrix}1 & P_{11}(\sin\phi_1)\sin\lambda_1 & P_{10}(\sin\phi_1) & P_{11}(\sin\phi_1)\cos\lambda_1\\ 1 & P_{11}(\sin\phi_2)\sin\lambda_2 & P_{10}(\sin\phi_2) & P_{11}(\sin\phi_2)\cos\lambda_2\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix} \tag{1.252} \]
subject to $(\lambda_1, \phi_1) = (0^\circ, 0^\circ)$ and $(\lambda_2, \phi_2) = (180^\circ, 0^\circ)$,
\[ \{\mathbf y = A\mathbf x \mid A \in \mathbb R^{2\times4},\ \mathrm{rk}\,A = n = 2,\ m = 4,\ m - n = 2\}, \tag{1.253} \]
\[ A := \begin{bmatrix}1 & P_{11}(\sin\phi_1)\sin\lambda_1 & P_{10}(\sin\phi_1) & P_{11}(\sin\phi_1)\cos\lambda_1\\ 1 & P_{11}(\sin\phi_2)\sin\lambda_2 & P_{10}(\sin\phi_2) & P_{11}(\sin\phi_2)\cos\lambda_2\end{bmatrix} = \begin{bmatrix}1 & 0 & 0 & 1\\ 1 & 0 & 0 & -1\end{bmatrix}, \tag{1.254} \]
\[ P_{11}(\sin\phi) = \cos\phi,\quad P_{10}(\sin\phi) = \sin\phi,\qquad P_{11}(\sin\phi_1) = P_{11}(\sin\phi_2) = 1, \tag{1.255} \]
\[ P_{10}(\sin\phi_1) = P_{10}(\sin\phi_2) = 0,\quad \sin\lambda_1 = \sin\lambda_2 = 0,\quad \cos\lambda_1 = 1,\ \cos\lambda_2 = -1, \tag{1.256} \]
\[ AA' = \begin{bmatrix}(AA')_{11} & (AA')_{12}\\ (AA')_{21} & (AA')_{22}\end{bmatrix}, \tag{1.257} \]
\[ (AA')_{11} = 1 + P_{11}^2(\sin\phi_1) + P_{10}^2(\sin\phi_1), \]
\[ (AA')_{12} = 1 + P_{11}(\sin\phi_1)P_{11}(\sin\phi_2)\sin\lambda_1\sin\lambda_2 + P_{10}(\sin\phi_1)P_{10}(\sin\phi_2) + P_{11}(\sin\phi_1)P_{11}(\sin\phi_2)\cos\lambda_1\cos\lambda_2, \]
\[ (AA')_{21} = 1 + P_{11}(\sin\phi_2)P_{11}(\sin\phi_1)\sin\lambda_2\sin\lambda_1 + P_{10}(\sin\phi_2)P_{10}(\sin\phi_1) + P_{11}(\sin\phi_2)P_{11}(\sin\phi_1)\cos\lambda_2\cos\lambda_1, \tag{1.258} \]
\[ (AA')_{22} = 1 + P_{11}^2(\sin\phi_2) + P_{10}^2(\sin\phi_2), \]
\[ AA' = 2I_2 \;\Leftrightarrow\; (AA')^{-1} = \tfrac12 I_2, \tag{1.259} \]
\[ K = 1:\quad \sum_{k_1, k_2 = 0}^{1}\ \sum_{l_1 = -k_1}^{+k_1}\sum_{l_2 = -k_2}^{+k_2} e_{k_1 l_1}(\lambda_{i_1}, \phi_{i_1})\, e_{k_2 l_2}(\lambda_{i_2}, \phi_{i_2}) = \begin{cases}0 & \forall\, i_1 \neq i_2\\ 2 & \forall\, i_1 = i_2,\end{cases} \tag{1.260} \]
\[ x_l = \begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix}(\text{MINOS}) = \begin{bmatrix}c_{00}\\ s_{11}\\ c_{10}\\ c_{11}\end{bmatrix}(\text{MINOS}) = \frac12 A'\mathbf y = \frac12\begin{bmatrix}1 & 1\\ 0 & 0\\ 0 & 0\\ 1 & -1\end{bmatrix}\begin{bmatrix}y_1\\ y_2\end{bmatrix} = \frac12\begin{bmatrix}y_1 + y_2\\ 0\\ 0\\ y_1 - y_2\end{bmatrix}. \tag{1.261} \]
Box 1.24 (The second example). Fourier–Legendre analysis as an underdetermined linear model: $m - \mathrm{rk}\,A = m - n = 1$, $\dim\mathbb Y = n = 8$ versus $\dim\mathbb X = m = 9$; $J = 2$, $I = 2J = 4 \Rightarrow n = IJ = 2J^2 = 8$, versus $K = 2 \Rightarrow m = (K+1)^2 = 9$:
\[ \begin{bmatrix}y_1\\ \vdots\\ y_8\end{bmatrix} = \begin{bmatrix}1 & P_{11}(\sin\phi_1)\sin\lambda_1 & P_{10}(\sin\phi_1) & P_{11}(\sin\phi_1)\cos\lambda_1 & P_{22}(\sin\phi_1)\sin 2\lambda_1 & P_{21}(\sin\phi_1)\sin\lambda_1 & P_{20}(\sin\phi_1) & P_{21}(\sin\phi_1)\cos\lambda_1 & P_{22}(\sin\phi_1)\cos 2\lambda_1\\ \vdots & & & & & & & & \vdots\\ 1 & P_{11}(\sin\phi_8)\sin\lambda_8 & P_{10}(\sin\phi_8) & P_{11}(\sin\phi_8)\cos\lambda_8 & P_{22}(\sin\phi_8)\sin 2\lambda_8 & P_{21}(\sin\phi_8)\sin\lambda_8 & P_{20}(\sin\phi_8) & P_{21}(\sin\phi_8)\cos\lambda_8 & P_{22}(\sin\phi_8)\cos 2\lambda_8\end{bmatrix}\begin{bmatrix}x_1\\ \vdots\\ x_9\end{bmatrix}. \tag{1.262} \]
Equidistant lattice (longitudinal width $\Delta\lambda = 90^\circ$, lateral width $\Delta\phi = 90^\circ$):
\[ (\lambda_1, \phi_1) = (0^\circ, +45^\circ),\ (\lambda_2, \phi_2) = (90^\circ, +45^\circ),\ (\lambda_3, \phi_3) = (180^\circ, +45^\circ),\ (\lambda_4, \phi_4) = (270^\circ, +45^\circ), \]
\[ (\lambda_5, \phi_5) = (0^\circ, -45^\circ),\ (\lambda_6, \phi_6) = (90^\circ, -45^\circ),\ (\lambda_7, \phi_7) = (180^\circ, -45^\circ),\ (\lambda_8, \phi_8) = (270^\circ, -45^\circ), \tag{1.263} \]
\[ P_{11}(\sin\phi) = \cos\phi,\ P_{10}(\sin\phi) = \sin\phi,\ P_{22}(\sin\phi) = 3\cos^2\phi,\ P_{21}(\sin\phi) = 3\sin\phi\cos\phi,\ P_{20}(\sin\phi) = \tfrac32\sin^2\phi - \tfrac12. \tag{1.264} \]
Table 1.3 (Number of observed data at lattice points versus number of unknown Fourier–Legendre coefficients): finite $n = IJ = 2J^2$ versus infinite $m = \lim_{K\to\infty}\sum_{k=0}^{K}(2k+1) = \lim_{K\to\infty}(K+1)^2$.

J = 1 => n = 2:  MINOS: K = 1, (K+1)^2 = 4,  m - n = 4 - 2 = 2;  LESS: K = 0, (K+1)^2 = 1,  n - m = 2 - 1 = 1.
J = 2 => n = 8:  MINOS: K = 2, (K+1)^2 = 9,  m - n = 1;          LESS: K = 1, (K+1)^2 = 4,  n - m = 4.
J = 3 => n = 18: MINOS: K = 4, (K+1)^2 = 25, m - n = 7;          LESS: K = 2, (K+1)^2 = 9,  n - m = 9; K = 3, (K+1)^2 = 16.
J = 4 => n = 32: MINOS: K = 5, (K+1)^2 = 36, m - n = 4;          LESS: K = 3, (K+1)^2 = 16, n - m = 16; K = 4, (K+1)^2 = 25.
1-33 Nyquist Frequency for Spherical Data

We repeat our introduction of a regular grid analysis for data on the unit sphere with identical intervals in longitude and latitude. See our Box 1.21, which is enlarged by Table 1.3, generalizing to MINOS and LESS (LEast Squares Solution). Table 1.4 lists the various Nyquist frequencies for data on the unit sphere depending on (i) the number of observations, (ii) the number of unknown parameters, and (iii) the Nyquist frequency $N_f = \pi/\Delta\phi = \pi/\Delta\lambda$.
For instance, for a $5' \times 5'$ regular grid with 9,326,916 unknowns, the estimable quantities account for $N_f = 2{,}160$; but for a $5'' \times 5''$ regular grid with roughly $3.356 \cdot 10^{10}$ unknowns, the estimable quantities sum up to $N_f = 129{,}600$ as the Nyquist frequency.
Table 1.4 (Nyquist frequency, data on the reference sphere)

Grid                    | Observations n = IJ = 2J^2 | Parameters m = (K+1)^2          | Nyquist frequency N_f = pi/Delta
5 deg x 5 deg grid      | n = 2,592                  | m = 2,500 (K = 49)              | N_f = 36
1 deg x 1 deg grid      | n = 64,800                 | m = 64,516 (K = 253)            | N_f = 180
5' x 5' grid            | n = 9,331,200              | m = 9,326,916 (K = 3,053)       | N_f = 2,160
5" x 5" grid            | n = 3.359232 x 10^10       | m = 3.356224 x 10^10 (K = 183,199) | N_f = 129,600
1" x 1" grid            | n = 8.39808 x 10^11        | m = 8.3980729 x 10^11 (K = 916,409) | N_f = 648,000
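The bookkeeping behind Table 1.4 is easily reproduced; the following sketch (grid spacing given in degrees; an illustration, not the book's code) computes $n$ and $N_f$:

```python
def sphere_grid(delta_deg):
    I = round(360.0 / delta_deg)      # longitudinal points, I = 2J
    J = I // 2                        # lateral points
    n = I * J                         # observations, n = IJ = 2 J^2
    Nf = round(180.0 / delta_deg)     # Nyquist frequency N_f = pi / Delta
    return n, Nf

for delta in (5.0, 1.0, 5.0/60.0, 5.0/3600.0, 1.0/3600.0):
    print(delta, sphere_grid(delta))
# 5 deg: n = 2592, Nf = 36; ...; 5": n = 33592320000, Nf = 129600
```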
1-4 Special Nonlinear Models

Let us finish this chapter by considering an example of a consistent system of linearized observational equations $A\mathbf x = \mathbf y$ with $\mathrm{rk}\,A = \mathrm{rk}(A, \mathbf y)$, where the matrix $A \in \mathbb R^{n\times m}$ is the Jacobi matrix (Jacobi map) of the nonlinear model. Let us present a planar triangle whose nodal points have to be coordinated from three distance measurements and the minimum norm solution of type I-MINOS.

1-41 Taylor Polynomials, Generalized Newton Iteration

In addition, we review the invariance properties of the observational equations with respect to a particular transformation group, which makes the a priori indeterminism of the consistent system of linearized observational equations plausible. The observation vector is an element of the column space, $\mathbf y \in \mathcal R(A)$. The geometry of the planar triangle is illustrated in Fig. 1.14. The point of departure for the linearization process of nonlinear observational equations is the nonlinear mapping $\mathbf X \mapsto F(\mathbf X) = \mathbf Y$. The Taylor expansion is truncated to the order $O[(\mathbf X - \mathbf x)\otimes(\mathbf X - \mathbf x)\otimes(\mathbf X - \mathbf x)]$,
Fig. 1.14 Barycentric rectangular coordinates of the nodal points, namely of the equilateral triangle $p_\alpha p_\beta p_\gamma$
\[ \mathbf Y = F(\mathbf X) = F(\mathbf x) + \mathbf J(\mathbf x)(\mathbf X - \mathbf x) + \mathbf H(\mathbf x)(\mathbf X - \mathbf x)\otimes(\mathbf X - \mathbf x) + O[(\mathbf X - \mathbf x)\otimes(\mathbf X - \mathbf x)\otimes(\mathbf X - \mathbf x)]. \tag{1.265} \]
$\mathbf J(\mathbf x)$ represents the Jacobi matrix of first partial derivatives, while $\mathbf H(\mathbf x)$ is the Hesse matrix of second derivatives of the vector-valued function $F(\mathbf X)$ with respect to the coordinates of the vector $\mathbf X$, both taken at the evaluation point $\mathbf x$. A linearized nonlinear model is generated by truncating the vector-valued function $F(\mathbf X)$ to the order $O[(\mathbf X - \mathbf x)\otimes(\mathbf X - \mathbf x)]$, namely
\[ \Delta\mathbf y := F(\mathbf X) - F(\mathbf x) = \mathbf J(\mathbf x)(\mathbf X - \mathbf x) + O[(\mathbf X - \mathbf x)\otimes(\mathbf X - \mathbf x)]. \tag{1.266} \]
A generalized Newton iteration process for solving the nonlinear observational equations by solving a sequence of linear equations of (injectivity) defect by means of the right inverse of type Gx -MINOS is the algorithm that is shown in Box 1.24. Box 1.24. Newton iteration, I-MINOS, rkA D rk.A; y/ Level 0 W x0 D x0 ; y0 D F.X/ F.x0 /; x1 D ŒJ.x0 / R y0 :
(1.267)
Level 1 W x1 D x0 C x1 ; y1 D F.x/ F.x1 /; x2 D ŒJ.x1 / R y1 :
(1.268)
Level i W xi D xi 1 C xi ; yi D F.x/ F.xi / xi C1 D ŒJ.xi / R yi :
(1.269)
Level n (i.e. the reproducing point in the computer arithmetric): xnC1 D xn . The planar triangle P˛ Pˇ P is approximately an equilateral triangle p˛ pˇ p whose nodal points are a priori coordinated by Box 1.25. Obviously, the approximate coordinates of the three nodal points are barycentric, namely characterized by Boxes 1.27 and 1.28. Their sum as well as their product sum vanish.
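A compact sketch of the algorithm of Box 1.24 follows; the toy problem below (one distance equation in the plane) is an assumption for illustration, not the book's triangle network:

```python
import numpy as np

def newton_minos(F, J, Y, x0, tol=1e-12, itmax=50):
    """Generalized Newton iteration (1.267)-(1.269): each linearized,
    consistent, underdetermined step is solved by the right inverse
    J_R^- = J'(JJ')^{-1}, i.e. by I-MINOS (here via the pseudoinverse)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(itmax):
        dy = Y - F(x)                      # "observed minus computed"
        dx = np.linalg.pinv(J(x)) @ dy     # minimum norm update
        x = x + dx
        if np.linalg.norm(dx) < tol:       # reproducing point: x_{n+1} = x_n
            break
    return x

F = lambda x: np.array([x[0]**2 + x[1]**2])        # one squared distance
J = lambda x: np.array([[2 * x[0], 2 * x[1]]])
print(newton_minos(F, J, np.array([4.0]), [1.0, 0.5]))  # lands on the circle
```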
Box 1.25 (Barycentric rectangular coordinates of the equilateral triangle $p_\alpha p_\beta p_\gamma$):
\[ p_\alpha = \begin{bmatrix}x_\alpha = -\tfrac12\\ y_\alpha = -\tfrac16\sqrt3\end{bmatrix},\qquad p_\beta = \begin{bmatrix}x_\beta = \tfrac12\\ y_\beta = -\tfrac16\sqrt3\end{bmatrix},\qquad p_\gamma = \begin{bmatrix}x_\gamma = 0\\ y_\gamma = \tfrac13\sqrt3\end{bmatrix}. \tag{1.270} \]
Box 1.26 (First and second moments of nodal points, approximate coordinates):
\[ x_\alpha + x_\beta + x_\gamma = 0,\qquad y_\alpha + y_\beta + y_\gamma = 0, \tag{1.271} \]
\[ J_{xy} = x_\alpha y_\alpha + x_\beta y_\beta + x_\gamma y_\gamma = 0,\qquad J_{xx} = -(y_\alpha^2 + y_\beta^2 + y_\gamma^2) = -\tfrac12,\quad J_{yy} = -(x_\alpha^2 + x_\beta^2 + x_\gamma^2) = -\tfrac12, \tag{1.272} \]
\[ \mathbf I = [I_i] = \begin{bmatrix}I_x\\ I_y\end{bmatrix} = \begin{bmatrix}x_\alpha + x_\beta + x_\gamma\\ y_\alpha + y_\beta + y_\gamma\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix}\quad \forall\, i \in \{1, 2\}, \tag{1.273} \]
\[ [I_{ij}] = \begin{bmatrix}I_{xx} & I_{xy}\\ I_{xy} & I_{yy}\end{bmatrix} = \begin{bmatrix}-(y_\alpha^2 + y_\beta^2 + y_\gamma^2) & x_\alpha y_\alpha + x_\beta y_\beta + x_\gamma y_\gamma\\ x_\alpha y_\alpha + x_\beta y_\beta + x_\gamma y_\gamma & -(x_\alpha^2 + x_\beta^2 + x_\gamma^2)\end{bmatrix} = -\frac12 I_2\quad \forall\, i, j \in \{1, 2\}. \tag{1.274} \]

Box 1.27 (First and second moments of nodal points, inertia tensors):
\[ \mathbf I_1 = \sum_{i=1}^{2} e_i\, I_i = e_1 I_1 + e_2 I_2,\qquad I_i = \int_{-\infty}^{+\infty}\!dx \int_{-\infty}^{+\infty}\!dy\; \rho(x, y)\, x_i\quad \forall\, i \in \{1, 2\}, \tag{1.275} \]
\[ \mathbf I_2 = \sum_{i,j=1}^{2} e_i \otimes e_j\, I_{ij} = e_1\otimes e_1\, I_{11} + e_1\otimes e_2\, I_{12} + e_2\otimes e_1\, I_{21} + e_2\otimes e_2\, I_{22}, \tag{1.276} \]
\[ I_{ij} = \int_{-\infty}^{+\infty}\!dx \int_{-\infty}^{+\infty}\!dy\; \rho(x, y)\,(x_i x_j - r^2\delta_{ij})\quad \forall\, i, j \in \{1, 2\}, \]
subject to
\[ r^2 = x^2 + y^2, \tag{1.277} \]
\[ \rho(x, y) = \delta(x, y; x_\alpha, y_\alpha) + \delta(x, y; x_\beta, y_\beta) + \delta(x, y; x_\gamma, y_\gamma). \tag{1.278} \]
The product sums of the approximate coordinates of the nodal points constitute the rectangular coordinates of the inertia tensor (1.279) and (1.280). The mass density distribution $\rho(x, y)$ generates directly the coordinates $I_{xy}, I_{xx}, I_{yy}$ of the inertia tensor; $\delta(\dots)$ denotes the Dirac generalized function. The nonlinear observational equations of distance measurements are generated by the Pythagoras representation presented in Box 1.28.
\[ \mathbf I = \sum_{i,j=1}^{2} e_i \otimes e_j\, I_{ij}, \tag{1.279} \]
\[ I_{ij} = \int_{-\infty}^{+\infty}\!dx \int_{-\infty}^{+\infty}\!dy\; \rho(x, y)\,(x_i x_j - r^2\delta_{ij})\quad \forall\, i, j \in \{1, 2\}, \tag{1.280} \]
\[ r^2 = x^2 + y^2,\qquad \rho(x, y) = \delta(x, y; x_\alpha, y_\alpha) + \delta(x, y; x_\beta, y_\beta) + \delta(x, y; x_\gamma, y_\gamma). \tag{1.281} \]
Box 1.28 (Nonlinear observational equations of distance measurements in the plane, geometric notation versus algebraic notation). Taylor expansion of the nonlinear distance observational equations:
\[ Y_1 = F_1(\mathbf X) = S_{\alpha\beta}^2 = (X_\beta - X_\alpha)^2 + (Y_\beta - Y_\alpha)^2, \]
\[ Y_2 = F_2(\mathbf X) = S_{\beta\gamma}^2 = (X_\gamma - X_\beta)^2 + (Y_\gamma - Y_\beta)^2, \tag{1.282} \]
\[ Y_3 = F_3(\mathbf X) = S_{\gamma\alpha}^2 = (X_\alpha - X_\gamma)^2 + (Y_\alpha - Y_\gamma)^2, \]
\[ \mathbf Y' := \left[S_{\alpha\beta}^2, S_{\beta\gamma}^2, S_{\gamma\alpha}^2\right],\qquad \mathbf X' := [X_\alpha, Y_\alpha, X_\beta, Y_\beta, X_\gamma, Y_\gamma], \]
\[ \mathbf x' = [x_\alpha, y_\alpha, x_\beta, y_\beta, x_\gamma, y_\gamma] = \left[-\tfrac12, -\tfrac16\sqrt3, \tfrac12, -\tfrac16\sqrt3, 0, \tfrac13\sqrt3\right]. \tag{1.283} \]
Jacobi map:
\[ \mathbf J(\mathbf x) := \begin{bmatrix}\dfrac{\partial F_1}{\partial X_\alpha} & \dfrac{\partial F_1}{\partial Y_\alpha} & \dfrac{\partial F_1}{\partial X_\beta} & \dfrac{\partial F_1}{\partial Y_\beta} & \dfrac{\partial F_1}{\partial X_\gamma} & \dfrac{\partial F_1}{\partial Y_\gamma}\\[1mm] \dfrac{\partial F_2}{\partial X_\alpha} & \dfrac{\partial F_2}{\partial Y_\alpha} & \dfrac{\partial F_2}{\partial X_\beta} & \dfrac{\partial F_2}{\partial Y_\beta} & \dfrac{\partial F_2}{\partial X_\gamma} & \dfrac{\partial F_2}{\partial Y_\gamma}\\[1mm] \dfrac{\partial F_3}{\partial X_\alpha} & \dfrac{\partial F_3}{\partial Y_\alpha} & \dfrac{\partial F_3}{\partial X_\beta} & \dfrac{\partial F_3}{\partial Y_\beta} & \dfrac{\partial F_3}{\partial X_\gamma} & \dfrac{\partial F_3}{\partial Y_\gamma}\end{bmatrix}(\mathbf x) \]
\[ = \begin{bmatrix}-2(x_\beta - x_\alpha) & -2(y_\beta - y_\alpha) & 2(x_\beta - x_\alpha) & 2(y_\beta - y_\alpha) & 0 & 0\\ 0 & 0 & -2(x_\gamma - x_\beta) & -2(y_\gamma - y_\beta) & 2(x_\gamma - x_\beta) & 2(y_\gamma - y_\beta)\\ 2(x_\alpha - x_\gamma) & 2(y_\alpha - y_\gamma) & 0 & 0 & -2(x_\alpha - x_\gamma) & -2(y_\alpha - y_\gamma)\end{bmatrix} = \begin{bmatrix}-2 & 0 & 2 & 0 & 0 & 0\\ 0 & 0 & 1 & -\sqrt3 & -1 & \sqrt3\\ -1 & -\sqrt3 & 0 & 0 & 1 & \sqrt3\end{bmatrix}. \tag{1.284} \]
Let us analyze "observed minus computed", namely the expression (1.285), here specialized in Box 1.29. The sum of the final coordinates is zero. However, due to the non-symmetric displacement field $[\Delta x_\alpha, \Delta y_\alpha, \Delta x_\beta, \Delta y_\beta, \Delta x_\gamma, \Delta y_\gamma]'$ the coordinate $J_{xy}$ of the inertia tensor does not vanish. These results are collected in Box 1.30.
\[ \Delta\mathbf y := F(\mathbf X) - F(\mathbf x) = \mathbf J(\mathbf x)(\mathbf X - \mathbf x) + O[(\mathbf X - \mathbf x)\otimes(\mathbf X - \mathbf x)] = \mathbf J\,\Delta\mathbf x + O[(\mathbf X - \mathbf x)\otimes(\mathbf X - \mathbf x)]. \tag{1.285} \]

Box 1.29 (Linearized observational equations of distance measurements in the plane, I-MINOS, $\mathrm{rk}\,A = \dim\mathbb Y$). "Observed minus computed":
\[ \Delta\mathbf y := F(\mathbf X) - F(\mathbf x) = \mathbf J(\mathbf x)(\mathbf X - \mathbf x) + O[(\mathbf X - \mathbf x)\otimes(\mathbf X - \mathbf x)] = \mathbf J\,\Delta\mathbf x + O[(\mathbf X - \mathbf x)\otimes(\mathbf X - \mathbf x)], \tag{1.286} \]
\[ \begin{bmatrix}\Delta s_{\alpha\beta}^2\\ \Delta s_{\beta\gamma}^2\\ \Delta s_{\gamma\alpha}^2\end{bmatrix} = \begin{bmatrix}S_{\alpha\beta}^2 - s_{\alpha\beta}^2\\ S_{\beta\gamma}^2 - s_{\beta\gamma}^2\\ S_{\gamma\alpha}^2 - s_{\gamma\alpha}^2\end{bmatrix} = \begin{bmatrix}1.1 - 1\\ 0.9 - 1\\ 1.2 - 1\end{bmatrix} = \begin{bmatrix}\tfrac{1}{10}\\ -\tfrac{1}{10}\\ \tfrac{1}{5}\end{bmatrix}, \tag{1.287} \]
\[ \begin{bmatrix}\Delta s_{\alpha\beta}^2\\ \Delta s_{\beta\gamma}^2\\ \Delta s_{\gamma\alpha}^2\end{bmatrix} = \begin{bmatrix}-a_{\alpha\beta} & -b_{\alpha\beta} & a_{\alpha\beta} & b_{\alpha\beta} & 0 & 0\\ 0 & 0 & -a_{\beta\gamma} & -b_{\beta\gamma} & a_{\beta\gamma} & b_{\beta\gamma}\\ a_{\gamma\alpha} & b_{\gamma\alpha} & 0 & 0 & -a_{\gamma\alpha} & -b_{\gamma\alpha}\end{bmatrix}\begin{bmatrix}\Delta x_\alpha\\ \Delta y_\alpha\\ \Delta x_\beta\\ \Delta y_\beta\\ \Delta x_\gamma\\ \Delta y_\gamma\end{bmatrix}. \tag{1.288} \]
Linearized observational equations:
\[ A = \begin{bmatrix}-2 & 0 & 2 & 0 & 0 & 0\\ 0 & 0 & 1 & -\sqrt3 & -1 & \sqrt3\\ -1 & -\sqrt3 & 0 & 0 & 1 & \sqrt3\end{bmatrix}, \tag{1.289} \]
\[ A'(AA')^{-1} = \frac{1}{36}\begin{bmatrix}-9 & 3 & -3\\ \sqrt3 & \sqrt3 & -5\sqrt3\\ 9 & 3 & -3\\ \sqrt3 & -5\sqrt3 & \sqrt3\\ 0 & -6 & 6\\ -2\sqrt3 & 4\sqrt3 & 4\sqrt3\end{bmatrix}. \tag{1.290} \]
Minimum norm solution:
\[ \Delta\mathbf x_m = \begin{bmatrix}\Delta x_\alpha\\ \Delta y_\alpha\\ \Delta x_\beta\\ \Delta y_\beta\\ \Delta x_\gamma\\ \Delta y_\gamma\end{bmatrix} = \frac{1}{36}\begin{bmatrix}-9\Delta y_1 + 3\Delta y_2 - 3\Delta y_3\\ \sqrt3\,(\Delta y_1 + \Delta y_2 - 5\Delta y_3)\\ 9\Delta y_1 + 3\Delta y_2 - 3\Delta y_3\\ \sqrt3\,(\Delta y_1 - 5\Delta y_2 + \Delta y_3)\\ 6(-\Delta y_2 + \Delta y_3)\\ 2\sqrt3\,(-\Delta y_1 + 2\Delta y_2 + 2\Delta y_3)\end{bmatrix}, \tag{1.291} \]
\[ \Delta\mathbf x'_m = \frac{1}{180}\left[-9, -5\sqrt3, 0, +4\sqrt3, +9, \sqrt3\right], \tag{1.292} \]
\[ (\mathbf x + \Delta\mathbf x)' = \left[x_\alpha + \Delta x_\alpha, y_\alpha + \Delta y_\alpha, x_\beta + \Delta x_\beta, y_\beta + \Delta y_\beta, x_\gamma + \Delta x_\gamma, y_\gamma + \Delta y_\gamma\right] = \frac{1}{180}\left[-99, -35\sqrt3, +90, -26\sqrt3, +9, +61\sqrt3\right]. \tag{1.293} \]
(1.294)
Jxy D Ixy C Ixy D .x˛ C x˛ /.y˛ Cy˛ / C .xˇ C xˇ /.yˇ C yˇ / C .x Cx /.y Cy / D x˛ y˛ Cxˇ yˇ Cx y Cx˛ y˛ C y˛ x˛ Cxˇ yˇ Cyˇ xˇ Cx y Cy x CO.x˛ y˛ ; xˇ yˇ ; x y / p D 3=15;
(1.295)
1-4 Special Nonlinear Models
69
Jxx D Ixx C Ixx D .y˛ C y˛ /2 .yˇ C yˇ /2 .y y /2 D .y˛2 Cyˇ2 Cy 2 / 2y˛ y˛ 2yˇ yˇ 2y y O.y˛2 ; yˇ2 ; y 2 / D 7=12;
(1.296)
Jyy D Iyy C Iyy D .x˛ C x˛ /2 .xˇ C xˇ /2 .x x /2 D .x˛2 C xˇ2 C x 2 / 2x˛ x˛ 2xˇ xˇ 2x x O.x˛2 ; xˇ2 ; x 2 / D 11=20:
(1.297)
1-42 Linearized Models with Datum Defect

More insight into the structure of a consistent system of observational equations with datum defect is gained in the case of a nonlinear model. Such a nonlinear model may be written $\mathbf Y = F(\mathbf X)$ subject to $\mathbf Y \in \mathbb R^n$ and $\mathbf X \in \mathbb R^m$, or $\{Y_i = F_i(X_j) \mid i \in \{1, \dots, n\},\ j \in \{1, \dots, m\}\}$. A classification of such a nonlinear function can be based upon the "soft" Implicit Function Theorem, which is a substitute for the theory of algebraic partitioning, i.e., rank partitioning, and which is reviewed in Appendix C. Let us compute the matrix of first derivatives $[\partial F_i/\partial X_j] \in \mathbb R^{n\times m}$, which is a rectangular matrix of dimension $n \times m$. The set of $n$ independent columns builds up the Jacobi matrix (1.298), which is partitioned into the rectangular matrices of first derivatives (1.299) subject to (1.300). The term $m - \mathrm{rk}\,A$ is called the datum defect of the consistent system of nonlinear equations $\mathbf y = F(\mathbf X)$, which is a priori known. By means of such a rank partitioning, we decompose the vector of unknowns (1.301) into "bounded parameters" $\mathbf X_1$ and "free parameters" $\mathbf X_2$ subject to (1.302). Let us apply the "soft" Implicit Function Theorem to the nonlinear observational equations of distance measurements in the plane which we already introduced in the previous example. Box 1.31 outlines the nonlinear observational equations for $Y_1 = S_{\alpha\beta}^2$, $Y_2 = S_{\beta\gamma}^2$, $Y_3 = S_{\gamma\alpha}^2$. The columns $c_1, c_2, c_3$ of the matrix $[\partial F_i/\partial X_j] \in \mathbb R^{n\times m}$ are linearly independent and accordingly build up the Jacobi matrix $\mathbf J$ of full rank. Let us partition the unknown vector $\mathbf X' = [\mathbf X'_1, \mathbf X'_2]$, namely into the "free parameters" $[X_\alpha, Y_\alpha, Y_\beta]$ and the "bounded parameters" $[X_\beta, X_\gamma, Y_\gamma]'$.
\[ A := \begin{bmatrix}\dfrac{\partial F_1}{\partial X_1} & \dfrac{\partial F_1}{\partial X_2} & \dots & \dfrac{\partial F_1}{\partial X_m}\\ \dfrac{\partial F_2}{\partial X_1} & \dfrac{\partial F_2}{\partial X_2} & \dots & \dfrac{\partial F_2}{\partial X_m}\\ \vdots & & & \vdots\\ \dfrac{\partial F_n}{\partial X_1} & \dfrac{\partial F_n}{\partial X_2} & \dots & \dfrac{\partial F_n}{\partial X_m}\end{bmatrix}, \tag{1.298} \]
\[ A := [A_1, A_2] = [\mathbf J, \mathbf K],\qquad r = \mathrm{rk}\,A = n, \tag{1.299} \]
\[ A \in \mathbb R^{n\times m},\quad A_1 = \mathbf J \in \mathbb R^{n\times n} = \mathbb R^{r\times r},\quad A_2 = \mathbf K \in \mathbb R^{n\times(m-n)} = \mathbb R^{r\times(m-r)}, \tag{1.300} \]
\[ \mathbf X' = [\mathbf X'_1, \mathbf X'_2], \tag{1.301} \]
\[ \mathbf X_1 \in \mathbb R^n = \mathbb R^r,\qquad \mathbf X_2 \in \mathbb R^{m-n} = \mathbb R^{m-r}. \tag{1.302} \]
Box 1.31 (Nonlinear observational equations of distance measurements in the plane, geometric notation versus algebraic notation).
Geometric notation:
\[ S_{\alpha\beta}^2 = (X_\beta - X_\alpha)^2 + (Y_\beta - Y_\alpha)^2,\quad S_{\beta\gamma}^2 = (X_\gamma - X_\beta)^2 + (Y_\gamma - Y_\beta)^2,\quad S_{\gamma\alpha}^2 = (X_\alpha - X_\gamma)^2 + (Y_\alpha - Y_\gamma)^2. \tag{1.303} \]
Algebraic notation:
\[ Y_1 = F_1(\mathbf X) = S_{\alpha\beta}^2 = (X_\beta - X_\alpha)^2 + (Y_\beta - Y_\alpha)^2,\quad Y_2 = F_2(\mathbf X) = S_{\beta\gamma}^2 = (X_\gamma - X_\beta)^2 + (Y_\gamma - Y_\beta)^2, \]
\[ Y_3 = F_3(\mathbf X) = S_{\gamma\alpha}^2 = (X_\alpha - X_\gamma)^2 + (Y_\alpha - Y_\gamma)^2, \tag{1.304} \]
\[ \mathbf Y' := [Y_1, Y_2, Y_3] = \left[S_{\alpha\beta}^2, S_{\beta\gamma}^2, S_{\gamma\alpha}^2\right],\qquad \mathbf X' := [X_1, X_2, X_3, X_4, X_5, X_6] = [X_\alpha, Y_\alpha, X_\beta, Y_\beta, X_\gamma, Y_\gamma]. \tag{1.305} \]
Jacobi matrix:
\[ \left[\frac{\partial F_i}{\partial X_j}\right] = 2\begin{bmatrix}-(X_3 - X_1) & -(X_4 - X_2) & (X_3 - X_1) & (X_4 - X_2) & 0 & 0\\ 0 & 0 & -(X_5 - X_3) & -(X_6 - X_4) & (X_5 - X_3) & (X_6 - X_4)\\ (X_1 - X_5) & (X_2 - X_6) & 0 & 0 & -(X_1 - X_5) & -(X_2 - X_6)\end{bmatrix}, \tag{1.306} \]
\[ \mathrm{rk}\left[\frac{\partial F_i}{\partial X_j}\right] = 3. \tag{1.307} \]
With $[\partial F_i/\partial X_j] \in \mathbb R^{3\times6}$, rank partitioning yields
\[ \mathbf J = 2\begin{bmatrix}-(X_3 - X_1) & -(X_4 - X_2) & (X_3 - X_1)\\ 0 & 0 & -(X_5 - X_3)\\ (X_1 - X_5) & (X_2 - X_6) & 0\end{bmatrix},\qquad \mathrm{rk}\,\mathbf J = 3, \tag{1.308} \]
\[ \mathbf K = 2\begin{bmatrix}(X_4 - X_2) & 0 & 0\\ -(X_6 - X_4) & (X_5 - X_3) & (X_6 - X_4)\\ 0 & -(X_1 - X_5) & -(X_2 - X_6)\end{bmatrix}. \tag{1.309} \]
"Free parameters":
\[ X_1 = X_\alpha = 0,\qquad X_2 = Y_\alpha = 0,\qquad X_4 = Y_\beta = 0. \tag{1.310} \]
"Bounded parameters":
\[ X_3 = X_\beta = +S_{\alpha\beta}\ (*),\quad X_5 = X_\gamma = +\frac{S_{\alpha\beta}^2 - S_{\beta\gamma}^2 + S_{\gamma\alpha}^2}{2S_{\alpha\beta}}\ (*),\quad X_6 = Y_\gamma = +\sqrt{S_{\gamma\alpha}^2 - \frac{(S_{\alpha\beta}^2 - S_{\beta\gamma}^2 + S_{\gamma\alpha}^2)^2}{4S_{\alpha\beta}^2}}\ (*). \tag{1.311} \]
Here, we have made the following choice for the "free parameters". The origin of the coordinate system we have fixed by $\{X_\alpha = 0, Y_\alpha = 0\}$: obviously, the point $P_\alpha$ is this origin. The orientation of the $X$ axis is given by $Y_\beta = 0$. In consequence, the "bounded parameters" are now derived by solving a quadratic equation, indeed a very simple one: due to the datum choice, we find (1.312). Indeed, we meet the characteristic problem of nonlinear observational equations. There are two solutions which we indicate by $\pm$. Only prior information can tell us which is the realized one in our experiment. Such prior information has been built in by "approximate coordinates" in the previous example; such prior information we lack now. For special reasons, we have chosen here the $+$ solution $(*)$, which is in agreement with Box 1.25.
\[ X_\beta = \pm\sqrt{S_{\alpha\beta}^2} = \pm\sqrt{Y_1}, \]
\[ X_\gamma = \pm\frac{S_{\alpha\beta}^2 - S_{\beta\gamma}^2 + S_{\gamma\alpha}^2}{2S_{\alpha\beta}} = \pm\frac{Y_1 - Y_2 + Y_3}{2\sqrt{Y_1}}, \tag{1.312} \]
\[ Y_\gamma = \pm\sqrt{S_{\gamma\alpha}^2 - \frac{(S_{\alpha\beta}^2 - S_{\beta\gamma}^2 + S_{\gamma\alpha}^2)^2}{4S_{\alpha\beta}^2}} = \pm\sqrt{Y_3 - \frac{(Y_1 - Y_2 + Y_3)^2}{4Y_1}}. \]
72
1 The First Problem of Algebraic Regression
The solution space, which could be constructed in a closed form, was nonunique. Uniqueness was only achieved by prior information. Note that the closed form solution X D ŒX1 ; X2 ; X3 ; X4 ; X5 ; X6 0 D ŒX˛ ; Y˛ ; Xˇ ; Yˇ ; X ; Y 0 has another deficiency, namely X is not MINOS: it is for this reason that we apply the datum transformation .X; Y / 7! .x; y/ that is outlined in Box 1.32 subject to k x k2 D min, namely I-MINOS. Since we have assumed distance observations, the datum transformation is described as a rotation (rotation group SO(2)) and a translation (translation group T(2)) with three parameters (1 rotation parameter called and two translational parameters called tx ; ty ). A pointwise transformation .X˛ ; Y˛ / 7! .x˛ ; y˛ /; .Xˇ ; Yˇ / 7! .xˇ ; yˇ / and .X ; Y / 7! .x ; y / is presented in Box 1.32. The datum parameters , tx and ty are determined by I-MINOS, in particular, by a special Procrustes algorithm. There are various representations of the Lagrangean of type MINOS. For instance, as it is outlined in Box 1.33, 2 we could use the representation of k x k2 in terms of observations Y1 D S˛ˇ , 2 2 2 Y2 D Sˇ , and Y3 D S ˛ , which transforms k x k of (1.318) into k x k2 .Y1 ; Y2 ; Y3 / of (1.319). k x k2 of (1.329) is equivalent to minimizing the product sums of Cartesian coordinates. As soon as we substitute the datum transformation of Box 1.32, which we illustrate by Figs. 1.15 and 1.16, into the Lagrangean L.tx ; ty ; ) of type MINOS (k x k2 D min), we arrive at the quadratic objective function of Box 1.34. In the first forward step of the special Procrustes algorithm, we obtain the minimal solution for the translation parameters tOx and tOy . The second forward step of the special Procrustes algorithm is built on (i) the substitution of tOx and tOy in the original Lagrangean, which leads to the reduced Lagrangean of Box 1.35, and (ii) the minimization of the reduced Lagrangean L() with respect to the rotation parameter . In an intermediate phase, we introduce “centralized coordinates” .X; Y /, namely coordinate differences with respect to the centre Po D .Xo ; Yo / of the polyhedron, namely the triangle P˛ ; Pˇ ; P . In this way, we are able to generate the simple (standard form) tan 2 ^ of the solution ^ , the argument of L1 D L1 ./ D min or L2 D L2 ./. The special Procrustes algorithm is completed by the backforward steps outlined in Box 1.36. Firstly, we convert tan 2 ^ to .cos ^ ; sin ^ /. Secondly, we substitute .cos ^ ; sin ^ / into the translation formula .tx^ ; ty^ /. Thirdly, we substitute .tx^ ; ty^ ; cos ^ ; sin ^ / into the Lagrangean L.tx ; tx ; /, thus generating the optimal objective function k x k2 x P X
Fig. 1.15 Commutative diagram (P diagram). PO D centre of polyhedron (triangle p˛ pˇ p ). Action of the translation group
PO XO
O
Fig. 1.16 Commutative diagram (E diagram). Orthonormal 2-legs $\{E_1, E_2 \mid P_O\}$ at $P_O$ and $\{e_1, e_2 \mid P_O\}$ at $P_O$; actions $R(\phi)$, $R^{-1}(\phi)$ of the rotation group
at $(t_x^\wedge, t_y^\wedge, \phi^\wedge)$. Finally, as step four, we succeed in computing the centric coordinates (1.313, left), with respect to the orthonormal 2-leg $\{e_1, e_2 \mid P_O\}$ at $P_O$, from the given coordinates (1.313, right), with respect to the orthonormal 2-leg $\{E_1, E_2 \mid O\}$ at $O$, and the optimal datum parameters $(t_x^\wedge, t_y^\wedge, \cos\phi^\wedge, \sin\phi^\wedge)$:
\[ \begin{bmatrix}x_\alpha & x_\beta & x_\gamma\\ y_\alpha & y_\beta & y_\gamma\end{bmatrix} \longleftarrow \begin{bmatrix}X_\alpha & X_\beta & X_\gamma\\ Y_\alpha & Y_\beta & Y_\gamma\end{bmatrix}. \tag{1.313} \]
Box 1.32 (Datum transformation of Cartesian coordinates):
\[ \begin{bmatrix}x\\ y\end{bmatrix} = \mathbf R\begin{bmatrix}X\\ Y\end{bmatrix} - \begin{bmatrix}t_x\\ t_y\end{bmatrix}, \tag{1.314} \]
\[ \mathbf R \in \mathrm{SO}(2) := \{\mathbf R \in \mathbb R^{2\times2} \mid \mathbf R'\mathbf R = I_2,\ \det\mathbf R = +1\}. \tag{1.315} \]
(Representation of a $2\times2$ orthonormal matrix):
\[ \mathbf R = \begin{bmatrix}\cos\phi & \sin\phi\\ -\sin\phi & \cos\phi\end{bmatrix}, \tag{1.316} \]
\[ x_\alpha = X_\alpha\cos\phi + Y_\alpha\sin\phi - t_x,\qquad y_\alpha = -X_\alpha\sin\phi + Y_\alpha\cos\phi - t_y, \]
\[ x_\beta = X_\beta\cos\phi + Y_\beta\sin\phi - t_x,\qquad y_\beta = -X_\beta\sin\phi + Y_\beta\cos\phi - t_y, \tag{1.317} \]
\[ x_\gamma = X_\gamma\cos\phi + Y_\gamma\sin\phi - t_x,\qquad y_\gamma = -X_\gamma\sin\phi + Y_\gamma\cos\phi - t_y. \]
Box 1.33 (Various forms of MINOS).
(i)
\[ \|\mathbf x\|^2 = x_\alpha^2 + y_\alpha^2 + x_\beta^2 + y_\beta^2 + x_\gamma^2 + y_\gamma^2 = \min_{\phi, t_x, t_y}; \tag{1.318} \]
(ii)
\[ \|\mathbf x\|^2 = \tfrac12\left(S_{\alpha\beta}^2 + S_{\beta\gamma}^2 + S_{\gamma\alpha}^2\right) + x_\alpha x_\beta + x_\beta x_\gamma + x_\gamma x_\alpha + y_\alpha y_\beta + y_\beta y_\gamma + y_\gamma y_\alpha = \min_{\phi, t_x, t_y}; \tag{1.319} \]
(iii)
\[ \|\mathbf x\|^2 = \min \;\Leftrightarrow\; x_\alpha x_\beta + x_\beta x_\gamma + x_\gamma x_\alpha = \min\ \text{and}\ y_\alpha y_\beta + y_\beta y_\gamma + y_\gamma y_\alpha = \min. \tag{1.320} \]
Proof. The representation of the objective function of type MINOS in terms of the observations $Y_1 = S_{\alpha\beta}^2$, $Y_2 = S_{\beta\gamma}^2$, and $Y_3 = S_{\gamma\alpha}^2$ can be proven as follows:
\[ S_{\alpha\beta}^2 = (x_\beta - x_\alpha)^2 + (y_\beta - y_\alpha)^2 = x_\alpha^2 + y_\alpha^2 + x_\beta^2 + y_\beta^2 - 2(x_\alpha x_\beta + y_\alpha y_\beta) \]
\[ \Longrightarrow\quad \tfrac12 S_{\alpha\beta}^2 + x_\alpha x_\beta + y_\alpha y_\beta = \tfrac12\left(x_\alpha^2 + y_\alpha^2 + x_\beta^2 + y_\beta^2\right), \tag{1.321} \]
\[ \|\mathbf x\|^2 = x_\alpha^2 + y_\alpha^2 + x_\beta^2 + y_\beta^2 + x_\gamma^2 + y_\gamma^2 = \tfrac12\left(S_{\alpha\beta}^2 + S_{\beta\gamma}^2 + S_{\gamma\alpha}^2\right) + x_\alpha x_\beta + x_\beta x_\gamma + x_\gamma x_\alpha + y_\alpha y_\beta + y_\beta y_\gamma + y_\gamma y_\alpha \]
\[ = \tfrac12(Y_1 + Y_2 + Y_3) + x_\alpha x_\beta + x_\beta x_\gamma + x_\gamma x_\alpha + y_\alpha y_\beta + y_\beta y_\gamma + y_\gamma y_\alpha. \tag{1.322} \]
Box 1.34 (Minimum norm solution, special Procrustes algorithm, first forward step):
\[ \|\mathbf x\|^2 := x_\alpha^2 + y_\alpha^2 + x_\beta^2 + y_\beta^2 + x_\gamma^2 + y_\gamma^2 = \min_{t_x, t_y, \phi}. \tag{1.323} \]
Lagrangean:
\[ \mathcal L(t_x, t_y, \phi) := (X_\alpha\cos\phi + Y_\alpha\sin\phi - t_x)^2 + (-X_\alpha\sin\phi + Y_\alpha\cos\phi - t_y)^2 + (X_\beta\cos\phi + Y_\beta\sin\phi - t_x)^2 \]
\[ + (-X_\beta\sin\phi + Y_\beta\cos\phi - t_y)^2 + (X_\gamma\cos\phi + Y_\gamma\sin\phi - t_x)^2 + (-X_\gamma\sin\phi + Y_\gamma\cos\phi - t_y)^2. \tag{1.324} \]
First forward step:
\[ \frac12\frac{\partial\mathcal L}{\partial t_x}(t_x^\wedge) = -\left[(X_\alpha + X_\beta + X_\gamma)\cos\phi + (Y_\alpha + Y_\beta + Y_\gamma)\sin\phi - 3t_x^\wedge\right] = 0, \]
\[ \frac12\frac{\partial\mathcal L}{\partial t_y}(t_y^\wedge) = -\left[-(X_\alpha + X_\beta + X_\gamma)\sin\phi + (Y_\alpha + Y_\beta + Y_\gamma)\cos\phi - 3t_y^\wedge\right] = 0, \tag{1.325} \]
\[ t_x^\wedge = +\tfrac13\left\{(X_\alpha + X_\beta + X_\gamma)\cos\phi + (Y_\alpha + Y_\beta + Y_\gamma)\sin\phi\right\},\qquad t_y^\wedge = +\tfrac13\left\{-(X_\alpha + X_\beta + X_\gamma)\sin\phi + (Y_\alpha + Y_\beta + Y_\gamma)\cos\phi\right\}, \tag{1.326} \]
\[ \{t_x^\wedge, t_y^\wedge\} = \arg[\mathcal L(t_x, t_y, \phi) = \min]. \tag{1.327} \]
Box 1.35 (Minimum norm solution, special Procrustes algorithm, second forward step). Solution $\{t_x^\wedge, t_y^\wedge\}$ substituted into the Lagrangean (reduced Lagrangean):
\[ \mathcal L(\phi) := \sum_{i \in \{\alpha, \beta, \gamma\}}\left\{\left[X_i - \tfrac13(X_\alpha + X_\beta + X_\gamma)\right]\cos\phi + \left[Y_i - \tfrac13(Y_\alpha + Y_\beta + Y_\gamma)\right]\sin\phi\right\}^2 \]
\[ + \sum_{i \in \{\alpha, \beta, \gamma\}}\left\{-\left[X_i - \tfrac13(X_\alpha + X_\beta + X_\gamma)\right]\sin\phi + \left[Y_i - \tfrac13(Y_\alpha + Y_\beta + Y_\gamma)\right]\cos\phi\right\}^2 = \min. \tag{1.328, 1.329} \]
Centralized coordinates:
\[ \Delta X_\alpha := X_\alpha - \tfrac13(X_\alpha + X_\beta + X_\gamma) = \tfrac13(2X_\alpha - X_\beta - X_\gamma),\qquad \Delta Y_\alpha := Y_\alpha - \tfrac13(Y_\alpha + Y_\beta + Y_\gamma) = \tfrac13(2Y_\alpha - Y_\beta - Y_\gamma), \tag{1.330} \]
and analogously for $\beta$, $\gamma$. Reduced Lagrangeans:
\[ \mathcal L_1(\phi) = (\Delta X_\alpha\cos\phi + \Delta Y_\alpha\sin\phi)^2 + (\Delta X_\beta\cos\phi + \Delta Y_\beta\sin\phi)^2 + (\Delta X_\gamma\cos\phi + \Delta Y_\gamma\sin\phi)^2, \]
\[ \mathcal L_2(\phi) = (-\Delta X_\alpha\sin\phi + \Delta Y_\alpha\cos\phi)^2 + (-\Delta X_\beta\sin\phi + \Delta Y_\beta\cos\phi)^2 + (-\Delta X_\gamma\sin\phi + \Delta Y_\gamma\cos\phi)^2, \tag{1.331} \]
\[ \frac12\frac{\partial\mathcal L_1}{\partial\phi}(\phi^\wedge) = 0 \;\Leftrightarrow\; (\Delta X_\alpha\cos\phi^\wedge + \Delta Y_\alpha\sin\phi^\wedge)(-\Delta X_\alpha\sin\phi^\wedge + \Delta Y_\alpha\cos\phi^\wedge) \]
\[ + (\Delta X_\beta\cos\phi^\wedge + \Delta Y_\beta\sin\phi^\wedge)(-\Delta X_\beta\sin\phi^\wedge + \Delta Y_\beta\cos\phi^\wedge) + (\Delta X_\gamma\cos\phi^\wedge + \Delta Y_\gamma\sin\phi^\wedge)(-\Delta X_\gamma\sin\phi^\wedge + \Delta Y_\gamma\cos\phi^\wedge) = 0 \tag{1.332} \]
\[ \Leftrightarrow\; -(\Delta X_\alpha^2 + \Delta X_\beta^2 + \Delta X_\gamma^2)\sin\phi^\wedge\cos\phi^\wedge + (\Delta X_\alpha\Delta Y_\alpha + \Delta X_\beta\Delta Y_\beta + \Delta X_\gamma\Delta Y_\gamma)\cos^2\phi^\wedge \]
\[ - (\Delta X_\alpha\Delta Y_\alpha + \Delta X_\beta\Delta Y_\beta + \Delta X_\gamma\Delta Y_\gamma)\sin^2\phi^\wedge + (\Delta Y_\alpha^2 + \Delta Y_\beta^2 + \Delta Y_\gamma^2)\sin\phi^\wedge\cos\phi^\wedge = 0 \tag{1.333} \]
\[ \Leftrightarrow\; \left[(\Delta X_\alpha^2 + \Delta X_\beta^2 + \Delta X_\gamma^2) - (\Delta Y_\alpha^2 + \Delta Y_\beta^2 + \Delta Y_\gamma^2)\right]\sin 2\phi^\wedge = 2\left[\Delta X_\alpha\Delta Y_\alpha + \Delta X_\beta\Delta Y_\beta + \Delta X_\gamma\Delta Y_\gamma\right]\cos 2\phi^\wedge, \]
\[ \tan 2\phi^\wedge = 2\,\frac{\Delta X_\alpha\Delta Y_\alpha + \Delta X_\beta\Delta Y_\beta + \Delta X_\gamma\Delta Y_\gamma}{(\Delta X_\alpha^2 + \Delta X_\beta^2 + \Delta X_\gamma^2) - (\Delta Y_\alpha^2 + \Delta Y_\beta^2 + \Delta Y_\gamma^2)}. \tag{1.334} \]
Orientation parameter in terms of Gauss brackets:
\[ \tan 2\phi^\wedge = \frac{2[\Delta X\Delta Y]}{[\Delta X^2] - [\Delta Y^2]},\qquad \phi^\wedge = \arg\{\mathcal L_1(\phi) = \min\} = \arg[\mathcal L_2(\phi) = \min]. \tag{1.335} \]
Box 1.36 (Special Procrustes algorithm, backward steps).
Step one:
\[ \tan 2\phi^\wedge = \frac{2[\Delta X\Delta Y]}{[\Delta X^2] - [\Delta Y^2]} \;\Rightarrow\; \cos\phi^\wedge,\ \sin\phi^\wedge. \tag{1.336} \]
Step two:
\[ t_x^\wedge = \tfrac13\left([X]\cos\phi^\wedge + [Y]\sin\phi^\wedge\right),\qquad t_y^\wedge = \tfrac13\left(-[X]\sin\phi^\wedge + [Y]\cos\phi^\wedge\right). \tag{1.337} \]
Step three:
\[ \|\mathbf x^\wedge\|^2 = \mathcal L(t_x^\wedge, t_y^\wedge, \phi^\wedge). \tag{1.338} \]
Step four:
\[ \begin{bmatrix}x_\alpha & x_\beta & x_\gamma\\ y_\alpha & y_\beta & y_\gamma\end{bmatrix} = \begin{bmatrix}\cos\phi^\wedge & \sin\phi^\wedge\\ -\sin\phi^\wedge & \cos\phi^\wedge\end{bmatrix}\begin{bmatrix}X_\alpha & X_\beta & X_\gamma\\ Y_\alpha & Y_\beta & Y_\gamma\end{bmatrix} - \begin{bmatrix}t_x^\wedge\\ t_y^\wedge\end{bmatrix}\mathbf 1'_3. \tag{1.339} \]
We leave the proof of the following relations as an exercise to the reader: $[x] = x_\alpha + x_\beta + x_\gamma = 0$, $[y] = y_\alpha + y_\beta + y_\gamma = 0$, and $[xy] = x_\alpha y_\alpha + x_\beta y_\beta + x_\gamma y_\gamma \neq 0$. A numerical example is provided by Example 1.2.
Example 1.2 (A numerical example).
\[ S_{\alpha\beta} = 1.1,\ S_{\beta\gamma} = 0.9,\ S_{\gamma\alpha} = 1.2;\quad Y_1 = 1.21,\ Y_2 = 0.81,\ Y_3 = 1.44; \]
\[ X_\alpha = 0,\ X_\beta = 1.10,\ X_\gamma = 0.84;\quad Y_\alpha = 0,\ Y_\beta = 0,\ Y_\gamma = 0.86; \tag{1.340} \]
\[ \Delta X_\alpha = -0.647,\ \Delta X_\beta = +0.453,\ \Delta X_\gamma = +0.193; \tag{1.341} \]
\[ \Delta Y_\alpha = -0.287,\ \Delta Y_\beta = -0.287,\ \Delta Y_\gamma = +0.573. \tag{1.342} \]
Test:
\[ [\Delta X] = 0,\qquad [\Delta Y] = 0, \tag{1.343} \]
\[ [\Delta X\Delta Y] = 0.166,\quad [\Delta X^2] = 0.661,\quad [\Delta Y^2] = 0.493, \tag{1.344} \]
\[ \tan 2\phi^\wedge = 1.979,\quad \phi^\wedge = 31^\circ.598{,}828{,}457 = 31^\circ 35' 55''.782,\quad \cos\phi^\wedge = 0.851{,}738,\ \sin\phi^\wedge = 0.523{,}968, \]
\[ t_x^\wedge = +0.701,\quad t_y^\wedge = -0.095, \tag{1.345} \]
\[ x_\alpha = -0.701,\ x_\beta = +0.236,\ x_\gamma = +0.465;\quad y_\alpha = +0.095,\ y_\beta = -0.481,\ y_\gamma = +0.387. \tag{1.346} \]
Test:
\[ [x] = x_\alpha + x_\beta + x_\gamma = 0,\qquad [y] = y_\alpha + y_\beta + y_\gamma = 0, \tag{1.347} \]
\[ [xy] = +0.019 \neq 0. \tag{1.348} \]
1-5 Notes

Table 1.5 contains a list of observables in $\mathbb R^n$, equipped with a metric, and their corresponding transformation groups. The number of the datum parameters coincides with the injectivity rank deficiency in a consistent system of linear (linearized) observational equations $A\mathbf x = \mathbf y$ subject to $d(A) = m - \mathrm{rk}\,A$. What the origin is of the rank deficiency of three of the linearized observational equations, namely the three distance functions observed in a planar triangular network, we presented in paragraph three. In geometric terms, the a priori indeterminacy of relating observed distances to absolute coordinates placing points in the plane can be interpreted easily: the observational equations of distances in the plane are invariant with respect to a translation and a rotation of the coordinate system.
Table 1.5. Observables and transformation groups

Observed quantities                                      | Transformation group                    | Datum parameters
Coordinate differences in R^2                            | Translation group T(2)                  | 2
Coordinate differences in R^3                            | Translation group T(3)                  | 3
Coordinate differences in R^n                            | Translation group T(n)                  | n
Distances in R^2                                         | Group of motion T(2), SO(2)             | 3
Distances in R^3                                         | Group of motion T(3), SO(3)             | 3 + 3 = 6
Distances in R^n                                         | Group of motion T(n), SO(n)             | n(n+1)/2
Angles, distance ratios in R^2                           | Conformal group C_4(2)                  | 4
Angles, distance ratios in R^3                           | Conformal group C_7(3)                  | 7
Angles, distance ratios in R^n                           | Conformal group C_{(n+1)(n+2)/2}(n)     | (n+1)(n+2)/2
Cross-ratios of area elements in the projective plane    | Projective group                        | 8
Mind the following note. The structure group of the two-dimensional Euclidean space is the group of motion, decomposed into the translation group (two parameters) and the rotation group (one parameter). Under the action of the group of motion (three parameters), Euclidean distance functions are left equivariant. The three parameters of the group of motion cannot be determined from distance measurements: they produce a rank deficiency of three in the linearized observational equations. A detailed analysis of the relation between the transformation groups and the observational equations has been presented by Grafarend (1974, 1976). Mind also the following notes. (i) More generally, the structure group of a three-dimensional Weitzenboeck space is the conformal group, which is decomposed into the translation group (3 parameters), the special orthogonal group SO(3) (3 parameters) and the dilatation group ("scale", 1 parameter). Under the action of the conformal group, in total 7 parameters, distance ratios and angles are left equivariant. The conformal group generates a transformation of Cartesian coordinates which is called a similarity transformation or datum transformation. Any choice of an origin of the coordinate system, of the axes orientation, and of the scale constitutes an S-base following W. Baarda (1962, 1967, 1973, 1979, 1995), J. Bossler (1973), M. Berger (1994), A. Dermanis (1998), A. Dermanis and E. Grafarend (1993), A. Fotiou and D. Rossikopoulos (1993), E. Grafarend (1973, 1979, 1983), E. Grafarend, E. H. Knickmeyer, and B. Schaffrin (1982), E. Grafarend and G. Kampmann (1996), G. Heindl (1986), M. Molenaar (1981), H. Quee (1983), P.J.G. Teunissen (1960, 1985), and H. Wolf (1990). (ii) In projective networks (image processing, photogrammetry, robot vision), the projective group is active. The projective group generates a perspective transformation, which is outlined in E. Grafarend and J. Shan (1997). Under the action of the projective group, cross-ratios of areal elements in the projective plane are left equivariant. For more details, let us refer to M. Berger (1994), M.H. Brill and E.B. Barrett (1983), R.O. Duda and P.E. Heart (1973), E. Grafarend and J. Shan (1997), F. Gronwald
80
1 The First Problem of Algebraic Regression
and F.W. Hehl (1996), M.R. Haralick (1980), R.J. Holt, and A.N. Netrawalli (1995), R.L. Mohr, L. Morin, and E. Grosso (1992), J.L. Mundy and A. Zisserman (1992a, b), R.F. Riesenfeldt (1981), J.A. Schouten (1954). (iii) In electromagnetism (Maxwell equations), the conformal group is active. The conformal group generates a transformation of “space-time” by means of 16 parameters (6 rotational parameters three for rotation, three for “hyperbolic rotation”, 4 translational parameters, 5 “involutary” parameters, 1 dilatation (scale) parameter) which leaves the Maxwell equations in vacuum as well as pseudo-distance ratios and angles equivariant. Sample references are A.O. Barut (1972), H. Bateman (1910), F. Bayen (1976), J. Beckers, J. Harnard, M. Perroud, and M. Winternitz (1976), D.G. Boulware, L.S. Brown, R.D. Peccei (1970), P. Carruthers (1971), E. Cunningham (1910), T. tom Dieck (1967), N. Euler, and W.H. Steeb (1992), P.G.O. Freund (1974), T. Fulton, F. Rohrlich, and L. Witten (1962), J. Haantjes (1937), H.A. Kastrup (1962, 1966), R. Kotecky, and J. Niederle (1975), K.H. Marivalla (1971), D.H. Mayer (1975), J.A. Schouten, and J. Haantjes (1936), D.E. Soper (1976) and J. Wess (1990).
Chapter 2
The First Problem of Probabilistic Regression: The Bias Problem Minimum Bias solution of problems with datum defects. LUMBE of fixed effects. The bias problem in probabilistic regression has been the subject of Sect. 437 for simultaneous determination of first moments as well as second central moments by inhomogeneous multilinear, namely bilinear, estimation. Based on the review of the first author “Variance-covariance component estimation: theoretical results and geodetic application” (Statistical and Decision, Supplement Issue No. 2, pages 401–441, 105 references, Oldenbourg Verlag, M¨unchen 1989), we collected 5 postulates for simultaneous determination of first and second central moments. A first reference is J. Kleffe (1978). It forms the basis of Sect. 4-37: ansatz Efyg D A D Ax 1 ;
vech Dfyg D B D Bx 2
estimation: ansatz: bilinearity xb1 D K 1 C L1 y C M 1 .y ˝ y/ xb b x W 1 xb2 xb2 D K 2 C L2 y C M 2 .y ˝ y/ 1st postulate “inhomogeneous, bilinear estimation” 2 3 1 K 1 L1 M 1 ; Y WD 4 y 5 X WD K 2 L2 M 2 y˝y 2nd postulate “invariance” Q is invariant in the sense y 7! y A 3rd postulate “unbiased in the mean” versus “minimum bias” bias vector, bias matrix
E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 2, © Springer-Verlag Berlin Heidelberg 2012
81
82
2 The First Problem of Probabilistic Regression: The Bias Problem
A b WD Efxg O x D S x; S WD X I B 4th postulate “Dfxg O contains second, third and fourth order moments which are based on a quasiGauss normal structure.” 5th postulate “best estimation” kDfxgk O D min Here we need only two postulates, namely (i) linear: .x1 / D L1 y and (ii) minimum bias in the mean when we consider the simple model Efyg D A D Ax1 .A 2 R nm ; n < m/ for the moment of first order and Dfyg D I 2 D I x 2 for the central moment of the second order ( 2 > 0), one parameter of type variance. You find the estimate of type linear xb1 D L1 y and type minimum bias in the mean and of type FROBENIUS matrix norm kBk2 for the case m > n, here kBk2 D d D m n. Real World Problems are nonlinear. We have to divide the solution space into those classes: (a) strong nonlinear, (b) weak nonlinear, and (c) linear. Weak nonlinearity is defined by nonlinear system which allows a Taylor series approximation. Linear system equations, for instance the minimum bias problem in the FROBENIUS matrix norm with respect to linear estimation theory, have the advantage that its norm of equations has only one minimum solution. We call that solution “global” and “uniform”. In contrast to linear estimation, weak nonlinear equations produce “local” solutions. They experience a great disadvantage: there exist many local solutions. A typical example is a geodetic network analysis by P. LOHSE (1994). Another example is also C.R. RAO and J. KLEFFE (1999, pages 161–180). Minimum bias solutions of rank deficient linear systems have been discussed in detail by C.R. RAO (1974). He introduces the notation of the substitute matrix S referring the matrix 0 by the matrix S of arbitrary rank, for instance rkS D m. Note that the rank of the matrix 0 is one by the technique of LAGRANGE multiplyer the minimization of the S weighted FROBENIUS norm k.I m LA/0 k2L within Theorem 2.3. Obviously, S-weighted homogeneous linear uniform by minimum bias estimator (“hom S-LUMBE”) is equivalent to the weighted minimum norm solutions (“G-MINOS”) subject of the RAO-Theorem. Our example is the special model for S D I m called “unity substitute matrix”, obviously kBk2 D d D mn D m rkA. In practice, it is difficult to determine the rank of matrix A. For instance, in problems of Configuration Defects in Geodetic Sciences E. Grafarend and E. Livieratos (1979), E. Grafarend and K. Heinz (1978), E. Grafarend and V.
2 The First Problem of Probabilistic Regression: The Bias Problem
83
Lemma 2.2 xˆ hom S-LUMBE of x
Definition 2.1 xˆ hom S-LUMBE of x
Theorem 2.3 xˆ hom S-LUMBE of x Theorem 2.4 equivalence of Gx -MINOS and S-LUMBE
Fig. 2.1 The guideline of chapter two: definition, lemma and theorems
Miller (1985) and E. Grafarend and A. Mader (1989). Critical comments of the RAO bias problem solution have been raised by Arne Bjerhammar and Jujit Kumar Mitra (private communication to the first author). They argue that bias has nothing to do with rank definition of the type m-n. They use the argument that you need PRIOR INFORMATION, for instance of BAYES type: Bayes Estimation x0 . Here we leave this question open. In summary, the minimum of the trace of the matrix valued bias leads to the rank deficiency d subject to d D m n, m > n. Please pay attention to the guideline of Chap. 2 shown in Fig. 2.1. In particular, mind the structure of definitions, lemma, and theorems. Fast track reading: consult Box 2.1 and read only Theorem 2.3.
In the first chapter, we have solved a special algebraic regression problem, namely the inversion of a system of consistent linear equations which is classified as underdetermined. By means of the postulate of a minimum norm solution, namely jjxjj2 D min, we were able to determine m unknowns (m > n, say m D 106 ) from n observations (more unknowns m than equations n, say n D 10). Indeed, such a mathematical solution may surprise the analyst: in the example, MINOS produces one million unknowns from 10 observations. Though MINOS generates a rigorous solution, we are left with some doubts. How can we conveniently interpret such an “unbelievable solution”? The key for an evaluation of MINOS is handed over to us by treating the special algebraic regression problem with datum defect. The bias generated by any solution of an underdetermined or ill-posed problem will be introduced as a decisive criterion
84
2 The First Problem of Probabilistic Regression: The Bias Problem
for evaluating MINOS, now in the context of a probabilistic regression problem. In particular, a special form of LUMBE, the linear uniformly minimum bias estimator jjLA Ijj2 D min leads us to a solution which is equivalent to “MINOS”. Alternatively we may say that in the various classes of solving an underdetermined problem “LUMBE” generates a solution of minimal bias. What is a underdetermined regression problem of type MINOS? By means of a certain objective function, here of type “minimum bias”, we solve the inverse problem of linear and nonlinear equations with fixed effects. In particular, in order O we meet the problem that its minimum norm to minimize the bias vector E fg 2 O O O /0 subject to O D Ly depends on the kE fg k D t rŒ.E fg /.E fg 0 product where denotes the unknown parameter vector. In this section C.R. Rao 0 0 proposed the use instead of the matrix ; rk D 1, the substitute matrix S with full rank, for instance.
2-1 Linear Uniformly Minimum Bias Estimator (LUMBE) Let us introduce the special consistent linear Gauss–Markov model specified in Box 2.1, which is given form of a consistent system of linear equations relating non-stochastic, real-valued vector of unknowns to the observation vector y. Here, the rank rkA of the matrix A connecting the observations and unknown parameters equals the number n of observations y 2 Rn . Box 2.1. (Special consistent linear Gauss–Markov model of fixed effects. Bias vector, bias matrix, vector and matrix bias norms). fy D AjA 2 Rnm ; rkA D n; n < mg unknown “ansatz” O D Ly
(2.1)
“bias vector” O D ŒIm LA ; 8 2 Rm ˇ WD E fg
(2.2)
“bias matrix”
B0 D Im LA 00
“bias norms
(2.3)
2-1 Linear Uniformly Minimum Bias Estimator (LUMBE)
85
jjˇjj2 D ˇ 0 ˇ D 0 ŒIm LA0 ŒIm LA
(2.4)
jjˇjj2 D trˇˇ 0 D trŒIm LA 0 ŒIm LA0 D jjBjj2 0
(2.5)
jjˇjj2S W D trŒIm LASŒIm LA0 DW jjBjj2S
(2.6)
Since we deal with a linear model, it is “a natural choice” to setup a homogeneous linear form to estimate the parameters of fixed effects, at first, namely (2.1) where is a matrix-valued fixed unknown. In order to determine the real-valued m n matrix L, the homogeneous linear estimation O of the vector of fixed effects has to fulfil a certain optimality condition, which we will outline. Let us try to analyze the bias in solving an underdetermined system of linear equations. Take reference to Box 2.1, where we systematically introduce (i) the bias vector ˇ, (ii) the bias matrix, (iii) the S-modified bias matrix norm as a weighted Frobenius norm. In detail, let us discuss the bias terminology: For a homogeneous linear estimation O takes over the special form (2.2), (2.1), the vector-valued bias ˇ WD ŒE fg which led us to the definition of the bias matrix .I LA/0 . The norm of the bias vector ˇ, namely jjˇjj2 WD ˇ 0 ˇ, coincides with the 0 weighted Frobenius norm of the bias matrix B, namely jjBjj2 0 . Here, we meet the central problem that the weight matrix 0 , rk 0 D 1, has rank one. In addition, 0 is not accessible since is unknown. In this problematic case we replace the matrix 0 by a fixed positivedefinite m m matrix S, rkS D m, C.R. Rao’s substitute matrix and define the S-weighted Frobenius matrix norm (2.6) Indeed, the substitute matrix S constitutes the matrix of the metric of the bias space. Being prepared for optimality criteria we give a precise definition of O of type hom S-LUMBE in Definition 2.1. The estimations O of type hom S-LUMBE can be characterized by Lemma 2.2. Definition 2.1. (O hom S-LUMBE of ) An m 1 vector O is called hom S-LUMBE (homogeneous Linear Uniformly Minimum Bias Estimation) of in the special consistent linear Gauss–Markov model of fixed effects of Box 2.1, if (1st) O is a homogeneous linear form (2.7) (2nd) – in comparison to all other linear estimation O has the minimum bias in the sense of (2.8). O D Ly; jjBjj2S
WD j j.Im
(2.7) LA/0 jj2S :
(2.8)
Lemma 2.2. (O hom S-LUMBE of ). An m 1 vector O is hom S-LUMBE of in the special consistent linear Gauss– O fulfils the Markov model with fixed effects of Box 2.1, if and only if the matrix L normal equations
86
2 The First Problem of Probabilistic Regression: The Bias Problem
ASA0 LO 0 D AS:
(2.9)
Proof. The S-weighted Frobenius norm jj .Im LA/0 jj2S establishes the Lagrangean (2.10) for S-LUMBE. The necessary conditions for the minimum of the quadratic Lagrangean L.L/ are provided by (2.11). L.L/ WD tr .Im LA/ S .Im LA/0 D min L; i0 h @L O O 0 AS D 0: L WD 2 ASA0 L @L
(2.10) (2.11)
The second derivatives (2.12) at the “point” LO constitute the sufficiency conditions. In order to compute such a mn mn matrix of second derivatives we have to vectorize the matrix normal equation by (2.13) @2 L LO > 0; @.vecL/@.vecL/0 i h @L O 0 O L D 2 LASA SA0 ; @L i h @L O 0 O L D vec2 LASA SA0 ; @.vecL/ @L O L D 2 ASA0 ˝ Im vecLO 2vec SA0 : @.vecL/
(2.12) (2.13) (2.14) (2.15)
The Kronecker–Zehfuss product A ˝ B of two arbitrary matrices as well as Kronecker-Zehfuss product .A C B/˝C D A˝CCB˝C of three arbitrary matrices subject to the dimension condition dim A D dim B is introduced in Appendix A. The vec operation (vectorization of an array) is reviewed in Appendix A as well as vecAB D .B0 ˝ I0` /vecA for suitable matrices A and B. Now, we are prepared to compute (2.16) as a positive-definite matrix. 0 @2 L L D 2.ASA0 / ˝ Im > 0 0 @.vecL/@.vecL/
(2.16)
For an explicit representation of O of type hom LUMBE in the special consistent linear Gauss–Markov model of fixed effects of Box 2.1, we solve the normal equations for (2.17). Beside the explicit representation of O of type hom LUMBE in (2.15) we attempt to calculate the bias by (2.19). Of course, the bias computation depends on C.R. Rao’s substitute matrix S, rkS D m. Indeed we can associate any
2-2 The Equivalence Theorem of Gx -MINOS and S-LUMBE
87
element of the solution vector, the dispersion matrix as well as the bias vector with a particular weight which can be “designed” by the analyst. LO D argfL.L/ D ming L
(2.17)
Theorem 2.3. (O hom LUMBE of ) Let O D Ly be hom LUMBE in the special consistent linear Gauss–Markov model of fixed effects of Box 2.1. Then the solution of the normal equation is completed by bias vector O D SA0 .ASA0 /1 y
O D Im SA0 .ASA0 /1 A ; 8 2 Rm : ˇ WD E fg
(2.18) (2.19)
Indeed, the bias computation is problematic since we have no prior information of the parameter vector . We could indeed use an a posteriori information, for instance O 0 D .
2-2 The Equivalence Theorem of Gx -MINOS and S-LUMBE Of course, we have included the second chapter on hom S-LUMBE in order to interpret Gx -MINOS of the first chapter. When are hom S-LUMBE and Gx -MINOS equivalent is answered will by Theorem 2.4. Theorem 2.4. (equivalence of Gx -MINOS and S-LUMBE). With respect to the special consistent linear Gauss–Markov model (2.1), (2.2) O D Ly is hom S-LUMBE for a positive-definite matrix S if m D Ly is Gx -MINOS of the underdetermined system of linear equations (1.47) for Gx D S1 G1 x DS
(2.20)
The proof is straight forward if we compare directly the solution (1.55) of Gx MINOS and (2.20) of hom S-LUMBE. Obviously the inverse matrix of the metric of the parameter space X is equivalent to the matrix of the metric of the bias space B. Or conversely, the inverse matrix of the metric of the bias space B determines the matrix of the metric of the parameter space X. In particular, the bias vector ˇ of type (2.19) depends on the vector which is inaccessible. The situation is similar to the one in hypothesis testing. We can produce only an estimation ˇO of the bias O vector ˇ if we identify by the hypothesis 0 D .
88
2 The First Problem of Probabilistic Regression: The Bias Problem
2-3 Example Due to the Equivalence Theorem Gx -MINOS is equivalent to S-LUMBE the only O hom LUMBEg. For our example, let us use new item is the bias matrix Bfj the simple assumption S D Im . Such an assumption is called “u.s.” or “unity substituted” or unity substitute matrix. For our case the matrix Im A0 .AA0 /1 A is idempotent. Note the fact that trA D rkA is idempotent. Indeed thep Frobenius norm p of the u.s. bias matrix B .hom LUMBE/ equalizes the square root m n D d of the right complementary index of the matrix A jjBjj2 D trŒIm A0 .AA0 /1 A
(2.21)
jjBjj2 D d D m n D m rkA
(2.22)
Box 2.2 summarizes those data outputs of the front examples of the first chapter relating to jjB.hom BLUMBE/jj. Box 2.2. Simple matrix of type u.s., Frobenius norm of the simple bias matrix, front page example. A 2 R23 ; n D 2; m D 3; 1 21 7 111 3 7 0 0 1 A WD ; AA D ; .AA / D 124 7 21 14 7 3
rkA D 2 3
2
2 3 14 4 10 6 2 1 1 4 7 1 5 ; A0 .AA0 /1 A D 4 6 5 3 5 A0 .AA0 /1 D 14 14 7 5 2 3 13 2 3 106 51 59 1 1 245 84 4 51 25 27 5 ; A0 .AA0 /2 A D .AA0 /2 D 98 84 29 98 59 27 37 2 3 106 51 59 1 4 51 25 27 5 y2 † O D A0 .AA0 /2 Ay2 D 98 59 27 37 jjBjj2 D tr Im A0 .AA0 /1 A D trI3 trA0 .AA0 /1 A jjBjj2 D 3
C 5 C 13/ D 3 2 D 1 D d p jjBjj D 1 D d :
1 14 .10
Chapter 3
The Second Problem of Algebraic Regression Overdetermined system of linear equations: fAX C i D yjA 2 Rnm y 2 R.A/ rkA D m; m D dim Xg The optimization problem which appears in treating overdetermined linear system equations is a standard topic in any textbook on optimization. Here we consider again a weak nonlinear system as a problem which allows a Taylor expansion. We start from the first section with a front page example, an inconsistent linear system of a threedimensional observation space with a parameter space of two dimensions. Its Least Squares Solution, in short “LESS”, for such an overdetermined linear system leads to the solution xl D .A0 A/1 A0 y subject to the condition of a positive-definite second derivative. Again – as an example for non-mathematicians – we derive the range and the kernel of the mapping, namely by “vertical rank partitioning.” Finally, we give a geometric interpretation of the Least Squares Solution. In the second section we are turned to weighted Least Squares with weight matrix Gy in its semi-norm. As a generalized inverse (C.R. Rao and S.K. Mitra 1971, page 48–50, “three basic g-inverses, Sect. 3-2: g-inverse for a least squares solution”) it produces the Left Inverse. We apply alternatively by the method of Lagrange multiplier, or the theorem of extrema with side conditions in (3.16)– (3.18), a technique which is based on the theory of inverse partitioned matrices of type “C.R. Rao’s Pandora Box”. We extend the theory by means of a canonical representation of a rank deficient matrix within the overdetermined observation space, namely the eigenspace analysis and the eigenspace synthesis, as an example for non-mathematicians. The discussion of the metric of the observation space leads us to the notion of the weighted Least Squares Solution with a remarkable extention to a non-Euclidean observation space. An example is the metric of type von MisesFisher, measuring the spherical distance of a circle, a sphere or a hyper-sphere in Chap. 7. A geodetic example is the problem of an optimal choice of the weight matrix called Second Order Design. We call the First Order Design as the problem of Datum Determination. We take special reference to the excellent textbook by Friedrich Pukelsheim (Optimal Design of Experiments, J. Wiley Publication, New York, 1990). Here we base our analysis on the Taylor-Karman structure of a homogeneous and isotropic tensor-valued, two-point function of a twodimensional planar network. For the example of Gravimetric Leveling we analyze the projection matrices by horizontal rank partitioning. The importance of the “Hat Matrix” for robust estimates is highlighted as well as the Gy orthogonality between the matrices A and B, namely parameter fitting as well as applying conditions following G. Kampmann et al. (1992, 1994, 1997), for example. The Grassmann-Pl¨ucker coordinates which
E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 3, © Springer-Verlag Berlin Heidelberg 2012
89
90
3 The Second Problem of Algebraic Regression
span the normal space R(AC ) will be discussed in Chap. 10 when we discuss condition equations. Of key importance is the application of the concept of FUZZI Sets which we define carefully. An example is given for Deformation Analysis of Surface referring to E. Grafarend (2006), E. Grafarend and F. Krumm (2006), B. Voosoghi (2006), as well as B. Waelder (2008). We give more than 20 references to application of FUZZI SETS. A special subsection is devoted to the genesis of Gy-LESS and its related Generalized Inverse characterized by (3.38)–(3.39). Associated to this topic of our review of the special Eigenvalue Decomposition of Gy -LESS called canonical and canonical LESS, illustrated by an extensive example. A special section gives a number of important case studies: 1. Partial redundancies, latent conditions, high leverage points versus break points, direct and inverse GRASSMANN coordinates, Plcker coordinates: overview 2. Multilinear Algebra, “Join” and “Meet” operators, the HODGE Star operator 3. From A to B: latent restrictions, Grassmann and Pl¨ucker coordinates 4. From B to A: Latent parameter equations, dual GRASSMANN coordinates, dual ¨ PLUCKER coordinates 5. Break points, Gauss-Jacobi combinatorial Algorithms We conclude with a review of a family of means by direct observations as well as a historical note of C.F. GAUSS and A.M. Legendre. Fast track reading: Read only Lemma 3.7. By means of a certain algebraic objective function which geometrically is called a minimum distance function, we solve the second inverse problem of linear and nonlinear equations, in particular of algebraic type, which relate observations to parameters. The system of linear or nonlinear equations we are solving here is classified as overdetermined. The observations, also called measurements, are elements of a certain observation space Y of integer dimension, dim Y D n, which may be metrical, especially Euclidean, pseudo-Euclidean, in general a differentiable manifold. In contrast, the parameter space X of integer dimension, dim X D m, is metrical as well, especially Euclidean, pseudo-Euclidean, in general a differentiable manifold, but its metric is unknown. A typical feature of algebraic regression is the fact that the unknown metric of the parameter space X is induced by the functional relation between observations and parameters. We shall outline three aspects of any discrete inverse problem: (i) set-theoretic (fibering), (ii) algebraic (rank partitioning, “IPM”, the Implicit Function Theorem) and (iii) geometrical (slicing). Here we treat the second problem of algebraic regression: An inconsistent system of linear observational equations: AX C i D yjA 2 Rnm , rkA D m; n > m; also called “overdetermined system of linear equations”, in short
3 The Second Problem of Algebraic Regression
91
Lemma 3.2 x Gy -LESS of x
Lemma 3.3 x Gy -LESS of x
Lemma 3.4 x Gy -LESS of x constrained Lagrangean Lemma 3.5 x Gy -LESS of x constrained Lagrangean
Theorem 3.6 bilinear form
Lemma 3.7 Characterization of Gy -LESS
“more observations than unknowns” is solved by means of an optimization problem. The introduction presents us a front page example of three inhomogeneous equations with two unknowns. In terms of 31 boxes and 12 figures we review the least-squares solution of such a inconsistent system of linear equations which is based upon the trinity.
92
3 The Second Problem of Algebraic Regression
3-1 Introduction With the introductory paragraph we explain the fundamental concepts and basic notions of section. For you, the analyst, who has the difficult task to deal with measurements, observational data, modeling and modeling equations we present numerical examples and graphical illustrations of . all abstract notions. The elementary introduction is written not for a mathematician, but for you, the analyst, with limited remote control of the notions given hereafter. May we gain your interest Assume an n-dimensional observation space Y, here a linear space param0 eterized by n observations (finite, discrete) as coordinates y D Œy1 ; : : : ; yn 2 Rn in which an m-dimensional model manifold is embedded (immersed). The model manifold is described as the range of a linear operator f from an m-dimensional parameter space X into the observation space Y. The mapping f is established by the mathematical equations which relate all observables to the unknown parameters. Here the parameter space X, the domain of the linear operator f , will also be restricted to a linear space which is parameterized by coordinates X D 0 Œx1 ; : : : ; xn 2 Rm . In this way the linear operator f can be understood as a coordinate mapping A W x 7! y D Ax: The linear mapping f W X ! Y is geometrically characterized by its range R.f /, namely R.A/, defined by R.f / WD fy 2 Yjy D f .xg/8x 2 X which in general is a linear subspace of Y and its kernel N .f /, namely N .f /, defined by N .f / WD x 2 Xjf .x/ D 0g. Here the range R.f /, namely R.A/, does not coincide with the n-dimensional observation space Y such that y … R.f /, namely y … R.A/. In contrast, we shall assume that the null space element N .f / D 0 “is empty”: it contains only the element x D 0. Example 3.1 will therefore demonstrate the range space R.f /, namely the range space R.A/, which does not coincide with the observation space Y, (f is not surjective or “onto”) as well as the null space N .f /, namely N .A/, which is empty. f is not surjective, but injective. Box 3.21 will introduce the special linear model of interest. By means of Box 3.22 it will be interpreted.
3-11 The Front Page Example Example 3.1. (polynomial of degree two, inconsistent system of linear equations fAX C i D y; x 2 X D Rm ; dim X D m; y 2 Y D Rn ; rkA D dim X D m; y 2 N .A/ W First, the introductory example solves the front page inconsistent system of linear equations, : x1 C x2 D 1 x1 C x2 C i1 D 1; : x1 C 2x2 D 3 or x1 C 2x2 C i2 D 3 : x1 C 3x2 D 4 x1 C 3x2 C i3 D 4;
3-1 Introduction
93
obviously in general dealing with the linear space X D Rm Ö x, dim X D m, here m=2, called the parameter space, and the linear space dim Y D Rn Ö y, here n D 3, called the observation space.
3-12 The Front Page Example in Matrix Algebra Second, by means of Box 3.1 and according to A. Cayley’s doctrine let us specify the inconsistent system of linear equations in terms of matrix algebra. Box 3.1. (Special linear model: polynomial of degree one, three observations, two unknowns). 3 2 3 a11 a12 y1 x y D 4 y2 5 D 4 a21 a22 5 1 x2 y3 a31 a32 2 3 2 3 2 2 3 3 i1 1 11 11 x 1 C 4 i2 5 ) A D 4 1 2 5 , y D Ax C i W 4 2 5 D 4 1 2 5 x2 i3 4 13 13 2
x0 D Œx1 ; x2 ; y0 D Œy1 ; y2 ; y3 D Œ1; 2; 3; i0 D Œi1 ; i2 ; i3 ; R32 ; x 2 R21 ; y 2 Z31 R31 A 2 Z32 C C r D rkA D dim X D m D 2: : As a linear mapping f W x 7! y D Ax can be classified as following: f is injective, but not surjective. (A mapping f is called linear if f .x1 Cx2 / D f .x1 /C f .x2 / holds.) Denote the set of all x 2 X by the domain D.f / or the domain space D.A/. Under the mapping f we generate a particular set called the range R.f / or the range space R.A/. Since the set of all y 2 Y is not in the range R.f / or the range space R.A/, namely y … R.f / or y … R.A/, the mapping f is not surjective. Beside the range R.f /, the range space R.A/, the linear mapping is characterized by the kernel N .f/ WD fx 2 Rm jAx D 0g or the null space N .A/ WD fx 2 Rm jAx D 0g. Since the inverse mapping g W R.f / Ö y^ 7! x 2 D.f / is one-to-one, the mapping f is injective. Alternatively we may identify the kernel N .f /, or the null space N .A/ with f0g. ? Why is the front page system of linear equations called inconsistent? For instance, let us solve the first two equations, namely x1 D 0; x2 D 1. As soon as we substitute this solution in the third one, the inconsistency 3 ¤ 4 is met. Obviously such a system of linear equations needs general inconsistency parameters .i1 ; i2 ; i3 / in order to avoid contradiction. Since the right-hand side of the equations, namely the inhomogeneity of the system of linear equations, has been
94
3 The Second Problem of Algebraic Regression
measured as well as the model (the model equations) have been fixed, we have no alternative but inconsistency. Within matrix algebra the index of the linear operator A is the rank r D rkA, here r D 2, which coincides with the dimension of the parameter space X, dim X D m, namely r D rkA D dim X D m, here r D m D 2. In the terminology of the linear mapping f , f is not “onto” (surjective), but “one-toone” (injective). The left complementary index of the linear operator A 2 Rnm , which account for the surjectivity defect is given by ds D n rkA; also called “degree of freedom” (here ds D n rkA D 1 ). While “surjectivity” related to the range R.f / or “the range space R.A/” and “injectivity” to the kernel N .f / or “the null space N .A/” we shall constructively introduce the notion of range R.f / range spaceR.A/
versus
kernel N .f / null space N .f /
by consequently solving the inconsistent system of linear equations. But beforehand let us ask: How can such a linear model of interest, namely a system of inconsistent linear equations, be generated ? With reference to Box 3.2 let us assume that we have observed a dynamical system y(t) which is represented by a polynomial of degree one with respect to time namely y.t/ D x1 C x2 t: (Due to y .t/ D x2 it is a dynamical system with constant velocity or constant first derivative with respect to time t.) The unknown polynomial coefficients are 0 collected in the column array x D Œx1 ; x2 ; x 2 X D R2 ; dim X D 2, and constitute the coordinates of the two-dimensional parameter space X. If the dynamical system y(t) is observed at three instants, say y.t1 / D y1 D 1; y.t2 / D y2 D 2; y.t3 / D 0 y3 D 4, and if we collect the observations in the column array y D Œy1 ; y2 ; y3 D 0 Œ1; 2; 4 ; y 2 Y D R3 ; dim Y D 3, they constitute the coordinates of the threedimensional observation space Y. Thus we are left with the problem to compute two unknown polynomial coefficients from three measurements. Box 3.2. (Special linear model: polynomial of degree one, three observations, two unknowns). 2 3 3 2 3 i1 y1 1 t1 x y D 4 y2 5 D 4 1 t2 5 1 C 4 i2 5 x2 y3 1 t3 i3 2 3 2 3 2 2 3 i1 1 t1 D 1; y1 D 1 11 x , 4 t2 D 2; y2 D 2 W 4 2 5 D 4 1 2 5 1 C 4 i2 5: x2 i3 t3 D 3; y3 D 4 4 13 2
3-1 Introduction
95
Thirdly, let us begin with a more detailed analysis of the linear mapping f W Ax D y or Ax C i D y; namely of the linear operator, A 2 Rnm ; r D rkA D dim X D m. We shall pay special attention to the three fundamental partitionings, namely (i) Algebraic partitioning called rank partitioning of the matrix A (ii) Geometric partitioning called slicing of the linear space Y (observation space) (iii) Set-theoretical partitioning called fibering of the set Y of observations.
3-13 Least Squares Solution of the Front Page Example by Means of Vertical Rank Partitioning Let us go back to the front page inconsistent system of linear equations, namely the problem to determine two unknown polynomial coefficients from three sampling points which we classified as an overdetermined one. Nevertheless we are able to compute a unique solution of the overdetermined system of inhomogeneous linear equations Ax C i D y; y … R.A/ or r D rkA D dim X, here A 2 R32 ; x 2 R21 ; y 2 R31 if we determine the coordinates of the unknown vector x as well as the vector i of the inconsistency by least squares (minimal Euclidean length, l2 -norm), here k i k2I D i12 C i22 C i32 D min : Box 3.3 outlines the solution of the related optimization problem. Box 3.3. (Least squares solution of the inconsistent system of inhomogeneous linear equations, vertical rank partitioning ). The solution of the optimization problem f k i k2I D min jAx C i D y; rkA D dim Xg x
is based upon the vertical rank partitioning of the linear mapping f W x 7! y D Ax C i; rkA D dim X which we already introduced. As soon as
y1 y2
A1 i D xC 1 subjecttoA1 2 Rrr A2 i2
1 x D A1 1 i1 C A1 y1
1 y2 D A2 A1 1 i1 C i2 C A2 A1 y1
)
1 i2 D A2 A1 1 i1 A2 A1 y1 C y2
96
3 The Second Problem of Algebraic Regression
is implemented in the norm k i k2I we are prepared to compute the first derivatives of the unconstrained Lagrangean L.i1 ; i2 / W D k i k2I D i0 1 i1 C i0 2 i2 1
1
0 0 0 1 D i0 1 i1 C i0 1 A0 1 A0 2 A2 A1 1 i1 2i 1 A 1 A 2 .A2 A1 y1 y2 / 0 1 C.A2 A1 1 y1 y2 / .A2 A1 y1 y2 /
D min i1
@L .i1l / D 0 @i1 1
1
0 0 1 , A0 1 A0 2 .A2 A1 1 y1 y2 / C ŒA 1 A 2 A2 A1 C I i1l D 0 1
1
1 0 0 1 , i1l D ŒI C A0 1 A0 2 A2 A1 1 A 1 A 2 .A2 A1 y1 y2 /
which constitute the necessary conditions. 1st term 1
1
1 0 0 1 0 0 1 1 0 1 .I C A0 1 A0 2 A2 A1 1 / A 1 A 2 A2 A1 y1 D .A 1 C A 2 A2 A1 / A 2 A2 A1 y1 0 0 1 0 D A1 .A0 1 A1 C A0 2 A2 /1 A0 2 A2 A1 1 y1 D A1 .A 1 A1 C A 2 A2 / A 1 y1 0 0 0 1 0 C A1 .A0 1 A1 CA0 2 A2 /1 .A0 2 A2 A1 1 CA 1 /y1 D A1 .A 1 A1 CA 2 A2 / A 1 y1
C .A0 1 A1 CA0 2 A2 /1 .A0 2 A2 C A0 1 A1 /y1 D A1 .A0 1 A1 CA0 2 A2 /1 A0 1 y1 Cy1 2nd term 1
1
1 0 0 0 1 1 .I C A0 1 A0 2 A2 A1 1 / A 1 A2 y2 D .A 1 C A 2 A2 A1 / A2 y2
D A1 .A0 1 A1 C A0 2 A2 /1 A0 2 y2 ) , i1l D A1 .A0 1 A1 C A0 2 A2 /1 .A0 1 y1 C A0 2 y2 / C y1 : The second derivatives @2 L 0 1 .i1l / D 2Œ.A2 A1 1 / .A2 A1 / C I > 0 @i1 @i0 1 0 1 due to positive-definiteness of the matrix .A2 A1 1 / .A2 A1 / C I generate the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. 1 Finally let us backward transform i1l 7! i2l D A2 A1 1 i1l A2 A1 y1 C y2 .
i2l D A2 .A0 1 A1 C A0 2 A2 /1 .A0 1 y1 C A0 2 y2 / C y2 :
3-1 Introduction
97
Obviously we have generated the linear form i1l D A1 .A0 1 A1 C A0 2 A2 /1 .A0 1 y1 C A0 2 y2 / C y1 i2l D A2 .A0 1 A1 C A0 2 A2 /1 .A0 1 y1 C A0 2 y2 / C y2 or
i1l i2l
A1 y y 0 0 1 0 0 D .A 1 A1 C A 2 A2 / ŒA 1 ; A 1 C 1 A2 y2 y2 or il D A.A0 A/1 y C y:
Finally we are left with the backward step to compute the unknown vector of parameters x 2 X : 1 xl D A1 1 i1l C A1 y1
xl D .A0 1 A1 C A0 2 A2 /1 .A0 1 y1 C A0 2 y2 / or xl D .A0 A/1 A0 y: A numerical computation with respect to the introductory example is 1 14 6 3 6 0 0 0 0 1 ; A 1 A1 C A 2 A2 D ; .A 1 A1 C A 2 A2 / D 6 14 6 6 3 0
0
A1 .A 1 A1 C A 2 A2 /
1
1 8 3 ; D 6 2 0
1 Œ4; 3; 6 8 1 A0 1 y 1 C A0 2 y 2 D ; y1 D ; y2 D 4 19 3 1 1 1 1p ; i2l D ; k il k I D 6; i1l D 6 2 6 6 1 2 1p xl D 85; ; k xl kD 6 9 6 A2 .A0 1 A1 C A0 2 A2 /1 D
3 1 y.t/ D C t 3 2
1 @2 L 2 2 1 0 1 .x2m / D Œ.A2 A1 / .A2 A1 / C I D > 0; 2 5 2 @x2 @x0 2
98
3 The Second Problem of Algebraic Regression
2 2 2 5
D600 ; “Second eigenvalue 2 “First eigenvalue 1 2 2 D1:00 2 5 The diagnostic algorithm for solving an overdetermined system of linear equations y D Ax; rkA D dim X D m; m < n D dimY; y ¤ Y by means of rank partitioning is presented to you by Box 3.4.
3-14 The Range R.f / and the Kernel N .f /, Interpretation of “LESS” by Three Partitionings (i) algebraic (rank partitioning) (ii) geometric (slicing) (iii) set-theoretical (fibering) Fourthly, let us go into the detailed analysis of R.f /, R.f /? , N .f /, with respect to the front page example. Beforehand we begin with a comment. We want to emphasize the two step procedure of the least squares solution (LESS) once more: The first step of LESS maps the observation vector y onto the range space R.f / while in the second step the LESS point y 2 R.A/ is uniquely mapped to the point xl 2 X, an element of the parameter space. Of course, we directly produce xl D .A0 A/1 A0 y just by substituting the inconsistency vector i D y Ax into the l2 norm k i k2I D .y Ax/0 .y Ax/ D min : Such a direct procedure which is common practice in LESS does not give any insight into the geometric structure of LESS. But how to identify the range R.f /, namely the range space R.A/, or the kernel N .f /, namely the null space N .A/ in the front page example? By means of Box 3.4 we identify R.f / or “the null space R.A/” and give its illustration by Fig. 3.1. Such a result has paved the way to the diagnostic algorithm for solving an overdetermined system of linear equations by means of rank partitioning presented in Box 3.5. The kernel N .f / or “the null space” is immediately identified as f0g D N .A/ D fx 2 Rm jAx D 0g D fx 2 Rm jA1 x D 0g by means of rank partitioning (A1 x D 0 , x D 0g. Box 3.4. (The range space of the system of inconsistent linear equations Ax C i D y; “vertical” rank partitioning) The matrix A is called “vertically rank partitioned”, if 8 9 ˇ ˇ r D rkA D rkA1 D m; = < ˇ A1 ˇ A 2 Rnm ^ A D A1 2 Rrr ; A2 2 Rd r : ; A2 ˇˇ d D d.A/ D m rkA
3-1 Introduction
99
Fig. 3.1 Range R.f /, range space R.A/, y … R.A/, observation space Y D R3 , slice by y D e1 u C e2 v C e3 .u C 2v/ 2 R.A/
y 2 0
e3
ec1 O
e2
e1
holds. (In the introductory example A 2 R32 ; A1 2 R22 , A2 2 R12 ; rkA D 2; d.A/ D 1 applies.) An inconsistent system of linear equations is “vertically rank partitioned” if i A1 xC 1 Ax D y; rkA D dimX , y D A2 i2 y1 D A1 x C i1 , y2 D A2 x C i2 for a partitioned observation vector
y fy 2 R ^ y D 1 y2 n
j y1 2 Rr1 ; y2 2 Rd 1 g
and a partitioned inconsistency vector i fi 2 R ^ i D 1 j i1 2 Rr1 ; i2 2 Rd 1 g; i2 n
respectively, applies. (The “vertical” rank partitioning of the matrix A as well as the “vertically rank partitioned” inconsistent system of linear equations Ax C i D y; rkA D dimX D m, of the introductory example is
100
3 The Second Problem of Algebraic Regression
2 3 3 11 11 A 1 41 25 D A D D 4 1 2 5; A2 13 13 2 3 1 y1 D 4 3 5; y1 2 R21 ; y2 2 R: y2 4 2
By means of the vertical rank partitioning of the inconsistent system of inhomogeneous linear equations an identification of the range space R.A/, namely R.A/ D fy 2 Rn jy2 A2 A1 1 y1 D 0g is based upon y1 D A1 x C i1 ) x1 D A1 1 .y1 i1 /
y2 D A2 x C i2 ) x2 D A2 A1 1 .y1 i1 / C i2 ) 1 y2 A2 A1 1 y1 D i2 A2 A1 i1
which leads to the range space R.A/ for inconsistency zero, particularly in the introductory example y3 Œ1; 3
11 12
1
y1 y2
D 0:
For instance, if we introduce the coordinates y1 D u; y2 D v; the other coordinate y3 of the range space R.A/ Y D R3 amounts to
2 1 u u D Œ1; 2 y3 D Œ1; 3 1 1 v v ) y3 D u C 2v: In geometric language the linear space R.A/ is a parameterized plane through the origin illustrated by Fig. 3.1. The observation space Y D Rn (here n D 3) is sliced by the subspace, the linear space (linear manifold) R.A/, dim R.A/ D rk.A/ D r, namely a straight line, a plane (here), a higher dimensional plane through the origin O. What is the geometric interpretation of the least-squares solution k i k2I D min? With reference to Fig. 3.2 we additively decompose the observation vector accordingly to
3-1 Introduction
101 Box 3.5: Algorithm Diagnostic algorithm for solving an overdetermined system of linear equations y = Ax + i, rk A = dim , y ∉ (A) by means of rank partitioning Determine the rank of the matrix A rk A = dim = m
A=
A1 A2
Compute the “vertical rank partitioning”
, A1∈
r×r
=
m×n,
A2 ∈
(n−r)×r
=
(n−m)×m
¢¢n – r = n – m = ds is called left complementary index¢¢ ″A as a linear operator is not surjective, but injective″
Compute the range space (A) (A) := {y∈ n | y2 − A2A1−1y1 = 0}
Compute the inconsistency vector of type LESS il = − A(A′A)−1 y + y test : A′il = 0
Compute the unknown parameter vector of type LESS xl = (A′A)–1A′y.
♣ y D yR.A/ C yR.A/? ; where yR.A/ 2 R.A/ is an element of the range space R.A/, but the inconsistency vector il D iR.A/? 2 R.A/? an element of its orthogonal complement, the normal space R.A/? . HereR.A/ is the central plane P20 ; yR.A/2P2 ; but R.A/? 0 the straight line L1 , il 2 R.A/? . k i k2I D k y yR.A/? k2 D min can be understood as the minimum distance mapping of the observation point y 2 Y onto the range space R.A/. Such a mapping is minimal, if and only if the inner product hyR.A/j iR.A/? i D 0 approaches zero, we say “yR .A/ and iR .A/? are orthogonal00
102
3 The Second Problem of Algebraic Regression
Fig. 3.2 Orthogonal projection of the observation vector y 2 Y onto the range space R.A/; R.A/ WD Rn jfy2 A2 A1 1 y1 D 0g, il 2 R.A/? , here: yR.A 2 P20 (central plane), y 2 L1 (straight line), representation of yR.A/ .LESS/: y D e1 uCe2 vCe3 .uC2v/ 2 R3 , R.A/ D spanfeu ; ev g
The solution point yR.A/ is the orthogonal projection of the observation point y 2 Y onto the range space R.A/; an m-dimensional linear manifold, also called a Grassmann manifold Gn;m . 2
p eu WD Du yR.A/ = k Du yR.A/ kD .e1 e3 /= 2 6 p 6 Dv yR.A/ < Dv yR.A/ jeu > eu GramSchmidt W 6 ev WD D .e1 C e2 C e3 /= 3 4 k Dv yR.A/ < Dv yR.A/ jeu > eu k < eu jev > D 0; Dv yR.A/ D e2 C 2e3 As an “intermezzo” let us consider for a moment the nonlinear model by means of the nonlinear mapping “X Ö x 7! f .x/ D yR.A/ ; y 2 Y00 : In general, the observation space Y as well as the parameter space X may be considered as differentiable manifolds, for instance “curved surfaces”. The range R.f / may be interpreted as the differentiable manifolds. X embedded or more generally immersed, in the observation space Y D Rn for instance: X Y. The parameters Œx1 ; : : : ; xm constitute a chart of the differentiable manifolds X D Mm Mm D Y. Let us assume that a point p 2 R.f / is given and we are going to attach the tangent space Tp Mm locally. Such a tangent space Tp Mm at p 2 R.f / may be constructed by means of the Jacobi map, parameterized by the Jacobi matrix J; rkJ D m, a standard procedure in Differential Geometry. An observation point y 2 Y D Rn is orthogonally projected onto the tangent space Tp Mm at p 2 R.f /, namely by LESS as a minimum distance mapping. In a second step – in common use is the equidistant mapping – we bring the point which is located in the tangent space Tp Mm at p 2 R.f / back to the differentiable manifold, namely y 2 R.f /: The inverse map
3-1 Introduction
103
“R.f / Ö y 7! g.y / D xl 2 X00 maps the point y 2 R.f / to the point xl of the chosen chart of the parameter X as a differentiable manifold. Examples follow later on. Let us continue with the geometric interpretation of the linear model of this paragraph. The range space R.A/, dim R.A/ D rk.A/ D m is a linear space of dimension m, here m D rkA, which slices Rn . In contrast, the subspace R.A/? corresponds to a n rkA D ds dimensional linear space Lnr here n rkA D n m; r D rkA D m. Let the algebraic partitioning and the geometric partitioning be merged to interpret the least squares solution of the inconsistent system of linear equations as a generalized inverse (g-inverse) of type LESS. As a summary of such a merger we take reference to Box 3.7. The first condition: AA A D A Let us depart from LESS of y D Ax C i; namely 0 1 0 0 1 0 x l D A l y D .A A/ A y; il D .I AAl /y D ŒI A.A A/ A y: # Axl D AA l y D AAl .Axl C il / ) 0 1 0 A0 il D A0 ŒI A.A0 A/1 A0 y D 0 ) A l il D .A A/ A il D 0
) Axl D AA l Axl
, AA A D A:
The second condition A AA D A xl D .A0 A/1 A0 y D A l y D Al .Axl C il / ) A l il D 0 ) x l D Al y D A l AAl y ) A l y D Al AAl y , A AA D A :
D rkA is interpreted as following: the g-inverse of type LESS is the rkA l generalized inverse of maximal rank since in general rkA rkA holds. The third condition AA D PR.A /
104
3 The Second Problem of Algebraic Regression y D Axl C il D AA l C .I AAl /y
y D Axl C il D A.A0 A/1 A0 y C ŒI A.A0 A/1 A0 y ) y D yR.A/ C iR.A/? ) A A D PR.A / ; .I AA / D PR.A /? :
Obviously AA l is an orthogonal projection onto R.A/, but I AAl onto its ? orthogonal complement R.A/ :
Box 3.6. (The three condition of the generalized inverse mapping (generalized inverse matrix) LESS type). Condition #1 f .x/ D f .g.y// , f Df ıgıf Condition #2 x D g.y/ D g.f .x// Condition #3 .reflexive g-inverse mapping/ f .g.y// D yR.A/ , f ı g D projR.f/
Condition #1 Ax D AA Ax , AA A D A Condition #2 x D A y D A AA y , A AA D A Condition #3 .reflexive g-inverse/ A Ay D yR.A/ , A A D PR.A/ :
The set-theoretical partitioning, the fibering of the set system of points which constitute the observation space Y, the range R.f /, will be finally outlined. Since the set system Y (the observation space) is Rn , the fibering is called “trivial”. Nontrivial fibering is reserved for nonlinear models in which case we are dealing with a observation space as well as an range space which is a differentiable manifold. Here the fibering Y D R.f / [ R.f /? produces the trivial fibers R.f / and R.f /? where the trivial fibers R.f /? is the quotient set Rn =R.f /. By means of a Venn diagram (John Venn 1834–1928) also called Euler circles (Leonhard Euler 1707–1783). Figure 3.3 illustrates the trivial fibers of the set system Y D Rn generated by R.f / and R.f /? . The set system of points which constitute the parameter space X is not subject to fibering since all points of the set system R.f / are mapped into the domain D.f /.
3-2 The Least Squares Solution: “LESS”
105
Fig. 3.3 Venn diagram, trivial fibering of the observation space Y, trivial fibers R.f / and R.f /? , f W Rm D X 7! Y D R.f / [ R.f /? , X set system of the parameter space, Y set system of the observation space
3-2 The Least Squares Solution: “LESS” The system of inconsistent linear equations Ax C i D y subject to A 2 Rnm . rkA D m < n, allows certain solutions which we introduce by means of Definition 3.1 as a solution of a certain optimization problem. Lemma 3.2 contains the normal equations of the optimization problem. The solution of such a system of normal equations is presented in Lemma 3.3 as the least squares solution with respect to the Gy -norm. Alternatively Lemma 3.4 shows the least squares solution generated by a constrained Lagrangean. Its normal equations are solved for (i) the Lagrange multiplier, (ii) the unknown vector of inconsistencies by Lemma 3.5. The unconstrained Lagrangean where the system of linear equations has been implemented as well as the constrained Lagrangean lead to the identical solution for (i) the vector of inconsistencies and (ii) the vector of unknown parameters. Finally we discuss the metric of the observation space and alternative choices of its metric before we identify the solution of the quadratic optimization problem by Lemma 3.7 in terms of the (1, 2, 3)-generalized inverse. Definition 3.1. (least squares solution w.r.t. the Gy -seminorm): A vector xl 2 X D Rm is called Gy -LESS (LEast Squares Solution with respect to the Gy -seminorm) of the inconsistent system of linear equations 2 Ax C i D y; y 2 Y Rn ; 4
rk A D dimX D m or y … R.A/
(3.1)
(the system of inverse linear equations A y D x; rkA D dimX D m or x 2 R.A /, is consistent) if in comparison to all other vectors x 2 X Rm ; the inequality
106
3 The Second Problem of Algebraic Regression
k y Axl k2Gy D .y Axl /0 Gy .y Axl / .y Ax/0 Gy .y Ax/ D k y Ax k2Gy
(3.2)
holds, in particular if the vector of inconsistency il WD y Axl has the least Gy seminorm. The solution of type Gy -LESS can be computed as following Lemma 3.2. (least squares solution with respect to the Gy -seminorm): A vector xl 2 X Rm ; is Gy -LESS of (3.1) if and only if the system of normal equations A0 Gy Axl D A0 Gy y
(3.3)
is fulfilled. xl always exists and is in particular unique, if A0 Gy A is regular. Proof. Gy -LESS is constructed by means of the Lagrangean L.x/ W D k i
k2Gy D k
y Ax
k2Gy D
b ˙
p b 2 4ac 2a
D x0 A0 Gy Ax 2y0 Gy Ax C y0 Gy y D min x
such that the first derivatives @ i0 Gy i @L .xl / D .xl / D 2A0 Gy .Axl y/ D 0 @x @x constitute the necessary conditions. The theory of vector derivative is presented in Appendix B. The second derivatives @2 i0 Gy i @2 L .x / D .xl / D 2A0 Gy A 0 l @x @x0 @x @x0 due to the positive semidefiniteness of the matrix A0 Gy A generate the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. Because of the R.A0 Gy A/ D R.A0 Gy / there always exists a solution xl whose uniqueness is guaranteed by means of the regularity of the matrix A0 Gy A: It is obvious that the matrix A0 Gy A is in particular regular, if rkA D dimX D m, but on the other side the matrix Gy is positive definite, namely k i k2Gy is a Gy -norm. The linear form xl D Ly which for arbitrary observation vectors y 2 Y Rn ; leads to Gy -LESS of (3.1) can be represented as following.
3-2 The Least Squares Solution: “LESS”
107
Lemma 3.3. (least squares solution with respect to the Gy - norm, rkA D dimX D m or (x 2 R.A /): xl D Ly is Gy -LESS of the inconsistent system of linear equations (3.1) Ax C i D y restricted to rk.A0 GyA/DrkADdimX (or R.A0 Gy / D R.A0 / and x 2 R.A /) if and only if L 2 Rnm is represented by Case (i) : Gy D I O D L
A L
0
D .A A/1 A0 .left inverse/
(3.4)
0 1 0 x l D A L y D .A A/ A y:
(3.5)
y D yl C il
(3.6)
is an orthogonal decomposition of the observation vector y 2 Y Rn into the I-LESS vector yl 2 Y Rn and the I-LESS vector of inconsistency il 2 Y Rn ; subject to yl D Axl D A.A0 A/1 A0 y
(3.7)
il D y yl D ŒIn A.A0 A/1 A0 y:
(3.8)
Due to yl D A.A0 A/1 A0 y, I-LESS has the reproducing property. As projection matrices A.A0 A/1 A0 and ŒIn A.A0 A/1 A0 are independent. The “goodness of fit” of I-LESS is 0
ky Axl k2i D kil k2i D y ŒIn A.A0 A/1 A0 y:
(3.9)
Case (ii) : Gy positive definite, rkA D dimX LO D .A0 Gy A/1 A0 Gy .weighted left inverse/
(3.10)
xl D .A0 Gy A/1 A0 Gy y:
(3.11)
y D yl C il
(3.12)
is an orthogonal decomposition of the observation vector y 2 Y Rn into the Gy LESS vector yl 2 Y Rn and the Gy -LESS vector of inconsistency il 2 Y Rn subject to yl D Axl D A.A0 Gy A/1 A0 Gy y
(3.13)
il D y Axl D ŒIn A.A0 Gy A/1 A0 Gy y
(3.14)
108
3 The Second Problem of Algebraic Regression
Due to yl D A.A0 Gy A/1 A0 Gy y Gy -LESS has the reproducing property. As projection matrices A.A0 Gy A/1 A0 Gy and ŒIn A.A0 Gy A/1 A0 Gy are independent. The “goodness of fit” of Gy -LESS is 0
ky Axl k2Gy D kil k2Gy D y ŒIn A.A0 Gy A/1 A0 Gy y:
(3.15)
The third case Gy positive semidefinite will be treated independently. The proof of Lemma 3.1 is straightforward. The result that LESS generates the left inverse, Gy -LESS the weighted left inverse will be proved later. An alternative way of producing the least squares solution with respect to the Gy -seminorm of the linear model is based upon the constrained Lagrangean (3.16), namely L.i; x; /. Indeed L.i; x; / integrates the linear model (3.1) by a vector valued Lagrange multiplier to the objective function of type “least squares”, namely the distance function in a finite dimensional Hilbert space. Such an approach will be useful when we apply “total least squares” to the mixed linear model (error-invariable model). Lemma 3.4. (least squares solution with respect to the Gy -norm, rkA D dimX, constrained Lagrangean): Gy -LESS is assumed to be defined with respect to the constrained Lagrangean L.i; x; / WD i0 Gy i C 20 .Ax C i y/ D min : i; x;
(3.16)
A vector Œi0 l ; x0 l ; 0 l 0 2 R.nCmCn/1 is Gy -LESS of (3.1) in the sense of the constrained Lagrangean L.i; x; /= min if and only if the system of normal equations 2
32 3 2 3 Gy 0 In il 0 4 0 0 A0 5 4 x l 5 D 4 0 5 In A 0 l y
(3.17)
with the vector l 2 Rn1 of “Lagrange multiplier” is fulfilled. .il ; xl ; l / exists and is in particular unique, if Gy is positive semidefinite. There holds .il ; xl ; l / D argfuser1L.i; x; / D ming: Proof. Gy -LESS is based on the constrained Lagrangean L.i; x; / WD i0 Gy i C 20 .Ax C i y/ D min
i; x;
(3.18)
3-2 The Least Squares Solution: “LESS”
109
such that the first derivatives @L .il ; xl ; l / D 2.Gy il C l / D 0 @i @L .il ; xl ; l / D 2A0 l D 0 @x @L .il ; xl ; l / D 2.Axl C il y/ D 0 @ or 32 3 2 3 0 Gy 0 In il 4 0 0 A0 5 4 x l 5 D 4 0 5 y In A 0 l 2
constitute the necessary conditions. (The theory of vector derivative is presented in Appendix B.) The second derivatives 1 @2 L .xl / D Gy 0 2 @ i @ i0 due to the positive semidefiniteness of the matrix Gy generate the sufficiency condition for obtaining the minimum of the constrained Lagrangean. Lemma 3.5. (least squares solution with respect to the Gy -norm, rkA D dimX, constrained Lagrangean) If Gy -LESS of the linear equations (3.1) is generated by the constrained Lagrangean (3.16) with respect to a positive definite weight matrix Gy , rk Gy D n; then the normal equations (3.17) are uniquely solved by xl D .A0 Gy A/1 A0 Gy y;
(3.19)
il D ŒIn A.A0 Gy A/1 A0 Gy y;
(3.20)
l D ŒGy A.A0 Gy A/1 A0 In Gy y:
(3.21)
Proof. A basis of the proof could be C.R. Raos Pandora Box, the theory of inverse partitioned matrices (Appendix A: Fact: Inverse Partitioned Matrix /IPM/ of a
110
3 The Second Problem of Algebraic Regression
symmetric matrix). Due to the rank identities rk Gy D n; rk A D rk .A0 Gy A/ D m < n, the normal equations can be solved faster directly by Gauss elimination. Gy il C l D 0 A0 l D 0 Axl C il y D 0: Multiply the third normal equation by A0 Gy , multiply the first normal equation by A0 and substitute A0 l from the second normal equation in the modified first one. 3 A0 Gy Axl C A0 Gy il A0 Gy y D 0 5) A0 Gy il C A0 l D 0 0 A l D 0 )
A0 Gy Axl C A0 Gy il A0 Gy y D 0 ) A0 Gy il D 0 )
A0 Gy Axl A0 Gy y D 0;
)
xl D .A0 Gy A/1 A0 Gy y: Let us subtract the third normal equation and solve for il . il D y Axl ; il D ŒIn A.A0 Gy A/1 A0 Gy y: Finally we determine the Lagrange multiplier: substitute il in the first normal equation in order to find l D Gy il l D ŒGy A.A0 Gy A/1 A0 Gy Gy y: Of course the Gy -LESS of type (3.2) and the Gy -LESS solution of type constrained Lagrangean (3.16) are equivalent, namely (3.11)–(3.19) and (3.14)– (3.20). In order to analyze the finite dimensional linear space Y called “the observation space”, namely the case of a singular matrix of its metric, in more detail, let us take reference to the following.
3-2 The Least Squares Solution: “LESS”
111
Theorem 3.6. (bilinear form) : Suppose that the bracket h:j:i or g.:; :/ W Y Y ! R is a bilinear form or a finite dimensional linear space Y, dim Y D n, for in-stance a vector space over the field of real numbers. There exists a basis fe1 ; : : : ; en g such that ˝ ˇ ˛ (i) ei ˇ ej D 0 or g.ei ; ej / D 0 for i ¤ j 8 ˆ hei1 j ei1 i D C1 or g .ei1 ; ei1 / D C1 for 1 i1 p < (ii) hei2 j ei2 i D 1 or g .ei2 ; ei2 / D 1 for p C 1 i2 p C q D r ˆ : hei3 j ei3 i D 0 or g .ei3 ; ei3 / D 0 for r C 1 i3 n : The numbers r and p are determined exclusively by the bilinear form. r is called the rank, r p D q is called the relative index and the ordered pair .p; q/ the signature. The theorem states that any two spaces of the same dimension with bilinear forms of the same signature are isometrically isomorphic. A scalar product (“inner product”) in this context is a nondegenerate bilinear form, for instance a form with rank equal to the dimension of Y. When dealing with low dimensional spaces as we do, we will often indicate the signature with a series of plus and minus signs when appropriate. For instance the signature of R41 may be written .C C C/ instead of (3,1). Such an observation space Y is met when we are dealing with observations in Special Relativity. For instance, let us summarize the peculiar LESS features if the matrix Gy 2 Rnn of the observation space is semidefinite, rkGy WD ry < n. By means of Box 3.7 we have collected the essential items of the eigenspace analysis as well as the eigenspace synthesis Gy versus Gy of the metric. ƒy D D Diag.1 ; : : : ; ry / denotes the matrix of non-vanishing eigenvalues f1 ; : : : ; ry g. Note the norm identity jjijj2Gy D jjijj2U1 ƒy U0
1
(3.22)
which leads to the U1 ƒy U01 -LESS normal equations A0 U1 ƒy U01 x` D A0 U1 ƒy U01 y:
(3.23)
Box 3.7. (Canonical representation of the rank deficient matrix of the matrix of the observation space Y).
rk Gy DW ry ; ƒy WD Diag.1 ; : : : ; ry /: “eigenspace analysis”
“eigenspace synthesis”
112
Gy
3 The Second Problem of Algebraic Regression
U01 Gy ŒU1 ; U2 D D U02 ƒy 01 2 Rnn D 02 03
ƒy 01 Gy D ŒU1 ; U2 02 03
U01 U02
(3.24)
2 Rnn
subject to ˚ U 2 SO.n.n 1/=2/ WD U 2 Rnn jU0 U D In ; jUj D C1 U1 2 Rnry ; U2 2 Rn.nry / ; ƒy 2 Rry ry 01 2 Rry n ; 02 2 R.nry /ry ; 03 2 R.nry /.nry / “norms” jjijj2Gy D jjijj2U1 ƒy U0
i0 Gy i D i0 U1 ƒy U01 i
1
LESS: jjijj2Gy D min , jjijj2U x
0 1 ƒ y U1
(3.25)
D min x
, A0 U1 ƒy U01 x` D A0 U1 ƒy U01 y: Another example relates to an observation space Y D R2k 1 .k 2 f1; : : : ; Kg/ of even dimension, but one negative eigenvalue. In such a pseudo-Euclidean space of signature .C C / the determinant of the matrix of metric Gy is negative, namely det Gy D 1 : : : 2K1 j2K j. Accordingly xmax D argfjjijj2Gy D max jy D Ax C i; rkA D mg is Gy -MORE (Maximal ObseRvational inconsistEncy solution), but not Gy LESS. Indeed, the structure of the observational space, either pseudo-Euclidean or Euclidean, decides upon MORE or LESS.
3-21 A Discussion of the Metric of the Parameter Space X With the completion of the proof we have to discuss the basic results of Lemma 3.3 in more detail. At first we have to observe that the matrix Gy of the metric of the observation space Y has to be given a priori. We classified LESS according to (i) Gy D In , (ii) Gy positive definite and (iii) Gy positive semidefinite. But how do we know the metric of the observation space Y? Obviously we need prior information
3-2 The Least Squares Solution: “LESS”
113
about the geometry of the observation space Y, namely from the empirical sciences like physics, chemistry, biology, geosciences, social sciences. If the observation space Y 2 Rn is equipped with an inner product hy1 jy2 i D y0 1 Gy y2 ; y1 2 Y; y2 2 Y where the matrix Gy of the metric k y k2 D y0 Gy y is positive definite, we refer to the metric space Y 2 Rn as Euclidean En . In contrast, if the observation space is positive semidefinite we call the observation space semi Euclidean En1 ;n2 . n1 is the number of positive eigenvalues, n2 the number of zero eigenvalues of the positive semidefinite matrix Gy of the metric (n D n1 C n2 ). In various applications, namely in the adjustment of observations which refer to Special Relativity or General Relativity we have to generalize the metric structure of the observation space Y: If the matrix Gy of the pseudometric k y k2 D y0 Gy y is built on n1 positive eigenvalues (signature +), n2 zero eigenvalues and n3 negative eigenvalues (signature -), we call the pseudometric parameter space pseudo Euclidean En1 ;n2 ;n3 , n D n1 C n2 C n3 . For such an observation space LESS has to be generalized to k y Ax k2Gy D extr, for instance “maximum norm solution”.
3-22 Alternative Choices of the Metric of the Observation Y Another problem associated with the observation space Y is the norm choice problem. Up to now we have used the `2 -norm, for instance p p .y Ax/.y Ax/ D i0 i q 2 D i12 C i22 C C in1 C in2 ; q `p -norm W k y Ax k p W D p ji1 jp C ji2 jp C C jin1 jp C jin jp ;
`2 -norm W k y Ax k 2 W D
1
are alternative norms of choice. Beside the choice of the matrix Gy of the metric within the weighted `2 -norm we like to discuss the result of the LESS matrix Gl of the metric. Indeed we have constructed LESS from an a priori choice of the metric G called Gy and were led to the a posteriori choice of the metric Gl of type (3.9) and (3.15). The matrices .i/ Gl D In A.A0 y A/1 A0 y .3:9/ .ii/ Gl D In A.A0 Gy A/1 A0 Gy .3:15/ are (i) idempotent and (ii) G1 y idempotent, in addition. There are various alternative scales or objective functions for projection matrices for substituting Euclidean metrics termed robustifying. In special cases those objective functions operate on
114
3 The Second Problem of Algebraic Regression
.3:11/ xl D Hy subject to Hx D .A0 Gy A/1 AGy ; .3:13/ y` D Hy y subject to Hy D A.A0 Gy A/1 AGy .3:14/ i` D H` y subject to H` D In A.A0 Gy A/1 AGy y; where fHx ; Hy ; H` g are called “hat matrices”. In other cases analysts have to accept that the observation space is non-Euclidean. For instance, direction observations in RP locate points on the hypersphere SP 1 . Accordingly we have to accept an objective function of von Mises–Fisher type which measures the spherical distance along a great circle between the measurement points on SP 1 and the mean direction. Such an alternative choice of a metric of a non-Euclidean space Y will be presented in Chap. 7. Here we discuss in some detail alternative objective functions, namely • Optimal choice of the weight matrix Gy : second order design SOD • Optimal choice of the weight matrix Gy by means of condition equations • Robustifying objective functions
3-221 Optimal Choice of Weight Matrix: SOD The optimal choice of the weight matrix, also called second order design (SOD), is a traditional topic in the design of geodetic networks. Let us refer to the review papers by A. A. Seemkooei (2001), W. Baarda (1968, 1973), P. Cross (1985), P. Cross and K. Thapa (1979), E. Grafarend (1970, 1972, 1974, 1975), E. Grafarend and B. Schaffrin (1979), B. Schaffrin (1981, 1983, 1985), F. Krumm (1985), S. L. Kuang (1991), P. Vanicek, K. Thapa and D. Schr¨oder (1981), B. Schaffrin, E. Grafarend and G. Schmitt (1977), B. Schaffrin, F. Krumm and D. Fritsch (1980), J. van Mierlo (1981), G. Schmitt (1980, 1985), C. C. Wang (1970), P. Whittle (1954, 1963), H. Wimmer (1982) and the textbooks by E. Grafarend, H. Heister, R. Kelm, H. Knopff and B. Schaffrin (1979) and E. Grafarend and F. Sanso (1985, editors). What is an optimal choice of the weight matrix Gy , what is “a second order design problem”? Let us begin with Fisher’s Information Matrix which agrees to the half of the Hesse matrix, the matrix of second derivatives of the Lagrangean L.x/ WD jjijj2Gy D jjy Axjj2Gy , namely Gx D A0 .x/Gy A.x/ D
1 @2 L DW FISHER 2 @x` @x0 `
at the “point” x` of type LESS. The first order design problem aims at determining those points x within the Jacobi matrix A by means of a properly chosen risk operating on “FISHER”. Here, “FISHER” relates the weight matrix of the observations Gy , previously called the matrix of the metric of the observation space, to the weight
3-2 The Least Squares Solution: “LESS”
115
matrix Gx of the unknown parameters, previously called the matrix of the metric of the parameter space. Gy
Gx weight matrix of the unknown parameters or matrix of the metric of the parameter space X
weight matrix of the observations or matrix of the metric of the observation space Y.
Being properly prepared, we are able to outline the optimal choice of the weight matrix Gy or X, also called the second order design problem, from a criterion matrix Y, an ideal weight matrix Gx (ideal) of the unknown parameters, We hope that the translation of Gx and Gy “from metric to weight” does not cause any confusion. Box 3.9 elegantly out-lines SOD. Box 3.8. (Second order design SOD, optimal fit to a criterion matrix of weights). “weight matrix of the parameter space” Y WD
3-21 weight matrix of the observation space”
y X WD Gy D Diag g1 ; : : : ; gny
1 @2 L D Gx 2 @x` @x0`
(3.26)
y y 0 x WD g1 ; : : : ; gn “inconsistent matrix equation of the second order design problem” A0 XA C A D Y
(3.27)
“optimal fit” jjjj2 D tr0 D .vec/0 .vec/ D min
(3.28)
xS WD argfjjjj2 D min jA0 XA C D Y; X D Diag xg
(3.29)
X
vec D D vecY vec.A0 XA/ D vecY .A0 ˝ A0 /vecX 0
(3.30)
0
vec D vecY .A ˝ A /x x 2 Rn ; vecY 2 Rn vec 2 Rn
2 1
2
2 1
; vechY 2 Rn.nC1/=21 2
2
; vecX 2 Rn 1 ; .A0 ˝ A0 / 2 Rn n ; A0 ˇ A0 2 Rn 1 0 xS D .A0 ˇ A0 /0 .A0 ˇ A0 / .A ˇ A0 /vecY:
2 n
(3.31)
116
3 The Second Problem of Algebraic Regression
In general, the matrix equation A0 XA C D Y is inconsistent. Such a matrix inconsistency we have called 2 Rmm : For a given ideal weight matrix Gx .ideal/, A0 Gy A is only an approximation. The unknown weight matrix of the observations Gy , here called X 2 Rnn , can only be designed in its diagonal form. A general weight matrix Gy does not make any sense since “oblique weights”
ycannot ybe associated to experiments. A natural restriction is therefore X D Diag g1 ; : : : ; gn . The “diagonal weights” are collected in the unknown vector of weights y 0 x WD g1 ; : : : ; gny 2 Rn The optimal fit “A0 XA to Y ” is achieved by the Lagrangean jjjj2 D min, the optimum of the Frobenius norm of the inconsistency matrix . The vectorized form of the inconsistency matrix, vec, leads us first to the matrix .A0 ˝ A0 /, the Zehfuss product of A0 , second to the Kronecker matrix .A0 ˇ A0 /, the Khatri–Rao product of A0 , as soon as we implement the diagonal matrix X. For a definition of the Kronecker–Zehfuss product as well as of the Khatri–Rao product and related laws we refer to Appendix A. The unknown weight vector x is LESS, if 1 0 xS D .A0 ˇ A0 /0 .A0 ˇ A0 / .A ˇ A0 /0 vecY Unfortunately, the weights xS may come out negative. Accordingly we have to build in extra condition, X D Diag.x1 ; : : : ; xm / to be positive definite. The given references address this problem as well as the datum problem inherent in Gx .ideal/. Example 3.2. (Second order design): The introductory example we outline here may serve as a firsthand insight into the observational weight design, also known as second order design. According to Fig. 3.4 we present you with the graph of a two-dimensional planar network. From three given points fP˛ ; Pˇ ; P g we measure distances to the unknown point Pı , a Pg
y3 = 6.94 km
Pd
Fig. 3.4 Directed graph of a trilateration network, known points fP˛ ; Pˇ ; P g, unknown point Pı , distance 0 observations Œy1 ; y2 ; y3 2 Y
y1 = 13.58 km Pa
y2 = 9.15 km
Pb
3-2 The Least Squares Solution: “LESS”
117
typical problem in densifying a geodetic network. For the weight matrix Gx Y of the unknown point we postulate I2 , unity. In contrast, we aim at an observational weight design characterized by a weight matrix Gx X D Diag.x1 ; x2 ; x3 /: The second order design equation A0 Diag.x1 ; x2 ; x3 /A C D I2 is supposed to supply us with a circular weight matrix Gy of the Cartesian coordinates .xı ; yı / of Pı . The observational equations for distances .s˛ı ; sˇı ; s ı / D .13:58; 9:15; 6:94 km/ have already been derived in Sect. 1-4. Here we just take advantage of the first design matrix A as given in Box 3.10 together with all further matrix operations. A peculiar situation for the matrix equation A0 XA C D I2 is met: In the special configuration of the trilateration network the characteristic equation of the second order design problem is consistent. Accordingly we have no problem to get the weights 2
3 0:511 0 0 Gy D 4 0 0:974 0 5; 0 0 0:515 which lead us to the weight Gx D I2 a posteriori. Note that the weights came out positive. Box 3.9. (Example for a second order design problem, trilateration network). 2
3 0:454 0:891 A D 4 0:809 C0:588 5; X D Diag.x1 ; x2 ; x3 /; Y D I2 C0:707 C0:707
A0 Diag.x1 ; x2 ; x3 /A D I2
0:206x1 C 0:654x2 C 0:5x3 0:404x1 0:476x2 C 0:5x3 , 0:404x1 0:476x2 C 0:5x3 0:794x1 C 0:346x2 C 0:5x3 10 D 01 “inconsistency D 0”
118
3 The Second Problem of Algebraic Regression
.1st/ 0:206x1 C 0:654x2 C 0:5x3 D 1 .2nd / 0:404x1 0:476x2 C 0:5x3 D 0 .3rd / 0:794x1 C 0:346x2 C 0:5x3 D 1 x1 D 0:511; x2 D 0:974; x3 D 0:515:
3-222 The Taylor Karman Criterion Matrix ? What is a proper choice of the ideal weight matrix Gx ? There has been made a great variety of proposals. First, Gx .ideal/ has been chosen simple: A weight matrix Gx is called ideally simple if Gx .ideal/ D Im . For such a simple weight matrix of the unknown parameters Example 3.2 is an illustration of SOD for a densification problem in a trilateration network. Second, nearly all geodetic networks have been SOD optimized by a criterion matrix Gx .ideal/ which is homogeneous and isotropic in a two- or three-dimensional Euclidean space. In particular, the Taylor–Karman structure of a homogeneous and isotropic weight matrix Gx .ideal/ has taken over the SOD network design. Box 3.11 summarizes the TK-Gx .ideal/ of a two-dimensional, planar network. Worth to be mentioned, TK-Gx .ideal/ has been developed in the Theory of Turbulence, namely in analyzing the two-point correlation function of the velocity field in a turbulent medium. (G. I. Taylor 1935, 1936, T. Karman (1937), T. Karman and L. Howarth (1936), C. C. Wang (1970), P. Whittle (1954, 1963)). Box 3.10. (Taylor–Karman structure of a homogeneous and isotropic tensor-valued, two-point function, two-dimensional, planar network). 2
gx1 x1 gx1 y1 6 g 6 y1 x1 gy1 y1 Gx D 6 4 gx2 x1 gx2 y1 gy2 x1 gy2 y1
3 gx1 x2 gx1 y2 gy1 x2 gy1 y2 7 7 7 Gx .x˛ ; xˇ / gx2 x2 gx2 y2 5 gy2 x2 gy2 y2
“Euclidean distance function of points P˛ .x˛ ; y˛ / and Pˇ .xˇ ; yˇ /” s˛ˇ WD jjx˛ xˇ jj D
q
.x˛ xˇ /2 C .y˛ yˇ /2
“decomposition of the tensor-valued, two-point weight function Gx .x˛ ; xˇ / into the longitudinal weight function f` and the transversal weight function fm ”
3-2 The Least Squares Solution: “LESS”
119
Gx .x˛ ; xˇ / D gj1 j2 .x˛ ; xˇ / xj1 .P˛ / xj1 .Pˇ / xj2 .P˛ / xj2 .Pˇ / D fm .s˛ˇ /ıj1 j2 C f` .s˛ˇ / fm .s˛ˇ / 2 s˛ˇ 8j1 ; j2 2 f1; 2g ; .x˛ ; y˛ / D .x1 ; y1 /; .xˇ ; yˇ / D .x2 ; y2 /: (3.32) 3-223 Optimal Choice of the Weight Matrix: The Space R .A/ and R.A/? In the introductory paragraph we already outlined the additive basic decomposition of the observation vector into y D yR.A/ C yR.A/? D y` C i` ; yR.A/ D PR.A/ y; yR.A/? D PR.A/? y; where PR.A/ and PR.A/? are projectors as well as y` 2 R .A/ is an element of the range space R .A/? , in general the tangent space Tx M of the mapping f .x/
versus
i` 2 R .A/ is an element of its orthogonal complement in general the normal space R .A/? .
Gy -orthogonality hy` j i` iGy D 0 is proven in Box 3.12. Box 3.11. (Gy -Orthogonality of y` D y.LESS/ and i` D i.LESS/) “Gy -orthogonality”
hy` j i` iG`
(3.33) hy` j i` iGy D 0 D y Gy A.A0 Gy A/1 A0 Gy In A.A0 Gy A/1 A0 Gy y 0
D y0 Gy A.A0 Gy A/1 A0 Gy y0 Gy A.A0 Gy A/1 A0 Gy A.A0 Gy A/1 A0 Gy y D 0:
There is an alternative interpretation of the equations of Gy -orthogonality hi` j y` iGy D i0` Gy y` D 0 of i` and y` . First, replace i` D PR.A/C y where PR.A/C is a characteristic projection matrix. Second, substitute y` D Ax` where x` is Gy -LESS of x. As outlined in Box 3.13, Gy -orthogonality i0` Gy y` of the vectors
120
3 The Second Problem of Algebraic Regression
i` and y` is transformed into the Gy -orthogonality of the matrices A and B. The columns of the matrices A and B are Gy -orthogonal. Indeed we have derived the basic equations for transforming parametric adjustment
into
y` D Ax` ;
adjustment of conditional equations B0 Gy y` D 0;
by means of B0 Gy A D 0: Box 3.12. Gy -orthogonality of A and B i` 2 R.A/? ; dim R.A/? D n rkA D n m y` 2 R .A/ ; dim R .A/ D rkA D m 0 hi` j y` iGy D 0 , In A.A0 Gy A/1 A0 Gy Gy A D 0 rk In A.A0 Gy A/1 A0 Gy D n rkA D n m
(3.34) (3.35)
“horizontal rank partitioning”
In A.A0 Gy A/1 A0 Gy D ŒB; C
(3.36)
B 2 Rn.nm/ ; C 2 Rnm ; rkB D n m hi` j y` iGy D 0 , B0 Gy A D 0:
(3.37)
Example 3.3 finally illustrates Gy -orthogonality of the matrices A and B. Example 3.3. (gravimetric leveling, Gy -orthogonality of A and B). Let us consider a triangular leveling network fP˛ ; Pˇ ; P g which consists of three observations of height differences. These height differences are considered holonomic, determined from gravity potential differences, known as gravimetric leveling. Due to h˛ˇ WD hˇ h˛ ; hˇ WD h hˇ ; h ˛ WD h˛ h the holonomity condition Z dh D 0 or h˛ˇ C hˇ C h ˛ D 0
3-2 The Least Squares Solution: “LESS”
121
applies. In terms of a linear model the observational equations can accordingly be established by 2 3 2 2 3 3 i˛ˇ 1 0 ˛ˇ 4 ˇ 5 D 4 0 1 5 h˛ˇ C 4 iˇ 5 hˇ i ˛ 1 1 ˛ 2 3 2 3 1 0 ˛ˇ h˛ˇ 5 4 4 5 y WD ˇ ; A WD 0 1 ; x WD hˇ 1 1 ˛ y 2 R31 ; A 2 R32 ; rkA D 2; x 2 R21 : First, let us compute .x` ; y` ; i` ; jji` jj/ I-LESS of .x; y; i; jjijj/. A. Bjerhammar’s left inverse supplies us with 2 3 y1 1 2 1 1 0 1 0 4 y2 5 x ` D A ` y D .A A/ A y D 3 1 2 1 y3 1 2y1 y2 y3 h˛ˇ D x` D hˇ ` 3 y1 C 2y2 y3
y` D
y` D i` D i` D jji` jj2 D jji` jj2 D jji` jj2 D
3 2 2 1 1 1 0 1 0 4 1 2 1 5 y Ax` D AA ` y D A.A A/ A y D 3 1 1 2 2 3 2y1 y2 y3 14 y1 C 2y2 y3 5 3 y1 y2 C 2y3
0 1 0 y Ax` D In AA ` y D In A.A A/ A y 2 2 3 3 111 y1 C y2 C y3 14 1 1 1 1 5 y D 4 y1 C y2 C y3 5 3 3 y1 C y2 C y3 111 0 0 1 0 y0 .In AA ` /y D y In A.A A/ A y 2 32 3 111 y1 14 Œy1 ; y2 ; y3 1 1 1 5 4 y2 5 3 y3 111 1 2 .y C y22 C y32 C 2y1 y2 C 2y2 y3 C 2y3 y1 /: 3 1
122
3 The Second Problem of Algebraic Regression
Second, we identify the orthogonality of A and B. A is given, finding B is the problem of horizontal rank partitioning of the projection matrix. 2 3 111 1 0 1 0 4 1 1 1 5 2 R33 ; G` WD In Hy D In AA ` D In A.A A/ A D 3 111 with special reference to the “hat matrix Hy WDA.A0 A/1 A0 ”. The diagonal elements of G` are of special interest for robust approximation. They amount to the uniform values 1 1 hi i D .2; 2; 2/; .gi i /` D .1 hi i / D .1; 1; 1/: 3 3 Note
det G` D det In AA ` D 0; rk.In AA` / D n m D 1 2 ˇ 3 1 ˇˇ 1 1 1 4 1 ˇ 1 1 5: G` D I3 AA ` D ŒB; C D ˇ 3 1ˇ 1 1 B 2 R31 ; C 2 R32 The holonomity condition h˛ˇ Chˇ Ch ˛ D 0 is reestablished by the orthogonality of B0 A D 0. 2 3 1 0 1 B0 A D 0 , Œ1; 1; 1 4 0 1 5 D Œ0; 0: 3 1 1 The Gy -orthogonality condition of the matrices A and B has been successfully used by G. Kampmann (1992, 1994, 1997), G. Kampmann and B. Krause (1996, 1997), R. Jurisch, G. Kampmann and B. Krause (1997), R. Jurisch and G. Kampmann (1997, 1998, 2001 a, b, 2002), G. Kampmann and B. Renner (1999), R. Jurisch, G. Kampmann and J. Linke (1999 a, b, c, 2000) in order to balance the observational weights, to robustify Gy -LESS and to identify outliers. The Grassmann–Pl¨ucker coordinates which span the normal space R .A/? will be discussed in Chap. 10 when we introduce condition equations. 3-224 Fuzzy Sets How are fuzzy sets defined? Let us start from uncertain quantities based on interval calculus. Inaccurate numbers are defined by convex sets demonstrated by Fig. 3.5. We define the determination of an interval by a lower and upper limit and its weighted center by its maximum of the property maximum. Let us call by A WD Œa ; a0 ; aC and B WD Œb ; b 0 ; b C
3-2 The Least Squares Solution: “LESS” Fig. 3.5 Schematic representation of two inaccurate intervals A WD Œa ; a0 ; aC and B WD Œb ; b 0 ; b C as well as the weighted centre, Œ0; 1 weight set
123 1
0 a- b- a0
a+ b0
b+
two inaccurate intervals. The lower and upper limit as well as the weighted center are called parameters. We need the following operations based on inaccurate intervals A and B: .i / sum A C B .ii/ difference A B .iii/ product A:B .iv/ substruction A=B As a result of these operations is built a new interval C WD Œc ; c 0 ; c C , namely .i / C D A C B W c D a C b ; c 0 D a0 b 0 ; c C D aC C b C .ii/ C D A B W c D a b C ; c 0 D a0 b 0 ; c C D aC b .iii/ C D A:B W c D minfa b ; a b C ; aC b C g c 0 D a0 b 0 c C D minfa b ; aC b ; a b C ; aC b C g .iv/ C D A=B W c D minfa =b ; aC =b ; a =b C ; aC =b C g c 0 D a0 =b 0 c C D maxfa =b ; aC =b ; a =b C ; aC =b C g if b > 0 or b C < 0 In addition, we define the square C WD A2 and the root C WD inaccurate intervals:
p A with respect to
.v/ C WD A2 D A:A 8 C C c D minfa a ; a a g; c 0 D .c 0 /2 ; c C D maxfa a ; aC aC g ˆ ˆ ˆ ˆ ˆ ˆ < if a 0 or aC 0 ˆ ˆ ˆ ˆ ˆ ˆ :
c D 0; c 0 D .a /2 ; c C D maxfa a ; aC aC g if a < 0 andp aC > 0 .vi / C WD A p 0 p 0 C p C c D a ; c D a ; c D a ; if a 0:
124
3 The Second Problem of Algebraic Regression
Example: Mathematical modelling of inaccurate surfaces 0 C Let us consider inaccurate height measurement by the interval zni WD Œz i ; zi ; zi for all i 2 Œ1; : : : ; n with sharp coordinates .xi ; yi / for all i 2 Œ1; : : : ; n. If these measurement data are given at a sharp grid .Xj ; Yk / for all j 2 Œ1; : : : ; N and k 2 Œ1; : : : ; M and followed by an interpolation, then the interpolated heights Znjk WD ŒZjk ; Zj0k ; ZjCk at the nodal point of the grid are also inaccurate. The results depend on the method of interpolation which are functions of inaccurate intervals. We will use distance depending weight functions such that jk
jk
Zjnk D ˛1 zn1 C ˛2 zn2 C C ˛nj k znn subject to n P i D1
ij
jk
jk
jk
ij
˛i D 1; ˛i D ˛i .di /
di D .xi Xj /2 C .yi Yk /2 for all i 2 Œ1; : : : ; n; j 2 Œ1; : : : ; N ; k2Œ1; : : : ; M The assumption is the one that weights are inversely proportional to the square of distances: jk
Zjnk D ˛1 zn1 C C ˛nj k znn n X jk jk jk ˛i D .wi /=Œ .wi / i D1
subject to jk
jk
wi D 1=di C " for all " > 0: By the choice of an artificial constant " we are able to tune in such a way that weights gets inaccurate if the prediction point is a measurement point. In summary, we can represent the interpolated height measurements for position weight functions by the intervals jk
jk
D
jk ˛1 z01
jk ˛n z0n
D
jk ˛1 zC 1
n P
Zjk D ˛1 z 1 C C ˛n zn D
Zj0k ZjCk
C C
C C
D
jk ˛n zC n
i D1
n P
jk
˛i z i jk
˛i z0i
i D1 n P
D
i D1
jk
˛i zC i
Please not our indices i 2 Œ1; : : : ; n, j 2 Œ1; : : : ; N and k 2 Œ1; : : : ; M . In the next step, we relate the inaccurate interpolated measurements to an analytical surface. Again, such a resulting surface is inaccurate, too. Here we use a two-dimensional Lagrange polynomial, for instance
3-2 The Least Squares Solution: “LESS”
125
˘ x Xj j .x/ D j 2 Œ1; : : : ; N .Xj Xi / i ¤j ˘ y Yi : k .y/ D i 2 Œ1; : : : ; M .Yk Xi / i ¤k These polynomial transfer the well known property at the nodal point fXj ; Yk g for j 2 Œ1; : : : ; N , k 2 Œ1; : : : ; M : 1; p D j 1; p D k j .xp / D ıjp D ; k .yp / D ıip D 0; p ¤ j 0; p ¤ k Now we are able to represent the approximating polynomial by inaccurate surface data by P p.x; Q y/ D ZQ j k j .x/k .y/ D j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M P ZQ j k Lj k .x; y/ j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M subject to Lj k .x; y/ WD j .x/k .y/ p .x; y/ D
P j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M Lj k 0 p 0 .x; y/ D
and Zjk Lj k .x; y/ C
P j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M
p C .x; y/ D
P j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M Lj k 0
P j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M Lj k < 0
ZjCk Lj k .x; y/
Zj0k Lj k .x; y/
ZjCk Lj k .x; y/ C
P j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M Lj k < 0
Zjk Lj k .x; y/
These three sets of surfaces relate to the lower, the middle and the upper limits of uncertain surfaces. These surfaces are continuous and differentiable. How to calculate inaccurate first differentials given inaccurate surfaces? We propose the following processes.
126
3 The Second Problem of Algebraic Regression
d d p.x; Q y/ D 4Q x WD dx dx
P
d d p.x; Q y/ D 4Q y WD dy dy
P
ZQ j k j .x/k .y/ D
j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M P d D ZQ j k k .y/ j .x/ dx j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M ZQ j k j .x/k .y/ D
j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M P d D ZQ j k j .x/ k .y/ dy j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M and
4 x WD
P j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M d k .y/ j .x/ 0 dx
Zjk k .y/
40x WD
4C x WD
P j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M d k .y/ j .x/ 0 dx
d j .x/ C dx
P j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M
ZjCk k .y/
P j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M d k .y/ j .x/ < 0 dx
Zj0k k .y/
d j .x/ C dx
ZjCk k .y/
d j .x/ dx
Zjk k .y/
d j .x/ dx
d j .x/ dx
P j 2 Œ1; : : : ; N k 2 Œ1; : : : ; M d k .y/ j .x/ < 0 dx
0 C An analog formula exist for f4 y ; 4y ; 4y g. The propagation of inaccurate data can be achieved by the following:
(i) Mean operation Define the mean m Q and separate the three parts belonging to inaccurate means. mC WD
L X
j; k D 1 ˇj k 0
ˇj k z i C
L X
j; k D 1 ˇj k < 0
ˇj k zC i
3-2 The Least Squares Solution: “LESS”
127
m0 WD
N X
ˇj k z0i
j;kD1 L X
m WD
ˇj k zC i C
j; k D 1 ˇj k 0
L X
ˇj k z i
j; k D 1 ˇj k < 0
The window is chosen by L D 3; 5; 7 etc. Partial derivatives Q j k WD 4 x
1 ŒZQ .j C1/k ZQ .j 1/k xj C1 xj 1
Q jyk WD 4
1 ŒZQ j.kC1/ ZQ j.k1/ yj C1 yj 1
Q j k 4 x
respectively 1 C WD ŒZQ ZQ .j 1/k ; xj C1 xj 1 .j 1/k
Q j k0 WD 4 x
1 0 ŒZQ 0 ZQ .j 1/k ; xj C1 xj 1 .j 1/k
Q jxkC WD 4
1 ŒZQ C ZQ .j 1/k ; xj C1 xj 1 .j 1/k
for all j 2 Œ2; : : : ; N 1 and k 2 Œ2; : : : ; M 1. The partial derivatives with respect to y can be determined in an analog manner. Higher derivatives can be computed in a recursive way, for instance Q x .p/j k WD 4
1 Q .j C1/k 4 Q .j 1/kC Œ4 x.p1/ xj C1 xj 1 x.p1/
Q x .p/j k0 WD 4 Q x .p/j kC WD 4
1 Q .j C1/0 4 Q .j 1/0 Œ4 x.p1/ x.p1/ xj C1 xj 1
1 Q .j C1/kC 4 Q .j 1/k Œ4 x.p1/ xj C1 xj 1 x.p1/
In analogous manner, we compute the derivatives with respect to y. Example: Deformation analysis of surfaces (E. Grafarend (2006), E, Grafarend and F. Krumm (2006), B. Voosoghi (2000), D. Waelder (2008)) Here we will apply the theory of inaccurate data to surface deformation, namely surface dilatation and surface maximum shear. In detail we refer to the sum of e 2 , namely Q 1 C Q 2 subject eigenvalues DIL, namely 1 C 2 , and its difference
A
128
3 The Second Problem of Algebraic Regression
to the eigenvalue representation 1 Q 1 WD Œtr.ErT Gr1 / ˙ 2
q
Œtr.Er Gr1 /2 C 4 det ŒEr Gr1
q 1 Q 2 WD Œtr.ErT Gr1 / ˙ Œtr.Er Gr1 /2 4 det ŒEr Gr1 2 The eigenvalues, namely in E. Grafarend and F. Krumm (2006 p.29), refer to the Hilbert basis invariants trace determinant. They are built on Euler–Lagrange strain ErT Gr1 where ErT is the “right strain component” and Gr is the matrix of “right Gauss metric”. Indeed we map from a left metric Gl and a right metric Gr , for instance from an ellipsoid to a sphere. The Hilbert basis invariants relate to e 2 by the following representation, dilatation DIL and surface maximum shear
A
A
e 2 D Q 1 Q 2 DIL D Q 1 C Q 2 and
A
e 2 D Œtr.EQ rT GQ r1 /2 4 det .EQ r GQ r1 / DIL D tr.ErT Gr1 / and
A
For detail computation we use right Gauss metric as well as the left Gauss metric e 2 , respectively. and derive the corresponding operators DIL and Coordinates of the right metric matrix # " Q jxk /2 .4 Q jxk /r .4 Q jyk /r 1 C .4 r Gr WD Q jxk /r .4 Q jyk /r 1 C .4 Q jyk /2r .4 # " Q jxk /2 .4 Q jxk /l .4 Q jyk /l 1 C .4 l Gl WD Q jxk /l .4 Q jyk /l 1 C .4 Q jyk /2 .4 l In order to calculate the Euler–Lagrange tensor EQ r WD .Gr Cr /=2 based on the Cauchy–Green tensor Cr WD UT Gl U; U WD Œ@u=@U we need mapping equations, namely the partial derivatives @u˛ =@U A u˛ D u˛ .U A /, indeed the mapping equations from the ellipsoid to the sphere. Then finally we are able to calculate the right Euler–Lagrange tensor EQ r WD .Gr Cr /=2. We are ready to present you the test results. Finally we refer to some recent publication on fuzzy sets and systems: H. Bunke and O. Bunke (1989), J. C. Bezdek (1973), G. Alefeld and J. Herzberger (1974,1983), H. Bandemer and S. Gottwald (1993), R. Fletling (2007), K. Grahisch (1998), O. Kaleva (1994), B. F. Arnold and P. Stahlecker (1999), A. Chaturvedi and A. T. K. Wan (1999), S. M. Guu, Y. Y. Lur and C. T. Pang (2001), H. Jshibuchi, K. Nozaki and H. Tanaka (1992), H. Jshibuchi, K. Nozaki, N. Yamamoto and H. Tanaka (1995), B. Kosko (1992), H. Kutterer (1994, 1999), V. Ravi, P. J. Reddy and H. J. Zimmermann (2000), V. Ravi and H. J. Zimmermann (2000), S. Wang, T. Shi and C. Wu (2001), L. Zadch (1965), B. Voosoghi (2000), and H. J. Zimmermann (1991).
3-2 The Least Squares Solution: “LESS”
129
Geodetic applications were reported by H. Kutterer (2002), J. Neumann and H. Kutterer (2007), S. Sch¨on (2003), S. Sch¨on and H. Kutterer (2006), W. N¨ather and K. W¨alder (2003,2007), O. W¨alder (2006, 2007a,b) and O. W¨alder and M. Buchroithner (2004).
3-23 Gx -LESS and Its Generalized Inverse A more formal version of the generalized inverse which is characteristic for Gy LESS is presented by Lemma 3.7. (characterization of Gy -LESS): x` D Ly is I-LESS of the inconsistent system of linear equations (3.1) AxCi D y, rkA D m, (or y … R.A/) if and only if the matrix L 2 Rmn fulfils ALA D A (3.38) AL D .AL/0 : The matrix L is the unique A1;2;3 generalized inverse, also called left inverse A L. x` D Ly is Gy -LESS of the inconsistent system of linear equations (3.1) Ax C i D y, rkA D m (or y … R.A/) if and only if the matrix L fulfils
Gy ALA D Gy A Gy AL D .Gy AL/0 :
(3.39)
The matrix L is the Gy weighted A1;2;3 generalized inverse, in short A ` , also called weighted left inverse. Proof. According to the theory of the generalized inverse presented in Appendix A x` D Ly is Gy -LESS of (3.1) if and only if A0 Gy AL D A0 Gy is fulfilled. Indeed A0 Gy AL D A0 Gy is equivalent to the two conditions Gy ALA D Gy A and Gy AL D .Gy AL/0 . For a proof of such a statement multiply A0 Gy AL D A0 Gy left by L0 and receive L0 A0 Gy AL D L0 A0 Gy : The left-hand side of such a matrix identity is a symmetric matrix. In consequence, the right-hand side has to be symmetric, too. When applying the central symmetry condition to A0 Gy AL D A0 Gy or Gy A D L0 A0 Gy A; we are led to what had to be proven.
Gy AL D L0 A0 Gy AL D .Gy AL/;
130
3 The Second Problem of Algebraic Regression
? How to prove uniqueness of A1;2;3 D A` ? Let us fulfil Gy Ax` by Gy AL1 y D Gy AL1 AL1 y D L0 1 A0 Gy AL1 y D L0 1 A0 L0 1 A0 Gy y D D L0 1 A0 L0 2 A0 Gy y D L0 1 A0 Gy L2 y D Gy AL1 AL2 y D Gy AL2 y; in particular by two arbitrary matrices L1 and L2 , respectively, which fulfil .i / Gy ALA D Gy A as well as .ii/ Gy AL D .Gy AL/0 : Indeed we have derived one result irrespective of L1 or L2 . If the matrix of the metric Gy of the observation space is positive definite, we can prove the following duality Theorem 3.8. (duality): Let the matrix of the metric Gx of the observation space be positive definite. Then x` D Ly is Gy -LESS of the linear model (3.1) for any observation vector y 2 Rn , if 0 1 0 x m D L y is Gy -MINOS of the linear model y D A x for all m 1 columns 0 y 2 R.A /. Proof. If Gy is positive definite, there exists the inverse matrix G1 y . Equation (3.39) can be transformed into the equivalent condition 0 0 1 0 0 0 A0 D A0 L0 A and G1 y L A D .Gy L A / ;
which is equivalent to (1.81).
3-24 Eigenvalue Decomposition of Gy -LESS: Canonical LESS For the system analysis of an inverse problem the eigenspace analysis and eigenspace synthesis of x` Gy -LESS of x is very useful and gives some peculiar insight into a dynamical system. Accordingly we are confronted with the problem to construct “canonical LESS”, also called the eigenvalue decomposition of Gy -LESS. First, we refer to the canonical representation of the parameter space X as well as the observation space introduced to you in the Chap. 1, Box 1.8 and Box 1.9. But here we add by means of Box 3.14 the comparison of the general bases versus the orthonormal bases spanning the parameter space X as well as the observation space Y. In addition, we refer to Definition 1.5 and Lemma 1.6 where the adjoint operator A# has been introduced and represented.
3-2 The Least Squares Solution: “LESS”
131
Box 3.13. (General bases versus orthonormal bases spanning the parameter space X as well as the observation space Y) “left” “parameter space” “general left base”
“right” “observation space” “general right base”
span fa1 ; : : : ; am g D X
Y D span fb1 ; : : : ; bn g
: matrix of the metric :
: matrix of the metric :
aa0 D Gx “orthonormal left base” span fex1 ; : : : ; exm g D X : matrix of the metric : ex e0 x D Im “base transformation” 1
a D ƒx2 Vex versus
1
ex D V0 ƒx 2 a span fex1 ; : : : ; exm g D X
bb0 D Gy
(3.40)
“orthonormal right base” y Y D span fe1 ; : : : ; eyn g : matrix of the metric : ey e0 y D In
(3.41)
“base transformation” 1
b D ƒy2 Uey
(3.42)
versus
1
ey D U0 ƒy 2 b
(3.43)
y
Y D span fe1 ; : : : ; eyn g
Second, we are going to solve the overdetermined system of fy D AxjA 2 Rnm ; rkA D n; n > mg by introducing • The eigenspace of the rectangular matrix A 2 Rnm of rank r WD rkA D m, n > m : A 7! A • The left and right canonical coordinates: x ! x , y ! y as supported by Box 3.15. The transformations x 7! x , y 7! y (3.44) from the original coordinates .x1 ; : : : ; xm / to the canonical coordinates .x1 ; : : : ; xm /, the left star coordinates, as well as from the original coordinates .y1 ; : : : ; yn / to the canonical coordinates .y1 ; : : : ; yn /, the right star coordinates, are polar 1
1
decompositions: a rotation fU; Vg is followed by a general stretch fGy2 ; Gx2 g. Those
132
3 The Second Problem of Algebraic Regression 1
1
root matrices are generated by product decompositions of type Gy D .Gy2 /0 Gy2 as 1
1
well as Gx D .Gx2 /0 Gx2 . Let us substitute the inverse transformations x 7! x D
1
1
Gx 2 Vx and y 7! y D Gy 2 Uy in (3.45) into the system of linear equations (3.1) y D Ax C i or its dual (3.46) y D A x C i . Such an operation leads us to y D f .x / as well as y D f .x/ in (3.47). Subject to the orthonormality conditions U0 U D In and V0 V D Im in (3.48) we have generated the left-right eigenspace analysis (3.49) ƒ ƒ D 0 subject to the horizontal rank partitioning of the matrix U D ŒU1 ; U2 . Alternatively, the left-right eigenspace synthesis (3.50) 1 2
A D Gy
ŒU1 ; U2
1 ƒ V0 Gx2 0
1
1
is based upon the left matrix L WD Gy 2 U and the right matrix R WD Gx 2 V in (3.51). Indeed the left matrix L in (3.52) LL0 D G1 y reconstructs the inverse matrix of the metric of the observation space Y. Similarly, the right matrix R by means of (3.52) RR0 D G1 x generates the inverse matrix of the metric of the parameter space X. In terms of “L, R” we have summarized the eigenvalue decompositions (3.53)–(3.55). Such an eigenvalue decomposition helps us to canonically invert y D A x C i by means of (3.56), (3.57), namely the rank partitioning of the canonical observation vector y into y1 2 Rr1 and y2 2 R.nr/1 to determine x` D ƒ1 y1 leaving y2 “unrecognized”. Next we shall proof i1 D 0 if i1 is LESS. Box 3.14. (Canonical representation, overdetermined system of linear equations) “parameter space X”
versus “observation space Y”
1
1
x D V0 Gx2 x
y D U0 Gy2 y
and
(3.44)
and
1
x D Gx 2 Vx
1
y D Gy 2 Uy
(3.45)
“overdetermined system of linear equations” fy D Ax C ijA 2 Rnm ; rkA D m; n > mg y D Ax C i
versus
y D A x C i
(3.46)
3-2 The Least Squares Solution: “LESS”
1
1
133
1
1
Gy 2 Uy D AGx 2 Vx C Gy 2 Ui ! 1
1
1
1
1
y D Gy 2 UA V0 Gx2
y D U0 Gy2 AGx 2 V x C i subject to U0 U D UU0 D In
1
U0 Gy2 y D A V0 Gx2 x C U0 Gy2 i ! xCi
(3.47)
subject to V0 V D VV0 D Im
versus
(3.48)
“left and right eigenspace” 1 1 U0 1 ƒ 2 2 G AG V D y x U0 2 0 1 1 ƒ “right eigenspace analysis00 W A D Gy 2 ŒU1 ; U2 V0 Gx2 0
“left eigenspace analysis00 W A D
(3.49) (3.50)
“dimension identities” ƒ 2 Rrr ; U1 2 Rnr 0 2 R.nr/r ; U2 2 Rn.nr/ ; V 2 Rrr r WD rkA D m; n > m “left eigenspace”
1
1
L WD Gy 2 U ) L1 D U0 Gy2
“right eigenspace”
1
versus
1
1
R WD Gx 2 V ) R1 D V0 Gx2 (3.51)
1
L1 WD Gy 2 U1 ; L2 WD Gy 2 U2 ) 1 0 1 D Gy LL0 D G1 y ) .L / L
L1 D
1 0 1 RR0 D G1 D Gx x ) .R / R (3.52)
versus
1 U0 1 L1 2 G DW y U0 2 L 2 A D LA R1
A D ŒL1 ; L2 A R #
1
A D L1 AR
(3.53)
ƒ L1 AR A D D 0 L 2
(3.54)
versus versus
A# AL1 D L1 ƒ2 A# AL2 D 0
versus
AA# R D Rƒ2
(3.55)
“overdetermined system of linear equations solved in canonical coordinates”
134
3 The Second Problem of Algebraic Regression
Fig. 3.6 Commutative diagram of coordinate transformations
A
∋x
y ∈ (A) ⊂
1
1
V′Gy2
U′Gy2
y*∈ (A* ) ⊂
∋ x*
A*
y ƒ i y DA x Ci D x C 1 D 1 i2 y2 0
(3.56)
“dimension identities” y1 2 Rr1 ; y2 2 R.nr/1 ; i1 2 Rr1 ; i2 2 R.nr/1 y1 D ƒx C i1 ) x D ƒ1 .y1 i1 /
“if i is LESS, then
x`
D
ƒ1 y1 ; i1
(3.57)
D0
Consult the commutative diagram of Fig. 3.6 for a shorthand summary of the newly introduced transformations of coordinates, both of the parameter space X as well as the observation space Y. Third, we prepare ourselves for LESS of the overdetermined system of linear equations fy D Ax C ijA 2 Rnm ; rkA D m; n > m; jjijj2Gy D ming by introducing Lemma 3.9, namely the eigenvalue-eigencolumn equations of the matrices A# A and AA# , respectively, as well as Lemma 3.11, our basic result of “canonical LESS”, subsequently completed by proofs. Throughout we refer to the adjoint operator which has been introduced by Definition 1.5 and Lemma 1.6. Lemma 3.9. (eigenspace analysis versus eigenspace synthesis of the matrix A 2 Rnm ; r WD rkA D m < n) The pair of matrices fL; Rg for the eigenspace analysis and the eigenspace synthesis of the rectangular matrix A 2 Rnm of rank r WD rkA D m < n, namely A D L1 AR versus A D LA R1 or
3-2 The Least Squares Solution: “LESS”
135
ƒ L1 ƒ A D D R AR versus A D ŒL1 ; L2 L2 0 0
are determined by the eigenvalue-eigencolumn equations (eigenspace equations) of the matrices A# A and AA# , respectively, namely A AR D Rƒ #
2
versus
AA# L1 D L1 ƒ2 AA# L2 D 0
subject to 2
21 : : : 6 ƒ2 D 4 ::: : : : 0
3 0 q q :: 7; ƒ D Diag C 2 ; : : : ; C 2 : 5 r 1 :
2r
Let us prove first A# AR D Rƒ2 , second A# AL1 D L1 ƒ2 , AA# L2 D 0: (i) A# AR D Rƒ2
D
0 A# AR D G1 x A Gy AR 0 1 1 1 1 1 2 2 U1 ƒ 1 2 0 0 Gx Gx V Œƒ; 0 .Gy / Gy Gy ŒU1 ; U2 V0 Gx2 Gx 2 V 0 U2 0
A AR D #
1 ƒ 0 2 Gx V ƒ; 0 0
1
D Gx 2 Vƒ2
A AR D Rƒ2 : #
(ii) AA# L1 D L1 ƒ2 ; AA# L2 D 0 0 AA# L D AG1 L x AG y 0 1 1 1 1 2 0 2 ƒ U 1 2 0 D .G ŒU1 ; U2 V0 Gx2 G1 G V Œƒ; 0 / G G ŒU1 ; U2 x y y y x 0 U0 2 U0 1 U1 U0 1 U2 ƒ ƒ; 00 AA# L D ŒL1 ; L2 U0 2 U1 U0 2 U2 0 2 0 Ir 0 ƒ 0 AA# L D ŒL1 ; L2 0 0 0 Inr AA# ŒL1 ; L2 D L1 ƒ2 ; 0 ; AA# L1 D L1 ƒ2 ; AA# L2 D 0: The pair of eigensystems fA# AR D Rƒ2 ; AA# ŒL1 ; L2 D L1 ƒ2 ; 0 g is unfor0 # tunately based upon non-symmetric matrices AA# D AG1 x A Gy and A A D 1 0 Gx A Gy A which make the left and right eigenspace analysis numerically more complex. It appears that we are forced to use the Arnoldi method rather than the 1 Gy 2
136
3 The Second Problem of Algebraic Regression
more efficient Lanczos method used for symmetric matrices. In this situation we look out for an alternative. Actually as soon as we substitute
1
1
fL; Rg by fGy 2 U; Gx 2 Vg
1
into the pair of eigensystems and consequently multiply AA# L by Gx 2 , we achieve a pair of eigensystems identified in Corollary 3.10 relying on symmetric matrices. In addition, such a pair of eigensystems produces the canonical base, namely orthonormal eigencolumns. Corollary 3.10. (symmetric pair of eigensystems): The pair of eigensystems 1
1
1
1
0 2 2 2 0 2 0 0 2 Gy2 AG1 x A .Gy / U1 D ƒ U1 versus .Gx / A Gy AGx V D Vƒ 1
1
1
0 2 2 0 2 0 0 jGy2 AG1 x A .Gy / i In j D 0 versus j.Gx / A Gy AGx
(3.58)
1 2
2j Ir j D 0 (3.59) is based upon symmetric matrices. The left and right eigencolumns are orthogonal. Such a procedure requires two factorizations, 1
1
1
1
1
1
1
1
2 2 0 2 2 0 and Gy D .Gy2 /0 Gy2 ; G1 Gx D .Gx2 /0 Gx2 ; G1 x D Gx .Gx / y D Gy .Gy /
via Choleski factorization or eigenvalue decomposition of the matrices Gx and Gy : Lemma 3.11. (canonical LESS): Let y D A x C i be a canonical representation of the overdetermined system of linear equations fy D Ax C ijA 2 Rnm ; r WD rkA D m; n > mg: 0 Then the rank partitioning of y D .y1 /0 ; .y2 /0 leads to the canonical unknown vector y1 y1 2 Rr1 y1 1 x` D ƒ1 ; 0 D ƒ y ; y D (3.60) 1 ; y2 y2 y2 2 R.nr/1 and to the canonical vector of inconsistency i`
i D 1 i2
`
y WD 1 y2
ƒ i D 0 ƒ1 y1 or 1 0 i2 D y2 `
(3.61)
3-2 The Least Squares Solution: “LESS”
137
of type Gy -LESS. In terms if the original coordinates a canonical representation of Gy -LESS is x` D
1 1 1 U0 1 2 Gy2 y Gx V ƒ ; 0 U0 2
(3.62)
1
1
x` D Gx 2 Vƒ1 U0 1 Gy2 y D Rƒ1 L 1 y:
(3.63)
x ` D A ` y is built on the canonical .Gx ; Gy / weighted right inverse. For the proof we depart from Gy -LESS (3.11) and replace the matrix by its canonical representation, namely by eigenspace synthesis. 3
1 0 x ` D A0 G y A A Gy y " # 1 7 1 ƒ 5 2 V0 Gx2 A D Gy ŒU1 ; U2 0 0
) A Gy A D
1 .Gx2 /0 V Œƒ; 0
"
# " # 1 1 1 0 U0 1 ƒ 0 2 2 2 .G / G G ŒU ; U V G y y x y 1 2 U0 2 0
1 1 1 1
1 A0 Gy A D .Gx2 /0 Vƒ2 V0 Gx2 ; A0 Gy A D Gx 2 Vƒ2 V0 .Gx 2 /0 " # 1 1 1 1 0 2 U 1 ) x` D Gx Vƒ2 V0 .Gx 2 /0 .Gx2 /0 V Œƒ; 0 .Gy 2 /0 Gy y 0 U2
x` D
1 Gx 2 V ƒ1 ; 0
"
# 1 U0 1 2 G y y U0 2
1
1
x` D Gx 2 Vƒ1 U0 1 Gy2 y D A ` y
1
1
1;2;3 1 0 2 2 A ` D Gx Vƒ U 1 Gy 2 AGy
(Gy weighted reflexive inverse) x`
DV
0
1 3 U0 1 Gy2 y 7 Dƒ U D ƒ ;0 U0 2 7 0 1 7) 5 U1 y1 2 y D D G y y y2 U0 2 1 y1 D ƒ1 y1 : ) x` D ƒ ; 0 y2
1 Gx2 x`
1
0
1 2 1 Gy y
1
138
3 The Second Problem of Algebraic Regression
Thus we have proven the canonical inversion formula. The proof for the canonical representation of the vector of inconsistency is a consequence of the rank partitioning ƒ i1 ; y1 2 Rr1 y1 x` ; ; 0 y2 i2 ; y2 2 R.nr/1 ` ƒ i1 y1 0 1 ƒ y1 D : i` D D 0 i2 ` y2 y2
il D
i1 i2
WD
The important result of x` based on the canonical Gy -LESS of fy D A x C i jA 2 Rnm , rkA D rkA D m; n > mg needs a comment. The rank partitioning of the canonical observation vector y , namely y1 2 Rr ; y2 2 R.nr/ again paved the way for an interpretation. First, we appreciate the simple “direct inversion” x` D q p ƒ1 y1 , ƒ D Diag C 21 ; : : : ; C 2r , for instance 2 1 3 3 1 y 1 x1 4 : : : 5 D 4 : : : 5: xm ` 1 r yr 2
Second, i1 D 0, eliminates all elements of the vector of canonical inconsistencies, 0 for instance i1 ; : : : ; ir ` D 0, while i2 D y2 identifies the deficient elements of the vector ofcanonical inconsistencies with the 0 0 vector of canonical observations for instance irC1 ; : : : ; in ` D yrC1 ; : : : ; yn ` . Finally, enjoy the commutative diagram of Fig. 3.6 illustrating our previously introduced transformations of type LESS and canonical LESS, by means of A ` and .A /` , respectively (Fig. 3.7). A first example is canonical LESS of the Front Page Example by Gy D I3 , Gx D I2 . 2 3 2 3 2 3 i1 1 11 x 1 4 4 5 4 5 C i2 5; r WD rkA D 2 y D Ax C i W 2 D 1 2 x2 4 13 i3
∋y
Al–
1
1
V′Gy2
U′Gy2
Fig. 3.7 Commutative diagram of inverse coordinate transformations
∋ y*
xl ∈
xl* ∈
(A*)l–
3-2 The Least Squares Solution: “LESS”
139
left eigenspace
right eigenspace A# AV D A0 AV D Vƒ2
AA# U1 D AA0 U1 D U1 ƒ2 AA# U2 D AA0 U2 D 0 2
3 23 4 AA0 D 4 3 5 7 5 4 7 10
3 6 D A0 A 6 14
eigenvalues jAA0 2i I3 j D 0 ,
jA0 A 2j I2 j D 0 ,
8i 2 f1; 2; 3g , 21 D
1p 17 17 1 p C 265; 22 D 265; 23 D 0 2 2 2 2
left eigencolumns 32 3 2 4 u11 2 21 3 (1st) 4 3 5 21 7 5 4 u21 5 D 0 u31 4 7 10 21 subject to
8j 2 f1; 2g
u211 C u221 C u231 D 1
right eigencolumns
6 3 21 (1st) 6 14 21
v11 v21
D0
subject to v211 C v221 D 1
.2 21 /u11 C 3u21 C 4u31 D 0 versus .3 21 /v11 C 6v21 D 0 3u11 C .5 21 /u21 C 7u31 D 0 2 36 72 v211 D D p 2 2 6 36 C .3 1 / 265 C 11p265 6 4 .3 21 /2 193 C 11 265 v221 D D p 2 2 36 C .3 1 / 265 C 11 265 2 3 2 2 3 .1 C 421 /2 u11 1 4 .2 72 /2 5 4 u2 5 D 21 1 .1 C 421 /2 C .2 721 /2 C .1 721 C 41 /2 2 u31 .1 721 C 41 /2 2
2 3 p 265 35 C 2 2 2 3 6 2 7 u11 7 6 p 2 7 115 7 6 4 u2 5 D p 6 C 265 7 21 7 6 2 2 43725 C 2685 265 4
2 5 u231 p 80 C 5 265
140
3 The Second Problem of Algebraic Regression
3 22 7 (2nd) 7 21 22
u12 u22
2
D0
32 3 2 22 3 v12 5 (2nd)4 3 5 22 9 5 4 v22 5 D0 5 9 17 22 v32
subject to
subject to
u212 C u222 C u232 D 1
v212 C v222 D 1
.2 22 /u12 C 3u22 C 4u32 D 0 versus .3 22 /v12 C 6v22 D 0 3u12 C .5 22 /u22 C 7u32 D 0 2 36 72 v2 D p D 2 2 6 12 36 C .3 / 265 11 265 2 6 p 6 4 2 .3 22 /2 193 11 265 D v22 D p 36 C .3 22 /2 265 11 265 2 3 2 2 3 .1 C 422 /2 u12 1 4 .2 72 /2 5 4 u2 5 D 22 2 2 2 2 2 2 4 2 .1 C 4 / C .2 7 / C .1 7 C / 2 2 2 2 u232 .1 722 C 42 /2 3 2 p .35 2 265/2 2 2 3 6 u12 2 7 7 6 2 7 6 115 7 p 4 u2 5 D p 22 7 6 265 43725 2685 265 4 2 2 5 2 u32 p .80 5 265/2 2 32 3 23 4 u13 (3rd) 4 3 5 7 5 4 u23 5 D 0 subject to u213 C u223 C u233 D 1 4 7 10 u33 2u13 C 3u23 C 4u33 D 0 3u13 C 5u23 C 7u33 D 0
23 35
u13 u23
D
4 5 3 4 u u33 , 13 D u33 u23 7 3 2 7
u13 D Cu33 ; u23 D 2u33 1 2 2 1 ; u23 D ; u233 D : 6 3 6 There are four combinatorial solutions to generate square roots. u213 D
2
u11 u12 4 u21 u22 u31 u32
q q 3 2 q 2 2 u ˙ u ˙ u2 ˙ u13 6 q 11 q 12 q 13 7 7 6 u23 5 D 6 ˙ u221 ˙ u222 ˙ u223 7 4 q q q 5 u33 ˙ u231 ˙ u232 ˙ u233 3
3-3 Case Study
141
v11 v12 v21 v22
2 q q 3 ˙ v211 ˙ v212 q 5: D4 q ˙ v221 ˙ v222
Here we have chosen the one with the positive sign exclusively. In summary, the eigenspace analysis gave the result as follows. 0s ƒ D Diag @
17 C
s 1 p p 265 17 265 A ; 2 2
p ˇ 3 p ˇ 1p p 35 2 265 p 35 C 2 265 ˇ 2p p 67 ˇ 6 2p p 7 6 43725 2685 265 ˇ 6 43725 C 2685 265 ˇ 7 6p p p p ˇ 7 6 115 7 265 2 ˇ 1p 7 6 2 115 C 7 265 p UD6 ˇ 7D ŒU1 ; U2 6 p p 7 6 2 p 2 43725 2685 265 ˇˇ 3 7 6 43725 C p 2685 265 p ˇ 7 6p p 80 C 5 265 1 ˇ p 5 4 80 5 265 2p ˇ 6 p 2p p ˇ 43725 C 2685 265 43725 2685 265 6 3 2 72 72 p p p p 7 6 6 265 C 11 265 265 11 265 7 7: 6 p VD6p p p 7 193 C 11 265 193 11 265 5 4 p p p p 265 C 11 265 265 11 265 2
3-3 Case Study Partial redundancies, latent conditions, high leverage points versus break points, direct and inverse Grassmann coordinates, Plucker ¨ coordinates This case study has various targets. First we aim at a canonical analysis of the hat matrices Hx and Hy for a simple linear model with a leverage point. The impact of a high leverage point is studied in all detail. Partial redundancies are introduced and interpreted in their peculiar role of weighting observations. Second, preparatory in nature, we briefly introduce multilinear algebra, the operations “join and meet”, namely the Hodge star operator. Third, we go “from A to B”: Given the columns space R.A/ D Gm;n .A/, identified as the Grassmann space Gm;n Rn of the matrix A 2 Rnm ; n > m; rkA D m, we construct the column space R.B/ D R? .A/ D Gnm;n Rn of the matrix B which agrees to the orthogonal column space R? .A/ of the matrix A. R? .A/ is identified as Grassmann space Gnm;n Rn and is covered by Grassmann coordinates, also called Pl¨ucker coordinates pij . The matrix B, alternatively the Grassmann coordinates (Pl¨ucker coordinates), constitute the latent restrictions, also called
142
3 The Second Problem of Algebraic Regression
latent condition equations, which control parameter adjustment and lead to a proper choice of observational weights. Fourth, we reverse our path: we go “from B to A”: Given the column space R.B/ of the matrix of restrictions B 2 R`n , ` < n, rkB D ` we construct the column space R? .B/ D R.A/ Rn , the orthogonal column space of the matrix B which is apex to the column space R.A/ of the matrix A. The matrix A, alternatively the Grassmann coordinates (Pl¨ucker coordinates) of the matrix B constitute the latent parametric equations which are “behind a conditional adjustment”. Fifth, we break-up the linear model into pieces, and introduce the notion of break points and their determination. The present analysis of partial redundancies and latent restrictions has been pioneered by Kampmann (1992), as well as R. Jurisch and G. Kampmann (2002 a, b). Additional useful references are D. W. Behmken and N. R. Draper (1972), S. Chatterjee and A. S. Hadi (1988), R. D. Cook and S. Weisberg (1982). Multilinear algebra, the operations “join and meet” and the Hodge star operator are reviewed in W. Hodge and D. Pedoe (1968), C. Macinnes (1999), S. Morgera (1992), W. Neutsch (1995), B. F. Doolin and C. F. Martin (1990). A sample reference for break point synthesis is C. H. Mueller (1998), N. M. Neykov and C. H. Mueller (2003) and D. Tasche (2003).
3-31 Canonical Analysis of the Hat Matrix, Partial Redundancies, High Leverage Points A beautiful example for the power of eigenspace synthesis is the least squares fit of a straight line to a set of observation: Let us assume that we have observed a dynamical system y.t/ which is represented by a polynomial of degree one with respect to time t. y.ti / D 1i x1 C ti x2 8 i 2 f1; ; ng: Due to y .t/ D x2 it is a dynamical system with constant velocity or constant first derivative with result to time t0 . The unknown polynomial coefficients are collected in the column array x D Œx1 ; x2 0 ; x 2 X D R2 ; dim X D 2 and constitute the coordinates of the two-dimensional parameter space X. For this example we choose n D 4 observations, namely y D Œy.t1 /; y.t2 /; y.t3 /; y.t4 /0 , y 2 Y D R4 , dim Y D 4. The samples of the polynomial are taken at t1 D 1; t2 D 2; t3 D 3 and t4 D a. With such a choice of t4 we aim at modeling the behavior of high leverage points, e.g. a >> .t1 ; t2 ; t3 / or a ! 1, illustrated by Fig. 3.8. Box 3.16 summarizes the right eigenspace analysis of the hat matrix Hy WD A.A0 A/ A0 . First, we have 42 computed the spectrum of A0 A and .A0 A/1 for p the given matrix A 2 R , 2 namely the eigenvalues squared 1;2 D 59 ˙ 3261. Note the leverage point t4 D a D 10. Second, we computed the right eigencolumns v1 and v2 which constitute the orthonormal matrix V 2 SO.2/. The angular representation of the orthonormal matrix V 2 SO.2/ follows: Third, we take advantage of the sinecosine representation (3.64) V 2 SO.2/, the special orthonormal group over R2 .
3-3 Case Study
143
Fig. 3.8 Graph of the function y.t /, high leverage point t4 D a
y4 y(t)
*
y3 y2 y1
* * t1 = 1
* t3 = 3
t2 = 2
t4 = a
t
0
00
Indeed, we find the angular parameter D 81ı 53 25:4 . Fourth, we are going to represent the hat matrix Hy in terms of the angular parameter namely (3.65)–(3.68). In this way, the general representation (3.69) is obtained, illustrated by four cases. Equation (3.65) is a special case of the general angular representation (3.69) of the hat matrix Hy . Five, we sum up the canonical representation AV0 ƒ2 V0 A0 (3.70), of the hat matrix Hy , also called right eigenspace synthesis. Note the rank of the hat matrix, namely rkHy D rkA D m D 2, as well as the peculiar fourth adjusted observation yO4 D y4 .I LESS/ D
1 .11y1 C y2 C 13y3 C 97y4 /; 100
which highlights the weight of the leverage point t4 : This analysis will be more pronounced if we go through the same type of right eigenspace synthesis for the leverage point t4 D a, a1, outlined in Box 3.19. Box 3.15. Right eigenspace analysis of a linear model of an univariate polynomial of degree one – high leverage point a D 10 – “Hat matrix Hy D A.A0 A/1 A D AVƒ2 V0 A0 ” 2 0 A AV D Vƒ2 right eigenspace analysis:4 subject to VV0 D I2 2
3 1 1 61 2 7 57 8 7; A0 A D 4 16 ; .AA/1 D 1 A WD 6 41 3 5 16 114 100 8 2 1 10 ˇ ˇ ˇ ˇ spec.A0 A/ D f21 ; 22 g W ˇA0 A 2j I2 ˇ D 0; 8 j 2 f1; 2g , ˇ ˇ ˇ 4 2 16 ˇˇ 4 2 ˇ ˇ 16 114 2 ˇ D 0 , 118 C 200 D 0 p 21;2 D 59 ˙ 3281 D 59 ˙ 57:26 D 0
144
3 The Second Problem of Algebraic Regression
spec.A0 A/ D f21 ; 22 g D f116:28; 1:72g versus spec.A0 A/1 D f 2
1 1 ; g D f8:60 103 ; 0:58g 21 22
.A0 A 2j I2 /V D 0 4 right eigencolumn analysis: subject to VV0 D I2
.1st/ .A0 A 21 I/
v11 v21
D 0 subject to v211 C v221 D 1
.4 21 /v11 C 16v21 D 0 v211 C v221 D 1
q v11 D C v211 D q q v21 D C v221 D q
# )
16 256 C .4 21 /2
D 0:141
4 21
D 0:990 256 C .4 21 /2 v .2nd/ .A0 A 22 I2 / 12 D 0 subject to v212 C v222 D 1 v22 3 .4 22 /v12 C 16v22 D 0 5) v212 C v222 D 1 q 16 v12 D C v212 D q D 0:990 256 C .4 22 /2 q v22 D C v222 D q
4 22 256 C .4 22 /2
D 0:141
2
spec .A0 A/ D f116:28; 1:72g 6 spec .A0 A/1 D f8:60 103 ; 0:58g right eigenspace: 6 4 0:141 0:990 v v 2 SO.2/ V D 11 12 D v21 v22 0:990 0:141 “Angular representation of V 2 SO.2/”
3-3 Case Study
145
cos sin VD sin cos
0:141 0:990 D 0:990 0:141
(3.64)
sin D 0:990; cos D 0:141; tan D 7:021 D 81ı :890; 386 D 81ı 530 25:4” hat matrix Hy D A.A0 A/1 A0 D AVƒ2 V0 A0 2
.A0 A/1
1 1 cos2 C 2 sin2 2 6 1 2 D V2 V D 6 4 1 1 2 C 2 sin cos 1 2 .A
0
A/1 j1 j2
D
mD2 X j3
3 1 1 2 C 2 sin cos 7 1 2 7 5 1 1 2 2 sin C 2 cos 2 1 2 (3.65)
1 cos j1 j3 cos j2 j3 2 D1 j3
(3.66)
subject to VV0 D I2
mD2 X
cos j1 j3 cos j2 j3 D ıj1 j2
(3.67)
j3 D1
case 1: j1 D 1; j2 D 1:
case 2: j1 D 1; j2 D 1:
cos2 11 C cos2 12 D 1 .cos2 C sin2 D 1/
cos 11 cos 21 C cos 12 cos 22 D 0 . cos sin C sin cos D 0/
case 3: j1 D 1; j2 D 1:
case 4: j1 D 1; j2 D 1: cos2 21 C cos2 22 D 1 .sin2 C cos2 D 1/
cos 21 cos 11 C cos 22 cos 12 D 0 . sin cos C cos sin D 0/
.A0 A/1 D 2 2 2 2 2 2 1 cos 11 C2 cos 12 1 cos 11 cos 21 C2 cos 12 cos 22 2 2 2 2 2 2 1 cos 21 cos 11 C2 cos 22 cos 12 1 cos 21 C2 cos 22 (3.68) mD2 X 1 Hy D AVƒ2 V0 A0 hi1 i2 D ai1 j1 ai2 j2 2 cos j1 j3 cos j2 j3 (3.69) j3 j ;j ;j D1 1
2
3
(3.70) Hy D A.A A/ A0 D AVƒ2 V0 A0 2 3 0:849 1:131 6 1:839 1:272 7 2 3 7 A WD AV D 6 4 2:829 1:413 5; ƒ D Diag.8:60 10 ; 0:58/ 9:759 2:400 0
1
146
3 The Second Problem of Algebraic Regression
2
3 43 37 31 11 7 1 6 6 37 33 29 1 7 Hy D A ƒ2 .A /0 D 4 31 29 27 13 5 100 11 1 13 97 rkHy D rkA D m D 2 1 .11y1 C y2 C 13y3 C 97y4 /: 100 By means of Box 3.17 we repeat the right eigenspace analysis for one leverage point t4 D a, later on a1, for both the hat matrix Hx W D .A0 A/1 A0 and Hy W D A.A0 A/1 A0 . First, Hx is the linear operator producing xO D x` .I-LESS/. Second, Hy as linear operator generates yO D y` .I-LESS/. Third, the complementary operator I4 Hy DW R as the matrix of partial redundancies leads us to the inconsistency vector Oi D i` .I-LESS/. The structure of the redundancy matrix R; rkR D n m, is most remarkable. Its diagonal elements will be interpreted soonest. Fourth, we have computed the length of the inconsistency vector jjOijj2 , the quadratic form y0 Ry. The highlight of the analysis of hat matrices is set by computing yO4 D y4 .I-LESS/ D
1st W Hx .a ! 1/ versus 2nd W Hy .a ! 1/ 3rd W R.a ! 1/ versus 4th W jjOi.a ! 1/jj2 for “highest leverage point” a ! 1, in detail reviewed Box 3.18. Please, notice the two unknowns xO 1 and xO 2 as best approximations of type I-LESS. xO 1 resulted in the arithmetic mean of the first three measurements. The point y4 ; t4 D a ! 1, had no influence at all. Here, xO 2 D 0 was found. The hat matrix Hy .a ! 1/ has produced partial hats h11 D h22 D h33 D 1=3, but h44 D 1 if a ! 1. The best approximation of the I-LESS observations were yO1 D yO2 D yO3 as the arithmetic mean of the first three observations but yO4 D y4 has been a reproduction of the fourth observation. Similarly the redundancy matrix R.a ! 1/ produced the weighted means iO1 , iO2 and iO3 . The partial redundancies r11 D r22 D r33 D 2=3; r44 D 0, sum up to r11 C r22 C r33 C r44 D n m D 2. Notice the value iO4 D 4 : The observation indexed four is left uncontrolled. Box 3.16. The linear model of a univariate polynomial of degree one – one high leverage point – 2
3 2 2 3 3 y1 11 i1 6 y2 7 6 1 2 7 x1 6 i2 7 7 6 6 7 7 y D Ax C i 6 4 y3 5 D 4 1 3 5 x2 C 4 i3 5 y4 i4 1a x 2 R2 ; y 2 R4 ; A 2 R42 ; rkA D m D 2
3-3 Case Study
147
dim X D m D 2 versus dim Y D n D 4 .1st/Ox D x` .I-LESS/ D .A0 A/1 A0 y D Hx y
(3.71) 2 3 y1 7 1 8 a C a2 2 2a C a2 4 3a C a2 14 6a 6 y 6 27 Hx D 2 a 2a 6a 6 C 3a 4 y3 5 18 12a C 3a2 y4 .2nd/Oy D y` .I-LESS/ D A0 .A0 A/1 A0 y D Hy y 0
0
1
(3.72)
0
“hat matrix”: Hy D A .A A/ A ; rkHy D m D 2 Hy D
1 1818aC3a2 2 6 2a C a2 6 4 3a C a2 6 4 2 4a C a2 8 3a
4 3a C a2 6 4a C a2 6 5a C a2 2
2 4a C a2 8 5a C a2 14 6a C a2 4 C 3a
3 8 3a 7 2 7 5 4 C 3a 14 12a C 3a2
(3rd) Oi D i` .I-LESS/ D .I4 A.A0 A/1 A0 /y D Ry “redundancy matrix”:R D I4 A.A0 A/1 A0 ; rkR D n m D 2 “redundancy”: n - rk A = n - m = 2 1 RD 18 12a C 3a2 2 12 10a C 2a2 4 C 3a a2 2 C 4a a2 6 4 C 3a a2 12 6a C 2a2 8 C 5a a2 6 4 2 C 4a a2 8 C 5a a2 4 6a C 2a2 8 C 3a 2 4 3a
3 8 C 3a 2 7 7 4 3a 5 4
.4th/jjOijj2 D jji` .I-LESS/jj2 D y0 Ry: At this end we shall compute the LESS fit 2 O lim jji.a/jj ;
a!1
which turns out to be independent of the fourth observation.
Box 3.17. The linear model of a univariate polynomial of degree one – extreme leverage point a ! 1 – (1st ) Hx .a ! 1/
148
3 The Second Problem of Algebraic Regression
2 8 1 2 3 2 4 14 6 3 C 1 C C 1 C 1 1 6 a2 a a2 a a2 a a2 a 7 Hx D 4 5 18 12 2 2 6 6 1 1 1 3 C 3 C C C a2 a a2 a a2 a a2 a a2 a 1 1110 lim Hx D ) a!1 3 0000 xO 1 D
1 .y1 C y2 C y3 /; xO 2 D 0 3 (2nd ) Hy .a ! 1/
2 6 2 C1 6 a2 a 6 6 4 3 6 6 a2 a C 1 1 6 Hy D 6 2 18 12 4 C3 6 6 2 C1 2 a a a 6a 4 8 3 2 a a 2
4 2 3 4 C1 C1 a2 a a2 a 6 8 4 5 C1 C1 2 2 a a a a 8 14 6 5 C1 C1 a2 a a2 a 2 4 3 2C 2 a a a 3 110 1 1 07 7; lim h44 D 1 ) 1 1 0 5 a!1
8 3 3 2 a a 7 7 7 2 7 7 2 a 7 4 3 7 7 2C 7 a a 7 5 14 12 C 3 a2 a
1 16 1 lim Hy D 6 4 a!1 3 1 0003
1 .y1 C y2 C y3 /; yO4 D y4 3 (3rd ) R.a ! 1/
yO1 D yO2 D yO3 D
2
4 3 10 10 6 a2 a C 2 a2 C a 1 6 6 4 6 C 3 1 12 8 C 2 6 1 a a2 a 6 a2 RD 6 18 12 6 2 8 4 5 C 3 6 C 1 C 1 6 a2 2 a2 a a a a 6 4 2 3 8 2 2C a a a 2 2 1 1 16 1 2 1 lim R.a/ D 6 a!1 3 4 1 1 2 0 0 0
2 4 C 1 a2 a 8 5 2 C 1 a a 4 6 C2 a2 a 4 3 2 a a 3 0 07 7: 05 0
3 8 3 C a2 a7 7 7 2 2 7 7 a 7 7 4 3 7 7 a2 a 7 7 5 4
a2
3-3 Case Study
149
1 1 1 iO1 D .2y1 y2 y3 /; iO2 D .y1 C 2y2 y3 /; iO3 D .y1 y2 C 2y3 /; iO4 D 0 3 3 3 2 (4th) LESS fit :jjiO jj 2
2 6 1 1 2 O lim jji.a/jj D y0 6 a!1 3 4 1 0
1 1 2 1 1 2 0 0
3 0 07 7y 05 0
1 .2y12 C 2y22 C 2y32 2y1 y2 2y2 y3 2y3 y1 /: a!1 3 A fascinating result is achieved upon analyzing (the right eigenspace of the hat matrix Hy .a ! 1/. First, we computed the spectrum of the matrices A0 A and .A0 A/1 . Second, we proved 1 .a ! 1/ D 1, 2 .a ! 1/ D 3 or 1 1 .a ! 1/ D 0, 1 .a ! 1/ D 1=3. 2 2 O D lim jji.a/jj
Box 3.18. Right eigenspace analysis of a linear model of a univariate polynomial of degree one – extreme leverage point a ! 1-. “Hat matrix Hy D A0 .A0 A/1 A0 ” 2 0 A AV D Vƒ2 right eigenspace analysis: 4 subject to VV0 D Im0 ˇ ˇ ˇ ˇ spec.A0 A/ D f21 ; 22 g W ˇA0 A 2j Iˇ D 0 8 j 2 f1; 2g , ˇ ˇ ˇ ˇ 4 2 6Ca 4 2 2 2 ˇ ˇ ˇ 6 C a 14 C a2 2 ˇ D 0 , .18 C a / C 20 12a C 3a D 0 21;2 D
p 1 tr .A0 A/ ˙ .trA0 A/2 4 det A0 A 2
trA0 A D 18 C a2 ; det A0 A D 20 12a C 3a3 .trA0 A/2 4 det A0 A D 244 C 46a C 25a2 C a4 r 2 a a4 ˙ 61 C 12a C 6a2 C 21;2 D 9 C 2 4 ( D 9C
a2 C 2
r
spec.A0 A/ D f21 ; 22 g D ) r 4 2 4 a a a 61 C 12a C 6a2 C 61 C 12a C 6a2 C ; 9 C 4 2 4 “inverse spectrum” 0
spec.A A/ D
f21 ; 22 g
0
, spec.A A/
1
D
1 1 ; 21 22
150
3 The Second Problem of Algebraic Regression
r r 4 9 61 1 12 6 1 a a2 C C 3 C 2C 61 C 12a C 6a2 C 9C 2 4 1 2 a a a 4 2 4 D a D 20 12 20 12a C 3a2 22 C3 a2 a 1 lim D0 a!1 2 1 r r 61 9 1 12 6 1 a2 a4 2 C C C 3 C 2C C 61 C 12a C 6a C 9C 2 4 1 a 2 a a a 4 2 4 D D 2 2 20 12 20 12a C 3a 1 C3 a2 a 1 1 lim D a!1 2 3 2 1 lim spec.A0 A/.a/ D f1; 3g , lim spec.A0 A/1 D f0; g a!1 3 2 A0 AV D Vƒ ) A0 A D Vƒ2 V0 , .A0 A/1 D Vƒ2 V0 VV0 D Im
a!1
“Hat matrix Hy D AVƒ2 V0 A0 ”.
3-32 Multilinear Algebra, “Join” and “Meet”, the Hodge Star Operator Before we can analyze the matrices “hat Hy ” and “red R” in more detail, we have to listen to an “intermezzo” entitled multilinear algebra, “join” and “meet” as well as the Hodge star operator. The Hodge star operator will lay down the foundation of “latent restrictions” within our linear model and of Grassmann coordinates, also referred to as Pl¨ucker coordinates. Box 3.20 summarizes the definitions of multilinear algebra, the relations “join and meet”, denoted by “^ ” and “*”, respectively. In terms of orthonormal base vectors ei1 ; : : : ; eik , we introduce by (3.73) the exterior product ei1 ^ ^ eim also known as “join”, “skew product” or 1st Grassmann relation. Indeed, such an exterior product is antisymmetric as defined by (3.74), (3.75), (3.76) and (3.77). The examples show e1 ^ e2 D -e2 ^ e1 and e1 ^ e1 D 0, e2 ^ e2 D 0. Though the operations“join”, namely the exterior product, can be digested without too much of an effort, the operation “meet”, namely the Hodge star operator, needs much more attention. Loosely speaking the Hodge star operator or 2nd Grassmann relation is a generalization of the conventional “cross product” symbolized by “”. Let there be given an exterior form of degree k as an element of k .Rn / over the field of real numbers Rn . Then the “Hodge *” transforms the input exterior form of degree m to
3-3 Case Study
151
the output exterior form of degree n m, namely an element of nk .Rn /. Input W X 2 m .Rn / ! Output W X 2 nm : Applying the summation convention over repeated indices, (3.79) introduces the input operation “join”, while (3.80) provides the output operation “meet”. We say that X, (3.80) is a representation of the adjoint form based on the original form X, (3.79). The Hodge dualizer is a complicated exterior form (3.80) which is based upon Levi–Civita’s symbol of antisymmetry (3.81) which is illustrated by 3 examples. "k1 k` is also known as the permutation operator. Unfortunately, we have no space and time to go deeper into “join and meet”. Instead we refer to those excellent text-books on exterior algebra and exterior analysis, differential topology, in short exterior calculus. Box 3.19. (“join and meet”): “Hodge star operator ”^;
I WD fi1 ; : : : ; ik ; ikC1 ; : : : ; in g f1; : : : ; ng “join”: exterior product, skew product, 1st Grassmann relation ei1 im WD ei1 ^ ^ ej ^ ej C1 ^ ^ eim
(3.73)
“antisymmetry”: ei1 ij im D ei1 ji im 8i ¤ j
(3.74)
ei1 ^ : : : ^ ejk ^ ejkC1 ^ : : : ^ eim D ei1 ^ : : : ^ ejkC1 ^ ejk ^ : : : ^ eim (3.75) ei1 ii ij im D 0 8i D j
(3.76)
ei1 ^ ei ^ ej ^ ^ eim D 0 8i D j
(3.77)
Example: e1 ^ e2 D e2 ^ e1 or ei ^ ej D ej ^ ei 8i ¤ j Example: e1 ^ e1 D 0; e2 ^ e2 D 0 or ei ^ ej D 0
8i D j
“meet”: Hodge star operator, Hodge dualizer 2nd Grassmann relation
W ƒm .Rn / ! m
n
nm
ƒ.Rn / n
(3.78)
“a m degree exterior form X 2 ƒ .R / over R is related to a n m degree exterior form X called the adjoint form” :summation convention: “sum up over repeated indices” input: “join”
152
3 The Second Problem of Algebraic Regression
XD
X WD
1 ei ^ ^ eim Xi1 im mŠ 1 output: “meet”
1 p g ej1 ^ ^ ejnm "i1 im j1 jnm Xi1 im mŠ.n m/Š
(3.79)
(3.80)
antisymmetry operator (“Eddington’s epsilons”): 2
C1 for an even permutation of the indices k1 k` "k1 k` WD 4 1 for an oded permutation of the indices k1 k` 0 otherwise (for a repetition of the indices) :
(3.81)
Example W "123 D "231 D "312 D C1 Example W "213 D "321 D "132 D 1 Example W "112 D "223 D "331 D 0: For our purposes two examples on “Hodge’s star” will be sufficient for the following analysis of latent restrictions in our linear model. In all detail, Box 3.21 illustrates “join and meet” for ƒ2 .R3 / ! 1 .R3 /: Given the exterior product a ^ b of two vectors a and b in R3 with ai1 1 D col1 A; ai2 2 D col2 A as their coordinates, the columns of the matrix A with respect to the orthonormal frame of reference fe1 ; e2 ; e3 j 0g at the origin 0. a^bD
nD3 X
ei1 ^ ei2 ai1 1 ai2 2 2 ƒ2 .R3 /
i1 ;i2 D1
is the representation of the exterior form a^b DW X in the multibasis ei1 i2 D ei1 ^ei2 . By cyclic ordering, (3.105) is an explicit write-up of a ^ b 2 R.A/. Please, notice that there are n 3 D D3 m 2 p subdeterminants of A. If the determinant of the matrix G D I4 , det G p D 1 g D 1, then according to (16.80), (3.86)
.a ^ b/ 2 R.A/? D G1;3 represent the exterior form X, which is an element of R.A/ called Grassmann space G1;3 . Notice that .a ^ b/ is a vector whose Grassmann coordinate (Pl¨ucker
3-3 Case Study
153
coordinate) are
n m
D
3 D3 2
subdeterminants of the matrix A, namely a21 a32 a31 a22 ; a31 a12 a11 a32 ; a11 a23 a21 a12 : Finally, (3.87) .e2 ^ e3 / D e2 e3 D e1 for instance demonstrates the relation between called “join, meet” and the “cross product”. Box 3.20. (The first example: ) “join and meet”
W 2 .R3 / ! 2 .R3 / Input: “join”
aD
nD3 X
ei1 ai1 1 ;
i D1
bD
nD3 X
ei2 ai2 2
(3.82)
i D1
ai1 1 D col1 AI ai2 2 D col2 A nD3
a^bD
1 X ei ^ei 2Š i ;i D1 1 2
ai1 1 ai2 2 2 ƒ2 .R3 /
(3.83)
1 2
“cyclic order” 1 e2 ^ e3 .a21 a32 a31 a22 / 2Š 1 C e3 ^ e1 .a31 a12 a11 a32 / (3.84) 2Š 1 C e1 ^ e2 .a11 a23 a21 a12 / 2 R.A/ D G2;3 : 2Š p Output: “meet00 . g D 1; Gy D I3 ; m D 2; n D 3; n m D 1/ a^b D
.a ^ b/ D
nD2 X i1 ;i2
1 ej "i1 ;i2 ;j ai1 1 ai2 2 2Š ;j D1
(3.85)
154
3 The Second Problem of Algebraic Regression
1 e1 .a21 a32 a31 a22 / 2Š 1 C e2 .a31 a12 a11 a32 / 2Š 1 C e3 .a11 a23 a21 a12 / 2 R? .A/ D G1;3 2Š n 3 D subdeterminant of A m 2
.a ^ b/ D
(3.86)
Grassmann coordinates (Pl¨ucker coordinates)
.e2 ^ e3 / D e1 ; .e3 ^ e1 / D e2 ;
.e1 ^ e2 / D e3
(3.87)
Alternatively, Box 3.22 illustrates “join and meet” for selfduality
W 2 .R4 / ! 2 .R4 /: Given the exterior product a ^ b of two vectors a 2 R4 and b 2 R4 , namely the two column vectors of the matrix A 2 R42 ai1 1 D col1 A; ai2 2 D col2 A as their coordinates with respect to the orthonormal frame of reference fe1 ; e2 ; e3 ; e4 j 0 g at the origin 0. is the representation of the exterior form a^b WD X in the multibasis ei1 i2 D ei1 ^ ei2 . By lexicographic ordering, (3.90) is an explicit write-up of a ^ b .2 R.A//. Notice that these are n 4 D D6 m 2 subdeterminants of A. If the determinant of the matrix G of the metric is one G D p p I4 ; det G D g D 1, then according to (3.91), (3.92)
.a ^ b/ 2 R.A/? DW G2;4 represents the exterior form X, an element of R.A/? ; called Grassmann space G2;4 . Notice that .a ^ b/ is an exterior 2-form which has been generated by an exterior 2-form, too. Such a relation is called “selfdual”. Its Grassmann coordinates (Pl¨ucker coordinates) are n 4 D D6 m 2 subdeterminants of the matrix A, namely a11 a12 a21 a12 ; a11 a32 a31 a22 ; a11 a42 a41 a12 ; a21 a32 a31 a22 ; a21 a42 a41 a22 ; a31 a41 a41 a32 :
3-3 Case Study
155
Finally, (3.92), for instance .e1 ^ e2 / D e3 ^ e4 , demonstrates the operation called “join and meet”, indeed quite a generalization of the “cross product”. Box 3.21. (The second example: ) “join and meet”
W 2 .R4 / ! 2 .R4 / “selfdual” Input: “join”
aD
nD4 X
ei1 ai1 1 ; b D
i1 D1
nD4 X
ei2 ai2 2
(3.88)
i2 D1
.ai1 1 D col1 .A/; ai2 2 D col2 .A// nD4
1 X ei ^ei ai 1 ai 2 2 ƒ2 .R4 / a^bD 2Š i ;i D1 1 2 1 2
(3.89)
1 2
“lexicographical order”
a^bD
1 e1 ^ e2 .a11 a22 a21 a12 / 2Š 1 C e1 ^ e3 .a11 a32 a31 a22 / 2Š 1 C e1 ^ e4 .a11 a42 a41 a12 / 2Š 1 C e2 ^ e3 .a21 a32 a31 a22 / 2Š 1 C e2 ^ e4 .a21 a42 a41 a22 / 2Š 1 C e3 ^ e4 .a31 a42 a41 a32 / 2 R.A/? D G2;4 2Š n 4 D subdeterminants of A W m 2
Grassmann coordinates ( Pl¨ucker coordinates). p Output:“meet00 g D 1; Gy D I4 ; m D 2; n D 4; n m D 2
(3.90)
156
3 The Second Problem of Algebraic Regression
.a ^ b/ D
1 2Š i
nD4 X 1 ;i2 ;j1 ;j2
1 ej ^ ej2 "i1 i2 j1 j2 ai1 1 ai2 2 2Š 1 D1
1 e3 ^ e4 .a11 a22 a21 a12 / 4 1 C e2 ^ e4 .a11 a32 a31 a22 / 4 1 C e3 ^ e2 .a11 a42 a41 a12 / 4 1 C e4 ^ e1 .a21 a32 a31 a22 / 4 1 C e3 ^ e1 .a21 a22 a41 a22 / 4 1 C e1 ^ e2 .a31 a42 a41 a32 / 2 R.A/? D G2;4 4 n 4 D subdeterminants of A W m 2
D
(3.91)
Grassmann coordinates (Pl¨ucker coordinates).
.e1 ^ e2 / D e3 ^ e4 ; .e1 ^ e3 / D e2 ^ e4 ; .e1 ^ e4 / D e3 ^ e2 ;
.e2 ^ e3 / D e4 ^ e1 ; .e2 ^ e4 / D e3 ^ e1 ;
.e3 ^ e4 / D e1 ^ e2 :
(3.92)
3-33 From A to B: Latent Restrictions, Grassmann Coordinates, Plucker Coordinates ¨ Before we return to the matrix A 2 R42 of our case study, let us analyze the matrix A 2 R23 of Box 3.23 for simplicity. In the perspective of the example of our case study we may say that we have eliminated the third observation, but kept the leverage point. First, let us go through the routine to compute the hat matrices Hx D .A0 A/1 A0 and Hy D A.A0 A/1 A0 , to be identified by (3.93) and (3.94). The corresponding estimations xO D x` .I-LESS/, (3.94), and y D y` .I-LESS/, (3.96), prove the different weights of the observations .y1 ; y2 ; y3 / influencing xO 1 and xO 2 as well as .yO1 ; yO2 ; yO3 /. Notice the great weight of the leverage point t3 D 10 on yO3 . Second, let us interpret the redundancy matrix R D I3 A.A0 A/1 A0 , in particular the diagonal elements. A0 A.1/ A0 A.2/ A0 A.3/ 64 81 1 D ; r22 D D ; r33 D D ; 0 0 det A A 146 det A A 146 det A0 A 146 nD3 P 0 1 trR D .A A/.i / D n rkA D n m D 1; 0 det A A i D1
r11 D
3-3 Case Study
157
the degrees of freedom of the I3 -LES S problem. There, for the first time, we meet the subdeterminants .A0 A/.i / which are generated in a two step procedure. “First step” “Second step” eliminate the i th row from A as well as compute the determinant A0 .i / A.i / . the i th column of A.
A0 .1/ A.1/ –1 1 1 1 2 10 –
A0 .2/ A.2/ 1– 1 1 1– 2 10
A0 .3/ A.3/ 11 – 1 121 –0
Example :.A0 A/1 1 – – 1 1 2 .A0 A/.1/ D det A0 .1/ A.1/ D 64 1 10 det A0 A D 146 2 12 12 104 Example:.A0 A/2 1 1 1 – – 2 .A0 A/.2/ D det A0 .2/ A.2/ D 81 1 10 det A0 A D 146 2 11 11 101 Example:.A0 A/3 1 1 1 2 .A0 A/.3/ D det A0 .3/ A.3/ D 1 1– – 10 det A0 A D 146 23 35
Obviously, the partial redundancies .r11 ; r22 ; r33 / are associated with the influence of the observation y1 ; y2 ory3 on the total degree of freedom. Here the observation y1 and y2 had the greatest contribution, the observation y3 at a leverage point a very small influence. The redundancy matrix R, properly analyzed, will lead us to the latent restrictions or “from A to B”. Third, we introduce the rank partitioning R D Œ B; C, rkR D rk B D n m D 1; (3.98), of the matrix R of spatial redundancies. Here, b 2 R31 , (3.100), is normalized to generate b D b=jjbjj 2 , (3.101). Note, C 2 R32 is a dimension identity. We already introduced the orthogonality condition b0 A D 0 .b /0 A D 0
or or
b0 Ax` D b0 yO D 0 .b /0 Ax` D .b /0 yO D 0 ;
which establishes the latent restrictions (3.127) 8yO1 9yO2 C yO3 D 0 :
158
3 The Second Problem of Algebraic Regression
We shall geometrically interpret this essential result as soon as possible. Fourth, we aim at identifying R.A/ and R.A/? for the linear model fAx C i D y; A 2 Rnm ; rkA D m D 2g 2 3 1 @y y y y t1 WD D Œe1 ; e2 ; e3 4 1 5; @x1 1 2 3 1 @y y y y t2 WD D Œe1 ; e2 ; e3 4 2 5; @x2 10 as derivatives of the observation functional y D f .x1 ; x2 / establish the tangent vectors which span a linear manifold called Grassmann space. G2;3 D spanft1 ; t2 g R3 ; in short GRASSMANN (A). Such a notation becomes more obvious if we compute 3 nD3 mD2 a11 x1 C a12 x2 X X y y y y ei aij xj ; y D Œe1 ; e2 ; e3 4 a21 x1 C a22 x2 5 D i D1 j D1 a31 x1 C a32 x2 3 2 a11 nD3 X @y y y y 4 y 5 .x1 ; x2 / D Œe1 ; e2 ; e3 a21 D ei ai1 @x1 i D1 a31 3 2 a12 nD3 X @y y y y y .x1 ; x2 / D Œe1 ; e2 ; e3 4 a22 5 D ei ai 2 : @x2 i D1 a32 2
Indeed, the columns of the matrix A lay the foundation of GRASSMANN (A). Five, let us turn to GRASSMANN (B) which is based on the normal space R.A/? . The normal vector n D t1 t2 D .t1 ^ t2 / which spans GRASSMANN (B) is defined by the “cross product” identified by “^; ”, the skew product symbol as well as the Hodge star symbol. Alternatively, we are able to represent the normal vector n, (3.107), (3.109), (3.112), constituted by the columns col1A, col2A of the matrix, in terms of the Grassmann coordinates (Pl¨ucker coordinates). p23
ˇ ˇ ˇ ˇ ˇ ˇ ˇ a21 a22 ˇ ˇ a31 a32 ˇ ˇ a11 a12 ˇ ˇ ˇ ˇ D 1; ˇ ˇ ˇ D 8; p31 D ˇ D 9; p12 D ˇ Dˇ a31 a32 ˇ a11 a12 ˇ a21 a22 ˇ
identified as the subdeterminants of the matrix A, generated by
3-3 Case Study
159 nD3 X
.ei1 ^ ei2 /ai1 1 ai2 2 :
i1 ;i2 D1
If we normalize the vector b to b D b=jjbjj2 and the vector n to n D n=jjnjj2 , we are led to the first corollary b D n . The space spanned by the normal vector n, namely the linear manifold G1;3 R3 defines GRASSMANN (B). In exterior calculus, the vector built on Grassmann coordinates (Pl¨ucker coordinates) is called Grassmann vector g or normalized Grassmann vector g*, here 2
2
3 2 3 p23 8 g g WD 4 p31 5 D 4 9 5; g WD kgk 1 p12
2
3 8 1 4 9 5: D 146 1
The second corollary identifies b D n D g . “The vector b which constitutes the latent restriction (latent condition equation) coincides with the normalized normal vector n 2 R.A/? , an element of the space R.A/? , which is normal to the column space R.A/ of the matrix A. The vector b is built on the Grassmann coordinates (Pl¨ucker coordinates), Œp23 ; p31 ; p12 0 , subdeterminant of vector g in agreement with b .” Box 3.22. (Latent restrictions Grassmann coordinates (Plucker ¨ coordinates) the second example): 2 3 3 2 3 1 1 1 1 y1 x 4 y2 5 D 4 1 2 5 1 , A D 4 1 2 5; rkA D 2 x2 1 10 1 10 y3 2
.1st/ Hx D .A0 A/1 A0 1 92 79 25 Hx D .A0 A/1 A0 D 146 10 7 17 1 92y1 C 79y2 25y3 Ox D x` .I LESS/ D 146 10y1 7y2 C 17y3 .2nd/ Hy D A.A0 A/1 A0 2 3 82 72 8 1 4 72 65 9 5; rkHy D rkA D 2 Hy D .A0 A/1 A0 D 146 8 9 145 2 3 82y1 C 72y2 8y3 1 4 72y1 C 65y2 C 3y3 5 yO D y` .I LESS/ D 146 8y1 C 9y2 C 145y3
(3.93) (3.94)
(3.95)
(3.96)
160
3 The Second Problem of Algebraic Regression
yO3 D
1 .8y1 C 9y2 C 145y3 / 146
.3rd / R D I3 A.A0 A/1 A0 2 3 64 72 8 1 4 72 81 9 5 R D I3 A.A0 A/1 A0 D 146 8 9 1 r11 D
(3.97)
(3.98)
A0 A.1/ A0 A.2/ A0 A.3/ 64 81 1 D ; r D ; r D D D 22 33 146 det A0 A 146 det A0 A 146 det A0 A nD3
trR D
X 1 .A0 A/.i / D n rkA D n m D 1 det A0 A i D1 latent restriction
2 3 64 72 8 1 4 R D ŒB; C D 72 81 9 5; rkR D 1 146 8 9 1 2 3 2 3 64 0:438 1 4 b WD 72 5 D 4 0:493 5 146 8 0:053 2 3 2 3 8 0:662 1 b 4 9 5 D 4 0:745 5 b WD Dp kbk 146 1 0:083
(3.100)
b0 A D 0 , .b /0 A D 0
(3.102)
0
0
(3.99)
(3.101)
b yO D 0 , .b / yO D 0
(3.103)
8yO1 9yO2 C yO3 D 0
(3.104)
“R.A/ and R.A/? Tangent space Tx M2 versus normal space Nx M2 3 1;3 3 Grassmann manifold G2;3 m R versus Grassmann manifold Gnm R ” 2 3 1 @y y y y 00 4 “the first tangent vector W t1 WD D Œe1 ; e2 ; e3 1 5 @x1 1 2 3 1 @y y y y D Œe1 ; e2 ; e3 4 2 5 “the second tangent vector00 W t2 WD @x2 10 Gm;n
(3.105)
(3.106)
3-3 Case Study
161
G2;3 D spanft1 ; t2 g 2 R3 W Grassman.A/ “the normal vector” n WD t1 t2 D .t1 ^ t2 / t1 D
nD3 X
ei1 ai1 1
and
t1 D
i D1
nD
nD3 X
ei2 ai2 2
(3.108)
i D1
ei1 ei2 ai1 1 ai2 2 D
i1 ;i2 D1
nD3 X
(3.107)
nD3 X
.ei1 ^ ei2 /ai1 1 ai2 2
(3.109)
i1 ;i2 D1
8i; i1 ; i2 2 f1; : : : ; n D 3g nD
versus
n
D .e2 e3 /.a21 a32 a31 a22 / C .e3 e1 /.a31 a12 a11 a32 / C .e1 e2 /.a11 a22 a21 a12 / (3.110) 2
.e2 ^ e3 / D e2 e3 D e1 Hodge star operator W 4 .e3 ^ e1 / D e3 e1 D e2 (3.111)
.e1 ^ e2 / D e1 e2 D e3 2 3 8 y y y n D t1 t2 D .t1 t2 / D Œe1 ; e2 ; e3 4 9 5 (3.112) 1 2 3 8 n 1 y y y 4 9 5 n WD (3.113) D Œe1 ; e2 ; e3 p jjnjj 146 1
D e2 e3 .a21 a32 a31 a22 / Ce3 e1 .a31 a12 a11 a32 / Ce1 e2 .a11 a22 a21 a12 /
Corollary: b D n “Grassmann manifold Gnm;n ” G1;3 D span n R3 W Grassman.B/ Grassmann coordinates (Pl¨ucker coordinates) 2
3 ˇ ˇ ˇ ˇ ˇ ˇ 1 1 ˇa a ˇ ˇa a ˇ ˇa a ˇ A D 4 1 2 5; g.A/ WD ˇˇ 21 22 ˇˇ ; ˇˇ 31 32 ˇˇ ; ˇˇ 11 12 ˇˇ a31 a32 a11 a12 a21 a22 10 10ˇ ˇ ˇ ˇ ˇ ˇ ˇ 1 2 ˇ ˇ 1 10 ˇ ˇ 1 1 ˇ ˇ; ˇ ˇ; ˇ ˇ D f8; 9; 1g D ˇˇ 1 10 ˇ ˇ 1 1 ˇ ˇ 1 2 ˇ (cyclic order)
(3.114)
162
3 The Second Problem of Algebraic Regression
g.A/ D fp23 ; p31 ; p12 g p23 D 8; p31 D 9; p12 D 1 3 2 3 2 8 p23 Grassmann vector W g WD 4 p31 5 D 4 9 5 1 p12 2 3 8 1 g 4 9 5 Dp normalized Grassmann vector W g WD jjgjj 146 1
(3.115)
(3.116)
Corollary : b D n D g Now we are prepared to analyze the matrix A 2 R24 of our case study. Box 3.24 outlines first the redundancy matrix R 2 R24 (3.117) used for computing the inconsistency coordinates iO4 D i4 .I LESS/, in particular. Again it is proven that the leverage point t4 D 10 has little influence on this fourth coordinate of the inconsistency vector. The diagonal elements .r11 ; r22 ; r33 ; r44 / of the redundancy matrix are of focal interest. As partial redundancy numbers in (3.123) r11 D AA.1/ AA.2/ AA.3/ AA.4/ 57 67 73 3 det A0 A D 100 ; r22 D det A0 A D 100 ; r33 D det A0 A D 100 ; r44 D det A0 A D 100 ; they sum up to nD4
X 1 .A0 A/.i / D n rk; A D n m D 2 trR D det A0 A i D1 the degree of freedom of the I4 -LESS problem. Here for the second time we meet the subdeterminants .A0 A/.i / which are generated in a two-step procedure. “First step” “Second step” eliminate the i th row from A as well ascompute the determinant of the i th column of A0 A0 .i / A.i / . Box 3.23. (Redundancy matrix of a linear model of a uninvariant polynomial of degree one) – light leverage point a D 10 – “Redundancy matrix R D .I4 A.A0 A/1 A0 /” 2
3 57 37 31 11 7 1 6 6 37 67 29 1 7 I4 A.A0 A/1 A0 D 4 100 31 29 73 13 5 11 1 13 3
(3.117)
iO4 D i4 .I-LESS/ D Ry
(3.118)
3-3 Case Study
163
1 .11y1 y2 13y3 C 3y4 / iO4 D i4 .I-LESS/ D 100 57 67 73 3 ; r22 D ; r33 D ; r44 D r11 D 100 100 100 100 “rank partitioning”
(3.119) (3.120)
R 2 R44 ; rkR D n rkA D n m D 2; B 2 R42 ; C 2 R42 R D I4 A.A0 A/1 A0 D ŒB; C 2 3 57 37 7 1 6 6 37 67 7; then B0 A D 000 “if B WD 100 4 31 29 5
(3.121)
(3.122)
11 1
0
r11 D
0
A A.1/ A A.2/ A0 A.3/ A0 A.4/ ; r ; r ; r D D D 22 33 44 det A0 A det A0 A det A0 A det A0 A
(3.123)
nD4
trR D
X 1 .A0 A/.i / D n rkA D n m D 2 0 det A A i D1
(3.124)
Example: .A0 A/.1/ : A0 .1/ A.1/ –11 1 1 1 2 3 10 –
A0 .2/ A.2/ 1– 11 1 1– 2 3 10
A0 .3/ A.3/ 11– 1 1 12– 3 10
– – 1 1 1 2 1 3 1 10 3 15 15 113
.A0 A/.1/ D det.A0 .1/ A.1/ / D 114, det A0 A D 200
1 1 1 – – 2 1 3 1 10 3 14 14 110
.A0 A/.2/ D det.A0 .2/ A.2/ / D 134, det A0 A D 200
1 1 1 2 1 – – 3 1 10 3 13 13 105
.A0 A/.3/ D det.A0 .3/ A.3/ / D 146, det A0 A D 200
Example: .A0 A/.2/ :
Example:.A0A/.3/
164
3 The Second Problem of Algebraic Regression
Example:.A0A/.4/ A0 .2/ A.2/ 111 – 1 123– 10
1 1 1 2 1 3 1– – 10 3 6 6 10
.A0 A/.4/ D det.A0 .4/ A.4/ / D 6, det A0 A D 200 d
Again, the partial redundancies .r11 ; : : : ; r44 / are associated with the influence of the observation y1 ; y2 ; y3 or y4 on the total degree of freedom. Here the observations y1 ; y2 and y3 had the greatest influence, in contrast the observation y4 at the leverage point a very small impact. The redundancy matrix R will be properly analyzed in order to supply us with the latent restrictions or the details of “from A to B”. The rank partitioning R D ŒB; C; rkR D rk B D n m D 2, leads us to (3.22) of the matrix R of partial redundancies. Here, B 2 R42 , with two column vectors is established. Note C 2 R42 is a dimension identity. We already introduced the orthogonality conditions in (3.22) B0 A D 0 or B0 Ax` D B0 y` D 0; which establish the two latent conditions 57 37 31 11 yO1 yO2 yO3 C yO4 D 0 100 100 100 100
37 67 29 1 yO1 C yO2 yO3 yO4 D 0: 100 100 100 100
Let us identify in the context of this paragraph R.A/ and R.A/? for the linear model fAx C i WD y; A 2 Rmn ; rkA D m D 2g: The derivatives 2 3 2 3 1 1 6 6 7 @y @y 1 2 7 y y y y y y y y 7; t2 WD 7; t1 WD D Œe1 ; e2 ; e3 ; e4 6 D Œe1 ; e2 ; e3 ; e4 6 4 4 5 1 3 5 @x1 @x2 1 10 of the observational functional y D f .x1 ; x2 / generate the tangent vectors which span a linear manifold called Grassmann space G2;4 D spanft1 ; t2 g R4 ; in short GRASSMANN (A). An illustration of such a linear manifold is
3-3 Case Study
165
3 a11 x1 C a12 x2 nD2 nD4 X X y y y y 6 a21 x1 C a22 x2 7 y 7D y D Œe1 ; e2 ; e3 ; e4 6 ei aij xj ; 4 a31 x1 C a32 x2 5 i D1 j D1 a41 x1 C a42 x2 2 3 a11 nD4 X @y y y y y 6 a21 7 y 7D .x1 ; x2 / D Œe1 ; e2 ; e3 ; e4 6 ei ai1 ; 4 a31 5 @x1 i D1 a41 3 2 a12 nD4 X @y y y y y 6 a22 7 y 7D .x1 ; x2 / D Œe1 ; e2 ; e3 ; e4 6 ei ai 2 : 4 a32 5 @x2 2
a42
i D1
Box 3.24. (Latent restrictions Grassmann coordinates (Plucker ¨ coordinates): the first example) B0 A D 0 , B0 y D 0 2 2 3 3 1 1 57 37 61 2 7 7 1 6 6 37 67 7 7 AD6 4 1 3 5 ) B D 100 4 31 29 5 1 10 11 1
(3.125)
(3.126)
“latent restriction” 57yO1 37yO2 31yO3 C 11yO4 D 0
(3.127)
37yO1 C 67yO2 29yO3 yO4 D 0
(3.128) 2
R.A/ W the tangent space Tx M
the Grassmann manifold G2;4 2 3 1 6 @y 17 y y y y 7 “the first tangent vector00 W t1 WD Œe1 ; e2 ; e3 ; e4 6 4 15 @x1
(3.129)
1 2
3 1 @y y y y y 6 2 7 7 “the second tangent vector00 W t2 WD Œe1 ; e2 ; e3 ; e4 6 43 5 @x2 10 G2;4 D spanft1 ; t2 g R4 ; Grassman.A/
(3.130)
166
3 The Second Problem of Algebraic Regression
“the first normal vector” W n1 WD
b1 jjb1 jj
jjb1 jj2 D 104 .572 C 372 C 312 C 112 / D 57 102 2 3 0:755 y y y y 6 0:490 7 7 n1 D Œe1 ; e2 ; e3 ; e4 6 4 0:411 5 0:146 “the second normal vector00 W n2 WD
b2 jjb2 jj
jjb2 jj2 D 104 .372 C 672 C 292 C 12 / D 67 102 2 3 0:452 y y y y 6 0:819 7 7 n2 D Œe1 ; e2 ; e3 ; e4 6 4 0:354 5
(3.131) (3.132)
(3.133)
(3.134) (3.135)
(3.136)
0:012 Grassmann coordinates (Pl¨ucker coordinates) 2
3 1 1 ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ1 1ˇ ˇ1 1ˇ ˇ1 1 ˇ ˇ1 2ˇ ˇ1 2 ˇ ˇ1 3 ˇ 61 2 7 ˇ D ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ 6 7 ; ; ; ; ; ) g.A/ WD ˇ AD4 1 2 ˇ ˇ 1 3 ˇ ˇ 1 10 ˇ ˇ 1 3 ˇ ˇ 1 10 ˇ ˇ 1 10 ˇ 1 3 5 1 10 D fp12 ; p13 ; p14 ; p23 ; p24 ; p34 g (3.137) p12 D 1; p13 D 2; p14 D 9; p23 D 1; p24 D 8; p34 D 7: Again, the columns of the matrix A lay the foundation of GRASSMANN (A). Next we turn to GRASSMANN (B) to be identified as the normal space R.A/? . The normal vectors 3 3 2 b11 b21 7 y y y y 6 b21 7 y y y y 6 7 6 b22 7 n1 D Œe1 ; e2 ; e3 ; e4 6 4 b31 5 jjcol1 Bjj; n2 D Œe1 ; e2 ; e3 ; e4 4 b32 5 jjcol2 Bjj b41 b42 2
are computed from the normalized column vectors of the matrix B D Œb1 ; b2 . The normal vectors fn1 ; n2 g span the normal space R.A/? , also called GRASSMANN(B). Alternatively, we may substitute the normal vectors n1 and n2 by the Grassmann coordinates (Pl¨ucker coordinates) of the matrix A, namely by the Grassmann column vector.
3-3 Case Study
167
p12 p23
ˇ ˇ1 D ˇˇ ˇ1 ˇ1 D ˇˇ 1
ˇ ˇ ˇ ˇ ˇ ˇ1 1ˇ ˇ1 1 ˇ 1 ˇˇ ˇ ˇ ˇ ˇ D 1; p13 D ˇ ˇ D 2; p14 D ˇ 1 10 ˇ D 9 2 ˇˇ 1 3 ˇ ˇ ˇ ˇ ˇ1 2 ˇ ˇ1 3 ˇ 2 ˇˇ ˇ ˇ ˇ ˇ D D D 1; p D 8; p 24 34 ˇ 1 10 ˇ ˇ 1 10 ˇ D 7 3ˇ n D 4; m D 2; n m D 2
nD4 X
.ei1 ^ ei2 /ai1 1 ai2 2 D
i1 ;i2 D1
1 2Š i
nD4 X
ej1 ^ej2 "i1 ;i2 ;j1 ;j2 ai1 1 ai2 2
1 ;i2 ;j1 ;j2 D1
3 2 3 p12 1 6p 7 627 6 13 7 6 7 7 6 7 6 6p 7 697 g WD 6 14 7 D 6 7 2 R61 : 6 p23 7 6 1 7 7 6 7 6 4 p24 5 4 8 5 p34 7 2
?How do the vectors b1, b2,n1, n2 and g relate to each other? Earlier we already normalized, fb1 ;b2 g to fb1 ; b2 g; when we constructed fn1 ; n2 g. Then we are left with the question how to relate fb1 ;b2 g and fn1 ; n2 g to the Grassmann column vector g. The elements of the Grassmann column vector g.A/ associated with matrix A are the Grassmann coordinates (Pl¨ucker coordinates)fp12; p13 ; p14 ; p23 ; p24 ; p34 g in lexicographical order. They originate from the dual exterior form ˛m D ˇnm where ˛m is the original m-exterior form associated with the matrix A. n D 4; n m D 2 nD4
˛2 WD
1 X ei ^ ei2 ai1 ai2 2Š i i D1 1 1; 2
D
1 1 e1 ^ e2 .a11 a22 a21 a12 / C e1 ^ e3 .a11 a32 a31 a22 / 2Š 2Š 1 1 C e1 ^ e4 .a11 a42 a41 a12 / C e2 ^ e3 .a21 a32 a31 a22 / 2Š 2Š 1 1 C e2 ^ e4 .a21 a42 a41 a22 / C e3 ^ e4 .a31 a42 a41 a32 / 2Š 2Š
1 ˇ WD ˛2 .R / D 4i 4
nD4 X 1; i2 ;j1 ;j2 D1
ej1 ^ ej2 "i1 i2 j1 j2 ai1 1 ai1 2
168
3 The Second Problem of Algebraic Regression
D
1 1 1 e3 ^ e4 p12 C e2 ^ e4 p13 C e3 ^ e2 p14 4 4 4 1 1 1 C e4 ^ e1 p23 C e3 ^ e1 p24 C e1 ^ e2 p34 : 4 4 4
The Grassmann coordinates (Pl¨ucker coordinates) fp12 ; p13 ; p14 ; p23 ; p24 ; p34 g refer to the basis fe3 ^ e4 ; e2 ^ e4 ; e3 ^ e2 ; e4 ^ e1 ; e3 ^ e1 ; e1 ^ e2 g: Indeed the Grassmann space G2;4 spanned by such a basis can be alternatively covered by the chart generated by the column vectors of the matrix B, 2 WD
nD4 X
ej1 ej2 bj1 bj2 2 GRASSMANN.B/;
j1 ;j2
a result which is independent of the normalisation of fbj1 1 ; bj2 2 g. As a summary of the result of the two examples (i) A 2 R32 and (ii) A 2 R42 for a general rectangular matrix A 2 Rnm is needed. “The matrix B constitutes the latent restrictions also called latent condition equations. The column space R.B/ of the matrix B coincides with complementary column space R.A/? orthogonal to column space R.A/ of the matrix A. The elements of the matrix B are the Grassmann coordinates, also called Plucker ¨ coordinates, special sub determinants of the matrix A D Œai1 ; : : : ; ai m pj1 j2 WD
n X
"i1 im j1 jnm ai1 1 aim m :
i1 ;:::;im D1
The latent restrictions control the parameter adjustment in the sense of identifying outliers or blunders in observational data.”
3-34 From B to A: Latent Parametric Equations, Dual ¨ Grassmann Coordinates, Dual Plucker Coordinates While in the previous paragraph we started from a given matrix A 2 Rnm , n > m; rkA D m representing a special inconsistent systems of linear equations y D Ax C i, namely in order to construct the orthogonal complement R.A/? of R.A/, we now reverse the problem. Let us assume that a matrix B 2 Rln , ` < n, rkB D ` is given which represents a special inconsistent system of linear homogeneous condition equations B0 y D B0 i. How can we construct the orthogonal complement R.A/? of R.B/ and how can we relate the elements of R.B/? to the matrix A of parametric adjustment?
3-3 Case Study
169
First, let us depart from the orthogonality condition B0 A D 0 or A0 B D 0 we already introduced and discussed at length. Such an orthogonality condition had been the result of the orthogonality of the vectors y` D yO .LES S / and i` D Oi.LESS/. We recall the general condition of the homogeneous matrix equation. B0 A D 0 , A D ŒI` B.B0 B/1 B0 Z; which is, of course, not unique since the matrix Z 2 Rll is left undetermined. Such a result is typical for an orthogonality conditions. Second, let us construct the Grassmann space G`;n , in short GRASSMANN (B) as well as the Grassmann space Gn`;n , in short GRASSMANN (A) representing R.B/ and R.B/? , respectively. ` D
n 1 X ej ^ ^ ej ` b j 1 1 b j ` ` `Š j j D1 1 1
ın` WD ` D
1 .n `/Š i
(3.138)
`
n X 1 ;:::;in` ;j1 ;:::;j`
1 ei ^ ^ ein` "i1 in` j1 j` bj1 1 bj` ` : `Š 1 D1
The exterior form ` which is built on the column vectors fbj11 ; : : : ; bj` ` g of the matrix B 2 Rln is an element of the column space R.B/. Its dual exterior form
D ın` , in contrast, is an element of the orthogonal complement R.B/? . qi1 in` WD "i1 in` j1 j` bj1 1 bj` `
(3.139)
denote the Grassmann coordinates (Pl¨ucker coordinates) which are dual to the Grassmann coordinates (Pl¨ucker coordinates) pj1 jnm q WD Œqi1 : : : qn` is constituted by subdeterminants of the matrix B, while pWDŒpj1 : : : pnm by subdeterminants of the matrix A. The .˛; ˇ; ; ı/-diagram of Fig. 3.9 is commutative. If R.B/ D R.A/? , then R.B/? D R.A/. Identify ` D n m in order to convince yourself about the .˛; ˇ; ; ı/- diagram to be commutative. ˛m ! ˛m D ˇnm D nm ! nm D ˇnm D ˛m D .1/m.nm/ ˛m dn–l = ∗g1 id (l = n – m)
Fig. 3.9 Commutative diagram
am
g1 id (l = n – m) bn–m = ∗am
170
3 The Second Problem of Algebraic Regression
Third, let us specialize R.A/ D R.B/? and R.A/? D R.B/ by ` D n m. ˛m ! ˛m D ˇnm D nm ! nm D ˇnm D ˛m D .1/m.nm/ ˛m (3.140) The first and second example will be our candidates for test computations of the diagram of Fig. 3.9 to be commutative. Box 3.26 reviews direct and inverse Grassmann coordinates (Pl¨ucker coordinates) for A 2 R32 , B 2 R31 , Box 3.27 for A 2 R42 , B 2 R41 . Box 3.25. (Direct and inverse Grassmann coordinates (Plucker ¨ coordinates)) first example The forward computation 2
3 1 1 nD3 nD3 X X A D 4 1 2 5 2 R32 W a1 D ei1 ai1 1 and a2 D ei2 ai2 2 i1 D1 i2 D1 1 10 ˛2 WD
nD3 X 1 ei1 ^ ei2 ai1 1 ai2 2 2 ƒ2 .R3 / ƒm .Rn / 2Š i ;i D1 1 2
nD3 X
ˇ1 WD ˛2 WD
i1 ;i2 ;j1
1 ej1 "i1 i2 j1 ai1 1 ai2 2 2 ƒ2 .R3 / ƒm .Rn / 2Š D1
Grassmann coordinates (Pl¨ucker coordinates) ˇ1 D
1 1 1 e1 p23 C e2 p31 C e3 p12 2 2 2
p23 D a21 a32 a31 a22 ; p31 D a31 a12 a11 a32 ; p12 D a11 a22 a21 a12 ; p23 D 8; p31 D 9; p12 D 1 The backward computation 1 WD
nD3 X i1 ;i2 ;j1 D1
1 ej "i i j ai 1 ai 2 D e1 p23 C e2 p31 C e3 p12 2 ƒ1 .R3 / 1Š 1 1 2 1 1 2
ı2 WD 1 WD
ı2 D
1 2Š
1 2Š
nD3 X
ei1 ^ ei2 "i1 i2 j1 "j2 j3 j1 aj2 1 aj3 2 )
i1 ;i2 ;j1 ;j2 ;j3 D1
nD3 X i1 ;i2 ;j1 ;j2 ;j3 D1
ei1 ^ ei2 .ıi1 j2 ıi2 j3 ıi1 j3 ıi2 j2 /aj2 1 aj3 2
3-3 Case Study
171 nD3 1 X ei1 ^ ei2 ai1 1 ai2 2 D ˛2 2 ƒ2 .R3 / ƒm .Rn / 2
ı2 D
i1 ;i2 D1
inverse Grassmann coordinates (dual Grassmann coordinates, dual Pl¨ucker coordinates) ı2 D ˛2 D
1 1 e2 ^ e3 .a21 a32 a31 a22 / C e3 ^ e1 .a31 a12 a11 a32 / 2 2
1 C e1 ^ e2 .a11 a22 a21 a12 / 2 1 1 1 ı2 D ˛2 D e2 ^ e3 q23 C e2 ^ e3 q31 C e2 ^ e3 q12 2 ƒ2 .R3 /: 2 2 2 Box 3.26. (Direct and inverse Grassmann coordinates (Plucker ¨ coordinates)) second example The forward computation 2
3 1 1 nD4 nD4 X X 61 2 7 7 2 R42 W a1 D e a and a D ei2 ai2 2 AD6 i i 1 2 1 1 41 3 5 i1 D1 i2 D1 1 10 ˛2 WD
nD4 X 1 ei1 ^ ei2 ai1 1 ai2 2 2 ƒ2 .R4 / ƒm .Rn / 2Š i ;i D1 1 2
ˇ2 WD ˛2 WD
1 2Š i
nD4 X 1 ;i2 ;j1 ;j2
1 ej1 ^ ej2 "i1 i2 j1 j2 ai1 1 ai2 2 2 ƒ2 .R4 / ƒnm .Rn / 2Š D1
1 1 1 e3 ^ e4 p12 C e2 ^ e4 p13 C e3 ^ e2 p14 C 4 4 4 1 1 1 C e4 ^ e1 p23 C e3 ^ e1 p24 C e1 ^ e2 p34 4 4 4 p12 D 1; p13 D 2; p14 D 9; p23 D 1; p34 D 7
ˇ2 D
The backward computation 2 W D
1 2Š i
nD4 X
ej1 ^ ej2 "i1 i2 j1 j2 ai1 1 ai2 2 ƒ2 .R4 / ƒnm .Rn /
1 ;i2 ;j1 ;j2 D1
ı2 W D 2 WD
1 2Š i
nD4 X 1 ;i2 ;j1 ;j2 ;j 3 ;j4
1 ei1 ^ ei2 "i1 i2 j1 j2 "j1 j2 j3 j4 aj3 1 aj4 2 D 2Š D1
172
3 The Second Problem of Algebraic Regression
D ˛2 2 ƒ2 .R4 / ƒm .Rn / nD4
ı2 D ˛2 D
1 X ei ^ ei2 ai1 1 ai2 2 4 i ;i D1 1 1 2
1 1 1 e3 ^ e4 q12 C e2 ^ e4 q13 C e3 ^ e2 q14 C 4 4 4 1 1 1 C e4 ^ e1 q23 C e3 ^ e1 q24 C e1 ^ e2 q34 4 4 4
ı2 D ˛2 D
q12 D p12 ; q13 D p13 ; q14 D p14 ; q23 D p23 ; q24 D p24 ; q34 D p34 .
3-35 Break Points Throughout the analysis of high leverage points and outliers within the observational data we did assume a fixed linear model. In reality such an assumption does not apply. The functional model may change with time as Fig. 3.9 indicates. Indeed we have to break-up the linear model into pieces. Break points have to be introduced as those points when the linear model changes. Of course, a hypothesis test has to decide whether the break point exists with a certain prob-ability. Here we only highlight the notion of break points in the context of leverage points. For localizing break points we apply the Gauss–Jacobi Combinatorial Algorithm following J.L. Awange and E. W. Grafarend 2005 (2002), J. L. Awange (2002), A. T. Hornoch (1950), S. Wellisch (1910). Table 3.1 summarises a set of observations yi with n D 10 elements. Those measurements have been taken at time instants ft1 ; : : : ; t10 g. Figure 3.9 illustrates the graph of the corresponding function y.t/. By means of the celebrated Gauss– Jacobi Combinatorial Algorithm we aim at localizing break points. First, outlined in Box 3.28 we determine all the combinations of two points which allow the fit
Table 3.1 Test “ break points” observations for a piecewise linear model y t y1 y2 y3 y4 y5 y6 y7 y8 y9 y10
1 2 2 3 2 1 0.5 2 4 4.5
1 2 3 4 5 6 7 8 9 10
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
3-3 Case Study
173
Table 3.2
of a straight line without any approximation error. As a determined linear model y D Ax, A 2 R22 , rkA D 2 namely x D A1 y we calculate (3.142) x1 and (3.143) x2 in a closed form. For instance, the pair of observations .y1 ; y2 /, in short (1, 2) at .t1 ; t2 / D .1; 2/ determines .x1 ; x2 / D .0; 1/. Alternatively, the pair of observations .y1 ; y3 /, in short (1, 3), at .t1 ; t3 / D .1; 3/ leads us to .x1 ; x2 / D .0:5; 0:5/. Table 3.2 contains the possible 45 combinations which determine .x1 ; x2 / from .y1 ; : : : ; y10 /. These solutions are plotted in Figure 3.10.
174
3 The Second Problem of Algebraic Regression
Fig. 3.10 Top left: Graph of the function y(t), two break points. Top right: Gauss–Jacobi Combinatorial Algorithm, piecewise linear model, 1st cluster: (ti ; tj ). Bottom left: Gauss–Jacobi Combinatorial Algorithm, 2nd cluster: (ti ; tj ). Bottom right: Gauss–Jacobi Combinatorial Algorithm, 3rd cluster: (ti ; tj )
Box 3.27. (Piecewise linear model Gauss–Jacobi combinatorial algorithm) 1st step x1 y.ti / 1 ti D Ax 8i < j 2 f1; : : : ; ng yD D y.tj / 1 tj x2 y 2 R2 ; A 2 R22 I rkA D 2; x 2 R2 1 y.ti / tj ti x1 D x D A1 y , x2 y.tj / tj ti 1 1 x1 D
tj y1 ti y2 tj ti
and x2 D
yj yi tj ti
ti D t1 D 1; tj D t2 D 2 Example: y.t1 / D y1 D 1; y.t2 / D y2 D 2 x1 D 0; x2 D 1:
(3.141)
(3.142) (3.143)
3-3 Case Study
175
ti D t1 D 1; tj D t3 D 3 Example: y.t1 / D y1 D 1; y.t3 / D y3 D 2 x1 D 0:5 and x2 D 0:5 : Second, we introduce the pullback operation Gy ! Gx : The matrix of the metric Gy of the observation space Y is pulled back to generate by (3.144) the matrix of the metric Gx of the parameter space X for the “determined linear model” y D Ax, A 2 R22 ; rkA D 2, namely Gx D A0 Gy A. If the observation space y y y y Y D spanfe1 ; e2 g is spanned by two orthonormal vectors e1 ; e2 relating to a pair of observations .yi ; yj /; i < j; i; j 2 f1; : : : ; 10g, then the matrix of the metric Gy D I2 of the observation space is the unit matrix. In such an experimental situation (3.145) Gx D A0 A is derived. For the first example .ti ; tj / D .1; 2/ we are led to vech Gx D Œ2; 3; 50 . “Vech half ” shortens the matrix of the metric Gx 2 R22 of the parameter space X Ö .x1 ; x2 / by stacking the columns of the lower triangle of the symmetric matrix Gx . Similarly, for the second example .ti ; tj / D .1; 3/ we produce vech Gx D Œ2; 4; 100 . For all the 45 combinations of observations .yi ; yj /. In the last column Table 3.2 contains the necessary information of the matrix of the metric Gx of the parameter space X formed by .vech Gx /0 . Indeed, such a notation is quite economical. Box 3.28. (Piecewise linear model: Gauss–Jacobi combinatorial algorithm) 2nd step pullback of the matrix GX the metric from Gy G x D A0 G y A
(3.144)
“if Gy D I2 , then Gx D A0 A” " 0
Gx D A A D
2 ti C tj ti C tj ti2 C tj2
# 8 i < j 2 f1; : : : ; ng:
(3.145)
Example: ti D t1 D 1, tj D t2 D 2
23 Gx D ; vechGx D Œ2; 3; 50 : 35 Example: ti D t1 D 1, tj D t3 D 3
2 4 Gx D ; vechGx D Œ2; 4; 100 : 4 10 Third, we are left the problem to identify the break points. C.F. Gauss (1828) and C.G.J. Jacobi (1841) have proposed to take the weighted arithmetic mean of the
176
3 The Second Problem of Algebraic Regression
combinatorial solutions .x1; x2/.1;2/ ; .x1; x2/.1;3/ , in general .x1; x2/.i;j / ; i < j , are considered as Pseudo-observations. Box 3.29. (Piecewise linear model: Gauss–Jacobi combinatorial algorithm) 3rd step pseudo-observations Example 2
3 2 3 2 3 .1;2/ x1 10 i1 6 .1;2/ 7 6 6 i2 7 x 6 x2 7 6 0 1 7 1 7 7 C6 6 .1;3/ 7 D 4 4 i3 5 4 x1 5 1 0 5 x2 .1;3/ i4 01 x2
2 R41
(3.146)
Gx -LESS 2
3 .1;2/ x1 .1;2/ 7 1 .1;2/ .1;3/ 6 xO 1 6 x2 7 D G.1;2/ Gx ; Gx C G.1;3/ x` WD xO D 6 .1;3/ 7 x x 4 x1 5 xO 2 .1;3/ x2 vechG.1;2/ D Œ2; 3; 50 ; vechG.1;3/ D Œ2; 4; 100 x x 23 2 4 .1;3/ G.1;2/ D ; G D x x 35 4 10 " # .1;2/ x1 0 D ; .1;2/ 1 x2 " # .1;3/ x1 0:5 D .1;3/ 0:5 x2
2 R21 (3.147)
1 1 15 7 4 7 D ŒG.1;2/ D C G.1;3/ 1 x x 7 15 11 7 4 1 6 6 xO 1 D 0:545; 454: D x` WD , xO 1 D xO 2 D xO 2 11 6 11 G1 x D
Box 3.30 provides us with an example. For establishing the third step of the Gauss– Jacobi Combinatorial Algorithm. We outline GX -LESS for the set of pseudoobservations (3.146) .x1 ; x2 /.1;2/ and .x1 ; x2 /.1;3/ solved by (3.177) x` D .xO 1 ; xO 2 /.
3-3 Case Study
177 .1;2/
.1;3/
The matrices Gx and Gx representing the metric of the parameter space X derived from .x1 ; x2 /.1;2/ and .x1 ; x2 /.1;3/ are additively composed and inverted, a result which is motivated by the special design matrix A D ŒI2 ; I2 0 of “direct” .1;2/ .1;3/ pseudo-observations. As soon as we implement the weight matrices Gx and Gx .1;2/ .1;3/ from Table 3.2 as well as .x1 ; x2 / and .x1 ; x2 / we are led to the weighted arithmetic mean xO 1 D xO 2 D 6=11. Such a result has to be compared with the componentwise median x1 (median) = 1/4 and x2 .med i an/ D 3=4. 2
.1; 2/; .1; 3/ .1; 2/; .1; 3/ 6 combination combination 6 6 median 6 Gx -LESS 6 4 xO 1 D 0:545; 454 x1 .median/ D 0:250 xO 2 D 0:545; 454 x2 .median/ D 0:750 : .1;2/
.1;3/
.1;2/
.1;3/
Here, the arithmetic mean of x1 ; x1 and x2 ; x2 median neglecting the weight of the pseudo-observations.
coincides with the
Box 3.30. (Piecewise linear models and two break points: “Example”) 3 x1 2 3 2 3 6 x2 7 7 2i 3 y1 1 n1 tn1 0 0 0 0 6 6 7 y1 x 37 4 y2 5 D 4 0 0 1n tn 0 0 5 6 4 iy 5 C 7 6 2 2 2 6 x4 7 y3 iy3 0 0 0 0 1 n3 tn3 6 7 4 x5 5 x6 2
(3.148)
In1 -LESS; In2 -LESS; In3 -LESS; 0 1 x1 .t n1 tn1 /.10 n1 yn1 / .10 n1 tn1 /.t0 n1 yn1 / (3.149) D .10 n1 tn1 /.10 n1 y1 / C n1 t0 n1 y1 x2 ` n1 t0 n1 tn1 .10 n1 tn1 /2 0 1 x3 .t n2 tn2 /.10 n2 yn2 / .10 n2 tn2 /.t0 n2 yn2 / (3.150) D x4 ` .10 n2 tn2 /.10 n2 y2 / C n2 t0 n2 y2 n2 t0 n2 tn2 .10 n2 tn2 /2 0 1 x5 .t n3 tn3 /.10 n3 yn3 / .10 n3 tn3 /.t0 n3 yn3 / : (3.151) D x6 ` .10 n3 tn3 /.10 n3 y3 / C n3 t0 n3 y3 n3 t0 n3 tn3 .10 n3 tn3 /2
Box 3.31. (Piecewise linear models and two break points: “Example”) 1st interval: n = 4, m = 2, t 2 ft1 ; t2 ; t3 ; t4 g
178
3 The Second Problem of Algebraic Regression
2
1 61 y1 D 6 41 1
3 t1 t2 7 7 x1 C iy D 1n x1 C tn x2 C iy 1 1 t3 5 x2 t4
(3.152)
2nd interval: n = 4, m = 2, t 2 ft4 ; t5 ; t6 ; t7 g 2
1 61 y2 D 6 41 1
3 t4 t5 7 7 x3 C iy D 1n x3 C tn x4 C iy 2 2 t6 5 x4 t7
(3.153)
3rd interval: n = 4, m = 2, t 2 ft7 ; t8 ; t9 ; t10 g 3 1 t7 6 1 t8 7 x5 7 6 y3 D 4 C iy3 D 1n x5 C tn x6 C iy3 : 1 t9 5 x6 1 t10 2
(3.154)
Figure 3.10 have illustrated the three clusters of combinatorial solutions referring to the first, second and third straight line. Outlined in Box 3.31 and Box 3.31, namely by (3.148) to (3.151), n1 D n2 D n3 D 4, we have computed .x1 ; x2 /` for the first segment, .x3 ; x4 /` for the second segment and .x5 ; x6 /` for the third segment of the least squares fit of the straight line. Table 3.3 contains the results explicitly. Similarly, by means of the Gauss–Jacobi Combinatorial Algorithm of Table 3.4 we have computed the identical solution .x1 ; x2 /` , .x3 ; x4 /` and .x5 ; x6 /` as “weighted arithmetic means” numerically presented only for the first segment. Table 3.3 (In -LESS solutions for those segments of a straight line, two break points)
0:5 D W y.t/ D 0:5 C 0:6t I4 -LESS W 0:6 ` 1 126 x3 W y.t/ D 6:30 0:85t I4 -LESS W D x4 ` 20 17 1 183 x5 I4 -LESS W D W y.t/ D 9:15 C 1:40t x6 ` 28 20 x1 x2
Table 3.4 (Gauss–Jacobi Combinatorial Algorithm for the first segment of a straight line)
3-3 Case Study
179
i1 h .1;2/ .1;3/ .1;4/ .2;3/ .2;4/ .3;4/ D Gx C Gx C Gx C Gx C Gx C Gx
2 .1;2/ 3 x1 6 x .1;2/ 7 6 2 7 6 .1;3/ 7 6 x1 7 6 .1;3/ 7 6 x2 7 6 .1;4/ 7 6x 7 6 1 7 h i 6 x .1;4/ 7 7 .1;2/ .1;3/ .1;4/ .2;3/ .2;4/ .3;4/ 6 2 Gx ; Gx ; Gx ; Gx ; Gx ; Gx 6 .2;3/ 7 6 x1 7 6 .2;3/ 7 6x 7 6 2 7 6 x .2;4/ 7 6 1 7 6 .2;4/ 7 6 x2 7 6 .3;4/ 7 4 x1 5 xO 1 xO 2
(3.155)
.3;4/
x2
G.1;2/ x
C
G.1;3/ x
C
G.1;4/ x
" G.1;2/ x " G.1;3/ x " G.1;4/ x " G.2;3/ x " G.2;4/ x " G.3;4/ x
.1;2/
x1 .1;2/ x2 .1;3/
x1 .1;3/ x2 .1;4/
x1 .1;4/ x2 .2;3/
x1 .2;3/ x2 .2;4/
x1 .2;4/ x2
C
G.2;3/ x
#
23 D 35
#
D
#
D
# #
C
G.2;4/ x
C
1 G.3;4/ x
1 15 5 D 30 5 2
0 3 D 1 5
2 4 4 10 2 5 5 17
0:5 3 D 0:5 7
0:333 4 D 0:666 13
2 4 D 0 10
2 5 D 5 13 2 6 D 6 20
1 5 D 0:5 16
# .3;4/ x1 2 7 1 5 D D .3;4/ 7 25 1 18 x2 1 15 5 24 0:5 xO 1 D : D 69 0:6 xO 2 30 5 2
180
3 The Second Problem of Algebraic Regression
3-4 Special Linear and Nonlinear Models: A Family of Means for Direct Observations In case of direct observations, LESS of the inconsistent linear model y D 1n x C i has led to ˇ 1 x` D argfy D 1n x C i ˇ jj i jj2 D min g; x` D .10 1/1 10 y D .y1 C C yn /: n Such a mean has been the starting point of many alternatives we present to you by Table 3.5 based upon S.R. Wassels (2002) review.
3-5 A Historical Note on C.F. Gauss and A.M. Legendre The inventions of Least Squares and its generalization The historian S.M. Stigler (1999, pp 320, 330–331) made the following comments on the history of Least Squares. “The method of least squares is the automobile of modern statistical analysis: despite its limitations, occasional accidents, and incidental pollution, this method and its numerous variations, extensions, and related conveyances carry the bulk of statistical analyses, and are known and valued by nearly all. But there has been some dispute, historically, as to who is the Henry Ford of statistics. Adrian Marie Legendre published the method in 1805, an American, Robert Adrian, published the method in late 1808 or early 1809, and Carl Fiedrich Gauss published the method in 1809. Legendre appears to have discovered the method in early 1805, and Robert Adrain may have “discovered” it in Legendre’s 1805 book (Stigler 1977c, 1978c), but in 1809 Gauss had the temerity to claim that he had been using the method since 1795, and one of the most famous priority disputes in the history of science was off and running. It is unnecessary to repeat the details of the dispute – R.L. Plackett (1972) has done a masterly job of presenting and summarizing the evidence in the case.
Table 3.5 (A family of means) Name arithmetic mean
Formula 1 x` D .y1 C C yn / n 1 n D 2 W x` D .y1 C y2 / 2 x` D .10 Gy 1/1 10 Gy y if Gy D Diag.g1 ; : : : ; g1 /
weighted arithmetic mean
x` D
g1 y1 C gn yn g1 C C gn
(continued)
3-5 A Historical Note on C.F. Gauss and A.M. Legendre
181
Table 3.5 (continued) Name
Formula
geometric mean
p xg D n y1 yn D
logarithmic mean
xlog
n Q
1=n
yi pi D1 n D 2 W xg D y1 y2 1 D .ln y1 C C ln yn / n xlog D ln xg y.1/ < < y.n/
median
Wassels family of means
ordered set of observations 2 y.kC1/ if n D 2k C 1 6 “add00 med y D 6 4 .y.k/ C y.kC1/ /=2 if n D 2k “even00 .y1 /pC1 C C .yn /pC1 .y1 /p C C .yn /p n n P P xp D .yi /pC1 .yi /p
xp D
i D1
i D1
Case p D 0 W xp D x` Case p D 1=2; n D 2 W xp D x` Hellenic mean
Casep D 1 W nD2W 1 1 y1 C y21 2y1 y2 D : H D H.y1 ; y2 / D 2 y1 C y2
Let us grant, then, that Gauss’s later accurate were substantially accunate, and that he did device the method of least squares between 1794 and 1799, independently of Legendre or any other discoverer. There still remains the question, what importance did he attach to the discovery? Here the answer must be that while Gauss himself may have felt the method useful, he was unsuccessful in communicating its importance to other before 1805. He may indeed have mentioned the method to Olbers, Lindemau, or von Zach before 1805, but in the total lack of applications by others, despite ample opportunity, suggests the message was not understood. The fault may have been more in the listener than in the teller, but in this case its failure serves only to enhance our admiration for Legendre’s 1805 success. For Legendre’s description of the method had an immediate and widespread effect – as we have seen, it even caught the eye and under-standing of at least one of those astronomers (Lindemau) who had been deaf to Gauss’s message, and perhaps it also had an influence upon the form and emphasis of Gauss’s exposition of the method. When Gauss did publish on least squares, he went far beyond Legendre in both conceptual and technical development, linking the method to probability and providing algorithms for the computation of estimates. His work has been discussed often, including by
182
3 The Second Problem of Algebraic Regression
H.L. Seal (1967), L. Eisenhart (1968), H. Goldsteine (1977, 4.9, 4.10), D.A. Sprott (1978), O.S. Sheynin (1979, 1993, 1994), S.M. Stigler (1986), J.L. Chabert (1989), W.C. Waterhouse (1990), G.W. Stewart (1995), and J. Dutka (1996). Gauss’s development had to wait a long time before finding an appreciative audience, and much was intertwined with other’s work, notably Laplace’s. Gauss was the first among mathematicians of the age, but it was Legendre who crystallized the idea in a form that caught the mathematical public’s eye. Just as the automobile was not the product of one man of genius, so too the method of least squares is due to many, including at least two independent discoverers. Gauss may well have been the first of these, but he was no Henry Ford of statistics. If these was any single scientist who first put the method within the reach of the common man, it was Legendre.”
Indeed, these is not much to be added. G.W. Stewart (1995) recently translated the original Gauss text “Theoria Combinationis Observationum Erroribus Minimis Obmaxial, Pars Prior. Pars Posterior” from the Latin origin into English. F. Pukelsheim (1998) critically reviewed the sources, the reset Latin text and the quality of the translation. Since the English translation appeared in the SIAM series “Classics in Applied Mathematics”, he concluded: “Opera Gaussii contra SIAM defensa”. “Calculus probilitatis contra La Place defenses.” This is Gauss’s famous diary entry of 17 June 1798 that he later quoted to defend priority on the Method of Least Squares (Werke, Band X, 1, p.533).
C.F. Gauss goes Internet With the Internet Address http://gallica.bnf.fr you may reach the catalogues of digital texts of Bibliotheque Nationale de France. Fill the window “Auteur” by “Carl Friedrich Gauss” and you reach “Types de documents”. Continue with “Touts les documents” and click “Rechercher” where you find 35 documents numbered 1 to 35. In total, 12732 “Gauss pages” are available. Only the Gauss–Gerling correspondence is missing. The origin of all texts are the resources of the Library of the Ecole Polytechnique. Meanwhile Gauss’s Werke are also available under http://www.sub.uni-goettingen.de/. A CD-Rom is available from “Nieders¨achsische Staats- und Universit¨atsbibliothek.” For the early impact of the Method of Least Squares on Geodesy, namely W. Jordan, we refer to the documentary by S. Nobre and M. Teixeira (2000).
Chapter 4
The Second Problem of Probabilistic Regression Special Gauss–Markov method without Datum Defect Setup of BLUUE for the moments of first order and of BIQUUE for the central moment of second order In probabilistic regression we study the determination of first moments as well as second central moments of type special Gauss–Markov model without Datum Defect. Theoretical results and geodetic applications are taken from E. Grafarend’s “Variance–covariance component estimation” (Statistical and Decision, Supplement Issue No. 2, pages 401–441, 105 references, Oldenburg Verlag, M¨unchen 1989). The original contributions of variance–covariance component estimation as well as geodetic examples are presented in “Gewichtssch¨atzung in Geod¨atischen Netzen” (weight estimation in geodetic networks) by E. Grafarend and A. d’Hone (Deutsche Geod¨atische Kommission, Bayerische Akademie der Wissenschaften, Series A, Report 88, pages 1–43, 46 references, M¨unchen 1978) namely introducing (a) quadratic unbiased estimators: QUE (b), invariant quadratic unbiased estimators: IQUE, (c) locally best, quadratic unbiased estimators, (d) locally best, invariant quadratic unbiased estimators, (e) best QUE and best IQUE, (f) quadratic unbiased estimators of minimum norm: MINQUE. The geodetic examples are reproduced here. The references include G. Dewess (1973), H. Drygas (1972, 1977), H. Ebner (1977), J. Focke and G. Dewess (1972), E. Grafarend (1974, 1978), F. Graybill (1954), G. Hein (1977), F.R. Helmert (1907), C.R. Hendersen (1953), S.D. Horn, R.A. Horn and D.B. Dungan (1975), P.L. Hsu (1938), R. Keln (1978), J. Kleffe (1975 i, ii, 1976, 1977), J. Kleffe and R. Pincus (1974 i,ii), K.R. Koch (1976, 1977), K. Kubik (1970), L.R. Lamotte (1973), H. Neudecker (1969), R. Pincus (1974), F. Pukelsheim (1976), C.R. Rao (1970, 1971 i,ii, 1972), C.R. Rao and S.K. Mirra (1971), J.N.K. Rao (1973), R. Rudolph (1976), M. Schuler (1932), M. Schuler and H. Wolf (1954), R.S. Searle (1971), J. Seely (1970 i,ii, 1971), J. Seely and C. Zyskind (1971), E.C. Townsend and S.R. Searle (1971), W. Welsch (1977) and H. Wolf (1975). Real problems are nonlinear. We have to divide the solutions space into three classes: .a/ strong nonlinear, .b/ weak nonlinear and .c/ linear. Weak nonlinearity is defined by nonlinear systems which allow a Taylor series approximation. Linear system equations, for instance “linear best uniformly unbiased estimator” or “best invariant quadratic uniformly unbiased estimator,” have the advantage that their normal equation have only one minimum solution. We call such a solution “global” or “uniform.” In contrast to linear estimation, weak nonlinear equations produce E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 4, © Springer-Verlag Berlin Heidelberg 2012
183
184
4 The Second Problem of Probabilistic Regression
“local” solutions. They have the disadvantage that many local solutions may exist. A typical example is a geodetic network analysis by P. Lohse (1994). For more details read C.R. Rao and J. Kleffe (1969, pages 161–180). For beginners, not for mathematicians, is our introductory example (see Tables 4.1 and 4.2) in Sect. 4-1. In Sect. 4-2 we introduce the “Best Linear Uniformly Unbiased estimator” by means of Gauss elimination. Famous is the Gauss–Markov Equivalence Theorem (Gy –LESS) (†y –BLUUE), namely weighted least squares solution equivalent to †y – best linear uniformly unbiased estimator. For instance, F. Pukelsheim (1990, pages 20–24) is an excellence reference. For the setup of the “best invariant quadratic uniformly unbiased” estimation in Sect. 4-3, namely by BLOCK PARTITIONING, we use the E-D equivalence presented in Sect. 4-37. Alternative estimators of type IQE, “Invariant Quadratic Uniformly Unbiased Estimation” of variance–covariance components of type IQUUE, especially F.R. Helmer’s BIQUUE which was developed as early as 1907 by two methods. He used two setups presented in detail by Sect. 4-35 “Invariant Quadratic Uniformly Unbiased Estimators” of variance–covariance components of Helmert type: HIQUUE versus HIQE. We give credit to B. Schaffrin (1983) who classified the Helmert test matrix for the first time correctly. E. Grafarend (1994) used variance–covariance component estimation of Helmert type for condition equations in the model B© = By–c, By … R(A) + c (E. Grafarend 1994, pages 38–40) as well as for condition equations with unknown parameters in the model Ax + B© = By–c, By … R(A) + c (E. Grafarend 1994, pages 41–43), geodetic are also given. The various methods for solving the basic problem of variance–covariance component estimation was the final in the eighties of the twentieth century. Since F.R. Helmerts invention of the variance–covariance estimation in the early twentieth century, geodesists have used his results in many contributions listed in Sect. 4-37. The first example is a linear model Efyg = Ay = Ax1 , Dfyg = I 2 = Ix2 with one covariance component. We estimate (x1 ˆ, x2 ˆ) of type linear and unbiased. We apply again the IPM method in order to find x1 ˆ by (4.176), (4.177) and x2ˆ by (4.82), (4.183). The second example is variance component estimation of Helmert type resulting in two cases (a) det(H) ¤ 0 and (b) det(H) = 0 in a two step procedure. The result is given by (4.210)–(4.212). Special geodetic problems will be discussed in Examples 3 and 4. Example 3 treats a two variance component model for a hybrid direction – distance network, the Huaytapallana network in Peru illustrated by Fig. 4.2 and resulted in Tables 4.1–4.3. Figure 4.3 illustrates the two variance component estimation of Helmert type for the reproducing point for this network. Example 4 is based on a two variance component, multivariate gyrocompass observation model for a first order Markov process: special attention is payed to the reproducing point. Example 5 is a special treatment of Bayes design for first and second order statistics, the moment estimation using prior information for the mean parameter vector Ÿ as well as the model variance 2 . We have followed F. Pukelsheim (1990 pp. 270–272).
4 The Second Problem of Probabilistic Regression
185
Finally we have highlighted the five postulates for inhomogeneous, bilinear estimation: 1. 2. 3. 4. 5.
Inhomogeneous, multilinear Invariance “unbiased” or “minimum bias in the mean” Second order statistics, quasi-Gauss normal distribution Best estimation of the first and central second order moments In particular, we emphasize the “E-D-correspondance”.
Fast track reading : Read only Theorem 4.3 and Theorem 4.13.
Lemma 4.2 ξˆ : Σy - BLUUE of ξ
Definition 4.1 ξˆ : Σy - BLUUE of ξ
Theorem 4.3 ξˆ : Σy - BLUUE of ξ
Lemma 4.4 E{y}, ˆ Σy -BLUUE of E{y} e% y, D{e%y}, D{y}
“The first guideline of chapter four: definition, lemmas and theorem” In 1823, supplemented in 1828, C.F. Gauss put forward a new substantial generalization of “least squares” pointing out that an integral measure of loss, more definitely the principle of minimum variance, was preferable to least squares and to maximum likelihood. He abandoned both his previous postulates and set high store by the formula O 2 which provided an unbiased estimate of variance 2 . C. F. Gauss’s contributions to the treatment of erroneous observations, later on extended by F. R. Helmert, defined the state of the classical theory of errors. To the analyst C.F. Gauss’s preference to estimators of type BLUUE (Best Linear Uniformly Unbiased Estimator) for the moments of first order as well as of type BIQUUE (Best Invariant Quadratic Uniformly Unbiased Estimator) for the moments of second order is completely unknown. Extended by A.A. Markov who added correlated observations to the Gauss unbiased minimum variance estimator we present to you BLUUE of fixed effects and † y -BIQUUE of the variance component. “The second guideline of chapter four: definitions, lemmas, corollaries and theorems”
186
4 The Second Problem of Probabilistic Regression
In the third chapter we have solved a special algebraic regression problem, namely the inversion of a system of inconsistent linear equations of full column rank classified as “overdetermined”. By means of the postulate of a least squares solution jjijj2 D jjy Axjj2 D min we were able to determine m unknowns from n observations (n > m: more equations n than unknowns m). Though “LESS” generated a unique solution to the “overdetermined” system of linear equations with full column rank, we are unable to classify “LESS”. There are two key questions we were not able to answer so far: In view of “MINOS” versus “LUMBE” we want to know whether “LESS” produces an unbiased estimation or not. How can we attach to an objective accuracy measure “LESS”?
Theorem 4.5 equivalence of S y-BLUUE and Gy-LESS
Corollary 4.15 IQUUE of Helmert type: HIQUUE
Corollary 4.6 multinomial inverse
Definition 4.7 invariant quadratic estimation 2 2 sˆ of s : IQE
Lemma 4.8 invariant quadratic estimation 2 2 sˆ of s : IQE
Definition 4.9 variance-covariance components model sˆk IQE of sk
Lemma 4.10 invariant quadratic estimation sˆk of sk : IQE
Definition 4.11 invariant quadratic unformly unbiased estimation: IQUUE
Lemma 4.12 invariant quadratic unformly unbiased estimation: IQUUE Lemma 4.13 var-cov components: IQUUE
Corollary 4.16 Helmert equation det H ≠ 0 Corollary 4.17 Helmert equation det H = 0
Definition 4.18 best IQUUE
Corollary 4.19 Gauss normal IQE
Lemma 4.20 Best IQUUE
Theorem 4.21 2 sˆ BIQUUE of s
Corollary 4.14 translational invariance
The key for evaluating “LESS” is handed over to us by treating the special algebraic regression problem by means of a special probabilistic regression problem, namely a special Gauss–Markov model without datum defect. We shall prove that uniformly unbiased estimations of the unknown parameters of type “fixed effects” exist. “LESS” is replaced by “BLUUE” (Best Linear Uniformly Unbiased Estimation). The fixed effects constitute the moments of first order of the under-lying probability distributions of the observations to be specified. In contrast, its central moments of second order, known as the variance-covariance matrix or dispersion matrix, open the door to associate to the estimated fixed effects an objective accuracy measure.
4-1 Introduction
187
? What is a probabilistic problem ? By means of certain statistical objective function, here of type “best linear uniformly unbiased estimation” (BLUUE) for moments of first order
“best quadratic invariant uniformly unbiased estimation” (BIQUUE) for the central moments of second order
we solve the inverse problem of linear, lateron nonlinear equations with fixed effects which relates stochastic observations to parameters. According to the Measurement Axiom, observations are elements of a probability space. In terms of second order statistics the observation space Y of integer dimension, d i m Y D n, is characterized by the first moment Efyg, the expectation of (BLUUE) y 2 fY; pdfg
and
the central second moment Dfyg the dispersion matrix or (BIQUUE) variance-covariance matrix † y :
In the case of “fixed effects” we consider the parameter space „, dim „ D m, to be metrical. Its metric is induced from the probabilistic measure of the metric, the variance-covariance matrix † y of the observations y 2 fY; pdfg. In particular, its variance-covariance matrix is pulled-back from the variancecovariance matrix † y . In the special probabilistic regression model with unknown “fixed effects” 2 „ (elements of the parameter space) are estimated while the random variables like y Efyg are predicted.
4-1 Introduction Our introduction has four targets. First, we want to introduce , O a linear estimation of the mean value of “direct” observations, and O 2 , a quadratic estimation of their variance component. For such a simple linear model we outline the postulates of uniform unbiasedness and of minimum variance. We shall pay special attention to the key role of the invariant quadratic estimation (“IQE”) O 2 of 2 . Second, we intend to analyse two data sets, the second one containing an outlier, by comparing the arithmetic mean and the median as well as the “root mean square error” (r.m.s.) of type BIQUUE and the “median absolute deviation” (m.a.d.). By proper choice of the bias term we succeed to prove identity of the weighted arithmetic mean and the median for the data set corrupted by an obvious outlier. Third, we discuss the competitive estimator “MALE”, namely Maximum Likelihood Estimation which does not produce an unbiased estimation O 2 of 2 , in general. Fourth, in order to develop the best quadratic uniformly unbiased estimation O 2 of 2 , we have to
188
4 The Second Problem of Probabilistic Regression
highlight the need for fourth order statistic. “IQE” as well as “IQUUE” depend on the central moments of fourth order which are reduced to central moments of second order if we assume “quasi-normal distributed” observations.
4-11 The Front Page Example By means of Table 4.1 let us introduce a set of “direct” measurements yi ; i 2 1; 2; 3; 4; 5 of length data. We shall outline how we can compute the arithmetic mean 13.0 as well as the standard deviation of 1.6. In contrast, Table 4.2 presents an augmented observation vector: The observations six is an outlier. Again we have computed the new arithmetic mean 30:16N as well as the standard deviation 42.1. In addition, for both examples we have calculated the sample median and the sample absolute deviation for comparison. All definitions will be given in the context as well as a careful analysis of the two data sets. Table 4.1 “direct” observations, comparison of mean and median (mean y D 13; med y D 13; Œn=2 D 2; Œn=2 C 1 D 3; med y D y.3/ ; mad y D med jy.i / med yj=1; r:m:s:.I BIQU UE/ D 1:6) Number of observ. 1 2 3 4 5
Observ. Difference of yi observation and mean 15 C2 12 1 14 C1 11 2 13 0
Difference of observation and median C2 1 C1 2 0
Ordered set of observations y.i / 11 12 13 14 15
Ordered set of jy.i / med yj 0 1 1 2 2
Ordered set of y.i / mean y C2 1 C1 1 0
N Table 4.2 ”direct” observations, effect of one outlier (mean y D 30:1 ˇ 6; ˇ med y D .13 C 14/=2 D 13:5; r:m:s:.I BLU UE/ D 42:1; med y.i / medyj D mady D 1:5) Number of observ. 1 2 3 4 5 6
Observ. Difference of yi observation and mean 15 15:16N 12 18:16N 14 16:16N 11 19:16N 13 17:16N 116 C85:83N
Difference of observation and median C1.5 1.5 C0.5 2.5 0.5 C102.5
Ordered set of observations y.i / 11 12 13 14 15 116
Ordered set of jy.i / med yj 0.5 0.5 1.5 1.5 2.5 C102.5
Ordered set of y.i / mean y 15:16N 16:16N 17:16N 18:16N 19:16N C85:83N
4-1 Introduction
189
4-12 Estimators of Type BLUUE and BIQUUE of the Front Page Example In terms of a special Gauss–Markov model our data set can be described as following. The statistical moment of first order, namely the expectation Efyg D 1 of the observation vector y 2 Rn , here n D 5 and the central statistical moment of second order, namely the variance-covariance matrix † y , also called the dispersion matrix Dfyg D In 2 ; Dfyg DW † y 2 Rnn ; rk† y D n; of the observation vector y 2 Rn , with the variance 2 characterize the stochastic linear model. The mean 2 R of the “direct” observations and the variance factor 2 are unknown. We shall estimate .; 2 / by means of three postulates: • First postulate: O W linear estimation, O 2 : quadratic estimation n X
O D
lp yp
or
O D l0 y
pD1
O 2 D
n X
mpq yp yq
or
p;qD1
O 2 D y0 My D .y0 ˝ y0 /.vecM/ D .vecM/0 .y ˝ y/
• The second postulate: uniform unbiasedness Efg O D for all 2 R EfO 2 g D 2 for all 2 2 RC • The third postulate: minimum variance Dfg O D EfŒO Efg O 2 g D min and DfO 2 g D EfŒO 2 EfO 2 g2 g D min `
M
O D arg min Dfj O O D `0 y; Efg O D g `
O 2 D arg min DfO 2 jO 2 D y0 My; EfO 2 g D 2 g: M
First, we begin with the postulate that the fixed unknown parameters .; 2 / are estimated by means of a certain linear form O D `0 y C D y0 ` C and by means of a certain quadratic form O 2 D y0 My C x0 y C ! D .vecM/0 .y ˝ y/C Cx0 y C ! of the observation vector y, subject to the symmetry condition M 2 SYM WD fM 2 Rnn j M D M0 g; namely the space of symmetric matrices. Second we demand Efg O D ; EfO 2 g D 2 , namely unbiasedness of the estimations 2 .; O O /. Since the estimators .; O O 2 / are special forms of the observation vector n y 2 R , an intuitive understanding of the postulate of unbiasedness is the following: If the dimension of the observation space Y Ö y, d i m Y D n, is going to infinity, we expect information about the “two values” .; 2 /, namely
190
4 The Second Problem of Probabilistic Regression
lim .n/ O D ; lim O 2 .n/ D 2 :
n!1
n!1
Let us investigate how LUUE (Linear Uniformly Unbiased Estimation) of as well as IQUUE (Invariant Quadratic Uniformly Unbiased Estimation) operate. LUUE 0 0 Efg O D Efl y C g D l Efyg C ) Efyg D 1n 0 0 Efg O D l Efyg C D l 1n C 0 Efg O D , D 0; .l 1n 1/ D 0 0 , D 0; l 1n 1 D 0 for all 2 R: 0
Indeed O is LUUE if and only if D 0 and .l 1n 1/ D 0 f or al l 2 R. The zero identity .`0 1n 1/ D 0 is fulfilled by means of `0 1n 1 D 0; `0 1n D 1; if we restrict the solution by the quantor “f or al l 2 R”. D 0 is not an admissible solution. Such a situation is described as “uniformly unbiased”. We summarize that LUUE is constrained by the zero identity `0 1n 1 D 0: Next we shall prove that O 2 is IQUUE if and only if IQUUE EfO 2 g D Efy0 My C x0 y C !g D
EfO 2 g D Efy0 My C x0 y C !g D
Ef.vec M/0 .y ˝ y/ C x0 y C !g D
Ef.y0 ˝ y0 /.vec M/0 C y0 x C !g D
.vec M/0 Efy ˝ yg C x0 Efyg C !
Efy0 ˝ y0 g.vec M/0 C Efy0 gx C !:
O 2 is called translational invariant with respect to y 7! y Efyg if O 2 D y0 My C x0 y C ! D .y Efyg/0 M.y Efyg/ C x0 .y Efyg/ C ! and uniformly unbiased if EfO 2 g D .vec M/0 Efy ˝ yg C x0 Efyg C ! D 2 for all 2 2 RC : Finally we have to discuss the postulate of a best estimator of type BLUUE of and BIQUUE of 2 . We proceed sequentially, first we determine O of type BLUUE and second O 2 of type BIQUUE. At the end we shall discuss simultaneous estimation of .; O O 2 /.
4-1 Introduction
191
The scalar O D `0 y is BLUUE of (Best Linear Uniformly Unbiased Estimation) with respect to the linear model Efyg D 1n ; Dfyg D In 2 ; if it is uniformly unbiased in the sense of and in comparison of all linear, uniformly unbiased estimations possesses the smallest variance in the sense of Dfg O D EfŒO Efg O 2g D 2 `0 ` D 2 tr`0 ` D 2 jj`jj2 D min : The constrained Lagrangean L.`; /, namely L.`; / W D 2 `0 ` C 2.`0 1n 1/ D 2 `0 ` C 2.1n` 1/ D min; `;
produces by means of the first derivatives 1 @L O O .l; / D 2 lO C 1n O D 0 2 @l 1 @L O O .l; / D lO0 1n 1 D 0 2 @ the normal equations for the augmented unknown vector .`; /, also known as the necessary conditions for obtaining an optimum. Transpose the first normal equation, right multiply by 1n , the unit column and substitute the second normal equation in O If we substitute the solution O in the order to solve for the Lagrange multiplier . O first normal equation, we directly find the linear operator `. 2 `O0 C 10 n O D 00 2 `O0 1n C 10 n 1n O D 2 C 10 n 1n O D 0 ) O D
2 2 D n n 1n
10
2
D 00 2 `O C 1n O D 2 lO 1n n 1 1 ) `O D 1n and O D `O0 y D 10 n y: n n The second derivatives
1 @2 L O O .`; / D 2 In > 00 2 @`@`0
192
4 The Second Problem of Probabilistic Regression
constitute the sufficiency condition which is automatically satisfied. Let us briefly summarize the first result O BLUUE of . The scalar O D `0 y is BLUUE of with respect to the linear model Efyg D 1n ; Dfyg D In 2 ; if and only if
1 1 `O0 D 10 n and O D 10 n y n n is the arithmetic mean. The observation space y 2 fY; pdf g is decomposed into y.BLUUE/ WD 1n O versus ey .BLUUE/ WD y y.BLUUE/; 1 1 1n 10 n y versus ey .BLUUE/ D ŒIn .1n 10 n /y; n n which are orthogonal in the sense of y.BLUUE/ D
ˇ ˛ ˝ ey .BLUUE/ ˇ y.BLUUE/ D 0 or
1 1 In .1n 10 n / .1n 10 n / D 0: n n
Before we continue with the setup of the Lagrangean which guarantees BIQUUE, we study beforehand ey WD y Efyg and ey .BLUUE/ WD y y.BLUUE/. Indeed the residual vector ey .BLUUE/ is a linear form of residual vector ey . 1 ey .BLUUE/ D In .1n 10 n / ey : n For the proof we depart from 1 0 ey .BLUUE/ W D y 1n O D In .1n 1 n / y n 1 D In .1n 10 n / .y Efyg/ n D In
1 .1n 10 n /; n
where we have used the invariance property y 7! y Efyg based upon the idempotence of the matrix ŒIn .1n 10 n /=n. Based upon the fundamental relation ey .BLUUE/ D Dey , where D WD In .1n 10 n /=n is a projection operator onto the normal space R.1n /? , we are able to derive an unbiased estimation of the variance component 2 . Just compute
4-1 Introduction
193
Efe0 y .BLUUE/ey .BLUUE/g D trEfey .BLUUE/e0 y .BLUUE/g D trD Efey e0 y gD0 D 2 trD D0 D 2 trD trD D tr .In / tr n1 .1n 10 n / D n 1 Efe0 y .BLUUE/ey .BLUUE/g D 2 .n 1/: Let us define the quadratic estimator O 2 of 2 by O 2 D
1 0 e y .BLUUE/ey .BLUUE/; n1
which is unbiased according to EfO 2 g D
1 Efe0 y .BLUUE/ey .BLUUE/g D 2 : n1
Let us briefly summarize the first result O 2 IQUUE of 2 . The scalar O 2 D e0 y .BLUUE/ey .BLUUE/=.n 1/ is IQUUE of 2 based upon the BLUUE-residual vector ey .BLUUE/ D In n1 .1n 10 n / y: Let us highlight O 2 BIQUUE of 2 . A scalar O 2 is BIQUUE of 2 (Best Invariant Quadratic Uniformly Unbiased Estimation) with respect to the linear model Efyg D 1n ; Dfyg D In 2 ; if it is (i) Uniformly unbiased in the sense of EfO 2 g D 2 for all 2 2 RC (ii) Quadratic in the sense of O 2 D y0 My for all M D M0 (iii) Translational invariant in the sense of y0 My D .y Efyg/0M.y Efyg/ D .y 1n /0 M.y 1n / (vii) Best if it possesses the smallest variance in the sense of DfO 2 g D EfŒO 2 EfO 2 g2 g D min. M
First, let us consider the most influential postulate of translational invariance of the quadratic estimation O 2 D y0 My D .vecM/0 .y ˝ y/ D .y0 ˝ y0 /.vecM/ to comply with
194
4 The Second Problem of Probabilistic Regression
O 2 D e0 y Mey D .vecM/0 .ey ˝ ey / D .e0 y ˝ e0 y /.vecM/ subject to M 2 SYM: D fM 2 R nn jM D M0 g: Translational invariance is understood as the action of transformation group y D Efyg C ey D 1n C ey with respect to the linear model of “direct” observations. Under the action of such a transformation group the quadratic estimation O 2 of 2 is specialized to 0 O 2 D y0 My D Efyg C ey M Efyg C ey D .10 n C e0 y /M.1n C ey / O 2 D 2 10 n M1n C 10 n Mey C e0 y M1n C e0 y Mey y0 My D e0 y Mey , 10 n M D 00 and 10 n M0 D 00 : IQE, namely 10 n M D 00 and 10 n M0 D 00 has a definite consequence. It is independent of , the first moment of the probability distribution (“pdf ”). Indeed, the estimation procedure of the central second moment 2 is decoupled from the estimation of the first moment . Here we find the key role of the invariance principle. Another aspect is the general solution of the homogeneous equation 10 n M D 00 subject to the symmetry postulate M D M0 . 10 M D 00 ,
M D In 10 n .10 n 1n /1 10 n Z M D .In n1 1n 10 n /Z ;
where Z is an arbitrary matrix. The general solution of the homogeneous matrix equation contains the left inverse (generalized inverse .10 n 1n /1 10 n D 1 n ) which takes an exceptionally simple form, here. The general form of the matrix Z 2 Rnn is in no agreement with the symmetry postulate M D M0 . 10 n M D 00 , M D ˛.In n1 1n 10 n /: M D M0 Indeed, we made the choice Z D ˛In which reduces the unknown parameter space to one-dimension. Now by means of the postulate “best” under the constraint generated by “uniform unbiasedness” O 2 of 2 we shall determine the parameter ˛ D 1=.n 1/. The postulate IQUUE is materialized by EfO 2 j† y D In 2 g D 2 , trM D 1 , trM 1 D 0: For the simple case of “i.i.d.” observations, namely † y D In 2 , EfO 2 g D 2 for an IQE, IQUUE is equivalent to trM D 1 or .trM/ 1 D 0 as a condition equation.
4-1 Introduction
195
˛tr.In n1 1n 10 n / D ˛.n 1/ D 1 1 trM D 1 , : ˛ D n1 IQUUE of the simple case invariance W .i /10 M D 00 and M D M0 QUUE W .ii/trM 1 D 0
)MD
1 1 In 1n 10 n n1 n
has already solved our problem of generating the symmetric matrix M: O 2 D y0 My D
1 0 y .In n1 1n 10 n /y 2 IQUUE n1
1.1 Is there still a need to apply “best” as an optimability condition for BIQUUE ? Yes, there is! The general solution of the homogeneous equations 10 n M D 00 and M0 1n D 0 generated by the postulate of translational invariance of IQE did not produce a symmetric matrix. Here we present the simple symmetrization. An alternative approach worked depart from 1 1 .M C M0 / D fŒIn 1n .10 n 1n /1 10 n Z C Z0 ŒIn 1n .10 n 1n /1 10 n g; 2 2 leaving the general matrix Z as an unknown to be determined. Let us therefore develop BIQUUE for the linear model Efyg D 1n ; Dfyg D In 2 DfO 2 g D Ef.O 2 EfO 2 g/2 g D EfO 4 g EfO 2 g2 : Apply the summation convention over repeated indices i; j; k; ` 2 f1; : : : ; ng: 1st W EfO 2 g2 y y
y y
EfO 2 g2 D mij Efei ej gmk` Efek e` g D mij mkl ij k` subject to ij W D
y y Efei ej g
y y
2
D ıij andk` WD Efek e` g D 2 ık`
EfO 2 g2 D 4 mij ıij mk` ık` D 4 .trM/2 2nd W EfO 4 g y y y y
EfO 4 g D mij mk` Efei ej ek e` g D mij mk` ij k` subject to
196
4 The Second Problem of Probabilistic Regression y y y y
ij k` W D Efei ej ek e` g8i; j; k; ` 2 f1; : : : ; ng: For a normal pdf, the fourth order moment ij k` can be reduced to second order moments. For a more detailed presentation of “normal models” we refer to Appendix A-7 ij k` D ij k` C i k j ` C i ` j k D 4 .ıij ık` C ıi k ıj ` C ıi ` ıj k / EfO 4 g D 4 mij mk` .ıij ık` C ıi k ıj ` C ıi ` ıj k / EfO 4 g D 4 Œ.trM/2 C 2trM0 M: Let us briefly summarize the representation of the variance DfO 2 g D Ef.O 2 EfO 2 g/2 g for normal models. Let the linear model of i.i.d. direct observations be defined by Efyjpdf g D 1n ; Dfyjpdfg D In 2 : The variance of a normal IQE can be represented by DfO 2 g W D Ef.O 2 EfO 2 g/2 g D D 2 4 Œ.trM/2 C tr.M2 /: In order to construct BIQUUE, we shall define a constrained Lagrangean which takes into account the conditions of translational invariance, uniform unbiased-ness and symmetry. L.M; 0 ; 1 ; 2 / WD 2trM0 MC20 .trM1/C21 10 n M1n C22 10 n M0 1n D
min :
M;0 ;1 ;2
Here we used the condition of translational invariance in the special form 1 10 n .M C M0 /1n D 0 , 10 n M1n D 0 and 10 n M0 1n D 0; 2 which accounts for the symmetry of the unknown matrix. We here conclude with the normal equations for BIQUUE generated from @L @L @L @L D 0; D 0; D 0; D 0: @.vecM/ @0 @1 @2 3 2 3 32 0 vecM 2.In ˝ In / vecIn In ˝ 1n 1n ˝ In 7 6 0 7 6 1 7 6 .vecIn /0 0 0 0 7 D 6 7: 76 6 4 In ˝ 10 n 0 0 0 5 4 1 5 4 0 5 0 0 0 2 10 n ˝ In 0 2
4-1 Introduction
197
These normal equations will be solved lateron. Indeed M 1/ is a solution.
1 1 10 /= .n n n n
D
.In
1 0 y .In n1 1n 10 n /y n 1 BIQUUE W 2 DfO 2 g D 4 n1 O 2 D
Such a result is based upon .trM/2 .BIQUUE/ D
1 1 ; .trM2 /.BIQUUE/ D ; n1 n1
DfO 2 jBIQUUEg D DfO 2 g D 2 4 Œ.trM/2 C .trM2 /.BIQUUE/; DfO 2 g D
2 4: n1
Finally, we are going to outline the simultaneous estimation of f; 2 g for the linear model of direct observations. • First postulate: inhomogeneous, multilinear (bilinear) estimation O D 1 C `01 y C m0 1 .y ˝ y/ O 2 D O D O 2
2 C `02 y C .vecM2 /0 .y ˝ y/ 0 1 `1 m0 1 y C 0 y˝y 2 `2 .vecM2 /0 1 `01 m0 1 O ; XD x D XY , x WD 2 `02 .vecM2 /0 O 2 2 3 1 YWD4 y 5 y˝y
• Second postulate: uniform unbiasedness E fxg D E
O O 2
D 2
• Third postulate: minimum variance Dfxg WD trEfŒx E fxg Œx E fxg0 g D min :
198
4 The Second Problem of Probabilistic Regression
4-13 BLUUE and BIQUUE of the Front Page Example, Sample Median, Median Absolute Deviation According to Tables 4.1 and 4.2 we presented you with two sets of observations yi 2 Y, dimY D n; i 2 f1; : : : ; ng the second one qualifies to certain “one outlier”. We aim at a definition of the median and of the median absolute deviation which is compared to the definition of the mean (weighted mean) and of the root-meansquare error. First we order the observations according to y.1/ < y.2/ < : : : < y.n1/ < y.n/ by means of the permutation matrix 2 3 3 y1 y.1/ 6 y 7 7 6 y 6 2 7 6 .2/ 7 6 7 7 6 6 : : : 7 D P 6 : : : 7; 6 7 7 6 4 yn1 5 4 y.n1/ 5 y.n/ yn 2
namely data set one 3 2 000 11 6 12 7 6 0 1 0 6 7 6 6 7 6 6 13 7 D 6 0 0 0 6 7 6 4 14 5 4 0 0 1 100 15 2
2
0 60 6 6 P5 D 6 0 6 40 1
32 3 15 10 6 7 0 0 7 6 12 7 7 76 7 0 1 7 6 14 7 76 7 0 0 5 4 11 5 13 00
00 10 00 01 00
2
versus
2
0 11 6 12 7 7 60 7 6 13 7 6 0 7D6 14 7 6 0 7 6 15 5 4 1 0 116 2
3
10 0 07 7 7 0 17 7 0 05 00
6 6 6 6 6 6 6 4
3
versus
00 10 00 01 00 00
0 60 6 6 60 P6 D 6 60 6 41 0
data set two 3 32 15 100 7 6 0 0 07 7 6 12 7 7 76 0 1 0 7 6 14 7 7 76 0 0 0 7 6 11 7 7 76 0 0 0 5 4 13 5 116 001
00 10 00 01 00 00
3 100 0 0 07 7 7 0 1 07 7: 0 0 07 7 0 0 05 001
Note PP0 D I, P1 D P0 . Second, we define the sample median med y as well as the median absolute deviation mad y of y 2 Y by means of 2
y.Œn=2C1/ if n is an odd number medy WD 4 1 .y.n=2/ C y.n=2C1/ / if n is an even number 2 mady WD medjy.i / medyj; where Œn=2 denotes the largest integer (“natural number”) n=2.
4-1 Introduction
199
Third, we compute I-LESS, namely mean y D .10 1/1 y D n1 10 y listed in Table 4.3. Obviously for the second observational data set the Euclidean metric of the observation space Y is not isotropic. Indeed let us compute Gy -LESS, namely the weighted mean y D .10 Gy 1/1 10 Gy y. A particular choice of the matrix of the metric, also called “weight matrix”, is Gy D Diag.1; 1; 1; 1; 1; x/ such that weighted mean y D
y1 C y2 C y3 C y4 C y5 C y6 x ; 5Cx
where x is the unknown weight of the extreme value (“outlier”) y6 . A special robust design of the weighted mean y is the median y, namely weighted mean y D med y such that xD
y1 C y2 C y3 C y4 C y5 5 med y med y y6 here
24 : 1000 Indeed the weighted mean with respect to the weight matrix Gy D Diag.1; 1; 1; 1, 1; 24=1000/ reproduces the median of the second data set. The extreme value has been down-weighted by a weight 24=1000 approximately. Four, with respect to the x D 0:024; 390; 243
Table 4.3 “direct” observations, comparison two data sets by means of med y, mad y (I-LESS, Gy -LESS), r.m.s. (I-BIQUUE) data set one n D 5 (“odd”) n=2 D 2:5; Œn=2 D 2 Œn=2 C 1 D 3 med y D y.3/ D 13 mady D 1 mean y.I -LESS/ D 13
.I-BLUUE/ O D 13 O 2 .I-BIQUUE/ D 2:5 r.m.s. .I-BIQUUE/ D O .I-BIQUUE/ D 1:6
data set two n D 6 (“even”) n=2 D 3; n=2 C 1 D 4 med y D 13:5 mad y D 1:5 mean y.I -LESS/ D 30:16N weighted mean y .Gy -LESS/ D 13:5 24 / Gy D Diag.1; 1; 1; 1; 1; 1000 N .I-BLUUE/ O D 30:16 O 2 .I-BIQUUE/ D 1770:1 r.m.s. .I-BIQUUE/ D .I-BIQUUE/ O D 42:1
200
4 The Second Problem of Probabilistic Regression
simple linear model Efyg D 1, Dfyg D I 2 we compute I-BLUUE of and I-BIQUUE of 2 , namely O D .10 1/1 10 y D n1 10 y 1 0 1 1 0 O y I 1.10 1/1 y D y I n1 110 y D .y 1/ O 0 .y 1/: n1 n1 n1 Obviously the extreme value y6 in the second data set has spoiled the specification of the simple linear model. The r.m.s. (I-BLUUE) = 1.6 of the first data set is increased to the r.m.s. (I-BIQUUE) = 42.1 of the second data set. Five, we setup the alternative linear model for the second data set, namely O 2 D
82 ˆ ˆ ˆ 6 ˆ ˆ 6 ˆ ˆ <6 6 E 6 ˆ6 ˆ ˆ 6 ˆ ˆ4 ˆ ˆ :
y1 y2 y3 y4 y5 y6
39 2 3 2 3 2 3 2 3 > 1 1 0 > > 6 7 6 7 617 7> 607 > 6 17 6 7> 7 6 7 6 7 7> = 6 7 6 7 6 7 6 7 6 1 7 6 7 6 1 7 7 607 7 D6 7D6 7 D 6 7C6 7 6 1 7 6 7 6 1 7 7> 607 > 6 7 6 7> 7 6 7 6 7 > 4 1 5 4 5 4 1 5 5> 405 > > ; 2 C 1 1
Efyg D A W
A WD Œ1; a 2 52 ; 1 WD Œ1; 1; 1; 1; 1; 10 2 R 61 WD Œ; 0 2 21 ; a WD Œ0; 0; 0; 0; 0; 10 2 R 61 2 3 100000 60 1 0 0 0 07 6 7 6 7 60 0 1 0 0 07 2 Dfyg D 6 7 2 R66 60 0 0 1 0 07 6 7 40 0 0 0 1 05 000001 Dfyg D I6 2 ; 2 2 R C ;
adding to the observation y6 the bias term . Still we assume the variance-covariance matrix Dfyg of the observation vector y 2 R66 to be isotropic with one variance component as an unknown. .; O / O is I6 -BLUUE if O D .A0 A/1 A0 y O O 13 D O 103 O D 13; O D 103; 1 D O D 13; yO2 D 116
4-1 Introduction
201
O D D .A0 A/1 2 O 2 1 1 O D D O 5 1 6 2O D
2 2 6 1 2 ; D 2 ; O O D 5 O 5 5 O 2
is I6 -BIQUUE if
1 y0 I6 A.A0 A/1 A0 y n rkA 3 2 4 1 1 1 1 0 6 1 4 1 1 1 0 7 7 6 7 16 6 1 1 4 1 1 0 7 0 1 0 I6 A.A A/ A D 6 7 5 6 1 1 1 4 1 0 7 7 6 4 1 1 1 1 4 0 5 0 0 0 0 0 5 4 4 4 4 4 0 1 0 ri WD I6 A.A A/ A i i D ; ; ; ; ; 1 8i 2 f1; : : : ; 6g 5 5 5 5 5 O 2 D
are the redundancy numbers. y0 .I6 A.A0 A/1 A0 /y D 13;466 1 13;466 D 3;366:5; O D 58:02 4 3;366:5 2O .O 2 / D D 673:3; O ./ O D 26 5 6 O2 .O 2 / D 3;366:5 D 4;039:8; O .O / D 63:6: 5 O 2 D
Indeed the r.m.s. value of the partial mean O as well as of the estimated bias O have changed the results remarkably, namely from r.m.s. (simple linear model) 42.1 to r.m.s. (linear model) 26. A r.m.s. value of the bias O in the order of 63.6 has been documented. Finally let us compute the empirical “error vector” lQ and is variance-covariance matrix by means of eQ y D I6 A.A0 A/1 A0 y; DfQey g D I6 A.A0 A/1 A0 2 ;
202
4 The Second Problem of Probabilistic Regression
0 eQ y D 2 1 1 2 0 116 3 2 4 1 1 1 1 0 6 1 4 1 1 1 0 7 7 6 n o 6 1 1 4 1 1 0 7 2 7 6 D lQ D 6 7 6 1 1 1 4 1 0 7 5 7 6 4 1 1 1 1 4 0 5 0 0 0 0 0 5 3 2 4 1 1 1 1 0 6 1 4 1 1 1 0 7 7 6 7 6 6 1 1 4 1 1 0 7 2 Q DfljO g D 673:3 6 7: 6 1 1 1 4 1 0 7 7 6 4 1 1 1 1 4 0 5 0 0 0 0 0 5
4-14 Alternative Estimation Maximum Likelihood (MALE) Maximum Likelihood Estimation (“MALE”) is a competitor to BLUUE of the first moments Efyg and to the BIQUUE of the second central moments Dfyg of a random variable y 2 fY; pdf g, which we like to present at least by an example. Maximum Likelihood Estimation :linear model: Efyg D 1n ; Dfyg D In 2 “independent, identically normal distributed observations” Œy1 ; : : : ; yn 0 “direct observations” unknown parameter: f; g 2 fR; RC g DW X “simultaneous estimations of f; 2 g”. Given the above linear model of independent, identically, normal distributed observations Œy1 ; : : : ; yn 0 D y 2 fRn ; pdf g. The first moment as well as the central second moment 2 constitute the unknown parameters .; 2 / 2 R RC where R RC is the admissible parameter space. The estimation of the unknown parameters .; 2 / is based on the following optimization problem Maximize the log-likelihood function n Y ˇ ˇ ln f .y1 ; : : : ; yn ˇ; 2 / D ln f .yi ˇ; 2 / i D1
4-1 Introduction
203
( D ln
1
n
.2/ 2
n
n 1 X exp 2 .yi /2 2 i D1
!) D
n n n 1 X D ln 2 ln 2 2 .yi /2 D max 2 2 i D1 ; 2
of the independent, identically normal distributed random variables fy1 ; : : : ; yn g. The log-likelihood function is simple if we introduce the first sample moment m1 and the second sample moment m2 , namely n
n
1X 1 1X 2 1 m1 W D yi D 10 y; m2 WD y D y0 y n i D1 n n i D1 i n
ˇ
n n n
ln f y1 ; : : : ; yn ˇ; 2 D ln 2 ln 2 2 m2 2m1 C 2 ; 2 2 2 Now we are able to define the optimization problem ˇ
` ; 2 WD ln f y1 ; : : : ; yn ˇ; 2 D max ; 2
more precisely. Definition. (Maximum Likelihood Estimation, linear model Efyg D 1n , Dfyg D In 2 , independent, identically normal distributed observations fy1 ; : : : ; yn g): A 2x1 vectorŒ` ; `2 0 is called MALE of Œ; 2 0 , (Maximum Likelihood Estimation) with respect to the linear model 0.1 if its log-likelihood function ˇ `.; 2 / W D ln f .y1 ; : : : ; yn ˇ; 2 / D n n n D ln 2 ln 2 2 .m2 2m1 C 2 / 2 2 2 is minimal. The simultaneous estimation of .; 2 / of type MALE can be characterized as following. Corollary. (MALE with respect to the linear model Efyg D 1n ; Dfyg D In 2 , independent identically normal distributed observations fy1 ; : : : ; yn g):
The log-likelihood function ` ; 2 with respect to the linear model Efyg D 1n ; Dfyg D In 2 ,.; 2 2 R RC /, of independent, identically normal distributed observations fy1 ; : : : ; yn g is maximal if ` D m1 D
1 0 1 1 y; `2 D m2 m21 D .y y` /0 .y y` / n n
204
4 The Second Problem of Probabilistic Regression
is a simultaneous estimation of the mean volume (first moment) ` and of the variance (second moment) `2 . Proof. The Lagrange function n n L.; 2 / WD ln 2 2 .m2 2m1 C 2 / D max 2 2 ; 2 leads to the necessary conditions nm1 n @L .; 2 / D 2 D 2 D 0 @ @L n n .; 2 / D 2 C 4 .m2 2m1 C 2 / D 0; @ 2 2 2 also called the likelihood normal equations. Their solution is 1 10 y 1 m1 D D : `2 m2 m21 n y0 y .10 y/2 The matrix of second derivates constitutes as a negative matrix the sufficiency conditions. 2 3 1 0 6 2 7 @2 L 7 > 0: ` .` ; ` / D n 6 4 5 1 2 2 0 @.; /@.; / 0 4 ` Finally we can immediately check that `.; 2 / ! 1 as .; 2 / approaches the boundary of the parameter space. If the log-likelihood function is sufficiently regular, we can expand it as ` C 2 `2 1 ` ` ˝ C O3 : C D 2 `.` ; `2 / 2 `2 2 `2 2
`.; 2 / D `.` ; `2 / C D`.` ; `2 /
Due to the likelihood normal likelihood equations D`.` ; `2 / vanishes. Therefore the behavior of `.; 2 / near.` ; `2 / is largely determined by D 2 `.` ; `2 / 2 R RC , which is a measure of the local curvature the log-likelihood function `.; 2 /. The non-negative Hesse matrix of second derivatives I.` ; `2 / D
@2 ` .` ; `2 / > 0 @.; 2 /@.; 2 /0
is called observed Fischer information. It can be regarded as an index of the steepness of the log-likelihood function moving away from .; 2 /, and as an
4-2 Setup of the Best Linear Uniformly Unbiased Estimator
205
Table 4.4 .` ; `2 / MALE of .; 2 / 2 fR; RC g: the front page examples 1st example (n D 5) 2nd example (n D 6)
` 13 30.16
`2 2 1474.65
j j p` 2 36.40
indicator of the strength of preference for the MLE point with respect to the other points of the parameter space. Finally, compare by means of Table 4.4 .` ; `2 / MALE of .; 2 / for the front page example of Tables 4.1 and 4.2
4-2 Setup of the Best Linear Uniformly Unbiased Estimator of type BLUUE for the moments of first order Let us introduce the special Gauss–Markov model y D A C e specified in Box 4.1, which is given for the first order moments in the form of a inconsistent system of linear equations relating the first non-stochastic (“fixed”), real-valued vector of unknowns to the expectation Efyg of the stochastic, real-valued vector y of observations, A D Efyg, since Efyg 2 R.A/ is an element of the column space R.A/ of the real-valued, non-stochastic (“fixed”) “first order design matrix” A 2 Rnm . The rank of the fixed matrix A; rkA, equals the number m of unknowns, 2 Rm . In addition, the second order central moments, the regular variancecovariance matrix †y , also called dispersion matrix Dfyg constitute the second matrix † y 2 Rnm of unknowns to be specified as a linear model further on. Box 4.1. (Special Gauss–Markov model y D A C e)
1st moments A D Efyg; A 2 Rnm ; Efyg 2 R.A/; rkA D m
(4.1)
2nd moments † y D Dfyg 2 Rnn ; † y positive definite; rk† y D n „; Efyg; y Efyg D e unk nown † y unk nown :
(4.2)
206
4 The Second Problem of Probabilistic Regression
4-21 The Best Linear Uniformly Unbiased Estimation O of W †y -BLUUE Since we are dealing with a linear model, it is “a natural choice” to setup a linear form to estimate the parameters of fixed effects, namely O D Ly C ;
(4.3)
where fL 2 Rmn ; 2 Rm g are fixed unknowns. In order to determine the real-valued m n matrix L and the real-valued m 1 vector ; independent of the variance-covariance matrix † y ; the inhomogeneous linear estimation O of the vector of fixed effects has to fulfil certain optimality conditions. (1st) O is an inhomogeneous linear unbiased estimation of O D EfLy C g D for all 2 Rm ; Efg
(4.4)
and (2nd) in comparison to all other linear uniformly unbiased estimations O has minimum variance O W D Ef.O /0 .O /g D tr Dfg D trL † y L0 D jjL0 jj† D min :
(4.5)
L
O D EfLyCg D First the condition of a linear uniformly unbiased estimation Efg with respect to the Special Gauss–Markov model (4.1), (4.2) has to be considered in more detail. As soon as we substitute the linear model (4.1) into the postulate of uniformly unbiasedness (4.4) we are led to O D EfLy C g D LEfyg C D for all 2 Rm Efg
(4.6)
and LA C D for all 2 Rm :
(4.7)
Beside D 0 the postulate of linear uniformly unbiased estimation with respect to the special Gauss–Markov model (4.1), (4.2) leaves us with one condition, namely LA Im D 0 for all 2 Rm
(4.8)
or LA Im D 0:
(4.9)
Note that there are locally unbiased estimations such that .LA Im /0 D 0 for LA Im ¤ 0: Alternatively, B. Schaffrin (2000) has softened the constraint of unbiasedness (4.9) by replacing it by the stochastic matrix constraint A0 L0 D Im C
4-2 Setup of the Best Linear Uniformly Unbiased Estimator
207
E0 subject to Efvec E0 g D 0; Dfvec E0 g D .Im ˝ † 0 /; † 0 a positive definite matrix. For † 0 ! 0, uniform unbiasedness is restored. Estimators which fulfill the stochastic matrix constraint A0 L0 D Im C E0 for finite † 0 are called “softly unbiased” or “unbiased in the mean”. Second, the choice of norm for “best” of type minimum variance has to be discussed more specifically. Under the condition of a linear uniformly unbiased estimation let us derive the specific representation of the weighted Frobenius matrix norm of L0 . Indeed let us define the dispersion matrix O WD Ef.O Efg/. O O Efg/ O 0g D Dfg D Ef.O /.O /0 g ;
(4.10)
which by means of the inhomogeneous linear form O D Ly C is specified to O D LDfygL0 Dfg
(4.11)
and Definition 4.1. (O † y -BLUUE of ): An m1 vector O D Ly C is called † y -BLUUE of (Best Linear Uniformly Unbiased Estimation) with respect to the † y -norm in (4.1) if (1st) O is uniformly unbiased in the sense of O W D trL Dfyg L0 D jjL0 jj† : tr Dfg y
(4.12)
Now we are prepared for Lemma 4.2. ( O † y -BLUUE of ): An m1 vector O D Ly C is † y -BLUUE of in (4.1) if and only if D0
(4.13)
holds and the matrix L fulfils the system of normal equations
or
˙y A A0 0
L0 ƒ
D
0 Im
˙y L0 C A D 0
(4.14)
(4.15)
208
and
4 The Second Problem of Probabilistic Regression
A0 L0 D Im
(4.16)
with the m m matrix of “Lagrange multipliers”. Proof. Due to the postulate of an inhomogeneous linear uniformly unbiased estimation with respect to the parameters 2 Rm of the special Gauss–Markov model we were led to D 0 and one conditional constraint which makes it plausible to minimize the constraint Lagrangean L.L; ƒ/ WD tr L† y L0 C 2tr ƒ.A0 L0 Im / D min L;ƒ
(4.17)
for † y -BLUUE. The necessary conditions for the minimum of the quadratic constraint Lagrangean L.L; ƒ/ are @L O O O 0D0 .L; ƒ/ WD 2.†y LO 0 C Aƒ/ @L @L O O .L; ƒ/ WD 2.A0 LO 0 Im / D 0 @ƒ which agree to the normal equations (4.14). The second derivatives @2 L O D 2.† y ˝ Im / > 0 O ƒ/ .L; @.vecL/@.vecL/0
(4.18)
(4.19)
(4.20)
constitute the sufficiency conditions due to the positive-definiteness of the matrix † for L.L; ƒ/ D min |. Obviously, a homogeneous linear form O D Ly is sufficient to generate †BLUUE for the special Gauss–Markov model (4.1), (4.2). Explicit representations O O † y -BLUUE of †-BLUUE of type O as well as of its dispersion matrix Dfj generated by solving the normal equations (4.14) are collected in Theorem 4.3. (O † y -BLUUE of ): Let O D Ly be †-BLUUE of , in the special linear Gauss–Markov model (4.1), (4.2). Then 1 1 (4.21) O D .A0 † y A/1 A0 † y y 1 O D † O A0 † y y
(4.22)
are equivalent to the representation of the solution of the normal equations (4.14) subjected to the related dispersion matrix O WD † O D .A0 † A/1 : Dfg y 1
4-2 Setup of the Best Linear Uniformly Unbiased Estimator
209
Proof. We shall present two proofs of the above theorem: The first one is based upon Gauss elimination in solving the normal equations (4.14), the second one uses the power of the IPM method (Inverse Partitioned Matrix, C.R. Rao’s Pandora Box). (i) forward step (Gauss elimination): 0 Multiply the first normal equation by † 1 y , multiply the reduced equation by A and subtract the result from the second normal equation. Solve for ƒ
O 0 C Aƒ D 0 .first equation: 3 ˙y L 5 multiply byA0 ˙y1 / 0 O AL D Im .second equation/ # O D0 O 0 A0 ˙y1 Aƒ A0 L A0 LO 0 D Im O DI ) , A0 ˙y1 Aƒ
,
,
,
O D .A0 ˙y1 A/1 ) ƒ
(4.23)
(ii) backward step (Gauss elimination): O in the modified first normal equation and solve for L. O Substitute ƒ O D0,L O 0 A 0 ˙y1 O Dƒ LO 0 C ˙y1 Aƒ O D .A0 ˙ 1 A/1 ƒ y
,
# ,
LO D .A0 ˙y1 A/1 A0 ˙y1 :
)
(4.24)
(iii) IPM (Inverse Partitioned Matrix): Let us partition the symmetric matrix of the normal equations (4.14)
˙y A A0 0
A11 A12 ˙y A D : A0 12 0 A0 0
1
D
A11 A12 A0 12 0
1
D
B11 B12 B0 12 B22
B11 D Im ˙y1 A.A0 ˙y1 A/1 A0 ˙y1 B0 12 D .A0 ˙y1 A/1 A0 ˙y1 B22 D .A0 ˙y1 A/1 :
210
4 The Second Problem of Probabilistic Regression
The normal equations are now solved by
LO 0 O ƒ
A11 A12 D A0 12 0
1
0 Im
B11 B12 D B0 12 B22
0 Im
LO D B0 12 D .A0 ˙y1 A/1 A0 ˙y1 O D B22 D .A0 ˙y1 A/1 : ƒ
(4.25)
(iv) dispersion matrix The related dispersion matrix is computed by means of the “Error Propagation Law”. O D DfLy j LO D .A0 † 1 A/1 g D LDfyg O O0 L Dfg y O D .A0 † 1 A/1 A0 † 1 † y † 1 A.A0 † 1 A/1 Dfg y y y y O D .A0 † 1 A/1 : Dfg y
(4.26)
Here is our proof’s end. O By means of Theorem 4.3 we succeeded to produce -BLUUE of . In consequence, we have to estimate E fyg as † y -BLUUE of Efyg as well as the “error vector” ey WD y Efyg (4.27)
1
1
eQ y WD y E fyg D y AO D .In AL/y
(4.28)
out of
1 1 fyg be †-BLUUE of Efyg D A with respect to the special Gauss– (i) Let E
Lemma 4.4. (E fyg † y -BLUUE of Efyg, eQ y ; DfQey g; Dfyg):
Markov model (4.1), (4.2), then
1
1 0 1 E fyg D AO D A.A0 † 1 y A/ A † y y
(4.29)
leads to the singular variance-covariance matrix (dispersion matrix) O D A.A0 † 1 A/1 A0 : DfAg y
(4.30)
(ii) If the error vector e is empirically determined, we receive for 1 0 1 eQ y D ŒIn A.A0 † 1 y A/ A † y y
(4.31)
and its singular variance-covariance matrix (dispersion matrix) 1 0 ey g D n m DfQey g D † y A.A0 † 1 y A/ A ; rkDfQ
(4.32)
4-2 Setup of the Best Linear Uniformly Unbiased Estimator
211
(iii) The dispersion matrices of the special Gauss–Markov model (4.1), (4.2) are related by O C DfQey g Dfyg D DfAO C eQ y g D DfAg D DfQey ey g C DfQey g; O D 0; C fQey ; eQ y ey g D 0: C fQey ; Ag
(4.33) (4.34)
eQ y and AO are uncorrelated: Proof.
1
1 0 1 .i / E fyg D AO D A.A0 † 1 y A/ A † y y As soon as we implement O † y -BLUUE of , namely (4.21), into AO we are directly led to the desired result.
O D A.A0 † 1 A/1 A0 .ii/ DfAg y O † y -BLUUE of ; namely (4.21), implemented in O W D EfA.O Efg/. O O Efg/ O 0 A0 g DfAg O D AEf.O Efg/. O O Efg/ O 0 gA0 DfAg O D A.A0 † 1 A/1 A0 † 1 Ef.y Efyg/.y Efyg/0 g† 1 A.A0 ˙ 1 A/1 A0 DfAg y y y y O D A.A0 † 1 A/1 A0 † 1 A.A0 † 1 A/1 A0 DfAg y y y O D A.A0 † 1 A/1 A0 DfAg y leads to the proclaimed result. 1 0 1 .iii/ eQ y D ŒIn A.A0 † 1 y A/ A † y y:
Similarly if we substitute † y -BLUUE of , namely (4.21), in
1
1 0 1 eQ y D y E fyg D y AO D ŒIn A.A0 † 1 y A/ A † y y
we gain what we wanted! 1 0 .iv/ DfOey g D † A.A0 † 1 y A/ A
DfQey g WD Ef.Qey EfQey g/.Qey EfQey g/0 g: As soon as we substitute 1 0 1 EfQey g D ŒIn A.A0 † 1 y A/ A † y Efyg
212
4 The Second Problem of Probabilistic Regression
in the definition of the dispersion matrix DfQey g; we are led to 1 0 1 1 0 1 1 0 DfQey g W D ŒIn A.A0 † 1 y A/ A † y † ŒIn † y A.A † y A/ A ; 1 0 1 0 1 1 0 DfQey g D Œ† y A.A0 † 1 y A/ A ŒIn † y A.A † y A/ A 1 0 0 1 1 0 0 1 1 0 D † y A.A0 † 1 y A/ A A.A † y A/ A C A.A † y A/ A 1 0 D † y A.A0 † 1 y A/ A : 1 0 rkDfQey g D rkDfyg rkA.A† 1 y A/ A D n m:
O C DfQey g D DfQey ey g C DfQey g .v/ Dfyg D DfAO C eQ y g D DfAg y Efyg D y A D y AO C A.O / y Efyg D A.O / C eQ y : The additive decomposition of the residual vector y Ey left us with two terms, namely the predicted residual vector eQ y and the term which is a linear functional of O : The corresponding product decomposition Œy EfygŒy Efyg0 D A.O /.O /0 C A.O /Qe0y CQey .O /0 A0 C eQ y eQ 0y O D ; and for O † y -BLUUE of , in particular Efg O O Efg/ O 0 C A.O Efg/Q O e0 Œy EfygŒy Efyg0 D A.O Efg/. y O 0 A0 C eQ y eQ 0 CQey .O Efg/ y O C DfQey g Dfyg D EfŒy EfygŒy Efyg0 g D DfAg D DfQey ey g C DfQey g due to O e0 g D EfA.O Efg/.y O O 0g D 0 EfA.O Efg/Q A/ y O 0 A0 g D EfQey .O Efg/ O 0 A0 g D 0 EfQey .O Efg/ or O eQ y g D 0; C fQey ; Ag O D 0: C fA; These covariance identities will be proven next. O eQ y g D A.A0 † 1 A /1 A0 † 1 Ef.y Efyg/.y Efyg/0 g C fA; y y 1 0 1 0 ŒIn A.A0 † 1 y A/ A † y
4-2 Setup of the Best Linear Uniformly Unbiased Estimator
213
O eQ y g D A.A0 † 1 A/1 A0 † 1 † y ŒIn † 1 A.A0 † 1 A/1 A0 C fA; y y y y O eQ y g D A.A0 † 1 A/1 A0 A.A0 † 1 A/1 A0 D 0: C fA; y y Here is our proof’s end. We recommend to consider the exercises as follows. Exercise 4.1. (translation invariance: y 7! y Efyg): Prove that the error prediction of type O † y -BLUUE of , namely 1 0 1 eQ y D ŒIn A.A0 † 1 y A/ A † y y
is translation invariant in the sense of y 7! y Efyg that is 1 0 1 eQ y D ŒIn A.A0 † 1 y A/ A † y ey
subject to ey WD y Efyg: Exercise 4.2. (idempotence): 1 0 1 Is the matrix In A.A0 † 1 y A/ A † y idempotent?
Exercise 4.3. (projection matrices): 1 0 1 0 1 1 0 1 Are the matrices A.A0 † 1 y A/ A † y and In A.A † y A/ A † y projection matrices?
4-22 The Equivalence Theorem of Gy -LESS and †y -BLUUE We have included the fourth chapter on † y -BLUUE in order to interpret Gy -LESS of the third chapter. The key question is open: ?When are † y -BLUUE and Gy -LESS equivalent? The answer will be given by Theorem 4.5. (equivalence of † y -BLUUE and Gy -LESS): With respect to the special linear Gauss–Markov model of full column rank (4.1), (4.2) O D Ly is † y -BLUUE, if ` D Ly is Gy -LESS of (3.1) for 1 Gy D † 1 y Gy D † y :
(4.35)
In such a case, O D ` is the unique solution of the system of normal equations 0 1 O .A0 † 1 y A/ D A † y y
(4.36)
214
4 The Second Problem of Probabilistic Regression
attached with the regular dispersion matrix O D .A0 † 1 A/1 : Dfg y
(4.37)
The proof is straight forward if we compare the solution (3.11) of Gy -LESS and (4.21) of † y -BLUUE: Obviously the inverse dispersion matrix Dfyg; y 2 fY; pdf g is equivalent to the matrix of the metric Gy of the observation space Y. Or conversely the inverse matrix of the metric of the observation space Y determines the variance-covariance matrix Dfyg † y of the random variable y 2 fY; pdf g.
4-3 Setup of the Best Invariant Quadratic Uniformly Unbiased Estimator of type BIQUUE for the central moments of second order The subject of variance-covariance component estimation within Mathematical Statistics has been one of the central research topics in the nineteen eighties. In a remarkable bibliography up-to-date to the year 1977 H. Sahai listed more than 1,000 papers on variance-covariance component estimations, where his basic source was “Statistical Theory and Method” abstracts (published for the International Statistical Institute by Longman Groups Limited), “Mathematical Reviews” and “Abstract Service of Quality Control and Applied Statistics”. Excellent review papers and books exist on the topic of variance-covariance estimation such as C.R. Rao and J. Kleffe (1988), R.S. Rao (1977) S. B. Searle (1978), L.R. Verdooren (1980), J. Kleffe (1980), and R. Thompson (1980). The PhD Thesis of B. Schaffrin (1983) offers a critical review of state-of-the-art of variance-covariance component estimation. In Geodetic Sciences variance components estimation originates from F.R. Helmert (1924) who used least squares residuals to estimate heterogeneous variance components. R. Kelm (1974) and E. Grafarend, A. Kleusberg and B. Schaffrin (1980) proved the relation of † 0 Helmert type IQUUE balled †-HIQUUE to BIQUUE and MINQUUE invented by C.R. Rao. Most notable is the Ph.D. Thesis of M. Serbetci (1968) whose gravimetric measurements were analyzed by † 0 -HIQUUE Geodetic extensions of the Helmert method to compete variance components originate from H. Ebner (1972, 1977), W. F¨orstner (1979, 1980), W. Welsch (1977, 1978, 1979, 1980), K. R. Koch (1978, 1981), C. G. Persson (1981), L. Sjoeberg (1978), E. Grafarend and A. d’Hone (1978), E. Grafarend (1984) B. Schaffrin (1979, 1980, 1981). W. F¨orstner (1979), H. Fr¨ohlich (1980), and K.R. Koch (1981) used the estimation of variance components for the adjustment of geodetic networks and the estimation of a length dependent variance of distances. A special field of geodetic application has been oscillation analysis based upon a fundamental paper by H. Wolf (1975), namely M. Junasevic (1977) for the estimation of signal-to-noise ratio in gyroscopic azimuth observations. The
4-3 Setup of the Best Invariant Quadratic Uniformly Unbiased Estimator
215
Helmert method of variance component estimation was used by E. Grafarend and A. Kleusberg (1980) and A. Kleusberg and E. Grafarend (1981) to estimate variances of signal and noise in gyrocompass observations. Alternatively K. Kubik (1967a, b, c, 1970) pioneered the method of Maximum Likelihood (MALE) for estimating weight ratios in a hybrid distance–direction network. “MALE” and “FEMALE” extensions were proposed by B. Schaffrin (1983), K. R. Koch (1986), and Z. C. Yu (1996). A typical problem with † 0 -Helmert type IQUUE is that it does not produce positive variances in general. The problem of generating a positive-definite variance-covariance matrix from variance-covariance component estimation has already been highlighted by J. R. Brook and T. Moore (1980), K.G. Brown (1977, 1978), O. Bemk and H. Wandl (1980), V. Chew (1970), Han Chien-Pai (1978), R. R. Corbeil and S. R. Searle (REML, 1976), F. J. H. Don and J. R. Magnus (1980), H. Drygas (1980), S. Gnot, W. Klonecki and R. Zmyslony (1977). H. O. Hartley and J. N. K. Rao (ML, 1967), in particular J. Hartung (1979, 1980), J. L. Hess (1979), S. D. Horn and R. A. Horn (1975), S. D. Horn, R. A. Horn and D. B. Duncan (1975), C. G. Khatri (1979), J. Kleffe (1978, 1980), ), J. Kleffe and J. Z¨ollner (1978), in particular L. R. Lamotte (1973, 1980), S. K. Mitra (1971), R. Pincus (1977), in particular F. Pukelsheim (1976, 1977, 1979, 1981 a, b), F. Pukelsheim and G. P. Styan (1979), C. R. Rao (1970, 1978), S. R. Searle (1979), S. R. Searle and H. V. Henderson (1979), J. S. Seely (1972, 1977), in particular W. A. Thompson (1962, 1980), L. R. Verdooren (1979), and H. White (1980). In view of available textbooks, review papers and basic contributions in scientific journals we are only able to give a short introduction. First, we outline the general model of variance-covariance components leading to a linear structure for the central second order moment, known as the variance-covariance matrix. Second, for the example of one variance component we discuss the key role of the postulate’s (i) symmetry, (ii) invariance, (iii) uniform unbiasedness, and (iv) minimum variance. Third, we review variance-covariance component estimations of Helmert type.
4-31 Block Partitioning of the Dispersion Matrix and Linear Space Generated by Variance-Covariance Components The variance-covariance component model is defined by the block partitioning (4.33) of a variance-covariance matrix † y , also called dispersion matrix Dfyg, which follows from a corresponding rank partitioning of the observation vector y D Œy10 ; : : : ; y`0 0 . The integer number l is the number of blocks. For instance, the variance-covariance matrix † 2 Rnn in (4.41) is partitioned into l D 2 blocks. The various blocks consequently factorized by variance j2 and by covariances j k D j k j k . j k 2 Œ1; C1 denotes the correlation coefficient between the blocks. For instance, Dfy1 g D V11 12 is a variance factorization, while Dfy1 ; y2 g D V12 12 D V12 12 1 2 is a covariance factorization. The matrix blocks Vjj are built into the matrix Cjj , while the off-diagonal blocks Vj k ; V0 j k into the matrix Cj k of the same dimensions.
216
4 The Second Problem of Probabilistic Regression
dim † D dim Cjj D dim Cj k D n n: The collective matrices Cjj and Cj k enable us to develop an additive decomposition (4.36), (4.43) of the block partitioning variance-covariance matrix † y . As soon as we collect all variance-covariance components in an peculiar true order, namely WD Œ12 ; 12 ; 22 ; 13 ; 23 ; 32 ; : : : ; `1`; ; `2 0 ; we are led to a linear form of the dispersion matrix (4.37), (4.43) as well as of the dispersion vector (4.39), (4.44). Indeed the dispersion vector d.y/ D X builds up a linear form where the second order design matrix X, namely X WD ŒvecC1 ; : : : ; vecC`.`C1/ 2 Rn
2 `.`C1/=2
;
reflects the block structure. There are l.l C 1/=2 matrices Cj ; j 2 f1; : : : ; ` .` C 1/=2g. For instance, for l D 2 we are left with 3 block matrices fC1 ; C2 ; C3 g. Before we analyze the variance-covariance component model in more detail, we briefly mention the multinominal inverse † 1 of the block partitioned matrix †. For instance by “JPM” and “SCHUR” we gain the block partitioned inverse matrix † 1 with elements fU11 ; U12 ; U22 g (4.51)–(4.54) derived from the block partitioned matrix † with elements fV11 ; V12 ; V22 g (4.47). “Sequential JPM” solves the block inverse problems for any block partitioned matrix. With reference to Box 4.2 and Box 4.3 † D C1 1 C C2 2 C C3 3 ) † 1 D E1 ./ C E2 ./ C E3 ./ is an example. Box 4.2. (Partitioning of variance-covariance matrix):
2
V11 12 V0 12 12 :: :
6 6 6 †D6 6 4 V0 1`1
1`1
V0 1` 1`
0
V12 12 V22 22 :: :
V1`1 1`1 V2`1 2`1 :: :
V 2`1 2`1 V0 2` 2`
2 V`1`1 `1 V0 `1` `1`
V1` 1` V2` 2` :: : V`1` `1` V`` `2
3 7 7 7 7>0 7 5
(4.38)
l second moments 2 of type variance, `.` 1/=2 second moment j k of type covariance matrix blocks of second order design
4-3 Setup of the Best Invariant Quadratic Uniformly Unbiased Estimator
2
Cjj
0 6 :: WD 4 : V
jj
(4.39)
0 0
2
Cj k
3 0 :: 7 8j 2 f1; : : : ; `g :5
217
3 0 0 6 0 V 7 " 6 7 subject to j < k jk 6 7 WD 6 Vkj 7 6 7 and j; k 2 f1; : : : ; `g 4 5 0 0 †D
` X
Cjj j2 C
j D1
`1;` X
Cj k j k
(4.40)
(4.41)
j D1;kD2;j
X
†D
Cj j 2 Rnm
(4.42)
j D1
Œ12 ; 12 ; 22 ; 13 ; 23 ; 32 ; : : : ; `1`; ; `2 0 DW
(4.43)
“dispersion vector” Dfyg WD † y , d fyg D vecDfyg D vec† `.`C1/=2
d.y/ D
X
.vecCj /j D X
(4.44)
j D1
“X is called second order design matrix” X WD ŒvecC1 ; : : : ; vecC`.`C1/=2
(4.45)
“dimension identities.” d.y/ 2 Rn
2 1
; 2 R; X 2 Rn
2 `.`C1/=2
:
Box 4.3. (Multinomial inverse): W I nput W † 11 † 12 V11 12 V12 12 D †D 0 V0 12 12 V22 22 † 12 † 22 V11 0 2 0 V12 0 0 D 12 C 2 2 Rnm C V0 12 0 0 V22 2 0 0 1
(4.46)
218
4 The Second Problem of Probabilistic Regression
V11 0 0 V12 0 0 ; C22 WD C3 WD C11 WD C1 WD ; C12 WD C2 WD V0 12 0 0 V22 0 0 (4.47) 3 X † D C11 12 C C12 12 C C22 22 D C1 1 C C2 2 C C3 3 D Cj j (4.48)
vec˙ D
2
3
j D1
1 .vecCj /j DŒvecC1 ; vecC2 ; vecC3 4 2 5 D X j D1 3 3 X
vecCj 2 Rn
2 1
(4.49)
8j 2 f1; : : : ; `.` C 1/=2g
“X is called second order design matrix” X WD ŒvecC1 ; : : : ; vecC`.`C1/=2 2 Rn
† 1
2 `.`C1/=2
here: l D 2 :output: U11 0 2 0 U12 0 0 1 2 D C 1 C 12 0 U22 2 0 0 U0 12 0
(4.50)
subject to 1 0 1 U11 D V1 11 C qV11 V12 SV 12 V11 0
U12 D U 21 D
qV1 11 V12 S
U22 D S D .V22 qV q WD
0
(4.52)
1 1 12 V11 V12 /
2 12 2 2 1 2
† 1 D E1 C E2 C E3 D
(4.51)
(4.53) (4.54)
`.`C1/=2D3
X
Ej
(4.55)
j D1
U11 0 2 0 U12 0 0 1 12 ; E3 ./ WD 2 E1 ./ WD ; E2 ./ WD U0 12 0 0 U22 2 0 0 1 (4.56) The general result that inversion of a block partitioned symmetric matrix conserves the block structure is presented in
Corollary 4.6. Corollary 4.6 (multinomial inverse): `.`C1/=2
†D
X
j D1
Cj j , †
1
`.`C1/=2
D
X
Ej ./
(4.57)
j D1
We shall take advantage of the block structured multinominal inverse when we are reviewing HIQUUE or variance-covariance estimations of Helmert type.
4-3 Setup of the Best Invariant Quadratic Uniformly Unbiased Estimator
219
The variance component model as well as the variance-covariance model are defined next. A variance component model is a linear model of type 2
V11 12 0 6 0 V22 2 2 6 6 :: † D 6 ::: : 6 4 0 0 0 0
:: :
3
0 0 :: :
0 0 :: :
0
0 V`` `2
7 7 7 7 7 5
(4.58)
3 12 d fyg D vec† D ŒvecC11 ; : : : ; vecCjj 4 5 `2
(4.59)
d fyg D X 8 2 RC :
(4.60)
2 V`1`1 `1
2
In contrast, the general model (4.49) is the variance-covariance model with a linear structure of type 3 12 6 7 6 12 7 7 6 d fyg D vec† D ŒvecC11 ; vecC12 ; vecC12 ; : : : ; vecC`` 6 22 7 7 6 4 5 `2 2
d fyg D X 8 j2 2 RC ; † positive definite:
(4.61)
(4.62)
The most popular cases of variance-covariance components are collected in the examples. Example 4.1. (one variance components, i.i.d. observations) Dfyg W † y D In 2 subject to † y 2 S Y M.Rnn /; 2 2 RC : Example 4.2. (one variance component, correlated observations) Dfyg W † y D Vn 2 subject to † y 2 S Y M.Rnn /; 2 2 RC : Example 4.3. (two variance components, two sets of totally uncorrected observations “heterogeneous observations”) Dfyg W † y D
In1 12 0
0 In2 22
2
n D n1 C n2 subject to 4 † y D S Y M.Rnn / 12 2C ; 22 2 RC :
(4.63)
Example 4.4. (two variance components, one covariance components, two sets of correlated observations “heterogeneous observations”)
220
4 The Second Problem of Probabilistic Regression
2
n D n1 C n2 4 subject to V11 2 Rn1 n1 ; V22 2 Rn2 n2 V12 2 Rn1 n2 nn 2 C 2 C † y 2 S Y M.R /; 1 2 R ; 2 2 R ; † y positive definite: (4.64)
V11 12 V12 12 Dfyg W † y D V0 12 12 V11 22
Special case: V11 D In1 ; V22 D In2 : Example 4.5. (elementary error model, random effect model) ey D y Efy jz g D
` X
Aj .zj Efzj g/ D
j D1
` X
Aj ejz
k
Efejz g D 0; Efejz ; e0 z g D ıj k Iq Dfyg W † y D
` X j D1
Aj A0 j j2 C
(4.65)
j D1
` X
.Aj A0 k C Ak A0 j /j k
(4.66) (4.67)
j;kD1;j
At this point, we should emphasize that a linear space of variance-covariance components can be build up independently of the block partitioning of the dispersion matrix Dfyg. For future details and explicit examples let us refer to B. Schaffrin (1983).
4-32 Invariant Quadratic Estimation of Variance-Covariance Components of Type IQE By means of Definition 4.2 (one variance component) and Definition 4.9 (variancecovariance components) we introduce O 2 IQE of 2 and O k IQE of k : Those conditions of IQE, represented in Lemmas 4.7 and 4.9 enable us to separate the estimation process of first moments j (like BLUUE) from the estimation process of central second moments k (like BIQUUE). Finally we provide you with 1=2 the general solution (4.74) of the in homogeneous matrix equations Mk A D 0 (orthogonality conditions) for all k21, ,l(l+1)/2 where l.l C 1/=2 is the number of variance-covariance components, restricted to the special Gauss–Markov model E fyg D A, d fyg D X of “full column rank”, A 2 Rnm , rkA D m. Definition 4.2. (invariant quadratic estimation O 2 of 2 : IQE ):
4-3 Setup of the Best Invariant Quadratic Uniformly Unbiased Estimator
221
The scalar O 2 is called IQE (Invariant Quadratic Estimation) of 2 2 RC with respect to the special Gauss–Markov model of full column rank. Efyg D A; A 2 Rnm ; rkA D m Dfyg D V 2 ; V 2 Rnn ; rkV D n; 2 2 RC ;
(4.68)
if the “variance component 2 is ” (i) a quadratic estimation O 2 D y0 My D .vecM/0 .y ˝ y/ D .y0 ˝ y0 /.vecM/
(4.69)
subject to M 2 S Y M WD fM 2 Rnn jM0 D Mg
(4.70)
(ii) transformational invariant: y ! y Efyg DW ey in the sense of 0
or
O 2 D y0 My D ey Mey
(4.71)
O 2 D .vecM/0 .y ˝ y/ D .vecM/0 .ey ˝ ey /
(4.72)
O 2 D tr.Myy0 / D tr.Mey e0y /
(4.73)
or
Already in the introductory paragraph we emphasized the key of “IQE”. Indeed by the postulate “IQE” the estimation of the first moments Efyg D A is supported by the estimation of the central second moments Dfyg D V 2 or d fyg D X. Let us present to you the fundamental result of “O 2 IQE OF 2 ”. Lemma 4.8. (invariant quadratic estimation O 2 of O 2 :IQE): Let M D .M1=2 /0 M1=2 be a multiplicative decomposition of the symmetric matrix M. The scalar O 2 is IQE of 2 , if and only if M1=2 D 0 or A0 .M1=2 /0 D 0
(4.74)
for all M1=2 2 Rnn . Proof. First, we substitute the transformation y D Efyg C ey subject to expectation identity Efyg D A; A 2 Rnm ; rkA D m; into y0 My:
222
4 The Second Problem of Probabilistic Regression
y0 My D 0 A0 MA C 0 A0 Mey C e0y MA C e0y Mey : Second, we take advantage of the multiplicative decomposition of the matrix M, namely M D .M1=2 /0 M1=2 ; (4.75) which generates the symmetry of the matrix M 2 SYM: D fM 2 Rmn jM0 D Mg y0 My D 0 A0 .M1=2 /0 M1=2 A C 0 A0 .M1=2 /0 M1=2 ey Ce0y .M1=2 /0 M1=2 A C e0y Mey : Third, we postulate “IQE”. y0 My D e0y Mey , M1=2 A D 0 , A0 .M1=2 /0 D 0: For the proof, here is our journey’s end. Let us extend “IQE” from a “one variance component model” to a “variancecovariance components model”. First, we define “IQE” (4.81) for variancecovariance components, second we give necessary and sufficient conditions identifying “IQE”. Definition 4.9. (variance-covariance components model O k IQE of k ): The dispersion vector dO .y/ is called IQE (“Invariant Quadratic Estimation”) with respect to the special Gauss–Markov model of full column rank. Efyg D A; A 2 fRnm gI rkA D m (4.76) d fyg D X ; Dfyg † y positive definite; rk† y D n; if the variance-covariance components WD Œ12 ; 12 ; 22 ; 13 ; 23 ; : : : ; `2 0
(4.77)
(i) bilinear estimations O k D y0 My D .vecM/0 .y ˝ y/ D trMk yy0 8Mk 2 Rnn`.`C1/=2
(4.78)
subject to Mk 2 S Y M WD fMk 2 Rnn`.`C1/=2 jMk D M0k g; (ii) translational invariant y ! y Efyg DW ey
(4.79)
4-3 Setup of the Best Invariant Quadratic Uniformly Unbiased Estimator
223
O k D y0 Mk y D e0y Mk ey
(4.80)
O k D .vecMk /0 .y ˝ y/ D .vecMk /0 .ey ˝ ey /:
(4.81)
Note the fundamental lemma “O k IQE of k ” whose proof follows the same line as the proof of Lemma 4.7. Lemma 4.10. (invariant quadratic estimation O k of k : IQE): 1=2
1=2
Let Mk D .Mk /0 Mk be a multiplicative decomposition of the symmetric matrix Mk . The dispersion vector O k is IQE of k , if and only if 1=2
1=2
Mk A D 0 or A0 .Mk /0 D 0 1=2
for all Mk
(4.82)
2 Rnn`.`C1/=2 .
? How can we characterize “O 2 IQE of 2 ” or “O k IQE of k ” ? The problem is left with the orthogonality conditions (4.74) and (4.82). Box 4.4 reviews the general solutions of the homogeneous equations (4.83) and (4.85) for our “full column rank linear model”. Box 4.4. (General solutions of homogeneous matrix equations): 1=2
Mk
D0
,
Mk D Zk .In AA /
“for all A 2 G WD fA 2 R
nm
jAA A D Ag
(4.83) 00
W rkA D m
A D
A L
D .A0 Gy A/1 A0 Gy
mn “for all left inverses A j.A A/0 D A Ag00 L 2 fA 2 R # 1=2 Mk D 0 1=2 ) Mk D Zk ŒIn A.A0 Gy A/1 A0 Gy rkA m
(4.84)
(4.85)
“unknown matrices : Zk and Gy .” First, (4.83) is a representation of the general solutions of the inhomogeneous matrix equations (4.82) where Zk ; k 2 f1; : : : ; `.` C 1/=2g; are arbitrary matrices. Note that k D 1; M1 describes the “one variance component model”, otherwise the general variance-covariance components model. Here we are dealing with a special Gauss–Markov model of “full column rank”, rkA D m. In this case, the generalized inverse A is specified as the “weighted left inverse” A L of type (4.70) whose weight Gy is unknown. In summarizing, representations of two matrices Zk and 1=2 Gy to be unknown, given Hk , Mk is computed by
224
4 The Second Problem of Probabilistic Regression 1=2
1=2
D ŒIn Gy A.A0 Gy A/1 A0 Z0 k Zk ŒIn A.A0 Gy A/1 A0 Gy (4.86) definitely as a symmetric matrix. Mk D .Mk /0 Mk
4-33 Invariant Quadratic Uniformly Unbiased Estimations of Variance-Covariance Components of Type IQUUE Unbiased estimations have already been introduced for the first moments Efyg D A; A 2 Rnm ; rkA D m. Similarly we like to develop the theory of the one variance component 2 and the variance-covariance unbiased estimations for the central second moments, namely components k , k 2 f1; : : : ; `.` C 1/=2g, where l is the number of blocks. Definition 4.11 tells us when we use the terminology “invariant quadratic uniformly unbiased estimation” O 2 of 2 or O k of k , in short “IQUUE” Lemma 4.12 identifies O 2 IQUUE of 2 by the additional trVM D 1. In contrast, Lemma 4.12 focuses on O k IQUUE of k by means of the additional conditions trCj Mk D ı j k . Examples are given in the following paragraphs. Definition 4.11. (invariant quadratic uniformly unbiased estimation O 2 of 2 and O k of k : IQUUE): The vector of variance-covariance components O k is called IQUUE (Invariant Quadratic Uniformly Unbiased Estimation) of k with respect to the special Gauss– Markov model of full column rank. 2
Efyg D A; A 2 Rnm ; rkA D m 4 d fyg D X ; X 2 Rn2 `.`C1/=2 ; Dfyg † y positive definite; rk† y rk˙y D n; vechDfyg D d fyg;
(4.87)
if the variance-covariance components WD Œ12 ; 12 ; 22 ; 13 ; 23 ; : : : ; `2
(4.88)
O k D y0 Mk y D .vecMk /0 .y ˝ y/ D trMk yy0 8Mk 2 Rnn`.`C1/=2
(4.89)
are (i) a bilinear estimation
subject to 1=2
1=2
Mk D .Mk /0 .Mk / 2 S Y M WD fMk 2 Rnm`.`C1/=2 jMk D M0k g (ii) translational invariant in the sense of
(4.90)
4-3 Setup of the Best Invariant Quadratic Uniformly Unbiased Estimator
225
y ! y Efyg DW ey
or
O k D y0 Mk y D e0y Mk ey
(4.91)
O k D .vecMk /0 .y ˝ y/ D .vecMk /0 .ey ˝ ey /
(4.92)
O k D trMk yy0 D trMk ey e0y ;
(4.93)
or
(iii) uniformly unbiased in the sense of k D 1 (one variance component): EfO 2 g D 2 ; 8 2 2 RC ;
(4.94)
k 1 (variance-covariance components): EfO k g D k ; 8k 2 fR`.`C1/=2 j† y positive definiteg;
(4.95)
with l variance components and l.l 1/=2 covariance components. Note the quantor “for all 2 2 RC ” within the definition of uniform unbiasedness (4.79) for one variance component. Indeed, weakly unbiased estimators exist without the quantor (B. Schaffrin 2000). A similar comment applies to the quantor “for all k 2 fR`.`C1/=2 j† y positive definite” within the definition of uniform unbiasedness (4.80) for variance-covariance components. Let us characterize “O 2 IQUUE of 2 ”. Lemma 4.12. (O 2 IQUUE of 2 ): The scalar O 2 is IQUUE of 2 with respect to the special Gauss–Markov model of full column rank.
“first moment00 W Efyg D A; A 2 Rnm ; rkA D m “central second moment00 W Dfyg D 2 ; V 2 Rnn ; rkV D n; 2 2 RC ; if and only if .i / M1=2 A D 0
(4.96)
.i i / trVM D 1:
(4.97)
and :Proof : 2
First, we compute EfO g.
226
4 The Second Problem of Probabilistic Regression
O 2 D trMey e0y ) EfO 2 g D trM†y D tr† y M: Second, we substitute the “one variance component model” † y D V 2 . EfO 2 g WD 2 8 2 2 R
, trVM D 1:
Third, we adopt the first condition of type “IQE”. The conditions for “O k IQUUE of k ” are only slightly more complicated. Lemma 4.13. (O k IQUUE of 2 ): The vector O k , k 2 f1; : : : ; `.` C 1/=2g is IQUUE of k with respect to the block partitioned special Gauss–Markov model of full column rank. “first moment” 82 39 2 3 A1 y1 > ˆ > ˆ ˆ 6 > 6 A2 7 ˆ 7> ˆ = 6 <6 y2 7> 7 6 6 :: 7 7 E 6 : 7 D 6 ::: 7 D A; A 2 Rn1 n2 n`1 ;n` m ; rkA D m 6 6 > ˆ 7 7 ˆ > ˆ 4 A`1 5 4 y`1 5> > ˆ > ˆ ; : y` A`
(4.98)
$n_1 + n_2 + \dots + n_{\ell-1} + n_\ell = n,$

"central second moment":

$D\{y\} = \begin{bmatrix} V_{11}\sigma_1^2 & V_{12}\sigma_{12} & \dots & V_{1,\ell-1}\sigma_{1,\ell-1} & V_{1\ell}\sigma_{1\ell}\\ V_{12}'\sigma_{12} & V_{22}\sigma_2^2 & \dots & V_{2,\ell-1}\sigma_{2,\ell-1} & V_{2\ell}\sigma_{2\ell}\\ \vdots & & \ddots & & \vdots\\ V_{1\ell}'\sigma_{1\ell} & V_{2\ell}'\sigma_{2\ell} & \dots & V_{\ell-1,\ell}'\sigma_{\ell-1,\ell} & V_{\ell\ell}\sigma_\ell^2 \end{bmatrix}$  (4.99)

$D\{y\} = \sum_{j=1}^{\ell} C_{jj}\sigma_j^2 + \sum_{j,k=1,\,j<k}^{\ell} C_{jk}\sigma_{jk}$  (4.100)

$D\{y\} = \sum_{j=1}^{\ell(\ell+1)/2} C_j\sigma_j$  (4.101)

$\sigma := [\sigma_1^2, \sigma_{12}, \sigma_2^2, \sigma_{13}, \sigma_{23}, \sigma_3^2, \dots, \sigma_\ell^2]' \in \mathbb{R}^{\ell(\ell+1)/2}$  (4.102)

$C_j \in \mathbb{R}^{n\times n\times\ell(\ell+1)/2}$ (3d array)  (4.103)

$V_{11}\in\mathbb{R}^{n_1\times n_1},\ V_{12}\in\mathbb{R}^{n_1\times n_2},\ \dots,\ V_{\ell-1,\ell}\in\mathbb{R}^{n_{\ell-1}\times n_\ell},\ V_{\ell\ell}\in\mathbb{R}^{n_\ell\times n_\ell}$  (4.104)
$D\{y\} =: \Sigma_y \in \mathbb{R}^{n\times n} = \mathbb{R}^{(n_1+\dots+n_\ell)\times(n_1+\dots+n_\ell)}$  (4.105)

$\mathrm{rk}\,\Sigma_y = n,\ \Sigma_y$ positive definite,  (4.106)

if and only if

(i) $M_k^{1/2}A = 0$  (4.107)

and

(ii) $\mathrm{tr}\,C_j M_k = \delta_{jk}.$  (4.108)
Before we continue with the proof, we have to comment on our setup of the variance-covariance components model. For easier access by an analyst we have demonstrated the block partitioning of the observation vector

$y = [y_1', \dots, y_\ell']',\quad \dim y_j = n_j,$

and of the variance-covariance matrix

$\Sigma_y = \begin{bmatrix} V_{11}\sigma_1^2 & \dots & V_{1\ell}\sigma_{1\ell}\\ \vdots & \ddots & \vdots\\ V_{1\ell}'\sigma_{1\ell} & \dots & V_{\ell\ell}\sigma_\ell^2 \end{bmatrix}.$

$n_1$ observations build up the observation vector $y_1$ as well as the variance factor $V_{11}$. Similarly, $n_2$ observations build up the variance factor $V_{22}$. Both observation sets, collected in the observation vectors $y_1$ and $y_2$, constitute the covariance factor $V_{12}$. This scheme is continued for the other observations and their corresponding variance and covariance factors. The matrices $C_{jj}$ and $C_{jk}$ which map the variance-covariance components $\sigma_{jk}$ $(k > j)$ to the variance-covariance matrix $\Sigma_y$ contain the variance factors $V_{jj}$ at column $j$, row $j$, while the covariance factors contain $\{V_{jk}', V_{jk}\}$ at column $k$, row $j$ and column $j$, row $k$, respectively. The following proof of Lemma 4.13 is based upon the linear structure (4.85).

Proof: First, we compute $E\{\hat\sigma_k\}$:

$E\{\hat\sigma_k\} = \mathrm{tr}\,M_k\Sigma_y = \mathrm{tr}\,\Sigma_y M_k.$

Second, we substitute the block partitioning of the variance-covariance matrix $\Sigma_y$:

$\Sigma_y = \sum_{j=1}^{\ell(\ell+1)/2} C_j\sigma_j \Rightarrow \mathrm{tr}\,\Sigma_y M_k = \sum_{j=1}^{\ell(\ell+1)/2}(\mathrm{tr}\,C_j M_k)\,\sigma_j,$

and uniform unbiasedness holds if and only if $\mathrm{tr}\,C_j M_k - \delta_{jk} = 0$. Third, we adopt the first condition of the type "IQE". ∎  (4.109)
4-34 Invariant Quadratic Uniformly Unbiased Estimations of One Variance Component (IQUUE) from $\Sigma_y$-BLUUE: HIQUUE

Here is our first example of "how to use IQUUE". Let us adopt the residual vector $\tilde e_y$ as predicted by $\Sigma_y$-BLUUE for a "one variance component" dispersion model, namely $D\{y\} = V\sigma^2$, $\mathrm{rk}\,V = n$. First, we prove that $M^{1/2}$ generated by V-BLUUE fulfils both conditions of IQUUE, namely $M^{1/2}A = 0$ and $\mathrm{tr}\,VM = \mathrm{tr}\,V(M^{1/2})'M^{1/2} = 1$. As outlined in Box 4.5, the one condition of uniform unbiasedness leads to the solution for the one unknown $\alpha$ within the "ansatz" $Z'Z = \alpha V^{-1}$, namely the reciprocal of the number $n-m$ of "degrees of freedom", the "surjectivity defect". Second, we follow Helmert's ansatz to set up IQUUE of Helmert type, in short "HIQUUE".

Box 4.5 (IQUUE: one variance component):

$E\{y\} = Ax,\ A\in\mathbb{R}^{n\times m},\ \mathrm{rk}\,A = m,\quad D\{y\} = V\sigma^2,\ \mathrm{rk}\,V = n,\ \sigma^2\in\mathbb{R}^+$

$\tilde e_y = [I_n - A(A'V^{-1}A)^{-1}A'V^{-1}]\,y$  (4.31)

1st test (IQE): "if $M^{1/2} = Z[I_n - A(A'V^{-1}A)^{-1}A'V^{-1}]$, then $M^{1/2}A = 0$".  (4.110)

2nd test (IQUUE): "if $\mathrm{tr}\,VM = 1$, then $\mathrm{tr}\{V[I_n - V^{-1}A(A'V^{-1}A)^{-1}A']Z'Z[I_n - A(A'V^{-1}A)^{-1}A'V^{-1}]\} = 1$".

ansatz: $Z'Z = \alpha V^{-1}$:

$\mathrm{tr}\,VM = \alpha\,\mathrm{tr}\{V[V^{-1} - V^{-1}A(A'V^{-1}A)^{-1}A'V^{-1}][I_n - A(A'V^{-1}A)^{-1}A'V^{-1}]\} = 1$

$\mathrm{tr}\,VM = \alpha\,\mathrm{tr}[I_n - A(A'V^{-1}A)^{-1}A'V^{-1}] = 1$

$\mathrm{tr}\,I_n = n,\quad \mathrm{tr}[A(A'V^{-1}A)^{-1}A'V^{-1}] = \mathrm{tr}\,(A'V^{-1}A)^{-1}A'V^{-1}A = \mathrm{tr}\,I_m = m$

$\mathrm{tr}\,VM = \alpha(n-m) = 1 \Rightarrow \alpha = \dfrac{1}{n-m}.$  (4.111)

Let us make a statement about the translational invariance of $\tilde e_y$ predicted by $\Sigma_y$-BLUUE and specified by the "one variance component" model $\Sigma_y = V\sigma^2$:
$\tilde e_y = e_y(\Sigma_y\text{-BLUUE}) = [I_n - A(A'\Sigma_y^{-1}A)^{-1}A'\Sigma_y^{-1}]\,y.$  (4.112)
Corollary 4.14 (translational invariance):

$\tilde e_y = [I_n - A(A'\Sigma_y^{-1}A)^{-1}A'\Sigma_y^{-1}]\,e_y = Pe_y$  (4.113)

subject to

$P := I_n - A(A'\Sigma_y^{-1}A)^{-1}A'\Sigma_y^{-1}.$  (4.114)
The proof is "a nice exercise": use $\tilde e_y = Py$ and replace $y = E\{y\} + e_y = A\xi + e_y$. The result is our statement, which is based upon the "orthogonality condition" $PA = 0$. Note that $P$ is idempotent in the sense of $P = P^2$. In order to generate "$\hat\sigma^2$ IQUUE of $\sigma^2$" we start from "Helmert's ansatz".

Box 4.6 (Helmert's ansatz):
one variance component:

$\tilde e_y'\Sigma_y^{-1}\tilde e_y = e_y'P'\Sigma_y^{-1}Pe_y = \mathrm{tr}\,(P'\Sigma_y^{-1}P\,e_ye_y')$  (4.115)

$E\{\tilde e_y'\Sigma_y^{-1}\tilde e_y\} = \mathrm{tr}\,(P'\Sigma_y^{-1}P\,E\{e_ye_y'\}) = \mathrm{tr}\,(P'\Sigma_y^{-1}P\Sigma_y)$  (4.116)

"one variance component" $\Sigma_y = V\sigma^2 = C_1\sigma^2$:

$E\{\tilde e_y'V^{-1}\tilde e_y\} = (\mathrm{tr}\,P'V^{-1}PV)\,\sigma^2\quad\forall\,\sigma^2\in\mathbb{R}^+$  (4.117)

$\mathrm{tr}\,P'V^{-1}PV = \mathrm{tr}[I_n - V^{-1}A(A'V^{-1}A)^{-1}A'] = n-m$  (4.118)

$E\{\tilde e_y'V^{-1}\tilde e_y\} = (n-m)\,\sigma^2$  (4.119)

$\hat\sigma^2 := \dfrac{1}{n-m}\,\tilde e_y'V^{-1}\tilde e_y \Rightarrow E\{\hat\sigma^2\} = \sigma^2.$  (4.120)
Let us finally collect the result of "Helmert's ansatz" in

Corollary 4.15 ($\hat\sigma^2$ HIQUUE of $\sigma^2$). Helmert's ansatz

$\hat\sigma^2 = \dfrac{1}{n-m}\,\tilde e_y'V^{-1}\tilde e_y$  (4.121)

is IQUUE, also called HIQUUE.
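The two IQUUE conditions are easy to verify numerically. The following minimal sketch (Python with numpy; the design matrix $A$ and variance factor $V$ are invented for illustration) builds the projector $P$ of (4.114) and checks $MA = 0$ and $\mathrm{tr}\,VM = 1$ for $M = V^{-1}P/(n-m)$:

```python
# Numerical check of the IQUUE conditions (4.96)-(4.97) for the matrix
# M = V^-1 P / (n - m) generated by V-BLUUE (A and V invented for illustration).
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 3
A = rng.standard_normal((n, m))          # full column rank design matrix
V = np.diag(rng.uniform(0.5, 2.0, n))    # positive definite variance factor

Vinv = np.linalg.inv(V)
P = np.eye(n) - A @ np.linalg.solve(A.T @ Vinv @ A, A.T @ Vinv)  # (4.114)
M = Vinv @ P / (n - m)                   # alpha = 1/(n-m), ansatz Z'Z = alpha V^-1

print(np.allclose(M @ A, 0.0))           # invariance condition M A = 0
print(np.isclose(np.trace(V @ M), 1.0))  # unbiasedness condition tr VM = 1
print(np.isclose(np.trace(P), n - m))    # "degrees of freedom" n - m
```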
4-35 Invariant Quadratic Uniformly Unbiased Estimators of Variance-Covariance Components of Helmert Type: HIQUUE Versus HIQE

In the previous paragraphs we succeeded in proving, first, that $M^{1/2}$ generated by $\tilde e_y = e_y(\Sigma_y\text{-BLUUE})$ with respect to "one variance component" leads to IQUUE, and second, that Helmert's ansatz generates "$\hat\sigma^2$ IQUUE of $\sigma^2$". Here we reverse the order. First, we prove that Helmert's ansatz for estimating variance-covariance components may (or, in general, may not) lead to "$\hat\sigma_k$ IQUUE of $\sigma_k$". Second, we discuss the proper choice of $M_k^{1/2}$ and test whether (i) $M_k^{1/2}A = 0$ and (ii) $\mathrm{tr}\,C_jM_k = \delta_{jk}$ are fulfilled by HIQUUE, or whether only $M_k^{1/2}A = 0$ is fulfilled by HIQE.
Box 4.7 (Helmert's ansatz, variance-covariance components):

step one: choose a suitable ordering of the variance-covariance components:

$\sigma^0 := [(\sigma_1^2)^0, (\sigma_{12})^0, (\sigma_2^2)^0, (\sigma_{13})^0, (\sigma_{23})^0, \dots, (\sigma_\ell^2)^0]'$

step two: compute

$\Sigma_0 := (\Sigma_y)^0 = \sum_{j=1}^{\ell(\ell+1)/2} C_j\,\sigma_j(\sigma^0)$  (4.122)

step three: compute $\tilde e_y = e_y(\Sigma_0\text{-BLUUE})$, namely

$P(\sigma^0) := I - A(A'\Sigma_0^{-1}A)^{-1}A'\Sigma_0^{-1}$  (4.123)
$\tilde e_y = P_0 y = P_0 e_y$  (4.124)

step four: Helmert's ansatz:

$\tilde e_y'\Sigma_0^{-1}\tilde e_y = e_y'P_0'\Sigma_0^{-1}P_0 e_y = \mathrm{tr}\,(P_0'\Sigma_0^{-1}P_0\,e_ye_y')$  (4.125)

$E\{\tilde e_y'\Sigma_0^{-1}\tilde e_y\} = \mathrm{tr}\,(P_0'\Sigma_0^{-1}P_0\,\Sigma)$  (4.126)

"variance-covariance components":

$\Sigma_y = \Sigma = \sum_{k=1}^{\ell(\ell+1)/2} C_k\sigma_k$  (4.127)

$E\{\tilde e_y'\Sigma_0^{-1}\tilde e_y\} = \sum_{k=1}^{\ell(\ell+1)/2}\mathrm{tr}\,(P_0'\Sigma_0^{-1}P_0 C_k)\,\sigma_k$  (4.128)
step five: multinomial inverse:

$\Sigma = \sum_{k=1}^{\ell(\ell+1)/2} C_k\sigma_k \Leftrightarrow \Sigma^{-1} = \sum_{k=1}^{\ell(\ell+1)/2} E_k\,\tau_k(\sigma_j),\quad i,j\in\{1,\dots,\ell(\ell+1)/2\}$  (4.129)

input: $\sigma^0$, $\Sigma_0$; output: $E_k(\sigma^0)$.

$E\{\tilde e_y'E_i(\sigma^0)\tilde e_y\} = \sum_{j=1}^{\ell(\ell+1)/2}\left(\mathrm{tr}\,P(\sigma^0)E_i(\sigma^0)P'(\sigma^0)C_j\right)\sigma_j$  (4.130)

step six: Helmert's equation, "Helmert's choice":

$\tilde e_y'E_i(\sigma^0)\tilde e_y = \sum_{j=1}^{\ell(\ell+1)/2}\left(\mathrm{tr}\,P(\sigma^0)E_i(\sigma^0)P'(\sigma^0)C_j\right)\hat\sigma_j$  (4.131)

$q := [\tilde e_y'E_i(\sigma^0)\tilde e_y],\quad q = H\hat\sigma,\quad H := [\mathrm{tr}\,P(\sigma^0)E_i(\sigma^0)P'(\sigma^0)C_j]$ (Helmert's process),
$\hat\sigma := [\hat\sigma_1^2, \hat\sigma_{12}, \hat\sigma_2^2, \hat\sigma_{13}, \hat\sigma_{23}, \hat\sigma_3^2, \dots, \hat\sigma_\ell^2]'.$  (4.132)
Box 4.7 summarizes the essential steps which lead to "$\hat\sigma_k$ HIQUUE of $\sigma_k$" if $\det H \neq 0$, where $H$ is the Helmert matrix. For the first step, we use some prior information $\sigma^0 = \hat\sigma_0$ for the unknown variance-covariance components. For instance, $(\Sigma_y)^0 = \Sigma_0 = \mathrm{Diag}[(\sigma_1^2)^0, \dots, (\sigma_\ell^2)^0]$ may be the available information on the variance components, leaving the covariance components at zero. Step two enforces the block partitioning of the variance-covariance matrix, generating the linear space of variance-covariance components. $\tilde e_y = P_0 e_y$ in step three is the local generator of Helmert's ansatz in step four. Here we derive the key equation $E\{\tilde e_y'\Sigma_0^{-1}\tilde e_y\} = \sum_k\mathrm{tr}\,(P_0'\Sigma_0^{-1}P_0C_k)\,\sigma_k$. Step five focuses on the multinomial inverse of the block partitioned matrix $\Sigma$, also called "multiple IPM". Step six is taken once we replace $\Sigma_0^{-1}$ by the block partitioned inverse matrix within Helmert's ansatz. The fundamental expectation equation maps the variance-covariance components $\sigma_j$ by means of the "Helmert traces" $H$ to the quadratic terms $q(\sigma^0)$. Dropping the expectation operator on the left side, we replace $\sigma_j$ by their estimates $\hat\sigma_j$. As a result we have found the Helmert equation $q = H\hat\sigma$, which has to be inverted. Note $E\{q\} = H\sigma$, reproducing unbiasedness. Let us classify the solution of the Helmert equation $q = H\sigma$ with respect to bias. First, let us assume that the Helmert matrix is of full rank, $\mathrm{rk}\,H = \ell(\ell+1)/2$, the number of unknown variance-covariance components. The inverse solution, Box 4.8, produces an update $\hat\sigma_1 = H^{-1}(\hat\sigma_0)\,q(\hat\sigma_0)$ out of the zero order information $\hat\sigma_0$ we have implemented. For the next step, we iterate $\hat\sigma_2 = H^{-1}(\hat\sigma_1)\,q(\hat\sigma_1)$ up to the
reproducing point $\hat\sigma_\omega = \hat\sigma_{\omega-1}$ within computer arithmetic, at which the iteration ends. Indeed, we assume that the Helmert iteration is contracting.

Box 4.8 (Solving Helmert's equation, the first case): $\mathrm{rk}\,H = \ell(\ell+1)/2$, $\det H \neq 0$.

"iterated Helmert equation":

$\hat\sigma_1 = H^{-1}(\hat\sigma_0)\,q(\hat\sigma_0),\ \dots,\ \hat\sigma_\omega = H^{-1}(\hat\sigma_{\omega-1})\,q(\hat\sigma_{\omega-1})$  (4.133)

"reproducing point":

start: $\sigma_0 = \hat\sigma_0 \Rightarrow \hat\sigma_1 = H_0^{-1}q_0 \Rightarrow \hat\sigma_2 = H_1^{-1}q_1$ subject to $H_1 := H(\hat\sigma_1),\ q_1 := q(\hat\sigma_1) \Rightarrow \dots \Rightarrow \hat\sigma_\omega = \hat\sigma_{\omega-1}$ (computer arithmetic): end.

? Is the special Helmert variance-covariance estimator unbiased? Corollary 4.16 gives a positive answer.

Corollary 4.16 (Helmert equation, $\det H \neq 0$). In case the Helmert matrix $H$ is a full rank matrix, namely $\mathrm{rk}\,H = \ell(\ell+1)/2$,

$\hat\sigma = H^{-1}q$  (4.134)

is $\Sigma_0$-HIQUUE at the reproducing point.

Proof: $q := [\tilde e_y'E_i\tilde e_y]$, $E\{\hat\sigma\} = H^{-1}E\{q\} = H^{-1}H\sigma = \sigma.$ ∎

For the second case of our classification, let us assume that the Helmert matrix is no longer of full rank: $\mathrm{rk}\,H < \ell(\ell+1)/2$, $\det H = 0$. Now we are left with the central question:

? Is the special Helmert variance-covariance estimator $\hat\sigma = H_{lm}^-q = H^+q$ of type "MINOLESS" IQUUE?

Unfortunately, the MINOLESS of the rank factorized Helmert equation $q = JK\hat\sigma$, outlined in Box 4.9 by the weighted Moore–Penrose solution, indicates a negative answer. Instead, Corollary 4.17 proves that $\hat\sigma$ is only HIQE, but also succeeds in establishing estimable variance-covariance components as "Helmert linear combinations" of them.
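Before turning to the rank deficient case, here is a sketch of the iteration (4.133) as a fixed point loop. The callables `H_of` and `q_of`, which assemble Helmert's matrix and the quadratic forms for a given component vector, are hypothetical placeholders; a concrete construction for a two component model follows in Sect. 4-37. Convergence presumes, as in the text, that the Helmert iteration is contracting.

```python
# Sketch of the iterated Helmert equation (4.133), assuming callables
# H_of(sigma) and q_of(sigma, y) that assemble Helmert's matrix H and the
# quadratic forms q for a given vector of variance-covariance components
# (both hypothetical here).
import numpy as np

def solve_helmert(sigma0, y, H_of, q_of, maxit=100):
    """Iterate sigma <- H(sigma)^-1 q(sigma) up to the reproducing point."""
    sigma = np.asarray(sigma0, dtype=float)
    for _ in range(maxit):
        H = H_of(sigma)
        if abs(np.linalg.det(H)) < 1e-14:
            # rank-deficient case: MINOLESS / pseudoinverse, only HIQE
            sigma_new = np.linalg.pinv(H) @ q_of(sigma, y)
        else:
            sigma_new = np.linalg.solve(H, q_of(sigma, y))
        if np.allclose(sigma_new, sigma):
            return sigma_new          # reproducing point reached
        sigma = sigma_new
    return sigma                      # no reproduction within maxit steps
```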
Box 4.9 (Solving Helmert's equation, the second case): $\mathrm{rk}\,H < \ell(\ell+1)/2$, $\det H = 0$.

"rank factorization", "MINOLESS":

$H = JK,\quad \mathrm{rk}\,H = \mathrm{rk}\,J = \mathrm{rk}\,K =: \nu$  (4.135)

"dimension identities": $H\in\mathbb{R}^{\ell(\ell+1)/2\times\ell(\ell+1)/2}$, $J\in\mathbb{R}^{\ell(\ell+1)/2\times\nu}$, $K\in\mathbb{R}^{\nu\times\ell(\ell+1)/2}$,

$H_{lm}^- = H^+(\text{weighted}) = K_R^-(\text{weighted})\,J_L^-(\text{weighted})$

$\hat\sigma_{lm} = G_\sigma^{-1}K'(KG_\sigma^{-1}K')^{-1}(J'G_qJ)^{-1}J'G_q\,q$  (4.136)
$= H^+_{\sigma,q}\,q.$  (4.137)

In case "$\det H = 0$" Helmert's variance-covariance components estimation is no longer unbiased, but estimable functions like $H\hat\sigma$ exist:

Corollary 4.17 (Helmert equation, $\det H = 0$). In case the Helmert matrix $H$, $\mathrm{rk}\,H < \ell(\ell+1)/2$, $\det H = 0$, is rank deficient, the Helmert equation no longer generates an unbiased IQE. An estimable parameter set is $H\hat\sigma$:

(i) $H\hat\sigma = HH^+q$ is $\Sigma_0$-HIQUUE,  (4.138)
(ii) $\hat\sigma$ is IQE.

Proof: (i) $E\{\hat\sigma\} = H^+E\{q\} = H^+H\sigma \neq \sigma$: $\hat\sigma$ is only IQE. (ii) $E\{H\hat\sigma\} = HH^+E\{q\} = HH^+H\sigma = H\sigma$: $H\hat\sigma$ is HIQUUE. ∎

In summary, we have lost a bit of our illusion that $\tilde e_y(\Sigma_y\text{-BLUUE})$ always produces IQUUE. "The illusion of progress is short, but exciting." Figure 4.1 ("Solving the Helmert equations") illustrates the result of Corollaries 4.16 and 4.17. Another drawback is that we have no guarantee that HIQE or HIQUUE
generates a positive definite variance-covariance matrix $\hat\Sigma$. Such a postulate can be enforced by means of an inequality constraint on the Helmert equation $H\hat\sigma = q$ of type "$\hat\sigma > 0$" or "$\hat\sigma > \sigma$" in symbolic writing. For details, consult the textbooks on "positive variance-covariance component estimation". At this end, we have to give credit to B. Schaffrin (1983, p. 62) who classified Helmert's variance-covariance components estimation for the first time correctly.

[Fig. 4.1 Solving the Helmert equation for estimating variance-covariance components, IQUUE versus IQE: if $\det H \neq 0$, $\hat\sigma_k$ is $\Sigma_0$-HIQUUE of $\sigma_k$; if $\det H = 0$, $\hat\sigma_k$ is only HIQE of $\sigma_k$, while $H\hat\sigma_k$ is $\Sigma_0$-IQUUE.]
4-36 Best Quadratic Uniformly Unbiased Estimations of One Variance Component: BIQUUE

First, we give a definition of "best" $\hat\sigma^2$ IQUUE of $\sigma^2$ within Definition 4.18, namely for a Gauss normal random variable $y \in Y = \{\mathbb{R}^n, pdf\}$. Corollary 4.19 presents a basic result representing "Gauss normal" BIQUUE. In particular, we outline the reduction of fourth order moments to second order moments if the random variable $y$ is Gauss normal or, more generally, quasi-normal. At some length we discuss the suitable choice of the properly constrained Lagrangean generating $\hat\sigma^2$ BIQUUE of $\sigma^2$. The highlights are Lemma 4.20, where we resume the normal equations typical for $\hat\sigma^2$ of type BIQUUE, and Theorem 4.21 with explicit representations of $\hat\sigma^2$, $D\{\hat\sigma^2\}$ and $\hat D\{\hat\sigma^2\}$ of type BIQUUE with respect to the special Gauss–Markov model with full column rank.

? What is the "best" $\hat\sigma^2$ IQUUE of $\sigma^2$? First, let us define what is "best" IQUUE.

Definition 4.18 ($\hat\sigma^2$ best invariant quadratic uniformly unbiased estimation of $\sigma^2$: BIQUUE). Let $y \in \{\mathbb{R}^n, pdf\}$ be a Gauss normal random variable representing the stochastic observation vector. Its central moments up to order four,
$E\{e_i^y\} = 0,\quad E\{e_i^ye_j^y\} = \sigma_{ij} = v_{ij}\sigma^2$  (4.139)

$E\{e_i^ye_j^ye_k^y\} = \sigma_{ijk} = 0$ (obliquity)  (4.140)

$E\{e_i^ye_j^ye_k^ye_l^y\} = \sigma_{ijkl} = \sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk} = (v_{ij}v_{kl} + v_{ik}v_{jl} + v_{il}v_{jk})\,\sigma^4,$  (4.141)

relate to the "centralized random variable"

$e_y := y - E\{y\} = [e_i^y].$  (4.142)
The moment arrays are taken over the index set $i,j,k,l \in \{1,\dots,n\}$, where the natural number $n$ is identified as the number of observations; $n$ is the dimension of the observation space $y \in \{\mathbb{R}^n, pdf\}$. The scalar $\hat\sigma^2$ is called BIQUUE of $\sigma^2$ (Best Invariant Quadratic Uniformly Unbiased Estimation) with respect to the special Gauss–Markov model of full column rank,

"first moments": $E\{y\} = A\xi,\ A\in\mathbb{R}^{n\times m},\ \xi\in\mathbb{R}^m,\ \mathrm{rk}\,A = m,$  (4.143)

"central second moments": $D\{y\} =: \Sigma_y = V\sigma^2,\ V\in\mathbb{R}^{n\times n},\ \sigma^2\in\mathbb{R}^+,\ \mathrm{rk}\,V = n,$  (4.144)

where $\xi\in\mathbb{R}^m$ is the first unknown vector and $\sigma^2\in\mathbb{R}^+$ the second unknown, the "one variance component", if it is

(i) a quadratic estimation (IQE):

$\hat\sigma^2 = y'My = (\mathrm{vec}\,M)'(y\otimes y) = \mathrm{tr}\,Myy'$
(4.145)
subject to

$M = (M^{1/2})'M^{1/2} \in \mathrm{SYM} := \{M\in\mathbb{R}^{n\times n}\,|\,M = M'\},$  (4.146)

(ii) translationally invariant in the sense of

$y \to y - E\{y\} =: e_y$  (4.147)
$\hat\sigma^2 = y'My = e_y'Me_y$  (4.148)

or equivalently

$\hat\sigma^2 = (\mathrm{vec}\,M)'(y\otimes y) = (\mathrm{vec}\,M)'(e_y\otimes e_y)$  (4.149)
$\hat\sigma^2 = \mathrm{tr}\,Myy' = \mathrm{tr}\,Me_ye_y',$  (4.150)
(iii) uniformly unbiased in the sense of

$E\{\hat\sigma^2\} = \sigma^2\quad\forall\,\sigma^2\in\mathbb{R}^+,$  (4.151)

and (iv) of minimal variance in the sense of

$D\{\hat\sigma^2\} := E\{[\hat\sigma^2 - E\{\hat\sigma^2\}]^2\} = \min_M.$  (4.152)

In order to produce "best" IQUUE we have to analyze the variance $E\{[\hat\sigma^2 - E\{\hat\sigma^2\}]^2\}$ of the invariant quadratic estimation $\hat\sigma^2$ of the "one variance component" $\sigma^2$. In short, we present the result in

Corollary 4.19 (the variance of $\hat\sigma^2$ with respect to a Gauss normal IQE). If $\hat\sigma^2$ is IQE of $\sigma^2$, then for a Gauss normal observation space $Y = \{\mathbb{R}^n, pdf\}$ the variance of $\hat\sigma^2$ of type IQE is represented by

$E\{[\hat\sigma^2 - E\{\hat\sigma^2\}]^2\} = 2\sigma^4\,\mathrm{tr}\,MVMV.$  (4.153)
Proof: ansatz IQE: $\hat\sigma^2 = \mathrm{tr}\,Me_ye_y' \Rightarrow E\{\hat\sigma^2\} = (\mathrm{tr}\,MV)\,\sigma^2$,

$E\{[\hat\sigma^2 - E\{\hat\sigma^2\}]^2\} = E\{[\mathrm{tr}\,Me_ye_y' - (\mathrm{tr}\,MV)\sigma^2][\mathrm{tr}\,Me_ye_y' - (\mathrm{tr}\,MV)\sigma^2]\}$
$= E\{(\mathrm{tr}\,Me_ye_y')(\mathrm{tr}\,Me_ye_y')\} - (\mathrm{tr}\,MV)^2\sigma^4 \Rightarrow (4.153).$ ∎
With the "ansatz" $\hat\sigma^2$ IQE of $\sigma^2$ we have achieved the first decomposition of $\mathrm{var}\{\hat\sigma^2\}$. The second decomposition of the first term will lead us to central moments of fourth order, which will be decomposed into central moments of second order for a Gauss normal random variable $y$. The computation is easiest in "Ricci calculus". An alternative computation of the reduction "fourth moments to second moments" in "Cayley calculus", which is a bit more advanced, is given in Appendix A-7.

$E\{(\mathrm{tr}\,Me_ye_y')(\mathrm{tr}\,Me_ye_y')\} = \sum_{i,j,k,l=1}^n m_{ij}m_{kl}\,E\{e_i^ye_j^ye_k^ye_l^y\} = \sum_{i,j,k,l=1}^n m_{ij}m_{kl}\,\sigma_{ijkl}$
$= \sum_{i,j,k,l=1}^n m_{ij}m_{kl}\,(\sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk})$
$= \sum_{i,j,k,l=1}^n m_{ij}m_{kl}\,(v_{ij}v_{kl} + v_{ik}v_{jl} + v_{il}v_{jk})\,\sigma^4$

$E\{(\mathrm{tr}\,Me_ye_y')(\mathrm{tr}\,Me_ye_y')\} = \sigma^4(\mathrm{tr}\,MV)^2 + 2\sigma^4\,\mathrm{tr}\,(MV)^2.$  (4.154)
A combination of the first and second decomposition leads to the final result.
$E\{[\hat\sigma^2 - E\{\hat\sigma^2\}]^2\} = E\{(\mathrm{tr}\,Me_ye_y')(\mathrm{tr}\,Me_ye_y')\} - \sigma^4(\mathrm{tr}\,MV)^2 = 2\sigma^4\,(\mathrm{tr}\,MVMV).$ ∎
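The fourth-moment reduction can be checked by simulation. The sketch below (toy matrices $M$, $V$ invented for illustration) compares the sample variance of the quadratic form $e'Me$ with the closed form $2\sigma^4\,\mathrm{tr}\,MVMV$ of (4.153):

```python
# Monte Carlo check of the Gauss fourth-moment reduction (4.153)-(4.154):
# var{e'Me} = 2 sigma^4 tr(MVMV) for e ~ N(0, V sigma^2).
# M and V are arbitrary symmetric test matrices (invented for illustration).
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 5, 2.0
B = rng.standard_normal((n, n))
V = B @ B.T + n * np.eye(n)            # positive definite variance factor
M = (B + B.T) / 2                      # symmetric quadratic form matrix

e = rng.multivariate_normal(np.zeros(n), V * sigma2, size=200_000)
q = np.einsum('ki,ij,kj->k', e, M, e)  # quadratic forms e'Me, sample by sample

lhs = q.var()                          # empirical variance of e'Me
rhs = 2 * sigma2**2 * np.trace(M @ V @ M @ V)
print(lhs, rhs)                        # agree up to sampling error
```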
A first choice of a constrained Lagrangean for the optimization problem "BIQUUE", namely (4.155) of Box 4.10, is based upon the variance $E\{[\hat\sigma^2 - E\{\hat\sigma^2\}]^2\,|\,\text{IQE}\}$ constrained to "IQE" and (i) the condition of uniform unbiasedness $(\mathrm{tr}\,VM) - 1 = 0$ as well as (ii) the condition of invariant quadratic estimation $A'(M^{1/2})' = 0$. A second choice of a constrained Lagrangean generating $\hat\sigma^2$ BIQUUE of $\sigma^2$, namely (4.160) of Box 4.10, takes advantage of the general solution of the homogeneous matrix equation $M^{1/2}A = 0$ which we already obtained for "IQE": equation (4.72) is the matrix container for $M$. In consequence, building into the Lagrangean the structure of the matrix $M$, required by the condition of invariant quadratic estimation $\hat\sigma^2$ IQE of $\sigma^2$, reduces the first Lagrangean by the second condition. Accordingly, the second choice of the Lagrangean (4.160) includes only one condition, in particular the condition for a uniformly unbiased estimation $(\mathrm{tr}\,VM) - 1 = 0$. Still we are left with the problem of making a proper choice for the matrices $Z'Z$ and $G_y$. The first "ansatz" $Z'Z = \alpha G_y$ produces a specific matrix $M$, while the second "ansatz" $G_y = V^{-1}$ couples the matrix of the metric of the observation space to the inverse variance factor $V^{-1}$. Those "natural specifications" reduce the second Lagrangean to a specific form (4.161), a third Lagrangean which only depends on two unknowns, $\alpha$ and $\lambda_0$. Now we are prepared to present the basic result for $\hat\sigma^2$ BIQUUE of $\sigma^2$.

Box 4.10 (Choices of constrained Lagrangeans generating $\hat\sigma^2$ BIQUUE of $\sigma^2$):

"a first choice":

$\mathcal{L}(M^{1/2}, \lambda_0, \Lambda_1) := 2\,\mathrm{tr}\,(MVMV) + 2\lambda_0[(\mathrm{tr}\,VM) - 1] + 2\,\mathrm{tr}\,\Lambda_1A'(M^{1/2})'$  (4.155)

"a second choice":

$M = (M^{1/2})'M^{1/2} = [I_n - G_yA(A'G_yA)^{-1}A']\,Z'Z\,[I_n - A(A'G_yA)^{-1}A'G_y]$  (4.156)

ansatz: $Z'Z = \alpha G_y$:

$M = \alpha G_y[I_n - A(A'G_yA)^{-1}A'G_y]$  (4.157)
$VM = \alpha VG_y[I_n - A(A'G_yA)^{-1}A'G_y]$  (4.158)

ansatz: $G_y = V^{-1}$:

$VM = \alpha[I_n - A(A'V^{-1}A)^{-1}A'V^{-1}]$  (4.159)

$\mathcal{L}(\alpha, \lambda_0) = \mathrm{tr}\,MVMV + 2\lambda_0[(\mathrm{tr}\,VM) - 1]$  (4.160)

$\mathrm{tr}\,MVMV = \alpha^2\,\mathrm{tr}[I_n - A(A'V^{-1}A)^{-1}A'V^{-1}] = \alpha^2(n-m)$
$\mathrm{tr}\,VM = \alpha\,\mathrm{tr}[I_n - A(A'V^{-1}A)^{-1}A'V^{-1}] = \alpha(n-m)$

$\mathcal{L}(\alpha, \lambda_0) = \alpha^2(n-m) + 2\lambda_0[\alpha(n-m) - 1] = \min_{\alpha,\lambda_0}.$  (4.161)
Lemma 4.20 ($\hat\sigma^2$ BIQUUE of $\sigma^2$). The scalar $\hat\sigma^2 = y'My$ is BIQUUE of $\sigma^2$ with respect to the special Gauss–Markov model of full column rank if and only if the parameter $\alpha$ together with the "Lagrange multiplier" $\lambda_0$ fulfils the system of normal equations

$(n-m)\begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}\begin{bmatrix}\hat\alpha\\ \hat\lambda_0\end{bmatrix} = \begin{bmatrix}0\\ 1\end{bmatrix},$  (4.162)

solved by

$\hat\alpha = \dfrac{1}{n-m},\quad \hat\lambda_0 = -\dfrac{1}{n-m}.$  (4.163)

Proof: Minimizing the constrained Lagrangean

$\mathcal{L}(\alpha, \lambda_0) = \alpha^2(n-m) + 2\lambda_0[\alpha(n-m) - 1] = \min_{\alpha,\lambda_0}$

leads us to the necessary conditions

$\tfrac{1}{2}\,\partial\mathcal{L}/\partial\alpha\,(\hat\alpha,\hat\lambda_0) = \hat\alpha(n-m) + \hat\lambda_0(n-m) = 0$
$\tfrac{1}{2}\,\partial\mathcal{L}/\partial\lambda_0\,(\hat\alpha,\hat\lambda_0) = \hat\alpha(n-m) - 1 = 0,$

solved by $\hat\alpha = -\hat\lambda_0 = \dfrac{1}{n-m}$.
$\tfrac{1}{2}\,\partial^2\mathcal{L}/\partial\alpha^2\,(\hat\alpha,\hat\lambda_0) = n-m > 0$

constitutes the sufficiency condition, automatically fulfilled. ∎

Such a solution for the parameter $\alpha$ leads us to the "BIQUUE" representation of the matrix $M$:

$M = \dfrac{1}{n-m}\,V^{-1}[I_n - A(A'V^{-1}A)^{-1}A'V^{-1}].$  (4.164)
Explicit representations of $\hat\sigma^2$ BIQUUE of $\sigma^2$, of the variance $D\{\hat\sigma^2\}$ and of its estimate $\hat D\{\hat\sigma^2\}$ are highlighted by

Theorem 4.21 ($\hat\sigma^2$ BIQUUE of $\sigma^2$). Let $\hat\sigma^2 = y'My = (\mathrm{vec}\,M)'(y\otimes y) = \mathrm{tr}\,Myy'$ be BIQUUE of $\sigma^2$ with respect to the special Gauss–Markov model of full column rank.

(i) Explicit representations of $\hat\sigma^2$ BIQUUE of $\sigma^2$:

$\hat\sigma^2 = (n-m)^{-1}\,y'[V^{-1} - V^{-1}A(A'V^{-1}A)^{-1}A'V^{-1}]\,y$  (4.165)
$\hat\sigma^2 = (n-m)^{-1}\,\tilde e'V^{-1}\tilde e$  (4.166)

subject to $\tilde e = \tilde e_y$ (V-BLUUE).

(ii) $D\{\hat\sigma^2\,|\,\text{BIQUUE}\}$: BIQUUE's variance is explicitly represented by

$D\{\hat\sigma^2\,|\,\text{BIQUUE}\} = E\{[\hat\sigma^2 - E\{\hat\sigma^2\}]^2\,|\,\text{BIQUUE}\} = 2(n-m)^{-1}(\sigma^2)^2.$  (4.167)

(iii) $\hat D\{\hat\sigma^2\}$: An estimate of BIQUUE's variance is

$\hat D\{\hat\sigma^2\} = 2(n-m)^{-1}(\hat\sigma^2)^2$  (4.168)
$\hat D\{\hat\sigma^2\} = 2(n-m)^{-3}(\tilde e'V^{-1}\tilde e)^2.$  (4.169)
Proof: We have already prepared the proof of (i). Therefore we continue to prove (ii) and (iii).

(ii) $D\{\hat\sigma^2\,|\,\text{BIQUUE}\}$:

$D\{\hat\sigma^2\} = E\{[\hat\sigma^2 - E\{\hat\sigma^2\}]^2\} = 2\sigma^4\,\mathrm{tr}\,MVMV,$
$MV = \dfrac{1}{n-m}\,[I_n - A(A'V^{-1}A)^{-1}A'V^{-1}],$
$MVMV = \dfrac{1}{(n-m)^2}\,[I_n - A(A'V^{-1}A)^{-1}A'V^{-1}],$
$\mathrm{tr}\,MVMV = \dfrac{1}{n-m} \Rightarrow D\{\hat\sigma^2\} = 2(n-m)^{-1}(\sigma^2)^2.$

(iii) $\hat D\{\hat\sigma^2\}$: Just replace within $D\{\hat\sigma^2\}$ the variance $\sigma^2$ by the estimate $\hat\sigma^2$:

$\hat D\{\hat\sigma^2\} = 2(n-m)^{-1}(\hat\sigma^2)^2.$ ∎
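A short numerical sketch of Theorem 4.21 (simulated data; design and variance factor invented for illustration) computes $\hat\sigma^2$ of (4.166) together with the variance estimate (4.168):

```python
# BIQUUE of one variance component: sketch of (4.165)-(4.169) on simulated
# data (A and V invented for illustration).
import numpy as np

rng = np.random.default_rng(7)
n, m, sigma2 = 50, 4, 9.0
A = rng.standard_normal((n, m))
V = np.diag(rng.uniform(1.0, 3.0, n))
y = A @ rng.standard_normal(m) + rng.multivariate_normal(np.zeros(n), sigma2 * V)

Vinv = np.linalg.inv(V)
P = np.eye(n) - A @ np.linalg.solve(A.T @ Vinv @ A, A.T @ Vinv)
e_tilde = P @ y                                   # V-BLUUE residuals

sigma2_hat = e_tilde @ Vinv @ e_tilde / (n - m)   # (4.166)
var_hat = 2 * sigma2_hat**2 / (n - m)             # (4.168)
print(sigma2_hat, var_hat)                        # near sigma2, with small spread
```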
Upon writing the chapter on variance-covariance component estimation we learnt about the untimely death of J.F. Seely, Professor of Statistics at Oregon State University, on 23 February 2002. J.F. Seely, born on 11 February 1941 in the small town of Mt. Pleasant, Utah, made various influential contributions to the theory of the Gauss–Markov linear model, namely quadratic statistics for the estimation of variance components. His Ph.D. adviser G. Zyskind had elegantly characterized the situation where the ordinary least squares approximation of fixed effects remains optimal for mixed models: the regression space should be invariant under multiplication by the variance-covariance matrix. J.F. Seely extended this idea to variance-covariance component estimation, introducing the notion of invariant quadratic subspaces and their relation to completeness, and characterizing the class of admissible unbiased estimators of variance-covariance components. In particular, the usual ANOVA estimator in 2-variance component models is inadmissible. Among other contributions to the theory of mixed models, he succeeded in generalizing and improving on several existing procedures for tests and confidence intervals on variance-covariance components.
Additional Reading

Seely, J. and Lee, Y. (confidence interval for a variance: 1994), Azzam, A., Birkes, A.D. and Seely, J. (admissibility in linear models, polyhedral covariance structure: 1988), Seely, J. and Rady, E. (random effects–fixed effects, linear hypothesis: 1988), Seely, J. and Hogg, R.V. (unbiased estimation in linear models: 1982), Seely, J. (confidence intervals for positive linear combinations of variance components: 1980), Seely, J. (minimal sufficient statistics and completeness: 1977), Olsen, A., Seely, J. and Birkes, D. (invariant quadratic unbiased estimators for two variance components: 1975), Seely, J. (quadratic subspaces and completeness: 1971) and Seely, J. (linear spaces and unbiased estimation: 1970).
4-37 Simultaneous Determination of the First Moment and the Second Central Moment, Inhomogeneous Multilinear Estimation, the E-D Correspondence, Bayes Design with Moment Estimation

The general linear model with linear dispersion structure is reviewed with respect to (i) unbiasedness, (ii) invariance and (iii) nonnegativeness of the variance-covariance component estimation. The E-D correspondence is outlined, as well as the importance of quadratic dispersion component estimations and their reproducing property. We illustrate the theoretical investigations by two examples: (i) one variance component estimation and (ii) variance-covariance component estimation of Helmert type. These results are extended by three other examples: (iii) a hybrid direction-distance network, (iv) gyroscopic azimuth observations and (v) Bayes design with moment estimation.

The subject of variance-covariance component estimation within an adjustment process was one of the central research topics of the last decades, both in geodesy and in mathematical statistics. In a remarkable bibliography, up to date to the year 1977, H. Sahai listed more than 1,000 papers on variance component estimation, the basic sources being "Statistical Theory and Method Abstracts", published for the International Statistical Institute by Longman Group Limited, "Mathematical Reviews" and "Abstract Service of Quality Control and Applied Statistics". Let us refer also to his bibliography (1979) as well as to H. Sahai, A.J. Khuri and C.H. Kapadia (1985), C.R. Rao and J. Kleffe (1988), R.D. Anderson (1979), C.R. Henderson (1953), J. Kleffe (1975, 1977), H. Drygas (1977, 1981), S. Gnot and J. Kleffe (1983), S. Gnot, J. Kleffe and R. Zmyslony (1985), P.S.R.S. Rao (1977), S.R. Searle (1978, 1979), L.R. Verdooren (1979, 1980) and R. Thompson (1962, 1980). The PhD thesis of B. Schaffrin (1983) offers a critical review of the state of the art.

In the geodetic sciences variance component estimation originates from F.R. Helmert (1907), who used least squares residuals to estimate heterogeneous variance components. Instead, K. Kubik (1967 i, ii, iii, 1970) early used the method of Maximum Likelihood for estimating weight ratios in a hybrid distance-direction network. In this context we refer also to H.O. Hartley and J.N.K. Rao, H.D. Patterson and R. Thompson (1971) and K.R. Koch (1986). R. Kelm (1978) and E. Grafarend (1978 i, ii, 1982) used Helmert's variance component estimation for geodetic examples. Monte Carlo algorithms were used for variance component estimation by J. Kusche (2003). In his PhD thesis, M. Serbetci analyzed gravimetric measurements by the Helmert estimation techniques. Numerical extensions of this estimation method originate from H. Ebner (1972, 1977), W. Foerstner (1979, 1982) and W. Welsch (1977, 1978, 1979, 1980). MINQUE estimators were used by R. Kelm (1978) with respect to the Gauss–Markov model. Various geodetic applications have proven the power of variance component estimation: H. Fröhlich (1982, 1983, 1984), H. Fröhlich and D. Duddek (1983), K.R. Koch (1981) and L. Sjöberg (1980) used variance component estimation for the analysis of the constant and the distance dependent variance factor in electro-optical distance measurements.
The analysis of deforming networks was based on variance component estimation by Y. Chen (1983), K.R. Koch (1981), B. Schaffrin (1981, 1983) and P. Schwintzer (1984). H. Pelzer (1982) presented a special variance component model for eccentric direction measurements. J.D. Bossler and R.H. Hanson (1980) used Bayesian variance component estimation for densification networks. A first application of variance component estimation of Helmert type has been in heterogeneously observed networks, for instance direction and distance measurements, each characterized by an unknown variance component. Examples are given by E. Grafarend and A. d'Hone (1978), E. Grafarend, A. Kleusberg and B. Schaffrin (1980), K. Kubik (1967), O. Remmer (1971) and W. Welsch (1978, 1979, 1980). A special field of geodetic application has been oscillation analysis, based on a basic contribution of H. Wolf (1975), which was applied by M. Junasevic (1977) for estimating the signal-to-noise ratios in gyroscopic azimuth observations and has been extended by E. Grafarend and A. Kleusberg (1980) and A. Kleusberg and E. Grafarend (1981). Geodetic examples in Condition Adjustment and Condition Adjustment with Unknowns (Gauss–Helmert model) including variance-covariance component estimation, namely in collocation problems, have been given by E. Grafarend (1984), C.G. Persson (1981, 1982) and L. Sjöberg (1983 i, ii, 1984 i, ii, 1985, 1995). A. Amiri-Simkooej (2007) applied variance component estimation to GPS (Global Positioning System). W. Lindlohr (1983) used variance component estimation for the analysis of a special autoregression process in photogrammetric networks. P. Xu et al. (2007) discussed the estimability problem of variance-covariance components and proved conditions with respect to the rank of the variance-covariance matrix to be estimated. Of course, not all components of a full variance-covariance matrix are estimable, only r(r-1)/2 when we assume a variance-covariance matrix of rank r(r-1). In P. Xu et al. (2006), a zero order Tykhonov regularization method for estimating variance-covariance components was presented. A basic result which they found is that the simultaneous estimation of the regularization parameter and the related variance component is advantageous. P.J.G. Teunissen and A.R. Amiri-Simkooej (2008) studied least-squares variance component estimation: they treated an elliptically contoured distribution, the best linear uniformly unbiased estimator (BLUE), the best invariant quadratic unbiased estimator (BIQUE), the minimum norm quadratic unbiased estimator (MINQUE) and the restricted maximum likelihood estimator (REML). N. Crocetto, M. Gatti and P. Russo (2000) presented simplified formulae for BIQUE estimation of observation groups. K.R. Koch (1986, 1987) compares Maximum Likelihood Estimation and Bayesian Inference for variance components. K.R. Koch and J. Kusche (2002) also studied the regularization of potential determination from satellite data by variance components. J. Kusche (2003) analyzed noise variance estimation and optimal weight determination for GOCE gravity recovery. Z. Ou (1989, 1991) discussed various methods for variance-covariance component estimation and applied them to approximate Bayes estimation. A special procedure for estimating the variance function of linear models for GPS carrier-phase observations has been developed by W. Bischoff et al. (2006).
C.C.J.M. Tiberius and F. Kenselaar applied variance component estimation to precise GPS positioning. Z. Yu (1992, 1996) tried to generalize variance-covariance component estimation and applied it to Maximum Likelihood Estimation. Finally, we have to refer to the most important geodetic contributions of K.R. Koch (1980, 1990, 1999, 2000) and K.R. Koch and M. Schmidt (1994): they estimated the setup of a covariance matrix and designed a test of its identity with a prescribed design matrix in K.R. Koch (1976). Various estimators of variance components were analyzed in K.R. Koch (1978, 1979 i (mixed models), 1979 ii (Gauss–Helmert)), of variance-covariance components in K.R. Koch (1981, "gauge unknown in distance measurements"; 1983, "multivariate and incomplete multivariate models"), Bayes estimates of variance and Bayes inference in K.R. Koch (1994, 1988 i, ii), as well as maximum likelihood estimates of variance components in K.R. Koch and A.J. Pope (1986). Notable is also the contribution "Comments on P. Xu et al." in K.R. Koch and J. Kusche (2006), relating to variance component estimation in linear inverse ill-posed models.

The basic problem with the analysis of first moments and central second moments by second order statistics is the structural design of their components. Do the first order moments have a linear or a nonlinear structure? Is the rank detectable? With respect to second order moments, the design is even more difficult: how many variances and covariances are needed for a reasonable structure? In the case of variances it is very difficult to find out what the rank of the variance matrix is. A harder question still is the structure of the covariances, or of a variance-covariance matrix: what is the rank of an estimated set of the variances and of the covariances? Who decides upon fixing those variances and covariances? One key problem, in addition, is how to produce realistic, non-negative variances or non-negative variance-covariance components. Most of the analysis programs work well in determining variances and covariances when there is no bias among the observations and the redundancy of the observations is "reasonably high". How good are the initial values when variance-covariance values are used in an iteration process? How can we guarantee non-negative estimates for a variance-covariance matrix without any physical meaning? L.R. La Motte (1973) presented "a" non-negative quadratic unbiased estimation procedure for variance components. An alternative non-negative estimation procedure for variances, an "almost unbiased" non-negative iterative estimator, was designed by S.D. Horn (1975). W. Foerstner (1979) came up with another non-negative estimator of variances. It is well known that MINQUE ("Minimum Norm Quadratic Unbiased Estimator") and Foerstner's estimator provide the same results in an iterative process. If the a priori weights are correct, no bias exists in the model, as was proven by C.G. Persson (1980). It is also well known that MINQUE produces the same result as the Best Quadratic Unbiased Estimator (BQUE),
as shown, for instance, by L. Sjöberg (1984 i). Based on results by F. Pukelsheim (1981) and L.R. La Motte (1973),
Best Quadratic Unbiased Non-negative Estimators (BQUNE)
were defined by L. Sjöberg (1984 i) for the linear Gauss–Markov model and generalized to the Gauss–Helmert model. A special contribution of L. Sjöberg (1984 i) is appreciated which takes care of the problem of negative variance components. However, the two constraints of unbiasedness and non-negativeness of variance components cannot always be satisfied simultaneously, an effect which led L. Sjöberg (1984) to consider the

Best Quadratic Minimum Bias Non-negative Estimator (BQMBNE),

presented partially in M. Eshagh and L. Sjöberg (2008). They focused mainly on a Modified Best Quadratic Unbiased Non-negative Estimator.

Example 1 (one variance component):
Assume a linear model which is characterized by first and second moments of the $n\times 1$ vector of observations:

$E\{y\} = A\xi = Ax_1,\quad D\{y\} = I\sigma^2 = Ix_2,$

where $A$ is the $n\times m$ first order design matrix, $I$ the $n\times n$ second order design matrix (here: a unit matrix), $\xi = x_1$ the $m\times 1$ vector of unknown parameters, and $\sigma^2 = x_2$ the $1\times 1$ "vector" (scalar) of the unknown variance component. We intend to estimate the unknown vector $\xi$ or $x_1$ linearly from the vector $y$ of observations, and the unknown $\sigma^2$ or $x_2$ quadratically from the observation vector $y$.

1st postulate:

$\tilde\xi_i = \sum_{p=1}^n L_{ip}y_p$ or $\tilde\xi = Ly$  (4.170)

$\tilde\sigma^2 = \sum_{p,q=1}^n M_{pq}y_py_q$ or $\tilde\sigma^2 = y'My$

In the classical procedure we at first estimate $x_1$.

(i) Estimation of $x_1$

2nd postulate: We postulate unbiased estimation of $x_1$: $E\{\tilde x_1\} = x_1\ \forall\,x_1$.

Corollary 1.

$E\{y\} = Ax_1,\ \tilde x_1 = Ly,\ E\{\tilde x_1\} = x_1 \Rightarrow LA - I = 0$  (4.171)
Proof: $E\{\tilde x_1\} = E\{Ly\} = L\,E\{y\} = LAx_1 = x_1\ \forall\,x_1 \Leftrightarrow (LA - I)x_1 = 0\ \forall\,x_1 \Leftrightarrow LA - I = 0.$ ∎

Corollary 2. $LA - I = 0$ has a solution if and only if $\mathrm{rk}\,A = m$: $x_1$ is unbiasedly estimable if and only if the rank $\mathrm{rk}\,A = r$ of the first order design matrix is identical with the number $m$ of unknown parameters, $r = m$.

3rd postulate:

$\mathrm{var}\,\tilde x_1 = \min$ or $E\{[\tilde x_1 - E\{\tilde x_1\}]'[\tilde x_1 - E\{\tilde x_1\}]\} = \min_L$

Corollary 3. $\mathrm{var}\,\tilde x_1 = \mathrm{tr}\,\sigma^2LL'$

Proof: $\mathrm{var}\,\tilde x_1 = E\{[y - E\{y\}]'L'L[y - E\{y\}]\} = E\{\varepsilon'L'L\varepsilon\} = \mathrm{tr}\,L\,E\{\varepsilon\varepsilon'\}\,L' = \mathrm{tr}\,L\,D\{y\}\,L' = \mathrm{tr}\,\sigma^2LL'.$ ∎

In order to assure BLUE (best linear unbiased estimation) of the vector $x_1$ of unknowns we minimize

$\mathrm{tr}\,\sigma^2LL' + 2\,\mathrm{tr}\,\Lambda(LA - I) = \min_{L,\Lambda},$

where the postulate of $x_1$-unbiasedness is added by the matrix $\Lambda$ of Lagrange multipliers. As a result we find

$\begin{bmatrix}\sigma^2I & A\\ A' & 0\end{bmatrix}\begin{bmatrix}\hat L'\\ \hat\Lambda\end{bmatrix} = \begin{bmatrix}0\\ I\end{bmatrix},\quad \det\begin{bmatrix}\sigma^2I & A\\ A' & 0\end{bmatrix} = \det(\sigma^2I)\,\det(A'\sigma^{-2}A) \neq 0$  (4.172)

if $\mathrm{rk}\,A = m$:

the IPM method ("Pandora Box" of C.R. Rao, "inverse partitioned matrix"):
$\begin{bmatrix}\sigma^2I & A\\ A' & 0\end{bmatrix}^{-1} = \begin{bmatrix}D & C\\ C' & -E\end{bmatrix}:\quad \begin{bmatrix}\sigma^2I & A\\ A' & 0\end{bmatrix}\begin{bmatrix}D & C\\ C' & -E\end{bmatrix} = \begin{bmatrix}I_n & 0\\ 0 & I_m\end{bmatrix}$

$\Rightarrow\ \sigma^2D + AC' = \sigma^2D + CA' = I_n,\quad A'C = C'A = I_m,\quad \sigma^2C - AE = 0,\quad A'D = 0$  (4.173)

$\Rightarrow\ D = \sigma^{-2}[I_n - A(A'A)^{-1}A'],\quad C = \sigma^{-2}A(A'\sigma^{-2}A)^{-1} = A(A'A)^{-1},\quad E = (A'\sigma^{-2}A)^{-1} = \sigma^2(A'A)^{-1}$  (4.174)

Result:

$\hat L = C' = (A'\sigma^{-2}A)^{-1}A'\sigma^{-2} = (A'A)^{-1}A'$  (4.175)

Corollary 4 ($x_1$ BLUE).

$\hat x_1 = \hat Ly = (A'A)^{-1}A'y$  (4.176)
$D\{\hat x_1\} = \sigma^2(A'A)^{-1} = E$  (4.177)
Next in the classical procedure we estimate $x_2$.

(ii) Estimation of $x_2$

The residual vector $\hat\varepsilon$ is defined by $\hat\varepsilon := y - \hat E\{y\} = y - A\hat\xi$.

Corollary 5. $\hat\varepsilon = D\varepsilon$  (4.178)
Proof:
.a/ "O D y O D y A.A0 A/1 Ay D .I A.A0 A/1 A0 /y D Dy (4.179) D WD .I A.A0 A/1 A0 / “projection operator, idempotent matrix”
4-3 Setup of the Best Invariant Quadratic Uniformly Unbiased Estimator
.b/ Dy D D.y A/ D D"
“invariance property y 7! y A”
247
(4.180)
since .I A.A0 A/1 A0 /.y A/ D .I A.A0 A/1 A0 /y .A.A0 A/1 A0 A/ D .I A.A0 A/1 A0 /y Corollary 6. Ef"O 0 "O g D 2 .n m/
(4.181)
Proof:
Ef"O 0 "g O D trEf"O O "0 g D tr 2 DD0 D 2 trD D 2 tr.In / 2 tr.A.A0 A/1 A0 / D 2 tr.In / 2 tr..A0 A/1 A0 A/ D 2 tr.In / 2 tr.Im / D 2 .n m/ 2nd postulate We postulate unbiased estimation of x2 : EfxQ 2 g D x2 Corollary 7. If then
8x2
2 D .n m/1 "O 0 "O EfO 2 g D 2
or
EfxO 2 g D x2
(4.182) (4.183)
2
It can also be shown that xO 2 D O is BIQUE (best invariant quadratic unbiased estimation). The construction of an unbiased estimation of N 2 described above is the guiding principle for the estimation of variance components according to F.R. Helmert. Example 2 (variance component estimation of Helmert type): Assume a linear model which will be characterized by first and second moments of the n 1 vector of observations Efyg D
m X i D1
ai i
(4.184)
$D\{y\} = \sum_{j=1}^{\ell} C_{jj}\sigma_j^2 + \sum_{j=1}^{\ell-1}\sum_{k=j+1}^{\ell} C_{jk}\sigma_{jk}$  (4.185)

$= \begin{bmatrix}\sigma_1^2Q_{11} & \sigma_{12}Q_{12} & \dots & \sigma_{1\ell}Q_{1\ell}\\ \sigma_{12}Q_{12}' & \sigma_2^2Q_{22} & \dots & \sigma_{2\ell}Q_{2\ell}\\ \vdots & & \ddots & \vdots\\ \sigma_{1\ell}Q_{1\ell}' & \sigma_{2\ell}Q_{2\ell}' & \dots & \sigma_\ell^2Q_{\ell\ell}\end{bmatrix},$  (4.186)

where the dispersion or variance-covariance matrix $D\{y\}$ is assumed to be positive definite;

$C_{jj} = \begin{bmatrix}0 & \dots & 0\\ \dots & Q_{jj} & \dots\\ 0 & \dots & 0\end{bmatrix}$ ($Q_{jj}$ in block row $j$, block column $j$), $j = 1,\dots,\ell,$  (4.187)

$C_{jk} = \begin{bmatrix}0 & \dots & 0\\ \dots & Q_{jk} & \dots\\ \dots & Q_{jk}' & \dots\\ 0 & \dots & 0\end{bmatrix}$ ($Q_{jk}$ in block row $j$, block column $k$; $Q_{jk}'$ in block row $k$, block column $j$).  (4.188)

$A := [a_1, \dots, a_m]$ is the $n\times m$ first order design matrix; $C_{jj}$, $C_{jk}$ are $n\times n$ second order design matrix blocks. $m$ first moments $\xi$ (regression parameters), $\ell$ second moments $\sigma_j^2$ of type variance and $\ell(\ell-1)/2$ second moments $\sigma_{jk}$ of type covariance are unknown. The above linear dispersion structure may be derived from the linear structure of the residual vector:

$\varepsilon := y - E\{y\} = y - A\xi = \sum_{j=1}^{\ell} A_j\varepsilon_j \Rightarrow E\{\varepsilon_j\} = 0,\ E\{\varepsilon_j\varepsilon_k'\} = \sigma_{jk}I,$  (4.189)

$D\{y\} = \sum_{j=1}^{\ell} A_jA_j'\sigma_j^2 + \sum_{j,k;\,j<k}(A_jA_k' + A_kA_j')\,\sigma_{jk}.$  (4.190)

The variance-covariance model of Helmert type may be referred to as a linear dispersion structure of heterogeneous observations, independent with respect to each other.
Let us rewrite the first two moments in vector form:

$E\{y\} = \sum_{i=1}^m a_i\xi_i$ or $E\{y\} = A\xi = Ax_1$  (4.191)

$D\{y\} = \sum_{j=1}^{\ell(\ell+1)/2} C_j\sigma_j$ or $d\{y\} := \mathrm{vec}\,D\{y\} = B\sigma = Bx_2$  (4.192)

$x_1 := [\xi_1, \dots, \xi_m]' = \xi,\quad o(\xi) = m\times 1,$
$x_2 := [\sigma_1^2, \sigma_{12}, \sigma_2^2, \sigma_{13}, \sigma_{23}, \sigma_3^2, \dots, \sigma_{\ell-1,\ell}, \sigma_\ell^2]' =: \sigma,\quad o(\sigma) = \ell(\ell+1)/2\times 1.$

The first step: estimation of $\sigma_k$ (reverse of the classical procedure):

$\tilde\sigma_k = \sum_{p,q=1}^n M_{kpq}\,y_py_q$  (4.193)
(0) Choose a priori a vector $\sigma_0$ of variance-covariance components such that $D_0\{y\} = \Sigma(\sigma_0) = \Sigma_0$ is positive definite.

(1) Construct the residual vector $\hat\varepsilon$ which belongs to the $\Sigma_0$-least squares solution:

$\hat\varepsilon := y - \hat E\{y\} = y - A\hat\xi = (I_n - A(A'\Sigma_0^{-1}A)^{-1}A'\Sigma_0^{-1})\,y = D(\sigma_0)\,y = D(\sigma_0)(y - A\xi) = D(\sigma_0)\,\varepsilon.$  (4.194)

Thus the estimator $\hat\varepsilon$ is invariant with respect to $y \mapsto y - A\xi$.

(2) $\hat\varepsilon'\Sigma_0^{-1}\hat\varepsilon = \varepsilon'D'(\sigma_0)\Sigma_0^{-1}D(\sigma_0)\varepsilon = \mathrm{tr}\,D'(\sigma_0)\Sigma_0^{-1}D(\sigma_0)\,\varepsilon\varepsilon'$  (4.195)

(3) $E\{\hat\varepsilon'\Sigma_0^{-1}\hat\varepsilon\} = \mathrm{tr}\,D'(\sigma_0)\Sigma_0^{-1}D(\sigma_0)\,\Sigma$  (4.196)

Corollary (multinomial inverse):

$\Sigma = \sum_{k=1}^{\ell(\ell+1)/2} C_k\sigma_k \Rightarrow \Sigma^{-1} = \sum_{k=1}^{\ell(\ell+1)/2} E_k\tau_k$  (4.197)
e.g.

$\Sigma = \sigma_1^2\begin{bmatrix}Q_{11} & 0\\ 0 & 0\end{bmatrix} + \sigma_{12}\begin{bmatrix}0 & Q_{12}\\ Q_{12}' & 0\end{bmatrix} + \sigma_2^2\begin{bmatrix}0 & 0\\ 0 & Q_{22}\end{bmatrix} = \sigma_1^2C_{11} + \sigma_{12}C_{12} + \sigma_2^2C_{22}$  (4.198)

$\Sigma^{-1} = \sigma_1^{-2}\begin{bmatrix}P_{11} & 0\\ 0 & 0\end{bmatrix} + \sigma_{12}^{-1}\begin{bmatrix}0 & P_{12}\\ P_{12}' & 0\end{bmatrix} + \sigma_2^{-2}\begin{bmatrix}0 & 0\\ 0 & P_{22}\end{bmatrix} = E_1\tau_1 + E_2\tau_2 + E_3\tau_3$  (4.199)

$P_{11} = Q_{11}^{-1} + kKP_{22}K'$  (4.200)
$P_{12} = P_{21}' = -kKP_{22}$  (4.201)
$P_{22} = (Q_{22} - kK'Q_{11}K)^{-1},\quad k := \dfrac{\sigma_{12}^2}{\sigma_1^2\sigma_2^2}$  (4.202)
$K := Q_{11}^{-1}Q_{12}$  (4.203)
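For a concrete feel for the two-block structure, the following sketch (toy matrices $Q_{11}$, $Q_{12}$, $Q_{22}$ and component values invented for illustration) assembles $\Sigma$ from the second order design matrices of (4.198) and reads the blocks playing the role of $P_{11}$, $P_{12}$, $P_{22}$ off the numerical inverse:

```python
# Numerical illustration of the two-block multinomial inverse (4.198)-(4.199):
# build Sigma from the second order design matrices C11, C12, C22 and inspect
# the blocks of the numerical inverse (toy matrices, invented).
import numpy as np

n1, n2 = 3, 2
Q11 = np.eye(n1) + 0.1 * np.ones((n1, n1))
Q22 = np.eye(n2)
Q12 = 0.2 * np.ones((n1, n2))
s1, s12, s2 = 2.0, 0.3, 1.5                  # sigma_1^2, sigma_12, sigma_2^2

Z12, Z21 = np.zeros((n1, n2)), np.zeros((n2, n1))
C11 = np.block([[Q11, Z12], [Z21, np.zeros((n2, n2))]])
C12 = np.block([[np.zeros((n1, n1)), Q12], [Q12.T, np.zeros((n2, n2))]])
C22 = np.block([[np.zeros((n1, n1)), Z12], [Z21, Q22]])

Sigma = s1 * C11 + s12 * C12 + s2 * C22      # block partitioned Sigma (4.198)
Sinv = np.linalg.inv(Sigma)                  # its blocks carry the P_ij factors
print(Sinv[:n1, :n1])                        # the block in the P11 position
print(Sinv[:n1, n1:])                        # the block in the P12 position
```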
Once we apply the above corollary for the multinomial inverse we arrive at

$E\{\hat\varepsilon'E_i(\sigma_0)\hat\varepsilon\} = \sum_{j=1}^{\ell(\ell+1)/2}\mathrm{tr}\,D'(\sigma_0)E_i(\sigma_0)D(\sigma_0)C_j\,\sigma_j.$  (4.204)

The Helmert equation for variance-covariance component estimation is now gained once we write, for all $i = 1,\dots,\ell(\ell+1)/2$,

$\hat\varepsilon'E_i(\sigma_0)\hat\varepsilon = \sum_{j=1}^{\ell(\ell+1)/2}\mathrm{tr}\,D'(\sigma_0)E_i(\sigma_0)D(\sigma_0)C_j\,\hat\sigma_j$  (4.205)

or, in a more suitable form,

$q = H\hat\sigma$  (4.206)

$q:\quad q_i := \hat\varepsilon'E_i(\sigma_0)\hat\varepsilon = y'D'(\sigma_0)E_i(\sigma_0)D(\sigma_0)\,y$  (4.207)

$H:\quad H_{ij} := \mathrm{tr}\,D'(\sigma_0)E_i(\sigma_0)D(\sigma_0)C_j$  (4.208)

$\hat\sigma = H^-q$  (4.209)

is the g-inverse solution of the Helmert equation. The inversion of the Helmert equation is supposed to give the factors which later on guarantee an unbiased estimation of the variance-covariance components. Two cases will be reported.
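The assembly of $q$ and $H$ by (4.207)-(4.208) is direct to program. The following runnable sketch treats two disjoint observation groups with variance components $\sigma_1^2$, $\sigma_2^2$ and no covariance component; all matrices and data are invented for illustration, and $E_i$ is read off as the $i$-th additive summand of $\Sigma_0^{-1}$, one choice consistent with (4.197):

```python
# Helmert's equation q = H sigma_hat, (4.206)-(4.209), assembled explicitly
# for two disjoint observation groups (toy design; all matrices invented).
import numpy as np

rng = np.random.default_rng(11)
n1, n2, m = 60, 40, 2
n = n1 + n2
A = rng.standard_normal((n, m))
C = [np.diag(np.r_[np.ones(n1), np.zeros(n2)]),    # C_11, second order design
     np.diag(np.r_[np.zeros(n1), np.ones(n2)])]    # C_22

true = np.array([4.0, 0.25])                       # true sigma_1^2, sigma_2^2
std = np.sqrt(true[0] * np.diag(C[0]) + true[1] * np.diag(C[1]))
y = A @ rng.standard_normal(m) + std * rng.standard_normal(n)

sigma = np.array([1.0, 1.0])                       # prior components sigma_0
for _ in range(50):                                # iterate to reproduction
    Sigma0 = sigma[0] * C[0] + sigma[1] * C[1]
    Si = np.linalg.inv(Sigma0)
    D = np.eye(n) - A @ np.linalg.solve(A.T @ Si @ A, A.T @ Si)   # (4.194)
    E = [C[0] / sigma[0], C[1] / sigma[1]]         # Sigma_0^-1 = E_1 + E_2
    e_hat = D @ y
    q = np.array([e_hat @ E[i] @ e_hat for i in range(2)])        # (4.207)
    H = np.array([[np.trace(D.T @ E[i] @ D @ C[j]) for j in range(2)]
                  for i in range(2)])                             # (4.208)
    sigma_new = np.linalg.solve(H, q)
    if np.allclose(sigma_new, sigma):
        break                                      # reproducing point
    sigma = sigma_new
print(sigma)                                       # close to the true values
```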
Case 1: $\det H \neq 0 \Rightarrow \mathrm{rk}\,H = \ell(\ell+1)/2 \Rightarrow \hat\sigma$ is $\Sigma_0$-HIQUE ($\Sigma_0$-Helmert type IQUE), in general different from $\Sigma_0$-MINIQUE ($\Sigma_0$-minimum IQUE).

Case 2: $\det H = 0 \Rightarrow \mathrm{rk}\,H < \ell(\ell+1)/2 \Rightarrow \hat\sigma$ is $\Sigma_0$-HIQE ($\Sigma_0$-Helmert type invariant quadratic estimation).

Note that in case 2 there is no word about unbiasedness. Still there remains the problem of positive variance components $\hat\sigma_1^2, \hat\sigma_2^2, \dots, \hat\sigma_\ell^2$. Actually, the estimated variance-covariance matrix $\hat\Sigma$ may differ significantly from the assumed dispersion matrix $\Sigma_0$. The special case $\hat\Sigma = \Sigma_0$ is called of reproducing type; we also say a priori and a posteriori variance-covariance components coincide.

The second step: estimation of $\xi_i$ (reverse of the classical procedure):

$\tilde\xi_i = \sum_{p=1}^n L_{ip}y_p$  (4.210)
If $\tilde\xi$ solves the consistent system of equations

$A'\Sigma^{-1}(\hat\sigma)A\,\hat\xi = A'\Sigma^{-1}(\hat\sigma)\,y,$  (4.211)

then $\hat\xi$ is called the $\Sigma(\hat\sigma)$-least squares solution, where

$L(\hat\sigma) = (A'\Sigma^{-1}(\hat\sigma)A)^{-1}A'\Sigma^{-1}(\hat\sigma).$  (4.212)

Many problems in regression parameter estimation, e.g. deformation analysis, are characterized by a more general linear dispersion structure than the Helmert one. A lot of study has been devoted to the linear model with linear dispersion structure, namely

$E\{y\} = A\xi = Ax_1,\quad \mathrm{vec}\,D\{y\} = B\sigma = Bx_2.$

Solutions for the unknown variance-covariance components are called admissible if the dispersion matrix $D\{y\}$ is nonnegative definite and the variance components $\hat\sigma_j^2$ are nonnegative. The procedure of simultaneous estimation of the unknowns $x_1$ and $x_2$ of first and second moment type is the proper estimation concept, pioneered by J. Kleffe (1978) and S. Gnot et al. (1977).
$x = \begin{bmatrix}x_1\\ x_2\end{bmatrix},\quad \tilde x_1 = K_1 + L_1y + M_1(y\otimes y),\quad \tilde x_2 = K_2 + L_2y + M_2(y\otimes y),$

$\begin{bmatrix}\tilde x_1\\ \tilde x_2\end{bmatrix} = \begin{bmatrix}K_1\\ K_2\end{bmatrix} + \begin{bmatrix}L_1 & M_1\\ L_2 & M_2\end{bmatrix}\begin{bmatrix}y\\ y\otimes y\end{bmatrix},$

where "$\otimes$" indicates the "Kronecker–Zehfuß product".

1st postulate, "inhomogeneous, multilinear (bilinear) estimation":

$\tilde x = XY,\quad X := \begin{bmatrix}K_1 & L_1 & M_1\\ K_2 & L_2 & M_2\end{bmatrix},\quad Y := \begin{bmatrix}1\\ y\\ y\otimes y\end{bmatrix}$

2nd postulate, "invariance": $y \mapsto y - A\xi$: $\hat\sigma$ invariant.

3rd postulate, "unbiasedness" or "minimum bias". Let us define the bias vector $b$ and the bias matrix $S$ by

$b := E\{\tilde x\} - x = Sx,\quad S := X\begin{bmatrix}A\\ B\end{bmatrix} - I,\quad \|S\| = \sqrt{\mathrm{tr}\,S'S},\quad \|\hat S\| = \min\left\{\|S\|\ \Big|\ S = X\begin{bmatrix}A\\ B\end{bmatrix} - I\right\}.$

The resulting condition equations have to be added via Lagrange multipliers to the postulate $\|D\{\tilde x\}\| = \min$ of best estimation. A special case will be noted:

$K_1 = 0,\ M_1 = 0:\quad b_1 = (LA - I)\,\xi,$
$K_2 = 0,\ L_2 = 0:\quad b_2 = M_2(A\otimes A)(\xi\otimes\xi) + M_2\,\mathrm{vec}\,\Sigma - \sigma.$

4th postulate. Note that the dispersion matrix $D\{\tilde x\}$ contains, besides second moments, also third and fourth moments of the observation vector $y$. In order to restrict the multilinear estimation process to second order statistics, normally (or quasi-normally) distributed
observations are introduced, which guarantee the reduction, for instance

$\sigma_{ijk} = E\{\varepsilon_i\varepsilon_j\varepsilon_k\} = 0,\quad \sigma_{ijkl} = E\{\varepsilon_i\varepsilon_j\varepsilon_k\varepsilon_l\} = \sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk},$

or

$\sigma_{ijk} = 0,\quad \sigma_{ijkl} = f_{ijklpqrs}\,\sigma_{pq}\sigma_{rs}.$

5th postulate, "best estimation":

$D\{\tilde x\} := E\{[\tilde x - E\{\tilde x\}][\tilde x - E\{\tilde x\}]'\},\quad \|D\{\tilde x\}\| := \sqrt{\mathrm{tr}\,D\{\tilde x\}} = \min_X.$

Finally, within the context of the five postulates, important results will be mentioned.

(1) For the linear model $E\{y\} = A\xi$, $\mathrm{vec}\,D\{y\} = B\sigma$, $D\{y\}$ positive definite, there always exist linear unbiased estimations $\hat z = ZY$ of block-diagonal structure,

$Z := \begin{bmatrix}L & 0\\ 0 & M\end{bmatrix},\quad Y := \begin{bmatrix}y\\ y\otimes y\end{bmatrix},\quad o(Y) = n(n+1)\times 1,\quad z := \mathrm{vec}\,[F, G],$

of certain optimality properties.

(2) For the linear model $E\{y\} = A\xi$, $\mathrm{vec}\,D\{y\} = B\sigma$, $D\{y\}$ positive definite, there exist invariant quadratic estimations of arbitrary vectors $G\sigma$ which are independent of the special estimations of vectors $A\xi$ by $Ly = AA^-y$. This result motivates the two step estimation process: at first the vector $G\sigma$ is invariantly estimated; secondly, the vector $F\xi$ is estimated dependent on $G\hat\sigma$.

(3) "E-D correspondence": Once we construct invariant quadratic estimations of vectors $G\sigma$, namely linear in $Py\otimes Py$ where $P := I - A(A'A)^-A'$, then the whole theory of regression parameter estimation with respect to a singular Gauß–Markov model can be transferred to the estimation of dispersion components, if the observation vector $y$ is at least quasi-normally distributed. This idea has been pioneered by S.K. Mitra (1971), F. Pukelsheim (1976, 1977, 1979, 1981) and J. Seely (1971, 1972, 1977).
(4) We have briefly mentioned that the dispersion component estimation based on the a priori dispersion matrix $\Sigma_0$ leads to the concept of reproducing estimations in the sense of $\Sigma_0 = \hat\Sigma$. Actually, an arbitrary initial estimate $\Sigma_0$ does not lead to the reproducing value $\hat\Sigma$; we have to iterate, e.g. the Helmert equation, up to reproduction. B. Schaffrin (1983) has described quite a number of iteration procedures. For instance, he proved that the diagonalization technique for the Helmert equation pioneered by W. Förstner (1979) leads to repro-BIQUE.

(5) With respect to the Helmert equation we have mentioned that unbiased dispersion component estimations might not exist. Then we would prefer minimum bias estimation. In addition, variance components might turn out negative, a well-known result from practical work. In this context we mention the nonnegative quadratic minimum bias estimator, pioneered by J. Hartung (1979, 1980). B. Schaffrin (1981 ii) has efficiently used the linear complementarity problem (LCP algorithm), e.g. to solve the Helmert equation under the constraints of nonnegative variance components.

Let us discuss the intuitive Helmert estimation of variance-covariance components with respect to the linear model with linear dispersion structure. We take advantage of the quadratic set-up of variance-covariance component estimation of invariant type, namely

$\hat\sigma = X(\hat\varepsilon\otimes\hat\varepsilon) = X\,\mathrm{vec}\,\hat\varepsilon\hat\varepsilon',\quad \hat\varepsilon = D_0\varepsilon = D_0y,$  (4.213)

where the projection matrix $D_0 := I - A(A'\Sigma_0^{-1}A)^{-1}A'\Sigma_0^{-1}$ refers to $\Sigma_0$ by (4.194). We combine (4.213) in order to find

$\hat\sigma = X\,\mathrm{vec}\,D_0\varepsilon\varepsilon'D_0' = X(D_0\otimes D_0)\,\mathrm{vec}\,\varepsilon\varepsilon'.$

The postulate of unbiased variance-covariance component estimation and the linear dispersion structure lead to

$E\{\hat\sigma\} = X(D_0\otimes D_0)\,\mathrm{vec}\,\Sigma = X(D_0\otimes D_0)B\sigma = \sigma$ for all $\sigma$,  (4.214)

or

$X(D_0\otimes D_0)B = I.$  (4.215)

Obviously

$X = [(D_0\otimes D_0)B]^-_l$  (4.216)

is the left generalized inverse which solves (4.215), such that

$\hat\sigma = [(D_0\otimes D_0)B]^-_l\,(D_0\otimes D_0)\,\mathrm{vec}\,yy'.$  (4.217)

We mention two general representations of the left generalized inverse which are equivalent, namely
$X = \{B'(D_0\otimes D_0)'R(D_0\otimes D_0)B\}^{-1}B'(D_0\otimes D_0)'R$  (4.218)

and

$X = \{S'(D_0\otimes D_0)B\}^{-1}S'$  (4.219)

$R = S\{B'(D_0\otimes D_0)'S\}^{-1}\{S'(D_0\otimes D_0)B\}^{-1}S'.$  (4.220)

The arbitrary matrices $R$ and $S$ have to be chosen in such a way that the regular inverses of the matrices $\{\cdot\}$ exist. The equivalence of the various representations can be shown once we use $R = X'X$, for instance. The Helmert estimator is obtained once we choose

$S_H = \{[\Sigma_0^{-1} - \Sigma_0^{-1}A(A'\Sigma_0^{-1}A)^{-1}A'\Sigma_0^{-1}] \otimes [\Sigma_0^{-1} - \Sigma_0^{-1}A(A'\Sigma_0^{-1}A)^{-1}A'\Sigma_0^{-1}]\}\{\Sigma_0\otimes\Sigma_0\}\,B,$  (4.221)

where we have introduced the vector of the multinomial inverse of $\Sigma_0$ according to (4.197), namely

$\mathrm{vec}\,\Sigma_0^{-1} = E_0s,$  (4.222)

and $s$ the "summation vector".
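The left-inverse representation (4.216)-(4.217) can be sketched in Kronecker form; below, the Moore–Penrose pseudoinverse is used as one admissible left inverse, not the specific Helmert choice (4.221), and the two-group toy model is invented for illustration:

```python
# Sketch of the left generalized inverse (4.216) and the estimator (4.217)
# in Kronecker form, for a two-group toy model (all numbers invented).
import numpy as np

rng = np.random.default_rng(2)
n1, n2, m = 4, 3, 1
n = n1 + n2
A = rng.standard_normal((n, m))
C1 = np.diag(np.r_[np.ones(n1), np.zeros(n2)])
C2 = np.diag(np.r_[np.zeros(n1), np.ones(n2)])
B = np.column_stack([C1.ravel(), C2.ravel()])        # vec D{y} = B sigma

Sigma0 = 1.0 * C1 + 1.0 * C2                         # a priori Sigma_0
Si = np.linalg.inv(Sigma0)
D0 = np.eye(n) - A @ np.linalg.solve(A.T @ Si @ A, A.T @ Si)

G = np.kron(D0, D0) @ B                              # (D0 x D0) B
X = np.linalg.pinv(G)                                # one left inverse of G
print(np.allclose(X @ G, np.eye(2)))                 # (4.215) holds

y = A @ rng.standard_normal(m) + rng.standard_normal(n) * np.r_[
    2.0 * np.ones(n1), 0.5 * np.ones(n2)]
sigma_hat = X @ np.kron(D0, D0) @ np.outer(y, y).ravel()   # (4.217)
print(sigma_hat)                                     # unbiased by (4.214)
```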
Finally, we present two characteristic geodetic numerical examples.

Example 3 (two variance components, hybrid direction-distance network): An example of a combined distance-direction two-dimensional network has been taken from W. Welsch. Figure 4.2 is the network graph, Table 4.1 the set of approximate (x,y)-coordinates, and Tables 4.2 and 4.3 the sets of 40 distance and 99 direction observations. There are 12 points, 21 coordinate correction unknowns and 23
Fig. 4.2 Huaytapallana network, Peru
[Table 4.1 Approximate coordinates (y, x) of the 12 network points, Huaytapallana network, Peru]

[Table 4.2 Distance observations (from point, to point, distance in m), Huaytapallana network, Peru: 40 distances]
orientation unknowns. The coordinate system is geodetically oriented (y right, x top); $[x_1, x_4, y_4]' = 0$ has been chosen for the network datum. Units of the computations are meter for distances and rad for directions. Figure 4.3 is the plot of the a posteriori variance ratio $\hat\sigma_1^2/\hat\sigma_2^2$ as a function of the a priori variance ratio $\sigma_1^2/\sigma_2^2$ in the interval $10^{-2} \leq \sigma_1^2/\sigma_2^2 \leq 10^5$, with the reproducing point $\hat\sigma_1^2/\hat\sigma_2^2 = \sigma_1^2/\sigma_2^2 = 2.2\cdot 10^{-4}$, or $\hat\sigma_1 = 0.0022$ m for distances and $\hat\sigma_2 = 9.4^{cc}$ for directions. Example 3 has been taken from E. Grafarend, A. Kleusberg and B. Schaffrin (1980). Note that the abscissa in Fig. 4.3 has a logarithmic scale while its ordinate has a linear scale, which explains the dashed curve that in linear-linear scale is a straight line.
[Table 4.3 Direction observations (station, targets, directions in gon), Huaytapallana network, Peru: 99 directions grouped by the 12 stations]
Fig. 4.3 Two variance component estimations of Helmert type: $\hat\sigma_1^2/\hat\sigma_2^2 = f(\sigma_1^2/\sigma_2^2)$, Huaytapallana network, Peru; $\hat\sigma_1 = 0.0022$ m, $\hat\sigma_2 = 9.4^{cc}$ at the reproducing point
Example 4 (two variance components, multivariate gyrocompass observations): An experimental data set of gyrocompass reversal point observations has been taken from M. Junasevic (1977) and is listed in Table 4.4: it includes 18 reversal points observed with a TK3 instrument. The data analysis was based on a frequency oscillation model with linear damping and disturbances of first order Markov type, namely $\varepsilon_i = \rho\varepsilon_{i-1} + \nu_i$. The variance components within $E(\nu_i\nu_j) = \sigma_1^2\delta_{ij}$, $E(\varepsilon_0^2) = \sigma_2^2$ have been estimated according to the Helmert method. Figure 4.4 is a plot of the a posteriori variance ratios as a function of the a priori variance ratios, $\hat\sigma_1^2/\hat\sigma_2^2 = f(\sigma_1^2/\sigma_2^2)$; for different values of $\rho$ we could find positive variances. For $\rho = 0.5$ we arrived at $\sigma_1 = 7''$, $\sigma_2 = 97''$, and for $\rho = 0.6$ at $\sigma_1 = 9''$, $\sigma_2 = 75''$ at the reproducing point.
Fig. 4.4 Two variance component estimations of Helmert type: $\hat\sigma_1^2/\hat\sigma_2^2 = f(\sigma_1^2/\sigma_2^2)$, linear damping, first order Markov model; $\rho = 0.5$: $\hat\sigma_1 = 7''$, $\hat\sigma_2 = 97''$ at the reproducing point; $\rho = 0.6$: $\hat\sigma_1 = 9''$, $\hat\sigma_2 = 75''$ at the right-hand true reproducing point
Table 4.4 Data set of 18 reversal points, gyrocompass TK3

68.080 g  70.114 g  68.072 g  70.108 g  68.072 g  70.104 g
68.080 g  70.094 g  68.085 g  70.092 g  68.090 g  70.083 g
68.110 g  70.071 g  68.113 g  70.070 g  68.118 g  70.064 g
Inputs of $\rho = 0.4$ or $\rho = 0.7$ led to instabilities. The example has been chosen since it demonstrates that, in general, there might exist more than one reproducing point, e.g. at $\sigma_1^2/\sigma_2^2 = 0.0005$ and $\sigma_1^2/\sigma_2^2 = 0.015$ for $\rho = 0.6$. Only in the latter case have the single variance components been reproducing. Thus, in using numerical algorithms to find the reproducing point by iteration, there is the danger that alternative reproducing points are overlooked. Example 4 has been taken from E. Grafarend and A. Kleusberg (1980). Again we note the logarithmic-linear scale of Fig. 4.4, similar to Fig. 4.3.

Example 5 (Bayes design with moment estimation): A Bayes linear model with moment assumptions uses, for instance, prior information of type mean parameter vector and model variance $\sigma^2$. This is taken into account by letting the parameter vector $\theta$ and $\sigma^2$ follow a distribution, the prior distribution, which is to be specified by the experimenter at the beginning of the modelling phase. Thus the parameters $\theta$ and $\sigma^2$ become random variables, in addition to the response vector $y$. The underlying model distribution now determines the joint distribution of $y$, $\theta$ and $\sigma^2$. The expectation vector and the dispersion matrix of the response vector $y$, conditional on given $\theta$ and $\sigma^2$, are given by (4.223). The prior expectation vector and dispersion matrix of $\theta$ are determined by a prior mean vector $\theta_0$, a prior dispersion matrix $R_0$, non-negative definite, and a prior sample
260
4 The Second Problem of Probabilistic Regression
size $n_0 > 0$, as given by (4.224). Finally, we take advantage of a prior model variance $\sigma_0^2 > 0$, being the expected value of the model variance $\sigma^2$, of type (4.225). The assumptions (i) (4.223), (ii) (4.224) and (iii) (4.225) are called the Bayes linear model with moment assumptions. The assumptions (ii) (4.224) are the critical ones: the prior estimate for $\theta$, before any sampling evidence is available, is $\theta_0$ and has uncertainty $(\sigma^2/n_0)R_0$. In case of full rank $m$, the Gauss–Markov theorem yields the sampling estimate $\hat\theta = (A'A)^{-1}A'y$, with dispersion matrix $(\sigma^2/n)M^{-1}$, where $M := A'A/n$. The uncertainty of the prior estimate and the variability of the sampling estimate are made comparable to the experimenter through specifying $n_0$, which assesses the weight of the prior information on the same observational basis that applies to the sampling information. The scaling issue is of central importance because it touches on the essence of the Bayes philosophy, namely to combine prior information and sampling information measured in comparable units. Accordingly, the optimal estimator is easily understood.

Bayes design with moment estimation:

Assumption of type one:

$E_p\{y\,|\,\theta,\sigma^2\} = A\theta,\quad D_p\{y\,|\,\theta,\sigma^2\} = I_n\sigma^2$  (4.223)

Assumption of type two:

$E_p\{\theta\,|\,\sigma^2\} = \theta_0,\quad D_p\{\theta\,|\,\sigma^2\} = \dfrac{\sigma^2}{n_0}R_0$  (4.224)

Assumption of type three:

$E_p\{\sigma^2\} = \sigma_0^2$  (4.225)

Matrix-valued risk function (mean squared error matrix):

$\mathrm{MSE}\{T(y); K'\theta\} = E_p[(T(y) - K'\theta)(T(y) - K'\theta)']$  (4.226)

Let us assume a full column rank coefficient matrix $K$ in order to transform the parameter system $K'\theta$ into estimated quantities $T(y)$. A proper choice of the matrix-valued risk function of type (4.226), called the mean squared error matrix, is used to minimize the mean squared error among all affine estimators $By + b$, where $B\in\mathbb{R}^{l\times n}$ and the shift $b\in\mathbb{R}^l$. The shift $b$ is needed since prior to sampling there is a bias:

$E_p\{K'\theta\} = E_p[E_p(K'\theta\,|\,\sigma^2)] = E_p(K'\theta_0) = K'\theta_0,$  (4.227)

unless $\theta_0 = 0$. Any affine estimator that guarantees the minimum mean squared error matrix is called a Bayes estimator for $K'\theta$.
In the prescribed Bayes linear model with moment assumptions, let the prior dispersion matrix $R_0$ be positive definite. Then the unique Bayes estimator for $K'\theta$ is $K'\tilde\theta$, given by (4.228). The mean squared error matrix is accordingly given by (4.229). For the proof we refer to F. Pukelsheim (1990), pp. 270–272.

Unique Bayes estimator for $K'\theta$:

$\tilde\theta = (n_0R_0^{-1} + A'A)^{-1}(n_0R_0^{-1}\theta_0 + A'y)$  (4.228)

Minimum mean squared error matrix:

$\mathrm{MSE}_p(K'\tilde\theta; K'\theta) = \sigma_0^2\,K'(n_0R_0^{-1} + A'A)^{-1}K$  (4.229)
Chapter 5

The Third Problem of Algebraic Regression

Inconsistent system of linear observational equations with datum defect: overdetermined-underdetermined system of linear equations: $\{Ax + i = y\,|\,A\in\mathbb{R}^{n\times m},\ y\notin\mathcal{R}(A),\ \mathrm{rk}\,A < \min\{m,n\}\}$

The optimisation problem which appears in general linear systems and weakly nonlinear systems is a standard topic in many textbooks on optimization. Here we again define weakly nonlinear systems of equations as a problem which allows a Taylor expansion. We start in the first section with a front example, an inconsistent system of linear equations with datum defect. We illustrate (a) algebraic partitioning, (b) geometric partitioning and (c) set-theoretic partitioning. This setup is written not for mathematicians, but for the analyst with remote control of the notions herein. We construct the minimum norm-least squares solution (MINOLESS), which leads to the solution by additive or multiplicative rank partitioning (5.55). The second section is based on a review of the Minimum Norm-Least Squares Solution, with both weighted norms, based on the four axioms of the generalized inverse, again called Rao's Pandora Box (C.R. Rao and S.K. Mitra (1971, pages 50-55, "three basic g-inverses", Sect. 5.3.3: "minimum norm least squares solution")). A special highlight is the eigenvalue decomposition of $(G_x, G_y)$-MINOLESS and its canonical representation: "left" and "right" eigenspace analysis versus synthesis. Please pay attention to our notes. In Sect. 5-3 we shortly review Total Least Squares, namely the hybrid approximation solution of type "$\alpha$-HAPS" and the celebrated Tykhonov-Phillips regularization, also called "Ridge Estimation" or "$\alpha$-HAPS" (5.108-5.110). Read only Lemma 5.6 (MINOS) and Lemma 5.9 (HAPS). We shall outline three aspects of the general inverse problem given in discrete form: (i) set-theoretic (fibering), (ii) algebraic (rank partitioning; "IPM", the Implicit Function Theorem) and (iii) geometrical (slicing).

Here we treat the third problem of algebraic regression, also called the general linear inverse problem: an inconsistent system of linear observational equations $\{Ax + i = y\,|\,A\in\mathbb{R}^{n\times m},\ \mathrm{rk}\,A < \min\{n,m\}\}$, also called an "underdetermined-overdetermined system of linear equations", is solved by means of an optimization problem. The introduction presents us with the front page example of inhomogeneous equations with unknowns. In terms of boxes and figures we review the minimum norm, least squares solution ("MINOLESS") of such an inconsistent, rank deficient system of linear equations, which is based upon the trinity.
E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 5, © Springer-Verlag Berlin Heidelberg 2012
263
264
5 The Third Problem of Algebraic Regression
Lemma5.2 Gx -minimum norm, Gy -least squares solution
Lemma 5.3 Gx -minimum norm, Gy -least squares solution
Definition 5.1 Gx-minimum norm, Gy-least squares solution
Lemma 5.4 MINOLESS, rank factorization
Lemma 5.5 MINOLESS additive rank partitioning
Lemma 5.6 characterization of Gx, Gy -MINOS
Lemma 5.7 eigenspace analysis versus eigenspace synthesis
Lemma 5.9 α-HAPS
Definition 5.8 α-HAPS
Lemma 5.10 α-HAPS
5-1 Introduction
265
5-1 Introduction With the introductory paragraph we explain the fundamental concepts and basic notions of this section. For you, the analyst, who has the difficult task to deal with measurements, observational data, modeling and modeling equations we present numerical examples and graphical illustrations of all abstract notions. The elementary introduction is written not for a mathematician, but for you, the analyst, with limited remote control of the notions given hereafter. May we gain your interest ? Assume an n-dimensional observation space, here a linear space parameterized by n observations (finite, discrete) as coordinates y D Œy1 ; : : : ; yn 0 2 Rn in which an m-dimensional model manifold is embedded (immersed). The model manifold is described as the range of a linear operator f from an m-dimensional parameter space X into the observation space Y. As a mapping f is established by the mathematical equations which relate all observables to the unknown parameters. Here the parameter space X, the domain of the linear operator f , will be also restricted to a linear space which is parameterized by coordinates x D Œx1 ; : : : ; xm 0 2 Rm : In this way the linear operator f can be understood as a coordinate mapping A W x 7! y D Ax. The linear mapping f W X ! Y is geometrically characterized by its range R.f /, namely R.A/, defined by R.f / WD fy 2 Rn jy D f.x/ forsome x 2 Xg which in general is a linear subspace of Y and its kernel N .f /, namely N .A/, defined by N .f / WD fx 2 jf .x/ D 0g: Here the range R.f /, namely the range space R.A/, does not coincide with the n-dimensional observation space Y such that y … R.f /, namely y … R.A/. In addition, we shall assume here that the kernel N .f /, namely null space N .A/ is not trivial: Or we may write N .f / ¤ 0. First, Example 1.3 confronts us with an inconsistent system of linear equations with a datum defect. Second, such a system of equations is formulated as a special linear model in terms of matrix algebra. In particular we are aiming at an explanation of the terms “inconsistent” and “datum defect”. The rank of the matrix A is introduced as the index of the linear operator A. The left complementary index n rk A is responsible for surjectivity defect, which its right complementary index m rk A for the injectivity (datum defect). As a linear mapping f is neither “onto”, nor “one-to-one” or neither surjective, nor injective. Third, we are going to
266
5 The Third Problem of Algebraic Regression
open the toolbox of partitioning. By means of additive rank partitioning (horizontal and vertical rank partitioning) we construct the minimum norm – least squares solution (MINOLESS) of the inconsistent system of linear equations with datum defect Ax C i D y; rkA minfn; mg. Box 5.3 is an explicit solution of the MINOLESS of our front page example. Fourth, we present an alternative solution of type “MINOLESS” of the front page example by multiplicative rank partitioning. Fifth, we succeed to identify the range space R.A/ and the null space N .A/ using the door opener “rank partitioning”.
5-11 The Front Page Example Example 5.1. (inconsistent system of linear equations with datum defect: Ax C i D y; x 2 X D Rm ; y 2 Y 2 Rn A 2 Rnm ; r D rkA minfn; mg): Firstly, the introductory example solves the front page inconsistent system of linear equations with datum defect, : x1 C x2 D 1 x1 C x2 C i1 D 1 : x2 C x3 D 1 or x2 C x3 C i2 D 1 : Cx1 x3 D 3 Cx1 x3 C i3 D 3 obviously in general dealing with the linear space X D Rm Ö x; d i m X D m, here m D 3, called the parameter space, and the linear space Y D Rn Ö y; d i mY D n, here n = 3, called the observation space.
5-12 The Front Page Example in Matrix Algebra Secondly, by means of Box 5.1 and according to A. Cayley’s doctrine let us specify the inconsistent system of linear equations with datum defect in terms of matrix algebra. Box 5.1. (Special linear model): three observations, three unknowns, rk A D 2 3 2 32 3 2 3 a11 a12 a13 x1 i1 y1 y D 4 y2 5 D 4 a21 a22 a23 5 4 x2 5 C 4 i2 5 , y3 a31 a32 a33 x3 i3 2 3 2 32 3 2 3 1 1 1 0 x1 i1 , y D Ax C i W 4 1 5 D 4 0 1 1 5 4 x2 5 C 4 i2 5 3 1 0 1 x3 i3 2
,
5-1 Introduction
267
, x0 D Œx1 ; x2 ; x3 ; y0 D Œy1 ; y2 ; y3 D Œ1; 1; 3; i0 D Œi1 ; i2 ; i3 x 2 R31 ; y 2 Z 31 R31 2 3 1 1 0 A WD 4 0 1 1 5 2 Z 33 R33 1 0 1 r D rkA D 2: The matrix A 2 Rnm ; here A 2 R33 ; is an element of Rnm generating a linear mapping f W x 7! Ax. A mapping f is called linear if f .x1 C x2 / D f .x1 / C f .x2 / holds. The range R.f /, in geometry called “the range space R.A/”, and the kernel N .f /, in geometry called “the null space N .A/” characterized the linear mapping as we shall see. ? Why is the front page system of linear equations called inconsistent ? For instance, let us solve the first two equations, namely x1 C x3 D 2 or x1 x3 D 2, in order to solve for x1 and x3 . As soon as we compare this result to the third equation we are led to the inconsistency 2 D 3: Obviously such a system of linear equations needs general inconsistency para-meters .i1 ; i2 ; i3 / in order to avoid contradiction. Since the right-hand side of the equations, namely the inhomogeneity of the system of linear equations, has been measured as well as the linear model (the model equations) has been fixed, we have no alternative but inconsistency. Within matrix algebra the index of the linear operator A is the rank r D rk A, here r = 2, which coincides neither with d i m X D m, (“parameter space”) nor with d i m Y D n (“observation space”). Indeed r D rk A < mi nn; m, here r D rk A < mi n3; 3. In the terminology of the linear mapping f , f is neither onto (“surjective”), nor one-to-one (“injective”). The left complementary index of the linear mapping f , namely the linear operator A, which accounts for the surjectivity defect, is given by ds D n rkA, also called “degree of freedom” (here ds D n rkA D 1). In contrast, the right complementary index of the linear mapping f , namely the linear operator A, which accounts for the injectivity defect is given by d D m rkA (here d D m rkA D 1 ). While “surjectivity” relates to the range R.f / or “the range space R.A/” and “injectivity” to the kernel N .f / or “the null space N .A/” we shall constructively introduce the notion of rangeR.f / range spaceR.A/
versus
kernelN .f / nullspaceN .f /
by consequently solving the inconsistent system of linear equations with datum defect. But beforehand let us ask: ? Why is the inconsistent system of linear equations called deficient with respect to the datum ?
268
5 The Third Problem of Algebraic Regression
At this point we have to go back to the measurement process. Our front page numerical example has been generated from measurements with a leveling instrument: Three height differences .y˛ˇ ; yˇ ; y ˛ / in a triangular network have been observed. They are related to absolute height x1 D h˛ ; x2 D hˇ ; x3 D h by means of h˛ˇ D hˇ h˛ ; hˇ D h hˇ ; h ˛ D h˛ h at points fP˛ ; Pˇ ; P g, outlined in more detail in Box 5.1. Box 5.2. (The measurement process of leveling and its relation to the linear model): y1 D y˛ˇ D h˛ˇ C i˛ˇ D h˛ C hˇ C i˛ˇ y2 D yˇ D hˇ C iˇ D hˇ C h C iˇ y3 D y ˛ D h ˛ C i ˛ D h C h˛ C i ˛ 2 3 2 3 2 3 2 32 3 2 3 y1 h˛ Chˇ Ci˛ˇ x1 Cx2 Ci1 1 1 0 i1 x1 4 y2 5 D 4 hˇ Ch Ciˇ 5 D 4 x2 Cx3 Ci2 5 D 4 0 1 1 5 4 x2 5 C 4 i2 5: y3 h Ch˛ Ci ˛ x3 Cx1 Ci3 x3 i3 1 0 1 : Thirdly, let us begin with a more detailed analysis of the linear mapping f W Ax D y nm , r D rk A minfn; mg. or Ax C i D y; namely of the linear operator A 2 R We shall pay special attention to the three fundamental partitioning, namely (i) Algebraic partitioning called additive and multiplicative rank partitioning of the matrix A (ii) Geometric partitioning called slicing of the linear space X (parameter space) as well as of the linear space Y (observation space) (iii) Set-theoretical partitioning called fibering of the set X of parameter and the set Y of observations.
5-13 Minimum Norm: Least Squares Solution of the Front Page Example by Means of Additive Rank Partitioning Box 5.3 is a setup of the minimum norm – least squares solution of the inconsistent system of inhomogeneous linear equations with datum defect following the first principle “additive rank partitioning”). The term “additive” is taken from the additive decomposition y1 D A11 x1 C A12 x2 and y2 D A21 x1 C A22 x2 of the observational equations subject to A11 2 Rnm , rk A11 minfn; mg. Box 5.3. (Minimum norm-least squares solution of the inconsistent system of inhomogeneous linear equations with datum defect, “additive rank partitioning ”.
5-1 Introduction
269
The solution of the hierarchical optimization problem (1st) jjijj2I D min W x
xl D argfjjy Axjj2I D min jAx C i D y; A 2 Rnm ; rkA minfn; mgg .2nd / jjxl jj2I D min W xl
xlm D
argfjjxl jj2I
0
0
D min jA Axl D A y; A0 A 2 Rmm ; rkA0 A mg
is based upon the simultaneous horizontal and vertical rank partitioning of the matrix A, namely A11 A12 ; A11 2 Rrr ; rkA11 D rkA DW r AD A21 A22
with respect to the linear model
y1 y2
D
A11 A12 A21 A22
x1 x2
y D Ax C i ˇ ˇ y1 2 Rr1 ; x1 2 Rr1 i C 1 ; ˇˇ i2 y2 2 R.nr/1 ; x2 2 R.mr/1 :
First, as shown before, we compute the least-squares solution jjijj2I D min orjjy x
Axjj2I D min which generates standard normal equations x
A0 Axl D A0 y or
A0 11 A11 C A0 21 A21 A0 11 A12 C A0 21 A22 A0 12 A11 C A0 22 A21 A0 12 A12 C A0 22 A22
x1 x2
A0 11 A0 21 D A0 12 A0 22
y1 y2
or
N11 N12 N21 N22
x1l x2l
m1 D m2
subject to N11 WD A0 11 A11 C A0 21 A21 ; N12 WD A0 11 A12 C A0 21 A22 ; m1 D A0 11 y1 C A0 21 y2 N21 WD A0 12 A11 C A0 22 A21 ; N22 WD A0 12 A12 C A0 22 A22 ; m2 D A0 12 y1 C A0 22 y2 ; which are consistent linear equations with an (injectivity) defect d D m rkA. The front page example leads us to
270
5 The Third Problem of Algebraic Regression
A11 A12 A21 A22
AD
D
1 1 0 1 10
0 1 -1
or
1 1 0 A11 D ; A12 D 1 0 1 A22 D 1 A21 D 1 0 ; 2 3 2 1 1 A0 A D 4 1 2 1 5 1 1 2 2 1 1 ; N12 D ; jN11 j D 3 ¤ 0; N11 D 1 1 2 N21 D 1 1 ; N22 D 2 1 4 0 0 y1 D ; m1 D A 11 y1 C A 21 y2 D 1 0 0 0 y2 D 3; m2 D A 12 y1 C A 22 y2 D 4: Second, we compute as shown before the minimum norm solution jjxl jj2I D min or x0 1 x1 Cx0 2 x2 which generates the standard normal equations in the following xl way. L.x1 ; x2 / D x0 1 x1 C x0 2 x2 0 1 1 1 0 D .x0 2l N0 12 N1 11 m 1 N11 /.N11 N12 x2l N11 m1 / C x 2l x2l D min x2
“additive decomposition of the Lagrangean” L D L0 C L1 C L2 0
L0 WD m
2 1 N11 m1 ;
L1 WD 2x0 2l N0 12 N2 11 m1
0 L2 WD x0 2l N0 12 N2 11 N12 x2l C x 2l x2l
@L 1 @L 1 1 @L 2 .x2lm / D 0 , .x2lm / C .x2lm / D 0 @x2 2 @x2 2 @x2 0 2 , N0 12 N2 11 m1 C .I C N 12 N11 N12 /x2lm D 0 , 1 0 2 , x2lm D .I C N0 12 N2 11 N12 / N 12 N11 m1 ;
which constitute the necessary conditions. Using Cayley inverse sum of two matrices formula, namely .I C BC1 A0 /1 BC1 D B.AB C C/1 for appropriate dimensions of the involved matrices, such that the identities holds
5-1 Introduction
271
1 0 2 0 0 2 1 .I C N0 12 N2 11 N12 / N 12 N11 D N 12 .N12 N 12 C N11 /
we finally find x2lm D N0 12 .N12 N0 12 C N211 /1 m1 : The second derivatives 1 @2 L .x2lm / D .N0 12 N2 11 N12 C I/ > 0 2 @x2 @x0 2 due to positive-definiteness of the matrix I C N0 12 N2 11 N12 generate the sufficiency condition for obtaining the minimum of the unconstrained Lagrangean. Finally let us backward transform 1 x2l 7! x1m D N1 11 N12 x2l C N11 m1 ; 0 0 2 1 1 x1lm D N1 11 N12 N 12 .N12 N 12 C N11 / m1 C N11 m1 :
Let us right multiply the identity N12 N0 12 D N11 N0 11 C N12 N0 12 C N11 N0 11 by .N12 N0 12 C N11 N0 11 /1 such that N12 N0 12 .N12 N0 12 C N11 N0 11 /1 D N11 N0 11 .N12 N0 12 C N11 N0 11 /1 C I holds, and left multiply by N1 11 , namely 0 0 0 1 N1 D N0 11 .N12 N0 12 C N11 N0 11 /1 C N1 11 N12 N 12 .N12 N 12 C N11 N 11 / 11 :
Obviously we have generated the linear form
xlm
x1lm D N0 11 .N12 N0 12 C N11 N0 11 /1 m1 x2lm D N0 12 .N12 N0 12 C N11 N0 11 /1 m1 or 0 N 11 x1lm D .N12 N0 12 C N11 N0 11 /1 m1 xlm D x2lm N0 12 or 0 0 A 11 A11 C A 21 A21 D A0 12 A11 C A0 22 A21 Œ.A0 11 A12 C A0 21 A22 /.A0 12 A11 C A0 22 A21 / C .A0 11 A11 C A0 21 A21 /2 1 Œ.A0 11 y1 C A0 21 y2 :
272
5 The Third Problem of Algebraic Regression
Let us compute numerically xlm for the front page example.
5 4 11 ; N12 N0 12 D 4 5 11 1 63 6 3 0 0 0 0 1 ; ŒN12 N 12 C N11 N 11 D N12 N 12 C N11 N 11 D 3 6 27 3 6 1 4 4 4p 4 m1 D ; x2lm D ; jjxlm jj2I D ) x1lm D 2 0 3 0 3 3 N11 N0 11 D
4 4 x1lm D hO ˛ D ; x2lm D hO ˇ D 0; x3lm D hO D 3 3 4p jjxlm jj2I D 2 3 x1lm C x2lm C x3lm D 0 hO˛ C hO ˇ C hO D 0: The vector ilm of inconsistencies has to be finally computed by means of ilm D y Axlm
ilm
2 3 1 14 5 1p D 3: 1 ; A0 il D 0; jjilm jj2I D 3 3 1
The technique of horizontal and vertical rank partitioning has been pioneered by H. Wolf (1972,1973). |
5-14 Minimum Norm: Least Squares Solution of the Front Page Example by Means of Multiplicative Rank Partitioning Box 5.4 is a setup of the minimum norm-least squares solution of the inconsistent system of inhomogeneous linear equations with datum defect following the first principle “multiplicative rank partitioning”. The term “multiplicative” is taken from the multiplicative decomposition y D Ax C i D DEy C i of the observational equations subject to A D DE; D 2 Rnr ; E 2 Rrm ; rkA D rkD D rkE minfn; mg: Box 5.4. (Minimum norm-least squares solution of the inconsistent system of inhomogeneous linear equations with datum defect multiplicative rank partitioning ) The solution of the hierarchical optimization problem
5-1 Introduction
273
.1st/jj ijj2I D min x
xl D argfjjy Axjj2I D min jAx C i D y; A 2 Rnm ; rkA minfn; mg .2nd/jj xl jj2I D min xl
xlm D argfjjxl jj2I D min jA0 Axl D A0 y; A0 A 2 Rmm ; rkA0 A mg is based upon the rank factorization A D DE of the matrix A 2 Rnm subject to simultaneous horizontal and vertical rank partitioning of the matrix A, namely A D DE D
D 2 Rnr ; rkD D rkA DW r minfn; mg E 2 Rrm ; rkE D rkA DW r minfn; mg
with respect to the linear model y D Ax C i
y D Ax C i D DEx C i
Ex DW z DEx D Dz
) y D Dz C i:
First, as shown before, we compute the least-squares solution jjijj2I D min orjjy Axjj2I D min which generates standard normal equations
x
x
D0 Dzl D D0 y ) zl D .D0 D/1 D0 y D D0 l y; which are consistent linear equations of rank rkD D rkD0 D D rkA D r: The front page example leads us to
A11 A12 A D DE D A21 A22
D
1 0 1
1 1 0
2 3 0 1 1 1 ; D D 4 0 1 5 2 R32 -1 1 0
or
D0 DE D D0 A ) E D .D0 D/1 D0 A 1 21 1 0 1 2 1 0 0 1 )ED 2 R23 ; .D D/ D DDD 0 1 1 1 2 3 12 1 1 0 1 zl D .D0 D/1 D0 D y 3 0 1 1
274
5 The Third Problem of Algebraic Regression
2
3 1 4 2 4 5 yD ) zl D 1 3 1 3 1 1 0 1 zl D .D0 D/1 D0 D y 3 0 1 1 2 3 1 4 2 4 5 yD ) zl D : 1 3 1 3 Second, as shown before, we compute the minimum norm solution jjx` jj2I D min of x`
the consistent system of linear equations with datum defect, namely xlm D argfjjxl jj2I D min j Exl D .D0 D/1 D0 y g : xl
As outlined in Box 1.3 the minimum norm solution of consistent equations with datum defect namely Exl D .D0 D/1 D0 y; rkE D rkA D r is xlm D E0 .EE0 /1 .D0 D/1 D0 y C xlm D E m Dl y D Alm y D A y; which is limit on the minimum norm generalized inverse. In summary, the minimum norm-least squares solution generalized inverse (MINOLESS g-inverse) also called pseudo-inverse AC or Moore–Penrose inverse is the product of the MINOS g-inverse E m (right inverse) and the LESS g-inverse Dl (left inverse). For the front page example we are led to compute 1 0 1 21 ED ; EE0 D 0 1 1 12 2 3 2 1 1 1 2 1 .EE0 /1 D ; E0 .EE0 /1 D 4 1 2 5 3 1 2 3 1 1 2 3 1 0 1 1 xlm D E0 .EE0 /1 .D0 D/1 D0 y D 4 1 1 0 5 y 3 0 1 1 2 3 2 3 2 3 1 4 1 1 4p 4 y D 4 1 5 ) xlm D 4 0 5 D 4 0 5 ; jjxlm jj D 2 3 3 3 3 C4 C1
5-1 Introduction
275
3 2O 3 2 3 h˛ x1lm 1 4 7 6 xlm D 4 x2lm 5 D 4 hO ˇ 5 D 4 0 5 3 x3lm C1 hO 4p jjxlm jj D 2 3 O x1lm C x2lm C x3lm D 0 h˛ C hO ˇ C hO D 0: 2
The vector ilm of inconsistencies has to be finally computed by means of ilm WD y Axlm D ŒIn AA lm y; ilm D ŒIn E0 .EE0 /1 .D0 D/1 D0 yI
ilm
2 3 1 14 5 1p D 3 1 ; A0 il D 0; jjilm jj D 3 3 1
| Box 5.5 summarizes the algorithmic steps for the diagnosis of the simultaneous horizontal and vertical rank partitioning to generate (Fm1 Gy )-MINOS. Box 5.5. (algorithm): The diagnostic algorithm for solving a general rank deficient system of linear equations y D Ax; A 2 Rnm ; rkA < minfn; mg by means of simultaneous horizontal and vertical rank partioning Determine the rank of the matrixA rkA < min fn; mg: # C ompute “the simultaneous horizontal and vertical rank partioning00 " # A11 A12 A11 2 Rrr ; A12 2 Rr.mr/ AD ; A21 A22 A21 2 R.nr/r ; A22 2 R.nr/.mr/ “n r is called the left complementary index; m r the right complementary index00 “A as a linear operator is neither injective .m r ¤ 0/ ; nor surjective .n r D 0/ :00
#
276
5 The Third Problem of Algebraic Regression
C ompute the range space R.A/ and the null space N .A/ of the linear operator A R.A/ D span fwl1 .A/; : : : ; wlr .A/g N .A/ D fx 2 Rn j N11 x1` C N12 x2` D 0g or x1` D N1 11 N12 x2` : # C ompute .Tm ; Gy / MI NOS
0 N 11 x1 y y y 0 0 1 0 0 D D ŒN12 N 12 C N11 N 11 ŒA 11 G11 ; A 21 G12 1 x`m D x2 N0 12 y2 y y y y N11 WD A0 11 G11 A11 C A0 21 G22 A21 ; N12 WD A0 11 G11 A12 C A0 21 G22 A22 y y N21 WD N0 12 ; N22 WD A0 12 G11 A12 C A0 21 G22 A22 :
5-15 The Range R(f) and the Kernel N(f) Interpretation of “MINOLESS” by Three Partitionings (i) Algebraic (rank partitioning) (ii) Geometric (slicing) (iii) Set-theoretical (fibering) Here we will outline by means of Box 5.6 the range space as well as the null space of the general inconsistent system of linear equations. Box 5.6. (The range space and the null space of the general inconsistent system of linear equations): Ax C i D y; A 2 Rnm ; rkA min fn; mg “additive rank partitioning”. The matrix A is called a simultaneous horizontal and vertical rank partitioning, if AD
A11 A12 ; A11 D Rrr ; rkA11 D rkA DW r A21 A22 with respect to the linear model
5-1 Introduction
277
ec1
ec2
O e3
y
e1
e2
Fig. 5.1 Range R.f / range space R.A/, .y … R.A// (A) =
1 0
(A) = G1, 3
3
x2
x3
Fig. 5.2 Kernel N .f /, null space N .A/, “the null space N .A/ as the linear manifold (Grassmann space G1;3 ) slices the parameter space X D R3 ”, x3 is not displayed
y D Ax C i; A 2 Rnm ; rkA min fn; mg identification of the range space R.A/ D span f
n X
ei aij j j 2 f1; : : : ; rgg
i D1
“front page example”
y1 y2
D
1
1 3
2
3 1 1 0 ; A D 4 0 1 1 5 2 R33 ; rkA DW r D 2 1 0 1
R.A/ D span fe1 a11 C e2 a21 C e3 a31 ; e1 a12 C e2 a22 C e3 a32 g R3 or R.A/ D span fe1 C e3 ; e1 e2 g R3 D Y c1 D Œ1; 0; 1; c2 D Œ1; 1; 0; R3 D spanf e1 ; e2 ; e3 g
278
5 The Third Problem of Algebraic Regression
A
(A-)
(A-) ⊂ (A)
(A-) A-
(A)
(A) (A)
P (A-)
(A) ⊂ (A-)
A
A1;2;3;4 Fig. 5.3 Least squares, minimum norm generalized inverse A or AC /, the Moore– `m .A Penrose-inverse (Tseng inverse)
identification of the null space N11 x1` C N12 x2` D A0 11 y1 C A0 21 y2 N12 x1` C N22 x2` D A0 12 y1 C A0 22 y2 N .A/ WD fx 2 Rn jN11 x1` C N12 x2` D 0g or N11 x1` C N12 x2` D 0 , x1` D N1 11 N12 x2` “front page example”
x1 x3
`
N11
1 21 1 x x3` D 3 D 1 x3 ` 3 12 1 21 2 1 1 ; N D ; N1 D D 12 11 1 2 1 3 12
x1` D u; x2` D u; x3` D u N .A/ D H01 D G 1;3 : Box 5.7 is a summary of MINOLESS of a general inconsistent system of linear equations y D Ax C i. Based on the notion of the rank r D rkA < minfn; mg, we designed the generalized inverse of MINOS type or or A1;2;3;4 A `m
5-1 Introduction
279
Box 5.7. (MINOLESS of a general inconsistent system of linear equations): f W x ! y D Ax C i; x 2 X D Rm .parameter space/; y 2 Y D Rn .observation space/ r WD rkA < minfn; mg A generalized inverse of MINOS type W A1;2;3;4 or A `m Condition#1 f .x/ D f .g.y// , fDf ıgıf Condition#2 g.y/ D g.f .x// , g Dgıf ıg Condition#3 f .g.y// D yR.A/ ,
Condition#1 Ax D AA Ax , AA A D A Condition#2 A y D A Ax D A AA y , A AA D A Condition#3 A Ay D yR.A/ , A A D PR.A /
f ı g D PR.A/ Condition#4
Condition#4 AA D PR.A/
g.f .x// D yR.A/? g ı f D PR.g/ A A D PR.A / : AA D PR.A/ f ı g D PR.f /
A similar construction of the generalized inverse of a matrix applies to the diagrams of the mappings: (1) Under the mapping A: D.A/ ! R.A/ AA D PR.A/ f ı g D PR.f / (2) Under the mapping A: R.A/ ! PR.A / A A D PR.A / g ı f D PR.g/ :
280
5 The Third Problem of Algebraic Regression A
X
−
Y
X
Y A+ = A*
A =A
(A*GyA)−
*
A*GyA
X*
Gy
(Gy)−1 = Gy* Gx* = (Gx)−1
Y*
A*
A*GyA
Gx
X*
A*
(A*Gy*A)−
Y*
y |→ y*= Gvy ∀ y ∈Y, y* ∈ Y*
Fig. 5.4 Orthogonal inverses and adjoints in reflexive dual Hilbert spaces
Fig. 5.5 Venn diagram, trivial fibering of the domain D .f /: Trivial fibers N .f /? , trivial fibering of the range R.f / W trivial fibers R.f / and R.f /? , f W Rm D X ! Y D Rn ; X set system of the parameter space, Y set system of the observation space
In addition, we follow Figs. 5.4 and 5.5 for the characteristic diagrams describing: (i) Orthogonal inverses and adjoints in reflexive dual Hilbert spaces (ii) Venn diagrams, trivial fiberings In particularly, if Gy is rank defect we proceed as follows. Gy
Gy D
y 0 0 0
synthesi s
analysi s 0 U1 0 Gy ŒU1 ; U2 0 Gy D U Gy U D Gy D UGy U U0 2 U0 1 G y U1 U0 1 G y U2 D U1 ƒy U0 1 D U0 2 G y U0 1 U0 2 G y U2 ƒy D U0 1 Gy U1 ) U1 ƒy D Gy U1 0 D Gy U0 2 and U0 1 Gy U2 D 0
5-1 Introduction
281
# jjy Axjj2Gy D jjijj2 D i0 Gy i ) G y D U0 1 ƒ y U1 .y Ax/0 U0 1 ƒy U1 .y Ax0 / D min ) x
) U1 .y Ax/ D U1 i D i If we use simultaneous horizontal and vertical rank partitioning yD
y1 i A11 A12 x1 i C 1 D C 1 y2 i2 A21 A22 x2 i2
subject to special dimension identities y1 2 Rr1 ; y2 2 R.nr/1 A11 2 Rrr ; A12 2 Rr.mr/ A21 2 R.nr/r ; A22 2 R.nr/.mr/; we arrive at Lemma 5.0. Lemma 5.1. (Gx ; Gy )-MINOLESS, simultaneous horizontal and vertical rank partitioning):
y1 i1 A11 A12 x1 i yD C D C 1 y2 i2 A21 A22 x2 i2
(5.1)
subject to the dimension identities y1 2 Rr1 ; y2 2 R.nr/1 ; x1 2 Rr1 ; x2 2 R.mr/1 A11 2 Rrr ; A12 2 Rr.mr/ A21 2 R.nr/r ; A22 2 R.nr/.mr/ is a simultaneous horizontal and vertical rank partitioning of the linear model (5.1) fy D Ax C i; A 2nm ; r WD rkA < minfn; mgg r is the index of the linear operator A, n r is the left complementary index and m r is the right complementary index. x` is Gy -LESS if it fulfils the rank x`m is MINOS of A0 Gy Ax` D A0 Gy y, if
(5.2)
282
5 The Third Problem of Algebraic Regression 1 x x 1 x 1 0 .x1 /`m D N1 11 N12 ŒN 12 N11 G11 2G21 N11 N12 C G22 0 1 x 1 x 1 1 .N 12 N11 G11 N11 2G21 N11 /m1 C N11 m1
(5.3)
x 1 x 1 x 1 .x2 /`m D ŒN0 12 N1 11 G11 N11 N12 2G21 N11 N12 C G22 0 1 x 1 x 1 .N 12 N11 G11 N11 2G21 N11 /m1 :
(5.4)
The symmetric matrices .Gx ; Gy / of the metric of the parameter space X as well as of the observation space Y are consequently partitioned as Gy D
y
y
G11 G12 y y G21 G22
and
Gx11 Gx12 Gx21 Gx22
D Gx
(5.5)
subject to the dimension identities Gx11 2 Rrr ; Gx12 2 Rr.mr/ G11 2 Rrr ; G12 2 Rr.nr/ y y .nr/r .nr/.nr/ versus G21 2 R ; G22 2 R Gx21 2 R.mr/r ; Gx22 2 R.mr/.mr/ y
y
deficient normal equations A0 Gy Ax` D A0 Gy y
(5.6)
or
N11 N12 N21 N22
x1 x2
D
`
M11 M12 M21 M22
y1 y2
D
m1 m2
(5.7)
subject to N11 WD A0 11 G11 A11 C A0 21 G21 A11 C A0 11 G12 A21 C A0 21 G22 A21 y
N12 WD A
0
y 11 G11 A12
y
CA
0
y 21 G21 A12
y
CA
0
y 11 G12 A22
CA
0
y
(5.8)
y 21 G22 A22 ;
(5.9)
0
N21 D N 12 ; N22 WD A
0
y y y C A 22 G21 A12 C A0 12 G12 A22 C A0 22 G22 A22 ; y y y y A0 11 G11 C A0 21 G21 ; M12 WD A0 11 G12 C A0 21 G22 ; y y y y A0 12 G11 C A0 22 G21 ; M22 WD A0 12 G12 C A0 22 G22 ;
y 12 G11 A12
M11 WD M21 WD
0
m1 WD M11 y1 C M12 y2 ; m2 WD M21 y1 C M22 y2 :
(5.10) (5.11) (5.12) (5.13) (5.14)
5-2 MINOLESS and Related Solutions Like Weighted Minimum
283
5-2 MINOLESS and Related Solutions Like Weighted Minimum Norm-Weighted Least Squares Solutions 5-21 The Minimum Norm-Least Squares Solution: “MINOLESS” The system of the inconsistent, rank deficient linear equations Ax C i D y subject to A 2 Rnm ; rkA < min fn; mg allows certain solutions which we introduce by means of Definition 5.1 as a solution of a certain hierarchical optimization problem. Lemma 5.2 contains the normal equations of the hierarchical optimization problems. The solution of such a system of the normal equations is presented in Lemma 5.3 for the special case (i) jGx j ¤ 0 and case (ii) jGx C A0 Gy Aj ¤ 0; but jGx j D 0. For the analyst: Le mma 5.4 Le mma 5.5 presents the toolbox of presents the toolbox of MINOLESS for multiplicative and MINOLESS for additive rank partitioning, known as rank partitioning. rank factorization: Definition 5.1. (Gx -minimum norm-Gy -least squares solution): A vector x`m 2 X D Rm is called Gx , Gy -MINOLESS (MInimum NOrm with respect to the Gx -seminorm-Least Squares Solution with respect to the Gy seminorm) of the inconsistent system of linear equations with datum defect 2 6 Ax C i D y 4
rkA min fn; mg nm A2R
y … R.A/; N .A/ ¤ f0g x`m 2 X D Rm ; y 2 Y D Rn ;
(5.15)
if and only if first x` D argfjjijjGy D min jAx C i D y; rkA minfn; mgg;
(5.16)
x`m D argfjjxjjGx D min jA0 Gy Ax` D A0 Gy yg
(5.17)
x
second
x
is Gy -MINOS of the system of normal equations A0 Gy Ax` D A0 Gy y which are Gx -LESS. The solutions of type Gx , Gy -MINOLESS can be characterized as following.
284
5 The Third Problem of Algebraic Regression
Lemma 5.2. (Gx -minimum norm, Gy least squares solution): A vector x`m 2 X D Rm is called Gx , Gy -MINOLESS of (5.1), if and only if the system of normal equations
G x A0 G y A 0 A Gy A 0
x`m `m
D
0 A0 Gy y
(5.18)
with respect to the vector `m of “Lagrange multipliers” is fulfilled. x`m always exists and is uniquely determined, if the augmented matrix ŒGx ; A0 Gy A agrees to the rank identity rkŒGx ; A0 Gy A D m (5.19) or, equivalently, if the matrix Gx C A0 Gy A is regular. Proof. Gy -MINOS of the system of normal equations A0 Gy Ax` D A0 Gy is constructed by means of the constrained Lagrangean L.x` ; ` / WD x0 ` Gx x` C 20` .A0 Gy Ax` A0 Gy y/ D min; x;
such that the first derivatives 3 1 @L .x`m ; `m / D Gx x`m C A0 Gy A`m D 0 7 2 @x 7 7, 5 1 @L .x`m ; `m / D A0 Gy Ax`m A0 Gy y D 0 2 @ Gx A0 Gy A 0 x`m , D `m A0 Gy A 0 A0 G y y constitute the necessary conditions. The second derivatives 1 @2 L .x`m ; `m / D Gx 0 2 @x@x0
(5.20)
due to the positive semidefiniteness of the matrix Gx generate the sufficiency condition for obtaining the minimum of the constrained Lagrangean. Due to the assumptionA0Gy y 2 R.A0 Gx A/ the existence of Gy -MINOS x`m is granted. In order to prove uniqueness of Gy -MINOS x`m we have to consider case (i) Gx positive definite
and
case (ii) Gx positive semidefinite.
case (i): Gx positive definite
5-2 MINOLESS and Related Solutions Like Weighted Minimum
285
ˇ ˇ ˇ G x A0 G y A ˇ ˇ ˇ 0 1 0 ˇ ˇ ˇ ˇ ˇ A0 Gy A 0 ˇ D jGx j A Gy AGx A Gy A D 0:
(5.21)
First, we solve the system of normal equations which characterize x`m Gx , Gy MINOLESS of x for the case of a full rank matrix of the metric Gx of the parametric space X, rkGx D m in particular. The system of normal equations is solved for
x`m `m
G x A0 G y A D A0 G y A 0
0 0 C1 C2 D C3 C4 A0 Gy y A0 Gy y
(5.22)
subject to
Gx A0 Gy A A0 Gy A 0
C1 C2 C3 C4
Gx A0 Gy A Gx A0 Gy A D A0 Gy A 0 A0 Gy A 0
(5.23)
as a postulate for the g-inverse of the partitioned matrix. Cayley multiplication of the three partitioned matrices leads us to four matrix identities. Gx C1 Gx C Gx C2 A0 Gy A C A0 Gy AC3 Gx C A0 Gy AC4 A0 Gy A D Gx Gx C1 A0 Gy A C A0 Gy AC3 A0 Gy A D A0 Gy A 0
0
0
0
(5.24) (5.25)
A Gy AC1 Gx C A Gy AC2 A Gy A D A Gy A
(5.26)
A0 Gy AC1 A0 Gy A D 0:
(5.27)
0 Multiply the third identity by G1 x A Gy A from the right side and substitute the fourth identity in order to solve for C2 . 1 0 0 0 A0 Gy AC2 A0 Gy AG1 x A Gy A D A Gy AGx A Gy A
(5.28)
1 0 0 0 C2 D G1 x A Gy A.A Gy AGx A Gy A/
solves the fifth equation 1 0 1 0 0 0 0 A0 Gy AG1 x A Gy A.A Gy AGx A Gy A/ A Gy AGx A Gy A 1 0 0 D A Gy AGx A Gy A
(5.29)
by the axiom of a generalized inverse x`m D C2 A0 Gy y
(5.30)
1 0 0 0 0 x`m D G1 y A Gy A.A Gy AGx A Gy A/ A Gy y:
(5.31)
1 0 0 0 0 We leave the proof for “G1 x A Gy A.A Gy AGx A Gy A/ A Gy is the weighted C pseudo-inverse or Moore Penrose inverse AGy Gx ” as an exercise.
case (ii): Gx positive semidefinite
286
5 The Third Problem of Algebraic Regression
Second, we relax the condition rkGx D m by the alternative rkŒGx ; A0 Gy A D m Gx positive semidefinite. Add the second normal equation to the first one in order to receive the modified system of normal equations
G x C A0 G y A A0 G y A
A0 Gy A 0
x`m `m
A0 G y y D A0 G y y
rk.Gx C A0 Gy A/ D rkŒGx ; A0 Gy A D m:
(5.32) (5.33)
The condition rkŒGx ; A0 Gy A D m follows from the identity
G 0 x Gx C A Gy A D ŒGx ; A Gy A 0 .A0 Gy A/ 0
0
Gx ; A0 Gy A
(5.34)
ˇ ˇ namely ˇGx C A0 Gy Aˇ ¤ 0: The modified system of normal equations is solved for
0 x`m G x C A0 G y A A0 G y A A Gy y D A0 G A A0 Gy y `m 0 0 y 0 C1 A Gy y C C2 A0 Gy y A Gy y C1 C2 D D C3 C4 A0 Gy y C3 A0 Gy y C C4 A0 Gy y
(5.35)
subject to
C1 C2 Gx C A0 Gy A A0 Gy A Gx C A0 Gy A A0 Gy A 0 C3 C4 0 A0 Gy A A0 Gy A 0 Gx C A Gy A A0 Gy A D A0 Gy A 0
(5.36)
as a postulate for the g-inverse of the partitioned matrix. Cayley multiplication of the three partitioned matrices leads us to the four matrix identities “element (1,1)” .Gx C A0 Gy A/C1 .Gx C A0 Gy A/ C A0 Gy AC3 .Gx C A0 Gy A/ C.Gx C A0 Gy A/C2 A0 Gy A C A0 Gy AC4 A0 Gy A D Gx C A0 Gy A
(5.37)
“element (1,2)” .Gx C A0 Gy A/C1 A0 Gy A C A0 Gy AC3 D A0 Gy A
(5.38)
“element (2,1)” A0 Gy AC1 .Gx C A0 Gy A/ C A0 Gy AC2 A0 Gy A D A0 Gy A “element (2,2)”
(5.39)
5-2 MINOLESS and Related Solutions Like Weighted Minimum
287
A0 Gy AC1 A0 Gy A D 0:
(5.40)
First, we realize that the right sides of the matrix identities are symmetric matrices. Accordingly the left sides have to constitute symmetric matrices, too. .1; 1/ W
.Gx C A0 Gy A/C0 1 .Gx C A0 Gy A/ C .Gx C A0 Gy A/C0 3 A0 Gy A CA0 Gy AC0 2 .Gx C A0 Gy A/ C A0 Gy AC0 4 A0 Gy A D Gx C A0 Gy A .1; 2/ W A0 Gy AC0 1 .Gx C A0 Gy A/ C C0 3 A0 Gy A D A0 Gy A
.2; 1/ W .Gx C A0 Gy A/C0 1 A0 Gy A C A0 Gy AC0 2 A0 Gy A D A0 Gy A .2; 2/ W A0 Gy AC0 1 A0 Gy A D A0 Gy AC1 A0 Gy A D 0: We conclude
C1 D C0 1 ; C2 D C0 3 ; C3 D C0 2 ; C4 D C0 4 :
(5.41)
Second, we are going to solve for C1 ; C2 ; C3 D C2 and C4 . C1 D .Gx C A0 Gy A/1 fIm A0 Gy AŒA0 Gy A.Gx C A0 Gy A/1 A0 Gy A (5.42) A0 Gy A.Gx C A0 Gy A/1 g C2 D .Gx C A0 Gy A/1 A0 Gy AŒA0 Gy A.Gx C A0 Gy A/1 A0 Gy A 0
0
1
0
0
0
C3 D ŒA Gy A.Gx C A Gy A/ A Gy A A Gy A.Gx C A Gy A/ 0
0
1
0
1
C4 D ŒA Gy A.Gx C A Gy A/ A Gy A :
(5.43) (5.44) (5.45)
For the proof, we depart from (1,2) to be multiplied by A0 Gy A.Gx C A0 Gy A/1 from the left and implement (2,2) A0 Gy AC2 A0 Gy A.Gx C A0 Gy A/1 A0 Gy A D A0 Gy A.Gx C A0 Gy A/1 A0 Gy A: Obviously, C2 solves the fifth equation on the basis of the g-inverse ŒA0 Gy A.Gx C A0 Gy A/1 A0 Gy A or A0 Gy A.Gx C A0 Gy A/1 A0 Gy AŒA0 Gy A.Gx C A0 Gy A/1 A0 Gy A A0 Gy A.Gx C A0 Gy A/1 A0 Gy A D A0 Gy A.Gx C A0 Gy A/1 A0 Gy A: We leave the proof for “.Gx C A0 Gy A/1 A0 Gy AŒA0 Gy A.Gx C A0 Gy A/1 A0 Gy A A0 Gy
(5.46)
288
5 The Third Problem of Algebraic Regression
is the weighted pseudo-inverse or Moore–Penrose inverse AC Gy .Gx CA0 Gy A/ ” as an exercise. Similarly, C1 D .Im C2 A0 Gy A/.Gx C A0 Gy A/1
(5.47)
solves (2,2) where we again take advantage of the axiom of the g-inverse, namely A0 Gy AC1 A0 Gy A D 0 ,
(5.48)
A0 Gy A.Gx C A0 Gy A/1 A0 Gy A.Gx C A0 Gy A/1 A0 Gy A A0 Gy A.Gx C A0 Gy A/1 A0 Gy AŒA0 Gy A.Gx C A0 Gy A/1 A0 Gy A A0 Gy A.Gx C A0 Gy A/1 A0 Gy A.Gx C A0 Gy A/1 A0 Gy A D 0 , , A0 Gy A.Gx C A0 Gy A/1 A0 Gy A A0 Gy A.Gx C A0 Gy A/1 A0 Gy A.A0 Gy A.Gx C A0 Gy A/1 A0 Gy A/ A0 Gy A.Gx C A0 Gy A/1 A0 Gy A D 0: For solving the system of modified normal equations, we have to compute C1 A0 Gy D 0 , C1 D A0 Gy A D 0 , A0 Gy AC1 A0 Gy A D 0; a zone identity due to (2,2). In consequence, x`m D C2 A0 Gy y
(5.49)
has been proven. The element (1,1) holds the key to solve for C4 . As soon as we substitute C1 , C2 D C0 3 ; C3 D C0 2 into (1,1) and multiply left by A0 Gy A.Gx C A0 Gy A/1
and
right by .Gx C A0 Gy A/1 A0 Gy A;
we receive 2A0 Gy A.Gx C A0 Gy A/1 A0 Gy AŒA0 Gy A.Gx C A0 Gy A/1 A0 Gy A A0 Gy A .Gx CA0 Gy A/1A0 Gy ACA0 Gy A.Gx CA0 Gy A/1A0 Gy AC4 A0 Gy A.Gx CA0 Gy A/1 A0 Gy A D A0 Gy A.Gx C A0 Gy A/1 A0 Gy A: Finally, substitute C4 D ŒA0 Gy A.Gx C A0 Gy A/1 A0 Gy A
(5.50)
to conclude A0 Gy A.Gx C A0 Gy A/1 A0 Gy AŒA0 Gy A.Gx C A0 Gy A/1 A0 Gy A A0 Gy A .Gx C A0 Gy A/1 A0 Gy A D A0 Gy A.Gx C A0 Gy A/1 A0 Gy A;
5-2 MINOLESS and Related Solutions Like Weighted Minimum
289
namely the axiom of the g-inverse. Obviously, C4 is a symmetric matrix such that C4 D C0 4 . Here ends our elaborate proof. The results of the constructive proof of Lemma 5.2 are collected in Lemma 5.3. Lemma 5.3. (Gx -minimum norm, Gy -least squares solution: MINOLESS): O is Gx -minimum norm, Gy -least squares solution of (5.1) subject to x`m D Ly r WD rkA D rk.A0 Gy A/ < minfn; mg rk.Gx C A0 Gy A/ D m if and only if
LO D AC Gy Gx D .A`m /Gy Gx
(5.51)
LO D .Gx C A0 Gy A/1 A0 Gy AŒA0 Gy A.Gx C A0 Gy A/1 A0 Gy A A0 Gy 0
1
0
0
0
1
0
(5.52)
0
x`m D .Gx C A Gy A/ A Gy AŒA Gy A.Gx C A Gy A/ A Gy A A Gy y (5.53) 1;2;3;4 where AC is the Gy ; Gx -weighted Moore–Penrose inverse. If Gy Gx D AGy Gx rkGx D m, then 1 0 0 0 0 LO D G1 x A Gy A.A Gy AGx A Gy A/ A Gy
(5.54)
1 0 0 0 0 x`m D G1 x A Gy A.A Gy AGx A Gy A/ A Gy y
(5.55)
is an alternative unique solution of type MINOLESS. Perhaps the lengthy formulae which represent Gy ; Gx -MINOLESS in terms of a g-inverse motivate to implement explicit representations for the analyst of the Gx -minimum norm (seminorm), Gy -least squares solution, if multiplication rank partitioning, also known as rank factorization, or additive rank partitioning of the first order design matrix A is available. Here, we highlight both representations of AC D A `m . Lemma 5.4. (Gx -minimum norm, Gy -least squares solution: MINOLESS, rank factorization): O is Gx -minimum norm, Gy -least squares solution (MINOLLES) of x`m D Ly (5.1) fAx C i D yjA 2 Rnm ; r WD rkA D rk.A0 Gy A/ < minfn; mgg; if it is represented by multiplicative rank partitioning or rank factorization
290
5 The Third Problem of Algebraic Regression
A D DE; D 2nr ; E 2 Rrm
(5.56)
as case .i / W Gy D In ; Gx D Im 0 0 1 0 1 0 LO D A `m D E .EE / .D D/ D 0 0 1 right inverse ER D Em D E .EE / LO D E D R L 0 DL D D` D .D D/1 D0 left inverse C 0 0 1 0 1 0 x`m D A `m y D A y D E .EE / .D D/ D y:
(5.57) (5.58) (5.59)
The unknown vector x`m has the minimum Euclidean length 0 0 1 jjx`m jj2 D x0 `m x`m D y0 .AC /0 AC y D y0 .D ` / .EE / D` y:
y D y`m C i`m
(5.60) (5.61)
is an orthogonal decomposition of the observation vector y 2 Y D Rn into Ax`m D y`m 2 R.A/ and y Ax`m D i`m 2 R.A/? ;
(5.62)
the vector of inconsistency. y`m D Ax`m D AAC y D D.D0 D/1 D0 y D DD ` y
and
i`m D y y`m D .In AAC /y D ŒIn D.D0 D/1 D0 y D .In DD ` /y
AAC y D D.D0 D/1 D0 y D DD ` y D y`m is the projection PR.A/ and .In AAC /y D ŒIn D.D0 D/1 D0 y D .In DD ` /y is the projection PR.A/? . i`m and y`m are orthogonal in the sense of hi`m jy`m i D 0 or .In AAC /0 A D ŒIn D.D0 D/1 D0 0 D D 0. The “goodness of fit” of MINOLESS is jjy Ax`m jj2 D jji`m jj2 D y0 .In AAC /y D y0 ŒIn D.D0 D/1 D0 y D y0 .In DD1 ` /y:
(5.63)
case .i i / W Gx and Gy positive definite LO D
1 0 1 0 0 1 0 D G x E .EGx E / .D Gy D/ D Gy ER D Em weighted right inverse O L D ER .weighted/ DL .weighted/ E L D E` weighted left inverse
.A m` / .weighted/
(5.64) (5.65)
5-2 MINOLESS and Related Solutions Like Weighted Minimum
291
C x`m D .A `m /Gy Gx y AGy Gx y 1 0 1 0 0 1 D G1 x E .EGx E / .D Gy D/ DGy y:
(5.66)
The unknown vector x`m has the weighted minimum Euclidean length jjx`m jj2Gx D x0 `m Gx x`m D y0 .AC /0 Gx AC y 1 0 1 0 1 0 0 1 0 0 D y0 Gy D.D0 Gy D/1 .EG1 x E / EE .EGx E / .D Gy D/ D Gy y : y D y`m C i`m
(5.67) (5.68)
n
is an orthogonal decomposition of the observation vector y 2 Y D R into Ax`m D y`m 2 R.A/ and y Ax`m DW i`m 2 R.A/?
(5.69)
of inconsistency. y`m D AAC Gy Gx y C AAGy Gx D PR.A/
i` D .In AAC Gy Gx /y C In AAGy Gx D PR.A/?
(5.70)
hi`m jy`m iGy D 0 or .In AAC .weighted //0 Gy A D 0:
(5.71)
and
are Gy -orthogonal
The “goodness of fit” of Gx ; Gy -MINOLESS is jjy Ax`m jj2Gy D jji`m jj2Gy C 0 D y0 ŒIn AAC Gy Gx Gy ŒIn AAGy Gx y D y0 ŒIn D.D0 Gy D/1 D0 Gy 0 Gy ŒIn D.D0 Gy D/1 D0 Gy y D y0 ŒGy Gy D.D0 Gy D/1 D0 Gy y:
(5.72)
While Lemma 5.4 took advantage of rank factorization, Lemma 5.5 will alternatively focus on additive rank partitioning. Lemma 5.5. (Gx -minimum norm, Gy -least squares solution: MINOLESS, additive rank partitioning) O is Gx -minimum norm, Gy -least squares solution (MINOLESS) of (5.1) x`m D Ly fAx C i D yjA 2 Rnm ; r WD rkA D rk.A0 Gy A/ < minfn; mgg; if it is represented by additive rank partitioning
292
5 The Third Problem of Algebraic Regression
A11 2 Rrr ; A12 2 Rr.mr/ A11 A12 ; AD A21 A22 A21 2 R.nr/r ; A22 2 R.nr/.mr/
(5.73)
subject to the rank identity rkA D rkA11 D r
(5.74)
as case (i): Gy D In ; Gx D Im 0 N 11 LO D A D .N12 N0 12 C N11 N0 11 /1 ŒA0 11 ; A0 21 `m N0 12
(5.75)
subject to N11 WD A0 11 A11 C A0 21 A21 ; N12 WD A0 11 A12 C A0 21 A22 0
0
0
0
N21 WD A 12 A11 C A 22 A21 ; N22 WD A 12 A12 C A 22 A22
(5.76) (5.77)
or N0 11 y 0 0 1 0 0 .N12 N 12 C N11 N 11 / ŒA 11 ; A 21 1 : D 0 N 12 y2
x`m
(5.78)
The unknown vector x`m has the minimum Euclidean length jjx`m jj2 D x0 `m x`m
A11 y 0 0 1 0 0 .N12 N 12 C N11 N 11 / ŒA 11 ; A 21 1 : (5.79) D Œy 1 ; y 2 A21 y2 0
0
y D y`m C i`m is an orthogonal decomposition of the observation vector y 2 Y D R into Ax`m D y`m 2 R.A/ and y Ax`m DW i`m 2 R.A/? ;
(5.80)
the vector of inconsistency. y`m D Ax`m D AA `m y
and
i`m D y Ax`m D .In AA `m /y
are projections onto R(A) and R.A/? , respectively. i`m and y`m are orthogonal 0 in the sense of hi`m jy`m i D 0 or .In AA `m / A D 0. The “goodness of fit” of MINOLESS is
5-2 MINOLESS and Related Solutions Like Weighted Minimum
293
jjy Ax`m jj2 D jji`m jj2 D y0 .In AA `m /y: In AA `m ; rk.In AA`m / D n rkA D n r, is the rank deficient a posteriori weight matrix .Gy /`m .
case (ii): Gx and Gy positive definite LO D .A `m /Gy Gx :
5-22 .Gx ; Gy /-MINOS and Its Generalized Inverse A more formal version of the generalized inverse which is characteristic for Gx MINOS, Gy -LESS or .Gx ; Gy /-MINOS is presented by Lemma 5.6. (characterization of Gx ; Gy -MINOS): rk.A0 Gy A/ D rkA R.A0 Gy / D R.A0 /
(5.81)
is assumed. x`m D Ly is .Gx ; Gy /-MINOLESS of (5.1) for all y 2 Rn if and only if the matrix L 2 Rmn fulfils the four conditions Gy ALA D Gy A
(5.82)
Gx LAL D Gx L
(5.83)
Gy AL D .Gy AL/0 0
Gx LA D .Gx LA/ :
(5.84) (5.85)
In this case Gx x`m D Gx Ly is always unique. L, fulfilling the four conditions, is called the weighted MINOS inverse or weighted Moore–Penrose inverse. Proof. The equivalence of (5.81) follows from R.A0 Gy / D R.A0 Gy A/: .i / Gy ALA D Gy A and Gy AL D .Gy AL/0 : Condition (i) Gy ALA D Gy A and (iii) Gy AL D .Gy AL/0 are a consequence of Gy -LESS. jjijj2Gy D jjy Axjj2Gy D min ) A0 Gy Ax` D A0 Gy y: x
294
5 The Third Problem of Algebraic Regression
If Gx is positive definite, we can represent the four conditions (i)–(iv) of L by .Gx ; Gy /-MINOS inverse of A by two alternative solutions L1 and L2 , namely AL1 D A.A0 Gy A/ A0 Gy AL1 D A.A0 Gy A/ A0 L0 1 Gy D A.A0 Gy A/ A0 Gy D A.A0 Gy A/ A0 L0 2 A0 Gy D A.A0 Gy A/ A0 Gy AL2 D AL2 and 0 0 1 0 0 0 0 1 0 0 0 0 L2 A D G1 x .A L 2 Gx / D Gx .A L 2 A L 2 Gx / D Gx .A L 1 A L 2 Gx / 1 0 0 1 D Gx .A L 1 Gx L2 A/ D Gx .Gx L1 AL2 A/ D G1 x .Gx L1 AL1 A/ D L1 A; 1 L1 D Gx .Gx L1 AL1 / D G1 x .Gx L2 AL2 / D L2
concludes our proof. The inequality jjx`m jj2Gx D jjLy jj2Gx jjLy jj2Gy C 2y0 L0 Gx .In LA/z Cjj.Im LA/zjj2Gx 8y 2 Rn
(5.86)
is fulfilled if and only if the “condition of Gx -orthogonality” L0 Gx .Im LA/ D 0
(5.87)
applies. An equivalence is L0 Gx D L0 Gx LA
or
L0 Gx L D L0 Gx LAL;
which is produced by left multiplying with L. The left side of this equation is a symmetric matrix. Consequently, the right side has to be a symmetric matrix, too. Gx LA D .Gx LA/0 : Such an identity agrees to condition (iv). As soon as we substitute in the “condition of Gx -orthogonality” we are led to L0 Gx D L0 Gx LA ) Gx L D .Gx LA/0 L D Gx LAL ; a result which agrees to condition (ii). ? How to prove uniqueness of A1;2;3;4 D A`m D AC ? Uniqueness of Gx x`m can be taken from Lemma 1.4 (characterization of Gx MINOS). Substitute x` D Ly and multiply the left side by L. A0 Gy ALy D A0 Gy y , A0 Gy AL D A0 Gy
5-2 MINOLESS and Related Solutions Like Weighted Minimum
295
L0 A0 Gy AL D L0 A0 Gy , Gy AL D .Gy AL/0 D L0 A0 Gy : The left side of the equation L0 A0 Gy AL D L0 A0 Gy is a symmetric matrix. Consequently the right side has to be symmetric, too. Indeed we have proven condition (iii) .Gy AL/0 D Gy AL. Let us transplant the symmetric condition (iii) into the original normal equations in order to benefit from A0 Gy AL D A0 Gy or Gy A D L0 A0 Gy A D .Gy AL/0 A D Gy ALA: Indeed, we have succeeded to have proven condition (i), in condition .i i / Gx LAL D Gx L and Gx LA D .Gx LA/0 : Condition (ii) Gy LAL D Gx L and (iv) Gx LA D .Gx LA/0 are a consequence of Gx -MINOS. The general solution of the normal equations A0 Gy Ax` D A0 Gy y is x` D x`m C ŒIm .A0 Gy A/ .A0 Gy A/z
(5.88)
for an arbitrary vector z 2 Rm . A0 Gy ALA D A0 Gy A implies x` D x`m C ŒIm .A0 Gy A/ A0 Gy ALAz D x`m C ŒIm LAz: Note 1: The following conditions are equivalent: 2
.1/ AA A D A 6 .2/ A AA D A .1st/ 6 4 .3/ .AA /0 Gy D Gy AA .4/ .A A/0 Gx D Gx A A .2nd /
A# Gy AA D A0 Gy .A / Gx A A D .A /0 Gx 0
“if Gx and Gy are positive definite matrices, then A# G y D G x A# or 0 A# D G1 x A Gy are representations for the adjoint matrix” “if Gx and Gy are positive definite matrices, then
296
5 The Third Problem of Algebraic Regression
.A0 Gy A/AA D A0 Gy .A /0 Gx A A D .A /0 G00x .3rd /
AA D PR.A/ A A D PR.A / :
The concept of a generalized inverse of an arbitrary matrix is originally due to E.H. Moore (1920) who used the 3rd definition. R. Penrose (1955), unaware of E.H. Moores work, defined a generalized inverse by the 1st definition to Gx D Im ; Gy D In of unit matrices which is the same as the Moore inverse. Y. Tseng (1949, a, b, 1956) defined a generalized inverse of a linear operator between function spaces by means of AA D PR.A/ ; A A D PR.A / ; where R.A/; R.A /, respectively are the closure of R.A/, R.A /, respectively. The Tseng inverse has been reviewed by B. Schaffrin, E. Heidenreich and E. Grafarend (1977). A. Bjerhammar (1951, 1957, 1956) initiated the notion of the least-squares generalized inverse. C.R. Rao (1967) presented the first classification of g-inverses. Note 2: Let jjyjjGy D .y0 Gy y/1=2 and jjxjjGx D .x0 Gx x/1=2 , where Gy and Gx are positive semidefinite. If there exists a matrix A which satisfies the definitions of Note 1, then it is necessary, but not sufficient that .1/ Gy AA A D Gy A .2/ Gx A AA D Gx A .3/ .AA /0 Gy D Gy AA .4/ .A A/0 Gx D Gx A A: Note 3: A g-inverse which satisfies the conditions of Note 1 is denoted by AC Gy Gx and referred to as Gy ; Gx -MINOLESS g-inverse of A. AC Gy Gx is unique if Gx is positive definite. When both Gx and Gy are general positive semi definite matrices, AC Gy Gx may not be unique. If jGx C A0 Gy Aj ¤ 0 holds, AC Gy Gx is unique.
ˇ ˇ Note 4: If the matrices of the metric are positive definite, jGx j ¤ 0; ˇGy ˇ ¤ 0, then C (i) .AC Gy Gx /Gx Gy D A,
# 0 C (ii) .AC Gy Gx / D .A /G1 G1 . x
y
5-2 MINOLESS and Related Solutions Like Weighted Minimum
297
5-23 Eigenvalue Decomposition of .Gx ; Gy /-MINOLESS For the system analysis of an inverse problem the eigenspace analysis and eigenspace synthesis of x`m .Gx ; Gy /-MINOLESS of x is very useful and give some peculiar insight into a dynamical system. Accordingly we are confronted with the problem to develop “canonical MINOLESS”, also called the eigenvalue decomposition of .Gx ; Gy /-MINOLESS. First we refer again to the canonical representation of the parameter space X as well as the observation space Y introduced to you in the first chapter, Box 1.6 and Box 1.9. But we add here by means of Box 5.8 the forward and backward transformation of the general bases versus the orthogonal bases spanning the parameter space X as well as the observation space Y. In addition, we refer to Definition 1.5 and Lemma 1.6 where the adjoint operator A has been introduced and represented. Box 5.8. General bases versus orthogonal bases spanning the parameter space X as well as the observation space Y. “left” “parameter space” “general left base” span fa1 ; : : : ; am g D X W matrix of the metric: aa0 D Gx “orthogonal left base” span fex1 ; : : : ; exm g D X W matrix of the metric: ex e0 x D Im “base transformation” aD
ƒ1=2 x Vex versus
ex D V
0
ƒ1=2 a x
span fex1 ; : : : ; exm g D X
“right” “observation space” “general right base” Y D span fb1 ; : : : ; bn g W matrix of the metric W bb0 D Gy
(5.89)
“orthogonal right bsase” y y Y D span fe1 ; : : : ; en g :matrix of the metric W ey e0 y D In
(5.90)
“base transformation” b D ƒ1=2 y Uey
(5.91)
versus ey D U0 ƒ1=2 b y y
Y D span fe1 ; : : : ; eyn g:
Second, we are solving the general system of linear equations fy D AxjA 2 Rnm ; rkA < minfn; mgg
(5.92)
298
5 The Third Problem of Algebraic Regression
by introducing • The eigenspace of the rank deficient, rectangular matrix of rank r WD rkA < minfn; mg W A 7! A • The left and right canonical coordinates: x 7! x ; y 7! y as supported by Box 5.9. The transformations x 7! x (5.92), y 7! y (5.93, left) from the original coordinates .x1 ; : : : ; xm / to the canonical coordinates .x1 ; : : : ; xm /, the left star coordinates, as well as from the original coordinates (y1 ; : : : ; yn ) to the canonical coordinates (y1 ; : : : ; yn ), the right star coordinates, are polar decompositions: a rotation U; V is followed by a general stretch 1=2 fG1=2 y ; Gx g. Those root matrices are generated by product decompositions 0 1=2 0 1=2 as well as Gx D .G1=2 of type Gy D .G1=2 y / Gy x / Gx . Let us substitute Vx and (5.94, left) the inverse transformations (5.93, right) x 7! x D G1=2 x 1=2 y 7! y D Gy Uy into the system of linear equations (5.1), (5.94, right) y D Ax C i; rkA < minfn; mg or its dual (5.95, left) y D A x C i . Such an operation leads us to (5.95, right) y D f .x / as well as (5.96, left) y D f .x/: Subject to the orthonormality condition (5.96, right) U0 U D In and (5.97, left) V0 V D Im we have generated the left-right eigenspace analysis (5.97, right)
ƒ O1 ƒ D O2 O3
subject to the rank partitioning of the matrices U D ŒU1 ; U2 and V D ŒV1 ; V2 . Alternatively, the left-right eigenspace synthesis (5.104, left) AD
G1=2 ŒU1 ; y
ƒ O1 U2 O2 O3
V0 1 G1=2 x V0 2
is based upon the left matrix (5.99) L WD G1=2 U decomposed into (5.111) y 1=2 1=2 L1 WD Gy U1 and L2 WD Gy U2 and the right matrix (5.94, left) R WD G1=2 V x 1=2 V and R WD G V . Indeed the left matrix L by decomposed into R1 WD G1=2 1 2 2 x x means of (5.101, right) LL0 D G1 y reconstructs the inverse matrix of the metric of the observation space Y. Similarly, the right matrix R by means of (5.102, left) RR0 D G1 x generates the inverse matrix of the metric of the parameter space X. In terms of “L; R” we have summarized the eigenvalue decompositions (5.103, right)– (5.106, left). Such an eigenvalue decomposition helps us to canonically invert y D A x Ci by means of (5.106, right), namely the “full rank partitioning” of the system of canonical linear equations y D A x C i . The observation vector y 2 Rn is decomposed into y1 2 Rr1 and y2 2 R.nr/1 while the vector x 2 Rm of unknown parameters is decomposed into x1 2 Rr1 and x2 2 R.mr/1 .
5-2 MINOLESS and Related Solutions Like Weighted Minimum
299
.x1 /`m D ƒ1 y1 is canonical MINOLESS leaving y2 “unrecognized” and x2 D 0 as a “fixed datum”. Box 5.9. (Canonical representation, the general case: overdetermined and underdetermined system without full rank): “parameter space X”
versus
1=2
“observation space” 1=2
x D V0 G x x
y D U0 Gy y
and
(5.93)
and
Vx y D G1=2 Uy x D G1=2 x y
(5.94)
“overdetermined and unterdetermined system without full rank00 fy D Ax C ijA 2 Rnm ; rkA < minfn; mgg y D Ax C i
versus
1=2
versus
subject to U0 U D UU0 D In
(5.95)
0 1=2 U0 G1=2 x y y D A V Gx 1=2 CUGy i
G1=2 Uy D AG1=2 x y x 1=2 CGy Ui y D .U0 Gy AG1=2 V/x x
y D A x C i
1=2
y D .G1=2 UA V0 Gx y
/x C i (5.96)
subject to versus
V0 V D VV0 D Im
“left and right eigenspace synthesis” 0 U1 1=2 G1=2 A D ŒV1 ; V2 y AGx U0 2 ƒ O1 D O2 O3 0 ƒ O1 V1 A D G1=2 ŒU ; U G1=2 1 2 y x O2 O3 V0 2 “dimension identities” ƒ 2 Rrr ; O1 2 Rr.mr/ ; U1 2 Rnr ; V1 2 Rmr O2 2 R.nr/r ; O3 2 R.nr/.mr/; U2 2 Rn.nr/ ; V2 2 Rm.mr/ “left eigenspace”
“right eigenspace”
(5.97)
(5.98)
(5.99)
300
5 The Third Problem of Algebraic Regression 1=2
1=2
L WD G1=2 U ) L1 D U0 Gy y
R WD G1=2 V ) R1 D V0 Gx x (5.100) R1 WD G1=2 V1 ; R2 WD G1=2 V2 x x (5.101) 1 0 1 RR0 D G1 D Gx x ) .R / R (5.102)
L1 WD G1=2 U1 ; L2 WD G1=2 U2 y y 1 0 1 LL0 D G1 D Gy y ) .L / L
1
L
U0 1 L1 1=2 Gy DW D 0 U2 L 2
A D LA R1 A D ŒL1 ; L2 A
R 1 R 2
AA# L1 D L1 ƒ2 AA# L2 D 0
R
versus
versus
versus
1
V0 1 R1 1=2 Gx DW D 0 V2 R 2 (5.103)
A D L1 AR ƒ O1 A D O O 2 3 L1 D AŒR1 ; R2 L 2 A# AR1 D R1 ƒ2 A# AR2 D 0
(5.104)
(5.105)
(5.106)
“inconsistent system of linear equations without full rank”
ƒ O1 y DA x Ci D O2 O3
x1 i y C 1 D 1 x2 i2 y2
(5.107)
y1 2 Rr1 ; y2 2 R.nr/1 ; i1 2 Rr1 ; i2 2 R.nr/1 x1 2 Rr1 ; x2 2 R.mr/1 “if .x ; i / is MINOLESS, then x2 D 0; i D 0: .x1 /`m D ƒ1 y1 : ” Consult the commutative diagram of Fig. 5.6 for a shortened summary of the newly introduced transformation of coordinates, both of the parameter space X as well as the observation space Y. Third, we prepare ourselves for MINOLESS of the general system of linear equations fy D Ax C ijA 2nm ; rkA < minfn; mg jjijj2Gy D min; subject to jjxjj2Gx D ming by introducing Lemmas 5.4 and 5.5, namely the eigenvalue-eigencolumn equations of the matrices A# A and AA# , respectively, as well as Lemma 5.6, our basic result of “canonical MINOLESS”, subsequently completed by proofs. Throughout we refer to the adjoint operator which has been introduced by Definition 1.5 and Lemma 1.6.
5-2 MINOLESS and Related Solutions Like Weighted Minimum Fig. 5.6 Commutative diagram of coordinate transformations
∋x
301 A (A) ⊂
V′Gx1/2
U′Gy1/2
∋x*
y∗ ∈
(A) ⊂
Lemma 5.7. (eigenspace analysis versus eigenspace synthesis A 2 Rnm , r WD rkA < minfn; mg): The pair of matrices L; R for the eigenspace analysis and the eigenspace synthesis of the rectangular matrix A 2 Rnm of rank r WD rkA < minfn; mg, namely A D L1 AR versus A D LA R1
or
ƒ O1 A D O2 O3 L1 D AŒR1 ; R2 L 2
or A versus
D ŒL1 ; L2 A
R 1 R 2
are determined by the eigenvalue-eigencolumn equations (eigenspace equations) A# AR1 D R1 ƒ2 A# AR2 D 0
versus
AA# L1 D L1 ƒ2 AA# L2 D 0
subject to 2
21 6 :: : : 4 : :
3 0 q q :: 7 ; ƒ D Diag.C 2 ; : : : ; C 2 /: 5 r 1 :
0 2r
5-24 Notes The algebra of eigensystems is treated in varying degrees by most books on linear algebra, in particular tensor algebra. Special mention should be made of
302
5 The Third Problem of Algebraic Regression
R. Bellman’s classic “Introduction to matrix analysis” (1970) and Horn’s and Johnson’s two books (1985, 1991) on introductory and advanced matrix analysis. More or less systematic treatments of eigensystem are found in books on matrix computations. The classics of the field are Householder’s “Theory of matrices in numerical analysis” (1964) and Wilkinson’s “The algebraic eigenvalue problem” (1965). G. Golub’s and Van Loan’s “Matrix computations” (1996) is the currently definite survey of the field. Trefethen’s and Bau’s “Numerical linear algebra” (1997) is a high-level, insightful treatment with a welcomed stress on geometry. G.W. Stewart’s “Matrix algorithm: eigensystems” (2001) is becoming a classic as well. The term “eigenvalue” derives from the German Eigenwert, which was introduced by D. Hilbert (1904) to denote for integral equations the reciprocal of the matrix eigenvalue. At some point Hilbert’s Eigenwert inverted themselves and became attached to matrices. Eigenvalues have been called many things in their day. The “characteristic value” is a reasonable translation of Eigenwert. However, “characteristic” has an inconveniently large number of syllables and survives only in the terms “characteristic equation” and “characteristic polynomial”. For symmetric matrices the characteristic equation and its equivalent are also called the secular equation owing to its connection with the secular perturbations in the orbits of planets. Other terms are “latent value” and “proper value” from the French “valeur propre”. Indeed the day when purists and pedants could legitimately object to “eigenvalue” as a hybrid of German and English has long since passed. The German “eigen” has become thoroughly materialized English prefix meaning having to do with eigenvalues and eigenvectors. Thus we can use “eigensystem”, “eigenspace” or “eigenexpansion” without fear of being misunderstood. The term “eigenpair” used to denote an eigenvalue and eigenvector is a recent innovation.
5-3 The Hybrid Approximation Solution: ˛-HAPS and Tykhonov–Phillips Regularization Gx ; Gy -MINOLESS has been built on sequential approximations. First, the surjectivity defect was secured by Gy -LESS. The corresponding normal equations suffered from the effect of the injectivity defect. Accordingly, second Gx -LESS generated a unique solution the rank deficient normal equations. Alternatively, we may constitute a unique solution of the system of inconsistent, rank deficient equations fAx C i D yjA 2 Rnm ; r WD rkA < minfn; mgg by the ˛-weighted hybrid norm of type “LESS” and “MINOS”. Such a solution of a general algebraic regression problem is also called • Tykhonov–Phillips regularization
5-3 The Hybrid Approximation Solution: ˛-HAPS and Tykhonov–Phillips Regularization
303
Fig. 5.7 Balance of LESS and MINOS to general MORE
• Ridge estimator • ˛ HAPS: Indeed, ˛ HAPS is the most popular inversion operation, namely to regularize improperly posed problems. An example is the discretized version of an integral equation of the first kind. Definition 5.2. Definition 5.8 (˛ HAPS ): An m 1 vector x is called weighted ˛ HAPS (Hybrid APproximative Solution) with respect to an ˛-weighted Gx ; Gy -seminorm of (5.1), if xh D argfjjy Axjj2Gy C ˛jjxjj2Gx D min jAx C i D y; A 2 Rnm I rkA minfn; mgg:
(5.108)
Note that we may apply weighted ˛ HAPS even for the case of rank identity rkA minfn; mg. The factor ˛ 2 RC balances the least squares norm and the minimum norm of the unknown vector which is illustrated by Fig. 5.7. Lemma 5.8. .˛ HAPS /: xh is weighted ˛ HAPS of x of the general system of inconsistent, possibly weakly inconsistent, rank deficient system of linear equations (5.1) if and only if the system of normal equations .˛Gx C A0 Gy A/xh D A0 Gy y or
1 1 Gx C A0 Gy A xh D A0 Gy y ˛ ˛
(5.109)
is fulfilled. xh always exists and is uniquely determined if the matrix ˛Gx C A0 Gy A is regular or rkŒGx ; A0 Gy A D m:
(5.110)
Proof. ˛ HAPS is constructed by means of the Lagrangean L.x/ WD jjy Axjj2Gy C ˛jjxjj2Gy D .y Ax/0 Gy .y Ax/ C ˛.x0 Gy x/ D min; x
such that the first derivatives dL .xh / D 2.˛Gx C A0 Gy A/xh 2A0 Gy y D 0 dx
304
5 The Third Problem of Algebraic Regression
constitute the necessary conditions. The second derivatives @2 L .xh / D 2.˛Gx C A0 Gy A/ 0 @x@x0 generate the sufficiency conditions for obtaining the minimum of the unconstrained Lagrangean. If ˛Gx C A0 Gy A is regular of rkŒGy ; A0 Gy A D m, there exists a unique solution. Lemma 5.9. .˛ HAPS /: If xh is ˛ HAPS of x of the general system of inconsistent, possibly weakly inconsistent, possibly rank deficient system of linear equations (5.1) fulfilling the rank identity rkŒGy ; A0 Gy A D m or det.˛Gx C A0 Gy A/ ¤ 0 then xh D .˛Gx C A0 Gy A/1 A0 Gy y or xh D .Gx C
1 1 0 A Gy A/1 A0 Gy y ˛ ˛ or
0 1 1 0 xh D .˛I C G1 x A Gy A/ Gx A Gy y
or 1 1 1 0 G A Gy A/1 G1 A0 Gy y ˛ x ˛ x are four representations of the unique solution. xh D .I C
Chapter 6
The Third Problem of Probabilistic Regression Special Gauss–Markov Model with Datum Defect Setup of BLUMBE and BLE for the moments of first order and of BIQUUE and BIQE for the central moment of second order fy D A C cy ; A 2 Rnm ; rkA < minfn; mgg The Special Gauss-Markov model with datum defect – the stochastic analogue of Minimum Norm Least-Squares, is treated here first by the Best Linear Minimum Bias Estimator (BLUMBE), namely by Theorem 6.3, in the first section. Theorem 6.5 offers the estimation of ¢ 2 by Best Invariant Quadratic Uniformly Unbiased Estimation to be compared with the result of Theorem 6.6, namely the estimation of 2 by Best Invariant Quadratic Estimation. Six numerical models are compared based on practical leveling measurements. In the second section, we setup the Best Linear Estimation (hom BLE), (hom S-BLE) as well as (hom ’-BLE) for fixed effect based on the decomposition ŸO Ÿ D L.y Efyg/ ŒIm LA Ÿ This setup produces the weighting of two norms: 1. The average variances kL0 k2 2. The average bias k.Im LA/0 k2 . ˛-BLE balances the variance and the bias illustrated by Fig. 6.1. Theorems 6.12 and 6.13 compare (hom S-BLE) and (hom ˛-BLE, also called ridge estimator or Tykhonov-Phillips regularization published by many authors (J. Cai, E. Grafarend and B. Schaffrin (2004, 92 references))). A key example is our treatment of continuous networks in the third section, namely of second derivative type and finally approximated by a discrete setup. Finally we intend to state our assumption of C.R. Rao’s design of the substitute matrix S or ˛-I which solves the missing information of the variable Ÿ, perhaps by Prior Information Ÿ0 , subject of Bayes-estimation.
E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 6, © Springer-Verlag Berlin Heidelberg 2012
305
306
6 The Third Problem of Probabilistic Regression
Read only Definition 6.1, Theorem 6.3, Definition 6.4–6.6, Theorem 6.8–6.11.
Lemma 6.2 ξˆ hom Sy , S-BLUMBE of ξ
Theorem 6.3 hom Sy, S-BLUMBE of ξ
Definition 6.1 ξˆ hom Sy, S-BLUMBE
Lemma 6.4 ˆ D{ey} E{y}, D{Aξ},
Theorem 6.5 sˆ 2 BIQUUE of s2
Theorem 6.6 sˆ 2 BIQE of s 2
Definition 6.7 and Lemma 6.2, Theorem 6.3, Lemma 6.4, Theorems 6.5 and 6.6 review O of type hom † y , S-BLUMBE, BIQE, followed by the first example. Alternatively, estimators of type best linear, namely hom BLE, hom S-BLE and hom ˛-BLE are presented. Definitions 6.7, 6.8 and 6.9 relate to various estimators followed by Lemma 6.10, Theorems 6.11, 6.12 and 6.13. In Chap. 5, we have solved a special algebraic regression problem, namely the inversion of a system of inconsistent linear equations with a datum defect. By means of a hierarchic postulate of a minimum norm jjxjj2 D min, least squares solution jjy Axjj2 D min (“MINOLESS”) we were able to determine m unknowns from n observations through the rank of the linear operator, rkA D r < minfn; mg, was less than the number of observations or less the number of unknowns. Though “MINOLESS” generates a rigorous solution, we were left with the problem to interpret our results. The key for an evolution of “MINOLESS” is handed over to us by treating the special algebraic problem by means of a special probabilistic regression problem, namely as a special Gauss–Markov model with datum defect. The bias generated by any solution of a rank deficient system of linear equations will again be introduced
6 The Third Problem of Probabilistic Regression
307
Lemma 6.10 hom BLE, hom S-BLE, hom a-BLE
Definition 6.7 xˆ hom BLE of x
α
Definition 6.8 xˆ S-BLE of x
α
Theorem 6.11 xˆ hom BLE
α
Theorem 6.12 xˆ hom S-BLE
Definition 6.9 xˆ hom hybrid var-min bias
6
Theorem 6.13 xˆ hom α-BLE
as a decisive criterion for evaluating “MINOLESS”, now in the context of a probabilistic regression problem. In particular, a special form of “LUMBE” the linear uniformly minimum bias estimator jjLA Ijj D min, leads us to a solution which is equivalent to “MINOS”. “Best” of LUMBE in the sense of the average O D min also called BLUMBE, will give us a unique O 2 D trDfg variance jjDfgjj O solution of as a linear estimation of the observation vector y 2 fY; pdf g with respect to the linear model Efyg D A, Dfyg D † y of “fixed effects” 2 . Alternatively, in Chap. 5, we had solved the ill-posed problem y D Ax C i; A 2 Rnm ; rkA < minfn; mg, by means of ˛-HAPS. Here with respect to a special probabilistic regression problem we succeed to compute ˛-BLE (˛ weighted, S modified Best Linear Estimation) as an equivalence to ˛-HAPS of a special algebraic regression problem. Most welcomed is the analytical optimization problem to determine the regularization parameter ˛ by means of a special form of jjMSEf˛gjj2 D min, the weighted Mean Square Estimation Error. Such an optimal design of the regulator ˛ is not possible in the Tykhonov–Phillips regularization in the context of ˛-HAPS, but definitely in the context of ˛-BLE.
308
6 The Third Problem of Probabilistic Regression
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE Box 6.1 is a definition of our special linear Gauss–Markov model with datum defect. We assume (6.1) Efyg D A; rkA < minfn; mg (1st moments) and (6.2) Dfyg D † y , † y positive definite, rk† y D n (2nd moments). Box 6.2 reviews the bias vector as well as the bias matrix including the related norms. Box 6.1. (Special linear Gauss–Markov model with datum defect): fy D A C cy ; A 2 Rnm ; rkA < minfn; mgg 1st moments Efyg D A ; rkA < minfn; mg
(6.1)
2nd moments Dfyg D W † y ; † y positive definite; rk† y D n;
(6.2)
2 Rm ; vector of“fixed effects00 ; unknown; † y unknown or known from prior information: Box 6.2. (Bias vector, bias matrix Vector and matrix bias norms Special linear Gauss–Markov model of fixed effects subject to datum defect) A 2 Rnm ; rkA < minfn; mg Efyg D A; Dfyg D † y
(6.3)
“ansatz” O D Ly
(6.4)
bias vector O ¤0 ˇ WD EfO g D Efg
(6.5)
ˇ D LEfyg D ŒIm LA ¤ 0
(6.6)
bias matrix
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
B W D In LA
309
(6.7)
“bias norms”
jjˇjj2 D ˇ 0 ˇ D 0 ŒIm LA0 ŒIm LA
(6.8)
jjˇjj2 D trˇˇ 0 D trŒIm LA 0 ŒIm LA0 D jjBjj2 0
(6.9)
jjˇjj2S WD trŒIm LASŒIm LA0 D jjBjj2S
(6.10)
“dispersion matrix” O D LDfygL0 D L†y L0 Dfg
(6.11)
“dispersion norm, average variance” O 2 W D trLDfygL0 D trL˙y L0 D W jjL0 jj† jjDfgjj y
(6.12)
“decomposition” O C .Efg O / O O D .O Efg/
(6.13)
O D L.y Efyg/ ŒIm LA
(6.14)
“Mean Square Estimation Error” O WD Ef.O /.O /0 g MSEfg
(6.15)
O D LDfygL0 C ŒIm LA 0 ŒIm LA0 MSEfg
(6.16)
“modified Mean Square Estimation Error” O W D LDfygL0 C ŒIm LASŒIm LA0 MSES fg
(6.17)
“MSE norms, average MSE” O 2 W D trEf.O /.O /0 g jjMSEfgjj
(6.18)
O 2 jjMSEfgjj D trLDfygL0 C trŒIm LA 0 ŒIm LA0 D jjL0 jj2†y C jj.Im LA/0 jj2 0
(6.19)
310
6 The Third Problem of Probabilistic Regression
O 2W jjMSES fgjj 0 W D trLDfygL C trŒIm LASŒIm LA0 D jjL0 jj2†y C jj.Im LA/0 jj2 0 :
(6.20)
Definition 6.1 defines (1st) O as a linear homogenous form, (2nd) of type “minimum bias” and (3rd) of type “smallest average variance”. Section 6.11 is a collection of definitions and lemmas, theorems basic for the developments in the future.
6-11 Definitions, Lemmas and Theorems Definition 6.1. (O hom † y ; SBLUMBE): An m 1 vector O D Ly is called homogeneous † y ; S-BLUMBE (homogeneous Best Linear Uniformly Minimum Bias Estimation) of in the special inconsistent linear Gauss–Markov model of fixed effects of Box 6.1, if (1st) O is a homogeneous linear form O D Ly (6.21) (2nd) in comparison to all other linear estimations O has the minimum bias in the sense of jjBjj2S W D jj.Im LA/0 jj2S D min; (6.22) L
(3rd) in comparison to all other minimum bias estimations O has the smallest average variance in the sense of O 2 D trL˙y L0 D jjL0 jj2 D min : jjDfgjj †y L
(6.23)
The estimation O of type hom † y , S-BLUMBE can be characterized by Lemma 6.2. .O hom † y ; S BLUMBE of /: An m 1 vector O D Ly is hom † y ; S-BLUMBE of in the special inconsistent linear Gauss–Markov model with fixed effects of Box 6.1, if and only if the matrix L fulfils the normal equations
†y ASA0
ASA0 0
L0 ƒ
D
with the n n matrix ƒ of “Lagrange multipliers”.
0 AS
(6.24)
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
311
: Proof : O matrix First, we minimize the S-modified bias matrix norm, second the MSE() norm. All matrix norms have been chosen “Frobenius”. .i / jj.Im LA/0 jj2S D min : L
The S-weighted Frobenius matrix norm jj.Im LA/0 jj2S establishes the Lagrangean L.L/ W D tr.Im LA/S.Im LA/0 D min L
for S-BLUMBE.
L.L/ D min , L
ASA0 LO 0 AS D 0 .ASA0 / ˝ Im > 0;
according to Theorem 2.3. "
† y ASA0 ASA0
0
#"
C1 C2 C3 C4
#"
† y ASA0 ASA0
#
" D
0
† y ASA0 ASA0
#
0 (6.25)
0
0
0
0
† y C1 † y C † y C2 ASA C ASA C3 † y C ASA C4 ASA D † y
(6.26)
† y C1 ASA0 C ASA0 C3 ASA0 D ASA0 0
0
0
ASA C1 † y C ASA C2 ASA D ASA
0
ASA0 C1 ASA0 D 0:
(6.27) (6.28) (6.29)
0 Let us multiply the third identity by † 1 y ASA D 0 from the right side and substitute the fourth identity in order to solve for C2 . 0 0 1 0 ASA0 C2 ASA0 † 1 y ASA D ASA † y ASA 0 0 1 0 C2 D † 1 y ASA .ASA † y ASA /
(6.30) (6.31)
solves the fifth equation 0 0 1 0 0 1 0 A0 SA† 1 y ASA .ASA † y ASA / ASA † y ASA 0 D ASA0 † 1 y ASA
(6.32)
by the axiom of a generalized inverse. .ii/ jjL0 jj2†y D min : L
The † y -weighted Frobenius matrix norm of L subject to the condition of LUMBE generates the constrained Lagrangean
312
6 The Third Problem of Probabilistic Regression
L.L; ƒ/ D trL† y L0 C 2trƒ0 .ASA0 L0 AS/ D min : L;ƒ
According to the theory of matrix derivatives outlined in Appendix B @L O O O 0 D 0; .L; ƒ/ D 2.†y LO 0 C ASA0 ƒ/ @L @L O O .L; ƒ/ D 2.ASA0 L0 AS/ D 0; @ƒ O constitute the necessary conditions, while O ƒ/ at the “point” .L; @2 L O D 2.†y ˝ Im / > 0; O ƒ/ .L; @.vecL/@.vecL0 / to be a positive definite matrix, the sufficiency conditions. Indeed, the first matrix derivations have been identified as the normal equations of the sequential optimization problem. O For an explicit representation of hom † y , S-BLUMBE of we solve the normal equations for O D min jASA0 L0 AS D 0g: LO D argfjjD./jj L
O hom BLUMBEg as well as the In addition, we compute the dispersion matrix Dfj O mean square estimation error MSE fj hom BLUMBEg. Theorem 6.3. (hom † y ; S-BLUMBE of ): Let O D Ly be hom † y ; S-BLUMBE in the special Gauss–Markov model of Box 6.1. Then independent of the choice of the generalized inverse .ASA0 † y ASA0 / the unique solution of the normal equations (6.24) is 0 0 1 O D SA0 .ASA0 † 1 y ASA / ASA † y y;
(6.33)
completed by the dispersion matrix O D SA0 .ASA0 † 1 ASA0 / AS D./ y
(6.34)
the bias vector 0 0 1 ˇ D ŒIm SA0 .ASA0 † 1 y ASA / ASA † y A
O of mean estimation errors and the matrix MSE fg
(6.35)
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
O C ˇˇ 0 Ef.O /.O /0 g D Dfg
313
(6.36)
modified by O C ŒIm LASŒIm LA0 Ef.O /.O /0 g D Dfg O C ŒS SA0 .ASA0 † 1 ASA0 / ASA0 † 1 AS; D Dfg y y
(6.37)
based upon the solution of 0 by S. O D rkS rkMSEfg
(6.38)
is the corresponding rank identity. :Proof: O .i / hom † y ; S BLUMBE of : First, we prove that the matrix of the normal equations
is singular.
† y ASA0 † y ASA0 ; D0 ASA0 0 ASA0 0
ˇ ˇ ˇ † y ASA0 ˇ 0 1 0 ˇ ˇ ˇ ASA0 0 ˇ D j† y j j ASA † y ASA j D 0;
0 due to rkŒASA0 † 1 y ASA D rkA < minfn; mg assuming rkS D m, rk† y D n. Note
ˇ ˇ m1 m1 ˇ A11 A12 ˇ ˇ ˇ D jA11 j jA22 A21 A1 A12 j if A11 2 11 ˇ A21 A22 ˇ rkA11 D m1 if with reference to Appendix A.7. Thanks to the rank deficiency of the partitioned normal equation matrix, we are forced to compute secondly its generalized inverse. The system of normal equations is solved for 0 LO 0 C1 C2 0 † y ASA0 D (6.39) O D ASA0 0 AS C3 C4 AS ƒ LO 0 D C2 AS
(6.40)
, LO D SA0 C0 2
(6.41)
LO D SA.ASA
0
0 0 1 † 1 y ASA / ASA † y
such that
(6.42)
314
6 The Third Problem of Probabilistic Regression
0 0 1 O D SA0 .ASA0 † 1 O D Ly y ASA / ASA † y y:
(6.43)
We leave the proof for 0 0 1 “SA0 .ASA0 † 1 y ASA / ASA † y
is a weighted pseudo-inverse or Moore-Penrose inverse” as an exercise. O .ii/ Dispersion matrix Dfg: The residual vector
O Efyg/ O Efyg D L.y
(6.44)
leads to the variance-covariance matrix O D L† O y LO 0 Dfg 0 0 1 0 0 1 0 D SA0 .ASA0 † 1 y ASA / ASA † y ASA .ASA † y ASA / AS
(6.45)
0 D SA0 .ASA0 † 1 y ASA / AS :
.iii/ Bias vector ˇ O ˇ WD EfO g D .Im LA/ 0 0 1 D ŒIm SA0 .ASA0 † 1 y ASA / ASA † y A:
(6.46)
Such a bias vector is not accessible to observations since is unknown. Instead it is common practice to replace by O (BLUMBE), the estimation O of of type BLUMBE. O (iv) Mean Square Estimation Error MSEfg O WD Ef.O /.O /0 g D Dfg O C ˇˇ 0 MSEfg 0 O y LO 0 C .Im LA/ O O 0: D L† .Im LA/ (6.47) O y g, nor ˇˇ are accessible to measurements. 0 is replaced by C.R. Neither Dfj† Rao’s substitute matrix S, † y D V 2 by a one variance component model 2 by O 2 (BIQUUE) or O 2 (BIQE), for instance. 0
1
O eQ y , Dfey g for O hom † y , S of ): Lemma 6.4. (E fyg, DfAg, (i) With respect to the model (1st) A D Efyg, Efyg 2 R.A/, rkA D W r m and V 2 D Dfyg, V positive definite, rkV D n under the condition dim R.SA0 / D rk.SA0 / D rkA D r, namely V, S-BLUMBE, is given by
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
315
1
1 1 E fyg D AO D A.A0 V A/ A0 V y
(6.48)
with the related singular dispersion matrix 1
O D 2 A.A0 V A/ A0 DfAg
(6.49)
for any choice of the generalized inverse .A0 V1 A/ : (ii) The empirical error vector ey D y Efyg results in the residual error vector eQ y D y AO of type 1
1
eQ y D ŒIn A.A0 V A/A0 V By
(6.50)
with the related singular dispersion matrices 1
DfQey g D 2 ŒV A.A0 V A/ A0
(6.51)
for any choice of the generalized inverse .A0 V1 A/ : (iii) The various dispersion matrices are related by O C DfQey g Dfyg D DfAO C eQ y g D DfAg
(6.52)
D DfQey ey g C DfQey g; where the dispersion matrices eQ y and AO
(6.53)
O D C fQey ; eQ y ey g D 0: C fQey ; Ag
(6.54)
are uncorrected, in particularly,
When we compute the solution by O of type BIQUUE and of type BIQE we arrive at Theorems 6.5 and 6.6. Theorem 6.5. (O 2 BIQUUE of 2 , special Gauss–Markov model : Efyg D A, Dfyg D V 2 , A 2 Rnm ; rkA D r m, V 2 Rnm ; rkV D n): Let O 2 D y0 My D .vecM/0 y ˝ y D y0 ˝ y0 .vecM/ be BIQUUE with respect to the special Gauss–Markov model 6.1. Then 1
1
O 2 D .n r/1 y0 ŒV1 V1 A.A0 V A/ A0 V y 1
(6.55) 1
O 2 D .n r/1 y0 ŒV1 V1 ASA0 .ASA0 V ASA0 / ASA0 V y 1
O 2 D .n r/1 y0 V eQ y D .n r/1 eQ 0y V1 eQ y
(6.56) (6.57)
316
6 The Third Problem of Probabilistic Regression
are equivalent representations of the BIQUUE variance component O 2 which are independent of the generalized inverses 1
1
.A0 V A/ or .ASA0 V A0 SA/ : The residual vector eQ y , namely 1
1
eQ y .BLUMBE/ D ŒIn A.A0 V A/1 A0 V y
(6.58)
is of type BLUMBE. The variance of O 2 BIQUUE of 2 DfO 2 g D 2.n r/1 4 D 2.n r/1 . 2 /2
(6.59)
can be substituted by the estimation DfO 2 g D 2.n r/1 .O 2 /2 D 2.n r/1 .Qe0y V1 eQ y /2
(6.60)
Theorem 6.6. (O 2 BIQE of 2 , special Gauss–Markov model : Efyg D A, Dfyg D V 2 , A 2 Rnm ; rkA D r m, V 2 Rnm ; rkV D n): Let O 2 D y0 My D .vecM/0 y ˝ y D y0 ˝ y0 .vecM/ be BIQE with respect to the special Gauss–Markov model 6.1. Independent of the matrix S and of the generalized inverses 1
1
.A0 V A/ or .ASA0 V A0 SA/ ; 1
1
O 2 D .n r C 2/1 y0 ŒV1 V1 A.A0 V A/1 A0 V y 1
1
O 2 D .n r C 2/1 y0 ŒV1 V1 ASA0 .ASA0 V ASA0 /1 ASA0 V y 2
1 0
1
O D .n r C 2/ y V eQ y D .n r C
2/1 eQ 0y V1 eQ y
(6.61) (6.62) (6.63)
are equivalent representations of the BIQE variance component O 2 . The residual vector eQ y , namely 1
1
eQ y .BLUMBE/ D ŒIm A.A0 V A/1 A0 V y;
(6.64)
is of type BLUMBE. The variance of O 2 BIQE of 2 DfO 2 g D 2.n r/.n r C 2/2 4 D 2.n r/Œ.n r C 2/1 2 2
(6.65)
can be substituted by the estimation O O 2 g D 2.n r/.n r C 2/4 .Qe0y V1 eQ y /2 : Df
(6.66)
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
317
The special bias ˇ 2 W D EfO 2 2 g D 2.n r C 2/1 2
(6.67)
can be substituted by the estimation O O 2 2 g D 2.n r C 2/2 eQ 0y V1 eQ y : ˇO 2 D Ef
(6.68)
Its MSE.O 2 / (Mean Square Estimation Error) O O 2 2 /2 g D DfO 2 g C . 2 EfO 2 g/2 MSEfO 2 g W D Ef. 1
(6.69)
2 2
D 2.n r C 2/ . / can be substituted by the estimation
b
O O 2 2 /2 g D Df O O 2 g C .Ef O 2 g/2 MSEfO 2 g D Ef.
(6.70)
D 2.n r C 2/3 .Qe0y V1 eQ y /:
6-12 The First Example: BLUMBE Versus BLE, BIQUUE Versus BIQE, Triangular Leveling Network The first example for the special Gauss–Markov model with datum defect fEfyg D A; A 2 Rnm ; rkA < minfn; mg; Dfyg D V 2 ; V 2 Rnm ; 2 2 RC ; rkV D ng is taken from a triangular leveling network. 3 modal points are connected, by leveling measurements Œh˛ˇ ; hˇ ; h ˛ 0 , also called potential differences of absolute potential heights Œh˛ ; hˇ ; h 0 of “fixed effects”. Alternative estimations of type .i / I; IBLUMBE of 2 Rm .ii/ V; SBLUMBE of 2 Rm .iii/ I; IBLE of 2 Rm .iv/ V; SBLE of 2 Rm .v/ BIQUUE of 2 2 R C .vi/ BIQE of 2 2 R C will be considered. In particular, we use consecutive results of Appendix A.7, namely from solving linear system of equations based upon generalized inverse,
318
6 The Third Problem of Probabilistic Regression
in short g-inverses. For the analyst, the special Gauss–Markov model with datum defect constituted by the problem of estimating absolute heights Œh˛ ; hˇ ; h of points fP˛ ; Pˇ ; P g from height differences is formulated in Box 6.3. Box 6.3. (The first example) 82 39 2 32 3 ˆ 1 C1 0 h˛ < h˛ˇ > = 6 7 E 4 hˇ 5 D 4 0 1 C1 5 4 hˇ 5 ˆ > : h ; h C1 0 1 ˛ 2 3 2 3 2 3 hˇ 1 C1 0 h˛ 6 7 33 y WD 4 hˇ 5; A W D 4 0 1 C1 5 2 R ; W D 4 hˇ 5 C1 0 1 h h ˛ 82 39 ˆ < h˛ˇ > = 6 7 D 4 hˇ 5 D Dfyg D V 2 ; 2 2C ˆ > : h ; ˛ :dimensions: 2 R3 ; dim D 3; y 2 R3 ; dimfY; pdf g D 3 m D 3; n D 3; rkA D 2; rkV D 3:
6-121 The First Example: I3 , I3 -BLUMBE In the first case, we assume a dispersion matrix Dfyg D I3 2 of i.i.d. observations Œy1 ; y2 ; y3 0
and
a unity substitute matrix S D I3 , in short u.s..
Under such a specification O is I3 ; I3 -BLUMBE of in the special Gauss–Markov model with datum defect. O D A0 .AA0 AA0 / AA0 y 2 2 3 3 210 2 1 1 1 AA0 AA0 D 3 4 1 2 1 5; .AA0 AA0 / D 4 1 2 0 5: 9 000 1 1 2 ? How did we compute the g-inverse .AA0 AA0 / ? The computation of the g-inverse .AA0 AA0 / has been based upon bordering.
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
2 3 2 3 1 3 6 3 3 210 6 3 0 1 6 7 .AA0 AA0 / D 4 3 6 3 5 D 4 3 6 0 5 D 4 1 2 0 5: 9 3 3 6 000 0 0 0 2
Please, check by yourself the axiom of a g-inverse: 2
32 3 3 2 2 3 C6 3 3 C6 3 3 C6 3 3 C6 3 3 4 3 C6 3 5 4 3 C6 3 5 4 3 C6 3 5 D 4 3 C6 3 5 3 3 C6 3 3 C6 3 3 C6 3 3 C6 or 3 2 3 3 2 3 2 C6 3 3 C6 3 3 210 C6 3 3 1 4 3 C6 3 5 4 1 2 0 5 4 3 C6 3 5 D 4 3 C6 3 5 9 3 3 C6 3 3 C6 000 3 3 C6 2 3 3 2 h˛ y1 C y3 1 O D 4 hˇ 5 .I3 ; I3 BLUMBE/ D 4 y1 y2 5 3 h y2 y3 2
O1 C O2 C O3 D 0: O of the unknown vector of “fixed effects” Dispersion matrix Dfg O D 2 A0 .AA0 AA0 / A Dfg 2 3 C2 1 1 2 O D 4 1 C2 1 5 Dfg 9 1 1 C2 replace 2 by O 2 (BIQUUE):” O 2 D .n rkA/1 eQ 0y eQ y eQ y .I3 ; I3 BLUMBE/ D ŒI3 A.AA0 / A0 y 2 3 2 3 1 111 14 1 eQ y D 1 1 1 5 y D .y1 C y2 C y3 / 4 1 5 3 3 1 111 2 32 3 111 111 1 eQ 0y eQ y D y0 4 1 1 1 5 4 1 1 1 5 y 9 111 111
319
320
6 The Third Problem of Probabilistic Regression
1 2 .y C y22 C y32 C 2y1 y2 C 2y2 y3 C 2y3 y1 / 3 1 1 O 2 .BIQUUE/ D .y12 C y22 C y32 C 2y1 y2 C 2y2 y3 C 2y3 y1 / 3 2 3 C2 1 1 O D 1 4 1 C2 1 5 O 2 .BIQUUE/ Dfg 9 1 1 C2 eQ 0y eQ y D
replace 2 by O 2 (BIQE): O 2 D .n C 2 rkA/1 eQ 0y eQ y eQ y .I3 ; I3 BLUMBE/ D ŒI3 A.AA0 / A0 y 2 3 2 3 1 111 14 1 eQ y D 1 1 1 5 y D .y1 C y2 C y3 / 4 1 5 3 3 1 111 1 2 .y C y22 C y32 C 2y1 y2 C 2y2 y3 C 2y3 y1 / 9 1 2 3 C2 1 1 O D 1 4 1 C2 1 5 O 2 .BIQE/: Dfg 9 1 1 C2
O 2 .BIQE/ D
O For practice, we recommend Df.BLUMBE/, O 2 .BIQE/g, since the disperO is remarkably smaller when compared to Df.BLUMBE/, O sion matrix Dfg 2 O .BIQUUE/g, a result which seems to be unknown! Bias vector ˇ.BLUMBE/ of the unknown vector of “fixed effects” ˇ D ŒI3 A0 .AA0 AA0 / AA0 A; 2 3 111 1 ˇ D 41 1 15 3 111 O 3 ; I3 BLUMBE/” “replace which is inaccessible by .I 2 3 111 1 O 3 ; I3 BLUMBE/; ˇ D 4 1 1 1 5 .I 3 111 ˇD0 (due to O1 C O2 C O3 D 0).
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
O 3 ; I3 BLUMBE/g Mean Square Estimation Error MSEf.I O D Dfg O C ŒI3 A0 .AA0 AA0 / AA0 A 2 ; MSEfg 2 3 522 2 O D 4 2 5 2 5: MSEfg 9 225 replace 2 by O 2 (BIQUUE): O 2 D .n rkA/1 e0 y ey 1 2 .y C y22 C y32 C 2y1 y2 C 2y2 y3 C 2y3 y1 /; 3 1 2 3 522 1 O D 4 2 5 2 5 O 2 .BIQUUE/: MSEfg 9 225
O 2 .BIQUUE/ D
“replace 2 by O 2 (BIQE): O 2 D .n C 2 rkA/1 e0 y ey ” 1 2 .y C y22 C y32 C 2y1 y2 C 2y2 y3 C 2y3 y1 / 9 1 2 3 522 1 O D 4 2 5 2 5 O 2 .BIQE/: MSEfg 9 225
O 2 .BIQE/ D
Residual vector eQ y and dispersion matrix DfQey g of the “random effect” ey eQ y .I3 ; I3 BLUMBE/ D ŒI3 A.A0 A/ A0 y 2 3 2 3 111 1 1 1 eQ y D 4 1 1 1 5 y D .y1 C y2 C y3 / 4 1 5 3 3 111 1 DfQey g D 2 ŒI3 A.A0 A/ A0 2 3 111 2 4 DfQey g D 1 1 1 5: 3 111 “replace 2 by O 2 (BIQUUE) or O 2 (BIQE)”: 2
3 111 1 DfQey g D 4 1 1 1 5 O 2 .BIQUUE/ 3 111
321
322
6 The Third Problem of Probabilistic Regression
or 2 3 111 14 DfQey g D 1 1 1 5 O 2 .BIQE/: 3 111 Finally note that O .I3 ; I3 BLUMBE/ corresponds xlm .I3 ; I3 MINOLESS/ discussed in Chapter 5. In addition, DfQey jI3 ; I3 BLUUEg D DfQey jI3 ; I3 BLUMBEg.
6-122 The First Example: V, S-BLUMBE In the second case, we assume a dispersion matrix Dfyg D V 2 of weighted observationsŒy1 ; y2 ; y3
and
a weighted substitute matrix S, in short w.s.
Under such a specification O is V, S-BLUMBE of in the special Gauss–Markov model with datum defect. 1 1 O D SA0 .ASA0 V ASA0 /1 ASA0 V y:
As dispersion matrix Dfyg D V 2 we choose 2 3 211 14 VD 1 2 1 5; rkV D 3 D n 2 112 2 3 3 1 1 1 V1 D 4 1 3 1 5; but 2 1 1 3 S D Diag.0; 1; 1/; rkS D 2 as the bias semi-norm. The matrix S fulfils the condition rk.SA0 / D rkA D r D 2: ?How did we compute the g-inverse .ASA0 V1 ASA0 / ? The computation of the g-inverse .ASA0 V1 ASA0 / has been based upon bordering.
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
V1
323
2 3 C3 1 1 14 D 1 C3 1 5; S D Diag.0; 1; 1/; rkS D 2 2 1 1 C3
1 1 O D SA0 .ASA0 V ASA0 / ASA0 V 2 3 2 3 1 1 ASA0 V ASA0 D 2 4 3 6 3 5
1 3 2 3 2 0 1 1 1 .ASA0 V ASA0 / D 4 0 0 3 5 6 1 0 2 3 2 2 3 0 h˛ 1 D 4 2y1 y2 y3 5: O D 4 hˇ 5 3 h V;SBLUMBE y1 C y2 2y3 2
O of the unknown vector of “fixed effects” Dispersion matrix Dfg O D 2 SA0 .ASA0 V ASA0 / AS Dfg 2 3 000 2 O D 40 2 15 Dfg 6 012 1
“replace 2 by O 2 (BIQUUE): O 2 D .n rkA/1 eQ 0y eQ y ” 1
1
eQ y D .V; SBLUMBE/ D ŒI3 A.A0 V A/ A0 V y 2 2 3 3 111 1 1 y1 C y2 C y3 415 eQ y D 4 1 1 1 5 y D 3 3 111 1 2 32 3 111 111 1 eQ 0y eQ y D y0 4 1 1 1 5 4 1 1 1 5 y 9 111 111 1 2 .y C y22 C y32 C 2y1 y2 C 2y2 y3 C 2y3 y1 / 3 1 1 O 2 .BIQUUE/ D .y12 C y22 C y32 C 2y1 y2 C 2y2 y3 C 2y3 y1 / 3 eQ 0y eQ y D
324
6 The Third Problem of Probabilistic Regression
O D ŒV C A.A0 V1 A/ A0 O 2 .BIQUUE/ D fg 2 3 111 2 O D 4 1 1 1 5 O 2 .BIQUUE/: D fg 3 111 “replace 2 by O 2 (BIQE): O 2 D .n C 2 rkA/1 eQ 0y eQ y ” 1
1
eQ y .V; SBLUMBE/ D ŒI3 A.A0 V A/ A0 V y 2 2 3 3 111 1 C y C y 14 y 1 2 3 4 5 eQ y D 1 1 15yD 1 3 3 111 1 1 2 .y C y22 C y32 C 2y1 y2 C 2y2 y3 C 2y3 y1 / 9 1 2 3 C2 1 1 1 O D 4 1 C2 1 5 O 2 .BIQE/: D fg 9 1 1 C2 O We repeat the statement that we recommend the use of Df.BLUMBE/; .BIQE/g O O is remarkably smaller when compared to since the dispersion matrix Dfg DfO .BLUMBE/; O 2 .BIQUUE/g! O 2 .BIQE/ D
Bias vector ˇ(BLUMBE) of the unknown vector of “fixed effects ” 1
1
ˇ D ŒI3 SA0 .ASA0 V ASA0 / ASA0 V A 2 3 2 3 100 1 ˇ D 4 1 0 0 5 D 4 1 5; 100 1 “replace which is inaccessible by O (V,S-BLUMBE)” 2 3 1 O .V; SBLUMBE/ ¤ 0: ˇ D 4 1 5 ; 1 O Mean Square Estimation Error MSEf.V; SBLUMBE/g O D Dfg O C ŒS SA0 .ASA0 V ASA0 / ASA0 V AS 2 MSEfg 2 3 000 2 O D O 4 0 2 1 5 D Dfg: MSEfg 6 012 1
1
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
“replace 2 by O 2 (BIQUUE): O 2 D .n rkA/1 e0 y ey ” O 2 .BIQUUE/ D 3 2 O D Dfg: O MSEfg Residual vector eQ y and dispersion matrix DfQey g of the “random effect” ey 1
1
eQ y .V; SBLUMBE/ D ŒI3 A.A0 V A/ A0 V y 2 3 2 3 111 1 C y C y 14 y 1 2 3 4 5 eQ y D 1 1 15yD 1 3 3 111 1 1
DfQey g D 2 ŒV A.A0 V A/ A0 2 3 111 2 24 DfQey g D 1 1 1 5: 3 111 “replace 2 by O 2 (BIQE): O 2 D .n C 2 rkA/1 e0 y ey ” O 2 .BIQE/versus 2 00 1 O D 40 2 MSEfg 6 01
3 0 1 5 2 .BIQE/: 2
Residual vector eQ y and dispersion matrix DfQey g of the “random effects” ey 1
1
eQ y .V; SBLUMBE/ D ŒI3 A.A0 V A/ A0 V y 2 3 2 3 111 1 14 y C y C y 1 2 3 5 4 eQ y D 1 1 1 yD 15 3 3 111 1 1
DfQey g D 2 ŒV A.A0 V A/ A0 2 3 111 2 24 DfQey g D 1 1 1 5: 3 111 “replace 2 by O 2 (BIQUUE) or O 2 (BIQE)”:
325
326
6 The Third Problem of Probabilistic Regression
2 3 111 24 DfQey g D 1 1 1 5 O 2 .BIQUUE/ 3 111 2
or
3 111 2 DfQey g D 4 1 1 1 5 O 2 .BIQE/: 3 111
6-123 The First Example: I3 , I3 -BLE In the third case, we assume a dispersion matrix Dfyg D I3 2 observations Œy1 ; y2 ; y3 .
and
a unity substitute matrix S D I3 , in short u.s.
Under such a specification O is I3 ; I3 -BLE of in the special Gauss–Markov model with datum defect. O .BLE/ D .I3 C A0 A/1 A0 y 2
3 2 C3 1 1 21 1 I3 C A0 A D 4 1 C3 1 5; .I3 C A0 A/1 D 4 1 2 4 1 1 C3 11 3 2 3 2 1 0 1 y1 C y3 1 1 O .BLE/ D 4 1 1 0 5 y D 4 Cy1 y2 5: 4 4 Cy2 y3 0 1 1
3 1 15 2
O1 C O2 C O3 D 0: O Dispersion matrix DfjBLEg of the unknown vector of “fixed effects” O DfjBLEg D 2 A0 A.I3 C A0 A/2 2 3 C2 1 1 2 O 4 1 C2 1 5: DfjBLEg D 16 1 1 C2 Bias vector ˇ.BLE/ of the unknown vector of “fixed effects” ˇ D ŒI3 C A0 A1
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
3 2 3 2 211 21 C 2 C 3 14 1 ˇD 1 2 1 5 D 4 1 C 22 C 3 5: 4 4 1 C 2 C 23 112 Mean Square Estimation Error MSEf.BLE/g O MSEf.BLE/g D 2 ŒI3 C A0 A1 2 3 211 2 4 O MSEf.BLE/g D 1 2 1 5: 4 112 Residual vector eQ y and dispersion matrix DfQey g of the “random effect” ey eQ y .BLE/ D ŒAA0 C I3 1 y 2 2 3 3 211 2y1 C y2 C y3 1 1 eQ y .BLE/ D 4 1 2 1 5 y D 4 y1 C 2y2 C y3 5 4 4 y1 C y2 C 2y3 112 DfQey .BLE/g D 2 ŒI3 C AA0 2 2 3 655 2 4 DfQey .BLE/g D 5 6 5 5: 16 556 Correlations O D 2 ŒI3 C AA0 2 AA0 C fQey ; Ag 2 3 C2 1 1 2 O D 4 1 C2 1 5: C fQey ; Ag 16 1 1 C2 Comparisons BLUMBE-BLE .i / OBLUMBE OBLE OBLUMBE OBLE D A0 .AA0 AA0 / AA0 .AA0 C I3 /1 y 2 3 2 3 1 0 1 y1 C y3 1 1 4 1 1 0 5 y D 4 Cy1 y2 5: OBLUMBE OBLE D 12 12 Cy2 y3 0 1 1
327
328
6 The Third Problem of Probabilistic Regression
.ii/ DfOBLUMBE g DfOBLE g DfOBLUMBE g DfOBLE g 2
0
0
D A .AA AA0 / AA0 .AA0 C I3 /1 AA0 .AA0 AA0 / A C 2 A0 .AA0 AA0 / AA0 .AA0 C I3 /1 AA0 .AA0 C I3 /1 AA0 .AA0 AA0 / A 2 3 C2 1 1 7 DfOBLUMBE g DfOBLE g D 2 4 1 C2 1 5 positive semidefinite: 144 1 1 C2 .iii/ MSEfOBLUMBE g MSEfOBLE g MSEfOBLUMBE g MSEfOBLE g D 2 A0 .AA0 AA0 / AA0 .AA0 C I3 /1 AA0 .AA0 AA0 / A 2 3 C2 1 1 2 4 1 C2 1 5 positive semidefinite: MSEfOBLUMBE g MSEfOBLE g D 36 1 1 C2
6-124 The First Example: V, S-BLE In the fourth case, we assume a dispersion matrix Dfyg D V 2 of matrix S, weighted observations Œy1 ; y2 ; y3
and
a weighted substitute in short w.s. .
We choose 2 3 211 14 VD 1 2 1 5 positive definite; rkV D 3 D n; 2 112 2 3 C3 1 1 1 V1 D 4 1 C3 1 5; 2 1 1 C3 and S D Diag.0; 1; 1/; rkS D 2; 1 1 O D .I3 C SA0 V A/1 SA0 V y;
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
2
3 2 3 1 0 0 21 0 0 1 1 4 14 5 2 5; I3 C SA0 V A D 4 2 5 2 5; .I3 C SA0 V A/1 D 21 2 2 5 14 2 5 2 3 h˛ O .V; SBLE/ D 4 hˇ 5 1
2
3
h 2
V;SBLE 3 0 0 0 0 1 4 1 4 D 21 14 6 4 5 y D 21 10y1 6y2 4y3 5: 4 6 10 4y1 C 6y2 10y3
O SBLEg of the unknown vector of “fixed effects” Dispersion matrix DfjV; O SBLEg D 2 SA0 V AŒI3 C SA0 V A1 S; DfjV; 2 3 0 0 0 2 O SBLEg D 4 0 76 22 5: DfjV; 441 0 22 76 1
1
Bias vector ˇ.V; SBLE/ of the unknown vector of “fixed effects” 1
ˇ D ŒI3 C SA0 V A1 2 3 2 3 21 0 0 211 1 4 1 4 ˇD 14 5 2 5 D 141 C 52 C 23 5: 21 21 14 2 5 141 C 22 C 53 Mean Square Estimation Error MSEfjV; SBLEg MSEfjV; SBLEg D 2 ŒI3 C SA0 VA1 S 2 3 000 2 4 0 5 2 5: MSEfjV; SBLEg D 21 025 Residual vector eQ y and dispersion matrix DfQey g of the “random effect” ey 1
eQ y .V; SBLE/ D ŒI3 C ASA0 V 1 y 3 2 3 2 11 6 4 11y1 C 6y2 C 4y3 1 4 1 4 6y1 C 9y2 C 6y3 5 eQ y fV; SBLEg D 6 9 6 5yD 21 21 4y1 C 6y2 C 11y3 4 6 11 1
DfQey .V; SBLE/g D 2 ŒI3 C ASA0 V 2 V
329
330
6 The Third Problem of Probabilistic Regression
2 3 614 585 565 2 4 DfQey .V; SBLE/g D 585 594 585 5: 882 565 585 614 Correlations 1
O D 2 .I3 C ASA0 V /2 ASA0 C fQey ; Ag 2 3 29 9 20 2 O D 4 9 18 9 5: C fQey ; Ag 441 20 9 29 Comparisons BLUMBE-BLE .i / OBLUMBE OBLE 1 OV;SBLUMBE OV;SBLE D SA0 .ASA0 V ASA0 / ASA0 .ASA0 C V/1 y 2 2 3 3 0 0 0 0 1 1 4 4 1 3 5 y D 4 4y1 y2 3y3 5: OV;SBLUMBE OV;SBLE D 21 21 3y1 C y2 4y3 3 1 4
.ii/ DfOBLUMBE g DfOBLE g DfOV;SBLUMBE g DfOV;SBLE g D SA0 .ASA0 V1 ASA0 / ASA0 .ASA0 C V/1 2
ASA0 .ASA0 V1 ASA0 / AV C 2 SA0 .ASA0 V1 ASA0 / ASA0 .ASA0 C V/1 ASA0 .ASA0 C V/1 ASA0 .ASA0 V1 ASA0 / AS 2 3 0 0 0 2 4 0 142 103 5 positive semidefinite. DfOV;SBLUMBE g DfOV;SBLE g D 882 0 103 142 .iii/ MSEfOBLUMBE g MSEfOBLE g MSEfOV;SBLUMBE g MSEfOV;SBLEg D SA .ASA V ASA0 / ASA0 .ASA0 C V/1 ASA0 .ASA0 V1 ASA0 / AS 2 3 000 2 4 MSEfOV;SBLUMBE g MSEfOV;SBLE g D 0 4 3 5 positive semidefinite: 42 034 2
0
0
1
6-1 Setup of the Best Linear Minimum Bias Estimator of Type BLUMBE
331
Summarizing, let us compare I; I BLUMBE versus I; I BLE and V; S BLUMBE versus V; S BLEŠ OBLUMBE OBLE , DfOBLUMBE g DfOBLE g and MSEfOBLUMBE g MSEfOBLE g as well as OV;SBLUMBE OV;SBLE , DfOV;SBLUMBE g DfOV;SBLE g and MSEfOV;SBLUMBE g MSEfOV;SBLE g result positive semidefinite: In consequence, for three different measures of distortions BLE is in favor of BLIMBE: BLE produces smaller errors in comparing with BLIMBE! Finally let us compare weighted BIQUUE and weighted BIQE: (i) Weighted BIQUUE O 2 and weighted BIQE O 2 O 2 D .n r C 2/y0 V1 eQ y D O 2 D .n r/1 y0 V1 eQ y versus 1 Q 0 1 Q D .n r/ ey V ey D .n r C 2/Qe0y V1 eQ y 2 3 411 14 .Qey /V ; SBLUMBE D 1 4 15y 6 114 2 3 C3 1 1 1 r D rkA D 2; n D 3; V1 D 4 1 C3 1 5 2 1 1 C3 1 1 O 2 D .y12 C y22 C y32 / versus O 2 D .y12 C y22 C y32 / 2 6 .ii/ DfO 2 jBIQUUEg versus DfO 2 jBIQEg DfO 2 g D 2.n r/1 4 versus DfO 2 g D 2.n r/.n r C 2/1 4 2 DfO 2 g D 2 4 versus DfO 2 g D 4 9 2 O 2 O O 2 g D 2.n r/1 .O 2 /2 versus EfO g D Df D 2.n r C 2/1 eQ 0y V1 eQ y O O 2 2 g D 1 .y 2 C y 2 C y 2 / Ef 2 3 9 1 1 2 2 2 O O g D .y1 C y2 C y3 / versus Ef O O 2 2 g D 2.n r C 2/1 .Qe0y V1 eQ y /2 Df 2 O O 2 2 /g D 1 .y 2 C y 2 C y 2 /: Ef. 2 3 54 1 2
1
.iii/ .Qey /BLUMBE D ŒIn A.A0 V A/ AV1 .Qey /BLE 1
.O 2 /BIQUUE D .n r/.Qe0y /BLE ŒV1 V1 A.A0 V A/ AV1 .Qey /BLE
332
6 The Third Problem of Probabilistic Regression
1 2 2 O BIQUUE O BIQE D .y12 C y22 C y32 / posi ti ve: 3 2 2 2 2 We repeat that the difference O BIQUUE O BIQE is in favor of O BIQE < O BIQUUE .
6-2 Setup of the Best Linear Estimators of Type hom BLE, hom S-BLE and hom a-BLE for Fixed Effects The topic of this section has been the subject of Jianqing Cai’s Ph.D. Thesis, Deutsche Geod¨atische Kommission, Bayerische Akademie der Wissenschaft, Report C 377, M¨unchen 2004. Numerical tests have documented that O of type †-BLUUE of is not robust against outliers in the stochastic vector y of observations. It is for this reason that we give up the postulate of unbiasedness, but keeping the set up of a linear estimation O D Ly of homogeneous type. The matrix L is uniquely determined by O 2 and minimum bias the ˛-weighted hybrid norm of type minimum variance jjDfgjj 2 jjI LAjj . For such a homogeneous linear estimation (2.21) by means of Box 6.4 O let us specify the real-valued, nonstochastic bias vector ˇ W D EfO g D Efg of type (6.11), (6.12), (6.13) and the real-valued, non-stochastic bias matrix Im LA (6.74) in more detail. First, let us discuss why a setup of an inhomogeneous linear estimation is not suited to solve our problem. In the case of an unbiased estimator, the setup of an O D the postulate of inhomogeneous linear estimation O D Ly C led us to Efg O unbiasedness if and only if Efg W D LEfyg C D .Im LA/ C D 0 for all 2 Rm or LA D I and D 0. Indeed the postulate of unbiasedness restricted the linear operator L to be the (non-unique) left inverse L D A L as well as the vector of inhomogeneity to zero. In contrast the bias vector ˇ W D EfO g D O D LEfyg D .Im LA/ C for a setup of an inhomogeneous Efg linear estimation should approach zero if D 0 is chosen as a special case. In order to include this case in the linear biased estimation procedure we set D 0. Second, we focus on the norm (2.79) namely jjˇjj2 W D Ef.O /0 .O /g O of . O In terms of a of the bias vector ˇ, also called Mean Square Error MSEfg O setup of a homogeneous linear estimation, D Ly, the norm of the bias vector is represented by .Im LA/0 0 .Im LA/ or by the weighted Frobenius matrix norm jj.Im LA/0 jj2 0 where the weight matrix 0 ; rk 0 D 1; has rank one. Obviously jj.Im LA/0 jj2 0 is only a semi-norm. In addition, 0 is not accessible since is unknown. In this problematic case we replace the matrix 0 by a fixed, positive definite m m matrix S and define the S-weighted Frobenius matrix norm jj.Im LA/0 jj2S of type (6.6) of the bias matrix Im LA. Indeed by means of the rank identity, rkS D m we have chosen a weight matrix of maximal rank. Now we are prepared to understand intuitively the following.
6-2 Setup of the Best Linear Estimators
333
Here we focus on best linear estimators of type hom BLE, hom S-BLE and hom a-BLE of fixed effects ; which turn out to be better than the best linear uniformly unbiased estimator of type hom BLUUE, but suffer from the effect to be biased. At first let us begin with a discussion about the bias vector and the bias matrix as O with respect to a homogeneous well as the Mean Square Estimation Error MSEfg linear estimation O D Ly of fixed effects based upon Box 6.4. Box 6.4. (Bias vector, bias matrix Mean Square Estimation Error in the special Gauss–Markov model with fixed effects): Efyg D A
(6.71)
Dfyg D † y
(6.72)
“ansatz” O D Ly
(6.73)
bias vector O ˇ W D EfO g D Efg
(6.74)
ˇ D LEfyg D ŒIm LA
(6.75)
bias matrix B W D Im LA
(6.76)
decomposition O C .Efg O / O D .O Efg/
(6.77)
O D L.y Efyg/ ŒIm LA
(6.78)
Mean Square Estimation Error O W D Ef.O /.O /0 g MSEfg
(6.79)
O D LDfygL0 C ŒIm LA 0 ŒIm LA0 MSEfg
(6.80)
O D 0/ .EfO Efgg modified Mean Square Estimation Error O W D LDfygL0 C ŒIm LA S ŒIm LA0 MSES fg Frobenius matrix norms
(6.81)
334
6 The Third Problem of Probabilistic Regression
O 2 W D tr Ef.O /.O /0 g jjMSEfgjj
(6.82)
O 2 jjMSEfgjj D tr LDfygL C tr ŒIm LA 0 ŒIm LA0 D jjL0 jj2˙y C jj.Im LA/0 jj2 0
(6.83)
O 2 jjMSES fgjj W D tr LDfygL C tr ŒIm LASŒIm LA0 D jjL0 jj2˙y C jj.Im LA/0 jj2S
(6.84)
0
0
hybrid minimum variance – minimum bias norm ˛-weighted L.L/ W D jjL0 jj2˙y C ˛1 jj.Im LA/0 jj2S
(6.85)
special model dim R.SA0 / D rkSA0 D rkA D m:
(6.86)
O subject to the homogeneous The bias vector ˇ is conventionally defined by Efg estimation form O D Ly. Accordingly the bias vector can be represented by (6.75) ˇ D ŒIm LA . Since the vector of fixed effects is unknown, there has been made the proposal to use instead the matrix Im LA as a matrix-valued measure of bias. O of A measure of the estimation error is the Mean Square estimation error MSEfg O type (6.79). MSEfg can be decomposed into two basic parts: O D LDfygL0 • the dispersion matrix Dfg • the bias product ˇˇ 0 . Indeed the vector O can be decomposed as well into two parts of type (6.77), O and (ii) Efg O which may be called estimation (6.78), namely (i) O Efg error and bias, respectively. The double decomposition of the vector O leads O of type (6.80). straightforward to the double representation of the matrix MSEfg Such a representation suffers from two effects: Firstly the vector of fixed effects is unknown, secondly the matrix 0 has only rank 1. In consequence, the matrix ŒIm LA 0 ŒIm LA0 has only rank 1, too. In this situation there has been made O with respect to 0 by the regular matrix S. MSES fg O a proposal to modify MSEfg O O has been defined by (6.81). A scalar measure of MSEfg as well as MSES fg are the Frobenius norms (6.82), (6.83), (6.84). Those scalars constitute the optimal risk in Definition 6.7 (hom BLE) and Definition 6.8 (hom S-BLE). Alternatively a homogeneous ˛-weighted hybrid minimum variance–minimum bias estimation (hom ˛-BLE) is presented in Definition 6.9 (hom ˛-BLE) which is based upon the weighted sum of two norms of type (6.85), namely • Average variance jjL0 jj2˙y D t rL˙y L0 • Average bias jj.Im LA/0 jj2S D t r ŒIm LA S ŒIm LA0 .
6-2 Setup of the Best Linear Estimators
min bias
335
min variance
balance between variance and bias
Fig. 6.1 Balance of variance and bias by the weight factor ˛
The very important estimator ˛-BLE is balancing variance and bias by the weight factor a which is illustrated by Fig. 6.1. Definition 6.7. (O hom BLE of ): An m1 vector O is called homogeneous BLE of in the special linear Gauss– Markov model with fixed effects of Box 6.3, if (1st) O is a homogeneous linear form O D Ly
(6.87)
(2nd) in comparison to all other linear estimations O has the minimum Mean Square Estimation Error in the sense of O 2 jjMSEfgjj D tr LDfygL0 C tr ŒIm LA 0 ŒIm LA0 D jjL0 jj2˙y C jj.Im LA/0 jj2 0 :
(6.88)
Definition 6.8. (O S-BLE of ): An m1 vector O is called homogeneous S-BLE of in the special linear Gauss– Markov model with fixed effects of Box 6.3, if (1st) O is a homogeneous linear form O D Ly
(6.89)
(2nd) in comparison to all other linear estimations O has the minimum S-modified Mean Square Estimation Error in the sense of O 2 jjMSES fgjj W D tr LDfygL C tr ŒIm LASŒIm LA0 D jjL0 jj2˙y C jj.Im LA/0 jj2S D min : 0
(6.90)
L
Definition 6.9. (O hom hybrid min var-min bias solution, a-weighted or hom a-BLE):
336
6 The Third Problem of Probabilistic Regression
An m1 vector O is called homogeneous ˛-weighted hybrid minimum variance– minimum bias estimation (hom a-BLE) of in the special linear Gauss–Markov model with fixed effects of Box 6.3, if (1st) O is a homogeneous linear form O D Ly
(6.91)
(2nd) in comparison to all other linear estimations O has the minimum varianceminimum bias in the sense of the ˛-weighted hybrid norm tr LDfygL0 C ˛1 tr .Im LA/ S .Im LA/0 D jjL0 jj2˙y C ˛1 jj.Im LA/0 jj2S D min ;
(6.92)
L
in particular with respect to the special model ˛ 2 RC ; dim R.SA0 / D rkSA0 D rkA D m: The estimations O hom BLE, hom S-BLE and hom ˛-BLE can be characterized as following: Lemma 6.10. (hom BLE, hom S-BLE and hom ˛-BLE): An m1 vector O is hom BLE, hom S-BLE or hom ˛-BLE of in the special linear Gauss–Markov model with fixed effects of Box 6.3, if and only if the matrix LO fulfils the normal equations (1st) hom BLE: (6.93) .† y C A 0 A0 /LO 0 D A 0 A0 (2nd) hom S-BLE: (3rd) hom ˛-BLE:
.† y C ASA0 /LO 0 D AS
(6.94)
.† y C ˛1 ASA0 /LO 0 D ˛1 AS
(6.95)
:Proof: (i) hom BLE: O 2 establishes the Lagrangean The hybrid norm jjMSEfgjj L.L/ W D tr L˙y L0 C tr .Im LA/ 0 .Im LA/0 D min L
for O hom BLE of . The necessary conditions for the minimum of the quadratic Lagrangean L.L/ are
6-2 Setup of the Best Linear Estimators
337
@L O .L/ W D 2Œ˙y LO 0 C A 0 A0 LO 0 A 0 D 0 @L which agree to the normal equations (6.93). (The theory of matrix derivatives is reviewed in Appendix A.7 (Facts: derivative of a scalar-valued function of a matrix: trace). The second derivatives @2 L O >0 .L/ @.vecL/@.vecL/0 at the “point” LO constitute the sufficiency conditions. In order to compute such an mn mn matrix of second derivatives we have to vectorize the matrix normal equation @L O O y C A 0 A0 / 2 0 A0 ; .L/ WD 2L.˙ @L @L O WD vecŒ2L.˙ O y C A 0 A0 / 2 0 A0 : .L/ @.vecL/ (ii) hom S-BLE: O establishes the Lagrangean The hybrid norm jjMSEs fgjj 2
L.L/ W D tr L˙y L0 C tr .Im LA/ S .Im LA/0 D min L
for O hom S-BLE of . Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L.L/ @L O .L/ W D 2Œ˙y LO 0 C ASA0 LO 0 AS0 D 0 @L as well as to the sufficiency conditions @2 L O D 2Œ.† y C ASA0 / ˝ Im > 0: .L/ @.vecL/@.vecL/0 O D 0 agree to (6.92). The normal equations of hom S-BLE @
[email protected]/ (iii) hom ˛-BLE: The hybrid norm jjL0 jj2˙y C ˛1 jj.Im LA/0 jj2S establishes the Lagrangean L.L/ W D tr L˙y L0 C ˛1 tr .Im LA/S.Im LA/0 D min L
for O hom ˛-BLE of . Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L.L/
338
6 The Third Problem of Probabilistic Regression
@L O .L/ D 2Œ.† y C A 0 A0 / ˝ Im vecLO 2vec. 0 A0 /: @L The Kronecker–Zehfuss product A ˝ B of two arbitrary matrices as well as .A C B/ ˝ C D A ˝ B C B ˝ C of three arbitrary matrices subject to dim A = dim B is introduced in Appendix A.7, “Definition of Matrix Algebra: multiplication of matrices of the same dimension (internal relation) and Laws”. The vec operation (vectorization of an array) is reviewed in Appendix A.7, too, “Definition, Facts: vecAB D .B0 ˝ I0 ` /vecA for suitable matrices A and B”. Now we are prepared to compute @2 L O D 2Œ.˙y C A 0 A0 / ˝ Im > 0 .L/ @.vecL/@.vecL/0 as a positive definite matrix. The theory of matrix derivatives is reviewed in Appendix A.7 “Facts: Derive of a matrix-valued function of a matrix, namely @.vecX/
[email protected]/0 ”. @L O .L/ D 2Œ ˛1 ASA0 LO 0 C † y LO 0 ˛1 AS0 ˛ @L as well as to the sufficiency conditions @2 L O D 2Œ. 1 ASA0 C † y / ˝ Im > 0 : .L/ ˛ @.vecL/@.vecL/0 O D 0 agree to (6.93). The normal equations of hom ˛-BLE @
[email protected]/ For an explicit representation of O as hom BLE, hom S-BLE and hom ˛-BLE of in the special Gauss–Markov model with fixed effects of Box 6.3, we solve the normal equations (6.94), (6.95) and (6.96) for LO D argfL.L/ D ming: L
Beside the explicit representation of O of type hom BLE, hom S-BLE and O the Mean Square hom ˛-BLE we compute the related dispersion matrix Dfg, O the modified Mean Square Estimation Error MSES fg O Estimation Error MSEfg, O and MSE˛;S fg in Theorem 6.11. (O hom BLE): Let O D Ly be hom BLE of in the special linear Gauss–Markov model with fixed effects of Box 6.3. Then equivalent representations of the solutions of the normal equations (6.93) are O D 0 A0 Œ† y C A 0 A0 1 y (6.96) (if Œ˙y C A 0 A0 1 exists) and completed by the dispersion matrix
6-2 Setup of the Best Linear Estimators
339
O D 0 A0 Œ† y C A 0 A0 1 † y Dfg Œ† y C A 0 A0 1 A 0 ;
(6.97)
by the bias vector O ˇ WD Efg D ŒIm 0 A0 .A 0 A0 C † y /1 A
(6.98)
O and by the matrix of the Mean Square Estimation Error MSEfg: O WD Ef.O /.O /0 g MSEfg O C ˇˇ 0 D Dfg
(6.99)
O MSEfg O C ŒIm 0 A0 .A 0 A0 C † y /1 A D Dfg 0 ŒIm A0 .A 0 A0 C † y /1 A 0 :
(6.100)
At this point we have to comment what Theorem 6.11 tells us. hom BLE has O of type (6.97), generated the estimation O of type (6.96), the dispersion matrix Dfg the bias vector of type (6.98) and the Mean Square Estimation Error of type (6.100) which all depend on the vector and the matrix 0 , respectively. We already mentioned that and the matrix 0 are not accessible from measurements. The situation is similar to the one in hypothesis testing. As shown later in this section we can produce only an estimator b and consequently can setup a hypothesis 0 of the “fixed effect” . Indeed, a similar argument applies to the second central moment Dfyg † y of the “random effect” y, the observation vector. Such a dispersion O and MSEfg. O Again matrix has to be known in order to be able to compute O Dfg, O y and we have to apply the argument that we are only able to construct an estimate † to setup a hypothesis about Dfyg ˙y . Theorem 6.12. (O hom S-BLE): Let O D Ly be hom S-BLE of in the special linear Gauss–Markov model with fixed effects of Box 6.3. Then equivalent representations of the solutions of the normal equations (6.94) are O D SA0 .† y C ASA0 /1 y
(6.101)
1 1 0 1 O D .A0 † 1 y A C S / A †y y
(6.102)
1 0 1 O D .Im C SA0 † 1 y A/ SA ˙y y
(6.103)
340
6 The Third Problem of Probabilistic Regression
(if S1 ; † 1 y exist) are completed by the dispersion matrices O D SA0 .ASA0 C † y /1 † y .ASA0 C † y /1 AS Dfg
(6.104)
O D .A0 † 1 A C S1 /1 A0 ˙ 1 A.A0 † 1 A C S1 /1 Dfg y y y
(6.105)
(if S1 ; † 1 y exist) by the bias vector O ˇ WD Efg D ŒIm SA0 .ASA0 C † y /1 A 1 1 0 1 ˇ D ŒIm .A0 † 1 y A C S / A ˙y A
(6.106)
(if S1 ; † 1 y exist)
O and by the matrix of the modified Mean Square Estimation Error MSEfg: O WD Ef.O /.O /0 g MSES fg
(6.107)
O C ˇˇ 0 D Dfg O D SA0 .ASA0 C † y /1 † y .ASA0 C † y /1 AS MSES fg
(6.108)
C ŒIm SA0 .ASA0 C † y /1 A 0 ŒIm A0 .ASA0 C † y /1 AS D S SA0 .ASA0 C † y /1 AS O D .A0 † 1 A C S1 /1 A0 † 1 A.A0 † 1 A C S1 /1 MSES fg y y y 1 1 0 1 0 C ŒIm .A0 † 1 y A C S / A † y A
(6.109)
0 1 1 1 ŒIm A0 † 1 y A.A † y A C S / 1 1 D .A0 † 1 y ACS /
(if S1 ; † 1 y exist). The interpretation of hom S-BLE is even more complex. In extension of the comments to hom BLE we have to live with another matrix-valued degree of O of type (6.104), (6.105) do freedom, O of type (6.101), (6.102), (6.103) and Dfg 0 no longer depend on the inaccessible matrix , rk. 0 /, but on the “bias weight matrix” S, rkS D m. Indeed we can associate any element of the bias matrix with a particular weight which can be “designed” by the analyst. Again the bias vector ˇ of type (6.106) as well as the Mean Square Estimation Error of type (6.107), (6.108), (6.109) depend on the vector which is inaccessible. Beside the “bias O Dfg, O ˇ and MSEs fg O are vector-valued or matrix-valued weight matrix S” , functions of the dispersion matrix Dfyg ˙y of the stochastic observation vector which is inaccessible. By hypothesis testing we may decide upon the construction b y. of Dfyg ˙y from an estimate ˙
6-2 Setup of the Best Linear Estimators
341
Theorem 6.13. (O hom ˛-BLE): Let O D Ly be hom ˛-BLE of in the special linear Gauss–Markov model with fixed effects Box 6.3. Then equivalent representations of the solutions of the normal equations (6.95) are (6.110) O D ˛1 SA0 .† y C ˛1 ASA0 /1 y 1 1 0 1 O D .A0 † 1 y A C ˛S / A † y y
(6.111)
1 1 O D .Im C ˛1 SA0 † 1 SA0 † 1 y A/ y y ˛
(6.112)
(if S1 ; † 1 y exist) are completed by the dispersion matrix
O D 1 SA0 .† y C 1 ASA0 /1 † y .† y C 1 ASA0 /1 AS 1 Dfg ˛ ˛ ˛ ˛
(6.113)
O D .A0 † 1 A C ˛S1 /1 A0 † 1 A.A0 † 1 A C ˛S1 /1 Dfg y y y
(6.114)
(if S1 ; † 1 y exist), by the bias vector O ˇ WD Efg D ŒIm ˛1 SA0 . ˛1 ASA0 C † y /1 A 1 1 0 1 ˇ D ŒIm .A0 † 1 y A C ˛S / A † y A
(6.115)
O (if S1 ; † 1 y exist) and by the matrix of the Mean Square Estimation Error MSEfg: O WD Ef.O /.O /0 g MSEfg O C ˇˇ 0 D Dfg
(6.116)
O D SC0 .ASA0 C † y /1 † y .ASA0 C † y /1 AS MSE˛; S fg C ŒIm ˛1 SA0 . ˛1 ASA0 C † y /1 A 0 ŒIm A0 . ˛1 ASA0 C † y /1 AS ˛1
(6.117)
D ˛1 S ˛1 SA0 . ˛1 ASA0 C † y /1 ˛1 AS O D .A0 † 1 A C ˛S1 /1 A0 † 1 A.A0 † 1 A C ˛S1 /1 MSE˛; S fg y y y 1 1 0 1 0 C ŒIm .A0 † 1 y A C ˛S / A † y A 1
1 1 ŒIm A0 † y A.A0 † 1 y A C ˛S /
(6.118)
342
6 The Third Problem of Probabilistic Regression 1 1 D .A0 † 1 y A C ˛S /
(if S1 ; † 1 y exist).
The interpretation of the very important estimator hom ˛-BLE O of is as follows: O of type (6.111), also called ridge estimator or Tykhonov–Phillips regulator, contains the Cayley inverse of the normal equation matrix which is additively 1 decomposed into A0 † 1 y A and ˛ S . The weight factor ˛ balances the first inverse dispersion part and the second inverse bias part. While the experiment informs b y , the bias weight matrix S and us of the variance-covariance matrix † y , say † the weight factor ˛ are at the disposal of the analyst. For instance, by the choice S D Di ag.s1 ; :::; s` / we may emphasize increase or decrease of certain bias matrix elements. The choice of an equally weighted bias matrix is S D Im . In contrast the weight factor ˛ can be determined by the A-optimal design of type O D min • trDfg • ˇˇ 0 D min
˛
˛
O D min. • trMSE˛; S fg ˛
O In the first case we optimize the trace of the variance-covariance matrix Dfg 0 of type (6.113), (6.114). Alternatively by means of ˇˇ D min we optimize ˛
the quadratic bias where the bias vector ˇ of type (6.115) is chosen, regardless of the dependence on . Finally for the third case – the most popular one – O of type we minimize the trace of the Mean Square Estimation Error MSE˛; S fg 0 (6.118), regardless of the dependence on . But beforehand let us present the proof of Theorem 6.10, Theorem 6.11 and Theorem 6.8. Proof: .i / O D 0 A0 Œ† y C A 0 A0 1 y If the matrix † y C A 0 A0 of the normal equations of type hom BLE is of full rank, namely rk.† y C A 0 A0 / D n, then a straightforward solution of (6.93) is LO D 0 A0 Œ† y C A 0 A0 1 : .ii/ O D SA0 .† y C ASA0 /1 y If the matrix † y C ASA0 of the normal equations of type hom S-BLE is of full rank, namely rk.† y C ASA0 / D n, then a straightforward solution of (6.94) is O D SA0 .† y C ASA0 /1 : L 1 1 0 1 .iii/ zQ D .A0 † 1 y A C S / A †y y
6-2 Setup of the Best Linear Estimators
343
Let us apply by means of Appendix A.7 (Fact: Cayley inverse: sum of two matrices, s(10), Duncan–Guttman matrix identity) the fundamental matrix identity 1 1 0 1 SA0 .† y C ASA0 /1 D .A0 † 1 y A C S / A †y ;
if S1 and † 1 y exist. Such a result concludes this part of the proof. 1 0 1 .iv/ O D .Im C SA0 † 1 y A/ SA † y y
Let us apply by means of Appendix A.7 (Fact: Cayley inverse: sum of two matrices, s(9)) the fundamental matrix identity 1 0 1 SA0 .† y C ASA0 /1 D .Im C SA0 † 1 y A/ SA † y ;
if † 1 y exists. Such a result concludes this part of the proof. .v/ O D ˛1 SA0 .† y C ˛1 ASA0 /1 y If the matrix † y C ˛1 ASA0 of the normal equations of type hom ˛-BLE is of full rank, namely rk.† y C ˛1 ASA0 / D n, then a straightforward solution of (6.95) is O D 1 SA0 Œ† y C 1 ASA0 1 : L ˛ ˛ 1 1 0 1 .vi/ O D .A0 † 1 y A C ˛S / A † y y
Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan–Guttman matrix identity) the fundamental matrix identity 1 0 ˛ SA .† y
1 1 0 1 C ASA0 /1 D .A0 † 1 y A C ˛S / A † y
if S1 and † 1 y exist. Such a result concludes this part of the proof. 1 1 0 1 .vi i /O D .Im C ˛1 SA0 † 1 y A/ ˛ SA † y y
Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(9), Duncan–Guttman matrix identity) the fundamental matrix identity 1 0 ˛ SA .† y
1 1 0 1 C ASA0 /1 D .Im C ˛1 SA0 † 1 y A/ ˛ SA † y
if † 1 y exist. Such a result concludes this part of the proof. O .viii/ hom BLE:Dfg O WD EfŒO EfgŒ O O Efg O 0g Dfg
344
6 The Third Problem of Probabilistic Regression
D 0 A0 Œ† y C A 0 A0 1 † y Œ† y C A 0 A0 1 A 0 : O and the substitution of O By means of the definition of the dispersion matrix Dfg of type hom BLE the proof has been straightforward. O .1st representation/ .ix/ hom S -BLE W Dfg O WD EfŒO EfgŒ O O Efg O 0g Dfg D SA0 .ASA0 C † y /1 † y .ASA0 C † y /1 AS: O and the substitution of O By means of the definition of the dispersion matrix Dfg of type hom S-BLE the proof of the first representation has been straightforward. O .2nd representation/ .x/ hom S-BLE W Dfg O WD EfŒO EfgŒ O O Efg O 0g Dfg 1 1 0 1 0 1 1 1 D .A0 † 1 y A C S / A ˙y A.A † y A C S / ;
O if S1 and † 1 y exist. By means of the definition of the dispersion matrix Dfg and O the substitution of of type hom S-BLE the proof of the second representation has been straightforward. O .1st representation/ .xi/ hom ˛-BLE W Dfg O WD EfŒO EfgŒ O O Efg O 0g Dfg D ˛1 SA0 .† y C ˛1 ASA0 /1 † y .† y C ˛1 ASA0 /1 AS ˛1 : O and the substitution of O By means of the definition of the dispersion matrix Dfg of type hom ˛-BLE the proof of the first representation has been straightforward. O .xii/ hom˛-BLE W Dfg.2nd representation/ O WD EfŒO EfgŒ O O Efg O 0g Dfg 1 1 0 1 0 1 1 1 D .A0 † 1 y A C ˛S / A † y A.A † y A C ˛S / ;
if S1 and† 1 y exist. By means of the definition of the dispersion matrix and the O Dfg substitution of O of type hom ˛-BLE the proof of the second representation has been straightforward. (xiii) bias ˇ for hom BLE, hom S-BLE and hom ˛-BLE O D C Efg O the various As soon as we substitute into the bias ˇ W D Efg estimators O of the type hom BLE, hom S-BLE and hom ˛-BLE we are directly led to various bias representations ˇ of type hom BLE, hom S-BLE and hom ˛-BLE. O of type hom BLE, hom S-BLE and hom ˛-BLE .xiv/ MSEfg
6-3 Continuous Networks
345
O W D Ef.O /.O /0 g MSEfg O D O Efg O C .Efg O / Ef.O /.O /0 g D O O 0g Ef.O Efg/.. O Efg/ O /.Efg O /0 C.Efg O D Dfg O C ˇˇ 0 : MSEfg O of . O Secondly At first we have defined the Mean Square Estimation Error MSEfg we have decomposed the difference O into the two terms O • O Efg O • Efg O namely in order to derive thirdly the decomposition of MSEfg, O namely Dfg O • The dispersion matrix of , • The quadratic bias ˇˇ 0 . O the dispersion matrix Dfg O and the bias vector As soon as we substitute MSEfg ˇ of various estimators O of the type hom BLE, hom S-BLE and hom ˛-BLE we are directly led to various representations ˇ of the Mean Square Estimation Error O MSEfg. Here is our proof’s end.
6-3 Continuous Networks Criterion matrices/ideal variance-covariance matrices are constructed from continuous networks being observed by signal derivatives of first and second order. The method of constrained least-squares leads to differential equations up to fourth order, to characteristic boundary values and constraints according to the datum choice. Here only one-dimensional networks on a line and on a circle are discussed. Their characteristic (modified) Green function is constructed and used for the computation of the variance-covariance function of adjusted mean signal functions. Finally numerical aspects originating from the discrete nature of real observational series are discussed. In detail, the transformation of a criterion matrix into a network datum and its comparison with the variance-covariance matrix of an ideally configurated network is presented. The aim of a study of continuous geodetic network is to find idealized variancecovariance matrices, also called criterion matrices, for various types of geodetic networks. From standard geodesy textbooks, e.g. W. Jordan, O. Eggert and M. Kneissl (1956 p. 123) it is well known for instance, that the error propagation on a leveling
346
6 The Third Problem of Probabilistic Regression
line is proportional to its length or in a two-dimensional leveling or distance network proportional to the logarithm of the distance between points. Here we concentrated on one-dimensional geodetic networks, especially on variance-covariance matrices of first and second order derivative observations on a line and on a circle. Higher dimensional continuous networks are introduced by B. Benciolini in the following contribution. For the related Fourier analysis of geodetic networks we refer to the contribution of H. S¨unkel. In addition, more extensive studies on continuous geodetic networks are referenced to K. Borre and T. Kraup (1974), K Borre (1977,1978, 1979i-iii, 1980), E. Grafarend (1977) and E. Grafarend et al (1979). At first we shall review continuous network of first derivative type, namely on a line and on a circle, by means of the least-squares variational principle and (modified) Green functions. Secondly, we give a similar analysis of one-dimensional geodetic networks of second derivative type. Thirdly we summarize the network analysis based on boundary value problems and (modified) Green functions and introduce discrete approximation techniques. In addition we discuss Taylor Karman structured criterion matrices in the context of ideally configurated discrete networks.
6-31 Continuous Networks of Second Derivatives Type This treatment is based on the contribution of E. Grafarend and F. Krumm (1985). Green’s function approach can be studied more deeply from reading R.P. Kanwal (1971) under the title “Linear Integral Equations, Theory and Technique”, in particular “integration by parts.” At first we discuss one-dimensional networks assuming second derivatives of signal functions to be observed. For network on a line we compute the characteristic differential equations, boundary values and constraints by the method of leastsquares constrained by the datum choice of fix and free type. The solutions are constructed by direct integration leading to the characteristic (modified) Green function. The important result that up to a variance factor the variance-covariance function of the adjusted signal function coincides with the (modified) Green function is proven. Two exercises extend the problems. 2 Assume an observational series s00i .x/ D ddx2 si .x/, i D 1.1/n, 0 x 1, of second derivatives of signal functions si .x/ on a line n is the total number of measurement series. As an example think of acceleration measurements on a line. The random function s00i .x/ will be characterized by the first two moments, the mean value and the variance-covariance matrix, namely Efs00i .x/g D ei 00 .x/
8i D 1.1/n
EfŒs00i .x/ Efsi00 .x/gŒsj00 .y/ Efsj00 .y/gg D † ij .x,y/ D ıij ı.x,y/ 2
8i; j D 1.1/n
(6.119) (6.120)
6-3 Continuous Networks
347
where ei is the vector of ones, ıij the Krockener or unit matrix, ı.x; y/ the Dirac or unit function, .x/ the unknown mean value function of continuity class C 4 Œ0; 1 and 2 the unknown variance of the observational series. Thus we have assumed no correlation between observations within a series and no correlation of observations between series. As an example consider a one-dimensional network of a second derivatives of R1 R1 signal functions si .x/ where we choose 0 .x/d x D 0 and 0 0 .x/d x D 0 as the network datum of free type. Then the constrained least-squares variational principle is used to determine the mean value function .x/ of continuity class C 4 Œ0; 1: 1 L.; 1 ; 2 / D 2 D
1 2
Z1
"00i .x/"00i .x/d x
Z1 C 1
0
Z1
0 .x/d x
0
Œs00i .x/ ei 00 .x/Œs00i .x/ ei 00 .x/d x
(6.121)
0
C 1
Z1 .x/d x C 2
0
Z1
.x/d x C 2 0
Z1
ıL.; 1 ; 2 / D
Z1
0 .x/d x D min ;1 ;2
0
Œei s00i .x/ C nO 00 .x/ı00 .x/d x C 1
0
Z1 ı.x/d x 0
Z1 C 2
ı0 .x/d x D 0
(6.122)
0 2
Z1
ı L.; 1 ; 2 / D
Z1
002
nı .x/d x C 1 0
ı2 .x/d x
0
Z1 C 2
Z1
02
ı .x/d x C 0 00
Œei s00i .x/
0 2
00
CnO .x/ı .x/d x “integration by parts”: (IBPB): ıL.; 1 ; 2 / D Œei s00i .1/ C nO 00 .1/ı0 .1/
(6.123)
348
6 The Third Problem of Probabilistic Regression
Œei s00i .0/ C nO 00 .0/ı0 .0/ O 000 .1/ 2 ı.1/ Œei s000 i .1/ C n C Œei s000 i .0/
(6.124)
000
C nO .0/ 2 ı.0/
Z1 Œei si .x/ C n.x/ C 1 ı.x/d x D 0
C 0
8 ı.x/; ı0 .0/; ı0 .1/ Boundary Value Problem (i)
differential equation O I V .x/ D
(ii)
d 2 00 1 sO .x/ O 1 ; O 1 WD 1 d x2 n
boundary values d 00 1 sO .0/ C O 2 ; O 2 WD 2 dx n d 00 000 sO .0/ C O 2 O .1/ D dx 000
O .0/ D
00
00
00
00
(6.126) (6.127)
O .0/ D sO .0/
(6.128)
O .1/ D sO .1/ (iii)
(6.125)
(6.129)
constraints Z1
Z1 .x/d x D 0
0
0 .x/d x D 0
(6.130)
0
Integration • 1st step Zx
IV
O
h
000
.x1 /d x1 D O .x1 /
ix
0 000 000 ) O .x/ D O .0/ O 1 x C
0
Zx 0
D O 1 x C
d x1 sOI V .x1 /
Zx
d x1 sOI V .x1 /
(6.131)
0
(6.132)
6-3 Continuous Networks
349
• 2nd step Rx 0
h 00 ix 000 000 O .x2 /d x2 D O .x2 / D sO .0/x C O 2 x 12 O 1 x2 0
(6.133)
Rx Rx2 C d x2 d x1 sOI V .x1 / 0
0
000 ) O .x/ D sO .0/ C sO .0/x C O 2 x C 12 O 1 x2 00
00
Rx Rx2 C d x2 d x1 sOI V .x1 / 0
(6.134)
0
• 3rd step Zx 0
h 0 ix 1 000 1 00 00 O .x3 /d x3 D O .x3 / D sO .0/x C sO .0/x2 C O 2 x2 0 2 2
1 O 1 x3 C 6
Zx
Zx3 d x3
0
Zx2 d x2
0
d x1 sOI V .x1 /
(6.135)
0
) 1 000 1 1 0 0 00 O .x/ D O .x/ C sO .0/x C sO .0/x2 C O 2 x2 O 1 x3 2 2 6 Zx Zx3 Zx2 C d x3 d x2 d x1 sOI V .x1 / 0
0
(6.136)
0
• 4th step Zx 0
x 1 00 1 0 0 O .x4 /dx4 D .x O 4 / 0 D O .0/x C sO .0/x2 C sO000 .0/x3 2 6
1 1 C O 2 x3 O 1 x4 C 6 24
Zx
Zx4 d x4
0
Zx3 d x3
0
Zx2 d x2
0
d x1 sOI V .x1 /
(6.137)
0
) 1 00 1 000 1 0 .x/ O D .0/ O C O .0/x C sO .0/x2 C sO .0/x3 C O 2 x3 2 6 6 x x x x Z Z4 Z3 Z2 1 O 1 x4 C d x4 d x3 dx2 d x1 sOI V .x1 / 24 0
0
0
0
(6.138)
350
6 The Third Problem of Probabilistic Regression
000
O .1/ D sO .1/ C O 2 D sO .0/ C O 2 O 1 C 000
000
Z1
d x1 sOI V .x1 /
(6.139)
0
Z1
000 000 O 1 D sO .0/ sO .1/ C
d x1 sOI V .x1 /
(6.140)
0
1 00 00 00 000 O .1/ D sO .1/ D sO .0/ C sO .0/ C O 2 O 1 2 Z1 Zx2 C d x2 d x1 sOI V .x1 / 0
0
) 1 00 00 000 O 2 D sO .1/ sO .0/ sO .0/ C O 1 2
Z1
Zx2 d x2
0
Zx2 d x2
0
Z1
(6.141)
0
1 000 1 000 1 00 00 O 2 D sO .1/ sO .0/ sO .0/ sO .1/ C 2 2 2 Z1
d x1 sOI V .x1 /
Z1
d x1 sOI V .x1 /
0
d x1 sOI V .x1 /
(6.142)
0
1 0 1 00 1 000 1 .x/d O x D .0/ O C O .0/ C sO .0/ C sO .0/ C O 2 2 6 24 24
0
1 O 1 C 120
Z1
Zx5 d x5
Zx4 d x4
0
Zx3 d x3
0
0
Zx2 d x2
0
d x1 sOI V .x1 / D 0
(6.143)
0
) 1 00 1 000 1 1 sO .0/ C sO .0/ C O 2 O 1 12 24 24 80 Zx Zx4 Zx3 Zx2 1 C d x4 d x3 d x2 d x1 sOI V .x1 / 2 .0/ O D
0
Z1
0
Zx5 d x5
0
0
Zx4 d x4
0
0
Zx3 d x3
0
Zx2 d x2
0
0
d x1 sOI V .x1 /
(6.144)
6-3 Continuous Networks
351
1 00 1 000 1 1 sO .0/ sO .0/ O 2 C O 1 2 6 6 24 Z1 Zx4 Zx3 Zx2 d x4 d x3 d x2 d x1 sOI V .x1 / 0
O .0/ D
0
0
0
(6.145)
0
Following R.P. Kanwal (1971 p. 285) by means of the Heaviside function H(x-y) we can rewrite Z1
Zx2 d x2
Z1 d x1 f .x1 / D
0
0
Z1
Zx3 d x3
Zx2
0
0
0
Z1
Zx4
Zx3
0
d x3 0
1 D 6
Z1
1 d x1 f .x1 / D 2 Zx2 d x2
0
0
Z1
.1 x1 /2 f .x1 /d x1
(6.147)
0
1 d x1 f .x1 / D 6
Zx
.x x1 /3 f .x 1/d x1
0
.x x1 /3 H.x x1 /f .x1 /d x1
(6.148)
0
Z1
Zx5 d x5
0
(6.146)
0
d x2
d x4
.1 x1 /f .x1 /d x1
Zx4 d x4
0
Zx3 d x3
0
Zx2 d x2
0
0
1 d x1 f .x1 / D 24
Z1
.x x1 /4 f .x1 /d x1
0
(6.149) such that
00
00
000
.x/ O D G0 .x; 0/Os .0/ C G0 .x; 1/Os .1/ G.x; 1/Os .1/ 000
Z1
C G.x; 0/Os .0/ C
G.x; y/OsI V .y/d y
0
where the modified Green function is defined by
352
6 The Third Problem of Probabilistic Regression
G.x; y/ D
1 1 1 1 .x C y/ .x3 y3 / .x4 C y4 / 120 24 12 24 Cx
1 1 1 1 1 y y2 C y3 C x3 y C .x y/3 H.x y/ 3 2 6 6 6
1 1 1 1 1 x C x2 x3 C x4 G.x; x/ D 120 12 3 2 4 1 .1 5y 10y3 5y4 5x C 40xy x > y W G.x; y/ D 120
3
7 7 7 7 7 7 7 3 3 2 3 4 7 C 20xy C 10x 60x y C 20x y 5x / 7 7 7 7 1 .1 5y C 10y3 5y4 5x C 40xy 7 x < y W G.x; y/ D 7 120 7 7 60xy2 C 20xy3 10x3 C 20x3 y 5x4 / 7 7 7 7 1 1 3 1 1 4 7 xC x x G.x; 0/ D 7 120 24 12 24 7 7 7 1 1 3 1 1 4 7 xC x x G.x; 1/ D 7 120 24 12 24 7 7 1 1 2 1 3 1 0 7 G .x; 0/ D C x x C x 7 24 3 2 6 7 7 7 1 1 1 0 7 x C x3 G .x; 1/ D 7 24 6 6 7 7 1 2 1 3 1 2 1 1 0 7 G .x; y/ D C y y C x. y C y / 7 7 24 4 6 3 2 7 7 1 3 1 2 7 C x .x y/ H.x y/ 7 6 2 7 7 1 2 1 00 7 G .x; y/ D y y x C xy C .x y/H.x y/ 7 7 2 2 7 7 1 000 7 G .x; y/ D y C x H.x y/ 7 2 5 GI V .x; y/ D 1 C ı.x y/
(6.150)
and plotted in Figs. 6.2 and 6.3 The result can be rewritten in a simpler form by applying integration by part.
6-3 Continuous Networks
Fig. 6.2 G.x/ D
1 120
1 x 12
353
C 13 x2 12 x3 C 14 x4
1 1 1 1 Fig. 6.3 G.x; y/ D 120 24 .x C y/ 12 .x3 y3 / 24 .x4 Cy4 /C 16 .x3 yCxy3 /Cx. 13 y 12 y2 /C 1 3 .x y/ H.x y/ 6
354
R1 0
6 The Third Problem of Probabilistic Regression
h i1 R1 2 00 00 d 00 O d yG.x; y/ ddy2 sO .y/ D G.x; y/ ddy sO .y/ @G @y .x; y/ d y s .y/ 0 0 h i h i1 1 00 00 D G.x; y/ ddy sO .y/ @G .x; y/O s .y/ @y 0 0 R1 2 00 @ G C d y @y2 .x; y/Os .y/ 0
000
000
D G.x; 1/Os .1/ G.x; 0/Os .0/ 0
00
0
00
G .x; 1/Os .1/ C G .x; 0/Os .0/ C
R1 0
2
00
d y @@yG2 .x; y/Os .y/ (6.151)
such that .x/ O D
R1 0
00 @2 G .x; y/Os .y/d y @y2
Now once again we base the error propagation study on ... : In general Z s n Z X @2 G 1 X @2 G † O .x1 ; x2 / D 2 d y1 d y2 2 .x1 ; y1 / 2 .x2 ; y2 / .y1 ; y2 / (6.152) n i;j D1 @y1 @y2 ij 1
1
0
00
0
is the transformation of the variance-covariance function of s00i .x/ into the variancecovariance function of .x/. O Due to the dispersion matrix model. 00
s X .y1 ; y2 / D ıij ı.y1 ; y2 /s200
(6.153)
ij
can be transformed into † O .x1 ; x2 / D n1 s200
R1 0
2 @2 G .x1 ; y/ @@yG2 .x2 ; y/ @y2
Again we integrated by parts: Z1 0
ˇ1 ˇ ˇ 00 00 00 0 d yG .x1 ; y/G .x2 ; y/ D ŒG .x1 ; y/; G .x2 ; y/ˇˇ ˇ 0
Z1 0
ˇ1 ˇ ˇ d yG .x1 ; y/G .x2 ; y/ D ŒG .x1 ; y/; G .x2 ; y/ˇˇ ˇ 000
0
00
0
0
ˇ1 000 ˇ ŒG .x1 ; y/G.x2 ; y/ˇ C 0
Z1 dy 0
@4 G .x1 ; y/G.x2 ; y/ @y4
(6.154)
6-3 Continuous Networks
355 00
00
000
Since GI V .x; y/ D 1 C ı.x y/, G .x; 1/ D 0, G .x; 0/ D 0, G .x; 0/ D G .x; 1/, G.x; 0/ D G.x; 1/ hold, takes the simple form 000
Z1 d yG
IV
Z1 .x1 ; y/G.x2 ; y/ D
0
G.x2 ; y/d y C G.x1 ; x2 /
(6.155)
0
A lengthy computation proves Z1
Zx2 d yG.x2; y/ D
Z1 d yG.x2 ; y/ C
0
0
d yG.x2 ; y/ D 0
(6.156)
x2
† O .x1 ; x2 / D n1 s200 G.x1 ; x2 / 2O .x/
1 D s200 n
1 1 1 1 1 x C x2 x3 C x4 120 12 3 2 4
1 2 1 1 1 ˙O .x1 ; x2 / D s00 .x1 C x2 / .x31 C x32 / n 120 24 12 1 4 1 1 1 x1 C x42 C x1 x2 x22 C x32 24 3 2 6 1 1 C x31 x2 C .x1 x2 /3 H.x1 x2 / 6 6
(6.157)
(6.158)
Again we have to discuss the contradictory boundary value problem. What are the condition of existence, especially with respect to the observation that the unobserved 000 000 derivatives sO .0/, sO .1/ appear in ... Thus let us compute the general solution of the boundary value problem of the homogenous differential equation, namely .i / O I1 V .x/ D 0 000 000 .ii/ O 1 .0/ D sO .0/ C O 2 00
00
000 000 I O 1 .1/ D sO .1/ C O 2 00
O 1 .0/ D sO .0/ Z1
Z1 .x/d O xD0
.iii/
00
I O 1 .1/ D sO .1/ I
0
0
O .x/d x D 0 0
The fundamental solution O 1 .x/ D A C Bx C Cx2 C Dx3 leads to
356
6 The Third Problem of Probabilistic Regression 000
000
000
O 1 .x/ D 6D ) O 1 .1/ D O 1 .0/ D 6D/ ) 000
000
sO .1/ D sO .0/
Z1
(6.159)
sOI V .x/d x D 0
(6.160)
0
and, in addition, 00 O 1 .1/ D 2C C 6D ) O 1 .x/ D 2C C 6D ) 00 O 1 .0/ D 2C
00
00
00
00
00
6D D O 1 .1/ O 1 .0/ D sO .1/ sO .0/
00
00
000
000
sO .1/ sO .0/ D sO .1/ D sO .0/ Z1
000
000
(6.161)
000
sO .x/d x D sO .1/ D sO .0/
(6.162)
0
has therefore to be transformed into 0
00
.x/ O D ŒG .x; 0/ C G.x; 1/ G.x; 0/Os .0/ 0
00
C ŒG .x; 1/ G.x; 1/ C G.x; 0/Os .1/ R1 C G.x; y/sI V .y/d y
(6.163)
0
Exercise Where are the minima and where is the maximum of the variance function 2O .x/ located? 2 3 1 1 d 2 O .x/ D s200 . C x x 2 C x 3 / D 0 dx n 12 3 2
(6.164)
, xO 1 D 0:211; 324; 865 I xO 2 D 0:5 I xO 3 D 0:788; 675; 135
(6.165)
6-3 Continuous Networks
357
Table 6.1 Summary of (modified) Green functions Type of continuous network
(modified) Green function
1st order derivative, “fixed”
x .x y/H.x y/
2nd order derivative, “free” 1st order derivative, “circle”, “free” 2nd order derivative, “free”
1 x2 C y2 yC .x y/H.x y/ 3 2 1 2 2 y C . C y2 / . x y /H. x y / 3 4 x 1 4 1 1 1 3 C .x C y/ .x y3 / .x y4 / C 120 24 24 12 1 1 1 1 1 x y y2 C y3 C x3 y C .x y/3 H.x y/ 3 2 6 6 6
d2 2 1 2 O .x/ D s200 . 3x C 3x2 / 2 dx n 3
(6.166)
d2 2 1 .Ox1 / > 0 d x2 O 6
(6.167)
(minimum!)
d2 2 1 <0 O .Ox2 / 2 dx 12 d2 2 1 O .Ox3 / > 0 2 dx 6
(maximum!)
(minimum!)
(6.168) (6.169)
Finally note Table 6.1 where all (modified) Green functions we have derived so far are collected.
6-32 Discrete Versus Continuous Geodetic Networks K. Borre and T. Krarup (1974) developed the general theory using the method of finite elements to perform the transition from the actual discrete network to a twodimensional continuum. Of course, we can consider the real network situation from another point of view: Once the characteristic continuous boundary value problem is formulated for the observed signal derivative function and solved by the method of (modified) Green functions we can interpret discrete measurement series as approximations of the continuous approach. (Compare the geodetic boundary value problem in physical geodesy, namely the Stokes integral, which can be discretized by means of finite elements, splines.) In using the concept of approximation to the solution of the continuous boundary value problem we avoid the inversion of the discrete normal equations as the following example will demonstrate.
358
6 The Third Problem of Probabilistic Regression
Example Consider the solution of the adjustment of the free, first derivative network on a finite line. The modified Green function is given by Table 6.1. Now assume the Dirac series of discrete first derivative data of type s0i .y/
D
p1 X
ı.y; y˛ /s0i .y˛ /
(6.170)
˛D1
where p is the number of points where the data are given. ı.y; y˛ / is the chosen base function of minimum local support. We are led to the solution Z1 .x/ O D 0
p1 X @G 0 .x; y/Os0 .y/d y D G .x; y˛ /s0i .y˛ / @y ˛D1
(6.171)
The error propagation study based on the modified Green function ( ...) and the dispersion matrix model 0
† sij .y˛ ; yˇ / D ıij ı˛;ˇ s20
(6.172)
leads to the variance-covariance function of .x/: O p1 1 2X 0 0 † O .x1 ; x2 / D s0 G .x1 ; y˛ /G .x2 ; y˛ / n ˛D1
D
p1 1 2X s0 Œ1 C y˛ C H.x1 y˛ /Œ1 C y˛ C H.x2 y˛ / n ˛D1
(6.173)
(6.174)
For more elaborate treatment of discrete networks and their finite element continuation we refer to the literature. We finally discuss Taylor-Karman structured networks, namely for didactic reasons on a line. Example Consider an absolute signal network whose variance -covariance function is assumed to be homogeneous: † O .˛; ˇ/ D
f .0/ D const 8˛ D ˇ; ˛ D 1.1/p; ˇ D 1.1/p f .j˛ ˇj/ 8˛ ¤ ˇ;
(6.175)
6-3 Continuous Networks
359
p again is the number of discrete points equally distant on a line. We are interested to transform the ideal variance-covariance function into the one with respect to a network datum is fixed or free type. Assume an inconsistent linear observational equation y D Ax D A1 x1 C A2 x2 whose design matrix A and unknown vector x is partitioned according to the rank of the matrix A. The method of least-squares leads to the normal equations AT1 A1 xO 1 C AT1 A2 xO 2 D AT1 y (6.176) xO 2 D 0 where xO 2 D 0 is the fixed datum choice. xO 1 D .AT1 A1 /1 AT1 y xO 2 D 0 T T .A1 A1 /1 AT1 .A1 A1 /1 AT1 xO 1 x1 WD yD AxII D xO 2 0 0 T 1 .A1 A1 /1 AT1 †I D A† II AT ŒA1 .AT1 A1 / ; 0 0
(6.177) (6.178) (6.179)
† I will be identify with the variance-covariance matrix of the network in the fixed datum. † II with the one of the homogeneous type † O .˛; ˇ/. For a discrete network of observed signal differences the .p 1/ p design matrix A is given by 2 1 8i D j i D 1.1/p 1 (6.180) A D .aij / D 4 1 8j D i C 1 0 otherwise Let us choose the point p as the fix-point. The rank partitioning of the design matrix leads to 2 1 8i D j i D 1.1/p 1 4 A1 D .aij /1 D (6.181) 1 8j D i C 1 i D 1.1/p 2 0 otherwise 0 8i D 1.1/p 2 A2 D .aij /2 D j Dp (6.182) 1 8i D p 1 The .p 1/ .p 1/ normal equation matrix AT1 A1 is structured according 2
1 6 2 6 6 .AT1 A1 /ij D 6 1 6 4 0
8i D j D 1 8i D j ¤ 1 i D 2.1/p 1 8j D i C 1 i D 1.1/p 2 i D j C 1 j D 1.1/p 2 otherwise
360
6 The Third Problem of Probabilistic Regression
Thus the inverse .AT1 A1 /1 enjoys the simple structure 2 .AT1 A1 /1 ij
D4
Œ.AT1 A1 /1 A1 ij D
p i 8i D j; i D 1.1/p 1 i > j; j D 1.1/p 1 p j j > i:
1 j i; i; j D 1.1/p 1 0 otherwise
A† O AT D 2f .jj i j/ f .ji C 1 j j/ f .ji j 1j/
f.AT1 A1 /1 AT1 A˙O AT gij D f .ji j j/ f .jj p C 1j/ C f .ji j 1j/ C f .jp j j/ C f .ji j 1j/8i; j D 1.1/p 1
˙I .˛; ˇ/ D f .0/ f .j˛ pj/ f .jˇ pj/ C f .j˛ ˇj/
(6.183)
Note that the point p was the datum point. In order to determine the unknown 1 1 function f .˛; ˇ/ D f .j˛ ˇj/ we compare .ATA and † I .˛; ˇ/ 8˛; ˇ D 1.1/ 1 / p 1. It is easily seen that f .j˛ ˇj/ D C 12 j˛ ˇj 8˛; ˇ D 1.1/p 1; C RC f .0/ D C
(6.184)
or equivalently 1 ˙O .˛; ˇ/ D f .0/ j˛ ˇj 2
(6.185)
Obviously to the ideally configured discrete network on a line there corresponds an ideal variance-covariance function of homogeneous type. The result can be generalized into homogeneous and isotropic networks of more than one dimension.
Chapter 7
Overdetermined System of Nonlinear Equations on Curved Manifolds The spherical problem of algebraic regression – inconsistent system of directional observational equations Here we review a special issue of distributions on manifolds, in particular the spherical problem of algebraic regression or analyse the inconsistent of directional observational equations. The first section introduces loss functions on longitudinal data (ˆ D 1) and (p D 2) on the circle or on the sphere as a differential manifold of dimension ˆ D 1 and p D 2. Section 6.2 introduces the minimal distance mapping on S 1 and S 2 and constructs the related normal equations. Section 6.3 reviews the transformation from the circular normal distribution to an oblique normal distribution including a historical note to von Mises analyzing data on a circle, namely atomic weights. We conclude with note on the “angular metric.” As a case study in section four we analysze 3D angular observations with two different theodolites, namely Theodolite I and Theodolite II. The main practical result is the set of data from (tan ^n ; tan ˆn ), its solution (^n ; ˆn ) is very different from the Least Squares Solution. Finally we discuss the von Mises-Fisher distribution for dimension 1, 2, 3 and 4, namely Relativity and Earth Sciences (for instance phase observations for the Global Positioning System (GPS)), Biology, Meteorology, Psychology, Image Analysis and Astronomy. A detailed reference list stays at the end. Read only Sect. 7-2 for a careful treatment(HAPS).
“Least squares regression is not appropriate when the response variable is circular, and can lead to erroneous results. The reason for this is that the squared difference is not an appropriate measure of distance on the circle.” – U. Lund (1999)
A typical example of a nonlinear model is the inconsistent system of nonlinear observational equations generated by directional measurements (angular observations, longitudinal data). Here the observation space Y as well as the parameter space X is the hypersphere Sp RpC1 : the von Mises circle S1 ; p D 2 the Fisher sphere S2 in general the Langevin sphere Sp . For instance, assume repeated measurements of horizontal directions to one target which are distributed as polar coordinates on a unit circle clustered around a central direction. Alternatively, assume repeated measurements of horizontal and vertical directions to one target which are similarly distributed as spherical coordinates (longitude, latitude) on a unit sphere clustered around a central direction. By means of a properly chosen loss function we aim at a determination of the central direction. Let us connect all points E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 7, © Springer-Verlag Berlin Heidelberg 2012
361
362
7 Overdetermined System of Nonlinear Equations on Curved Manifolds
Lemma 7.2 minimum geodesic distance: Definition 7.1 minimum geodesic distance: Definition 7.4 minimum geodesic distance:
1
2
Lemma 7.3 minimum geodesic distance: Lemma 7.5 minimum geodesic distance: Lemma 7.6 minimum geodesic distance:
1
1
2
2
on S1 , S2 or in general Sp the measurement points, by a geodesic, here the great circle, to the point of the central direction. Indeed the loss function will be optimal at a point on S1 , S2 or in general Sp called the central point. The result for such a minimum geodesic distance mapping will be presented.
7-1 Introduction Directional data, also called “longitudinal data” or “angular data”, arise in several situations, notable geodesy, geophysics, geology, oceanography, atmospheric science, meteorology and others. The von Mises or circular normal distribution CN .; / with mean direction parameter .0 2/ and concentration parameter . > 0/, the reciprocal of a dispersion measure, plays the role in circular data parallel to that of the Gauss normal distribution in linear data. A natural extension of the CN distribution to the distribution on a p-dimensional sphere SP RP C1 leads to the Fisher-von Mises or Langevin distribution L.; /. For p D 2, namely for spherical data (spherical longitude, spherical latitude), this distribution has been studied by R. A. Fisher (1953), generalizing the result of R. von Mises (1918) for p D 1, and is often quoted as the Fisher distribution. Further details can be taken from K. V. Mardia (1972), K. V. Mardia and P.E. Jupp (2000), G. S. Watson (1986, 1998) and A. Sen Gupta and R. Maitra (1998). Box 7.1. (Fisher-von Mises or Langevin distribution): p D 1 .R: von Mises 1918/ f .j; / D Œ2I0 ./1 expŒ cos. /
(7.1)
7-1 Introduction
363
f .j; / D Œ2I0 ./1 exp < jX >
(7.2)
cos WD
(7.3)
WD< jX >D x X C y Y D cos cos C sin sin cos D cos. /
(7.4)
D e1 cos C e2 sin 2 S1
(7.5)
X D e1 cos C e2 sin 2 S1
(7.6)
p D 2 .R: A: Fisher 1953/ f .; ˚j ; ˚ ; / D
expŒcos ˚ cos ˚ cos. / C sin ˚ sin ˚ 4 sinh exp < jX > D 4 sinh cos WD< jX >D x X C y Y C z Z
(7.7)
(7.8)
D cos ˚ cos cos ˚ cos C cos ˚ sin cos ˚ sin C sin ˚ sin ˚ cos D cos ˚ cos ˚ cos. / C sin ˚ sin ˚ D e1 x C e2 y C e3 z D e1 cos ˚ cos C e2 cos ˚ sin C e3 sin ˚ 2 S2
(7.9)
X D e1 X C e2 Y C e3 Z D e1 cos ˚ cos C e2 cos ˚ sin C e3 sin ˚ 2 S 2 :
(7.10)
Box 7.1 is a review of the Fisher-von Mises or Langevin distribution. First, we setup the circular normal distribution on S1 with longitude as the stochastic variable and . ; / the distributional parameters called “mean direction ” and “concentration measure”, the reciprocal of a dispersion measure. Due to the normalization of the circular probability density function (“pdf”) I0 ./ as the zero order modified Bessel function of the first kind of appears. The circular distance between the circular mean vector 2 S1 and the placement vector X 2 S1 is measured by “cos ”, namely the inner product < jX >, both and X represented in polar coordinates . ; /, respectively. In summary, (7.1) is the circular normal pdf, namely an element of the exponential class. Second, we refer to the spherical normal pdf on S2 with spherical longitude , spherical latitude ˚ as the stochastic variables and . ; ˚ ; / the distributional parameters called “longitudinal mean direction, lateral mean direction . ; ˚ /” and “concentration measure ”, the reciprocal of a dispersion measure. Here the normalization factor of the spherical pdf is =.4 sinh /. The spherical distance between the spherical mean vector 2 S2 and the placement vector X 2 S2 is measured by “cos ”, namely the inner product < jX >, both and X represented in polar coordinates – spherical coordinates . ; ˚ ; ; ˚/, respectively.
364
7 Overdetermined System of Nonlinear Equations on Curved Manifolds
In summary, (7.7) is the spherical normal pdf, namely an element of the exponential class. Box 7.2. (Loss function): p D 1: longitudinal data type 1 W
n X
cos i D max 10 cos ‰ D max
(7.11)
i D1
type 2 W
n X
.1 cos i / D min 10 .1 cos ‰/ D min
i D1
type 3 W
n X
sin2 i =2 D min
i D1
‰ ‰ 0 D min sin sin 2 2
(7.12)
(7.13)
transformation 1 cos D 2 sin2 =2
(7.14)
cosi D cos.i x/ D cos i cos x C sin i sin x
(7.15)
2 sin2 i =2 D 1 cos i D 1 cos i cos x C sin i sin x 2 3 2 3 3 2 y1 cos 1 cos 1 6 7 6 7 7 6 cos ‰ D 4 ::: 5 ; cos y WD cos ƒ WD 4 ::: 5 D 4 ::: 5 cos n
yn
(7.16) (7.17)
cos n
cos ‰ D cos y cos x C sin y sin x:
(7.18)
How to generate a loss function substituting least squares ? Obviously the von Mises pdf .p D 1/ has maximum likelihood if ˙inD1 cos i D ˙inD1 cos.i x/ is maximal. Equivalently ˙inD1 .1 cos i / is minimal. By transforming 1 cos i by (7.14) into 2 sin2 =2, an equivalent loss function is ˙inD1 sin2 =2 to be postulated minimal. According to Box 7.2 the geodetic distance is represented as a nonlinear function of the unknown mean direction x. (7.17) constitutes the observation vector y 2 S1 . Similarly the Fisher pdf .p 2/ has maximum likelihood if ˙inD1 cos i is maximal. Equivalent postulates are (7.20) ˙inD1 .1 cos i / D min and (7.21) ˙inD1 sin2 i =2 D min. According to Box 7.3 the geodetic distance (7.23) is represented as a nonlinear function of the unknown mean direction . ; ˚ / .x1 ; x2 /. (7.24), (7.25) constitute the nonlinear observational equations for direct observations of type “longitude latitude” .i ; ˚i / 2 S2 , the observation space Y, and unknown parameters of type mean longitudinal, lateral direction . ; ˚ / 2 S2 , the parameter space X.
7-2 Minimal Geodesic Distance: MINGEODISC
365
Box 7.3. (Loss function): p D 2: longitudinal data type 1 W
n X
10 cos ‰ D max
cos i D max
(7.19)
i D1
type 2 W type 3 W
n X i D1 n X
.1 cos i / D min sin2 i =2 D min
10 .1 cos ‰/ D min
(7.20)
‰ 0 ‰ / .sin / D min 2 2
(7.21)
.sin
i D1
transformation 1 cos D 2 sin2 =2
(7.22)
cosi D cos ˚i cos x2 cos.i x1 / C sin ˚i sin x2 (7.23) D cos ˚i cos i cos x1 cos x2 C cos ˚i sin i sin x1 cos x2 C sin ˚i sin x2 2 3 3 2 cos 1 cos 1 6 7 7 6 cos ‰ WD 4 ::: 5 ; cos y1 WD cos ƒ WD 4 ::: 5 (7.24) cos n cos n 3 2 cos ˚1 7 6 cos y2 WD cos ˆ WD 4 ::: 5 ; sin y1 ; sin y2 correspondingly cos ˚n 3 3 2 cos ˚1 sin 1 cos ˚1 cos 1 7 7 6 6 :: :: cos ‰ D 4 5 cos x1 cos x2 C 4 5 sin x1 cos x2 : : 2
cos ˚n cos n 3 2 sin ˚1 7 6 C 4 ::: 5 sin x2 :
cos ˚n sin n (7.25)
sin ˚n
7-2 Minimal Geodesic Distance: MINGEODISC By means of Definition 7.1 we define the minimal geodetic distance solution (MINGEODISC) on S2 . Lemma 7.2 presents you the corresponding nonlinear normal equation whose close form solution is explicitly given by Lemma 7.5 in terms of
366
7 Overdetermined System of Nonlinear Equations on Curved Manifolds
Gauss brackets (special summation symbols). In contrast Definition 7.4 confronts us with the definition of the minimal geodetic distance solution (MINGEODISC) on S2 . Lemma 7.7 relates to the corresponding nonlinear normal equations which are solved in a closed form via Lemma 7.8, again taking advantage of the Gauss brackets. Definition 7.1. (minimum geodesic distance: S1 ): A point g 2 S1 is called at minimum geodesic distance to other points i 2 S1 ; i 2 f1; ; ng if the circular distance function L.g / WD
n X
2.1 cos i / D
n X
i D1
2Œ1 cos.i g / D min g
i D1
(7.26)
is minimal. ( g D arg
n X
) 2.1 cos i / D min j cos i D cos.i g /
(7.27)
i D1
Lemma 7.2. (minimum geodesic distance, normal equation: S1 ): A point g 2 S1 is at minimum geodesic distance to other points i 2 S1 ; i 2 f1; ; ng if g D x fulfils the normal equation sin x
n X
! C cos x
cos i
i D1
n X
! sin i
D 0:
(7.28)
i D1
Proof. g is generated by means of the Lagrangean (loss function) L.x/ WD
n X
2Œ1 cos.i x/
i D1
D 2n 2 cos x
n X
cos i 2 sin x
i D1
n X
sin i D min : x
i D1
The first derivatives n
n
X X d L.x/ .g / D 2 sin g cos i 2 cos g sin i D 0 dx i D1 i D1
7-2 Minimal Geodesic Distance: MINGEODISC
367
constitute the necessary conditions. The second derivative n
n
X X d 2 L.x/ . / D 2 cos cos C 2 sin sin i > 0 g g i g dx 2 i D1 i D1 builds up the sufficiency condition for the minimum at g . Lemma 7.3. (minimum geodesic distance, solution of the normal equation: S1 ): Let the point g 2 S1 be at minimum geodesic distance to other points i 2 S1 ; i 2 f1; ; ng. Then the corresponding normal equation (7.28) is uniquely solved by tan g D Œsin =Œcos ;
(7.29)
such that the circular solution point is 1 Xg D e1 cos g C e2 sin g D p fe1 Œcos C e2 Œsin g 2 Œsin C Œcos 2 (7.30) with respect to the Gauss brackets !2 n X 2 Œsin WD sin i i D1 2
Œcos WD
n X
(7.31)
!2 cos i
:
(7.32)
i D1
Next we generalize MINGEODISC .p D 1/ on S1 to MINGEODISC .p D 2/ on S1 . Definition 7.4. (minimum geodesic distance: S2 ): A point .g ; ˚g / 2 S2 is called at minimum geodesic distance to other points .g ; ˚g / 2 S2 ; i 2 f1; ; ng if the spherical distance function L.g ; ˚g / WD
n X
2.1 cos i /
(7.33)
i D1
D
n X i D1
is minimal.
2Œ1 cos ˚i cos ˚g cos.i g / sin ˚i sin ˚g D min
g ;˚g
368
7 Overdetermined System of Nonlinear Equations on Curved Manifolds
.g ; ˚g / D arg f
n X
2.1 cos i / D min j
(7.34)
i D1
j cos i D cos ˚i cos ˚g cos.i g / C sin ˚i sin ˚g g: Lemma 7.5. (minimum geodesic distance, normal equation: S2 ): A point .g ; ˚g / 2 S2 is called at minimum geodesic distance to other points .g ; ˚g / 2 S2 ; i 2 f1; ; ng if g D x1 ; ˚g D x2 fulfils the normal equations sin x2 cos x1 C cos x2
n P i D1
cos x2 cos x1
n P i D1
n P
cos ˚i cos i sin x2 sin x1
i D1
cos ˚i sin i (7.35)
sin ˚i D 0
n X
cos ˚i sin i cos x2 sin x1
i D1
n X
cos ˚i cos i D 0:
i D1
Proof. .g ; ˚g / is generated by means of the Lagrangean (loss function) L.x1 ; x2 / W D
n X
2Œ1 cos ˚i cos i cos x1 cos x2
i D1
cos ˚i sin i sin x1 cos x2 sin ˚i sin x2 D 2n 2 cos x1 cos x2
n X
cos ˚i cos i
i D1
2 sin x1 cos x2
n X
cos ˚i sin i 2 sin x2
i D1
n X
sin ˚i :
i D1
The first derivatives n
X @L.x/ .g ; ˚g / D 2 sin g cos ˚g cos ˚i cos i @ x1 i D1 2 cos g cos ˚g
n X
cos ˚i sin i D 0
i D1 n
X @L.x/ .g ; ˚g / D 2 cos g sin ˚g cos ˚i cos i @ x2 i D1
(7.36)
7-2 Minimal Geodesic Distance: MINGEODISC
C2 sin g sin ˚g
n X
369
cos ˚i sin i
i D1
2 cos ˚g
n X
sin ˚i D 0
i D1
constitute the necessary conditions. The matrix of second derivative @2 L.x/ .g ; ˚g / 0 @ x@ x0 builds up the sufficiency condition for the minimum at .g ; ˚g /. @2 L.x/ .g ; ˚g / D 2 cos g cos ˚g Œcos ˚ cos @ x12 C2 sin g cos ˚g Œcos ˚ sin 2
@ L.x/ .g ; ˚g / D 2 sin g sin ˚g Œcos ˚ cos @ x1 x2 C2 cos g sin ˚g Œcos ˚ sin @2 L.x/ .g ; ˚g / D 2 cos g cos ˚g Œcos ˚ cos @ x22 C2 sin g cos ˚g Œcos ˚ sin C sin ˚g Œsin ˚: Lemma 7.6. (minimum geodesic distance, solution of the normal equation: S2 ): Let the point .g ; ˚g / 2 S2 be at minimum geodesic distance to other points .g ; ˚g / 2 S2 ; i 2 f1; ; ng. Then the corresponding normal equations (7.35), (7.36) are uniquely solved by tan g D Œcos ˚ sin =Œcos ˚ cos
(7.37)
Œsin ˚ tan ˚g D p Œcos ˚ cos 2 C Œcos ˚ sin 2 such that the circular solution point is Xg D e1 cos ˚g cos g C e2 cos ˚g sin g C e3 sin ˚g 1 D p 2 Œcos ˚ cos C Œcos ˚ sin 2 C Œsin ˚2 fe1 Œcos ˚ cos 2 C e2 Œcos ˚ sin 2 C e3 Œsin ˚g
(7.38)
370
7 Overdetermined System of Nonlinear Equations on Curved Manifolds
subject to Œcos ˚ cos WD
n X
cos ˚i cos i
(7.39)
cos ˚i sin i
(7.40)
i D1
Œcos ˚ sin WD
n X i D1
Œsin ˚ WD
n X
sin ˚i :
i D1
7-3 Special Models: From the Circular Normal Distribution to the Oblique Normal Distribution First, we present a historical note about the von Mises distribution on the circle. Second, we aim at constructing a two-dimensional generalization of the Fisher circular normal distribution to its elliptic counterpart. We present 5 lemmas of different type. Third, we intend to prove that an angular metric fulfils the four axioms of a metric.
7-31 A Historical Note of the von Mises Distribution Let us begin with a historical note: The von Mises Distribution on the Circle In the early part of the last century, Richard von Mises (1918) considered the table of the atomic weights of elements, seven entries of which are as follows (Table 7.1): He asked the question “Does a typical element in some sense have integer atomic weight ?” A natural interpretation of the question is “Do the fractional parts of the weight cluster near 0 and 1?” The atomic weight W can be identified in a natural way with points on the unit circle, in such a way that equal fractional parts correspond to identical points. This can be done under the mapping cos 1 ; 1 D 2.W ŒW /; W !xD sin 1 e where Œu is the largest integer not greater than u. Von Mises’ question can now be seen to be equivalent to asking “Do this points on the circle cluster near e D Œ1 00 ?”. e Incidentally, the mapping W ! x can be made in another way: e 1 cos 2 W !xD ; 2 D 2.W ŒW C /: sin 2 e 2
7-3 Special Models
371
Table 7.1 Element Atomic Weight W
Al Sb Ar As Ba Be Bi 26.98 121.76 39.93 74.91 137.36 9.01 209.00
Table 7.2 Element Al Sb Ar As Ba Be Bi 1 =2 0.98 0.76 0.93 0.91 0.36 0.01 0.00 2 =2 0.02 0.24 0.06 0.09 0.36 0.01 0.00
Average N1 =2 D 0:566 N2 =2 D 0:006
The two sets of angles for the two mappings are then as follows (Table 7.2): We note from the discrepancy between the averages in the final column that our usual ways of describing data, e.g., means and standard deviations, are likely to fail us when it comes to measurements of direction. If the points do cluster near e1 then the resultant vector ˙jND1 xj .here N D 7/ e xN =jjxjj D e , e we should have approximately should point in that direction, i.e., 1 0 e e e where xN D ˙xj =N and jjxjj D .x x/1=2 is the length of x. For the seven elements e here, ee e whose e weightseare considered we find 0:9617 cos 344:09ı cos 15:91ı xN =jjxjj D D D ; 0:2741 sin 344:09ı sin 15:91ı e e a direction not far removed from e1 . Von Mises then asked “For what distribution of e the unit circle is the unit vector O D Œcos O0 sin O0 0 D xN =jjxjj a maximum likelihood e e estimator (MLE) of a direction 0 of clustering or concentration ?” The answer is the distribution now known as the von Mises or circular normal distribution. It has density, expressed in terms of random angle , expfk cos. 0 /g d ; I0 .k/ 2 where 0 is the direction of concentration and the normalizing constant I0 .k/ is a Bessel function. An alternative expression is exp.kx0 / dS ee ; D I0 .k/ 2 e
"
# cos O0 ; jjxjj D 1: sin O0
Von Mises’ question clearly has to do with the truth of the hypothesis 1 H0 W 0 D 0 or D e1 D : 0
372
7 Overdetermined System of Nonlinear Equations on Curved Manifolds
It is worth mentioning that Fisher found the same distribution in another context (Fisher, 1956, SMSI, pp. 133–138) as the conditional distribution of x , given e kx k D 1 when x is N2 .¯; k 1I2 /. e e e
7-32 Oblique Map Projection A special way to derive the general representation of the two-dimensional generalized Fisher sphere is by forming the general map of S2 onto R2 . In order to follow a systematic approach, let us denote by fA; g the “real” metaversus by f˛; rg resp. fx; yg, longitude/meta-colatitude the meta-longitude/metarepresenting a point on the latitude representing the plane mapped values on the tangential sphere given by f ; ˚ g given by f ; ˚ g, alternatively by its polar coordinates fx; yg At first, we want to derive the equations generating an oblique map projection of S2R onto T0 S2R of equiareal type. ˛ D A; r D 2R sin =2 versus x D 2R sin =2 cos A y D 2R sin =2 sin A: Second, we intend to derive the transformation from the local surface element dAd sin to the alternate local surface element jJjd˛dr sin .˛; r/ by means of the inverse Jacobian determinant 2 2 3 3 d˛ dA 0 0 6 dA 6 d˛ 7 7 1 4 dr 5 D jJ j jJj D 4 d 5 0 0 d dr 2 3 @˛ @˛ DA ˛ 0 6 @A @ 7 1 J D 4 @r @r 5 D 0 D r @A @ # " 1 1 0 D˛ A Dr A : D JD ; jJj D 1 0 D˛ Dr R cos =2 R cos =2 Third, we read the inverse equations of an oblique map projection of S2R of equiareal type.
7-3 Special Models
373
r r r2 ; cos D 1 A D ˛; sin D 2 2R 2 4R2 r r r2 r 2 sin D 2 sin cos D 2 sin 1 sin : D 1 2 2 2 2 R 4R2 We collect our basic results in a few lemmas. Lemma 7.7. dAd sin D
rd˛dr R2
Lemma 7.8. (oblique azimuthal map projection of S2R / of equiareal type): direct equations ˛ D A; r D 2R sin =2 inverse equations r A D ˛; sin D R
r 1
r 2 : 2R
Lemma 7.9. dAd sin D
1 d˛ rdr dxdy D : 2 R R2
Lemma 7.10. (oblique azimuthal map projection of S2R / of equiareal type): direct equations x D 2R sin
cos A; y D 2R sin sin A 2 2 inverse equations
y 1 p 2 ; sin D x C y2 x 2 2R r p 1p 2 x2 C y 2 1 p 2 sin D x C y2 1 D x C y 2 4R2 .x 2 C y 2 /: 2 2 R 4R 2R tan A D
374
7 Overdetermined System of Nonlinear Equations on Curved Manifolds
Fig. 7.1 Left: Tangential plane. Right: Tangential plane
Lemma 7.11. (change from one chart in another chart (“cha-cha-cha”), Kartenwechsel): The direct equations of the transformation of spherical longitude and spherical latitude f ; ˚ g into spherical meta-longitude and spherical meta-colatitude fA; g are established by
cot A D Œ cos ˚ tan ˚ C sin ˚ cos. /= sin. / cos D cos ˚ cos ˚ cos. / C sin ˚ sin ˚ /
with respect to a meta-North pole f; ˚g. In contrast, the inverse equations of the transformation of spherical meta-longitude and spherical meta-colatitude fA; g into spherical longitude and spherical latitude f ; ˚ g read
cot. / D Œ sin ˚ cos A C cos ˚ cot = sin A sin ˚ D sin ˚ cos C cos ˚ sin cos A :
We report two key problems (Fig. 7.1). First, in the plane located at . ; ˚ / we place the circular normal distribution x D 2.1 cos / cos A D r. / cos ˛ y D 2.1 cos / sin A D r. / sin ˛ or A D ˛; r D 2.1 cos / as an alternative. A natural generalization towards an oblique normal distribution will be given by x D x cos " y sin "
x D x cos " C y sin " or
y D x sin " C y cos "
y D x sin " C y cos "
7-3 Special Models
375
and .x /2 x C .y /2 y D .x cos " C y sin "/2 x C .x sin " C y cos "/2 y D x 2 cos2 " x C x 2 sin2 " y C y 2 sin2 " x C y 2 cos2 " y Cxy.sin " cos " x sin " cos " y / D x 2 .cos2 " x C sin2 " y / C y 2 .sin2 " x C cos2 " y / Cxy sin " cos ". x y /: The parameters . x ; y / determine the initial values of the elliptic curve p p representing the canonical data set, namely .1= x ; 1= y /. The circular normal distribution is achieved for the data set x D y D 1: Second, we intend to transform the representation of coordinates of the oblique normal distribution from Cartesian coordinates in the oblique equatorial plane to curvilinear coordinates in the spherical reference frame: x 2 .cos2 " x C sin2 " y / C y 2 .sin2 " x C cos2 " y / C xy sin " cos ". x y / D r 2 cos2 ˛.cos2 " x C sin2 " y / C r 2 sin2 ˛.sin2 " x C cos2 " y / C r 2 sin ˛ cos ˛ sin " cos ". x y / D r 2 .; A/ cos2 A.cos2 " x C sin2 " y /Cr 2 .; A/ sin2 A.sin2 " x C cos2 " y / C r 2 .; A/ sin A cos A sin " cos ". x y /: Characteristically, the radical component r.; A/ be a function of the colatitude and of the azimuth A. The angular coordinate is preserved, namely ˛ D A. Here, our comments on the topic of the oblique normal distribution are finished.
7-33 A Note on the Angular Metric We intend to prove that an angular metric fulfils all the four axioms of a metric. Let us begin with these axioms of a metric, namely cos ˛ D
< xjy > < xjy > ; 80 ˛ , ˛ D arccos1 jjxjj jjyjj jjxjj jjyjj
based on Euclidean metric forms. Let us introduce the distance function d
x y ; jjxjj jjyjj
D arccos
< xj y > jjxjj jjyjj
376
to fulfill
7 Overdetermined System of Nonlinear Equations on Curved Manifolds
M1 W d.x; y/ 0 M 2 W d.x; y/ D 0 , x D y M 3 W d.x; y/ D d.y; x/ M 4 W d.x; z/ d.x; y/ C .y; z/ W Triangular Inequality:
Assume jjxjj2 D jjyjj2 D jjzjj2 D 1: Axioms M1 and M3 are easy to prove, Axiom M 2 is not complicated, but the Triangular Inequality requires work. Let x; y; z2 X; ˛ D d.x; y/; ˇ D d.y; z/and D d.x; z/, i.e. ˛; ˇ; 2 Œ0; ; cos ˛ D< x; y >; cos ˇ D< y; z >; cos D< x; z > : We wish to prove ˛ C ˇ. This result is trivial in the case cos , so we may assume ˛ C ˇ 2 Œ0; . The third desired inequality is equivalent to cos cos.˛ C ˇ/. The proof of the basic formulas relies heavily on the properties of the inverse product: 3 < u C u0 ; v >D< u; v > C < u0 ; v > 5 for all u; u0 ; v; v0 2 R3 ; and 2 R: < u; v C v0 >D< u; v > C < u; v0 > < u; v > D < u; v > D < u; v > Define x0 ; z0 2 R3 by x D .cos ˛/y C x0 ; z D .cos ˇ/y z0 ; then < x0 ; z0 >D< x .cos ˛/y; z C .cos ˇ/y > D < x; z > C.cos ˛/ < y; z > C.cos ˇ/ < x; y > .cos ˛/.cos ˇ/ < y; y > D cos C cos ˛ cos ˇ C cos ˛ cos ˇ cos ˛ cos ˇ D cos C cos ˛ cos ˇ: In the same way jjx0 jj2 D< x; x0 >D 1 cos2 ˛ D sin2 ˛ so that, since 0 ˛ ; jjx0 jj D sin ˛: Similarly, jjz0 jj D sin ˇ: But by Schwarz’ Inequality,< x0 ; z0 > jjx0 jj jjz0 jj: It follows that cos cos ˛ cos ˇ sin ˛ sin ˇ D D cos.˛ C ˇ/ and we are done!
7-4 Case Study Table 7.3 collects 30 angular observations with two different theodolites. The first column contains the number of the directional observations i ; i 2 f1; : : : ; 30g; n D 30. The second column lists in fractions of seconds the directional data, while the third column/fourth column is a printout of cos i = sin i . Table 7.4 is a comparison of g and O as the arithmetic mean. Obviously, on the level of concentration of data, g and O are nearly the same.
7-4 Case Study
377
Table 7.3 (The directional observation data using two theodolites and its calculation): Theodolite 1 No.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 22 23 24 25 26 27 28 29 30 ∑
Valueof Observation (Λi) 76º42′ 17.2′′ 19.5 19.2 16.5 19.6 16.4 15.5 19.9 19.2 16.8 15.0 16.9 16.6 20.4 16.3 16.7 16.0 15.5 19.1 18.8 18.7 19.2 17.5 16.7 19.0 16.8 19.3 20.0 17.4 16.2 Λˆ L = 76º42′17.73′′ sˆ = ± 1.55′′
Theodolite 2
cos Λi
sin Λi
Valueof observation (Λi)
0.229969
0.973198
76º42′19.5′′
0.229958
0.973201
0.229958 0.229959 0.229972 0.229957 0.229972 0.229977 0.229956 0.229959 0.229970 0.229979 0.229970 0.229971 0.229953 0.229973 0.229971 0.229974 0.229977 0.229960 0.229961 0.229962 0.229959 0.229967 0.229971 0.229960 0.229970 0.229959 0.229955 0.229968 0.229973
0.973201 0.973200 0.973197 0.973201 0.973197 0.973196 0.973201 0.973200 0.973198 0.973196 0.973198 0.973197 0.973202 0.973197 0.973197 0.973197 0.973196 0.973200 0.973200 0.973200 0.973200 0.973198 0.973197 0.973200 0.973198 0.973200 0.973201 0.973198 0.973197
19.0 18.8 16.9 18.6 19.1 18.2 17.7 17.5 18.6 16.0 17.3 17.2 16.8 18.8 17.7 18.6 18.8 17.7 17.1 16.9 17.6 17.0 17.5 18.2 18.3 19.8 18.6 16.9 16.7
0.229960 0.229961 0.229970 0.229962 0.229960 0.229964 0.229966 0.229967 0.229962 0.229974 0.229968 0.229969 0.229970 0.229961 0.229966 0.229962 0.229961 0.229966 0.229969 0.229970 0.229967 0.229970 0.229967 0.229964 0.229963 0.229956 0.229962 0.229970 0.229971
0.973200 0.973200 0.973198 0.973200 0.973200 0.973199 0.973199 0.973198 0.973200 0.973197 0.973198 0.973198 0.973198 0.973200 0.973199 0.973200 0.973200 0.973199 0.973198 0.973198 0.973198 0.973198 0.973198 0.973199 0.973199 0.973201 0.973200 0.973198 0.973197
6.898982
29.195958
Λˆ L = 76º42′17.91′′ sˆ = ± 0.94′′
6.898956
29.195968
cos Λi
sin Λi
“The precision of the theodolite two is higher compared to the theodolite one”. Alternatively, let us present a second example. Let there be given observed azimuths i and vertical directions ˚i , by Table 7.3 in detail. First, we compute the solution of the optimization problem (Table 7.5)
378
7 Overdetermined System of Nonlinear Equations on Curved Manifolds
Table 7.4 (Computation of theodolite data Comparison of O and g ): Left data set versus Right data set Theodolite 1 W O D 76ı 420 17:7300 ; sO D 1:5500 Theodolite 2 W O D 76ı 420 17:9100 ; sO D 0:9400 Table 7.5 (Data of type azimuth i and vertical direction ˚i ): i 124ı 9 125ı 2 126ı 1 125ı 7 n X
˚i 88ı 1 88ı 3 88ı 2 88ı 1
2.1 cos i / D
n X
i D1
i 125ı0 124ı9 124ı8 125ı1
˚i 88ı 0 88ı 2 88ı 1 88ı 0
2Œ1 cos ˚i cos ˚ cos.i / sin ˚i sin ˚
i D1
D min
; ˚
subject to values of the central direction n P
tan O D
i D1 n P i D1
cos ˚i sin i cos ˚i cos i
n P
; tan ˚O D s
i D1 n P i D1
cos ˚i cos i
sin ˚i
2
C
n P
i D1
cos ˚i sin i
2 :
This accounts for measurements of data on the horizontal circle and the vertical circle being Fisher normal distributed. We want to tackle two problems: O ˚O / with the arithmetic mean .; N ˚/ N of the data Problem 1: Compare .; set. Why do the results not coincide? O ˚/ O and .; N ˚N / do coincide? Problem 2 : In which case .; Solving Problem 1 Let us compute O ˚O / N ˚/ N D .125ı :212; 5; 88ı :125/ and .; .; D .125ı :206; 664; 5; 88ı :125; 050; 77/ j j D 0ı :005; 835; 5 D 2100 :007; 8; j ˚j D 0ı :000; 050; 7 D 000 :18:
7-4 Case Study
379
The results do not coincide due to the fact that the arithmetic means are obtained by adjusting direct observations with least-squares technology. Solving Problem 2 The results do coincide if the following conditions are met. • All vertical directions are zero • O D N if the observations i ; ˚i fluctuate only “a little” around the constant value 0 ; ˚0 • O D N if ˚i D const: • ˚O D ˚N if the fluctuation of i around 0 is considerably smaller than the fluctuation of ˚i around ˚0 . Note the values 8 X
cos ˚i sin i D 0:213; 866; 2I
8 X
i D1 8 X
cos ˚i sin i
2
D 0:045; 378; 8
i D1
cos ˚i cos i D 0:750; 903; 27I
8 X
i D1
cos ˚i cos i
2
D 0:022; 771; 8
i D1 8 X
cos ˚i D 7:995; 705; 3
i D1
and i D 0 nC ıi versus ˚i D ˚0 C n ı˚i X 1X N Nı N D 1 N ıi versus ı ˚ D ı˚i n i D1 n i D1 n X
cos ˚i sin i D n cos ˚0 sin 0 C cos ˚0 cos 0
i D1
n X
ıi
i D1
sin ˚0 sin 0
n X
ı˚i
i D1
D n cos ˚0 .sin 0 C ı cos 0 ı˚ tan 0 sin 0 / n X
cos ˚i cos i D n cos ˚0 cos 0 cos ˚0 sin 0
i D1
n X
ıi
i D1
sin ˚0 sin 0
n X
ı˚i
i D1
D n cos ˚0 .cos 0 ı sin 0 ı˚ tan 0 cos 0 /
380
7 Overdetermined System of Nonlinear Equations on Curved Manifolds
tan O D
sin 0 C ıNN cos 0 ıN˚N tan ˚0 sin 0 cos 0 ıNN sin 0 ıN˚N tan ˚0 cos 0 tan N D
.
n P i D1 2
sin 0 C ıNN cos 0 cos 0 ıNN sin 0
cos ˚i sin i /2 C . 2
n P i D1 2
cos ˚i cos i /2
D n2 .cos ˚0 C cos2 ı C sin ı˚ 2 sin ˚0 cos ı˚/ n X
sin ˚i D n sin ˚0 C ıN˚N cos ˚0
i D1
n sin ˚0 C ıN˚N cos ˚0 tan ˚O D p n cos2 ˚0 C ıNN 2 cos2 ˚0 C ıN˚N 2 sin2 ˚0 2ıN˚N sin ˚0 cos ˚0 sin ˚0 C ıN˚N cos ˚0 tan ˚N D : cos ˚0 ıN˚N sin ˚0 N ˚O ¤ ˚N holds in general. In consequence, O ¤ ; At the end we will summarize to additional references like E. Batschelet (1965), T.D. Downs and A.L. Gould (1967), E.W. Grafarend (1970), E.J. Gumbel et al (1953), P. Hartmann et al (1974), and M.A. Stephens (1969). Comments Data on the circle or on the sphere are examples for measurements of a Riemann manifold with curvature called RIEMANN of dimension p. A circle is a Riemann manifold if dimension p D 1, but a sphere a Riemann manifold of dimension p D 2. A minimum atlas of the circle can be constructed by two charts, a minimum atlas of the sphere again by two charts. In simple words, we need two charts to cover the circle or the sphere. For instance, by the stereographic projection of the following type. .i / First Chart Stereographic projection with projection centre at the South pole and projection plane at the North pole. .ii/ Second Chart Stereographic projection with projection centre at the North pole and projection plane at the South pole. An illustration is referred to E. Grafarend and F. Krumm (pages 168f, 211f and 217f (2006)). These projections are classified as conformal, they establish a conformism. A p-dimension of type von Mises–Fisher (or Langevin, in G.S. Watson’s terminology) distribution Mp ./ is a uniform distribution if
7-4 References
381
f .xI ; / D
.p1/=2 2
1 p1 T . p1 2 /I. 2 /./
exp. < jx >/
holds. The embedding of a p-dimension Riemann manifold into a p.p C 1/dimensional Euclidean manifold is an important issue. For instance p D 1 ) Riemann M1 , Euclid E1 p D 2 ) Riemann M2 , Euclid E3 p D 3 ) Riemann M3 , Euclid E4 p D 4 ) Riemann M4 , Euclid E10 : .Ex W relativity/ K.V. Mardia and P.E. Jupp (2000) mention applications of the von Mises–Fisher distribution in Earth Sciences, meteorology, biology, physics, psychology, image analysis and astronomy. For instance, GPS (Global Positioning System) for the analysis of phase observations (J. Cai and E. Grafarend 2007) is a very important area of research as well as the diagnosis of directional data in gravitostatics, magnetostatics, and electrostatics and in problems dealing with spherical harmonics, both in synthetic as well as in analysis, namely Fourier and Fourier–Legendre series. An open question is still the type of distribution on other Riemann manifolds like ellipse, hyperbola or ellipsoid hyperboloids, one sided or two sided, for instance.
References Anderson, T.W. and M.A. Stephens (1972), Arnold, K.J. (1941), Barndorff-Nielsen, O. (1978), Batschelet, E. (1965), Batschelet, E. (1971), Batschelet, E. (1981), Beran, R.J. (1968), Beran, R.J. (1979), Blingham, C. (1964), Blingham, C. (1974), Chang, T. (1986) Downs, T. D. and Gould, A. L. (1967), Durand, D. and J.A. Greenwood (1957), Enkin, R. and Watson, G. S. (1996), Fisher, R. A. (1953), Fisher, N.J. (1985), Fisher, N.I. (1993), Fisher, N.I. and Lee A. J. (1983), Fisher, N.I. and Lee A. J. (1986), Fujikoshi, Y. (1980), Girko, V.L. (1985), Goldmann, J. (1976), Gordon, L. and M. Hudson (1977), Grafarend, E. W. (1970), Greenwood, J.A. and D. Durand (1955), Gumbel, E. J., Greenwood, J. A. and Durand, D. (1953), Hammersley, J.M. (1950), Hartman, P. and G. S. Watson (1974), Hetherington, T.J. (1981), Jensen, J.L. (1981), Jupp, P.E. and (1980), Jupp, P. E. and Mardia, K. V. , Kanwal, R.P. (1971), Kariya, T. (1989), (1989), Kendall, D.G. (1974), Kent, J. (1976), Kent, J.T. (1982), Kent, J.T. (1983), Krumbein, W.C. (1939), Langevin, P. (1905), Laycock, P.J.
382
7 Overdetermined System of Nonlinear Equations on Curved Manifolds
(1975), Lenmitz, C. (1995), Lenth, R.V. (1981), Lord, R.D. (1948), Lund, U. (1999), Mardia, K.V. (1972), Mardia, K.V. (1975), Mardia, K. V. (1988), Mardia, K. V. and Jupp, P. E. (1999), Mardia, K.V. et al. (2000), Mhaskar, H.N., Narcowich, F.J. and J.D. Ward (2001), Muller, C. (1966), Neudecker, H. (1968), Okamoto, M. (1973), Parker, R.L. et al (1979), Pearson, K. (1905), Pearson, K. (1906), Pitman, J. and M. Yor (1981), Presnell, B., Morrison, S.P. and R.C. Littell (1998), Rayleigh, L. (1880), Rayleigh, L. (1905), Rayleigh, R. (1919), Rivest, L.P. (1982), Rivest, L.P. (1988), Roberts, P.H. and H.D. Ursell (1960), Sander, B. (1930), Saw, J.G. (1978), Saw, J.G. (1981), Scheidegger, A.E. (1965), Schmidt-Koenig, K. (1972), Selby, B. (1964), Sen Gupta, A. and R. Maitra (1998), Sibuya, M. (1962), Stam, A.J. (1982), Stephens, M.A. (1963), Stephens, M.A. (1964), Stephens, M. A. (1969), Stephens, M.A. (1979), Tashiro, Y. (1977), Teicher, H. (1961), Von Mises, R. (1918), Watson, G.S. (1956a, 1956b), Watson, G.S. (1960), Watson, G.S.(1961), Watson, G.S. (1962), Watson, G.S. (1965), Watson, G.S. (1966), Watson, G.S. (1967a, 1967b), Watson, G.S. (1968), Watson, G.S. (1969), Wat-son, G.S. (1970), Watson, G.S. (1974), Watson, G.S. (1981a, 1981b), Watson, G.S. (1982a, 1982b, 1982c, 1982d), Watson, G.S. (1983), Watson, G.S. (1986), Watson, G.S. (1988), Watson, G.S. (1998), Watson, G.S. and E.J. Williams (1956), Watson, G.S. and E..Irving (1957), Watson, G.S. and M.R. Leadbetter (1963), Watson, G.S. and S. Wheeler (1964), Watson, G.S. and R.J. Beran (1967), Watson, G.S., R. Epp and J.W. Tukey (1971), Wellner, J. (1979), Wood, A. (1982), Xu, P.L. (1999), Xu, P.L. (2001), Xu, P. (2002), Xu, P.L. et al. (1996a, 1996b), Xu, P.L., and Shimada, S.(1997).
Chapter 8
The Fourth Problem of Probabilistic Regression Special Gauss–Markov model with random effects Setup of BLIP and VIP for the central moments of first order The random effect model as a special Gauss-Markov model with random effects is an extension of the classical Gauss-Markov model: both effect, namely the vector y of observations as well as the vector of the regressor z (derived from the German “Zufall”) are random. Box 1 is a review of the model. Solutions are offered as (a) homogeneous best linear, minimum mean square predictor (hom BLIP), (b) homogeneous S-weighted best linear, minimum mean square predictor (hom S-BLIP) and (c) homogeneous ˛-weighted linear predictor (hom ˛-VIP). Over we take advantage of C.R. Rao’s definition of “bias in the mean” by his substitute matrix S. Essential is the decomposition. ZQ Z D L.y Efyg/ .z Efzg/ ŒIl LCEfzg As well as the Mean Square Prediction Error, the modified Mean Square Prediction Error and the related Frobenius matrix norms. Another decomposition is essential: MSPE fzN g can be divided in three components: 1. The dispersion matrix DfzN g 2. The dispersion matrix Dfzg 3. The bias product ““0 zN hom BLIP of z, zN S-hom BLIP and zQ normhybrid min var-min bias, ˛weighted or hom ˛-VIP are defined in Definitions 8.1–8.3. Theorems 8.5–8.7 collect the related representations of the solutions for the moments of first order. The open problem which is left, namely the choice of weight of the bias matrix, manifest of the weight factor. We discuss the A-optimal design of • tr D(zN / D min • ““0 D min • tr MSPEfzN g D min˛ While the theory of the random effect model is well established, this is not true for various applications. Therefore, we pay attention to three examples. The first example concentrates on three cases of nonlinear scalar error propagation with the random effect model. In case 1, we assume the mean value z is known, while for case 2 this value is unknown. Case 3 is built on z unknown, but z0 known as a random effect approximation. The second example is oriented to nonlinear vector-valued error propagation with random effect model taken from a geoinformatics, the quality design of a nearly rectangular planar surface element. We pose two questions: E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 8, © Springer-Verlag Berlin Heidelberg 2012
383
384
8 The Fourth Problem of Probabilistic Regression
1. What is the structure of the variance-covariance matrix? 2. How to approximate the chosen criterion matrix in terms of absolute coordinates? We have chosen a statical homogeneous and isotropic network of TaylorKarman type within two dimensions. The third example treats nonlinear vector valued error propagation with random effects applied to two-dimensional distance network. We have computed the Jacobi as well as the Hesse matrix for the Taylor-Karman structured criterion matrix. Fast track reading : Read only Theorems 8.5, 8.6 and 8.7.
Lemma 8.4 hom BLIP, hom S-BLIP and hom α-VIP
Definition 8.1 z~: hom BLIP of z
Theorem 8.5 z~: hom BLIP of z
Definition 8.2 z~: S-hom BLIP of z
Theorem 8.6 z~ : S-hom BLIP of z
Definition 8.3 z~: hom α-VIP of z
Theorem 8.7 z~: hom α-VIP of z
The general model of type “fixed effects”, “random effects” and “error-invariables” will be presented in our final chapter: Here we focus on “random effects” (Fig. 8.1).
8-1 The Random Effect Model Let us introduce the special Gauss–Markov model with random effects y D Cz C ey Cez specified in Box 8.1. Such a model is governed by two identities, namely the first identity CEfzg D Efyg of moments of first order and the second identity Dfy Czg C CDfzgC0 D Dfyg of central moments of second order.
8-1 The Random Effect Model
385
Fig. 8.1 Magic triangle
The first order moment identity CEfzg D Efyg relates the expectation Efzg of the stochastic, real-valued vector z of unknown random effects (“Zufallseffekte”) to the expectation Efyg of the stochastic, real-valued vector y of observations by means of the non-stochastic (“fixed”) real-valued matrix C 2 Rnl of rank C D 1; n DD d i mY is the dimension of the observation space Y, l D d i m Z the dimension of the parameter space Z of random effects z. The second order central moment identity † yCz C C† z C0 D † y relates the variance-covariance matrix †yCz of the random vector y Cz, also called dispersion matrix Dfy Czg and the variancecovariance matrix † z of the random vector z, also called dispersion matrix Dfzg, to the variance-covariance matrix † y of the random vector y of the observations, also called dispersion matrix Dfyg. In the simple random effect model we shall assume (i) rk † y D n and (ii) C fy; zg D 0, namely zero correlation between the random vector y of observations and the vector z of random effects. (In the random effect model of type Kolmogorov–Wiener we shall give up such a zero correlation.) There are three types of unknowns within the simple special Gauss–Markov model with random effects: (i) the vector z of random effects is unknown, (ii) the fixed vectors Efyg; Efzg of expectations of the vector y of observations and of the vector z of random effects (first moments) are unknown and (iii) the fixed matrices † y ; † z of dispersion matrices Dfyg; Dfzg (second central moments) are unknown. Box 8.1. (Special Gauss–Markov model with random effects): y D Cz C ey Cez Efyg D CEfzg 2 Rn Dfyg D Dfy Czg C CDfzgC0 2 Rnn C fy; zg D 0 z; Efzg; Efyg; † yCz ; † z unk nown dim R.C0 / D rkC D l:
386
8 The Fourth Problem of Probabilistic Regression
Here we focus on best linear predictors of type hom BLIP, hom S-BLIP and hom ˛-VIP of random effects z, which turn out to be better than the best linear uniformly unbiased predictor of type hom BLUUP. At first let us begin with a discussion of the bias vector and the bias matrix as well as of the Mean Square Prediction Error MSPEfQzg with respect to a homogeneous linear prediction zQ D Ly of random effects z based upon Box 8.2. Box 8.2. (Bias vector, bias matrix Mean Square Prediction Error in the special Gauss–Markov model with random effects) Efyg D CEfzg Dfyg D Dfy Czg C CDfzgC
(8.1) 0
(8.2)
“ansatz” zQ D Ly
(8.3)
bias vector ˇ WD EfQz zg D EfQzg Efzg
(8.4)
ˇ D LEfyg Efzg D ŒI` LCEfzg
(8.5)
bias matrix B WD I` LC
(8.6)
decomposition zQ z D zQ EfQzg .z Efzg/ C .EfQzg Efzg/
(8.7)
zQ z D L.y Efyg/ .z Efzg/ ŒI` LCEfzg
(8.8)
Mean Square Prediction Error MSPEfQzg WD Ef.Qz z/.Qz z/0 g MSPEfQzg D LDfygL0 C Dfzg C ŒI` LCEfzgEfzg0ŒI` LC0
(8.9) (8.10)
.C fy; zg D 0; EfQz EfQzgg D 0; Efz Efzgg D 0/ modified Mean Square Prediction Error MSPES fQzg WD LDfygL0 C Dfzg C ŒI` LC S ŒI` LC0
(8.11)
8-1 The Random Effect Model
387
Frobenius matrix norms jjMSPEfQzgjj2 WD tr Ef.Qz z/.Qz z/0 g
(8.12)
jjMSPEfQzgjj2 D tr LDfygL0 C tr Dfzg C tr ŒI` LCEfzgEfzg0ŒI` LC0 D jjL0 jj2˙y C jj.I` LC/0 jj2Ef.z/E.z/0 C tr Ef.z Efzg/.z Efzg/0 g
(8.13)
jjMSPES fQzgjj2 W WD tr LDfygL0 C tr ŒI` LCSŒI` LC0 C tr Dfzg D jjL0 jj2˙y C jj.I` LC/0 jj2S C tr Ef.z Efzg/.z Efzg/0 g
(8.14)
hybrid minimum variance – minimum bias norm ˛-weighted 1 jj.I` LC/0 jj2S ˛ special model
L.L/ WD jjL0 jj2˙y C
dim R.SC0 / D rkSC0 D rkC D l:
(8.15)
(8.16)
The bias vector ˇ is conventionally defined by EfQzg Efzg subject to the homogeneous prediction form zQ DLy. Accordingly the bias vector can be represented by (8.5) ˇ D ŒI` LCEfzg. Since the expectation Efzg of the vector z of random effects is unknown, there has been made the proposal to use instead the matrix I` LC as a matrix-valued measure of bias. A measure of the prediction error is the Mean Square prediction error MSPEfQzg of type (8.9). M SPEfQzg can be decomposed into three basic parts: • The dispersion matrix DfQzg D LDfygL0 • The dispersion matrix Dfzg • The bias product ˇˇ 0 . Indeed the vector zQ z can also be decomposed into three parts of type (8.7), (8.8) namely (i) zQ EfQzg, (ii) z Efzg and (iii) EfQzg Efzg which may be called prediction error, random effect error and bias, respectively. The triple decomposition of the vector zQ z leads straightforward to the triple representation of the matrix M SPEfQzg of type (8.10). Such a representation suffers from two effects: Firstly the expectation Efzg of the vector z of random effects is unknown, secondly the matrix EfzgEfz0g has only rank 1. In consequence, the matrix ŒI` LCEfzgEfzg0ŒI` LC0 has only rank 1, too. In this situation there has made the proposal to modify M SPEfQzg by the matrix EfzgEfz0g and by the regular matrix S. M SPEfQzg has been defined by (8.11). A scalar measure of M SPEfQzg as well as M SPEfQzg are the Frobenius norms (8.12), (8.13), (8.14). Those scalars constitute the optimal risk in Definition 8.1 (hom BLIP) and Definition 8.2 (hom S-BLIP). Alter-natively a homogeneous ˛-weighted hybrid minimum variance–minimum
388
8 The Fourth Problem of Probabilistic Regression
min bias
balance between variance and bias
min variance
Fig. 8.2 Balance of variance and bias by the weight factor ˛
bias prediction (hom VIP) is presented in Definition 8.3 (hom ˛-VIP) which is based upon the weighted sum of two norms of type (8.15), namely • Average variance jjL0 jj2˙y D t rL˙y L0 • Average bias jj.I` LC/0 jj2S D t r ŒI` LC S ŒI` LC0 . The very important predictor ˛-VIP is balancing variance and bias by the weight factor ˛ which is illustrated by Fig. 8.2. Definition 8.1. (Qz hom BLIP of z): An l1 vector zQ is called homogeneous BLIP of z in the special linear Gauss– Markov model with random effects of Box 8.1, if (1st) zQ is a homogeneous linear form zQ D Ly;
(8.17)
(2nd) in comparison to all other linear predictions zQ has the minimum Mean Square Prediction Error in the sense of jjM SPEfQzgjj2 D tr LDfygL0 C tr Dfzg C tr ŒI` LCEfzgEfzg0ŒI` LC0 D jjL0 jj2˙y Cjj.I` LC/0 jj2Ef.z/E.z/0 Ctr Ef.zEfzg/.zEfzg/0 g: (8.18) Definition 8.2. (Qz S-hom BLIP of z): An l1 vector zQ is called homogeneous S-BLIP of z in the special linear Gauss– Markov model with random effects of Box 8.1, if (1st) zQ is a homogeneous linear form zQ D Ly;
(8.19)
(2nd) in comparison to all other linear predictions zQ has the minimum S-modified Mean Square Prediction Error in the sense of
8-1 The Random Effect Model
389
jjM SPES fQzgjj2 WD tr LDfygL C tr ŒI` LCSŒI` LC0 C tr Ef.z Efzg/.z Efzg/0 g 0
D jjL0 jj2˙y C jj.I` LC/0 jj2S C tr Ef.z Efzg/.z Efzg/0 g D min : L
(8.20)
Definition 8.3. (Qz hom hybrid min var-min bias solution, ˛-weighted or hom ˛-VIP): An l1 vector zQ is called homogeneous ˛-weighted hybrid minimum variance– minimum bias prediction (hom ˛-VIP) of z in the special linear Gauss–Markov model with random effects of Box 8.1, if (1st) zQ is a homogeneous linear form zQ D Ly;
(8.21)
(2nd) in comparison to all other linear predictions zQ has the minimum varianceminimum bias in the sense of the ˛-weighted hybrid norm 1 tr .I` LC/ S .I` LC/0 ˛ 1 C jj.I` LC/0 jj2S D min L ˛
tr LDfygL0 C D
jjL0 jj2˙y
(8.22)
in particular with respect to the special model ˛ 2 RC ; dim R.SC0 / D rkSC0 D rkC D l: The predictions zQ hom BLIP, hom S-BLIP and hom ˛-VIP can be characterized as follows: Lemma 8.4. (hom BLIP, hom S-BLIP and hom ˛-VIP): An l1 vector zQ is hom BLIP, hom S-BLIP or hom ˛-VIP of z in the special linear Gauss–Markov model with random effects of Box 8.1, if and only if the matrix LO fulfils the normal equations (1st) hom BLIP:
(2nd) hom S-BLIP: (3rd) hom ˛-VIP:
.˙y C CEfzgEfzg0C0 /LO 0 D CEfzgEfzg0
(8.23)
.˙y C CSC0 /LO 0 D CS
(8.24)
1 1 ˙y C CSC0 LO 0 D CS ˛ ˛
(8.25)
390
8 The Fourth Problem of Probabilistic Regression
Proof. (i) hom BLIP: The hybrid norm jjM SPEfQzgjj2 establishes the Lagrangean L.L/ WD tr L˙y L0 C tr .Il LC/EfzgEfzg0.Il LC/0 C tr ˙z D min L
for zQ hom BLIP of z. The necessary conditions for the minimum of the quadratic Lagrangean L.L/ are @L O .L/ WD 2Œ˙y LO 0 C CEfzgEfzg0C0 LO 0 CEfzgEfzg0 D 0; @L which agree to the normal equations (8.23). (The theory of matrix derivatives is reviewed in Appendix A.7 (Facts: derivative of a scalar-valued function of a matrix: trace)). The second derivatives @2 L O >0 .L/ @.vecL/@.vecL/0 at the “point” LO constitute the sufficiency conditions. In order to compute such an lnln matrix of second derivatives we have to vectorize the matrix normal equation @L O O y C CEfzgEfzg0C0 / 2EfzgEfzg0C0 .L/ WD 2L.˙ @L @L O WD vecŒ2L.˙ O y C CEfzgEfzg0C0 / 2EfzgEfzg0C0 : .L/ @.vecL/ (ii) hom S-BLIP: The hybrid norm jjM SPEsfQzgjj2 establishes the Lagrangean L.L/ WD tr L˙y L0 C tr .I` LC/S.I` LC/0 C tr ˙z D min L
for zQ hom S-BLIP of z. Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L.L/ @L O .L/ WD 2Œ˙y LO 0 C CSC0 LO 0 CS0 D 0 @L as well as to the sufficiency conditions @2 L O D 2Œ.˙y C CSC0 / ˝ I` > 0 : .L/ @.vecL/@.vecL/0 O D 0 agree to (8.24). The normal equations of hom S-BLIP @
[email protected]/
8-1 The Random Effect Model
391
(iii) hom ˛-VIP: The hybrid norm jjL0 jj2˙y C
1 jj.I` LC/0 jj2S establishes the Lagrangean ˛
L.L/ WD tr L˙y L0 C
1 tr .I` LC/S.I` LC/0 D min L ˛
for zQ hom ˛-VIP of z. Following the first part of the proof we are led to the necessary conditions for the minimum of the quadratic Lagrangean L.L/ @L O .L/ D 2Œ.˙y C CEfzgEfzg0C0 / ˝ I` vecLO 2vec.EfzgEfzg0C0 /: @L The Kronecker–Zehfuss Product A ˝ B of two arbitrary matrices as well as .ACB/ ˝C D A˝BCB˝C of three arbitrary matrices subject to d i mA D d i mB is introduced in Appendix A. (Definition of Matrix Algebra: multiplication matrices of the same dimension (internal relation) and multiplication of matrices (internal relation) and Laws). The vec operation (vectorization of an array) is reviewed in Appendix A, too. (Definition, Facts: vecAB D .B0 ˝ I0 ` /vecA for suitable matrices A and B.) Now we are prepared to compute @2 L O D 2Œ.˙y C CEfzgEfzgC0/ ˝ I` > 0 .L/ @.vecL/@.vecL/0 as a positive definite matrix. (The theory of matrix derivatives is reviewed in Appendix B (Facts: derivative of a matrix-valued function of a matrix, namely @.vecX/
[email protected]/0 ). @L O O 0 1 CS0 ˛ D 0 .L/ D 2Œ ˛1 CSC0 LO 0 C ˙y L ˛ @L as well as to the sufficiency conditions @2 L O D 2Œ. 1 CSC0 C ˙y / ˝ I` > 0: .L/ ˛ @.vecL/@.vecL/0 O D 0 agree to (8.25). The normal equations of hom ˛-VIP @
[email protected]/ | For an explicit representation of zQ as hom BLIP, hom S-BLIP and hom ˛-VIP of z in the special Gauss–Markov model with random effects of Box 8.1, we solve the normal equations (8.23), (8.24) and (8.25) for O D argfL.L/ D ming: L L
Beside the explicit representation of zQ of type hom BLIP, hom S-BLIP and hom ˛VIP we compute the related dispersion matrix DfQzg, the Mean Square Prediction
392
8 The Fourth Problem of Probabilistic Regression
Error M SPEfQzg, the modified the Mean Square Prediction Error M SPES fQzg and M SPE˛;S fQzg and the covariance matrices C fz; zQ zg in Theorem 8.5. (Qz hom BLIP): Let zQ D Ly be hom BLIP of z in the special linear Gauss–Markov model with random effects of Box 8.1. Then equivalent representations of the solutions of the normal equations (8.23) are zQ D EfzgEfzg0C0 Œ˙y C CEfzgEfzg0C0 1 y D EfzgEfzg0C0 Œ˙yCz C C˙z C0 C CEfzgEfzg0C0 1 y
(8.26)
(if Œ˙y C CEfzgEfzg0C0 1 exists) are completed by the dispersion matrix DfQzg D EfzgEfzg0C0 Œ˙y C CEfzgEfzg0C0 1 ˙y Œ˙y C CEfzgEfzg0C0 1 CEfzgEfzg0 by the bias vector (8.5) ˇ W D EfQzg Efzg D ŒI` EfzgEfzg0C0 .CEfzgEfzg0C0 C ˙y /1 CEfzg
(8.27)
(8.28)
and by the matrix of the Mean Square Prediction Error M SPEfQzg: M SPEfQzg WD Ef.Qz z/.Qz z/0 g D DfQzg C Dfzg C ˇˇ 0
(8.29)
M SPEfQzg WD DfQzg C Dfzg C ŒI` EfzgEfzg0C0 .CEfzgEfzg0C0 C ˙y /1 C EfzgEfzg0ŒI` C0 .CEfzgEfzg0C0 C ˙y /1 CEfzgEfzg0: (8.30) At this point we have to comment what Theorem 8.5 tells us. hom BLIP has generated the predictionQz of type (8.26), the dispersion matrix DfQzg of type (8.27), the bias vector of type (8.28) and the Mean Square Prediction Error of type (8.30) which all depend on the vector Efzg and the matrix EfzgEfzg0, respectively. We already mentioned that Efzg and EfzgEfzg0 are not accessible from measurements. The situation is similar to the one in hypothesis theory. As shown later in this section we can produce only an estimator Efzg and consequently can setup a hypothesis first moment Efzg of the “random effect” z. Indeed, a similar argument applies to
b
8-1 The Random Effect Model
393
the second central moment Dfyg ˙y of the “random effect” y, the observation vector. Such a dispersion matrix has to be known in order to be able to compute zQ , DfQzg, and M SPEfQzg. Again we have to apply the argument that we are only able to construct an estimate ˙O y0 and to setup a hypothesis about Dfyg ˙y . Theorem 8.6. (Qz hom S-BLIP): Let zQ D Ly be hom S-BLIP of z in the special linear Gauss–Markov model with random effects of Box 8.1. Then equivalent representations of the solutions of the normal equations (8.24) are zQ D SC0 .˙y C CSC0 /1 y D SC0 .˙yCz C C˙z C0 C CSC0 /1 y
(8.31)
zQ D .C0 ˙y1 C C S1 /1 C0 ˙y1 y
(8.32)
zQ D .I` C SC0 ˙y1 C/1 SC0 ˙y1 y
(8.33)
(if S1 ; ˙y1 exist) are completed by the dispersion matrices DfQzg D SC0 .CSC0 C ˙y /1 ˙y .CSC0 C ˙y /1 CS
(8.34)
DfQzg D .C0 ˙y1 C C S1 /1 C0 ˙y1 C.C0 ˙y1 C C S1 /1
(8.35)
(if S1 ; ˙y1 exist) by the bias vector (8.5)
ˇ WD EfQzg Efzg D ŒI` SC0 .CSC0 C ˙y /1 CEfzg ˇ D ŒIl .C0 ˙y1 C C S1 /1 C0 ˙y1 CEfzg
(8.36)
(if S1 ; ˙y1 exist) and by the matrix of the modified Mean Square Prediction Error MSPEfQzg: M SPES fQzg WD Ef.Qz z/.Qz z/0 g D DfQzg C Dfzg C ˇˇ 0 M SPES fQzg D ˙z C SC0 .CSC0 C ˙y /1 ˙y .CSC0 C ˙y /1 CS CŒI` SC0 .CSC0 C ˙y /1 CEfzgEfzg0ŒIl C0 .CSC0 C ˙y /1 CS
(8.37)
(8.38)
394
8 The Fourth Problem of Probabilistic Regression
MSPES fQzg D ˙z C .C0 ˙y1 C C S1 /1 C0 ˙y1 C.C0 ˙y1 C C S1 /1 CS C ŒI` .C0 ˙y1 C C S1 /1 C0 ˙y1 CEfzgEfzg0 ŒI` C0 ˙y1 C.C0 ˙y1 C C S1 /1
(8.39)
(if S1 ; ˙y1 exist). The interpretation of hom S-BLIP is even more complex. In extension of the comments to hom BLIP we have to live with another matrix-valued degree of freedom, zQ of type (8.31), (8.32), (8.33) and DfQzg of type (8.34), (8.35) do no longer depend on the inaccessible matrix EfzgEfzg0, rk.EfzgEfzg0), but on the “bias weight matrix” S, rk S = l. Indeed we can associate any element of the bias matrix with a particular weight which can be “designed” by the analyst. Again the bias vector ˇ of type (8.36) as well as the Mean Square Prediction Error of type (8.37), (8.38), (8.39) depend on the vector Efzg which is inaccessible. Beside the “bias weight matrix S” zQ , DfQzg, ˇ and M SPEfQzg are vector-valued or matrix-valued functions of the dispersion matrix Dfyg ˙y of the stochastic observation vector which is inaccessible. By hypothesis testing we may decide upon the construction b y. of Dfyg ˙y from an estimate ˙ Theorem 8.7. (Qz hom a-VIP): Let zQ D Ly be hom ˛-VIP of z in the special linear Gauss–Markov model with random effects Box 8.1. Then equivalent representations of the solutions of the normal equations (8.25) are 1 1 0 1 SC ˙y C CSC0 y ˛ ˛ 1 1 1 y D SC0 ˙yCz C C˙z C0 C CSC0 ˛ ˛
zQ D
zQ D .C0 ˙y1 C C ˛S1 /1 C0 ˙y1 y zQ D .I` C
1 0 1 1 1 0 1 SC ˙y C/ SC ˙y y ˛ ˛
(8.40)
(8.41) (8.42)
(if S1 ; ˙y1 exist) are completed by the dispersion matrix 1 1 1 0 1 1 1 0 0 DfQzg D SC ˙y C CSC ˙y ˙y C CSC CS ˛ ˛ ˛ ˛ 1 0 1 0 1 1 DfQzg D C0 ˙y1 C C ˛S1 C ˙y C C ˙y C C ˛S1
(8.43) (8.44)
8-1 The Random Effect Model
395
(if S1 ; ˙y1 exist) by the bias vector (8.5)
ˇ WD EfQzg Efzg " 1 # 1 0 1 0 CSC C ˙y D I` SC C Efzg ˛ ˛ ˇ D ŒI` .C0 ˙y1 C C ˛S1 /1 C0 ˙y1 CEfzg 1
(if S ;
˙y1
(8.45)
exist)
and by the matrix of the Mean Square Prediction Error M SPEfQzg: M SPEfQzg WD Ef.Qz z/.Qz z/0 g D DfQzg C Dfzg C ˇˇ 0 MSPEfQzg D ˙z C SC0 .CSC0 C ˙y /1 ˙y .CSC0 C ˙y /1 CS " 1 # 1 0 1 0 CSC C ˙y C EfzgEfzg0 C I` SC ˛ ˛ " # 1 1 1 CSC0 C ˙y I` C0 CS ˛ ˛
(8.46)
(8.47)
MSPEfQzg D ˙z C.C0 ˙y1 CC˛S1 /1 C0 ˙y1 C.C0 ˙y1 CC˛S1 /1 CS (8.48) C ŒI` .C0 ˙y1 C C ˛S1 /1 C0 ˙y1 CEfzgEfzg0 ŒI` C0 ˙y1 C.C0 ˙y1 C C ˛S1 /1 (if S1 ; ˙y1 exist). The interpretation of the very important predictors hom ˛-VIP zQ of zis as follows: zQ of type (8.41), also called ridge estimator or Tykhonov–Phillips regulator, contains the Cayley inverse of the normal equation matrix which is additively decomposed into C0 ˙y1 C and ˛ S1 . The weight factor ˛ balances the first inverse dispersion part and the second inverse bias part. While the experiment informs us of the b y , the bias weight matrix S and the weight variance-covariance matrix ˙y , say ˙ factor ˛ are at the disposal of the analyst. For instance, by the choice S D Di ag.s1 ; :::; s` / we may emphasize increase or decrease of certain bias matrix elements. The choice of an equally weighted bias matrix is S D I` . In contrast the weight factor ˛ can be determined by the A-optimal design of type
396
8 The Fourth Problem of Probabilistic Regression
• t rDfQzg D min • ˇˇ 0 D min
˛
˛
• trMSPEfQzg D min : ˛
In the first case we optimize the trace of the variance-covariance matrix DfQzg of type (8.43), (8.44). Alternatively by means of ˇˇ 0 D min we optimize the quadratic ˛
bias where the bias vector ˇ of type (8.45) is chosen, regardless of the dependence on Efzg. Finally for the third case – the most popular one – we minimize the trace of the Mean Square Prediction Error M SPEfQzg of type (8.48), regardless of the dependence on EfzgEfzg0. But beforehand let us present the proof of Theorem 8.5, Theorem 8.6 and Theorem 8.7. Proof.
.i / zQ D EfzgEfzg0C0 Œ˙y C CEfzgEfzg0C0 1 y
If the matrix ˙y C CEfzgEfzg0C0 of the normal equations of type hom BLIP is of full rank, namely rk.˙y C CEfzgEfzg0C0 / D n; then a straightforward solution of (8.23) is LO D EfzgEfzg0C0 Œ˙y C CEfzgEfzg0C0 1 : .ii/ zQ D SC0 .˙y C CSC0 /1 y If the matrix ˙y C CSC0 of the normal equations of type hom S-BLIP is of full rank, namely rk.˙y C CSC0 / D n; then a straightforward solution of (8.24) is LO D SC0 Œ˙y C CSC0 1 : .iii/ zQ D .C0 ˙y1 C C S1 /1 C0 ˙y1 y Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan–Guttman matrix identity) the fundamental matrix identity SC0 .˙y C CSC0 /1 D .C0 ˙y1 C C S1 /1 C0 ˙y1 ; if S1 and ˙y1 exist. Such a result concludes this part of the proof. .i v/ zQ D .I` C SC0 ˙y1 C/1 SC0 ˙y1 y Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(9)) the fundamental matrix identity SC0 .˙y C CSC0 /1 D .I` C SC0 ˙y1 C/1 SC0 ˙y1 ; if ˙y1 exists. Such a result concludes this part of the proof.
8-1 The Random Effect Model
397
.v/ zQ D
1 1 0 1 SC ˙y C CSC0 y ˛ ˛
1 If the matrix ˙y C CSC0 of the normal equations of type hom ˛-VIP is of full ˛ 1 0 rank, namely rk ˙y C CSC D n; then a straightforward solution of (8.25) is ˛ 1 1 1 LO D SC0 ˙y C CSC0 : ˛ ˛ .vi / zQ D .C0 ˙y1 C C ˛S1 /1 C0 ˙y1 y Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(10), Duncan–Guttman matrix identity) the fundamental matrix identity 1 0 SC .˙y C CSC0 /1 D .C0 ˙y1 C C ˛S1 /1 C0 ˙y1 ; ˛ if S1 and ˙y1 exist. Such a result concludes this part of the proof. 1 1 0 1 1 SC ˙y y .vii/ zQ D I` C SC0 ˙y1 C ˛ ˛ Let us apply by means of Appendix A (Fact: Cayley inverse: sum of two matrices, s(9), Duncan–Guttman matrix identity) the fundamental matrix identity 1 0 1 0 1 1 1 0 1 0 1 SC .˙y C CSC / D Il C SC ˙y C SC ˙y ; ˛ ˛ ˛ if ˙y1 exist. Such a result concludes this part of the proof. .viii/ hom BLIP W DfQzg DfQzg W D EfŒQz EfQzgŒQz EfQzg0 g D EfzgEfzg0C0 Œ˙y C CEfzgEfzg0C0 1 ˙y Œ˙y C CEfzgEfzg0C0 1 CEfzgEfzg0: By means of the definition of the dispersion matrix DfQzg and the substitution of zQ of type hom BLIP the proof has been straightforward. .ix/ hom S-BLIP W DfQzg .1st representation/
398
8 The Fourth Problem of Probabilistic Regression
DfQzg W D EfŒQz EfQzgŒQz EfQzg0 g D SC0 .CSC0 C ˙y /1 ˙y .CSC0 C ˙y /1 CS: By means of the definition of the dispersion matrix DfQzg and the substitution of zQ of type hom S-BLIP the proof of the first representation has been straightforward. .x/ hom S-BLIP W DfQzg .2nd representation/ DfQzg W D EfŒQz EfQzgŒQz EfQzg0 g D .C0 ˙y1 C C S1 /1 C0 ˙y1 C.C0 ˙y1 C C S1 /1 ; if S1 and ˙y1 exist. By means of the definition of the dispersion matrix DfQzg and the substitution of zQ of type hom S-BLIP the proof the second representation has been straightforward. .xi / hom ˛-VIP W DfQzg .1st representation/ DfQzg W D EfŒQz EfQzgŒQz EfQzg0 g D 1 1 1 0 1 1 1 0 0 D SC ˙y C CSC ˙y ˙y C CSC CS : ˛ ˛ ˛ ˛ By means of the definition of the dispersion matrix DfQzg and the substitution of zQ of type hom ˛-VIP the proof the first representation has been straightforward. .xi i / hom ˛-VIP W DfQzg .2nd representation) DfQzg W D EfŒQz EfQzgŒQz EfQzg0 g D .C0 ˙y1 C C ˛S1 /1 C0 ˙y1 C.C0 ˙y1 C C ˛S1 /1 ; if S1 and˙y1 exist. By means of the definition of the dispersion matrix and the substitution of zQ of type hom ˛-VIP the proof of the second representation has been straightforward. (xiii) bias ˇ for hom BLIP, hom S-BLIP and hom ˛-VIP As soon as we substitute into the bias ˇ WD EfQzg Efzg D Efzg C EfQzg the various predictors zQ of the type hom BLIP, hom S-BLIP and hom ˛-VIP we are directly led to various bias representations ˇ of type hom BLIP, hom S-BLIP and hom ˛-VIP. (xiv) MSPEfQzg of type hom BLIP, hom S-BLIP and hom ˛-VIP
8-2 Examples
399
MSPEfQzg WD Ef.Qz z/.Qz z/0 g zQ z D zQ EfQzg .z Efzg/ D zQ EfQzg .z Efzg/ .Efzg EfQzg/ Ef.Qz z/.Qz z/0 g D Ef.Qz EfQzg/..Qz EfQzg/0 g C Ef.z Efzg/.z Efzg/0 gC C.EfQzg Efzg/.EfQzg Efzg/0 MSPEfQzg D DfQzg C Dfzg C ˇˇ 0 : At first we have defined the Mean Square Prediction Error M SPEfQzg of zQ . Secondly we have decomposed the difference zQ z into the three terms • zQ EfQzg • z Efzg • Efzg EfQzg in order to derive thirdly the decomposition of M SPEfQzg, namely • The dispersion matrix of zQ , namely DfQzg • The dispersion matrix of z, namely Dfzg • The quadratic bias ˇˇ 0 . As soon as we substitute MSPEfQzg the dispersion matrix DfQzg and the bias vector ˇ of various predictors zQ of the type hom BLIP, hom S-BLIP and hom ˛-VIP we are directly led to various representations ˇ of the Mean Square Prediction Error MSPEfQzg. Here is our proof’s end. |
8-2 Examples Example 8.1 Nonlinear error propagation with random effect models Consider a function y D f .z/ where y is a scalar valued observation and z a random effect. Three cases can be specified as follows: Case 1 (z assumed to be known): By Taylor series expansion we have f .z/ D f .z / C
1 0 1 f .z /.z z / C f 00 .z /.z z /2 C O.3/ 1Š 2Š
1 00 f .z /Ef.z z /2 g C O.3/ 2Š leading to (cf. E. Grafarend and B. Schaffrin 1983, p. 470) 1 Efyg D f .z / C f 00 .z /z2 C O.3/ 2Š Efyg D Eff .z/g D f .z / C
400
8 The Fourth Problem of Probabilistic Regression
Ef.y Efyg/2 g D EfŒf 0 .z /.z z / C
1 00 f .z /.z z /2 C O.3/ 2Š
1 00 f .z /z2 O.3/2 g; 2Š
hence EfŒy EfygŒŒy Efygg is given by 1 y2 D f 02 .z /z2 f 002 .z /z4 C f 0 f 00 .z /Ef.z z /3 g 4 1 C f 002 Ef.z z /4 g C O.3/: 4 Finally if z is quasi-normally distributed, we have 1 y2 D f 02 .z /z2 C f 002 .z /z4 C O.3/: 2 Case 2 (z unknown, but 0 known as a fixed effect approximation (this model is implied in E. Grafarend and B. Schaffrin 1983, p.470, 0 ¤ z )): 1 1 By Taylor series expansion we have f .z/ D f .0 /C f 0 .0 /.z0 /C f 00 .0 /.z 1Š 2Š 0 /2 C O.3/ using 0 D z C .0 z / ) z 0 D z z C .z 0 / we have 1 0 1 f .0 /.z z / C f 0 .0 /.z 0 / 1Š 1Š 1 1 C f 00 .0 /.z z /2 C f 00 .0 /.z 0 /2 2Š 2Š Cf 00 .0 /.z z /.z 0 / C O.3/
f .z/ D f .0 / C
and 1 Efyg D Eff .z/g D f .0 / C f 0 .0 /.z 0 / C f 00 .0 /z2 2 1 C f 00 .0 /.z 0 /2 C O.3/ 2 leading to EfŒy EfygŒŒy Efygg as
8-2 Examples
401
z2 D f 02 .0 /z2 C f 0 f 00 .0 /Ef.z z /3 g C 2f 0 f 00 .0 /z2 .z 0 / 1 C f 002 .0 /Ef.z z /4 g C f 002 .0 /Ef.z z /3 g.z 0 / 4 1 f 002 .0 /z4 C f 002 .0 /z2 .z 0 /2 C O.3/ 4 and with z being quasi-normally distributed, we have 1 z2 Df 02 .0 /z2 C2f 0 f 00 .0 /z2 .z 0 /C f 002 .0 /z4 Cf 002 .0 /z2 .z 0 /2 CO.3/ 2 with the first and third terms (on the right hand side) being the right hand side-terms of case 1 (cf. E. Grafarend and B. Schaffrin 1983, p.470). Case 3 (z unknown, but z0 known as a random effect approximation): By Taylor series expansion we have f .z/ D f .z / C C
1 0 1 f .z /.z z / C f 00 .z /.z z /2 C 1Š 2Š
1 000 f .z /.z z /3 C O.4/ 3Š
changing z z D z0 z D z0 Efz0 g .z Efz0 g/ and the initial bias .z Efz0 g/ D Efz0 g z DW ˇ0 leads to z z D z0 Efz0 g C ˇ0 : Consider
.z z /2 D .z0 Efz0 g/2 C ˇ02 C 2.z0 Efz0 g/ˇ0
we have 1 0 1 f .z /.z0 Efz0 g/ C f 0 .z /ˇ0 1Š 1Š 1 00 1 C f .z /.z0 Efz0 g/2 C f 00 .z /ˇ02 Cf 00 .z /.z0 Efz0 g/ˇ0 CO.3/ 2Š 2Š 1 1 Efyg D f .z / C f 0 .z /ˇ0 C f 00 .z /z20 C f 00 .z /ˇ02 C O.3/ 2 2 f .z/ D f .z / C
leading to EfŒy EfygŒŒy Efygg as
402
8 The Fourth Problem of Probabilistic Regression
y2 D f 02 .z /z20 C f 0 f 00 .z /Ef.z0 Efz0 g/3 g 1 C2f 0 f 00 .z /z20 ˇ0 C f 002 .z /Ef.z0 Efz0 g/4 g 4 Cf 002 .z /Ef.z0 Efz0 g/3 gˇ0 C f 002 .z /z20 ˇ02 1 1 C f 002 .z /z40 f 002 .z /Ef.z0 Efz0 g/2 gz20 C O.3/ 4 2 and with z0 being quasi-normally distributed, we have 1 y2 D f 02 .z /z20 C 2f 0 f 00 .z /z20 ˇ0 C f 002 .z /z40 C f 002 .z /z20 ˇ02 C O.3/ 2 with the first and third terms (on the right-hand side) being the right-hand side terms of case 1. Example 8.2 Nonlinear vector valued error propagation with random effect models In a GeoInformation System we ask for the quality of a nearly rectangular planar surface element. Four points fP1 ; P2 ; P3 ; P4 g of an element are assumed to have the coordinates .x1 ; y1 /; .x2 ; y2 /; .x3 ; y3 /; .x4 ; y4 / and form a 88 full variancecovariance matrix (central moments of order two) and moments of higher order. The planar surface element will be computed according the Gauß trapezoidal: F D
4 X yi C yi C1 .xi xi C1 / 2 i D1
with the side condition x5 D x1 ; y5 D y1 . Note that within the Error Propagation Law @2 F ¤0 @x@y holds (Fig. 8.3). First question ? What is the structure of the variance-covariance matrix of the four points if we assume statistical homogeneity and isotropy of the network (Taylor– Karman structure)? Second question ! Approach the criterion matrix in terms of absolute coordinates. Interpolate the correlation function linear! (Tables 8.1 and 8.2) Our example refers to the Taylor–Karman structure or the structure function introduced in Sect. 3-222.
8-2 Examples
403 P3
P2
e2
P4
P1 e1
Fig. 8.3 Surface element of a building in the map Table 8.1 Coordinates of a four-dimensional simplex
Table 8.2 Longitudinal and lateral correlation functions ˙m and ˙` for a Taylor–Korman structured 4 point network
Point
x
y
P1 P2 P3 P4
100.00m 110.00m 101.34m 91.34m
100.00m 117.32m 122.32m 105.00m
jxj 10m 20m 30m
˙m .jxj/ 0.700 0.450 0.415
˙l .jxj/ 0.450 0.400 0.238
:Solution: The Gauß trapezoidal surface element has the size: F D
y1 C y2 y2 C y3 y3 C y4 y4 C y1 .x1 x2 /C .x2 x3 /C .x3 x4 /C .x4 x1 /: 2 2 2 2
Once we apply the “error propagation law” we have to use. 1 F2 D J†J0 C H.vec†/.vec†/0 H0 : 2 In our case, n D 1 holds since we have only one function to be computed. In contrast, the variance-covariance matrix enjoys the format 88, while the Jacobi matrix of first derivatives is a 18 matrix and the Hesse matrix of second derivatives is a 164 matrix.
404
8 The Fourth Problem of Probabilistic Regression
Table 8.3 Distances and meridian correlation ˙` pq jxp xq j jxp xq j2 1-2 20.000 399.982 1-3 22.360 499.978 1-4 10.000 100.000 2-2 10.000 100.000 2-4 22.360 499.978 3-4 20.000 399.982
Table 8.4 Distance function versus ˙m .x/; ˙` .x/
function ˙m and longitudinal correlation function ˙m 0.45 0.44 0.70 0.70 0.44 0.45
˙l 0.40 0.36 0.45 0.45 0.36 0.40
xp xq 10 1.34 8.66 8.66 18.66 10
yp yq 17.32 22.32 5 5 12.32 17.32
jxj
˙m .x/
˙` .x/
10-20 20-30
0:95 0:025jxj 0:52 0:0035jxj
0:5 0:005jxj 0:724 0:0162jxj
Table 8.5 Taylor–Karman matrix for the case study x1 y1 x2
x1
y1
x2
y2
x3
y3
x4
y4
1
0
0.438
–0.022
0.441
–0.005
0.512
0.108
1
–0.022
0.412
–0.005
0.361
0.108
0.638
1
0
0.512
0.108
0.381
–0.037
1
0.108
0.634
–0.037
0.417
1
0
0.438
–0.022
1
–0.022
0.412
1
0
y2 x3
symmetric
y3 x4 y4
1
(i) The structure of the homogeneous and isotropic variance-covariance matrix is such that locally 22 variance-covariance matrices appear as unit matrices generating local error circles of identical radius. (ii) The celebrated Taylor–Karman matrix for absolute coordinates is given by † ij .xp ; xq / D ˙m .jxp xq j/ıij C Œ˙` .jxp xq j/ ˙m .jxp xq j/
xi xj jxp xq j2
subject to x1 WD x D xp xq ; x2 WD y D yp yq ; i; j 2 f1; 2gI p; q 2 f1; 2; 3; 4g: By means of a linear interpolation we have derived the Taylor–Karman matrix by Tables 8.3 and 8.4. P P Once we take care of m and ` as a function of the distance for gives values of tabulated distances we arrive at the Taylor–Karman correlation values of type Table 8.5.
8-2 Examples
405
Finally, we have computed the Jacobi matrix of first derivatives in Table 8.6 and the Hesse matrix of second derivatives in Table 8.7. Table 8.6 Table of the Jacobi matrix “Jacobi matrix”
@F @F @F @F JD ; ;:::; ; @x1 @y1 @x4 @y4 JD
1 Œy2 y4 ; x4 x2 ; y3 y1 ; x1 x3 ; y4 y2 ; x2 x4 ; y1 y3 ; x3 x1 2 J D 12 Œ12:32; 18:66; 22:32; 1:34; 12:32; 18:66; 22:32; 1:34: Note: 3 yi C1 yi 1 @F x0 D x4 D 7 y0 D y4 @xi 2 7 xi C1 xi 1 5 x5 D x1 @F D @yi 2 y5 D y1 :
Table 8.7 Table Hesse matrix “Hesse matrix” HD
@ @x0
˝
@ @x0 F .x/ D
@ @F @F @F @F D 0˝ ; ;:::; ; @x @x1 @y1 @x4 @y4 2 @2 F @2 F @2 F @2 F @2 F @ F ; ; : : : ; ; ; : : : ; ; D @x1 @y4 @y1 @x1 @y4 @x4 @y 2 @x 2 @x1 @y1 1 @ @F @F @F @F @ ; :::; : D ; :::; ; :::; @x1 @x1 @y4 @y4 @x1 @y4 Note the detailed computation in Table 8.8. Table 8.8 Second derivatives 0, +1/2, -1/2 :“interims formulae: Hesse matrix”:
406
8 The Fourth Problem of Probabilistic Regression
@2 F @2 F D D 0 8 i; j D 1; 2; 3; 4 @xi @xj @yi @yj @2 F @2 F D D 0 8 i D 1; 2; 3; 4 @xi @yi @yi @xi @2 F @2 F 1 8 i D 1; 2; 3; 4 D D @xi @yi 1 @yi @xi C1 2 @2 F @2 F 1 8 i D 1; 2; 3; 4: D D @yi @xi 1 @xi @yi C1 2 Results At first, we list the distances fP1 P2 ; P2 P3 ; P3 P4 ; P4 P1 g of the trapezoidal finite element by jP1 P2 j D 20 (for instance 20 m), jP2 P3 j D 10 (for instance 10 m), jP3 P4 j D 20 (for instance 20 m) and jP4 P1 j D 10 (for instance 10 m). Second, we compute F2 .first term/ D J˙ J0 by F2 .first term/ 1 D Œ12:32; 18:66; 22:32; 1:34; 12:32; 18:62; 22:32; 1:34 2 2 3 1 0 0:438 0:022 0:442 0:005 0:512 0:108 6 0 1 0:022 0:412 0:005 0:362 0:108 0:638 7 6 7 6 0:438 0:022 1 0 0:512 0:108 0:386 0:037 7 6 7 6 7 0 1 0:108 0:638 0:037 0:418 7 6 0:022 0:412 6 7 6 0:442 0:005 0:512 0:108 1 0 0:438 0:022 7 6 7 6 0:005 0:362 0:108 0:638 0 1 0:022 0:412 7 6 7 4 0:512 0:108 0:386 0:037 0:438 0:022 1 0 5 0:108 0:638 0:037 0:418 0:022 0:412 0 1 1 Œ12:32; 18:66; 22:32; 1:34; 12:32; 18:62; 22:32; 1:340 2 D 334:7117: Third, we need to compute F2 .second term/ D F2 .second term/ D where
1 H.vec†/.vec†/0 H0 by 2
1 H.vec†/.vec†/0 H0 D 7:2222 1035 2
8-2 Examples
407
2 6 6 6 6 6 6 6 6 6 6 6 H D vec 6 6 6 6 6 6 6 6 6 6 4
0 0 0 1 2 0 0 0 1 2
1 0 2 1 0 0 2 1 0 0 2 0 0 0 1 0 0 2 1 0 0 2 1 0 0 2 0 0 0 0
0
0
0
0 1 0 2 1 0 2 0 0 0
0 1 0 2 1 0 2
13 0 27 7 1 0 7 7 2 7 0 0 7 7 7 7 0 0 7 7 ; 1 7 7 0 7 2 7 7 1 0 7 7 2 7 0 0 7 7 5 0 0
and 2
3 1 0 0:438 0:022 0:442 0:005 0:512 0:108 6 0 1 0:022 0:412 0:005 0:362 0:108 0:638 7 6 7 6 0:438 0:022 1 0 0:512 0:108 0:386 0:037 7 6 7 6 7 0 1 0:108 0:638 0:037 0:418 7 6 0:022 0:412 vec† D vec 6 7: 6 0:442 0:005 0:512 0:108 1 0 0:438 0:022 7 6 7 6 0:005 0:362 0:108 0:638 0 1 0:022 0:412 7 6 7 4 0:512 0:108 0:386 0:037 0:438 0:022 1 0 5 0:108 0:638 0:037 0:418 0:022 0:412 0 1 Finally, we get the variance of the planar surface element F F2 D 334:7117 C 7:2222 1035 D 334:7117 i.e. F D ˙18:2951.m2 /: Example 8.3 Nonlinear vector valued error propagation with random effect models The distance element between P1 and P2 has the size: p F D .x2 x1 /2 C .y2 y1 /2 : Once we apply the “error propagation law” we have to use. 1 F2 D J†J0 C H.vec†/.vec†/0 H0 : 2 Table 8.6: Table of the Jacobi matrix “Jacobi matrix”
408
8 The Fourth Problem of Probabilistic Regression
JD
@F @F @F @F JD ; ;:::; ; @x1 @y1 @x4 @y4
.x2 x1 / .y2 y1 / .x2 x1 / .y2 y1 / ; ; ; ; 0; 0; 0; 0 F F F F
J D Œ0:5; 0:866; 0:5; 0:866; 0; 0; 0; 0: Table 8.7: Table Hesse matrix “Hesse matrix” HD
@ @ ˝ 0 F .x/ 0 @x @x
@ @F @F @F @F D 0˝ ; ;:::; ; @x @x1 @y1 @x4 @y4 D D
@2 F @2 F @2 F @2 F @2 F @2 F ; ; : : : ; ; ; : : : ; ; @x1 @y4 @y1 @x1 @y4 @x4 @y 2 @x12 @x1 @y1 @ @x1
@F @F ; :::; @x1 @y4
; :::;
@ @y4
@F @F ; :::; @x1 @y4
:
Note the detailed computation in Table 8.8. Table 8.8: Second derivatives : “interims formulae: Hesse matrix”: .x2 x1 /2 @F 1 @F .x2 x1 /.y2 y1 / D ; :::; ; ; @x1 @y4 F F3 F3 .x 2 x 2 / .x2 x1 /.y2 y1 / 1 C 2 3 1 ; ; 0; 0; 0; 0 ; F F F3 .y2 y1 /2 @F .x2 x1 /.y2 y1 / 1 @F D ; :::; ; ; 3 @x1 @y4 F F F3 .y22 y12 / 1 .x2 x1 /.y2 y1 / ; C ; 0; 0; 0; 0 F3 F F3 .x2 x1 /2 .x2 x1 /.y2 y1 / @F 1 @F D C ; :::; ; ; @x1 @y4 F F3 F3 .x22 x12 / 1 .x2 x1 /.y2 y1 / ; ; 0; 0; 0; 0 ; F F3 F3
@ @x1
@ @y1
@ @x2
8-2 Examples
@ @y2
409
@ @xi @ @yi
.y2 y1 /2 .x2 x1 /.y2 y1 / 1 C ; ; F3 F F3 .y22 y12 / .x2 x1 /.y2 y1 / 1 ; ; 0; 0; 0; 0 ; F3 F F3 3 @F @F D Œ0; 0; 0; ; 0; 0; 0; 0; 0 7 ; :::; @y4 7 ; 8 i D 3; 4: @x1 5 @F @F D Œ0; 0; 0; ; 0; 0; 0; 0; 0 ; :::; @x1 @y4
@F @F ; :::; @x1 @y4
D
Results At first, we list the distance fP1 P2 g of the distance element by jP1 P2 j D20 (for instance 20m). Second, we compute F2 .first term/ D J˙ J0 by F2 .first term/ D Œ0:5; 0:866; 0:5; 0:866; 0; 0; 0; 0 2 3 1 0 0:438 0:022 0:442 0:005 0:512 0:108 6 0 1 0:022 0:412 0:005 0:362 0:108 0:638 7 6 7 6 0:438 0:022 1 0 0:512 0:108 0:386 0:037 7 6 7 6 7 0 1 0:108 0:638 0:037 0:418 7 6 0:022 0:412 6 7 6 0:442 0:005 0:512 0:108 1 0 0:438 0:022 7 6 7 6 0:005 0:362 0:108 0:638 0 1 0:022 0:412 7 6 7 4 0:512 0:108 0:386 0:037 0:438 0:022 1 0 5 0:108 0:638 0:037 0:418 0:022 0:412 0 1 Œ0:5; 0:866; 0:5; 0:866; 0; 0; 0; 00 D 1:2000: Third, we need to compute F2 .second term/ D F2 .second term/ D where
2
1 H.vec†/.vec†/0 H0 by 2
1 H.vec†/.vec†/0 H0 D 0:0015 2
0:0375 0:0217 0:0375 0:0217 0 0 6 0:0217 0:0125 0:0217 0:0125 0 0 6 6 0:0375 0:0217 0:0375 0:0217 0 0 6 6 6 0:0217 0:0125 0:0217 0:0125 0 0 H D vec 6 6 0 0 0 0 00 6 6 0 0 0 0 00 6 4 0 0 0 0 00 0 0 0 0 00
3 00 0 07 7 0 07 7 7 0 07 7; 0 07 7 0 07 7 0 05 00
410
8 The Fourth Problem of Probabilistic Regression
and 2
3 1 0 0:438 0:022 0:442 0:005 0:512 0:108 6 0 1 0:022 0:412 0:005 0:362 0:108 0:638 7 6 7 6 0:438 0:022 1 0 0:512 0:108 0:386 0:037 7 6 7 6 7 0 1 0:108 0:638 0:037 0:418 7 6 0:022 0:412 vec† D vec 6 7: 6 0:442 0:005 0:512 0:108 1 0 0:438 0:022 7 6 7 6 0:005 0:362 0:108 0:638 0 1 0:022 0:412 7 6 7 4 0:512 0:108 0:386 0:037 0:438 0:022 1 0 5 0:108 0:638 0:037 0:418 0:022 0:412 0 1 Finally, we get the variance of the distance element F F2 D 1:2000C0:0015 D 1:2015 i.e.
F D ˙1:0961.m/:
Chapter 9
The Fifth Problem of Algebraic Regression: The System of Conditional Equations: Homogeneous and Inhomogeneous Equations: fBy D Bi versus c C By D Big
Conditional equations in its linear form are a standard topic in Geodetic Sciences. We mention for example F.R. Helmert (1907) and H. Wolf (1986). Here we start with weighted Least Squares (Gy - LESS) of inconsistent homogeneous and inhomogeneous linear conditional equations. Simple examples are .i / the triplet of angular observations and .i i / the sum of planar triangles. The stochastic approach is left to Chapter 13. Here we shall outline two systems of poor conditional equations, namely homogeneous and inhomogeneous inconsistent equations. First, Definition 9.1 gives us Gy -LESS of inconsistent homogeneous conditional equations which we characterize as the least squares solution with respect to the Gy -seminorm (Gy -norm) by means of Lemma 9.2, Lemma 9.3 (Gy -norm) and Lemma 9.4 (Gy -seminorm). Second, Definition 9.5 specifies Gy /-LESS of inconsistent inhomogeneous conditional equations which alternatively characterize as the corresponding least squares solution with respect to the Gy -seminorm by means of Lemma 9.6. Third, we come up with examples.
9-1 Gy -LESS of a System of a Inconsistent Homogeneous Conditional Equations Our point of departure is Definition 9.1 by which we define Gy -LESS of a system of inconsistent homogeneous conditional equations. Definition 9.1. (Gy -LESS of a system of inconsistent homogeneous conditional equations): E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 9, © Springer-Verlag Berlin Heidelberg 2012
411
412
9 The Fifth Problem of Algebraic Regression:
Lemma 1.2 xm Gx-MINOS of x
Lemma 1.3 xm Gx-MINOS of x
Definition 1.1 xm Gx-MINOS of x
Lemma 1.4 characterization of Gx-MINOS
Lemma 1.6 adjoint operator A#
Definition 1.5 Adjoint operator A#
Lemma 1.7 eigenspace analysis versus eigenspace synthesis
Corollary 1.8 Symmetric pair of eigensystems
Lemma 1.9 Canonical MINOS
The guideline of chapter one: definitions, lemmas and corollary
An n 1 vector i` of inconsistency is called Gy -LESS (LEast Squares Solution with respect to the Gy -seminorm) of the inconsistent system of linear conditional equations Bi D By; (9.1) if in comparison to all other vectors i 2 Rn the inequality jji` jj2Gy WD i0 ` Gy i` i0 Gy i DW jjijj2Gy
(9.2)
holds in particular if the vector of inconsistency i` has the least Gy -seminorm. Lemma 9.2 characterizes the normal equations for the least squares solution of the system of inconsistent homogeneous conditional equations with respect to the Gy -seminorm.
9-1 Gy -LESS of a System of a Inconsistent Homogeneous Conditional Equations
413
Lemma 9.2. (least squares solutions of the system of inconsistent homogeneous conditional equations with respect to the Gy -seminorm): An n 1 vector i` of the system of inconsistent homogeneous conditional equations Bi D By (9.3) is Gy -LESS if and only if the system of normal equations
Gy B0 B 0
i` `
D
0 By
(9.4)
with the q 1 vector ` of “Lagrange multipliers” is fulfilled. Proof. Gy -LESS of Bi D By is constructed by means of the Lagrangean L.i; / WD i0 Gy i C 20 .Bi By/ D min : i;
The first derivatives @L .i` ; ` / D 2.Gy i` C B0 ` / D 0 @i @L .i` ; ` / D 2.Bi` By/ D 0 @ constitute the necessary conditions. (The theory of vector-valued derivatives is presented in Appendix B). The second derivatives @L .i` ; ` / D 2Gy 0 @i@i0 build up due to the positive semidefiniteness of the matrix Gy the sufficiency condition for the minimum. The normal equations (9.4) are derived from the two equations of first derivatives, namely
Gy B0 B 0
i` `
0 D : By
Lemma 9.3 is a short review of the system of inconsistent homogeneous conditional equations with respect to the Gy -norm, Lemma 9.4 alternatively with respect to the Gy -seminorm. Lemma 9.3. (least squares solution of the system of inconsistent homogeneous conditional equations with respect to the Gy -norm):
414
9 The Fifth Problem of Algebraic Regression:
An n 1 vector i` of the system of inconsistent homogeneous conditional equations Bi D By is the least squares solution with respect to the Gy -norm if and only if it solves the normal equations
The solution
0 1 Gy i` D B0 .BG1 y B / By:
(9.5)
1 0 1 0 i` D G1 y B .BGy B / By
(9.6)
is unique. The “goodness of fit” of Gy -LESS is 0 1 jji` jj2Gy D i0 ` Gy i` D y0 B0 .BG1 y B / By:
(9.7)
Proof. A basis of the proof could be C.R. Rao’s Pandora Box, the theory of inverse partitioned matrices (Appendix A: Fact: Inverse Partitioned Matrix /IPM/ of a symmetric matrix). Due to the rank identity rkGy D n, the normal equations (9.4) can be faster solved by Gauss elimination. Gy i` C B0 ` D 0 Bi` D By: Multiply the first normal equation by BG1 and substitute the second normal y equation for Bi` . # 1 0 BG1 y Gy i` D Bi` D BGy B ` ) Bi` D By 0 BG1 y B ` D By 0 1 ` D .BG1 y B / By:
Finally we substitute the “Lagrange multiplier” ` back to the first normal equation in order to prove 0 1 Gy i` C B0 ` D Gy i` B0 .BG1 y B / By D 0 1 0 1 0 i` D G1 y B .BGy B / By:
We switch immediately to Lemma 9.4. Lemma 9.4. (least squares solution of the system of inconsistent homogeneous conditional equations with respect to the Gy -seminorm): An n 1 vector i` of the system of inconsistent homogeneous conditional equations Bi D By is the least squares solution with respect to the Gy -seminorm if the compatibility condition R.B0 / R.Gy / (9.8)
9-2 Solving a System of Inconsistent Inhomogeneous Conditional Equations
415
is fulfilled, and solves the system of normal equations 0 1 Gy i` D B0 .BG1 y B / By;
(9.9)
which is independent of the choice of the g-inverse G y.
9-2 Solving a System of Inconsistent Inhomogeneous Conditional Equations The text point of departure is Definition 9.5, a definition of Gy -LESS of a system of inconsistent inhomogeneous condition equations. Definition 9.5. (Gy -LESS of a system of inconsistent inhomogeneous conditional equations): An n 1 vector i` of inconsistency is called Gy -LESS (LEast Squares Solution with respect to the Gy -seminorm) of the inconsistent system of inhomogeneous conditional equations c C By D B i (9.10) (the minus sign is conventional), if in comparison to all other vectors i 2 Rn the inequality jji` jj2 WD i0 ` Gy i` i0 Gy i DW jjijj2Gy
(9.11)
holds, in particular if the vector of inconsistency i` has the least Gy -seminorm. Lemma 9.6 characterizes the normal equations for the least squares solution of the system of inconsistent inhomogeneous conditional equations with respect to the Gy -seminorm. Lemma 9.6. (least squares solution of the system of inconsistent inhomogeneous conditional equations with respect to the Gy -seminorm): An n 1 vector i` of the system of inconsistent homogeneous conditional equations Bi D By c D B.y d/ (9.12) is Gy -LESS if and only if the system of normal equations
Gy B0 B O
i` `
D
O ; By c
with the q 1 vector of Lagrange multipliers is fulfilled. i` exists surely if
(9.13)
416
9 The Fifth Problem of Algebraic Regression:
R.B0 / R.Gy /
(9.14)
and it solves the normal equations 0 1 Gy i` D B0 .BG1 y B / .By c/;
(9.15)
which is independent of the choice of the g-inverse G y . i` is unique if the matrix Gy is regular and in consequence positive definite.
9-3 Examples Our two examples relate to the triangular condition, the so-called zero misclosure, within a triangular network, and the condition that the sum within a flat triangle accounts to 180o . (i) The first example: triplet of angular observations We assume that three observations of height differences within the triangle P˛ Pˇ P sum up to zero. The condition of holonomic heights says h˛ˇ C hˇ C h ˛ D 0; namely 2
3 2 3 h˛ˇ i˛ˇ 6 7 B WD Œ1; 1; 1; y WD 4 hˇ 5 ; i WD 4 iˇ 5 : i ˛ h ˛ The normal equations of the inconsistent condition read for the case Gy D I3 : i` D B0 .BB0 /1 By; 2 3 111 1 B0 .BB0 /B D 4 1 1 1 5; 3 111 .i˛ˇ /` D .iˇ /` D .i ˛ /` D
1 .h C hˇ C h ˛ / 3 ˛ˇ
(ii) The second example: sum of planar triangles Alternatively, we assume: three angles which form a planar triangle of sum to ˛ C ˇ C D 180ı namely
9-3 Examples
417
The normal equations of the inconsistent condition equation read in our case Gy D I3 : i` D B0 .BB0 /1 .By c/; 2 3 1 1 B0 .BB0 /1 D 4 1 5 ; By c D ˛ C ˇ C 180ı ; 3 1 2 3 1 1 .i˛ /` D .iˇ /` D .i /` D 4 1 5 .˛ C ˇ C 180ı /: 3 1
Chapter 10
The Fifth Problem of Probabilistic Regression Setup of BLUUE for the moments of first order (Kolmogorov–Wiener prediction) We define the fifth problem of probabilistic regression by the inhomogeneous general linear Gauss-Markov model including fixed effects as well as random effect, namely by AŸ + CEfzg + y = Efyg together with variance-covariance matrices †z and †y being unknown as well as Ÿ, Efzg, and Efyg, y. It is the standard model of Kolmogorov-Wiener Prediction in its general form. By Theorem 10.3 we estimate Ÿ O EfOzg†y - Best Linear Uniformly Unbiased Estimation (Ÿ, O EfOzg†y and Efzg by (Ÿ, – BLUUE)) following C.R. Rao and J. Kleffe (1988 pp. 161 – 180) in section one. Section two is an attempt to construct an explicit representation of the error budget in the general Gauss-Markov model with mixed effects. Restrictive is the assumption Dfyg = V 2 (one parameter) and Dfzg = Z 2 (one parameter) and the correlations (V,Z) to be known! For such a model, we will be able by Theorem 10.5 to construct a homogeneous quadratic estimate of 2 . We leave the question open whether such a model is realistic. Instead we will give an example from linear collocation with Ÿ, Efzg, Efyg, †z , †y to be unknown, but y&$$$; to be known. We depart from analyzing a height network observed at two epochs. At the initial epoch three height differences have been observed. From the first epoch to the second epoch we assume height differences which change linear in time, for instance caused by an Earthquake. We apply a height varying model h˛ˇ ./ D h˛ˇ .O/ C h˛ˇ .O/ C O. 2 / where denotes the time difference from the first epoch to the second epoch. Section four will be a list of comments about linear and nonlinear prediction of type Kolmogorov-Wiener, an extensive list of geodetic examples from hybrid concepts of vertical deflections, gravity gradients and gravity values and specific trend model. An extensive list of references is given. Finally, we comment on (i) best inhomogeneous linear prediction, (ii) best homogeneous linear prediction (iiii) best homogeneous linear unbiased prediction by the dispersion ordering D3 D2 D1 When we apply instead of Kolmogorov-Wiener linear prediction for absolute quantities, we arrive at the more general concept of Krige prediction when we use differences, for instance kyP1 yP2 k2 D E fyP1 yP2 .EfyP1 g EfyP2 g/g2 E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 10, © Springer-Verlag Berlin Heidelberg 2012
419
420
10 The Fifth Problem of Probabilistic Regression
For a weakly relative translational invariant stochastic process establishing the Kolmogorov’s structure function. Alternatively, we are able to analyze higher order variance-covariance function, for instance designed by E. Grafarend (1984). Fast track reading: Read only Theorems 10.3 and 10.5.
“Prediction company’s chance of success is not zero, but close to it.”–Eugene Fama “The best way to predict the future is to invent it.”–Alan Kay
Lemma 10.2 Sy - BLUUE of x and E{z}: Definition 10.1 Sy − BLUUE of x and E{z}:
Theorem 10.3 xˆ , E{z} : Sy − BLUUE of x and E{z}: Lemma 10.4 E{y} : xˆ , E{z} Sy − BLUUE of x and E{z}
10.1.1.1 Theorem 10.5 Homogeneous quadratic setup of σˆ 2
The inhomogeneous general linear Gauss–Markov model with fixed effects and random effects will be presented first. We review the special Kolmogorov–Wiener model and extend it by the proper stochastic model of type BIQUUE given by Theorem 10.5. The extensive example for the general linear Gauss–Markov model with fixed effects and random effects concentrates on a height network observed at two epochs. At the first epoch we assume three measured height differences. Inbetween the first and the second epoch we assume height differences which change linear in time, for instance as a result of an earthquake we have found the height difference model h˛ˇ ./ D h˛ˇ .0/ C h˛ˇ C O. 2 /: Namely, indicates the time interval from the first epoch to the second epoch relative to the height difference h˛ˇ . Unknown are
10-1 Inhomogeneous General Linear Gauss–Markov Model
421
• The fixed effects h˛ˇ • The expected values of stochastic effects of type height difference velocities h˛ˇ given the singular dispersion matrix of height differences. Alternative estimation and prediction producers of • type .V C CZC0 /-BLUUE for the unknown fixed parameter vector of height differences of initial epoch and • the expectation data Efzg of stochastic height difference velocities z, and • of type .V C CZC0 /-BLUUE for the expectation data Efyg of height difference measurements y, • of type eQ y of the empirical error vector, • as well as of type .V C CZC0 /-BLUUP of the stochastic vector zQ of height difference velocities. For the unknown variance component 2 of height difference observations we review estimates of type BIQUUE. At the end, we intend to generalize the concept of estimation and prediction of fixed and random effects by a short historical remark.
10-1 Inhomogeneous General Linear Gauss–Markov Model (Fixed Effects and Random Effects) Here we focus on the general inhomogeneous linear Gauss–Markov model including fixed effects and random effects. By means of Definition 10.1 we review † y -BLUUE of and Efzg followed by the related Lemma 10.2, Theorem 10.3 and Lemma 10.4. Box 10.1. Inhomogeneous general linear Gauss–Markov model (fixed effects and random effects) A C CEfzg C D EfNyg
(10.1)
CfNy; zg D 0 † z WD Dfzg; † yN WD DfNyg C fNy; zg D 0 ; Efzg; EfNyg; † z ; † yN unknown unknown EfNyg 2 .ŒA; C/
(10.2)
The n 1 stochastic vector yN of observations is transformed by means of yN DW y to the new n 1 stochastic vector y of reduced observations which is characterized
422
10 The Fifth Problem of Probabilistic Regression
by second order statistics, in particular by the first moments Efyg and by the central second moments Dfyg. Definition 10.1. († y -BLUUE of and Efzg): The partitioned vector D Ly C ; namely " # O O
" D
O
b
Efzg
#
" D
L1
#
L2
" yC
1
#
2
is called † y -BLUUE of (Best Linear Uniformly Unbiased Estimation with respect to † y - norm) in (10.1) if (1st) O is uniformly unbiased in the sense of O D EfLy C g D for all 2 RmCl Efg
(10.3)
or O D EfL1 y C 1 g D for all 2 Rm Efg
b
Ef g O D EfEfzgg D EfL2 y C 2 g D D Efzg for all 2 Rl
(10.4)
and (2nd) in comparison to all other linear uniformly unbiased estimation O has minimum variance. O WD Ef.O /0 .O /g D trL†y L0 D jjL0 jj2 D min trDfg †y
(10.5)
L
or
1
˚ trD f g O WD trDfE fzgg WD E . O /0 . O /
1
1
D Ef.E fzgEfzg/0 .E fzgEfzg fzg/gDtrL2 † y L0 2 DjjL0 jj2†y D min : L2
(10.6) We shall specify † y -BLUUE of and Efzg by means of 1 D 0, 2 D 0 and writing the residual normal equations by means of “Lagrange multipliers”. Lemma 10.2. (˙ y -BLUUE of and Efzg):
1
An .m C l/ 1 vector ŒO 0 ; O 0 0 D ŒO 0 ; E fzg0 0 D ŒL0 1 ; L0 2 0 y C Œ 0 1 ; 0 2 0 is † y BLUUE of Œ 0 ; E fzg0 0 in (10.1), if and only if
1
1 D 0; 2 D 0
10-1 Inhomogeneous General Linear Gauss–Markov Model
423
hold and the matrices L1 and L2 fulfill the system of normal equations 2
3 2 3 32 0 †y A C 0 0 L 1 L0 2 4 A0 0 0 5 4 ƒ11 ƒ12 5 D 4 Im 0 5 ƒ21 ƒ22 C0 0 0 0 Il or
† y L0 1 C A 11 C C 21 D 0; † y L0 2 C A 12 C C 22 D 0 A0 L0 1 D Im ; A0 L0 2 D 0 C0 L0 1 D 0; C0 L0 2 D Il
(10.7)
(10.8)
with suitable matrices ƒ11 ; ƒ12 ; ƒ21 and ƒ22 of “Lagrange multipliers”. Theorem 10.3 specifies the solution of the special normal equations by means of (10.9) relative to the specific “Schur complements” (10.10)–(10.13).
b
O Efzg† y -BLUUE of and Efzg): Theorem 10.3. (;
1
Let ŒO 0 ; E fzg0 0 be † y -BLUUE of the Œ 0 ; Efzg0 0 in the mixed Gauss–Markov model (10.1). Then the equivalent representations of the solution of the normal equations (10.7) O WD
" # O O
" WD
O E fzg
1
#
" D
0 1 A0 †1 y A A †y C 1 C0 †y A C0 †1 y C
#
A0 † 1 y y C0
(10.9)
0 1 1 0 1 1 0 1 1 0 1 O D fA0 †1 A0 †1 y ŒIn C.C †y C/ C †y Ag y ŒIn C.C †y C/ C †y y n o1 1 1 0 1 O D E fzg D C0 †y ŒIn A.A0 †1 y A/ A †y C
1
0 1 1 0 1 C0 †1 y ŒIn A.A †y A/ A †y y 1 O D S A sA
1
O WD E fzg D S1 C sC
1
1 0 1 O D .A0 †1 y A/ A †y .y E fzg/
1
1 0 1 O O WD E fzg D .C0 †1 y C/ C †y .y A/
are completed by the dispersion matrices and the covariance matrices. n o D O WD D
#1 #) " ( " #) (" 0 1 A0 †1 O O y A A †y C D DW † O WD D 0 1 C0 †1 O E fzg y A C †y C
1
424
10 The Fifth Problem of Probabilistic Regression
n o ˚ 1 0 1 1 0 1 D O D A0 †1 DW † O y ŒIn C.C †y C/ C †y A
b
O g O Efzggg Cf; O D Cf; 1 0 1 ˚ 1 0 1 1 0 1 A †y C.C0 †1 D A0 †1 y ŒIn C.C †y C/ C †y A y C/ ˚ 1 1 1 0 1 0 1 1 0 1 D .A0 †1 y A/ A †y C C†y ŒIn A.A †y A/ A †y C
b
0 1 1 0 1 1 D f g O WD DfEfzgg D fC0 †1 DW † O y ŒIn A.A †y A/ A †y Cg
O zg D 0 Cf;
b
Cf ; O zg WD CfEfzg; zg D 0; where the “Schur complements” are defined by 0 1 1 0 1 SA WD A0 †1 y ŒIn C.C †y C/ C †y A;
(10.10)
0 1 1 0 1 sA WD A0 †1 y ŒIn C.C †y C/ C †y y;
(10.11)
0 1 1 0 1 SC WD C0 †1 y ŒIn A.A †y A/ A †y C;
(10.12)
0 1 1 0 1 sC WD C0 †1 y ŒIn A.A †y A/ A †y y
(10.13)
1
1
Our final result (10.14)–(10.23) summarizes (i) the two forms (10.14) and (10.15) of estimating E fyg and DfE fygg as derived covariance matrices, (ii) the empirical error vector eQ y and the related variance-covariance matrices (10.19)–(10.21) and (iii) the dispersion matrices Dy by means of (10.22)–(10.23).
1
b
O Efzg † y -BLUUE of and Efzg): Lemma 10.4. (E fyg W ; (i) With respect to the mixed Gauss–Markov model (10.1) † y -BLUUE of the Efyg D A C CEfzg is given by
1
b
E fyg D AO C CEfzg 1 0 1 1 0 1 D AS1 A sA C C.C †y C/ C †y .y ASA sA /
1
(10.14)
or
b
E fyg D AO C CEfzg 1 1 1 0 1 D A.A0 †1 y A/ A †y .y ASC sC / C CSC sC
(10.15)
with the corresponding dispersion matrices
1
b
DfE fygg D DfAO C CEfzgg
b
b
b
O 0 C Acovf; O EfzggC0 C Ccovf; O EfzggA0 C CDfEfzggC0 D ADfgA
10-1 Inhomogeneous General Linear Gauss–Markov Model
1
425
b
DfE fygg D DfAO C CEfzgg 1 1 0 0 1 1 0 1 D C.C0 †1 y C/ C CŒIn C.C †y C/ C †y ASA
1
1 0 0 1 A0 ŒIn †1 y C.C †y C/ C
b
DfE fygg D DfAO C CEfzgg 1 0 0 1 D A.A0 †1 y A/ A C ŒIn A.A †y 1 0 1 0 1 1 0 A/1 A0 †1 y CSC C ŒIn †y A.A †y A/ A ;
ey where SA ; sA ; SC ; sC are “Schur complements” (10.10), (10.11), (10.12) and (10.13). The covariance matrix of E fyg and z amounts to
1 1 b zg D 0: fyg; zg D CfAO C CEfzg; covfE
(10.16)
1
(ii) If the “error vector” is empirically determined by means of the residual vector eQ y D y E fyg we gain the various representations of type
or
1 1 0 1 eQ y D ŒIn C.C0 †1 y C/ C †y .y ASA sA /
(10.17)
1 1 0 1 eQ y D ŒIn A.A0 †1 y A/ A †y .y CSC sC /
(10.18)
with the corresponding dispersion matrices 1 0 DfQey g D † y C.C0 †1 y C/ C 1 0 1 0 1 0 1 1 0 1 ŒIn C.C0 †1 y C/ C †y ASA A ŒIn †y C.C †y C/ C (10.19)
or 1 0 DfQey g D † y A.A0 †1 y A/ A 1 0 1 0 1 1 0 1 1 0 ŒIn A.A0 †1 y A/ A †y CSC C ŒIn †y A.A †y A/ A ; (10.20)
where SA ; sA ; SC ; sC are “Schur complements” (10.10), (10.11), (10.12) and (10.13). eQ y and z are uncorrelated because of CfQey ; zg D 0: (iii) The dispersion matrices of the observation vector is given by
(10.21)
426
10 The Fifth Problem of Probabilistic Regression
b
b
Dfyg D DfAO C CEfzg C eQ y g D DfAO C CEfzgg C DfQey g Dfyg D DfQey ey g C DfQey g:
(10.22)
1
eQ y and E fyg are uncorrelated since
1
b
CfQey ; E fygg D CfQey ; AO C CEfzgg D CfQey ; eQ y ey g D 0 :
(10.23)
10-2 Explicit Representations of Errors in the General Gauss–Markov Model with Mixed Effects A collection of explicit representations of errors in the general Gauss–Markov model with mixed effects will be presented: , Efzg, yN D y, † z , † y will be assumed to be unknown, known. In addition, CfNy; zg will be assumed to vanish. The prediction of random effects will be summarized here. Note our simple model A C CEfzg D Efyg; Efyg 2 R.ŒA; C/; rkŒA; C D m C ` < n; Efzg unknown, Z 2 D Dfzg, Z positive definite rkZ D s `, V 2 D Dfy Czg, V positive semidefinite rkV D t n; rkŒV; CZ D n, Cfz; y Czg D 0. A homogeneous-quadratic ansatz O 2 D y0 My will be specified now. Theorem 10.5. (homogeneous-quadratic setup of O 2 ): (i) Let O 2 D y0 My D .vecM/0 .y ˝ y/ be BIQUUE of 2 with respect to the model of the front desk. Then O 2 D .n m `/1 Œy0 fIn .V C CZC0 /1 AŒA0 .V C CZC0 /1 A1 A0 g .V C CZC0 /1 y s0 A S1 C sC O 2 D .n m `/1 Œy0 Q.V C CZC0 /1 y s0 A S1 A sA Q WD In .V C CZC0 /1 CŒC0 .V C CZC0 /1 C1 C0 subject to ŒSA ; sA WD A0 .V C CZC0 /1 fIn CŒC0 .V C CZC0 /1 C1 C0 .V C CZC0 /1 g ŒA; y D A0 Q.V C CZC0 /1 ŒA; y and
10-2 Explicit Representations of Errors
427
ŒSC ; sC WD C0 .V C CZC0 /1 fIn AŒA0 .V C CZC0 /1 A1 A0 .V C CZC0 /1 gŒC; y; where SA and SC are “Schur complements”. Alternately, we receive the empirical data based upon O 2 D .n m `/1 y0 .V C CZC0 /1 eQ y D .n m `/1 eQ0 y .V C CZC0 /1 eQ y and the related variances DfO 2 g D 2.n m `/1 4 D 2.n m `/1 . 2 /2 or replacing by the estimations DfO 2 g D 2.n m `/1 .O 2 /2 D 2.n m `/1 ŒeQ0 y .V C CZC0 /1 eQ y 2 : (ii) If the cofactor matrix V is positive definite, we will find for the simple representations of type BIQUUE of 2 the equivalent representations O 2 D .n m `/1 y0 fV1 V1 ŒA; C
A0 V1 A A0 V1 C C0 V1 A C0 V1 C
A0 A1 gy C0
1
O 2 D .n m `/1 .y0 QV y s0 A S1 A sA / 1
1
1
O 2 D .n m `/1 y0 V ŒIn A.A0 V A/1 A0 V .y CS1 C sC / subject to the projection matrix 1
Q D In V1 C.C0 V C/1 C0 and 1
1
1
1
ŒSA ; sA WD A0 V ŒIn C.C0 V C/1 C0 V ŒA; y D A0 QV ŒA; y ŒSC ; sC WD fI` C C0 V1 ŒIn A.A0 V1 A/1 A0 V1 CZg1 C0 V1 ŒIn A.A0 V1 A/1 A0 V1 ŒC; y: Alternatively, we receive the empirical data based upon 1
O 2 D .n m `/1 y0 V eQ y D .n m `/1 eQ0 y V1 eQ y and the related variances DfO 2 g D 2.n m `/1 4 D 2.n m `/1 . 2 /2
428
10 The Fifth Problem of Probabilistic Regression
O O 2 g D 2.n m `/1 .O 2 /2 D 2.n m `/1 .eQ0 y V1 eQ y /2 : Df The proofs are straight forward.
10-3 An Example for Collocation Here we will focus on a special model with fixed effects and random effects, in particular with , Efzg, EfNyg, † z , † y unknown, but known. We depart in analyzing a height network observed at two epochs. At the initial epoch three height differences have been observed. From the first epochs to the second epoch we assume height differences which change linear in time, for instance caused by an Earthquake. There is a height varying model h˛ˇ ./ D h˛ˇ .0/ C h˛ˇ .0/ C O. 2 /; where notes the time difference from the first epoch to the second epoch, related to the height difference. Unknown are the fixed height differences h˛ˇ and the expected values of the random height difference velocities h˛ˇ . Given is the singular dispersion matrix of height difference measurements. Alternative estimation and prediction data are of type .V C CZC0 /-BLUUE for the unknown parameter of height difference at initial epoch and the expected data Efzg of stochastic height difference velocities z, of type .V C CZC0 /-BLUUE of the expected data Efyg of height difference observations y, of type eQ y of the empirical error vector of observations and of type .V C CZC0 /-BLUUP for the stochastic vector zQ of height difference velocities. For the unknown variance component 2 of height difference observations we use estimates of type BIQUUE. In detail, our model assumptions are epoch 1 82 39 2 3 # h˛ˇ > 1 0 " ˆ < e = h˛ˇ 6 7 6 7 e E 4 hˇ 5 D 4 0 1 5 ˆ > hˇ : e ; e h ˛ 1 1 e epoch 2 82 39 2 h˛ˇ > 1 ˆ < e = 6 6 7 E 4 hˇ 5 D 4 0 ˆ > : e ; 1 h ˛ e
32
3
2
h˛ˇ
0 6 7 6 hˇ 76 154 0 56 6 Efh g 4 ˛ˇ 1 Efhˇ g 0
3 7 7 7 7 5
10-3 An Example for Collocation
429
epoch 1 and 2 3 3 39 2 2 1 0 0 0 > > > 7 6 7 7> 6 > 6 0 17 7> 6 0 07 7 6 > 7" 7> 6 " # # = 6 1 1 7 h˛ˇ 7> 6 0 0 7 Efh g 7 6 7 7 6 ˛ˇ C6 7 7 7 D6 6 1 0 7 hˇ 7> 6 0 7 Efhˇ g > 7 7 > 6 7> 6 7 6 7 7> 6 > 4 0 15 5> 4 0 5 > > ; 1 1 3 2 2 3 1 0 h˛ˇ 7 6 6e 7 6 0 17 6 hˇ 7 7 6 6e 7 " # 6 1 1 7 6h 7 h˛ˇ 7 6 6 e ˛ 7 y WD 6 7 WD 7 ; A WD 6 6 1 07 6 k ˛ˇ 7 hˇ 7 6 6e 7 7 6 6 7 4 0 15 4 k ˇ 5 e 1 1 k˛ e 2 3 0 0 6 7 6 0 07 6 7 # " 6 0 07 Efh˛ˇ g 6 7 C WD 6 7 ; Efzg D 6 07 Efhˇ g 6 7 6 7 4 0 5
82 ˆ h˛ˇ ˆ ˆ 6e ˆ ˆ 6 hˇ ˆ ˆ 6 ˆ ˆ <6 e 6h E 6 e ˛ 6k ˆ ˆ ˆ6 e˛ˇ ˆ 6k ˆ ˆ ˆ 4 ˇ ˆ ˆ : e k ˛ e
rank identities rkA D 2; rkC D 2; rkŒA; C D m C l D 4. The singular dispersion matrix Dfyg D V 2 of the observation vector y and the singular dispersion matrix Dfzg D Z 2 are determined in the following. We separate 3 cases. (i) rkV D 6; rkZ D 1 V D I6 ; Z D
1 2
11 11
(ii) rkV D 5; rkZ D 2 1 I2 ; rk.VCCZC0 / D 6 2 (iii) rkV D 4; rkZ D 2
V D Diag.1; 1; 1; 1; 1; 0/; Z D
V D Diag.1; 1; 1; 1; 0; 0/; Z D
1 I2 ; rk .VCCZC0 / D 6: 2
430
10 The Fifth Problem of Probabilistic Regression
In order to be as simple as possible we use the time interval D 1. With the numerical values of matrix inversion and of “Schur-complements”, e.g. Table 10.1: .VCCZC0 /1 Table 10.2: fIn AŒA0 .VCCZC0 /1 A1 A0 .VCCZC0 /1 g Table 10.3: fIn CŒC0 .VCCZC0 /1 C1 C0 .VCCZC0 /1 g Table 10.4: “Schur-complements” SA , SC Table 10.5: vectors sA , sC O Dfg, O Efzg, DfE fygg 1st case: ,
b
1
O D 1 2 1 1 0 0 0 y D 1 2y1 C y2 y3 ; 3 12 1 000 3 y1 C 2y2 C y3 2 2 1 ; 3 12 2 1 1 2 1 1 2y1 y2 C y3 C 2y4 C y5 y6 Efzg D ; yD y1 2y2 y3 C y4 C 2y5 C y6 1 2 1 1 2 1 2 7 5 ; DfEfzgg D 3 57 O D Dfg
b
b
b
b
O Dfg, O Efzg, DfEfzgg 2nd case: , O D 1 2 1 1 0 0 0 y D 1 2y1 C y2 y3 ; 3 12 1 000 3 y1 C 2y2 C y3 O D Dfg
21 ; 12
4 2 2 3 3 3 y; 2 4 2 3 3 3 2 13 5 ; DfEfzgg D 6 5 13
bD Efzg
2 3
b b DfEfzgg b O Dfg, O Efzg, 3rd case: ,
1 2y1 C y2 y3 1 2 1 1 0 0 0 yD ; O D 3 12 1 000 3 y1 C 2y2 C y3 2 2 1 ; 3 12 2 1 1 3 0 0 1 2y1 y2 C y3 C 3y4 ; Efzg D 1 2 1 0 3 0 3 y1 2y2 y3 C 3y5 O D Dfg
b
10-3 An Example for Collocation
431
Table 10.1 Matrix inverse .VCCZC0 /1 for a mixed Gauss–Markov model with fixed and random effects
1st case
2nd case
3rd case
VCCZC0 3 2 100 000 6010 0007 7 6 7 6 6001 0007 7 6 6000 2107 7 6 4000 1205 000 001 2 100 000 6010 000 6 6 6001 000 6 6 0 0 0 2 0 1 6 4000 0 2 1 0 0 0 1 1 2 2 100 000 6010 000 6 6 6001 000 6 6 0 0 0 1 0 1 6 4000 0 1 1 0 0 0 1 1 3
3 7 7 7 7 7 7 7 5 3 7 7 7 7 7 7 7 5
.VCCZC0 /1 3 2 300 000 6030 000 7 7 6 7 16 6003 000 7 7 6 3 6 0 0 0 2 1 0 7 7 6 4 0 0 0 1 2 0 5 000 0 0 3 3 2 400 000 6040 000 7 7 6 7 6 16004 000 7 7 6 4 6 0 0 0 3 1 2 7 7 6 4 0 0 0 1 3 2 5 0 0 0 2 2 4 3 2 100 000 6010 000 7 7 6 7 6 000 7 6001 7 6 6 0 0 0 2 1 1 7 7 6 4 0 0 0 1 2 1 5 0 0 0 1 1 1
Table 10.2 Matrices fIn CŒA0 .VCCZC0 /1 A1 A0 .VCCZC0 /1 g for a mixed Gauss–Markov model with fixed and random effects
1st case
2nd case
0 fIn AŒA .VCCZC0 /1 A1 A03.VCCZC0 /1 g 2 13 7 4 5 1 4 6 7 13 4 1 5 4 7 6 7 7 1 6 6 4 4 16 4 4 8 7 6 7 24 6 11 7 4 19 1 4 7 6 7 4 7 11 4 1 19 4 5 4 4 8 4 4 16 2 3 13 5 6 4 4 3 6 5 13 6 4 4 3 7 6 7 7 1 6 6 6 6 12 0 0 6 7 6 7 24 6 11 5 6 20 4 3 7 6 7 4 5 11 6 4 20 3 5 6 6 12 0 0 18
(Continued)
432
10 The Fifth Problem of Probabilistic Regression
Table 10.2 Continued
3rd case
2
5 6 1 6 16 6 2 6 8 6 3 6 4 1 2
1 2 5 2 2 4 1 2 3 2 2 4
3 1 1 3 2 2 5 1 1 5 2 2
3 0 07 7 7 07 7 07 7 05 8
Table 10.3 Matrices fIn CŒC0 .VCCZC0 /1 C0 1 C0 .VCCZC0 /1 g for a mixed Gauss–Markov model with fixed and random effects
1st case
2nd case
3rd case
fIn CŒC0 .VCCZC0 /1 C1 C0 .VCCZC0 /1 g 2 3 300 0 0 0 60 3 0 0 0 07 6 7 7 16 60 0 3 0 0 07 6 7 3 6 0 0 0 1 1 1 7 6 7 4 0 0 0 1 1 1 5 000 0 0 0 3 2 200 0 0 0 60 2 0 0 0 0 7 7 6 7 6 60 0 2 0 0 0 7 7 6 6 0 0 0 1 1 1 7 7 6 4 0 0 0 1 1 1 5 000 0 0 0 3 2 1000 0 0 60 1 0 0 0 07 7 6 7 6 60 0 1 0 0 07 7 6 60 0 0 0 0 07 7 6 40 0 0 0 0 05 0 0 0 1 1 1
Table 10.4 “Schur-complements ” SA , SC , three cases, for a mixed Gauss– Markov model with fixed and random effects
1st case 2nd case 3rd case
SA (10.10) 2 1 1 2 2 1 1 2 2 1 1 2
SA 1 1 21 3 1 2 1 21 3 12 1 21 3 12
SC (10.12) 1 7 5 8 5 7 1 13 5 24 5 13 1 5 1 8 1 5
SC 1 1 75 3 5 7 1 13 5 6 5 13 1 51 3 15
10-3 An Example for Collocation
433
Table 10.5 Vectors sA and sC , three cases, for a mixed Gauss–Markov model with fixed and random effects
1st case 2nd case 3rd case
sA (10.11) 1 0 1 0 0 01 1 00 1 0 1 0 0 01 1 00 1 0 1 0 0 01 1 00
sC (10.13) 1 9 3 12 9 3 12 y 8 3 9 12 3 9 12 1 7 1 6 4 4 9 y 24 1 7 6 4 4 9 1 3 1 2 5 1 0 y 8 1 3 2 1 5 0
0 y 0 0 y 0 0 y 0
Table 10.6: Numerical values, Case 1

1st case: Ê{y}, D{Ê{y}}, ẽ_y, D{ẽ_y}:
\[
\widehat{E\{y\}}=\frac13\begin{bmatrix}2y_1+y_2-y_3\\ y_1+2y_2+y_3\\ y_1+2y_2+2y_3\\ 2y_4+y_5-y_6\\ y_4+2y_5-y_6\\ y_4+y_5+2y_6\end{bmatrix},\qquad
D\{\widehat{E\{y\}}\}=\frac23\cdot
[2\ 1\ 1\ 0\ 0\ 0;\ 1\ 2\ 1\ 0\ 0\ 0;\ 1\ 1\ 2\ 0\ 0\ 0;\ 0\ 0\ 0\ 5\ 4\ 1;\ 0\ 0\ 0\ 4\ 5\ 1;\ 0\ 0\ 0\ 1\ 1\ 2],
\]
\[
\tilde e_y=\frac1{12}\cdot[1\ 5\ 8\ 5\ 1\ 4;\ 5\ 1\ 8\ 1\ 5\ 4;\ 8\ 8\ 4\ 4\ 4\ 8;\ 11\ 7\ 4\ 7\ 11\ 8;\ 7\ 11\ 4\ 11\ 7\ 8;\ 4\ 4\ 8\ 8\ 8\ 4]\cdot y,
\]
\[
D\{\tilde e_y\}=\frac23\cdot[1\ 1\ 1\ 0\ 0\ 0;\ 1\ 1\ 1\ 0\ 0\ 0;\ 1\ 1\ 1\ 0\ 0\ 0;\ 0\ 0\ 0\ 1\ 1\ 1;\ 0\ 0\ 0\ 1\ 1\ 1;\ 0\ 0\ 0\ 1\ 1\ 1].
\]
First case: ξ̂, D{ξ̂}, Ê{z}, D{Ê{z}}; second case: ξ̂, D{ξ̂}, Ê{z}, D{Ê{z}}; third case: ξ̂, D{ξ̂}, Ê{z}, D{Ê{z}}. Here are the results of computing Ê{y}, D{Ê{y}}, ẽ_y and D{ẽ_y}, ordered as case 1, case 2 and case 3 (Tables 10.6–10.10).

Table 10.7: Numerical values, Case 2
2nd case: Ê{y}, D{Ê{y}}, ẽ_y, D{ẽ_y}:
\[
\widehat{E\{y\}}=\frac16\cdot[4\ 2\ 2\ 0\ 0\ 0;\ 2\ 4\ 2\ 0\ 0\ 0;\ 2\ 2\ 4\ 0\ 0\ 0;\ 0\ 0\ 0\ 3\ 3\ 0;\ 0\ 0\ 0\ 3\ 3\ 0;\ 0\ 0\ 0\ 0\ 0\ 6]\cdot y,
\]
\[
D\{\widehat{E\{y\}}\}=\frac23\cdot[2\ 1\ 1\ 0\ 0\ 0;\ 1\ 2\ 1\ 0\ 0\ 0;\ 1\ 1\ 2\ 0\ 0\ 0;\ 0\ 0\ 0\ 5\ 4\ 1;\ 0\ 0\ 0\ 4\ 5\ 1;\ 0\ 0\ 0\ 1\ 1\ 2],
\]
\[
\tilde e_y=\frac16\begin{bmatrix}2y_1-2y_2+2y_3\\ -2y_1+2y_2-2y_3\\ 2y_1-2y_2+2y_3\\ 3y_4-3y_5+3y_6\\ 3y_4-3y_5+3y_6\\ -3y_4+3y_5-3y_6\end{bmatrix},\qquad
D\{\tilde e_y\}=\frac13\cdot[2\ 2\ 2\ 0\ 0\ 0;\ 2\ 2\ 2\ 0\ 0\ 0;\ 2\ 2\ 2\ 0\ 0\ 0;\ 0\ 0\ 0\ 3\ 3\ 0;\ 0\ 0\ 0\ 3\ 3\ 0;\ 0\ 0\ 0\ 0\ 0\ 0].
\]
Table 10.8: Numerical values, Case 3

3rd case: Ê{y}, D{Ê{y}}, ẽ_y, D{ẽ_y}:
\[
\widehat{E\{y\}}=\frac13\cdot\Gamma\,y\quad\text{for the }6\times6\text{ coefficient matrix }\Gamma\text{ of the third case,}
\]
\[
D\{\widehat{E\{y\}}\}=\frac23\cdot[2\ 1\ 1\ 0\ 0\ 0;\ 1\ 2\ 1\ 0\ 0\ 0;\ 1\ 1\ 2\ 0\ 0\ 0;\ 0\ 0\ 0\ 3\ 0\ 3;\ 0\ 0\ 0\ 0\ 3\ 3;\ 0\ 0\ 0\ 3\ 3\ 6],
\]
\[
\tilde e_y=\frac13\begin{bmatrix}y_1-y_2+y_3\\ y_1+2y_2-y_3\\ y_1-y_2+y_3\\ 0\\ 0\\ 0\end{bmatrix},\qquad
D\{\tilde e_y\}=\frac23\cdot[1\ 1\ 1\ 0\ 0\ 0;\ 1\ 1\ 1\ 0\ 0\ 0;\ 1\ 1\ 1\ 0\ 0\ 0;\ 0\ 0\ 0\ 0\ 0\ 0;\ 0\ 0\ 0\ 0\ 0\ 0;\ 0\ 0\ 0\ 0\ 0\ 0].
\]
Table 10.9: Data of type z̃, D{z̃} for the three cases

1st case:
\[
\tilde z=\frac13\begin{bmatrix}2y_1-y_2+y_3+2y_4+y_5-y_6\\ y_1-2y_2-y_3+y_4+2y_5+y_6\end{bmatrix},\qquad
D\{\tilde z\}=\frac23\begin{bmatrix}7&5\\5&7\end{bmatrix};
\]
2nd case:
\[
\tilde z=\frac13\begin{bmatrix}4&2&2&3&3&3\\2&4&2&3&3&3\end{bmatrix}y,\qquad
D\{\tilde z\}=\frac13\begin{bmatrix}13&5\\5&13\end{bmatrix};
\]
3rd case:
\[
\tilde z=\frac13\begin{bmatrix}2y_1-y_2+y_3+3y_4\\ y_1-2y_2-y_3+3y_5\end{bmatrix},\qquad
D\{\tilde z\}=\frac23\begin{bmatrix}5&1\\1&5\end{bmatrix}.
\]
Table 10.10: Data of type σ̂², D{σ̂²}, D̂{σ̂²} for the three cases

1st case (n = 6, m = 2, ℓ = 2, n − m − ℓ = 2), with
M₁ := [7 1 2 5 1 4; 1 7 2 1 5 4; 2 2 10 4 4 8; 5 1 4 7 1 2; 1 5 4 1 7 2; 4 4 8 2 2 10]:
\[
\hat\sigma^2=\frac1{12}\,y'M_1y,\qquad D\{\hat\sigma^2\}=\sigma^4,\qquad
\widehat{D\{\hat\sigma^2\}}=\frac1{144}\{y'M_1y\}^2;
\]
2nd case (n = 6, m = 2, ℓ = 2, n − m − ℓ = 2), with
M₂ := blkdiag([2 2 2; 2 2 2; 2 2 2], [3 3 3; 3 3 3; 3 3 3]):
\[
\hat\sigma^2=\frac1{12}\,y'M_2y,\qquad D\{\hat\sigma^2\}=\sigma^4,\qquad
\widehat{D\{\hat\sigma^2\}}=\frac1{144}\{y'M_2y\}^2;
\]
3rd case (n = 6, m = 2, ℓ = 2, n − m − ℓ = 2), with
M₃ := blkdiag([1 1 1; 1 1 1; 1 1 1], 0₃):
\[
\hat\sigma^2=y'M_3y,\qquad D\{\hat\sigma^2\}=\sigma^4,\qquad
\widehat{D\{\hat\sigma^2\}}=\frac1{144}\{y'M_3y\}^2.
\]
Here is my journey’s end.
10-4 Comments

(i) In their original contributions A.N. Kolmogorov (1941a, b, c) and N. Wiener ("yellow devil", 1939) did not depart from our general setup of fixed effects and random effects. Instead they departed from the model
\[
y_{P_\alpha}=c_{P_\alpha}+\sum_{\beta=1}^{q}c_{P_\alpha Q_\beta}\,y_{Q_\beta}+\sum_{\beta,\gamma=1}^{q}c_{P_\alpha Q_\beta Q_\gamma}\,y_{Q_\beta}y_{Q_\gamma}+O(y^3)
\]
of nonlinear prediction, e.g. homogeneous linear prediction
\[
y_{P_1}=y_{Q_1}c_{P_1Q_1}+y_{Q_2}c_{P_1Q_2}+\cdots+y_{Q_q}c_{P_1Q_q}
\]
\[
y_{P_2}=y_{Q_1}c_{P_2Q_1}+y_{Q_2}c_{P_2Q_2}+\cdots+y_{Q_q}c_{P_2Q_q}
\]
\[
\vdots
\]
\[
y_{P_{p-1}}=y_{Q_1}c_{P_{p-1}Q_1}+y_{Q_2}c_{P_{p-1}Q_2}+\cdots+y_{Q_q}c_{P_{p-1}Q_q}
\]
\[
y_{P_p}=y_{Q_1}c_{P_pQ_1}+y_{Q_2}c_{P_pQ_2}+\cdots+y_{Q_q}c_{P_pQ_q}.
\]
From given values of "random effects" (y_{Q_1},…,y_{Q_q}) other values of "random effects" (y_{P_1},…,y_{P_p}) have been predicted, namely under the assumption of "equal correlation" of type P₁—P₂ = Q₁—Q₂.
\[
\widehat{E\{y_P\}}=C\,(C'\Sigma_{y_P}^{-1}C)^{-1}C'\Sigma_{y_P}^{-1}\,y_P,\qquad
D\{\widehat{E\{y_P\}}\}=C\,(C'\Sigma_{y_P}^{-1}C)^{-1}C'
\]
or, for all y_P ∈ {y_{P_1},…,y_{P_p}},
\[
\|y_P-\hat y_P\|:=E\{(y_P-\hat y_P)^2\}\to\min,\qquad
E\{(y_P-\hat y_P)^2\}=E\Big\{\Big(y_P-\sum_{Q_1}^{Q_q}y_{Q_j}c_{ij}\Big)^2\Big\}=\min\ (K\!-\!W)
\]
for homogeneous linear prediction. "Ansatz":
\[
E\{y_P-E\{\hat y_P\}\}:=0,\qquad
E\{(y_{P_1}-E\{\hat y_{P_1}\})(y_{P_2}-E\{\hat y_{P_2}\})\}=\operatorname{cov}(y_{P_1},y_{P_2}),
\]
\[
E\{(y_P-\hat y_P)^2\}=\operatorname{cov}(P,P)-2\sum_{Q=Q_1}^{Q_q}c_{pj}\operatorname{cov}(P,Q_j)
+\sum_{Q_j}\sum_{Q_k}c_{pj}c_{pk}\operatorname{cov}(Q_j,Q_k)=\min.
\]
"Kolmogorov–Wiener prediction":
\[
c_{pj}(K\!-\!W)=\sum_{k=1}^{k=q}[\operatorname{cov}(Q_j,Q_k)]^{-1}\operatorname{cov}(Q_k,P),
\]
\[
E\{(y_P-\hat y_P)^2\mid K\!-\!W\}=\operatorname{cov}(P,P)
-\sum_{j=1}^{q}\sum_{k=1}^{q}\operatorname{cov}(P,Q_j)\operatorname{cov}(P,Q_k)[\operatorname{cov}(Q_j,Q_k)]^{-1},
\]
constrained to the separation |Q_j − Q_k| (Q_j —+—+—+—+— Q_k),
\[
\operatorname{cov}(Q_j,Q_k)=\operatorname{cov}(Q_j-Q_k):
\]
"y_P − E{y_P} is weakly translational invariant". K–W prediction suffers from the effect that "a priori" we know only one realization of the random function y(Q₁),…,y(Q_q), for instance
\[
\operatorname{cov}(\delta)=\frac{1}{N_\delta}\sum_{|Q_j-Q_k|=\delta}(y_{Q_j}-E\{y_{Q_j}\})(y_{Q_k}-E\{y_{Q_k}\}).
\]
Modified versions of the K–W prediction exist if we work with random fields in the plane (weak isotropy) or on the sphere (rotational invariances).
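The weight formula c_pj(K–W) above is directly computable once a covariance function has been estimated. The following sketch (a minimal illustration, assuming a stationary covariance function and NumPy; all names such as `kw_predict` are ours, not the book's) shows both the empirical covariance of a single realization and the resulting homogeneous linear predictor:

```python
import numpy as np

def empirical_covariance(samples, max_lag):
    """Estimate cov(delta) as in the text: average products of centered
    values over all pairs with |Q_j - Q_k| = delta.
    samples: array (n_realizations, q); with one realization the spatial
    averaging replaces the (unavailable) ensemble averaging."""
    centered = samples - samples.mean(axis=0)
    q = samples.shape[1]
    cov = np.zeros(max_lag + 1)
    for delta in range(max_lag + 1):
        pairs = [centered[:, j] * centered[:, j + delta] for j in range(q - delta)]
        cov[delta] = np.mean(pairs)
    return cov

def kw_predict(y_q, cov_qq, cov_qp):
    """Homogeneous linear K-W prediction: weights c = Cov(Q,Q)^{-1} cov(Q,P)."""
    c = np.linalg.solve(cov_qq, cov_qp)   # c_pj of the text
    return c @ y_q

# toy usage with an assumed exponential covariance cov(d) = exp(-|d|/2)
grid = np.arange(6)
cov = lambda d: np.exp(-np.abs(d) / 2.0)
cov_qq = cov(grid[:, None] - grid[None, :])
cov_qp = cov(grid - 6.0)                  # predict one step beyond the grid
rng = np.random.default_rng(0)
y_q = rng.multivariate_normal(np.zeros(6), cov_qq)
print(kw_predict(y_q, cov_qq, cov_qp))
```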
As a model of "random effects" we may write
\[
\sum_{Q_\beta}c_{P_\alpha Q_\beta}\,E\{y_{Q_\beta}\}=E\{y_{P_\alpha}\}
\;\Leftrightarrow\;CE\{z\}=E\{y\}.
\]
(ii) The first model applies if we want to use data of one type to predict data of the same type. Indeed, we have to generalize if we want to predict, for instance, vertical deflections, gravity gradients or gravity values from gravity disturbances. The second model has to start from relating one set of heterogeneous data to another set of heterogeneous data. In this case we have to relate the various data sets to each other. An obvious alternative setup is
\[
\sum_{Q_\beta}c^1_{P_\alpha Q_\beta}\,E\{z^1_{Q_\beta}\}+\sum_{R_\gamma}c^2_{P_\alpha R_\gamma}\,E\{z^2_{R_\gamma}\}+\cdots=E\{y_{P_\alpha}\}.
\]
(iii) The level of collocation is reached if we include a trend model in addition to Kolmogorov–Wiener prediction, namely E{y} = Aξ + CE{z}, the trend being represented by Aξ. The decomposition of "trend" and "signal" is well represented in E. Grafarend (1976), E. Groten (1970), E. Groten and H. Moritz (1964), S. Heitz (1967, 1968, 1969), S. Heitz and C.C. Tscherning (1972), R.A. Hirvonen (1956, 1962), S.K. Jordan (1972 a, b, c, 1973), W.M. Kaula (1959, 1963, 1966 a, b, c, 1971), K.R. Koch (1973 a, b), K.R. Koch and S. Lauer (1971), L. Kubackove (1973, 1974, 1975), S. Lauer (1971 a, b), S.L. Lauritzen (1972, 1973, 1975), D. Lelgemann (1972, 1974), P. Meissl (1970, 1971), in particular H. Moritz (1961, 1962, 1963 a, b, c, d, 1964, 1965, 1967, 1969, 1970 a, b, c, d, e, 1971, 1972, 1973 a, b, c, d, e, f, 1974 a, b, 1975), H. Moritz and K.P. Schwarz (1973), W. Mundt (1969), P. Naicho (1967, 1968), G. Obenson (1968, 1970), A.M. Obuchow (1947, 1954), I. Parzen (1960, 1963, 1972), L.P. Pellinen (1966, 1970), V.S. Pugachev (1962), C.R. Rao (1971, 1972, 1973 a, b), R. Rupp (1962, 1963, 1964 a, b, 1966 a, b, c, 1972, 1973 a, b, c, 1974, 1975), H.P. Robertson (1940), M. Rosenblatt (1959, 1966), R. Rummel (1975 a, b), U. Schatz (1970), I. Schoenberg (1942), W. Schwahn (1973, 1975), K.P. Schwarz (1972, 1974 a, b, c, 1975 a, b, c), G. Seeber (1972), H.S. Shapiro (1970), L. Shaw et al (1969), L. Sjoeberg (1975), G.N. Smith (1974), F. Sobel (1970), G.F. Taylor (1935, 1938), C. Tscherning (1972 a, b, 1973, 1974 a, b, 1975 a, b, c, d, e), C. Tscherning and R.H. Rapp (1974), V. Vyskocil (1967, 1974 a, b, c), P. Whittle (1963 a, b), N. Wiener (1958, 1964), H. Wolf (1969, 1974), E. Wong (1969), E. Wong and J.B. Thoma (1961), A.M. Yaglom (1959, 1961).
(iv) An interesting comparison is the various solutions of type
\[
\hat y_2=\hat l+\hat L y_1\ \text{(best inhomogeneous linear prediction)},
\]
\[
\hat y_2=\hat L y_1\ \text{(best homogeneous linear prediction)},
\]
\[
\hat y_2=\hat L y_1\ \text{(best homogeneous linear unbiased prediction)},
\]
with the dispersion identities D₃ ≥ D₂ ≥ D₁.

(v) In spite of the effect that "trend components" and "K–W prediction" may serve well the needs of an analyst, generalizations are obvious. For instance, in Krige's prediction concept it is postulated that only
\[
\|y_{p_1}-y_{p_2}\|^2:=E\{(y_{p_1}-y_{p_2}-(E\{y_{p_1}\}-E\{y_{p_2}\}))^2\}
\]
is a weakly relative translational invariant stochastic process. A.N. Kolmogorov has called the weakly relative translational invariant random function a "structure function". Alternatively, higher order variance-covariance functions have been proposed:
\[
\|y_1,y_2,y_3\|^2:=E\{((y_1-2y_2+y_3)-(E\{y_1\}-2E\{y_2\}+E\{y_3\}))^2\},
\]
\[
\|y_1,y_2,y_3,y_4\|^2:=E\{((y_1-3y_2+3y_3-y_4)-(E\{y_1\}-3E\{y_2\}+3E\{y_3\}-E\{y_4\}))^2\},
\]
etc.

(vi) Another alternative has been the construction of higher order absolute variance-covariance functions of type
\[
\|(y_{Q_1}-E\{y_{Q_1}\})(y_{Q_2}-E\{y_{Q_2}\})(y_{Q_3}-E\{y_{Q_3}\})\|,\quad
\|(y_{Q_1}-E\{y_{Q_1}\})\cdots(y_{Q_n}-E\{y_{Q_n}\})\|,
\]
as in E. Grafarend (1984), derived from the characteristic functional, namely a series expression of higher order variance-covariance functions.
Chapter 11

The Sixth Problem of Probabilistic Regression – the random effect model – "errors-in-variables"

"In difference to classical regression, in error-in-variables models measurement errors occur in the regressors. The naive use of regression estimators leads to severe bias in this situation. There are consistent estimators like the total least squares estimator (TLS) and the moment estimator (MME)." J. Polzehl and S. Zwanzig, Department of Mathematics, Uppsala University, 2003 A.D.

Read the definitions in Tables 11.1, 11.2 and 11.3 and Lemma 11.1 as well as SIMEX/SYMEX in Table 11.4.

In contrast to the conventional regression models, error-in-variables models (Error-in-Variables Models: EVM) have characteristic measurement errors or residuals in the regressors; with respect to such a model, the classical regression estimators lead to a severe bias problem. Here we only review linear and nonlinear error-in-variables models which are based on Bayesian statistics. Very often the EVM are called Total Least Squares when we use an extension of classical Least Squares. There is a vast literature about EVMs. Examples of Bayesian statistics for EVMs are S.M. Berry et al (2002), R.J. Carrol et al (1996i, ii), W.A. Fuller (1984) and J. Pilz (1994). Notable algebraic contributions are Z. Bai and J. Demmel (1993), A. Björck, P. Heggernes and P. Matstoms (2000), G.H. Golub, P.C. Hansen and D.P. O'Leary (1999), G.H. Golub and C. van Loan (1980, 1996), H. Guo and R.H. Renant (2002), P.C. Hansen (1989, 1992, 1994), P.C. Hansen and D. O'Leary (1993), R.H. Horn
Table 11.1: Second order statistics "Error-in-Variables"

(i) First order moment of observation space Y:
E{y} = E{X}ξ = μ_y, with i, j ∈ {1,…,n}, k, l ∈ {1,…,m}; E{y_i} = μ_i.

(ii) Second order moment of observation space Y:
E{y_i − E{y_i}} = 0, E{[y_i − E{y_i}][y_j − E{y_j}]} = D_ij; the var-cov matrix of the observation space Y is [σ₁₁ … σ₁ₙ; … ; σₙ₁ … σₙₙ].

(iii) First order moments of the random effects x_ik:
X = [x₁₁ x₁₂ … x₁ₘ; x₂₁ x₂₂ … x₂ₘ; … ; xₙ₁ xₙ₂ … xₙₘ], E{[x_ik]} = μ_X, E{x_ik − E{x_ik}} = 0 for all i ∈ {1,…,n}, k ∈ {1,…,m}.

(iv) Second order moment of the random effects X_ik:
E{[x_ik − E{x_ik}][x_jl − E{x_jl}]} =: σ_ijkl; the var-cov "matrix" of the random effects X_ik is the array [σ_ij11 … σ_ij1m; … ; σ_ijm1 … σ_ijmn].

Model assumptions:
(iv-1) D_ij = Diag(σ₁²,…,σₙ²): "variance component model of observation space Y";
(iv-2) D_ij = σ_ij: "full symmetric matrix of observation space Y";
(iv-3) D_ij = δ_ij σ²: "simple model";
(iv-4) D{x_ijkl} =: σ_ijkl "fourth order tensor": σ_ijkl of type Diag(σ₁,…,σₘ);
(iv-5) σ_ij σ_kl = σ_ijkl: "decomposition";
(iv-6) σ_ijkl = σ_ij σ² δ_kl: "simple model";
(iv-7) σ_ijkl = σ² δ_ij δ_kl: "super simple model".
and C.R. Johnson (1985), S. van Huffel and J. Vanderwalle (1991), B.N. Parlett (1980), S. van Huffel and H. Zha (1993), A.N. Tikhonov (1963), A.N. Tikhonov and V.Y. Arsenin (1977), R.A. Renaut and H. Guo (2005). Contributions on straight-line fits were presented by A. Acar et al (2006), M. Bipp and H. Krans (199), S. Kupferer (2005), J. Reinkind (2008), B. Schaffrin (1983, 2002, 2008) and B. Schaffrin and A. Wieser (2008). Linear and quadratic constraints were used in B. Schaffrin and Y.A. Fellus (2009).
Table 11.2: Explicit and implicit models

(i) Explicit model:
E{y_i} = E{X_i′}ξ = ξ′X_i for all κ ∈ {1,…,m};
y_i = E{y_i} + e_y, X_ik = E{X_ik} + E_X for all Y ∈ {1,…,n}, X ∈ {X₁₁,…,X_nm};
"e_y error vector, E_X error matrix".

(ii) Implicit model:
"We design our model so that there is no distinction between the response variables Y and the predictor variables X." Y are called response variables, X are called predictor variables.
Definitions: (i) Z_i′ := {Y_i, X_i′}; (ii) α′ := {1, ξ₁,…,ξ_m}; (iii) ζ_i′ := {μ_i, M_i′}.
"Implicit model": 0_i := {1, ξ₁,…,ξ_m}[E{y_i}, E{X_i1},…, E{x_im}]′;
Z_i := ζ_i + E_Z, Z_i′ := (Y_i, X_i′), var E_Z := I_{m+1} σ².

References relating to stochastic formulations of EVMs are also very extensive. For instance, we mention the works on SIMEX models (SIMulation EXtrapolation estimators) by R.J. Carroll et al (1961i, ii), R.J. Carroll and L.A. Stefanski (1997), J.R. Cook and L.A. Stefanski (1987), X. Lin and R.J. Carroll (1999, 2000), J.P. Holcomb (1999), L.A. Stefanski and J.R. Cook (1995), N. Wang et al (1998), in particular S. Zwanzig (1987, 1989, 1997, 1998, 1999, 2000), A. Kukash and S. Zwanzig (2002), J. Polzehl and S. Zwanzig (2005) and S. Zwanzig (2005). Finally, let us refer to Fig. 11.1 in order to illustrate the model "errors-in-variables". We already mentioned that, in contrast to standard regression models, EVMs have "measurement errors" in the regressors. The naive use of regression estimators leads to severe bias problems. The main idea of SIMEX (SIMulation EXtrapolation estimator) is to eliminate the effect of "measurement errors" by extrapolation: samples with arbitrarily larger variances are generated by adding simulated errors to the original observed regressor variables. Here we follow to a large part the contribution of J. Polzehl and S. Zwanzig (2003), which summarizes a great number of contributions to SIMEX.
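The bias mechanism is easy to reproduce numerically. The following sketch (an illustration we add here, not taken from the book; it assumes the simple scalar model y = βξ + e_y, X = ξ + e_X) shows the classical attenuation of the naive least-squares slope, roughly by the factor var(ξ)/(var(ξ) + σ²), and how the naive estimate behaves when extra simulated error is added — the observation SIMEX exploits:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, sigma = 10_000, 1.0, 0.5
xi = rng.choice([-1.0, 1.0], size=n)             # true regressor values
y = beta * xi + sigma * rng.standard_normal(n)   # response with error
X = xi + sigma * rng.standard_normal(n)          # regressor observed with error

naive = (X @ y) / (X @ X)                        # naive LS slope, biased
print(naive, 1.0 / (1.0 + sigma**2))             # ~0.8: attenuation factor

# SIMEX idea: inflate the measurement error by lambda * sigma^2 and watch
# the naive estimate as a function of lambda; extrapolation back to zero
# total error variance (lambda = -1 in this scaled parametrization)
# removes the bias approximately.
for lam in (0.0, 0.5, 1.0, 2.0):
    X_lam = X + np.sqrt(lam) * sigma * rng.standard_normal(n)
    print(lam, (X_lam @ y) / (X_lam @ X_lam))
```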
Table 11.3: Inverse parameter transformation, naive estimator and Total Least Squares

(11.30)–(11.31) 0_i = E{y_i} − ξ₁E{x_i1} − ⋯ − ξ_mE{x_im} for all i = 1,…,n; k = 1,…,m;
(11.32) α^ℓ := α/(−α_ℓ) for ℓ = 0, 1,…,m, α⁰ := α ("inverse parameter transformation");
(11.33) ξ_k = −α_k^ℓ/α_0^ℓ for k = 1,…,m;
(11.34) 0_i = ζ_{i(ℓ)}′ α_{(ℓ)}^ℓ ("explicitly rewritten EVM");
(11.35) Z_ℓ = ζ_{(ℓ)} α_{(ℓ)}^ℓ + E_ℓ for ℓ = 0, 1,…,m−1, m ("naive regression model");
(11.36) Z_ℓ = Z_{(ℓ)}(α_{(ℓ)}^ℓ)′ + E_ℓ ("the ℓth variable is considered as response and the errors in the other predictor variables are ignored");
(11.37) α̂_{(ℓ)}^ℓ := (Z_{(ℓ)}′Z_{(ℓ)})⁻¹Z_{(ℓ)}′Z_ℓ;
(11.38) α_ℓ^ℓ = −1 ("solution of the naive estimator and its bias"): min_α α′Z′Zα subject to α_ℓ = −1;
(11.40) E{α̂_{(ℓ)}^ℓ} = E{[(ζ_{(ℓ)} + E_{(ℓ)})′(ζ_{(ℓ)} + E_{(ℓ)})]⁻¹(ζ_{(ℓ)} + E_{(ℓ)})′ζ_{(ℓ)} α_{(ℓ)}^ℓ} ≐ (ζ_{(ℓ)}′ζ_{(ℓ)} + σ²I_m)⁻¹ζ_{(ℓ)}′ζ_{(ℓ)} α_{(ℓ)}^ℓ ("criterion function for the ℓth naive regression");
(11.41) θ̂_{naive,ℓ} = arg min (1/2)[−1, θ′]Z′Z[−1, θ′]′
("Total Least Squares estimator θ̂_TLS, the maximum likelihood estimator for a Gaussian error distribution");
TLS: θ̂_TLS = arg min Σ_{i=1}^{n} min_{E{x_ik}∈ℝ^m} (‖Y_i − θ′E{x_i}‖² + ‖x_i − E{x_i}‖²);
(11.42) TLS(θ) = (1/(1 + ‖θ‖²)) [−1, θ′]Z′Z[−1, θ′]′;
(11.43) Σ_{ℓ=1}^{m} Naive_ℓ(θ)⁻¹ = TLS(θ)⁻¹, max Naive_ℓ(θ)⁻¹ ≤ max TLS(θ)⁻¹.
Fig. 11.1 Magic triangle
11-1 The Model of Error-in-Variables or Total Least Squares

Let us introduce in more detail the model definition of Error-in-Variables, also called Total Least Squares, in a statistical reference frame. The observation space Y is generated by the first order moments E{y_i}, the mean value operator, and the central second order moments D{y_i} =: D_ij, the variance-covariance matrix of the observation space. In contrast, the random effects matrix-valued space X is represented by the first order moments, the second order tensor E{x_ik}, and by the central second order moments, the fourth order tensor D{x_ik} =: σ_ijkl. In modeling we present 7 sub-models of variances and of covariances; of course, we assume that the observation space Y and the random effects space X_ik are uncorrelated: cov(y_i, x_ik) = 0. We leave the formulation of Bayesian estimation for such a model to the references. Instead we formulate equivalent models of type "implicit" and "explicit" here. The implicit model underlines the symmetry structure of the "error-in-variable models". Here we will proceed as follows: First we will solve the nonlinear system of type "error-in-variables" by generalized least squares. Second we use a straight line fit by the technique of GLS ("generalized least squares"). Third we use the symmetry of EVM in order to introduce SYMEX for "Total Least Squares" and demonstrate its properties by a simulated example.
11-2 Algebraic Total Least Squares for the Nonlinear System of the Model "Error-in-Variables"

First, we define the random effect model of type "errors-in-variables" subject to the minimum condition i_y′W_y i_y + tr Ξ_X′W_X Ξ_X = min. Second, we form the derivatives, the partial derivatives of i_y′W_y i_y + tr Ξ_X′W_X Ξ_X + 2λ′{y − Xξ + Ξ_X ξ − i_y}, the necessary conditions for obtaining the minimum.

Definition 11.1 (the random effect model "errors-in-variables"): The nonlinear model of type "errors-in-variables" is solved by "total least squares" based on the risk function below:
\[
y=E\{y\}+e_y,\quad y=y_0+i_y\tag{11.1}
\]
\[
X=E\{X\}+E_X,\quad X=X_0+\Xi_X\tag{11.2}
\]
subject to
\[
E\{y\}\in\mathbb R^n,\ y_0\in\mathbb R^n\tag{11.3}
\]
\[
E\{X\}\in\mathbb R^{n\times m},\ X_0\in\mathbb R^{n\times m}\tag{11.4}
\]
\[
\operatorname{rk}E\{X\}=m,\ \operatorname{rk}X_0=m\ \text{and}\ n>m\tag{11.5}
\]
\[
L_1:=\sum_{i_1,i_2}w_{i_1i_2}\,i_{i_1}i_{i_2}=i_y'Wi_y,\qquad
L_2:=\sum_{i_1,k_1,k_2}w_{k_1k_2}\,\Xi_{i_1k_1}\Xi_{i_1k_2}=\operatorname{tr}\Xi_X'W_X\Xi_X,
\]
\[
L_1+L_2=i_y'W_yi_y+\operatorname{tr}\Xi_X'W_X\Xi_X=\min_{y_0,X_0},\qquad
L=:\|i_y\|^2_{W_y}+\|\Xi_X\|^2_{W_X}\tag{11.6}
\]
subject to
\[
y-X\xi+\Xi_X\xi-i_y=0.\tag{11.7}
\]
The result of the minimization process is given by Lemma 11.1.

Lemma 11.1 (error-in-variables model, normal equations): The risk function of the model "errors-in-variables" is minimal if and only if
\[
\tfrac12\,\partial L/\partial\xi=-X'\lambda_\ell+\Xi_X'\lambda_\ell=0\tag{11.8}
\]
\[
\tfrac12\,\partial L/\partial i_y=W_yi_y-\lambda_\ell=0\tag{11.9}
\]
\[
\tfrac12\,\partial L/\partial\Xi_X=W_X\Xi_X+\lambda_\ell\xi'=0\tag{11.10}
\]
\[
\tfrac12\,\partial L/\partial\lambda=y-X\xi+\Xi_X\xi-i_y=0\tag{11.11}
\]
and
\[
\det\left(\frac{\partial^2L}{\partial\theta_i\,\partial\theta_j}\right)\ge0.\tag{11.12}
\]

Proof. First, we begin with the modified risk function ‖i_y‖²_{W_y} + ‖Ξ_X‖²_{W_X} + 2λ′(y − Xξ + Ξ_Xξ − i_y) = min, where the minimum condition is extended over y, X, i_y, Ξ_X, λ, with λ denoting the vector of Lagrange multipliers:
\[
i_y'W_yi_y+\operatorname{tr}\Xi_X'W_X\Xi_X+2\lambda'(y-X\xi+\Xi_X\xi-i_y)=\min\tag{11.13}
\]
if and only if
\[
\tfrac12\,\partial L/\partial\xi=-X'\lambda_\ell+\Xi_X'\lambda_\ell=0\tag{11.14}
\]
\[
\tfrac12\,\partial L/\partial i_y=W_yi_y-\lambda_\ell=0\tag{11.15}
\]
\[
\tfrac12\,\partial L/\partial\Xi_X=W_X\Xi_X+\lambda\xi'=0\tag{11.16}
\]
\[
\tfrac12\,\partial L/\partial\lambda=y-X\xi+\Xi_X\xi-i_y=0\tag{11.17}
\]
and
\[
\frac{\partial^2L}{\partial\lambda\,\partial\lambda'}\ \text{positive semidefinite}.\tag{11.18}
\]
The first derivatives guarantee the necessity of the solution, while the second derivatives being positive semidefinite assure the sufficient condition. Second, we have the nonlinear equations, namely
\[
(-X'+\Xi_X')\lambda_\ell=0\ \text{(bilinear)}\tag{11.19}
\]
\[
W_yi_y-\lambda_\ell=0\ \text{(linear)}\tag{11.20}
\]
\[
W_X\Xi_X+\lambda\xi'=0\ \text{(bilinear)}\tag{11.21}
\]
\[
y-X\xi+\Xi_X\xi-i_y=0\ \text{(bilinear)},\tag{11.22}
\]
which is a problem outside our orbit-of-interest. An example is given in the next chapter. Consult the literature list at the end of this chapter.
11-3 Example: The Straight Line Fit

Our example is based upon the "straight line fit" y = ax + b1, where (x, y) has been measured; in detail
\[
E\{y\}=aE\{x\}+b1=[E\{x\},\,1]\begin{bmatrix}a\\b\end{bmatrix}
\]
or, with ξ₁ := a, ξ₂ := b,
\[
E\{y\}=x\xi_1+1\xi_2=[x,\,1]\begin{bmatrix}\xi_1\\ \xi_2\end{bmatrix}
\]
and
\[
y-x\xi_1-1\xi_2+e_x\xi_1-e_y=0.
\]
(ξ₁, ξ₂) are the two unknowns in the parameter space. It has to be noted that the term e_xξ₁ includes two coupled unknowns, namely e_x and ξ₁. Second, we formulate the modified method of least squares.
\[
L(\xi_1,\xi_2,e_x,e_y,\lambda)=i'Wi+2\lambda'(y-x\xi_1-1\xi_2+i_x\xi_1-i_y)
\]
or
\[
i_y'W_yi_y+i_x'W_xi_x+2\lambda'(y-x\xi_1-1\xi_2+i_x\xi_1-i_y).
\]
Third, we present the necessary and sufficient conditions for obtaining the minimum of the modified method of least squares:
\[
\tfrac12\,\partial L/\partial\xi_1=-x'\lambda_\ell+i_x'\lambda_\ell=0\tag{11.23}
\]
\[
\tfrac12\,\partial L/\partial\xi_2=-1'\lambda_\ell=0\tag{11.24}
\]
\[
\tfrac12\,\partial L/\partial i_y=W_yi_y-\lambda_\ell=0\tag{11.25}
\]
\[
\tfrac12\,\partial L/\partial i_x=W_xi_x+\lambda_\ell\xi_1=0\tag{11.26}
\]
\[
\tfrac12\,\partial L/\partial\lambda=y-x\xi_1-1\xi_2+i_x\xi_1-i_y=0\tag{11.27}
\]
and
\[
\det\begin{bmatrix}\partial^2L/\partial\xi_1^2&\partial^2L/\partial\xi_1\partial\xi_2\\
\partial^2L/\partial\xi_1\partial\xi_2&\partial^2L/\partial\xi_2^2\end{bmatrix}\ge0.\tag{11.28}
\]
Indeed, these conditions are necessary and sufficient for obtaining the minimum of the modified method of least squares. By Gauss elimination we receive the results
\[
(-x'+i_x')\lambda_\ell=0\tag{11.29}
\]
\[
\lambda_1+\cdots+\lambda_n=0\tag{11.30}
\]
\[
W_yi_y=\lambda_\ell\tag{11.31}
\]
\[
W_xi_x=-\lambda_\ell\xi_1\tag{11.32}
\]
\[
W_yy=W_yx\xi_1+W_y1\xi_2-W_yi_x\xi_1+W_yi_y\tag{11.33}
\]
or
\[
W_yy-W_yx\xi_1-W_y1\xi_2-(I(1+\xi_1^2))\lambda_\ell=0\quad\text{if }W_y=W_x=W\tag{11.34}
\]
and
\[
x'Wy-x'Wx\xi_1-x'W1\xi_2-x'(I(1+\xi_1^2))\lambda_\ell=0\tag{11.35}
\]
\[
y'Wy-y'Wx\xi_1-y'W1\xi_2-y'(I(1+\xi_1^2))\lambda_\ell=0\tag{11.36}
\]
\[
+x'\lambda_\ell=+i_x'\lambda_\ell\tag{11.37}
\]
\[
\lambda_1+\cdots+\lambda_n=0\tag{11.38}
\]
or, collected,
\[
\begin{bmatrix}x'\\y'\end{bmatrix}Wy
-\begin{bmatrix}x'\\y'\end{bmatrix}Wx\,\xi_1
-\begin{bmatrix}x'\\y'\end{bmatrix}W1\,\xi_2
-\begin{bmatrix}x'\\y'\end{bmatrix}(I(1+\xi_1^2))\lambda_\ell=0\tag{11.39}
\]
subject to
\[
\lambda_1+\cdots+\lambda_n=0,\tag{11.40}
\]
\[
x'\xi_1\lambda_\ell=i_x'\lambda_\ell.\tag{11.41}
\]
Let us iterate the solution:
\[
\begin{bmatrix}
0&0&0&0&-(x'_n-i_x')\\
0&0&0&0&-1'_n\\
0&0&W_x&0&\xi_1I_n\\
0&0&0&W_y&-I_n\\
x_n&1_n&\xi_1I_n&-I_n&0
\end{bmatrix}
\begin{bmatrix}\xi_1\\ \xi_2\\ i_x\\ i_y\\ \lambda_\ell\end{bmatrix}
=\begin{bmatrix}0\\0\\0\\0\\y_n\end{bmatrix}.
\]
We meet again the problem that the nonlinear terms ξ₁ and i_x appear. Our iteration is based on the initial data
(i) W_x = W_y = W = I_n, (ii) (i_x)₀ = 0, (iii) (ξ₁)₀, (ξ₂)₀ as the solution of y − (i_y)₀ = x(ξ₁)₀ + 1_n(ξ₂)₀,
and in general
\[
\begin{bmatrix}
0&0&0&0&-(x'-i_x')\\
0&0&0&0&-1'\\
0&0&W_x&0&\xi_1^{(i)}I_n\\
0&0&0&W_y&-I_n\\
x_n&1_n&\xi_1^{(i)}I_n&-I_n&0
\end{bmatrix}
\begin{bmatrix}\xi_{1(i+1)}\\ \xi_{2(i+1)}\\ i_{x(i+1)}\\ i_{y(i+1)}\\ \lambda_{\ell(i+1)}\end{bmatrix}
=\begin{bmatrix}0\\0\\0\\0\\y_n\end{bmatrix},
\]
with the unknowns x₁ = ξ₁, x₂ = ξ₂, x₃ = i_x, x₄ = i_y, x₅ = λ_ℓ.
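For the special case W_x = W_y = I the iteration is not strictly necessary: the orthogonal-regression line admits the classical closed form via the eigenvector of the smallest eigenvalue of the centered scatter matrix. The sketch below is our illustration of that shortcut (an assumption of this sketch, not the book's general weighted iteration; all names are ours):

```python
import numpy as np

def straight_line_tls(x, y):
    """Straight line fit y = a x + b with equal unit weights in x and y
    (orthogonal regression): the line's normal vector is the eigenvector
    of the smallest eigenvalue of the centered scatter matrix."""
    xm, ym = x.mean(), y.mean()
    u, v = x - xm, y - ym                  # centering eliminates xi_2
    S = np.array([[u @ u, u @ v],
                  [u @ v, v @ v]])         # scatter matrix of (x, y)
    w, V = np.linalg.eigh(S)               # ascending eigenvalues
    n_vec = V[:, 0]                        # normal (n1, n2) of the line
    a = -n_vec[0] / n_vec[1]               # slope from n1 x + n2 y = const
    b = ym - a * xm                        # intercept via the centroid
    return a, b

# usage: noisy points on y = 0.8 x + 0.2 with errors in both coordinates
rng = np.random.default_rng(2)
x0 = np.linspace(0.0, 10.0, 50)
x = x0 + 0.3 * rng.standard_normal(50)
y = 0.8 * x0 + 0.2 + 0.3 * rng.standard_normal(50)
print(straight_line_tls(x, y))
```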
Fig. 11.2: General linear regression (simulated data points with the fitted straight line; both axes run from 0 to 12).

The five unknowns have led us to the example within Fig. 11.2. Our solutions are collected as follows:
ξ₁: 0.7526 (our general result), 0.8663 (case Ξ_y(i) = 0, Ξ_X(i) = 0), 0.7816 (case Ξ_X(i) ≠ 0, Ξ_y(i) ≠ 0, after iteration);
ξ₂: 0.1520, 1.2016, 0.2446 (in the same order).
11-4 The Models SIMEX and SYMEX (R.J. Carroll, J.R. Cook, L.A. Stefanski, J. Polzehl and S. Zwanzig)

Many authors take advantage of the symmetry of the implicit model of Total Least Squares or of the simulated random condition equations given before, using the alternative formulation of type (11.26), dividing our model equation by −α_ℓ and taking care of the inverse parameter transformation (11.28). Equation (11.30) is a rewritten explicit model of the EVM. We put emphasis on the "naive regression model" by means of (11.31), (11.29) and (11.33). Finally we estimate the bias within naive regression within (11.34). The related solution of type "Total Least Squares", which is a maximum likelihood estimation, is presented by (11.36) to (11.39). SIMEX and SYMEX are reviewed in a special table, enriched by simulated examples and tables.
SIMEX versus SYMEX

Following J.R. Cook and L.A. Stefanski (1994) and L.A. Stefanski and J.R. Cook (1995) on the one side and J. Polzehl and S. Zwanzig (2007) on the other side we compare SIMEX and SYMEX.

First step (simulation). SIMEX: For any value out of the set {λ_α : α = 0,…,A}, starting with λ_0, we simulate samples with increased measurement variance, X_{ikβ}(λ) := X_{ik} + λ^{1/2}δ_{ikβ} for all β = 1,…,n_β; i = 1,…,n; k = 1,…,K, with independent identically distributed δ_{ikβ} ~ N(0, 1); build the new observation matrix Z(λ) using X_{ikβ}(λ) instead of X_{ik}. SYMEX: For values out of the set {λ_α : α = 0,…,A}, starting with λ_0, we simulate samples with increased measurement variance as Z_{(ℓ)ikβ}(λ) := Z_{(ℓ)ik} + λ^{1/2}δ_{ikβ}, β = 1,…,n_β; i = 1,…,n; k = 1,…,K, with independent identically distributed δ_{ikβ} ~ N(0, 1), for instance by adding an independent error with variance λ in each component of the resulting matrix. Define the observation matrix Z(λ) as the matrix containing the values Z_{ikβ}(λ); note that Z(λ_0) contains the n_β-times replicated original sample.

Second step (estimation). SIMEX: Calculate the naive regression estimate θ̂(λ) = [Z_{(ℓ)}(λ)′Z_{(ℓ)}(λ)]⁻¹Z_{(ℓ)}(λ)′Y for all values of λ. SYMEX: For all simulated samples, i.e. all values λ_α, calculate the naive estimates α̃(λ_α) in our model by α̃_{(ℓ)}^ℓ(λ) = [Z_{(ℓ)}(λ)′Z_{(ℓ)}(λ)]⁻¹Z_{(ℓ)}(λ)′Z_ℓ subject to α̃_ℓ^ℓ(λ) = −1.

Third step (estimation of the extrapolation model). SIMEX: Fit the model θ(λ, γ) := (Z_{(ℓ)}′Z_{(ℓ)} + λI_m)⁻¹γ with unknown parameter γ, used to describe the expectation of θ̂(λ) as a function of λ. Fit this model by weighted least squares,
γ̂ := arg min Σ_{α=1}^{A} w_α ‖θ̂(λ_α) − θ(λ_α, γ)‖²,
subject to w_0 ≥ 1 and w_α = 1 for all α > 0. A large value of w_0 can be used in order to force the model to fit perfectly for the most informative estimate, for instance at λ = 0. SYMEX: Fit the model α^ℓ_{(ℓ)}(λ, γ) := (Z_{(ℓ)}′Z_{(ℓ)} + λI_m)⁻¹γ subject to α_ℓ^ℓ(λ, γ) := −1 with parameter γ ∈ ℝ^m. The estimate of γ is given by
γ̂^ℓ := arg min Σ_{α=0}^{A} w_α ‖α̂^ℓ(λ_α) − α^ℓ(λ_α, γ)‖²,
subject to w_0 ≥ 1 and w_α = 1 for all α > 0. Using our inverse transformation, we define θ̂_k^ℓ(λ, γ̂^ℓ) := −α_k^ℓ(λ, γ̂^ℓ)/α_0^ℓ(λ, γ̂^ℓ) for all k = 1,…,m and arbitrary λ.

Fourth step (extrapolation). SIMEX: θ̂(λ, γ̂) is used to extrapolate to the case of zero measurement variance using λ = −σ̂². Such a result leads us to the SIMEX estimate θ̂(SIMEX) = θ̂(−σ̂², γ̂). SYMEX: Determine an optimal value for extrapolation to the case of zero measurement variance of type
λ̂ := arg min_{λ∈(λ_min, 0)} Σ_{ℓ=0}^{m} Σ_{m'=ℓ+1}^{m} ‖θ̂^ℓ(λ) − θ̂^{m'}(λ)‖²
under the constraint λ_min = min{λ ∈ ℝ : (1/n)Z_{(ℓ)}′Z_{(ℓ)} + λI_m > 0 for all ℓ = 0,…,m}.

Fifth step (SYMEX estimation). Estimate θ by
θ̂(SYMEX) = θ̂(λ̂) = (1/(m+1)) Σ_{ℓ=0}^{m} θ̂^ℓ(λ̂).

The key idea of SIMEX as well as of SYMEX is to include the extrapolation, generalized to the case of zero variance, in using λ = −σ² in the extrapolation step. The value λ := −σ̂² can be interpreted as an estimate of the negative measurement error variance. In addition, J. Polzehl and S. Zwanzig (2007) give a simple relation for SYMEX to calculate the naive estimator θ̂(naive), formulated in a special theorem. After all they conclude that SYMEX, in general, has smaller root-mean-square errors and fewer outliers when compared to SIMEX. This can be due to an approximation to the optimal TLS estimate. In all simulations SYMEX was not worse than SIMEX, namely with an improvement being largest for larger values of σ².

SIMULATION

All our simulations are of size 250 as presented by J. Polzehl and S. Zwanzig (2007). The grid of the quantities used is λ_α = 0.2α with A = 10. We generate samples of increased measurement variance with n_β = 100. The parameter spaces built on the values β and σ² are collected in Tables 11.5 up to 11.7.

Example 11.1 (one parameter β₁ = β):
E{y_i} = βE{x_i}, Y_i = E{Y_i} + e_y, X_i = E{X_i} + E_X
subject to E{e_y} = 0, E{E_X} = 0, D{e_y} = σ², D{E_X} = σ² (i = 1,…,n).
Design details: E{X_i} ∈ {−1, +1}, P(X_i = −1) = P(X_i = +1) = 0.5, sample size n = 20.
Example 11.2 (two parameters β₁ and β₂):
E{y_i} = β₁E{x_i1} + β₂E{x_i2}, Y_i = E{Y_i} + e_y, X_i1 = E{X_i1} + E_{X_i1}, X_i2 = E{X_i2} + E_{X_i2}
subject to E{e_y} = 0, E{E_{X_i1}} = E{E_{X_i2}} = 0, D{e_y} = σ², D{E_ik} = δ_ik σ² for all i ∈ {1,…,n}, k ∈ {1, 2}.
Design details: E{X_ik} ∈ {−1, +1}, P{X_ik = −1} = P{X_ik = +1} = 0.5, sample size n = 25.

Example 11.3 (two parameters β₁ and β₂): We use the second example, but we vary the design from the two-point design towards X_ik ~ N(0, I₂).

Table 11.5: Median absolute error, bias and standard deviation of estimates for the simulation in Example 11.1 (m = 1)
σ      β    | MAE: Naive  SIMEX   SYMEX  | bias: Naive  SIMEX    SYMEX   | sd: Naive SIMEX SYMEX
0.125  2    | 0.0453     0.0411  0.0403  | 0.025       0.0065   0.007    | 0.042    0.041  0.04
0.25   2    | 0.117      0.0847  0.0814  | 0.11        0.012    0.011    | 0.088    0.087  0.082
0.5    2    | 0.389      0.19    0.173   | 0.39        0.014    0.01     | 0.18     0.23   0.18
0.125  1    | 0.0285     0.027   0.0258  | 0.015       5.6e-05  0.0015   | 0.026    0.026  0.026
0.25   1    | 0.0702     0.0611  0.053   | 0.059       1.6e-05  0.00091  | 0.055    0.055  0.051
0.5    1    | 0.195      0.137   0.116   | 0.19        0.013    0.0033   | 0.11     0.14   0.11
0.125  0.5  | 0.0211     0.0222  0.0214  | 0.0058      0.0019   0.00029  | 0.02     0.02   0.02
0.25   0.5  | 0.0493     0.0467  0.0459  | 0.025       0.0037   0.0014   | 0.042    0.042  0.041
0.5    0.5  | 0.111      0.1     0.0965  | 0.095       0.011    0.0067   | 0.084    0.1    0.092
Table 11.6: Median absolute error of estimates for the simulation in Example 11.2 (m = 2)

σ      | β₁: true  Naive   SIMEX   SYMEX  | β₂: true  Naive   SIMEX   SYMEX
0.125  | 2         0.059   0.0502  0.0481 | 2         0.0534  0.0534  0.0522
0.25   | 2         0.14    0.104   0.101  | 2         0.128   0.108   0.106
0.5    | 2         0.417   0.241   0.215  | 2         0.386   0.263   0.201
0.125  | 1         0.0382  0.0343  0.0319 | 1         0.0409  0.0353  0.0348
0.25   | 1         0.0761  0.0655  0.06   | 1         0.0719  0.0626  0.0573
0.5    | 1         0.216   0.147   0.137  | 1         0.185   0.155   0.125
0.125  | 0.5       0.0193  0.0211  0.021  | 0.5       0.0214  0.0224  0.0224
0.25   | 0.5       0.0443  0.0425  0.0408 | 0.5       0.0429  0.0453  0.0428
0.5    | 0.5       0.12    0.103   0.0991 | 0.5       0.0979  0.103   0.082
0.125  | 0.5       0.045   0.038   0.0369 | 0.5       0.0417  0.0406  0.0416
0.25   | 0.5       0.1343  0.0866  0.0763 | 0.5       0.0772  0.0854  0.0835
0.5    | 0.5       0.398   0.21    0.158  | 0.5       0.146   0.187   0.173
Table 11.7: Median absolute error of estimates for the simulation in Example 11.2 (m = 2)

σ      | β₁: true  Naive   SIMEX   SYMEX  | β₂: true  Naive   SIMEX   SYMEX
0.125  | 2         0.0492  0.0411  0.0412 | 2         0.0399  0.0333  0.00316
0.25   | 2         0.132   0.0874  0.0847 | 2         0.108   0.0713  0.0627
0.5    | 2         0.402   0.195   0.178  | 2         0.372   0.168   0.137
0.125  | 1         0.027   0.0235  0.0227 | 1         0.0207  0.0194  0.0175
0.25   | 1         0.0688  0.0476  0.0467 | 1         0.0554  0.0403  0.0369
0.5    | 1         0.205   0.105   0.0994 | 1         0.189   0.0966  0.0753
0.125  | 0.5       0.0174  0.0158  0.0161 | 0.5       0.0143  0.0129  0.013
0.25   | 0.5       0.0403  0.0331  0.0314 | 0.5       0.0346  0.0275  0.0255
0.5    | 0.5       0.105   0.0711  0.0688 | 0.5       0.0983  0.0681  0.0567
0.125  | 0.5       0.0393  0.0289  0.0305 | 0.5       0.024   0.0239  0.024
0.25   | 0.5       0.123   0.0596  0.0579 | 0.5       0.0526  0.0499  0.0491
0.5    | 0.5       0.407   0.143   0.121  | 0.5       0.111   0.112   0.107
Sample size: n = 25. Tables 11.5–11.7 provide the results of the simulations in terms of median absolute error for the naive regression estimates, the SIMEX estimate (using extrapolation with λ = −σ̂²) and the SYMEX estimate. In addition, estimated bias and standard deviation are given (J. Polzehl and S. Zwanzig 2003).
11-5 References

Please contact the following references: Abatzoglou, J.T. and Mendel, J.M. (1987), Abatzoglou, J.T., Mendel, J.M. and Harada, G.A. (1991), Bajen, M.T., Puchal, T., Gonzales, A., Gringo, J.M., Castelao, A., Mora, J. and Comin, M. (1997), Berry, S.M., Carroll, R.J. and Ruppert, D. (2002), Björck, A. (1996), Björck, A. (1997), Björck, A., Elfving, T. and Strakos, Z. (1998), Björck, A., Heggernes, P. and Matstoms, P. (2000), Bojanczyk, A.W., Bront, R.P., Van Dooren, P., de Hoog, F.R. (1987), Bunch, J.R., Nielsen, C.P. and Sorensen, D.C. (1978), Bunke, H. and Bunke, O. (1989), Capderou, A., Douguet, D., Simislowski, T., Aurengo, A. and Zelter, M. (1997), Carroll, R.J., Ruppert, D. and Stefanski, L.A. (1996), Carroll, R.J., Kschenhoff, H., Lombard, F. and Stefanski, L.A. (1996), Carroll, R.J. and Stefanski, L.A. (1997), Chandrasekuran, S. and Sayed, A.H. (1996), Chun, J., Kailath, T. and Lev-Ari, H. (1987), Cook, J.R. and Stefanski, L.A. (1994), Dembo, R.S., Eisenstat, S.C. and Steihaug, T. (1982), De Moor, B. (1994), Fuller, W.A. (1987), Golub, G.H. and Van Loan, C.F. (1980), Hansen, P.C. (1998), Higham, N.J. (1996), Holcomb, J.P. (1996), Humak, K.M.S. (1983), Kamm, J. and Nagy, J.G. (1998), Kailath, T., Kung, S. and More, M. (1979), Kailath, T. and Chun, J. (1994), Kailath, T. and Sayed, A.H. (1995), Kleming, J.S. and Goddard, B.A. (1974), Kung, S.Y., Arun, K.S. and Braskar Rao, D.V. (1983), Lemmerling, P., Van Huffel, S. and de Moor, B. (1997), Lemmerling, P., Dologlou, I. and Van Huffel, S. (1998), Lin, X. and Carroll, R.J. (1999), Lin, X. and Carroll, R.J. (2000), Mackens, W. and Voss, H. (1997), Mastronardi, N., Van Dooren, P. and Van Huffel, S., Mastronardi, N., Lemmerling, P. and Van Huffel, S. (2000), Nagy, J.G. (1993), Park, H. and Elden, L. (1997), Pedroso, J.J. (1996), Polzehl, J. and Zwanzig, S. (2003), Rosen, J.B., Park, H. and Glick, J. (1996), Stefanski, L.A. and Cook, J.R. (1995), Stewart, M. and Van Dooren, P. (1997), Van Huffel, S. (1991), Van Huffel, S. and Vandewalle, J. (1991), Van Huffel, S., Decanniere, C., Chen, H. and Van Hecke, P.V. (1994), Van Huffel, S., Park, H. and Rosen, J.B. (1996), Van Huffel, S., Vandewalle, J., de Rao, M.C. and Willems, J.L., Wang, N., Lin, X., Gutierrez, R.G. and Carroll, R.J. (1998), Yang, T. (1996), Berry, S.M., Carroll, R.J. and D. Ruppert (2002), Bunke, H. and O. Bunke (1989), Carrol, R.J., Ruppert, D. and L.A. Stefanski (1996), Carroll, R.J. et al (1996), Carroll, R.J. and L.A. Stefanski (1997), Cook, J.R. and L.A. Stefanski, Fuller, W.A. (1987), Hollomb, J.R. (1999), Lin, X. and J.R. Cook (1999), Stefanski, L.A. and J.R. Cook (1995), Lin, X. and R.J. Carroll (2000), Wang et al. (1998)
Chapter 12

The Nonlinear Problem of the 3d Datum Transformation and the Procrustes Algorithm

A special nonlinear problem is the three-dimensional datum transformation solved by the Procrustes Algorithm. A definition of the three-dimensional datum transformation with the coupled unknowns of type dilatation unknown, also called scale factor, translation unknown and rotation unknown follows afterwards.

Fast track reading: Read Definition 12.1, Corollaries 12.2–12.4 and 12.6, Theorem 12.5 and Lemma 12.7.

Definition 12.1: C₇(3) 3d datum transformation
Corollary 12.2: translation — partial W-LESS for x₂
Corollary 12.3: scale — partial W-LESS for x₁
Corollary 12.4: rotation — partial W-LESS for X₃
Theorem 12.5: W-LESS of (Y₁′ = Y₂X₃′x₁ + 1x₂′ + E)
Corollary 12.6: I-LESS of (Y₁′ = Y₂X₃′x₁ + 1x₂′ + E)

Let us specify the parameter space X, namely:
x₁ ∈ ℝ, the dilatation parameter (the scale factor),
versus
x₂ ∈ ℝ^{3×1}, the column vector of translation parameters,
versus
X₃ ∈ O⁺(3) := {X₃ ∈ ℝ^{3×3} | X₃′X₃ = I₃ and |X₃| = +1},
an orthonormal matrix, the rotation matrix of three parameters,
which is built on the scalar x₁, the vector x₂ and the matrix X₃. In addition, by the matrices
\[
Y_1:=\begin{bmatrix}x_1&x_2&\dots&x_{n-1}&x_n\\ y_1&y_2&\dots&y_{n-1}&y_n\\ z_1&z_2&\dots&z_{n-1}&z_n\end{bmatrix}\in\mathbb R^{3\times n}
\quad\text{and}\quad
Y_2:=\begin{bmatrix}X_1&X_2&\dots&X_{n-1}&X_n\\ Y_1&Y_2&\dots&Y_{n-1}&Y_n\\ Z_1&Z_2&\dots&Z_{n-1}&Z_n\end{bmatrix}\in\mathbb R^{3\times n}
\]
we define left and right three-dimensional coordinate arrays as an n-dimensional simplex of observed data. Our aim is
to determine the parameters of the three-dimensional datum transformation {x₁, x₂, X₃} out of a nonlinear transformation (conformal group C₇(3)). x₁ ∈ ℝ stands for the dilatation parameter, also called scale factor; x₂ ∈ ℝ^{3×1} denotes the column vector of translation parameters; and X₃ ∈ O⁺(3) := {X₃ ∈ ℝ^{3×3} | X₃′X₃ = I₃, |X₃| = +1} the orthonormal matrix, also called rotation matrix, of three parameters. The key problem is how to determine the unknowns of type {x₁, x₂, X₃}, namely scalar dilatation, vector of translation and matrix of rotation, for instance by weighted least squares.

Example 12.1 (simplex of minimal dimension, n = 4 points, tetrahedron):
\[
Y_1:=\begin{bmatrix}x_1&x_2&x_3&x_4\\ y_1&y_2&y_3&y_4\\ z_1&z_2&z_3&z_4\end{bmatrix}',\qquad
\begin{bmatrix}X_1&X_2&X_3&X_4\\ Y_1&Y_2&Y_3&Y_4\\ Z_1&Z_2&Z_3&Z_4\end{bmatrix}'=:Y_2,
\]
\[
\begin{bmatrix}x_1&y_1&z_1\\ x_2&y_2&z_2\\ x_3&y_3&z_3\\ x_4&y_4&z_4\end{bmatrix}
=\begin{bmatrix}X_1&Y_1&Z_1\\ X_2&Y_2&Z_2\\ X_3&Y_3&Z_3\\ X_4&Y_4&Z_4\end{bmatrix}X_3'x_1+1x_2'
+\begin{bmatrix}e_{11}&e_{12}&e_{13}\\ e_{21}&e_{22}&e_{23}\\ e_{31}&e_{32}&e_{33}\\ e_{41}&e_{42}&e_{43}\end{bmatrix}.
\]

Example 12.2 (W-LESS): We depart from the setup of the pseudo-observation equation given in Example 12.1 (simplex of minimal dimension, n = 4 points, tetrahedron). For a diagonal weight matrix W = Diag(w₁,…,w₄) ∈ ℝ^{4×4} we compute the Frobenius error matrix W-seminorm
\[
\|E\|^2_W:=\operatorname{tr}(E'WE)
=w_1e_{11}^2+w_2e_{21}^2+w_3e_{31}^2+w_4e_{41}^2
+w_1e_{12}^2+w_2e_{22}^2+w_3e_{32}^2+w_4e_{42}^2
+w_1e_{13}^2+w_2e_{23}^2+w_3e_{33}^2+w_4e_{43}^2.
\]
Obviously, the coordinate errors (e₁₁, e₁₂, e₁₃) have the same weight w₁, (e₂₁, e₂₂, e₂₃) have the same weight w₂, (e₃₁, e₃₂, e₃₃) have the same weight w₃, and finally (e₄₁, e₄₂, e₄₃) have the same weight w₄. We may also say that the error weight is pointwise isotropic: weight e₁₁ = weight e₁₂ = weight e₁₃ = w₁ etc. However, the error weight is not homogeneous since w₁ = weight e₁₁ ≠ weight e₂₁ = w₂. Of course, an ideal homogeneous and isotropic weight distribution is guaranteed by the criterion w₁ = w₂ = w₃ = w₄.
12-1 The 3d Datum Transformation and the Procrustes Algorithm

First, we present W-LESS for our nonlinear adjustment problem for the unknowns of type scalar, vector and special orthonormal matrix. Second, we review the Procrustes Algorithm for the parameters {x₁, x₂, X₃}.

Definition 12.1 (nonlinear analysis for the three-dimensional datum transformation: the conformal group C₇(3)): The parameter array {x₁ℓ, x₂ℓ, X₃ℓ} is called W-LESS (LEast Squares Solution with respect to the W-Seminorm) of the inconsistent linear system of equations
\[
Y_2X_3'x_1+1x_2'+E=Y_1\tag{12.1}
\]
subject to
\[
X_3'X_3=I_3,\ |X_3|=+1\tag{12.2}
\]
if in comparison with all alternative parameter arrays {x₁, x₂, X₃} it fulfils the inequality
\[
\|Y_1-Y_2X_{3\ell}'x_{1\ell}-1x_{2\ell}'\|^2_W
:=\operatorname{tr}((Y_1-Y_2X_{3\ell}'x_{1\ell}-1x_{2\ell}')'W(Y_1-Y_2X_{3\ell}'x_{1\ell}-1x_{2\ell}'))
\]
\[
\le\operatorname{tr}((Y_1-Y_2X_3'x_1-1x_2')'W(Y_1-Y_2X_3'x_1-1x_2'))
=:\|Y_1-Y_2X_3'x_1-1x_2'\|^2_W,\tag{12.3}
\]
in other words if
\[
E_\ell:=Y_1-Y_2X_{3\ell}'x_{1\ell}-1x_{2\ell}'\tag{12.4}
\]
has the least W-seminorm.

? How to compute the three unknowns {x₁, x₂, X₃} by means of W-LESS ?

Here we will outline the computation of the parameter vector by means of partial W-LESS: at first, by means of W-LESS we determine x₂ℓ; secondly, by means of W-LESS, x₁ℓ; and thirdly, by means of W-LESS, X₃ℓ. In total, we outline the Procrustes Algorithm.

Step one: x₂

Corollary 12.2 (partial W-LESS for x₂ℓ): A 3×1 vector x₂ℓ is partial W-LESS of (12.1) subject to (12.2) if and only if x₂ℓ fulfils the system of normal equations
\[
1'W1\,x_{2\ell}=(Y_1-Y_2X_3'x_1)'W1.\tag{12.5}
\]
x₂ℓ always exists and is represented by
\[
x_{2\ell}=(1'W1)^{-1}(Y_1-Y_2X_3'x_1)'W1.\tag{12.6}
\]
For the special case W = I_n the translation parameter vector x₂ℓ is given by
\[
x_{2\ell}=\frac1n\,(Y_1-Y_2X_3'x_1)'1.\tag{12.7}
\]
For the proof, we shall first minimize the risk function
(Y₁ − Y₂X₃′x₁ − 1x₂′)′(Y₁ − Y₂X₃′x₁ − 1x₂′) = min with respect to x₂.

Detailed proof of Corollary 12.2: W-LESS is constructed by the unconstrained Lagrangean
\[
L(x_1,x_2,X_3):=\tfrac12\|E\|^2_W=\tfrac12\|Y_1-Y_2X_3'x_1-1x_2'\|^2_W
=\tfrac12\operatorname{tr}((Y_1-Y_2X_3'x_1-1x_2')'W(Y_1-Y_2X_3'x_1-1x_2'))
=\min_{x_1\ge0,\ x_2\in\mathbb R^{3\times1},\ X_3'X_3=I_3}.
\]
\[
\frac{\partial L}{\partial x_2}(x_{2\ell})=(1'W1)x_{2\ell}-(Y_1-Y_2X_3'x_1)'W1=0
\]
constitutes the first necessary condition. Basics of the vector-valued differentials are found in E. Grafarend and B. Schaffrin (1993, pp. 439–451). As soon as we backward substitute the translational parameter x₂ℓ, we are led to the reduced Lagrangean
\[
L(x_1,X_3)=\tfrac12\operatorname{tr}\{[Y_1-(Y_2X_3'x_1+(1'W1)^{-1}11'W(Y_1-Y_2X_3'x_1))]'W[\,\cdot\,]\}
=\tfrac12\operatorname{tr}\{[C(Y_1-Y_2X_3'x_1)]'W[C(Y_1-Y_2X_3'x_1)]\},
\]
where C := I_n − (1/n)11′ defines the centering matrix for W = I_n, and in general
\[
C:=I_n-(1'W1)^{-1}11'W,\tag{12.8}
\]
WC being symmetric. Substituting the centering matrix into the reduced Lagrangean L(x₁, X₃), we gain the centralized Lagrangean
\[
L(x_1,X_3)=\tfrac12\operatorname{tr}\{[Y_1-Y_2X_3'x_1]'C'WC[Y_1-Y_2X_3'x_1]\}.\tag{12.9}
\]
Step two: x₁

Corollary 12.3 (partial W-LESS for x₁ℓ): A scalar x₁ℓ is partial W-LESS of (12.1) subject to (12.2) if and only if
\[
x_{1\ell}=\frac{\operatorname{tr}Y_1'C'WCY_2X_3'}{\operatorname{tr}Y_2'C'WCY_2}\tag{12.10}
\]
holds. For the special case W = I_n the real parameter is given by
\[
x_{1\ell}=\frac{\operatorname{tr}Y_1'C'CY_2X_3'}{\operatorname{tr}Y_2'C'CY_2}.\tag{12.11}
\]
The general condition is subject to
\[
C:=I_n-(1'W1)^{-1}11'W.\tag{12.12}
\]

Detailed proof of Corollary 12.3: For the proof we shall newly minimize the risk function
\[
L(x_1,X_3)=\tfrac12\operatorname{tr}\{[Y_1-Y_2X_3'x_1]'C'WC[Y_1-Y_2X_3'x_1]\}=\min_{x_1}
\quad\text{subject to }X_3'X_3=I_3.
\]
\[
\frac{\partial L}{\partial x_1}(x_{1\ell})=x_{1\ell}\operatorname{tr}X_3Y_2'C'WCY_2X_3'-\operatorname{tr}Y_1'C'WCY_2X_3'=0
\]
constitutes the second necessary condition. The identity
\[
\operatorname{tr}X_3Y_2'C'WCY_2X_3'=\operatorname{tr}Y_2'C'WCY_2X_3'X_3=\operatorname{tr}Y_2'C'WCY_2
\]
leads us to x₁ℓ. While the forward computation of (∂L/∂x₁)(x₁ℓ) = 0 enjoyed a representation of the optimal scale parameter x₁ℓ, its backward substitution into the Lagrangean L(x₁, X₃) amounts to
\[
L(X_3)=\tfrac12\operatorname{tr}\Big(\Big[Y_1-Y_2X_3'\frac{\operatorname{tr}Y_1'C'WCY_2X_3'}{\operatorname{tr}Y_2'C'WCY_2}\Big]'C'WC\Big[Y_1-Y_2X_3'\frac{\operatorname{tr}Y_1'C'WCY_2X_3'}{\operatorname{tr}Y_2'C'WCY_2}\Big]\Big),
\]
which, after expanding the trace, simplifies to
\[
L(X_3)=\tfrac12\operatorname{tr}(Y_1'C'WCY_1)-\tfrac12\frac{[\operatorname{tr}Y_1'C'WCY_2X_3']^2}{\operatorname{tr}Y_2'C'WCY_2}
=\min_{X_3'X_3=I_3}.\tag{12.13}
\]
(12.13)
Third, we are left with the proof for the Corollary 12.4, namely X3 . Step three: X3 Corollary 12.4. (partial W-LESS for X3` ): A 3 1 orthonormal matrix X3 is partial W-LESS of (12.1) subject to (12.3) if and only if X3` D UV0 (12.14) holds where A WD Y01 C0 WCY2 D U† s V0 is a singular value decomposition with respect to a left orthonormal matrix U;U0 U D I3 , a right orthonormal matrix V;VV0 D I3 , and † s D Diag.1 ; 2 ; 3 / a diagonal matrix of singular values .1 ; 2 ; 3 /: The singular values are the canonical coordinates of the right eigenspace .A0 A † 2s I/V D 0: The left eigenspace is based upon U D AV† 1 s : :Detailed Proof of Corollary 12.4: The form L.X3 / subject to X03 X3 D I3 is minimal if tr.Y01 C0 WCY2 X03 / D
max
x1 0;X03 X3 DI3
:
Let A WD Y01 C0 WCY2 D U†V 0 , a singular value decomposition with respect to a left orthonormal matrix U;U0 U D I3 , a right orthonormal matrix V;VV0 D I3 and † s D Diag.1 ; 2 ; 3 / a diagonal matrix of singular values .1 ; 2 ; 3 /. Then tr.AX03 /
D tr.U† s V
0
X03 /
0
D tr.† s V X3 U/ D
3 X
i rii
i D1
3 X
i
i D1
holds, since 0
R D V0 X3 U D Œrij 2 R33 is orthonormal with jjrii jj 1. The identity tr.AX03 / D
(12.15) 3 P i D1
0
i applies, if
V0 X3 U D I3 ; i:e:X03 D VU0 ; X3 D UV0 ; namely, if tr.AX03 / is maximal
\[
\operatorname{tr}(AX_3')=\max_{X_3'X_3=I_3}\;\Leftrightarrow\;
\operatorname{tr}AX_3'=\sum_{i=1}^{3}\sigma_i\;\Leftrightarrow\;
R=V'X_3'U=I_3.\tag{12.16}
\]
An alternative proof of Corollary 12.4 based on formal differentiation of traces and determinants has been given by P.H. Schönemann (1966) and P.H. Schönemann and R.M. Carroll (1970). Finally, we collect our sequential results in Theorem 12.5 identifying the stationary point of W-LESS, specialized for W = I in Corollary 12.6. The highlight is the Procrustes Algorithm we review in Table 12.1.

Theorem 12.5 (W-LESS of Y₁′ = Y₂X₃′x₁ + 1x₂′ + E):

(i) The parameter array {x₁, x₂, X₃} is W-LESS if
\[
x_{1\ell}=\frac{\operatorname{tr}Y_1'C'WCY_2X_{3\ell}'}{\operatorname{tr}Y_2'C'WCY_2}\tag{12.17}
\]
\[
x_{2\ell}=(1'W1)^{-1}(Y_1-Y_2X_{3\ell}'x_{1\ell})'W1\tag{12.18}
\]
\[
X_{3\ell}=UV'\tag{12.19}
\]
subject to the singular value decomposition of the general 3×3 matrix
\[
Y_1'C'WCY_2=U\,\mathrm{Diag}(\sigma_1,\sigma_2,\sigma_3)\,V',\tag{12.20}
\]
namely
\[
[(Y_1'C'WCY_2)'(Y_1'C'WCY_2)-\sigma_i^2I]v_i=0,\tag{12.21}
\]
\[
V=[v_1,v_2,v_3],\ VV'=I_3,\tag{12.22}
\]
\[
U=Y_1'C'WCY_2\,V\,\mathrm{Diag}(\sigma_1^{-1},\sigma_2^{-1},\sigma_3^{-1}),\tag{12.23}
\]
\[
U'U=I_3,\tag{12.24}
\]
as well as the centering matrix
\[
C:=I_n-(1'W1)^{-1}11'W.\tag{12.25}
\]

(ii) The empirical error matrix of type W-LESS accounts for
\[
E_\ell=[I_n-11'W(1'W1)^{-1}]\Big[Y_1-Y_2VU'\frac{\operatorname{tr}Y_1'C'WCY_2VU'}{\operatorname{tr}Y_2'C'WCY_2}\Big]\tag{12.26}
\]
with the related Frobenius matrix W-seminorm
\[
\|E_\ell\|^2_W=\operatorname{tr}(E_\ell'WE_\ell)
=\operatorname{tr}\Big\{\Big[Y_1-Y_2VU'\frac{\operatorname{tr}Y_1'C'WCY_2VU'}{\operatorname{tr}Y_2'C'WCY_2}\Big]'
[I_n-11'W(1'W1)^{-1}]'W[I_n-11'W(1'W1)^{-1}]
\Big[Y_1-Y_2VU'\frac{\operatorname{tr}Y_1'C'WCY_2VU'}{\operatorname{tr}Y_2'C'WCY_2}\Big]\Big\}\tag{12.27}
\]
and the representative scalar measure of the error of type W-LESS
\[
|||E_\ell|||_W=\sqrt{\operatorname{tr}(E_\ell'WE_\ell)/3n}.\tag{12.28}
\]
A special result is obtained if we specialize Theorem 12.5 to the case W = I_n.

Corollary 12.6 (I-LESS of Y₁′ = Y₂X₃′x₁ + 1x₂′ + E):

(i) The parameter array {x₁, x₂, X₃} is I-LESS of Y₁′ = Y₂X₃′x₁ + 1x₂′ + E if
\[
x_{1\ell}=\frac{\operatorname{tr}Y_1'CY_2X_{3\ell}'}{\operatorname{tr}Y_2'CY_2}\tag{12.29}
\]
\[
x_{2\ell}=\frac1n\,(Y_1-Y_2X_{3\ell}'x_{1\ell})'1\tag{12.30}
\]
\[
X_{3\ell}=UV'\tag{12.31}
\]
subject to the singular value decomposition of the general 3×3 matrix
\[
Y_1'CY_2=U\,\mathrm{Diag}(\sigma_1,\sigma_2,\sigma_3)\,V',\tag{12.32}
\]
namely
\[
[(Y_1'CY_2)'(Y_1'CY_2)-\sigma_i^2I]v_i=0\ \forall i\in\{1,2,3\},\quad V=[v_1,v_2,v_3],\ VV'=I_3,\tag{12.33}
\]
\[
U=Y_1'CY_2\,V\,\mathrm{Diag}(\sigma_1^{-1},\sigma_2^{-1},\sigma_3^{-1}),\quad U'U=I_3,\tag{12.34}
\]
as well as the centering matrix
\[
C:=I_n-\frac1n\,11'.\tag{12.35}
\]

(ii) The empirical error matrix of type I-LESS accounts for
\[
E_\ell=\Big[I_n-\frac1n11'\Big]\Big[Y_1-Y_2VU'\frac{\operatorname{tr}Y_1'CY_2VU'}{\operatorname{tr}Y_2'CY_2}\Big]\tag{12.36}
\]
with the related Frobenius matrix seminorm
\[
\|E_\ell\|^2_I=\operatorname{tr}(E_\ell'E_\ell)
=\operatorname{tr}\Big\{\Big[Y_1-Y_2VU'\frac{\operatorname{tr}Y_1'CY_2VU'}{\operatorname{tr}Y_2'CY_2}\Big]'
\Big[I_n-\frac1n11'\Big]
\Big[Y_1-Y_2VU'\frac{\operatorname{tr}Y_1'CY_2VU'}{\operatorname{tr}Y_2'CY_2}\Big]\Big\}\tag{12.37}
\]
and the representative scalar measure of the error of type I-LESS
\[
|||E_\ell|||_I=\sqrt{\operatorname{tr}(E_\ell'E_\ell)/3n}.\tag{12.38}
\]
In the proof of Corollary 12.6 we only sketch the result that the matrix I_n − (1/n)11′ is idempotent:
\[
\Big[I_n-\frac1n11'\Big]\Big[I_n-\frac1n11'\Big]
=I_n-\frac2n11'+\frac1{n^2}(11')^2
=I_n-\frac2n11'+\frac1{n^2}\,n\,11'
=I_n-\frac1n11'.
\]
As a summary of the various steps of Corollaries 12.2–12.4 and 12.6 and Theorem 12.5, Table 12.1 presents the celebrated Procrustes Algorithm, which is followed by one short and interesting citation about "Procrustes".

Procrustes (the subduer), son of Poseidon, kept an inn benefiting from what he claimed to be a wonderful all-fitting bed. He lopped off excessive limbage from tall guests and either flattened short guests by hammering or stretched them by racking. The victim fitted the bed perfectly but, regrettably, died. To exclude the embarrassment of an initially exact-fitting guest, variants of the legend allow Procrustes two, different-sized beds. Ultimately, in a crackdown on robbers and monsters, the young Theseus fitted Procrustes to his own bed.
12-2 The Variance-Covariance Matrix of the Error Matrix E

By Lemma 12.7 we review the variance-covariance matrix of the vector-valued form of the transposed error matrix as a function of Σ_{vecY₁′}, Σ_{vecY₂′} and the covariance matrix Σ_{vecY₁′,(I_n⊗x₁X₃)vecY₂′}.
Table 12.1 (Procrustes Algorithm):

Step 1: Read Y₁ = [x₁ y₁ z₁; … ; x_n y_n z_n] and Y₂ = [X₁ Y₁ Z₁; … ; X_n Y_n Z_n].
Step 2: Compute Y₁′CY₂ subject to C := I_n − (1/n)11′.
Step 3: Compute the SVD Y₁′CY₂ = U Diag(σ₁, σ₂, σ₃)V′:
  3-1: |(Y₁′CY₂)′(Y₁′CY₂) − σᵢ²I| = 0 ⇒ (σ₁, σ₂, σ₃);
  3-2: ((Y₁′CY₂)′(Y₁′CY₂) − σᵢ²I)vᵢ = 0 ∀ i ∈ {1, 2, 3}; V = [v₁, v₂, v₃] right eigenvectors (right eigencolumns);
  3-3: U = Y₁′CY₂ V Diag(σ₁⁻¹, σ₂⁻¹, σ₃⁻¹) left eigenvectors (left eigencolumns).
Step 4: Compute X₃ℓ = UV′ (rotation).
Step 5: Compute x₁ℓ = tr(Y₁′CY₂X₃′)/tr(Y₂′CY₂) (dilatation).
Step 6: Compute x₂ℓ = (1/n)(Y₁ − Y₂X₃′x₁)′1 (translation).
Step 7: Compute Eℓ = C(Y₁ − Y₂VU′ tr(Y₁′CY₂VU′)/tr(Y₂′CY₂)) (error matrix); optional control: Eℓ := Y₁ − (Y₂X₃′x₁ℓ + 1x₂ℓ′).
Step 8: Compute ‖Eℓ‖_I := √tr(Eℓ′Eℓ) (error matrix norm).
Step 9: Compute |||Eℓ|||_I := √(tr(Eℓ′Eℓ)/3n) (mean error matrix norm).
Lemma 12.7 (variance-covariance "error propagation"): Let vecE′ be the vector-valued form of the transposed error matrix E := Y₁ − Y₂X₃′x₁ − 1x₂′. Then
\[
\Sigma_{\mathrm{vec}E'}=\Sigma_{\mathrm{vec}Y_1'}
+(I_n\otimes x_1X_3)\,\Sigma_{\mathrm{vec}Y_2'}\,(I_n\otimes x_1X_3)'
-2\,\Sigma_{\mathrm{vec}Y_1',(I_n\otimes x_1X_3)\mathrm{vec}Y_2'}\tag{12.39}
\]
is the exact representation of the dispersion matrix (variance-covariance matrix) Σ_{vecE′} of vecE′ in terms of the dispersion matrices Σ_{vecY₁′} and Σ_{vecY₂′} of the two coordinate sets vecY₁′ and vecY₂′ as well as their covariance matrix Σ_{vecY₁′,(I_n⊗x₁X₃)vecY₂′}. The proof follows directly from "error propagation". Obviously the variance-covariance matrix Σ_{vecE′} can be decomposed into the variance-covariance matrix Σ_{vecY₁′}, the product (I_n ⊗ x₁X₃)Σ_{vecY₂′}(I_n ⊗ x₁X₃)′ using prior information of x₁ and X₃, and the covariance matrix Σ_{vecY₁′,(I_n⊗x₁X₃)vecY₂′}, again using prior information of x₁ and X₃.
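Formula (12.39) is a one-liner once the Kronecker factor is built. A minimal sketch (our illustration with toy diagonal dispersions, assuming x₁ and X₃ from a previous Procrustes solution; `Sigma_y1y2` stands for Σ_{vecY₁′,(I_n⊗x₁X₃)vecY₂′}):

```python
import numpy as np

def dispersion_vecE(Sigma_y1, Sigma_y2, Sigma_y1y2, x1, X3, n):
    """Evaluate (12.39): Sigma_vecE' for given prior x1, X3."""
    F = np.kron(np.eye(n), x1 * X3)        # I_n (x) x1*X3
    return Sigma_y1 + F @ Sigma_y2 @ F.T - 2 * Sigma_y1y2

# toy usage for n = 4 points (3n = 12 coordinates), uncorrelated sets
n = 4
Sigma_y1 = 0.14**2 * np.eye(3 * n)         # e.g. local-system error spheres
Sigma_y2 = 0.01**2 * np.eye(3 * n)         # e.g. WGS 84 error spheres
Sigma12 = np.zeros((3 * n, 3 * n))         # assumed uncorrelated
Sigma_E = dispersion_vecE(Sigma_y1, Sigma_y2, Sigma12, 1.0, np.eye(3), n)
print(Sigma_E.shape)                       # (12, 12)
```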
12-21 Case Studies: The 3d Datum Transformation and the Procrustes Algorithm

By Tables 12.2 and 12.3 we present two sets of coordinates, first for the local system A, second for the global system B, also called "World Geodetic System 84". The units are in meters. The results of
Table 12.2: Coordinates for system A (local system)

Station name     | X (m)        | Y (m)       | Z (m)        | Positional error sphere
Solitude         | 4157222.543  | 664789.307  | 4774952.099  | 0.1433
Buoch Zeil       | 4149043.336  | 688836.443  | 4778632.188  | 0.1551
Hohenneuffen     | 4172803.511  | 690340.078  | 4758129.701  | 0.1503
Kuehlenberg      | 4177148.376  | 642997.635  | 4760764.800  | 0.1400
Ex Mergelaec     | 4137012.190  | 671808.029  | 4791128.215  | 0.1459
Ex Hof Asperg    | 4146292.729  | 666952.887  | 4783859.856  | 0.1469
Ex Kaisersbach   | 4138759.902  | 702670.738  | 4785552.196  | 0.1220
Table 12.3: Coordinates for system B (WGS 84)

Station name     | X (m)        | Y (m)       | Z (m)        | Positional error sphere
Solitude         | 4157870.237  | 664818.678  | 4775416.524  | 0.0103
Buoch Zeil       | 4149691.049  | 688865.785  | 4779096.588  | 0.0038
Hohenneuffen     | 4173451.354  | 690369.375  | 4758594.075  | 0.0006
Kuehlenberg      | 4177796.064  | 643026.700  | 4761228.899  | 0.0114
Ex Mergelaec     | 4137659.549  | 671837.337  | 4791592.531  | 0.0068
Ex Hof Asperg    | 4146940.228  | 666982.151  | 4784324.099  | 0.0002
Ex Kaisersbach   | 4139407.506  | 702700.227  | 4786016.645  | 0.0041
the I-LESS Procrustes Algorithm are listed in Table 12.4, especially
\[
\|E_\ell\|_I:=\sqrt{\operatorname{tr}(E_\ell'E_\ell)},\qquad
|||E_\ell|||_I:=\sqrt{\operatorname{tr}(E_\ell'E_\ell)/3n},
\]
and those of the W-LESS Procrustes Algorithm in Table 12.5, especially
\[
\|E_\ell\|_W:=\sqrt{\operatorname{tr}(E_\ell'WE_\ell)},\qquad
|||E_\ell|||_W:=\sqrt{\operatorname{tr}(E_\ell'WE_\ell)/3n},
\]
completed by Table 12.6 of residuals from the linearized least squares solution and by Table 12.7 listing the weight matrix.
Discussion

By means of the Procrustes Algorithm, which is based upon W-LESS with respect to the Frobenius matrix W-seminorm, we have succeeded in solving the normal equations of Corollaries 12.2–12.4 (necessary conditions) of the matrix-valued "error equations"
Table 12.4: Results of the I-LESS Procrustes transformation

Rotation matrix X₃ ∈ ℝ^{3×3} (entries as extracted; off-diagonal magnitudes pair as (1,2)/(2,1), (1,3)/(3,1), (2,3)/(3,2)):
diag: 0.999999999979023, 0.999999999976693, 0.999999999978896;
(1,2)/(2,1): 4.8146461589238e-6 / 4.81462518486797e-6;
(1,3)/(3,1): 4.33273602401529e-6 / 4.33275933098276e-6;
(2,3)/(3,2): 4.84085332591588e-6 / 4.84087418647916e-6.

Translation x₂ ∈ ℝ^{3×1} (m): (641.8804, 68.6553, 416.3982)
Scale x₁ ∈ ℝ: 1.00000558251985

Residual matrix E (m):
Site            | X (m)   | Y (m)   | Z (m)
Solitude        | 0.0940  | 0.1351  | 0.1402
Buoch Zeil      | 0.0588  | 0.0497  | 0.0137
Hohenneuffen    | 0.0399  | 0.0879  | 0.0081
Kuelenberg      | 0.0202  | 0.0220  | 0.0874
Ex Mergelaec    | 0.0919  | 0.0139  | 0.0055
Ex Hof Asperg   | 0.0118  | 0.0065  | 0.0546
Ex Keisersbach  | 0.0294  | 0.0041  | 0.0017

Error matrix norm (m): ‖Eℓ‖_I := √tr(Eℓ′Eℓ) = 0.2890
Mean error matrix norm (m): |||Eℓ|||_I := √(tr(Eℓ′Eℓ)/3n) = 0.0631
Table 12.5: Results of the W-LESS Procrustes transformation

Rotation matrix X₃ ∈ ℝ^{3×3} (entries as extracted; off-diagonal magnitudes pair as in Table 12.4):
diag: 0.999999999979141, 0.999999999976877, 0.999999999978865;
(1,2)/(2,1): 4.77975830372179e-6 / 4.77977931759299e-6;
(1,3)/(3,1): 4.34410139438235e-6 / 4.34407827309968e-6;
(2,3)/(3,2): 4.83729276438971e-6 / 4.83731352815542e-6.

Translation x₂ ∈ ℝ^{3×1} (m): (641.8377, 68.4743, 416.2159)
Scale x₁ ∈ ℝ: 1.00000561120732

Residual matrix E (m):
Site            | X (m)   | Y (m)   | Z (m)
Solitude        | 0.0948  | 0.1352  | 0.1407
Buoch Zeil      | 0.0608  | 0.0500  | 0.0143
Hohenneuffen    | 0.0388  | 0.0891  | 0.0072
Kuelenberg      | 0.0195  | 0.0219  | 0.0868
Ex Mergelaec    | 0.0900  | 0.0144  | 0.0052
Ex Hof Asperg   | 0.0105  | 0.0067  | 0.0542
Ex Keisersbach  | 0.0266  | 0.0036  | 0.0022

Error matrix norm (m): ‖Eℓ‖_W := √tr(Eℓ′WEℓ) = 0.4268
Mean error matrix norm (m): |||Eℓ|||_W := √(tr(Eℓ′WEℓ)/3n) = 0.0930
\[
\mathrm{vec}E'=\mathrm{vec}Y_1'-(I_n\otimes x_1X_3)\mathrm{vec}Y_2'-\mathrm{vec}(x_21')
\quad\text{subject to}\quad X_3'X_3=I_3,\ |X_3|=+1.
\]
The scalar-valued unknown x₁ ∈ ℝ represents the dilatation (scale factor), the vector-valued unknown x₂ ∈ ℝ^{3×1} the translation vector, and the matrix-valued
Table 12.6: Residuals from the linearized LS solution

Site            | X (m)   | Y (m)   | Z (m)
Solitude        | 0.0940  | 0.1351  | 0.1402
Buoch Zeil      | 0.0588  | 0.0497  | 0.0137
Hohenneuffen    | 0.0399  | 0.0879  | 0.0081
Kuelenberg      | 0.0202  | 0.0220  | 0.0874
Ex Mergelaec    | 0.0919  | 0.0139  | 0.0055
Ex Hof Asperg   | 0.0118  | 0.0065  | 0.0546
Ex Keisersbach  | 0.0294  | 0.0041  | 0.0017
Table 12.7: Weight matrix

W = Diag(1.8110817, 2.1843373, 2.1145291, 1.9918578, 2.6288452, 2.1642460, 2.359370)
unknown X₃ ∈ SO(3) the orthonormal matrix. The conditions of sufficiency, namely the Hesse matrix of second derivatives of the Lagrangean L(x₁, x₂, X₃), are not discussed here. They are given in the Procrustes references. In order to present a proper choice of the isotropic weight matrix W (Table 12.7), we introduce the corresponding "random regression model":
\[
E\{\mathrm{vec}E'\}=E\{\mathrm{vec}Y_1'\}-(I_n\otimes x_1X_3)E\{\mathrm{vec}Y_2'\}-\mathrm{vec}(x_21')=0
\]
(first moment identity),
\[
D\{\mathrm{vec}E'\}=D\{\mathrm{vec}Y_1'\}+(I_n\otimes x_1X_3)D\{\mathrm{vec}Y_2'\}(I_n\otimes x_1X_3)'
-2C\{\mathrm{vec}Y_1',(I_n\otimes x_1X_3)\mathrm{vec}Y_2'\}
\]
(second central moment identity).
12-3 References Here is a list of important references: Awange LJ (1999), Awange LJ (2002), Awange LJ, Grafarend E (2001 a, b, c), Awange LJ, Grafarend E (2002), Bernhardt T (2000), Bingham C, Chang T, Richards D (1992), Borg I, Groenen P (1997), Brokken FB (1983), Chang T, Ko DJ (1995), Chu MT, Driessel R (1990), Chu MT, Trendafilov NT (1998), Crosilla F (1983a, b), Dryden IL (1998),
Francesco D, Mathien PP, Senechal D (1997), Golub GH (1987), Goodall C (1991), Gower JC (1975), Grafarend E and Awange LJ (2000, 2003), Grafarend E, Schaffrin B (1993), Grafarend E, Knickmeyer EH, Schaffrin B (1982), Green B (1952), Gulliksson M (1995a, b), Kent JT, Mardia KV (1997), Koch KR (2001), Krarup T (1979), Lenzmann E, Lenzmann L (2001a, b), Mardia K (1978), Mathar R (1997), Mathias R (1993), Mooijaart A, Commandeur JJF (1990), Preparata FP, Shamos MI (1985), Reinking J (2001), Schönemann PH (1966), Schönemann PH, Carroll RM (1970), Schottenloher M (1997), Ten Berge JMF (1977), Teunissen PJG (1988), Trefethen LN, Bau D (1997) and Voigt C (1998).
Chapter 13

The Sixth Problem of Generalized Algebraic Regression – the system of conditional equations with unknowns (Gauss–Helmert model)

C.F. Gauss and F.R. Helmert introduced the generalized algebraic regression problem, which can be identified as a system of conditional equations with unknowns. Here we present variance-covariance component estimation of Helmert type in the Gauss–Helmert model.

Here we follow E. Grafarend, "Variance-covariance-component estimation of Helmert type in the Gauss–Helmert model", in detail. In the first section we carefully define the Gauss–Helmert model of condition equations with unknown parameters in a linear model. The second section is E. Grafarend's model of variance-covariance-component estimation of Helmert type within the linear model of inhomogeneous condition equations: Bε = By − c, By not an element of R(A) + c. In contrast, section three is E. Grafarend's model of variance-covariance-component estimation of Helmert type within the linear model of inhomogeneous condition equations with unknown parameters, namely within the linear model Ax + Bε = By − c, By not an element of R(A) + c. Finally we discuss in section four the block structure of the dispersion matrix D{y} and give two detailed examples of the variance-covariance estimation within the Gauss–Helmert model of Helmert type.

In order to improve an adjustment result in the Gauss–Helmert model (condition equations with unknowns: Ax + Be = By − c), the stepwise procedure of estimation of variance-covariance components of the observation vector y, originating from F.R. Helmert for the Gauss–Markov model, is developed. The basic result is the construction of a local Helmert-type inhomogeneous, invariant, quadratic and unbiased estimator of variance-covariance components. First results are presented for (i) a parameter matrix A of full column rank, (ii) a condition matrix B of full row rank and (iii) a positive-definite dispersion matrix of the observations. Two examples are given.

F.R. Helmert (1924, pp. 258–267) introduced a stepwise procedure for the estimation of variance-covariance components in the Gauss–Markov model of parameter adjustment, which will be extended here to variance-covariance-component estimation in the model of condition adjustment and in the general model of condition equations with unknown parameters, the Gauss–Helmert model according to H. Wolf (1978). In contrast to other estimation techniques, which are reviewed by B. Schaffrin (1983), the Helmert method has the following advantages:
E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 13, © Springer-Verlag Berlin Heidelberg 2012
477
478
13 The Sixth Problem of Generalized Algebraic Regression
In the first step unknowns and inconsistencies of the observations with respect to the model (“measurement errors”) are determined by the classical model of weighted least-squares assuming a priori variances and covariances of the observations. In the second step the a priori values of the observational variances are corrected by solving the famous Helmert system of linear equations. In case that the corrected variance-covariance-component differ significantly from the a priori ones the estimation process can be started again, now with the new variances and covariances of the observations. The stepwise procedure can come to an end when a priori and a posteriori variance-covariance-component coincide, a situation we call “reproducing”. There has been the argument the argument that “variance-covariance-component estimation” has no practical impact on the adjustment result. But referring to our own experience the contrary is true. From an adjustment of heterogeneously observed three-dimensional networks we found improvements of adjustment results through the stepwise Helmert procedure of more than a hundred percent. E.Grafarend’s contribution on “variance-covariance-component estimation of HELMERT type in the Gauss–Helmert model” (Zeitschrift f¨ur Vermessungswesen 109 (1984) 34–44), namely condition equations with unknowns, is organized in the following way: The first paragraph reviews the classical variance-covariancecomponent estimation of Helmert type of the linear Gauss–Markov model of parameter adjustment. A first extension for the model of linear inhomogeneous condition equations between the observations is given in paragraph two. As a notable result we mention the character of an inhomogeneous, quadratic unbiased of the generalized Helmert estimator for this model, in general. Paragraph three is devoted to variance-component of Helmert type in the linear Gauss–Helmert model, also called “inhomogeneous, inconsistent condition equations with unknown parameter”. A local Helmert-type invariant, inhomogeneous, quadratic an unbiased estimator is constructed. Finally, the fourth paragraph is an introduction into the block structure of the variance-covariance matrix of the observations. Our first results of variance-covariance-component estimation with respect to the model Ax C B" D By c are restricted to (i) a parameter matrix A of full column rank, (ii) a condition matrix B of full row rank and (iii) a positive-definite variance-covariance matrix of the observations. More general results will be given elsewhere. Here we have generalized the variance-covariance-component estimation of Helmert Type of Gauss–Markov model given by E. Grafarend, A. Kleusberg and B. Schaffrin (1980), first results for condition adjustment without parameter unknowns of L. Sj¨oberg (1983a,b) and four condition adjustment with unknowns of C.G Person (1982). Finally we mention a peculiar use of our variance-covariancecomponent estimation technique, the estimation of variance-covariance functions of “signals” and “noise” within “collocation”, e.g. in the version of H. Wolf (1977).
13-1 Variance-Covariance-Component Estimation in the Linear Model
479
13-1 Variance-Covariance-Component Estimation in the Linear Model Ax C " D y; y … R.A/ According to E.Grafarend (1984): Variance-covariance-component estimation of Helmert type in the Gauss-Helmert model, Zeitschrift fuer Vermessungswesen 109 (1984) pp. 30–40 we follow his text: we assume that the system of linear equations Ax C " D y is formulated by (13.1) and (13.2). The n m fixed matrix A of full column rank r.A/ D m connects the m 1 fixed unknown vector x of parameters and the n 1 stochastic vector y of observations. The n 1 stochastic vector " of inconsistency accounts for the fact that the observation vector y does not belong to the range R.A/ of the matrix A spanned by its columns. " is chosen in such a way that y " is an element of the column space of A. Within second order statistics the linear model Ax C " D y is characterized by the first and second order moments of the vector y of observations. E.y/ D
m X
ai i
(13.1)
lD1
D.y/ D
l X j D1
Cjj j2 C
l l1 X X j D1 kD2
l.lC1/=2
Cj k j k D
X
Cj j
(13.2)
j D1
j
(13.3)
480
13 The Sixth Problem of Generalized Algebraic Regression
@L ."l / D 2†1 0 ""l D 0 @" @L 1 .xl / D 2.A0 †1 0 Ax1 A†0 y/ D 0 @x @2 L ."l / D 2†1 0 0 @""0
(13.4) (13.5) (13.6)
The first derivatives of the Lagrange-function constitute the necessary condition, the second derivatives the sufficiency condition. Since we have assumed a positive definite dispersion matrix †0 (13.25) is always fulfilled. (Here it would have been sufficient to have started from a positive-semidefinite weight matrix, but this generalization will be treated elsewhere.) Thus we arrive at the classical least-square solution 1 0 1 x1 D .A0 †1 (13.7) 0 A/ A †0 y (ii) The invariance of the estimation Let us make use of the estimation x1 in order to compute the stochastic vector "l of inconsistency. 1 0 1 "l D y Ax1 D ŒI A.A0 †1 0 A/ A †0 y D D0 y 1
D0 WD I A.A0 †0 A/1 A0 †1 0
(13.8) (13.9)
Lemma 13.1. Let "l D D0 y be the weighted least-squares estimation of the stochastic vector " of inconsistency. "l is invariant with respect to the transformation y ! y Ax D " such that " l D D0 " holds. D0 is an idempotent matrix. For the proof replace y by y Ax within (13.10). 1 0 1 ŒI A.A0 †1 0 A/ A †0 .y Ax/ 1 0 1 0 1 1 0 1 ŒI A.A0 †1 0 A/ A †0 y ŒA ŒI A.A †0 A/ A †0 x 1 0 1 ŒI A.A0 †1 0 A/ A †0 y“q:e:d”
Next we compute 1 0 1 0 1 1 0 1 D20 D ŒI A.A0 †1 0 A/ A †0 ŒI A.A †0 A/ A †0 1 0 1 0 1 1 0 1 I A.A0 †1 0 A/ A †0 A.A †0 A/ A †0 1 0 1 0 1 1 0 1 CA.A0 †1 0 A/ A †0 A.A †0 A/ A †0 1 0 1 I A.A0 †1 0 A/ A †0 D D0
which is by definition the postulate of idempotence.
(13.10)
13-1 Variance-Covariance-Component Estimation in the Linear Model
481
The underlying ideas of the variance-covariance-component estimation techniques of Helmert type are as following. At first we compute Ef"0l †1 0 "l g which can be expressed via (13.10). "l D D0 " in the terms of E.""0 / D †. Secondly we partition †1 0 and † into consistent blocks. Once we combine those procedure we gain a linear system of equations from which an unbiased estimation H of variancecovariance-components, based on the †1 0 weighted least-square estimation "l can be constructed. We give the steps of calculations in details, now. 0 0 1 0 1 0 0 1 Ef"0l †1 0 "l g D Ef" D0 †0 D0 "g D trD0 †0 D0 ."" / D trD0 †0 D0 † l.lC1/=2
X
†D
Cj j ; †0
1
l.lC1/=2
D
j D1 l.lC1/=2
X
Ef"0l Ei .0 /"l g
(13.11)
X
Ej .0 /
(13.12)
j D1 l.lC1/=2 l.lC1/=2
D
i D1
X
X
i D1
j D1
trD00 ei .0 /D0 Cj j
(13.13)
l.lC1/=2
Ef"0l Ei .0 /"l g D
X
trD00 Ei .0 /D0 Cj j
.i D 1; : : : ; l.l C 1/=2/
j D1
(13.14) l.lC1/=2
E.qi / D
X
Hij j
.i D 1; : : : ; l.l C 1/=2/
(13.15)
i D1
qi WD "0l Ei .0 /"l D y0 D00 Ei .0 /D0 y Hij WD
trD00 Ei .0 /D0 Cj
.i D 1; : : : ; l.l C 1/=2/ .i; J D 1; : : : ; l.l C 1/=2/
(13.16) (13.17)
Hij H is the famous l.l C 1/=2/ l.l C 1/=2/ Helmert matrix. Let us assume Hij H is a regular matrix such that D H 1 Efqg. Now we choose H WD H1 q
(13.18)
as the local variance-covariance-component estimation of Helmert type, we can guarantee unbiasedness, E.H / D , due to E.H / D H 1 Eq D Theorem 13.1. Assume the Helmert matrix H to be regular. Then H is unique and 0 -HIQUE (0 -Helmert type invariant quadratic unbiased estimation). Uniqueness is obvious from (13.18). Invariance with respect to y ! y Ax D " is consequence of Lemma 13.1 while the property of a quadratic unbiased estimation follows directly from construction principle. A more general theorem, especially based on a singular Helmert matrix, ha been given by E. Grafarend, A Kleusberg
482
13 The Sixth Problem of Generalized Algebraic Regression
and B. Schaffrin (1980, pp. 165–167) to which we refer. We note that even if H is regular, H is not 0 -BIQUE (0 -best IQUE). Step of higher order It has been stated that the 0 -Helmert type variance-covariance-component estimation H is a local one. The function H .0 / illustrates this property. Conventionally, geodesists have designed algorithms which iterate the function H .0 / to the point H D 0 , the 0 -reproducing variance-covariance-component estimation such that H D H1 .0 /q.0 / D 0 . But it has been shown by E. Grafarend and A. Kleusberg (1980, pp. 133–135) that there are more than one reproducing points. We have therefore advocated to make a plot of the function H .0 / and H D 0 in order to have a deeper insight into the local behavior of the estimation process. Anyway the higher order steps of iteration H .0 / will be stopped at some point H D 0
13-2 Variance-Covariance-Component Estimation in the Linear Model B" D By c; By … R.A/ C c According to E.Grafarend (1984): Variance-covariance-component estimation of HELMERT type in the Gauss-Helmert model, Zeitschrift fuer Vermessungswesen 109 (1984) pp. 30–40 we follow his text: given the system of linear condition equation B" D By c: The q n fixed matrix B of full row rank r.B/ D q connects to n 1 stochastic vector " of inconsistency and the q 1 stochastic vector By c where the q 1 fixed vector c of inhomogeneity accounts for the condition equations or constraints. The n 1 vector of " of inconsistency is chosen in such a way that By does not belong to the range R.B/ of the matrix B (spanned by its columns) and being translated by the vector c of inhomogeneity and that BE.y/ D c holds. Within second order statistics the linear model B" D By c is characterized by the first and second order moments of the vector y of observations. E.y/ D y ";
D.y/ D
l X j D1
Cjj j2
C
j l1 X X j D1 kD2
E.By/ D BE.y/ D c
(13.19)
l.lC1/=2
Cj k j k D
X
Cj j ; D.By/ D BD.y/B0 (13.20)
j D1
j
13-2 Variance-Covariance-Component Estimation in the Linear Model
483
In the first step of the estimation process of fE.y/; D.y/g we depart from an initial variance-covariance matrix †0 of the dispersion matrix D.y/ D † of the observation vector y, or equivalently, from an initial set 0 of variance-covariancecomponents. Using the initial dispersion matrix †0 , the unknown vector " of inconsistency are estimated by “constrained least-squares”: (i) Lagrange-function L."; / WD "0 01 " C 20 .B" By C c/
(13.21)
D Œy E.y/0 01 Œy E.y/ C 20 ŒBE.y/ C c D min "; @L ."l ; l / D 201 "l C 2B0 l D 0 @" @L ."l ; l / D 2.B"l By C c/ D 2fBŒE.y/l C cg D 0 @" @L fŒE.y/l ; l g D 201 fy ŒE.y/l g 2B0 l D 0 @" @2 L ."l ; l / D 201 0 @"@"0
(13.22)
(13.23) (13.24) (13.25)
The first derivatives of the Lagrange-function constitute the necessary conditions, the second derivatives the sufficiency condition. Since we assume a positive-definite dispersion matrix 0 in paragraph one, (13.25) is always fulfilled. (Here it would have been sufficient to have started from a positive-semidefinite weight matrix, but this generalization will be treated elsewhere.) Thus we arrive at the classical system of normal equations 1 0 0 †0 B "l D (13.26) By c B 0 l where the block matrix of normal equations is called the fundamental matrix of constrained minimization. (For more details see S.L Campbell and C.D Meyer (1979, pp. 63–70).) For a q n matrix B of full row rank, r.B/ D q, and for a positive-definite variance-covariance matrix 0 the vector "l of inconsistency and the vector l of Lagrange-multipliers can be easily obtained from the system of normal equations. "l D † 0 B0 .B†0 B0 /1 .By c/;
l D .B†0 B0 /1 .By c/
ŒE.y/l D y "l D ŒI 0 B0 .B†0 B0 /1 By †0 B0 .B†0 B0 /1 c
(13.27) (13.28)
484
13 The Sixth Problem of Generalized Algebraic Regression
(ii) The invariance of estimation Let us discuss the “constrained least-squares” estimation "l of the vector " of inconsistency. Once write "l D † 0 B0 .B†0 B0 /1 .By c/ D l C L0 y ŒE.y/l D ŒI †0 B0 .B†0 B0 /1 By †0 B0 .B†0 B0 /1 c D l0 C M0 y 0
0 1
l0 W D †0 B .B†0 B / c L0 W D †0 B0 .B†0 B0 /1 B;
(13.29) (13.30) (13.31)
M0 WD I L0
(13.32)
It is obvious that (13.29), (13.30) represent linear, but inhomogeneous estimations. Lemma 13.2. Let "l D †0 B0 .B†0 B0 /1 .By c/ D l C L0 y be the constrained least-squares estimations of the stochastic vector " of inconsistency. "l is invariant with respect to the transformation By c ! BŒy E.y/ such that "l D L0 "
(13.33)
holds. L0 is an idempotent matrix. The proof follows the lines of the one of Lemma 13.1. (iii) Variance-covariance-component estimation of Helmert type For a study of the underlying ideas of the variance-covariance-component estimation techniques of Helmert type we refer to paragraph 1 (iii). We only have to 1 0 1 0 0 1 exchange D0 D I A.A0 †1 0 A/ A †0 by L0 D 0 B .B0 B / B , e.g. Ef"0l Ei .0 /"l g
l.lC1/=2
D
X
trL00 Ei .0 /L0 Cj j .i D 1; : : : ; l.l C 1/=2
(13.34)
j D1 l.lC1/=2
E.qi / D
X
Hij j .i D 1; : : : ; l.l C 1/=2
(13.35)
qi WD "0l Ei .0 /"l D .l C L0 y/0 Ei .0 /.l C L0 y/ .i D 1; : : : ; l.l C 1/=2
(13.36)
j D1
Hij WD trL00 Ei .0 /L0 Cj .i; j D 1; : : : ; l.l C 1/=2 (13.37) Hij H is the generalized l.l C 1/=2 l.l C 1/=2 Helmert matrix. Let us assume Hij H is a regular matrix such that D H1 E.q/. Now we choose H WD H1 q
(13.38)
as the local variance-covariance-component estimation of Helmert type, we can guarantee unbiasedness, E.H / D , due to E.H / D H1 E.q/ D .
13-3 Variance-Covariance-Component Estimation in the Linear Model
485
Theorem 13.2. Assume the g-Helmert matrix H to be regular. Then H is unique and 0 -Helmert type invariant inhomogeneous quadratic unbiased estimation. Uniqueness is obvious from (13.38). Invariance with respect to By c ! BŒy E.y/ is a consequence Lemmas 2 which the property of an inhomogeneous mixed linear-quadratic estimation follows directly form the construction principle. Note c D 0 the 0 -Helmert type estimation H is 0 -HIQUE. A more general theorem, especially based on a singular g-Helmert matrix, will be given elsewhere. Steps of high order The nature of a local estimation H .0 / leads again to an iteration scheme to a point H D 0 , the 0 -reproducing variance-covariance-component estimation such that H D H1 .0 /q.0 / D 0 . Again we advocate the computation of the complete graph H .0 / since more than one reproducing point might appear.
13-3 Variance-Covariance-Component Estimation in the Linear Model Ax C " C B" D By c; By … R.A/ C c According to E.Grafarend (1984): Variance-covariance-component estimation of HELMERT type in the Gauss-Helmert model, Zeitschrift fuer Vermessungswesen 109 (1984) pp. 40–42 we follow his text: given the system of linear equations Ax C " C B" D By c: The stochastic n 1 vector " of inconsistency pays attention to the fact that the q 1 vector By does not belong to the range, spanned by the column space of the n m matrix A and translated by the q 1 vector c. The vector " is chosen in such a way that the vector By c is an element of the column space of the matrix A translated by the vector c. The fixed q m matrix A is full of column rank, r.A/ D m, but the fixed q n matrix B of full row rank r.B/ D q. Within second order statistics the linear model Ax C " C B" D By c is characterized by the first and second order moments of the n 1 vector y of observation. Ey D y " BE.y/ D
m X
(13.39)
ai i C c D Ax C c
(13.40)
i D1
D.y/ D
l X
Cjj j C
j D1
D.By/ D BD.y/B0
l l1 X X j D1 kD2
l.lC1/=2
Cj k j k D
X
Cj j
(13.41)
j D1
(13.42)
The block structure of the dispersion matrix D(y) is discussed in more detail in Sect. 13-1 and in Sect. 13-4. Here we develop the idea of stepwise estimation of
486
13 The Sixth Problem of Generalized Algebraic Regression
variance-covariance-component of F.R. Helmert type with respect to the linear model of condition equations with unknown Ax C " C B" D By c The initial step In the first step of the estimation process of .; / or .x1 ; x2 / we depart from initial variance-covariance matrix †0 of the dispersion matrix D.y/ D † of the observation vector y, or, equivalently, from an initial set 0 of variance-covariancecomponents. Using the initial dispersion matrix †0 , the unknown vector x and " are estimated by “constrained least-squares”: (i) Lagrange-function 0 L."; x; / WD "0 †1 0 " C 2 .Ax C B" By C c/ D min "; x;
(13.43)
@L 0 ."l ; xl ; l / D 2†1 0 "l C 2B l D 0 @" @L ."l ; xl ; l / D 2A0 l D 0 @x @L ."l ; xl ; l / D 2.Axl C B"l By C c/ D 0 @ @0 L ."l ; xl ; l / D 2†1 0 0 @"@"0
(13.44) (13.45) (13.46)
The first derivatives of the Lagrange-function constitute the necessary conditions, the second derivatives the sufficiency condition. Since we have assumed a positivedefinite dispersion matrix †0 in paragraph one, (13.46) is always fulfilled. (Here it would have been sufficient to have started from a positive-definite weight matrix, but this generalization will be treated elsewhere.) Thus we arrive at the classical system of normal equations. 2
3 32 3 2 0 †1 0 "l 0 B 0 4 B 0 A 5 4 l 5 D 4 By c 5 0 xl 0 A0
(13.47)
with the conventional block matrix of constrained minimization. For a q m matrix A of full column rank, r.A/ D m, a q n matrix B of full rank, r.B/ D q and a positive-semidefinite weight matrix †0 the unknown vectors "l ; l and xl can be easily obtained by successive reduction of the system of normal equations.
13-3 Variance-Covariance-Component Estimation in the Linear Model
487
"l D †0 B0 .B†0 B0 /1 fIAŒA0 .B†0 B0 /1 A1 A0 .B†0 B0 /1 g.Byc/ (13.48) †0 B0 .B†0 B0 /1 f.By c/ Axl g l D .B†0 B0 /1 fAŒA0 .B†0 B0 /1 A1 A0 .B†0 B0 /1 Ig.By c/
(13.49)
xl D ŒA0 .B†0 B0 /1 A1 A0 .B†0 B0 /1 .By c/
(13.50)
(ii) The variance of the estimation Let us discuss the “constrained least-squares” estimation "l of the vector " of inconsistency. Once write "l D †0 B0 .B†0 B0 /1 fI AŒA0 .B†0 B0 /1 A1 A0 .B†0 B0 /1 g.By c/ (13.51) D f0 C F0 y f0 WD †0 B0 .B†0 B0 /1 fI AŒA0 .B†0 B0 /1 A1 A0 .B†0 B0 /1 gc
(13.52)
F0 WD †0 B0 .B†0 B0 /1 fI AŒA0 .B†0 B0 /1 A1 A0 .B†0 B0 /1 gB
(13.53)
it is obvious that (13.52) represents a linear, but inhomogeneous estimation. Lemma 13.3. Let "l D f0 C F0 y be the constrained least-squares estimation of the stochastic vector " of inconsistency. "l is invariant with respect to the transformation By c ! Ax C BŒy E.y/ such that " l D F0 "
(13.54)
holds. F0 is an idempotent matrix The proof follows the lines of Lemma 13.1. (iii)Variance-covariance-components estimation of Helmert type Again we refer for a motivation of the variance-covariance-component estimation technique of Helmert type to paragraph 1 (iii). We only have to exchange 1 0 1 Dl D I A.A0 †1 0 A/ A †0 byF0
(13.55)
in (13.53) Ef"0l Ei .0 /"l g
l.lC1/=2
D
X
trF00 Ei .0 /F0 Cj j .i D 1; : : : ; l.l C 1/=2 (13.56)
j D1 l.lC1/=2
E.qi / D
X
Hij j .i D 1; : : : ; l.l C 1/=2
j D1
qi W D "0l Ei .0 /"l D .f0 C F0 y/0 Ei .0 /.f0 C F0 y/
(13.57)
488
13 The Sixth Problem of Generalized Algebraic Regression
.i D 1; : : : ; l.l C 1/=2 Hij W D
(13.58)
trF00 Ei .0 /F0 Cj .i; j
D 1; : : : ; l.l C 1/=2
(13.59)
Hij H is the generalized l.l C 1/=2 l.l C 1/=2 Helmert matrix. Let us assume Hij H is a regular matrix such that D H1 E.q/. Now we choose H WD H1 q
(13.60)
as the local variance-covariance-component estimation of Helmert type, we can guarantee unbiasedness, E.H / D , due to E.H / D H1 E.q/ D . Theorem 13.3. Assume the g-Helmert matrix H to be regular. Then H is unique and 0 -Helmert type invariant inhomogeneous quadratic unbiased estimation. Uniqueness is obvious from (13.60). Invariance with respect to By c ! Ax C BŒy E.y/ is a consequence of Lemma 13.3 while the property of an inhomogeneous mixed linear-quadratic estimation follows directly from the construction principle, especially (13.52). Note that for c D 0 the 0 -Helmert type estimation H is 0 -HIQUE. A more general theorem, especially based on a singular g-Helmert matrix, will be given elsewhere. Steps of high order The local property of estimation H .0 / leads to the standard iteration problem. From the complete graph H .0 / we gain insight into the locations of the 0 -reproducing variance-covariance-components estimation such that H D H1 . 0 /q. 0 / D . 0 / In the applications of general linear model Ax C " C B" D By c a partitioned version appears: 2
3 0 A WD 4 A2 5 ; A3
2
3 B1 B WD 4 B2 5 ; 0
3 c1 c WD 4 c2 5 ; c3 2
c3 2 R.A3 /
(13.61)
(i) first partitioning inconsistent system of inhomogeneous condition equations between the observations only: B1 " D B1 y c1 (ii) second partitioning inconsistent system of inhomogeneous condition equations with unknowns: A2 xB2 " D B2 y c2 (iii) third partitioning consistent system of inhomogeneous condition equations between unknowns only: A3 D c3 ; c3 2 R.A3 /
13-4 The Block Structure of Dispersion Matrix Dfyg
489
13-4 The Block Structure of Dispersion Matrix Dfyg According to E.Grafarend (1984): Variance-covariance-component estimation of HELMERT type in the Gauss-Helmert model, Zeitschrift fuer Vermessengswesen 109(1984) pp.42-44 at first we will review the block-partitioning of the dispersion matrix Dfyg of the observation vector y. An example from leveling and another one from distance observations will illustrate the technique of decomposition into blocks. The variance-covariance matrix Dfyg of the stochastic vector y of observations is decomposed into 3 12 Q11 12 Q12 : : : 1l Q1l 6 12 Q0 2 Q22 : : : 2l Q2l 7 12 2 7 (13.62) D6 5 4::: 0 0 2 1l Q1l 2l Q2l : : : sl Ql l 2
D.y/ D
l X
Ci i i2 C
j D1
l l1 X X
Cj k j k
j D1 kD1
2
Cjj
3 0 ::: 0 WD 4 : : : Qjj : : : 5 row j .j D 1; : : : ; l/ 0 ::: 0
(13.63)
column j 2
Cj k
0 6::: 6 WD 6 4 0
3 ::: 0 0 Qj k : : : 7 7 7 row j .j; k D 1; : : : ; l/ Q0j k 0 5 ::: 0
(13.64)
column j Example A1 (leveling) Let us estimate the variance-components of forward and backward observations in a triangular leveling network, namely at two different epochs of observation. The 4 4 dispersion matrix D.y/ D †.h12 ; h23 ; h21 ; h32 D † of the observations h12 ; h23 at epoch one and h21 ; h32 at epoch two can be block-partitioned into 2 11 C 22 212 612 C 23 13 22 6 6 6 †D6 6 6 2 4 C 1 1 2 1 2 1 2 13 C 22 12 23
12 C 23 22 13 21 C 12 11 22 22 C 33 223 22 C 31 21 32 22 C 31 21 23 23 C 32 22 33
11 C 22 2 12 21 C 32 31 22
3 22 C 13 12 23 23 C 31 22 33 7 7 7 7 7 7 7 12 23 13 C 2 5 22 C 33 2 23 (13.65)
490
13 The Sixth Problem of Generalized Algebraic Regression
where ij D EŒ"i .t1 /"j .t1 / D i .t1 /j .t1 /%ij D ˚ij .t1 ; t1 /
(13.66)
j i
D ˚ij .t1 ; t2 /
(13.67)
D EŒ"i .t2 /"j .t2 / D i .t2 /j .t2 /% D ˚ij .t2 ; t2 /
(13.68)
ij
D EŒ"i .t1 /"j .t2 / D
j i .t1 /j .t2 /%i ij
(13.69) j
We call ii D ˚i i .t1 ; t2 / autovariance and i D ˚ij .t1 ; t2 /; i ¤ j , crossvariance. In a first design of second order let us assume that the absolute heights are uncorrelated, their variances to be identical within one epoch, but varying from epoch to epoch. Then the second order design matrix of observed height differences can be represented by 2
3 2 2 .t1 / 2 .t1 / 0 6 2 .t1 / 2 2 .t1 / 7 7 †D6 4 0 0 2 2 .t2 1/ 2 .t1 / 5 0 0 2 .t1 / 2 2 .t2 1/ 2 2 3 3 2 1 0 0 00 0 0 6 1 2 0 0 7 6 7 2 60 0 0 0 7 7 D 12 6 4 0 0 0 0 5 C 2 4 0 0 2 1 5 0 0 00 0 0 1 2 Q11 0 0 0 D C11 .1 /2 C C22 .2 /2 D 12 C 22 0 Q22 0 0 where 2 .t1 / D .1 /2 ; 2 .t2 / D .2 /2 are unknown variance-covariance-component of absolute heights at epoch t1 and t2 .
Q11 D Q22
2 1 D 1 2
(13.70)
Here the open problem is to estimate 2 .t1 / and 2 .t2 / Example A2 (distance observations): Three targets at distances 1, 2 and 4 km will be observed twice by distances. The two sets of observations lead to the conditions E.y1 / D E.y4 /; E.y2 / D E.y5 /; E.y3 / D E.y6 / or
13-4 The Block Structure of Dispersion Matrix Dfyg
3 1 0 0 1 0 0 B" D 4 0 1 0 0 1 0 5 " DBy 0 0 1 0 0 1
491
2
(13.71)
Let us assume that the observations are uncorrelated, but the variances will be represented by (13.72) y2i D 12 C i2 22 where 12 is the distance-independent part, but 22 the distance-dependent part of the observational variance. Thus the 6 6 dispersion matrix D.y/ D †.y1 ; : : : ; y6 / D † of observations y1 ; : : : ; y6 can be represented by 2 2 3 1 C22 0 0 0 6 7 1 6 7 12 C 22 0 0 0 6 7 4 6 7 1 2 6 7 2 0 0 0 1 C 2 6 7 16 7 (13.73) † D6 6 0 7 2 2 0 0 1 C2 6 7 6 7 2 1 2 6 0 7 0 0 C 1 2 6 7 4 4 5 1 0 0 0 12 C 22 16 1 1 1 1 D 12 I C 22 diag 1; ; ; 1; ; (13.74) 4 16 4 16 D C11 .1 /2 C C22 .2 /2 1 1 1 1 C11 D I; C22 D diag 1; ; ; 1; ; 4 16 4 16
(13.75)
Here the open problem is to estimate .1 /2 and .2 /2 . This introduction into conditional equation with unknowns–the Gauss–Helmert model–is based on our contribution in Zeit schrift f¨ur Vermessungswesen 109(1984) 32–44, the first contribution which is based on variance-covariance-component estimation. The question is our model can be based on Bayesian estimation is still open. Model studies of our type we performed by S.L Campbell and C.D. Meyer (1979), E. Grafarend and A. Kleusberg (1980), A. Kleusberg and B. Schaffrin(1980), in particular F.R. Helmert (1924), E. Grafarend and A. Kleusberg (1980), A. Kleusberg and E. Grafarend (1981), L.E. Sj¨oberg (1983a,b) and in particular on H. Wolf (1978, 1988).
Chapter 14
Special Problems of Algebraic Regression and Stochastic Estimation multivariate Gauss–Markov model, the n-way classification model, dynamical systems Up to now, we have only considered an “univariate Gauss–Markov model”. Its generalization towards a multivariate Gauss–Markov model will be given in Sect. 14-1. At first, we define a multivariate linear model by Definition 14.1 by giving its first and second order moments. Its algebraic counterpart via multi-variate LESS is subject of Definition 14.2. Lemma 14.3 characterizes the multi-variate LESS solution. Its multivariate Gauss–Markov counterpart is given by Theorem 14.4. In case we have constraints in addition, we define by Definition 14.5 what we mean by “multivariate Gauss–Markov model with constraints”. The complete solution by means of “multivariate Gauss–Markov model with constraints” is given by Theorem 14.5. In contrast, by means of a MINOLESS solution we present the celebrated “n-way classification model”. Examples are given for a 1-way classification model, for a 2-way classification model without interaction, for a 2-way classification model with interaction with all numerical details for computing the reflexive, symmetric generalized inverse .A0 A/ rs . The higher classification with interaction is finally reviewed. We especially deal with the problem how to compute a basis of unbiased estimable quantities from biased solutions. Finally, we take account of the fact that in addition to observational models, we have dynamical system equations. Additionally, we therefore review the Kalman Filter (Kalman–Bucy Filter). Two examples from tracking a satellite orbit and from statistical quality control are given. In detail, we define the stochastic process of type ARMA and ARIMA. A short introduction on “dynamical system theory” is presented. By two examples we illustrate the notions of “a steerable state” and of “observability”. A careful review of the conditions “steerability” by Lemma 14.7 and “observability” by Lemma 14.8 is presented. Traditionally the state differential equation as well as the observational equation are solved by a typical Laplace transformation which we will review shortly. At the end, we focus on the modern theory of dynamic nonlinear models and comment on the theory of chaotic behaviour as its upto date counterpart.
14-1 The Multivariate Gauss–Markov Model: A Special Problem of Probabilistic Regression Let us introduce the multivariate Gauss–Markov model as a special problem of the probabilistic regression. If for one matrix A of dimension O.A/ D n m in a Gauss–Markov model instead of one vector of observations several observation E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 14, © Springer-Verlag Berlin Heidelberg 2012
493
494
14 Special Problems of Algebraic Regression and Stochastic Estimation
vectors yi of dimension O.yi / D n p with identical variance-covariance matrix †ij are given and the fixed array of parameters i has to be determined, the model is referred to as a Multivariate Gauss–Markov model. The standard Gauss–Markov model is then called a univariate Gauss–Markov model. The analysis of variance-covariance is applied afterwards to a multivariate model if the effect of factors can be referred to not only by one characteristic of the phenomenon to be observed, but by several characteristics. Indeed this is the multivariate analysis of variance-covariance. For instance, the effects of different regions on the effect of a species of animals are to be investigated, the weight of the animals can serve as one characteristic and the height of the animals as a second one. Multivariate models can also be setup, if observations are repeated at different times, in order to record temporal changes of a phenomenon. If measurements in order to detect temporal changes of manmade constructions are repeated with identical variance-covariance matrices under the same observational program, the matrix A of coefficients in the Gauss–Markov model stays the same for each repetition and each repeated measurement corresponds to one characteristic. Definition 14.1. (multivariate Gauss–Markov model): Let the matrix A of the order n m be given, called the first order design matrix, let i denote the matrix of the order m p of fixed unknown parameters, and let yi be the matrix of the order n p called the matrix of observations subject to p n. Then we speak of a “multivariate Gauss–Markov model” if 2
Efyi g D Ai
Ofyi g D n p subject to 4 OfAg D n m Ofi g D m p
Dfyi ; yj g D In ıij subject to Ofıij g D p p and p:d:
(14.1) (14.2)
for all i; j 2 f1; : : : ; pg apply for a second order statistics. Equivalent vector and matrix forms are EfYg D A„ and EfvecY g D .Ip ˝ A/vec „
(14.3)
DfvecYg D † ˝ In and d fvecYg D d.† ˝ In /
(14.4)
subject to Of†g D p p; Of† ˝ Ig D np np; OfvecYg D np 1; OfYg D n p OfDfvecYgg D np np; Ofd.vecY/g D np.np C 1/=2:
14-1 The Multivariate Gauss–Markov Model: A Special Problem
495
The matrix DfvecYg builds up the second order design matrix as the Kronecker– Zehfuss product † and In . In the multivariate Gauss–Markov model both the matrices i and ij or „ and † are unknown. An algebraic equivalent of the multivariate linear model would read as given by Definition 14.2. Definition 14.2. (multivariate linear model): Let the matrix A of the order n m be given, called first order algebraic design matrix, let xi denote the matrix of the order of fixed unknown parameter, and yi be the matrix of order n p called the matrix of observations subject to p n. Then we speak of an algebraic multivariate linear model if p X
jjyi Axi jj2Gy D min jjvecY .Ip ˝ A/vecXjj2GvecY D min xi
i D1
(14.5)
establishing a Gy or GvecY -weighted least squares solution of type multivariate LESS. It is a standard solution of type multivariate LESS if 2
OfXg D m p X D .A0 Gy A/ A0 Gy Y 4 OfAg D n m OfYg D n p ;
(14.6)
which nicely demonstrates that the multivariate LESS solution is built on a series of univariate LESS solutions. If the matrix A is regular in the sense of rk.A0 Gy A/ D rkA D m, our multivariate solution reads X D .A0 Gy A/1 A0 Gy Y
(14.7)
excluding any rank deficiency caused by a datum problem. Such a result may be initiated by fixing a datum parameter of type translation (3 parameters at any epoch), rotation (3 parameters at any epoch) and scale (1 parameter at any epoch). These parameters make up the seven parameter conformal group C7 .3/ at any epoch in a three-dimensional Euclidian space (pseudo-Euclidian space). Lemma 14.3. (general multivariate linear model): A general multivariate linear model is multivariate LESS if p; p X
.yi Axi /Gij .yj Axj / D min x
i; j D1
or
(14.8)
496
14 Special Problems of Algebraic Regression and Stochastic Estimation
p; p mX n m; X X
.y˛i a˛ˇ xˇi /Gij .y˛j a˛ xj / D min x
˛D1 ˇ; i; j
(14.9)
or .vecY .Ip ˝ A/vecX/0 .In ˝ Gy .vecY .Ip ˝ A/vecX/ D min : x
(14.10)
An array X; dim X D m p is multivariate LESS, if vecX D Œ.Ip ˝ A/0 .In ˝ GY / .Ip ˝ A/1 .Ip ˝ A/0 .In ˝ GY /vecY
(14.11)
and rk.In ˝ GY / D np:
(14.12)
End of Lemma 14.3 Thanks to the weight matrix Gij the multivariate least squares solution (14.3) differs from the special univariate model (14.7). The analogue to the general LESS model (14.8)–(14.10) of type multivariate BLUUE is given next. Theorem 14.4. (multivariate Gauss–Markov model of type i , in particular .†; In /-BLUUE): A multivariate Gauss–Markov model is .†; In /-BLUUE if the vector vec„ of an array „, dim „ D n p, dim.vec„/ D np 1 of unknowns is estimated by the matrix b i , namely O D Œ.Ip ˝ A/0 .† ˝ In /1 .Ip ˝ A/1 .Ip ˝ A/0 .† ˝ In /1 vecY (14.13) vec„
subject to rk.† ˝ In /1 D np:
(14.14)
† ij denotes the variance-covariance matrix of multivariate effects y˛i for all ˛ D 1; : : : ; n and i D 1; : : : ; p. An unbiased estimator of the variance-covariance matrix of multivariate effects is 2
1 2 O 0 O 6 i D j W O i D n q .yi Ai / .yi Ai / 6 4 1 .yi AOi /0 .yj AOi / i ¤ j W O ij D nq because of
(14.15)
14-1 The Multivariate Gauss–Markov Model: A Special Problem
Ef.yi AOi /0 .yj AOj /g D Efy0 i .I A.A0 A/1 A0 /yj g D ij .n q/:
497
(14.16)
A nice example is given in K.R. Koch (1988 pp. 281–286). For practical applications we need the incomplete multivariate models which do not allow a full rank matrix ij . For instance, in the standard multivariate model, it is assumed that the matrix A of coefficients has to be identical for p vectors yi and the vectors yi have to be completely given. If due to a change in the observational program in the case of repeated measurements or due to a loss of measurements, these assumptions are not fulfilled, an incomplete multivariate model results. If all the matrices of coefficients are different, but if p vectors yi of observations agree with their dimension, the variance-covariance matrix † and the vectors i of first order parameters can be iteratively estimated. For example, if the parameters of first order, namely i , and the parameters of second order, namely ij , the elements of the variancecovariance matrix, are unknown, we may use the hybrid estimation of first and second order parameters of type fi ; ij g as outlined in Chap. 3, namely Helmert type simultaneous estimation of fi ; ij g (B. Schaffrin 1983, p. 101). An important generalization of the standard multivariate Gauss–Markov model taking into account constraints, for instance caused by rank definitions, e.g. the datum problem at r epochs, is the multivariate Gauss–Markov model with constraints which we will treat at the end. Definition 14.5. (multivariate Gauss–Markov model with constraints): If in a multivariate model (14.1) and (14.2) the vectors i of parameters of first order are subject to constraints H i D wi ; (14.17) where H denotes the r m matrix of known coefficients with the restriction H.A0 A/ A0 A D H; rkH D r m
(14.18)
and wi known r 1 vectors, then Efyi g D Ai ;
(14.19)
Dfyi ; yj g D In ij
(14.20)
subject to H i D wi
(14.21)
498
14 Special Problems of Algebraic Regression and Stochastic Estimation
is called “the multivariate Gauss–Markov model with linear homogeneous constraints”. If the p vectors wi are collected in the r p matrix W, dim W D r p; the corresponding matrix model reads EfYg D A„; DfvecYg D † ˝ In ; H„ D W
(14.22)
subject to Of†g D p p; Of† ˝ In g D np np; OfvecYg D np 1; OfYg D n p OfDfvecYgg D np np; OfHg D r m; Of„g D m p; OfWg D r p:
(14.23)
The vector forms EfvecYg D .Ip ˝ A/vec„; DfvecYg D † ˝ In ; vecW D .Ip ˝ H/vec„ are equivalent to the matrix forms. A key result is Lemma 15.6 in which we solve for a given multivariate weight matrix Gij – being equivalent to .† ˝ In /1 – a multivariate LESS problem. Theorem 14.6. (multivariate Gauss–Markov model with constraints): A multivariate Gauss–Markov model with linear homogeneous constraints is .†; In /-BLUUE if O D .Ip ˝ .A0 A/ A0 /vecY C Y.Ip ˝ .A0 A/H0 .H.A0 A/ H0 /1 /vecY vec„ .Ip ˝ .A0 A/ H0 /1 H.A0 A/ A0 /vecY
(14.24)
or O D .A0 A/ .A0 Y C H0 .H.A0 A/ H0 /1 .W H.A0 A/ A0 Y// „
(14.25)
An unbiased estimation of the variance-covariance matrix † is 1 O Y/0 .A„ O Y/ f.A„ nmCr O W/g: O W/0 .H.A0 A/ A0 /1 .H„ C.H„
Q D †
End of Theorem 14.6
14-2 n-Way Classification Models Another special model is called n-way classification model.
(14.26)
14-2 n-Way Classification Models
499
We will define it and show how to solve its basic equations. Namely, we begin with the 1-way classification and to continue with the 2- and 3-way classification models. A specific feature of any classification model is the nature of the specific unknown vector which is either zero or one. The methods to solve the normal equation vary: In one approach, one assumption is that the unknown vector of zeros or ones is a fixed effect. The corresponding normal equations are solved by standard MINOLESS, weighted or not. Alternatively, one assumes that the parameter vector consists of random effects. Methods of variance-covariance component estimation are applied. Here we only follow a MINOLESS approach, weighted or not. The interested reader of the alternative technique of variance-covariance component estimation is referred to our Chap. 3 or to the literature, for instance H. Ahrens and J. Laeuter (1974) or S.R. Searle (1971), my favorite.
14-21 A First Example: 1-Way Classification A one-way classification model is defined by yij D Efyij g C eij D C ˛i C eij 0
0
0
0
0
(14.27)
0
y W D Œy 1 y 2 : : : y p1 y p ; x WD Œ ˛1 ˛2 : : : ˛p1 ˛p
(14.28)
where the parameters and ˛i are unknown. It is characteristic for the model that the coefficients of the unknowns are either one or zero. A MINOLESS (Minimum Norms LEast Squares Solution) for the unknown parameters , ˛i is based on jjy Axjj2I D min and jjxjj2I D min x
x
we built around a numerical example.
Numerical example: 1-way classification Here we will investigate data concerning the investment on consumer durables of people with different levels of education. Assuming that investment is measured by an index number, namely supposing that available data consist of values of this index for seven people: Table 14.1 illustrates a very small example, but adequate for our purposes. A suitable model for these data is yij D C ˛i C eij ;
(14.29)
where yij is investment index of the j th person in the i th education level, is a general mean, ˛i is the effect on investment of the i th level of education and eij
500
14 Special Problems of Algebraic Regression and Stochastic Estimation
Table 14.1 (investment indices of seven people): Level of education 1 (High School incomplete) 2 (High School graduate) 3 (College graduate) Total
Number of people 3 2 2 7
Indices 74, 68, 77 76, 80 85,93
Total 219 156 178 553
is the random error term peculiar to yij . For the data of Table 14.1 there are three educational levels and i takes the values j D 1; 2; : : : ; ni 1; ni where ni is the number of observations in the i th educational level, in our case n1 D 3; n2 D 2 and n3 D 2 in Table 14.1. Our model is the model for the 1-way classification. In general, the groupings such as educational levels are called classes and in our model yij as the response and levels of education as the classes, this is a model we can apply to many situations. The normal equations arise from writing the data of Table 14.1 in terms of our model equation. 2 2 3 3 3 C ˛1 C e11 y11 74 6 C ˛1 C e12 7 6 y12 7 6 68 7 6 6 6 7 7 7 6 C ˛ C e 7 6y 7 6 77 7 1 13 7 6 6 13 7 6 7 6 6 6 7 7 7 6 76 7 D 6 y21 7 D 6 C ˛2 C e21 7 ; O.y/ D 7 1 6 6 6 7 7 7 6 C ˛2 C e22 7 6 y22 7 6 80 7 6 6 6 7 7 7 4 C ˛3 C e31 5 4 y31 5 4 85 5 93 y32 C ˛3 C e32 2
or 2
3 74 6 68 7 6 7 6 77 7 6 7 6 7 6 76 7 D y D 6 7 6 80 7 6 7 4 85 5 93
2
1 61 6 61 6 6 61 6 61 6 41 1
3 100 1 0 07 7 1 0 07 7 7 0 1 07 7 0 1 07 7 0 0 15 001
3 e11 6e 7 6 12 7 6e 7 6 13 7 7 6 C 6 e21 7 D Ax C ey 7 6 6 e22 7 7 6 4 e31 5 e32 2
2
3 6 ˛1 7 6 7 4 ˛2 5 ˛3
and 2
110 61 1 0 6 61 1 0 6 6 A D 61 0 1 6 61 0 1 6 41 0 0 100
3 0 2 3 07 7 07 7 6 ˛1 7 7 7 07; x D 6 4 ˛2 5 ; O .A/ D 7 4; O.x/ D 4 1 7 7 0 7 ˛3 15 1
14-2 n-Way Classification Models
501
with y being the vector of observations and ey the vector of corresponding error terms. As an inconsistent linear equation y ey D Ax; Ofyg D 7 1; OfAg D 7 4; Ofxg D 4 1 we pose the key question: ?What is the rank of the design matrix A? Most notable, the first column is 1n and the sum of the other three columns is also one, namely c2 C c3 C c4 D 1n ! Indeed, we have a proof for a linear dependence: c1 D c2 C c3 C c4 . The rank rkA D 3 is only three which differs from OfAg D 7 4: We have to build in this rank deficiency. For example, we could postulate the condition x4 D ˛3 D 0 eliminating one component of the unknown vector. A more reasonable approach would be based on the computation of the symmetric reflexive generalized inverse such that 0 xlm D .A0 A/ rs A y;
(14.30)
which would guarantee a least squares minimum norm solution or a V; S-BLUMBE solution (Best Linear V-Norm Uniformly Minimum Bias S-Norm Estimation) for V D I; S D I and C (14.31) rkA D rkA0 A D rk.A0 A/ rs D rkA A0 A is a symmetric matrix ) .A0 A/ rs is a symmetric matrix or called :rank preserving identity: !symmetry preserving identity! We intend to compute xlm for our example (Table 14.2). Summary The general formulation of our 1-way classification problem is generated by identifying the vector of responses as well as the vector of parameters (Table 14.3): n D n1 C n2 C : : : C np D
p X
ni
(14.32)
i D1
experimental design W number of observations n D n1 C n2 C : : : C np
number of parameters W 1Cp
rank of the design matrix W 1 C .p 1/ D p (14.33)
502
14 Special Problems of Algebraic Regression and Stochastic Estimation
Table 14.2 (1-way classification, example: normal equation): 3 2 1100 2 3 61 1 0 07 7 27 3 2 23 1111111 6 61 1 0 07 7 6 61 1 1 0 0 0 07 6 7 63 3 0 07 7 7 6 A0 A D 6 1 0 1 0 7D4 6 40 0 0 1 1 0 05 6 7 2 0 2 05 61 0 1 07 7 2002 0000011 6 41 0 0 15 1001 A0 A D DE; OfDg D 4 3; OfEg D 3 4 2 3 732 63 3 07 7 D0 D D 6 4 2 0 2 5 ; E to be determined W 200 0
0
D A A D D0 DE ) .D0 D/1 D0 A0 A D E 2 3 2 3 732 2 3 7322 6 66 30 18 7 3 3 07 4 5 D0 D D 4 3 3 0 0 5 6 4 2 0 2 5 D 30 18 6 2020 18 6 8 200 :compute .D0 D/1 and .D0 D/1 D0 W 2 3 100 1 ) E D .D0 D/1 D0 A0 A D 4 0 1 0 1 5 001 1 2
0 0 1 0 1 0 .A0 A/ rs D E .EE / .D D/ D D
3 0:0833 0 0:0417 0:0417 6 0 0:2500 0:1250 0:1250 7 7 D6 4 0:0417 0:1250 0:3333 0:1667 5 0:0417 0:1250 0:1667 0:3333 2
0 .A0 A/ rs A D
3 0:0833 0:0833 0:0833 0:1250 0:1250 0:1250 0:1250 6 0:2500 0:2500 0:2500 0:1250 0:1250 0:1250 0:1250 7 6 7 4 0:0833 0:0833 0:0833 0:3750 0:3750 0:1250 0:1250 5 0:0833 0:0833 0:0833 0:1250 0:1250 0:3750 0:3750
14-2 n-Way Classification Models
503
2
xlm
3 60:0 6 13:0 7 0 6 7: D .A0 A/ rs A y D 4 18:0 5 29:0
Table 14.3 (1-way classification): y0 WD Œy11 ; y12 : : : y1.n1 1/ y1n1 jy21 y22 : : : y2.n21/ y2n2 j : : : jyp1 yp2 : : : yp.np 1/ y.pnp / x0 WD Œ ˛1 ˛2 : : : ˛p1 ˛p 2 3 1 1 0 0 6 7 6 7 6 1 1 0 0 7 6 7 6 1 0 1 0 7 6 7 6 7 6 7 A WD 6 7 ; O.A/ D n .p C 1/ 6 1 0 1 0 7 6 7 6 7 6 7 6 1 0 0 1 7 6 7 4 5 1 0 0 1 :MINOLESS: jjy Axjj2 D min and jjxjj2 D min x
0 xlm D .A0 A/ rs A y:
(14.34) (14.35)
14-22 A Second Example: 2-Way Classification Without Interaction A two-way classification model without interaction is defined by “MINOLESS” yij k D Efyij k g C eij k D C ˛i C ˇj C eij k
(14.36)
y0 WD Œy0 11 ; y0 21 ; : : : ; y0 p1 1 ; y0 p1 ; y0 12 ; y0 22 ; : : : ; y0 p q1 ; y0 pq x0 D Œ; ˛1 ; : : : ; ˛p ; ˇ1 ; : : : ; ˇq jjy Axjj2I D min x
and jjxjj2 D min : x
(14.37) (14.38)
504
14 Special Problems of Algebraic Regression and Stochastic Estimation
Table 14.4 Level of factors Level of the factor B Level of factor A 2 ::: p
1 n11 n21 ::: np1
2 n12 n22 ::: np2
::: ::: ::: ::: :::
q n1q n2q ::: npq
Table 14.5 Number of seconds beyond 3 minutes, taken to boil 2 quarts of water
X Brand of Y Stove Z W Total no. of. obs mean
A 18 3 6 27 3 9
Make of Pan B C 12 24 9 15 3 18 15 66 2 4 16 21 7 21
Number of total observations mean 54 3 18 9 3 18 18 3 18 27 3 18 108
The factor A appears in p levels and the factor B in g levels. If nij denotes the number of observations under the influence of the i th level of the factor A and the j th level of the factor B, then the results of the experiment can be condensed in Table 14.4. If ˛i and ˇy denote the effects of the factors A and B, the mean of all observations, we receive C˛i Cˇj D Efyij k g for all i 2 f1; : : : ; pg; j 2 f1; : : : ; qg; k 2 f1; : : : ; nij g (14.39) as our model equation. If nij D 0 for at least one pairfi; j g, then our experimental design is called incomplete. An experimental design for which nij is equal of all pairs fij g, is said to be balanced. The data of Table 14.5 describe such a general model of yij k observations in the i th row (brand of stove) and j th column (make of the pan), is the mean, ˛i is the effect of the i th row, ˇj is the effect of the j th column, and eij k is the error term. Outside the context of rows and columns ˛i is equivalently the effect due to the i th level of the ˛ factor and ˇj is the effect due to the j th level of the ˇ factor. In general, we have p levels of the ˛ factor with i D 1; : : : ; p and q levels of the ˇ factor with j D 1; : : : ; q: in our example p D 4 and q D 3. With balanced data every one of the pq cells in Table 14.5 would have one (or n) observations and n 1 would be the only symbol needed to describe the number of observations in each cell. In our Table 14.5 some cells have zero observations and some have one. We therefore need nij as the number of observations in the i th row and j th column. Then all nij D 0 or 1, and the number of observations are the
14-2 n-Way Classification Models
values of ni D
q X j D1
nij ; nj D
505
p X
nij ; n D
i D1
p q X X
nij :
(14.40)
i D1 j D1
Corresponding totals and means of the observations are shown, too. For the observations in Table 14.5 the linear equations of the model are given as follows, 2 3 3 2 3 3 2 2 18 11 1 2 3 y11 e11 6 7 7 7 6 7 6 6 6 12 7 6 y12 7 6 11 1 7 6 7 6 e12 7 6 7 7 6 7 6 7 6 7 6 6 7 7 6 7 ˛1 7 6 7 6 6 24 7 6 y 7 6 11 1 7 6 6 7 6 e13 7 6 7 7 6 7 6 13 7 6 7 6 6 7 7 6 7 ˛2 7 6 7 6 6 9 7 6 y 7 6 1 1 1 7 6 6e 7 7 6 6 7 7 6 7 6 23 7 6 23 7 6 6 7 7 6 7 6 ˛3 7 6 7 6 6 7 7 7 6 7 6 7C6 6 3 7 D 6 y31 7 D 6 1 1 1 7 6 6 e31 7 ; 7 6 6 7 7 6 ˛4 7 6 7 6 7 6 6 7 7 6 7 7 6 7 6 6 15 7 6 y33 7 6 1 1 1 7 6 6 e 7 6 33 7 6 7 7 6 7 6 7 6 6 7 7 6 7 6 ˇ1 7 7 6 7 6 6 6 7 6 y41 7 6 1 11 7 6 6 7 6 e41 7 6 7 7 6 7 6 7 6 6 7 7 6 ˇ2 7 6 7 6 7 6 6 3 7 6 y42 7 6 1 1 1 7 4 5 6 e42 7 4 5 5 4 5 5 4 4 ˇ3 1 1 1 18 y43 e43 where dots represent zeros. In summary, 2
3 2 3 18 11000100 6 12 7 6 11000010 7 6 7 6 7 6 24 7 6 11000001 7 6 7 6 7 6 9 7 6 10100001 7 6 7 6 7 6 7 6 7 6 3 7 D y D 6 10010100 7 6 7 6 7 6 15 7 6 10010001 7 6 7 6 7 6 6 7 6 10001100 7 6 7 6 7 4 3 5 4 10001010 5 18 10001001 2
11 61 1 6 61 1 6 61 0 6 6 A D 61 0 6 61 0 6 61 0 6 41 0 10
00 00 00 10 01 01 00 00 00
01 00 00 00 01 00 11 10 10
2
3 e11 6 7 6 ˛1 7 6 e12 7 6 7 6e 7 6 ˛ 7 6 13 7 6 2 7 6e 7 6 7 6 23 7 6 ˛3 7 6 7 6 7 C 6 e31 7 6 ˛4 7 6 7 6 7 6 e33 7 6 ˇ1 7 6 7 6 7 6 e41 7 4 ˇ2 5 6 7 4 e42 5 ˇ3 e43 2
3
3 2 3 00 7 1 07 6 ˛1 7 6 7 1 17 7 6˛ 7 7 6 27 0 17 6 7 7 6 ˛3 7 0 0 7 ; x D 6 7 ; O.A/ D 9 8; O.x/ D 8 1 7 6 ˛4 7 6 7 0 17 7 6 ˇ1 7 6 7 0 07 7 4 ˇ2 5 1 05 ˇ3 01
506
14 Special Problems of Algebraic Regression and Stochastic Estimation
with y being the vector of observations and ey the vector of corresponding error terms. As an inconsistent linear equation y ey D Ax; Ofyg D 9 1; OfAg D 9 8; Ofxg D 8 1 we pose the key question: ? What is the rank of the design matrix A ?
Most notable, the first column is 1n and the sum of the next 4 columns is also 1n as well as the sum of the remaining 3 columns is 1n , too, namely c2 Cc3 Cc4 Cc5 D 1n and c6 C c7 C c8 D 1n . The rank rkA D 1 C .p 1/ C .q 1/ D 1 C 3 C 2 D 6 is only six which differs from OfAg D 9 8: We have to take advantage of this rank deficiency. For example, we could postulate the condition x5 D 0 and x8 D 0 eliminating two components of the unknown vector. A more reasonable approach would be based on the computation of the symmetric reflexive generalized inverse such that 0 xlm D .A0 A/ rs A y;
(14.41)
which would guarantee a least square minimum norm solution or a I; I-BLUMBE solution (Best Linear I-Norm Uniformly Minimum Bias I-Norm Estimation) and 0 rkA D rkA0 A D rk.A0 A/ rs D rkA
(14.42)
A0 A is a symmetric matrix ) .A0 A/ rs is a symmetric matrix or called :rank preserving identity: :symmetry preserving identity (Table 14.6): Summary The general formulation of our 2-way classification problem without interaction is generated by identifying the vector of responses as well as the vector of parameters (Table 14.7). subject to c2 C : : : C cp D 1; cpC1 C : : : C cq D 1 ni D
q X j D1
nij ; nj D
p X i D1
nij ; n D
p; q X i D1; j D1
nij
(14.43)
14-2 n-Way Classification Models
Table 14.6 equation
507
2-way classification without interaction, example: normal 2 2
11 61 1 6 60 0 6 6 60 0 0 AAD6 60 0 6 61 0 6 40 1 00
11 10 01 00 00 00 00 11
3 111 0 0 07 7 0 0 07 7 7 0 0 07 7 1 1 17 7 1 0 07 7 0 1 05
11 00 00 11 00 10 00 01
001 2
93 63 3 6 61 0 6 6 62 0 D6 63 0 6 63 1 6 42 1 41 A0 A D DE; 2 931 63 3 0 6 61 0 1 6 6 D D 62 0 0 6 63 0 0 6 43 1 0 210
˛1 ˛2 ˛3 ˛4 ˇ1 ˇ2 ˇ3
6 61 6 61 6 61 6 61 6 6 61 6 61 6 61 6 61 4 1
123 000 100 020 003 011 001 111
1 1 1 0 0 0 0 0 0
32 11 00 10 11 30 02 00
0 0 0 1 0 0 0 0 0 3
0 0 0 0 1 1 0 0 0
#
0 0 0 0 0 0 1 1 1 "
1 0 0 0 1 0 1 0 0
4 17 7 17 7 7 17 7 17 7 07 7 05 4
OfDg D 8 6; OfEg D 6 8 3 232 0 1 17 7 0 0 07 7 7 2 1 0 7 ; E to be determi ned 7 0 1 17 7 1 3 05 002
D0 A0 A D D0 DE ) .D0 D/1 D0 A0 A D E 2 3 133 45 14 29 44 28 6 45 21 4 8 15 11 7 6 7 6 7 6 14 4 3 3 3 2 7 0 DDD6 7 6 29 8 3 10 11 4 7 6 7 4 44 15 3 11 21 8 5 28 11 2 4 8 10 compute .D0 D/1 and .D0 D/1 D0
0 1 0 0 0 0 0 1 0
#
0 0 1 1 0 1 0 0 1 "
3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5
508
14 Special Problems of Algebraic Regression and Stochastic Estimation
E D .D0 D/1 D0 A0 A 3 1:0000 0:0000 0:0000 0:0000 1:0000 0:0000 0:0000 1:0000 6 0:0000 1:0000 0:0000 0:0000 1:0000 0:0000 0:0000 0:0000 7 7 6 7 6 6 0:0000 0:0000 1:0000 0:0000 10:0000 0:0000 0:0000 0:0000 7 D6 7 6 0:0000 0:0000 0:0000 1:0000 1:0000 0:0000 0:0000 0:0000 7 7 6 4 0:0000 0:0000 0:0000 0:0000 0:0000 1:0000 0:0000 1:0000 5 0:0000 0:0000 0:0000 0:0000 0:0000 0:0000 1:0000 1:0000 2
0 0 0 1 0 .A0 A/ rs D E .EE / .D D/ D
3 0:0665 0:0360 0:1219 0:0166 0:0360 0:0222 0:0748 0:0305 6 0:0360 0:3112 0:2327 0:0923 0:0222 0:0120 0:0822 0:0582 7 7 6 6 0:1219 0:2327 0:8068 0:2195 0:2327 0:1240 0:1371 0:1392 7 7 6 7 6 6 0:0166 0:0923 0:2195 0:4208 0:0923 0:0778 0:1020 0:0076 7 D6 7 6 0:0360 0:0222 0:2327 0:0923 0:3112 0:0120 0:0822 0:0582 7 7 6 6 0:0222 0:0120 0:1240 0:0778 0:0120 0:2574 0:1417 0:0935 7 7 6 4 0:0748 0:0822 0:1371 0:1020 0:0822 0:1417 0:3758 0:1593 5 2
0:0305 0:0582 0:1392 0:0076 0:0582 0:0935 0:1593 0:2223
2
0:0526 6 0:2632 6 6 0:0132 6 6 6 0:1535 D6 6 0:0702 6 6 0:2675 6 4 0:1491 0:0658
.A0 A/ rs A 0:1053 0:1930 0:0263 0:0263 0:1404 0:1316 0:3684 0:1316
0:0000 0:3333 0:2500 0:0833 0:0000 0:0833 0:1667 0:2500
0:1579 0:2105 0:7895 0:2105 0:2105 0:0526 0:0526 0:0526
0:1053 0:1404 0:0263 0:3596 0:1404 0:2018 0:0351 0:1316
0:0526 0:0702 0:2368 0:4298 0:0702 0:1491 0:0175 0:1842
2
xlm
0:0526 0:0702 0:0132 0:1535 0:2632 0:2675 0:1491 0:0658
3 5:3684 6 10:8421 7 6 7 6 6:1579 7 6 7 6 7 1:1579 6 7 0 D .A0 A/ A y D 6 7: rs 6 1:8421 7 6 7 6 0:2105 7 6 7 4 4:2105 5 9:7895
0:1053 0:1404 0:0263 0:0263 0:1930 0:1316 0:3684 0:1316
3 0:0000 0:0000 7 7 0:2500 7 7 7 0:0833 7 7 0:3333 7 7 0:0833 7 7 0:1667 5 0:2500
14-2 n-Way Classification Models
509
Table 14.7 (2-way classification without interaction): y0 WD Œy0 11 ; : : : ; y0 p1 ; y0 12 ; : : : ; y0 pq1 y0 pq x0 WD Œ; ˛1 ; : : : ; ˛p ; ˇ1 ; : : : ; ˇq A WD Œ1n ; c2 ; : : : ; cp ; cpC1 ; : : : ; cq experimental design W number of rank of the number of observations parameters W design matrix W p;q P nij 1CpCq 1 C .p 1/ C .q 1/ D p C q 1 nD i;j D1
(14.44) :MINOLESS: jjy Axjj2 D min x
and
jjxjj2 D min
(14.45)
0 xlm D .A0 A/ rs A y:
14-23 A Third Example: 2-Way Classification with Interaction A two-way classification model with interaction is defined by “MINOLESS” yij k D Efyij k g C eij k D C ˛i C ˇj C .˛ˇ/ij C eij k
(14.46)
subject to i 2 f1; : : : ; pg; j 2 f1; : : : ; qg; k 2 f1; : : : ; nij g 0
y WD Œy0 11 ; : : : ; y0 p1 ; y0 p ; y0 12 y0 22 ; : : : y0 nq1 y0 pq x0 WD Œ; ˛1 ; : : : ; ˛p ; ˇ1 ; : : : ; ˇq ; .˛ˇ/11 ; : : : ; .˛ˇ/pq jjy
Axjj2I
D
min andjjxjj2I x
D min : x
(14.47) (14.48)
It was been in the second example on 2-way classification without interaction that the effects of different levels of the factors A and B were additive. An alternative model is a model in which the additivity does not hold: the observations are not independent of each other. Such a model is called a model with interaction between the factors whose effect .˛ˇ/ij has to be reflected by means of 2
i 2 f1; : : : ;pg C ˛i C ˇj C .˛ˇ/ij D Efyij k g for all 4 j 2 f1; : : : ;qg k 2 f1; : : : ;nij g
(14.49)
510
14 Special Problems of Algebraic Regression and Stochastic Estimation
like our model equation. As an example we consider by means of Table 14.8 a plant breeder carrying out a series of experiments with three fertilizer treatments on each of four varieties of grain. For each treatment-by-variety combination when he or she plants several 40 40 plots. At harvest time she or he finds that many of the plots have been lost due to being wrongly ploughed up and all he or she is left with are the data of Table 14.8. With four of the treatment-in-variety combinations there are no data at all, and with the others there are varying numbers of plots, ranging from 1 to 4 with a total of 18 plots in all. Table 14.8 shows the yield of each plot, the total yields, the number of plots in each total and the corresponding mean, for each treatment-variety combination having data. Totals, numbers of observations and means are also shown for the three treatments, the four varieties and for all 18 plots. The symbols for the entries in the table, are also shown in terms of the model. The equations of a suitable linear model for analyzing data of the nature of Table 14.8 is for yij k as the kth observation in the i th treatment and j th variety. In our top table, is the mean, ˛i is the effect of the i th treatment, ˇj is the effect of the j th variety, .˛ˇ/ij is the interaction effect for the i th treatment and the j th variety and `ij k is the error term. With balanced data every one of pq cells of our table would have n observations. In addition there would be pq levels of the .˛ˇ/ factor, the interaction factor. However, with unbalanced data, when some cells have no observations they are only as many .˛ˇ/ij -levels in the data as there are nonempty cells. Let the number of such cells be s (s = 8 in Table 14.8). Then, if nij Table 14.8 Weight of grain form 40 40 trial plots
Treatment 1
2
3
Total
1 8 13 9 30 y11 .n11 / 6 12 18 -
48
2
Variety 3 12 – 12 y31 .n13 /
4 7 11 – 18 y41 .n14 /
-
12 14 26 9 7
-
-
14 16
– 16 42
– 30 42
10 14 11 13 48 66
Totals
60
94 198
14-2 n-Way Classification Models
511
is the number of observations in the .i; j /th cell of type “treatment i and variety j ”, s the number of cells in which nij ¤ 0, in all other cases nij > 0. For these cells yij D
nij X
yij k ; yNij D yij =nij
(14.50)
kD1
is the total yield in the .i; j /th cell, and yNij is the corresponding mean. Similarly, yD
p X
yi D
i D1
q X
yj D
j D1
p;q X
p;q;nij
X
yij D
i;j D1
yij k
(14.51)
i D1;j D1;kD1
is the total yield for all plots, the number of observations called “plots” therein being nD
p X i D1
ni D
q X
nj D
p;q X
nij :
(14.52)
i;j
j D1
We shall continue with the corresponding normal equations being derived from the observational equations. ‚
.˛ˇ/ …„
ƒ
2 3 2 3 2 ˛1 ˛2 ˛3 ˇ1 ˇ2 ˇ3 ˇ4 11 13 14 21 22 31333 34 3 2 8 e111 11 1 1 y111 3 2 6 13 7 6 y 7 6 1 1 1 1 7 6e 7 6 7 6 112 7 6 6 112 7 7 7 6 6 7 6 7 6 7 7 6 6 9 7 6 y113 7 6 1 1 1 1 7 6 ˛1 7 6 e113 7 7 6 6 7 6 7 6 7 7 6 6 12 7 6 y131 7 6 1 1 1 7 6 ˛2 7 6 e131 7 7 6 6 7 6 7 6 7 7 6 6 7 7 6 y141 7 6 1 1 1 7 6 ˛3 7 6 e141 7 7 6 6 7 6 7 6 7 7 6 6 11 7 6 y142 7 6 1 1 1 7 6 ˇ1 7 6 e142 7 7 6 6 7 6 7 6 7 7 6 6 6 7 6 y211 7 6 1 1 1 1 7 6 ˇ2 7 6 e211 7 7 6 6 7 6 7 6 7 7 6 6 12 7 6 y 7 6 1 1 1 1 7 6 ˇ 7 6 e 7 3 7 6 7 6 212 7 6 6 212 7 7 6 7 6 6 7 6 7 6 7 7 6 6 12 7 6 y221 7 6 1 1 1 1 7 6 ˇ4 7 6 e221 7 7C6 6 7D6 7D6 7 76 6 14 7 6 y222 7 6 1 1 1 1 7 6 .˛ˇ/11 7 6 e222 7 7 6 6 7 6 7 6 7 7 6 6 9 7 6 y321 7 6 1 1 1 1 7 6 .˛ˇ/13 7 6 e321 7 7 6 6 7 6 7 6 7 7 6 6 7 7 6 y322 7 6 1 1 1 1 7 6 .˛ˇ/14 7 6 e322 7 7 6 6 7 6 7 6 7 7 6 6 14 7 6 y331 7 6 1 1 1 1 7 6 .˛ˇ/ 7 6 e331 7 21 7 6 7 6 7 6 7 6 7 6 7 6 6 16 7 6 y 7 6 1 1 1 1 7 6 7 6 7 6 332 7 6 7 6 .˛ˇ/22 7 6 e332 7 7 6 6 7 6 7 6 7 7 6 6 10 7 6 y341 7 6 1 1 1 1 7 6 .˛ˇ/32 7 6 e341 7 7 6 6 7 6 7 6 7 7 6 6 14 7 6 y342 7 6 1 1 1 1 7 4 .˛ˇ/33 5 6 e342 7 6 7 6 7 6 7 6 7 4 11 5 4 y343 5 4 1 1 1 1 5 4 e343 5 .˛ˇ/34 y344 e344 13 1 1 1 1
512
14 Special Problems of Algebraic Regression and Stochastic Estimation
where the dots represent zeros. 2
18 6 6 6 6 4 6 6 6 8 6 6 5 6 6 4 6 6 3 6 6 6 6 6 6 3 6 6 1 6 6 2 6 6 6 2 6 6 2 6 6 2 6 4 2 4
6 6 3 1 2 3 1 2
4 4 2 2 2 2
8 8 2 2 4 2 2 4
5 3 2 5 3 2
4 2 2 4 2 2
3 1 2 3 1 2
6 2 4 6 2 4
3 3 3 3
1 1 1 1
2 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2
32 3 3 2 3 2 4 y 198 6 7 7 6 7 6 7 7 6 ˛1 7 6 y1 7 6 60 7 6 7 6 6 7 7 7 6 ˛2 7 6 y2 7 6 44 7 7 76 7 7 6 7 6 4 7 6 ˛3 7 6 y3 7 6 94 7 76 7 7 6 7 6 6 7 7 6 7 6 7 7 6 ˇ1 7 6 y1 7 6 48 7 6 ˇ2 7 6 y2 7 6 42 7 7 76 7 7 6 7 6 6 7 7 6 7 6 7 7 6 ˇ3 7 6 y3 7 6 42 7 76 7 7 6 7 6 4 7 6 ˇ4 7 6 y4 7 6 66 7 76 7 A0 Ax` D A0 y: 7D6 7D6 6 .˛ˇ/11 7 6 y11 7 6 30 7 7 76 7 7 6 7 6 6 7 7 6 7 6 7 7 6 .˛ˇ/13 7 6 y13 7 6 12 7 6 .˛ˇ/ 7 6 y 7 6 18 7 7 14 7 76 7 6 14 7 6 76 7 7 6 7 6 7 6 .˛ˇ/21 7 6 y21 7 6 18 7 76 7 7 6 7 6 7 6 .˛ˇ/22 7 6 y22 7 6 26 7 76 7 7 6 7 6 6 7 7 6 7 6 7 7 6 .˛ˇ/32 7 6 y32 7 6 16 7 5 4 .˛ˇ/33 5 4 y33 5 4 30 5 4
.˛ˇ/34
y34
48
Now we again pose the key question: ?What is the rank of the design matrix A? The first column is 1n and the sum of other columns is c2 C c3 C c4 D 1n and c5 Cc6 Cc7 Cc8 D 1n . How to handle the remaining sum .˛ˇ/ : : : of our incomplete model? Obviously, we experience rkŒc9 ; : : : ; c16 D 8, namely rkŒ.˛ˇ/ij D 8 for .˛ˇ/ij 2 f11 ; : : : ; 34 g:
(14.53)
As a summary, we have computed rk.A0 A/ D 8, a surprise for our special case. A more reasonable approach would be based on the computation of the symmetric reflexive generalized inverse such that 0 x`m D .A0 A/ rs A y;
(14.54)
which would assure a minimum norm, least squares solution or a I; I-BLUMBE solution (Best Linear I-Norm Uniformly Minimum Bias I-Norm Estimation) and C rkA D rkA0 A D rk.A0 A/ rs D rkA
A0 A is a symmetric matrix ) .A0 A/ rs is a symmetric matrix or called
(14.55)
14-2 n-Way Classification Models
513
Table 14.9 (2-way classification with interaction): y0 WD Œy0 11 ; : : : ; y0 p1 ; y0 12 ; : : : ; y0 pq1 ; y0 pq x0 WD Œ; ˛1 ; : : : ; ˛p ; ˇ1 ; : : : ; ˇq ; .˛ˇ/11 ; : : : ; .˛ˇ/pq A WD Œ1n ; c1 ; : : : ; cp ; cpC1 ; : : : ; cq ; c11 ; : : : ; cpq subject to c2 C : : : C cp D 1; cpC1 C : : : C cq D 1;
p;q X
cij D .p 1/.q 1/
i;j D1
nD
p X
ni D
i D1
q X
nj D
p;q X
j D1
ni;j
i;j
:rank preserving identity: !symmetry preserving identity! Table 14.9 summarizes all the details of 2-way classification with interaction. In general, for complete models our table lists the general number of parameters and the rank of the design matrix which differs from our incomplete design model. experimental design W number of observations nD
p;q P
number of parameters W
rank of the design matrix W
1 C .p 1/ C .q 1/C C.p 1/.q 1/ i;j (14.56) jjy Axjj2 D min and jjxjj2 D min (14.57) ni;j
1 C p C q C pq
x
x
0 x`m D .A0 A/ rs A y:
(14.58)
For our key example we get from the symmetric normal equation A0 Ax` D A0 y the solution 0 x`m D .A0 A/ rs A y given A0 AandA0 y OfA0 Ag D 16 16; Of; : : : ; .˛ˇ/31 g D 16 1; OfA0 yg D 16 1 A0 A D DE; OfDg D 18 12; OfEg D 12 16
514
14 Special Problems of Algebraic Regression and Stochastic Estimation
2
3 63 6 60 6 60 6 6 63 6 60 6 60 6 60 DD6 63 6 60 6 6 60 6 60 6 60 6 60 6 40 0
122 120 002 000 002 000 100 020 000 100 020 002 000 000 000 000
22 00 20 02 00 22 00 00 00 00 00 00 20 02 00 00
3 24 0 07 7 0 07 7 2 47 7 7 0 07 7 0 07 7 2 07 7 0 47 7 0 07 7 0 07 7 7 0 07 7 0 07 7 0 07 7 0 07 7 2 05 04
D0 A0 A D D0 DE ) .D0 D/1 D0 A0 A D E E D .D0 D/1 D0 A0 A D
2
3 1:0000 1:0000 0:0000 0:0000 1:0000 0:0000 0:0000 6 1:0000 1:0000 0:0000 0:0000 0:0000 0:0000 1:0000 7 6 7 6 1:0000 1:0000 0:0000 0:0000 0:0000 0 0:0000 7 6 7 6 7 6 1:0000 0:0000 1:0000 0:0000 1:0000 0:0000 0:0000 7 6 7 6 1:0000 0 1:0000 0:0000 0 1:0000 0:0000 7 6 7 6 1:0000 0 0 1:0000 0 1:0000 0 7 6 7 4 1:0000 0:0000 0:0000 1:0000 0 0:0000 1:0000 5 1:0000
0
0:0000 1:0000
0
0:0000
0
0 x`m D .A0 A/ rs A y D D Œ6:4602; 1:5543; 2:4425; 2:4634; 0:6943; 1:0579; 3:3540; 1:3540; 1:2912; 0:6315; 0:3685; 0:5969; 3:0394; 1:9815; 2:7224; 1:722450 :
14-24 Higher Classifications with Interaction If we generalize 1-way and 2-way classifications with interactions we arrive at a higher classification of type C ˛i C ˇj C k C : : : C
14-2 n-Way Classification Models
515
C .˛ˇ/ij C .˛ /i k C .˛ˇ/j k C : : : C .˛ˇ /ij k C : : : D Efyij k:::` g
(14.59)
for all i 2 f1; : : : ; pg; j 2 f1; : : : ; qg; k 2 f1; : : : ; rg; : : : ; ` 2 f1; : : : ; nij k ; : : :g: An alternative stochastic model assumes a fully occupied variance–covariance matrix of the observations, namely Dfyg EfŒy EfygŒy Efyg0 g †ij . Variance–covariance estimation techniques are to be applied. In addition, a mixed model for the effects are be applied, for instance of type Efyg D A C CEfzg
(14.60)
Dfyg D CDfzgC0 :
(14.61)
Here we conclude with a discussion of what is unbiased estimable: Example (1-way classification): If we depart from the model Efyij g D C ˛i and note rkA D rkA0 A D 1 C .p 1/ for i 2 f1; : : : ; pg; namely rkA D p; we realize that the 1 C p parameters are not unbiased estimable: The first column results from the sum of the other columns. It is obvious the difference ˛i ˛1 is unbiased estimable. This difference produces a column matrix with full rank.
Summary ˛i ˛1 quantities are unbiased estimable
Example (2-way classification with interaction): Our first statement relates to the unbiased estimability of the terms ˛1 ; : : : ; ˛p and ˇ1 ; : : : ; ˇq : obviously, the differences ˛i ˛1 and ˇj ˇ1 for i; j < 1 are unbiased estimable. The first column is the sum of the other terms. For instance, the second column can be eliminated which is equivalent to estimating ˛i ˛1 in order to obtain a design matrix of full column rank. The same effect can be seen with the other effect ˇj for the properly chosen design matrix: ˇj ˇ1 for all j > 1 is unbiased estimable! If we add the pq effect .˛ˇ/ij of interactions, only those interactions increase the rank of the de-sign matrix by one respectively, which refer to the differences ˛i ˛1 and ˇj ˇ1 , altogether .p 1/.q 1/ interactions. To the effect .˛ˇ/ij of the interactions are estimable, pq .p 1/.q 1/ D p C q 1 constants may be added, that is to the interactions .˛ˇ/i1 ; .˛ˇ/i 2 ; : : : ; .˛ˇ/i q with
516
14 Special Problems of Algebraic Regression and Stochastic Estimation
i 2 f1; : : : ; pg the constants .˛ˇ1 /; .˛ˇ2 /; : : : ; .˛ˇq / and to the interactions .˛ˇ/2j ; .˛ˇ/3j ; : : : ; .˛ˇ/pj with j 2 f1; : : : ; qg the constants .˛ˇ2 /; .˛ˇ3 /; : : : ; .˛ˇp /: The constants .˛ˇ1 / need not to be added which can be interpreted by .˛ˇ1 / D .˛ˇ1 /. A numerical example is p D 2; q D 2 x0 D Œ; ˛1 ; ˛2; ˇ1 ; ˇ2 ; .˛ˇ/11 ; .˛ˇ/12 ; .˛ˇ/22 : Summary ˛ D ˛2 ˛1 ; ˇ D ˇ2 ˇ1 for all i 2 f1; 2g; j 2 f1; 2g
(14.62)
as well as .˛ˇ1 /; .˛ˇ2 /; .˛ˇ2 /
(14.63)
are unbiased estimable. At the end we review the number of parameters and the rank of the design matrix for a 3-way classification with interactions according to the following example. 3-way classification with interactions
experimental design W number of observations
nD
p;q;r P i;j;kD1
nij k
number of parameters W 1 C p C q C rC Cpq C pr C qrC Cpqr
rank of the design matrix W 1 C .p 1/ C .q 1/ C .r 1/ C.p 1/.q 1/C C.p 1/.r 1/ C .q 1/.r 1/C .p 1/.q 1/.r 1/ D pqr (14.64)
14-3 Dynamical Systems
517
14-3 Dynamical Systems There are two essential items in the analysis of dynamical systems: First, there exists a “linear or linearized observational equation y.t/ D Cz.t/” connecting a vector of stochastic observations y to a stochastic vector z of so called “state variables”. Second, the other essential is the characteristic differential equation of type “z0 .t/ D F .t; z.t//”, especially linearizied “z0 .t/ D Az.t/ ”, which maps the first derivative of the “state variable” to the “state variable” its off. Both, y.t/andz.t/ are functions of a parameter, called “time t”. The second equation describes the time development of the dynamical system. An alternative formulation of the dynamical system equation is “z.t/ D Az.t 1/ ”. Due to the random nature “of the two functions” y.t/ D Cz.t/ and z0 D Az the complete equations read " Efy.t/g D CEfzg and
Vfey .t1 /; ey .t2 /g D †y .t1 ; t2 / Dfey .t/g D †y .t/; "
Dfez .t/g D †z .t/;
0
Efz .t/g D AEfz.t/g and
Vfez .t1 /; ez .t2 /g D †z .t1 ; t2 /
(14.65)
(14.66)
Here we only introduce “the time invariant system equations” characterized by A.t/ D A: z0 .t/ abbreviates the functional d z.t/=dt: There may be the case that the variance-covariance functions †y .t/ and †z .t/ do not change in time: †y .t/ D †y ; †z .t/ D †z equal a constant. Various models exist for the variance-covariance functions †y .t1 ; t2 / and †z .t1 ; t2 /, e.q. linear functions as in the case of a Gauss process or a Brown process †y .t2 t1 /, †z .t2 t1 /. The analysis of dynamic system theory was initiated by R.E. Kalman (1960) and by R.E. Kalman and R.S. Bucy (1961): “KF” stand for “Kalman filtering”. Example 1 (tracking a satellite orbit): Tracking a satellite’s orbit around the Earth might be based on the unknown state vector z.t/ being a function of the position and the speed of the satellite at time t with respect to a spherical coordinate system with origin at the mass center of the Earth. Position and speed of a satellite can be measured by GPS, for instance. If distances and accompanying angles are measured, they establish the observation y.t/. The principles of space-time geometry, namely mapping y.t/ into z.t/, would be incorporated in the matrix C while ey .t/ would reflect the measurement errors at the time instant t. The matrix A reflects the situation how position and speed change in time according the physical lows governing orbiting bodies, while ez would allow for deviation from the lows owing to factors as none uniformity of the Earth gravity field. Example 2 (statistical quality control): Here the observation vector y.t/ is a simple approximately normal transformation of the number of derivatives observed in a sample obtained at time t, while y1 .t/
518
14 Special Problems of Algebraic Regression and Stochastic Estimation
and y2 .t/ represent respectively the refractive index of the process and the drift of the index. We have the observation equation and the system equations y.t/ D z1 .t/ C ey .t1 / and
z1 .t/ D z2 .t/ C ez1 z2 .t/ D z2 .t 1/ C ez2 :
In vector notation, this system of equation becomes z.t/ D Az.t 1/ C ez namely " z.t/ D
z1 .t/ z2 .t/
"
# ; ez D
11
#"
01
e z1
#
e z2
" ; AD
01
#
01
do not change in time. If we examine y.t/ y.t 1/ for this model, we observe that under the assumption of constant variance, namely ey .t/ D ey and ez .t/ D ez , the autocorrelation structure of the difference is identical to that of an ARIMA (0,1,1) process. Although such a correspondence is sometimes easily discernible, we should in general not consider the two approaches to be equivalent. A stochastic process is called an ARMA process of the order .p; q/ if z.t/ D a1 z.t 1/ C a2 z.t 2/ C : : : C ap z.t p/ D D b0 u.t/ C b1 u.t 1/ C b2 u.t 2/ C : : : C bq u.t q/ forallt 2 f1; : : : ; T g
(14.67)
also called a mixed autoregressive/moving – average process. In practice, most time series are non-stationary. In order to fit a stationary model, it is necessary to remove non-stationary sources of variation. If the observed time series is non-stationary in the mean, then we can use the difference of the series. Differencing is widely used in all scientific disciplines. If z.t/; t 2 f1; : : : ; T g, is replaced by r d z.t/, then we have a model capable of describing certain types of non-stationary signals. Such a model is called an “integrated model” because the stationary model that fits to the difference data has to the summed or “integrated” to provide a model for the original non-stationary data. Writing W .t/ D r d z.t/ D .1 B/d z.t/
(14.68)
for all t 2 f1; : : : ; T g the general autoregressive integrated moving-average (ARIMA) process is of the form ARIMA
14-3 Dynamical Systems
519
W .t/ D ˛1 W .t 1/ C : : : C ˛p W .t p/ C b0 u.t/ C : : : C bq u.t q/ (14.69) or ˆ.B/W.t/ D .B/
(14.70)
d
ˆ.B/.1 B/ z.t/ D .B/u.t/:
(14.71)
Thus we have an ARMA .p; q/ model for W .t/, t 2 f1; : : : ; T g, while the model for W .t/ describing the d th differences for z.t/ is said to be an ARIMA process of order .p; d; q/. For our case, ARIMA (0,1,1) means a specific process p D 1; r D 1, q D 1. The model for z.t/; t 2 f1; : : : ; T g, is clearly non-stationary, as the AR operators ˆ.B/.1 q/d has d roots on the unit circle since patting B D 1 makes the AR operator equal to zero. In practice, first differencing is often found to the adequate to make a series stationary, and accordingly the value of d is often taken to be one. Note the random part could be considered as an ARIMA (0,1,0) process. It is a special problem of time-series analysis that the error variances are generally not known a priori. This can be dealt with by guessing, and then updating them in an appropriate way, or, alternatively, by estimating then forming a set of data over a suitable fit period. In the state space modeling, the prime objective is to predict the signal in the presence of noise. In other words, we want to estimate the m1 state vector Efz.t/g which cannot usually be observed directly. The Kalman filter provides a general method for doing this. It consists of a set of equations that allow us to update the estimate of Efz.t/g when a new observation becomes available. We will outline this updating procedure with two stages, called • The prediction stage • The updating stage. Suppose we have observed a univariate time series up to time .t 1/, and that Efz.t 1/g is “the best estimator” Efz.t 1/g based on information up to this time. For instance, “best” is defined as an PLUUP estimator. Note that z.t/, z.t 1/ etc is a random variable. Further suppose that we have evaluated the m m variancecovariance matrix of Efz.t 1/g which we denote by P ft 1g. The first stage called the prediction stage is concerned with forecasting Efz.t/g from data up to the time .t 1/, and we denote the resulting estimator in a obvious notation by Efz.t/jz.t 1/g. Considering the state equations where Dfez .t 1/g is still unknown at time .t 1/ the obvious estimator for Efz.t/g is given by
2
b
2
Efz.t/jOz.t 1/g D G.t/Efz.t 1/g
(14.72)
and the variance-covariance matrix Vftjt 1g D G.t/Vft 1gG 0 C W ftg
(14.73)
520
14 Special Problems of Algebraic Regression and Stochastic Estimation
called prediction equations. The last equations follows from the standard results on variance-covariance matrices for random vector variables. When the new observation at time t, namely when y.t/ has been observed the estimator for Efz.t/g can be modified to take account of the extra information. At time .t 1/, the best forecast of y.t/ is given by h0 Efz.t/jOz.t 1/g so that the prediction error is given by eOy .t/ D y.t/ h0 .t/Efz.t/jz.t 1/g:
(14.74)
This quantity can be used to update the estimate of Efz.t/g and its variancecovariance matrix. EfOz.t/g D Efz.t/jOz.t 1/g C K.t/Oey
(14.75)
Vftg D Vftjt 1g K.t/h0 .t/Vftjt 1g 0
0
Vftg D Vft; t 1gh .t/=Œh .t/Vftjt 1gh C
(14.76) n2 :
(14.77)
Vftg is called the gain matrix. In the univariate case, K.t/ is just a vector of size .m 1/. The previous equations constitute the second updating stage of the Kalman filter, thus they are called the updating equations. A major practical advantage of the Kalman filter is that the calculations are re-cursive so that, although the current estimates are based on the whole past history of measurements, there is no need for an ever expanding memory. Rather the near estimate of the signal is based solely on the previous estimate and the latest observations. A second advantage of the Kalman filter is that it converges fairly quickly when there is a constant underlying model, but can also follow the movement of a system where the underlying model is evolving through time. For special cases, there exist much simpler equations. An example is to consider the random walk plus noise model where the state vector z.t/ consist of one state variable, the current level .t/. It can be shown that the Kalman filter for this model in the steady state case for t ! 1 reduces the simple recurrence relation .t/ O D .t O 1/ C ˛Oe.t/; where the smoothing constant ˛ is a complicated function of the signal-to-noise ratio w2 =n2 . Our equation is simple exponential smoothing. When w2 tends to zero, .t/ is a constant and we find that ˛ ! 0 would intuitively be expected, while as w2 =n2 becomes large, then ˛ approaches unity. For a multivariate time series approach we may start from the vector-valued equation of type Efy.t/g D CEfzg; (14.78) where C is a known nonsingular m m matrix. By LESS we are able to predict
b
1
Efz.t/g D C1 E fy.t/g:
(14.79)
14-3 Dynamical Systems
521
Once a model has been put into the state-space form, the Kalman filter can be used to provide estimates of the signal, and they in turn lead to algorithms for various other calculations, such as making prediction and handling missing values. For instance, forecasts may be obtained from the state-space model using the latest estimates of the state vector. Given data to time N, the best estimate of the state vector is written Efz.N /g and the h-step-ahead forecast is given by
1
1
3
Efy.N /g D h0 .N C h/Efz.N C h/g D h.N C h/G.N C h/GfN C h 1g : : : G.N C 1/Efz.N /g
1
(14.80)
where we assume h .N C h/ and future values of G.t/ are known. Of course, if G.t/ is a constant, say G, then we get
2
1
Efy.N jh/g D h0 .N C h/G h Efz.N /g:
(14.81)
If future values of h.t/orG.t/ are not known, then they must themselves be forecasted or otherwise guessed. Up to this day a lot of research has be been done on nonlinear models in prediction theory relating to state-vectors and observational equations. There are excellent reviews, for instance by P. H. Frances (1988), C. W. J. Granger and P. Newbold (1986), A. C. Harvey (1993), M. B. Priestley (1981, 1988) and H. Tong (1990). C. W. Granger and T. Ter¨asvirta (1993) is a more advanced text. In terms of the dynamical system theory we regularly meet the problem that the observational equation is not of full column rank. A state variable leads to a relation between the system input-output solution, especially a statement on how a system is developing in time. Very often it is reasonable to switch from a state variable, in one reference system to another one with special properties. Let T this time be a similarity transformation, namely described by a non-singular matrix of type z WD Tz z D T1 z
(14.82)
d z D T1 ATz C T1 Bu.t/; z0 D T1 z0 dt y.t/ D CTz .t/ C Du.t/:
(14.83) (14.84)
The key question is now whether to the characteristic state equation there belongs a transformation matrix such that for a specific matrix A and B there exists an integer number r, 0 r < n, of the form "
A D
A11 A12 0 A22
"
#
; OfA g D
r r
r .n r/
.n 1/ r .n r/ .n r/
#
522
14 Special Problems of Algebraic Regression and Stochastic Estimation
"
B D
B1
#
"
; OfB g D
0
qr q .n r/
# :
In this case the state equation separates in two distinct parts. d z .t/ D A11 z1 C A12 z2 .t/ C B1 h.t/; z1 .0/ D z10 dt 1
(14.85)
d z .t/ D A22 z2 .t/; z2 .0/ D z20 : (14.86) dt 2 The last nr elements of z cannot be influenced in its time development. Influence is restricted to the initial conditions and to the eigen dynamics of the partial system 2 (characterized by the matrix A22 ). The state of the whole system cannot be influenced completely by the artificially given point of the state space. Accordingly, the state differential equation in terms of the matrix pair .A; B/ is not steerable. Example 3 (steerable state): If we apply the dynamic matrix A and the introductory matrix of a state model of type 2 3 4 2 6 3 37 7; B D 1 ; AD6 4 1 55 0:5 3 3 we are led to the alternative matrices after using the similarity transformation "
A D
" # 1 ; B D : 0 2 0
1
1
#
If the initial state is located along the z1 -axis, for instance z20 D 0, then the state vector remains all times along this axis. It is only possible to move this axis along a straight line “up and down”. In case that there exists no similarity transformation we call the state matrices .A; B/ steerable. Steerability of a state differential equation may be tested by Lemma 14.7. (Steerability): The pair .A; B/ is steerable if and only if rkŒBAB : : : An1 B D rkF.A; B/ D n:
(14.87)
F.A; B/ is called matrix of steerability. If its rank r < n, then there exists a transformation T such that A D T1 AT and B D T1 B has the form
14-3 Dynamical Systems
523
" A D
A11 A12
#
" ; B D
0 A22
B1
# (14.88)
0
and .A11 ; B1 / is steerable. Alternatively we could search for a transformation matrix T such that transforms the dynamic matrix and the exit matrix of a state model to the form "
A D
#
A11 0
"
r r
; OfA g D
A21 A22
r .n r/
#
.n 1/ r .n r/ .n r/
C D ŒC1 ; 0; OfC g D Œr p; .n r/ p : In this case the state equation and the observational equations read d z .t/ D A11 z1 .t/ C B1 u.t/; z1 .0/ D z10 dt 1 d z .t/ D A21 z1 .t/ C A22 z2 .t/ C B2 u.t/; z2 .0/ D z20 dt 2 y.t/ D C2 z1 .t/ C Du.t/:
(14.89)
(14.90) (14.91)
The last nr elements of the vector z are not used in the exit variable y. Since they do not have an effect to z1 , the vector g contains no information of the component of the state vector. This state moves in the n r dimensional subspace of Rn without any change in the exit variables. Our model .C; A/ is in this case called non-observable. Example 4 (observability): If the exit matrix and the dynamic matrix of a state model can be characterized by the matrices " # 0 1 2 4 ; AD ; CD 5 3 2 3 an application of the transformation matrix T leads to the matrices "
C D Œ1 ; 0 ; A D
1 0 1 2
# :
For an arbitrary motion of the state in the direction of the z2 axis has no influence on the existing variable. If there does not exist a transformation T, we call the state vector observable. A rank study helps again!
524
14 Special Problems of Algebraic Regression and Stochastic Estimation
Lemma 14.8. (Observability test): The pair .C; A/ is observable if and only if 2
C
3
7 6 6 CA 7 7 6 rk 6 : 7 D rkG.C; A/ D n: 7 6 :: 5 4 Cn1
(14.92)
G.C; A/ is called observability matrix. If its rank r < n, then there exists a transformation matrix T such that A D T1 AT and C D C T is of the form " " # # A11 0 r r r .n r/ A D (14.93) ; OfA g D A21 A22 .n 1/ r .n r/ .n r/ C D ŒC1 ; 0; OfC g D Œr p; .n r/ p
(14.94)
C1 ; A11
is observable. and With Lemma 14.7 and Lemma 14.8 we can only state whether a state model is steerable or observable or not, or which dimension has a partial system being classified as non-steerable and non-observable. In order to determine which part of a system is non-steerable or non-observable – which eigen motion is not excited or non-observable – we have to be able to construct proper transformation matrices T. A tool is the PBH-test we do not analyze here. Both the state differential equation as well as the initial equation we can Laplace transform easily. We only need the relations between the input, output and state variable via polynom matrices. If the initial conditions z0 vanish, we get the Laplace transformed characteristical equations .sIn A/z.s/ D Bu.s/
(14.95)
y.s/ D Cz.s/ C Du.s/:
(14.96)
For details we recommend to check the reference list. We only refer to solving both the state differential equation as well as the initial equation: Eliminating the state vector z.s/ lead us to the algebraic relation between u.s/andy.s/ W G.s/ D C.sIn A/1 B C D or
(14.97)
14-3 Dynamical Systems
G.s/ D
C1 C2
525
" s
Ir
0
0 Inr
#
"
A11 A12
#!1 "
0 A22
B1 0
# CD
(14.98)
D C1 .sIn A11 /1 B1 C D: Recently, the topic of chaos has attracted much attention. Chaotic behavior arises from certain types of nonlinear models, and a loose definition is apparently random behavior that is generated by a purely deterministic, nonlinear system. Refer to the contributions of K.S. Chan and H. Tong (2001), J. Gleick (1987), V. Isham (1983), H. Kants and I. Schreiber (1997).
Chapter 15
Algebraic Solutions of Systems of Equations Linear and Nonlinear Systems of Equations.
15-1 Introductory Remarks Algebraic techniques of Groebner bases and Multipolynomial resultants are presented in this Chapter as efficient tools for solving explicitly systems of linear and nonlinear equations. Similar to the Gauss elimination technique applied to linear systems of equations, Groebner bases and Multipolynomial resultants are useful in eliminating several variables in multivariate systems of nonlinear equations in such a manner that the end product results into univariate polynomial equations whose roots are readily determined using existing functions such as the roots command in MATLAB. The capability of Groebner bases and Multipolynomial resultants to solve explicitly nonlinear geodetic problems enables them to be used as the computational engine in the Gauss–Jacobi combinatorial algorithm proposed by C.F. Gauss (published posthumously, e.g., Appendix D-3) and C.G.I. Jacobi (1841) to solve the nonlinear Gauss–Markov model. By converting nonlinear geodetic observation equations into algebraic (polynomial) form, the Gauss–Jacobi combinatorial algorithm solves an overdetermined system of equations in two steps as follows: 1. In the first step, all the m combinations of minimal subsets of the observation equations are formed, and the n unknowns of each subset rigorously solved by means of Groebner bases or Multipolynomial resultants. The m solution sets are then represented as points in an n-dimensional space Rn . 2. In a second step, the solution of the overdetermined Gauß-Markov-model is constructed as a weighted arithmetic mean of those m solution points, where the weights are obtained from the Error propagation law/variance-covariance propagation. This chapter presents the theory of Groebner basis, multipolynomial resultants and the Gauss–Jacobi combinatorials methods, and is organized as follows. Section 15-2 provides the background to the algebraic and numerical methods used to derive exact solutions. In Sect. 15-3, the algebraic algorithms needed to solve the nonlinear system of equations are presented, and enriched with examples to help the reader understand them. The algorithms are finally applied successfully to three real-time geodetic problems in Sect. 15-4; transforming in a closed form geocentric coordinates to Gauss ellipsoidal coordinates (geodetic) through Minimum Distance Mapping (MDM); solution of GNSS pseudorange equations (using the GPS satellite
E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2 15, © Springer-Verlag Berlin Heidelberg 2012
527
528
15 Algebraic Solutions of Systems of Equations
vehicle as example); and to obtain the seven datum transformation parameters from two sets of coordinates. By means of Groebner basis, the scale parameter (in the seven parameters datum transformation problem) is shown to fulfill be a univariate algebraic equation of fourth order (quartic polynomial), while the rotation parameters are functions in scale and the coordinates differences. What we present in this chapter is a nutshell of algebraic treatment of systems of polynomial equations. For a complete coverage, we refer to J.L. Awange and E.W. Grafarend (2005), and J.L. Awange et al. (2010).
15-2 Background to Algebraic Solutions In Geodesy, Photogrammetry and Computer Vision, nonlinear equations are often encountered in several applications as they often relate the observations (measurements) to the unknown parameters to be determined. In case the number of observations n and the number of unknowns m are equal (n D m), the unknown parameters may be obtained by solving explicitly or in a closed form the system of equations relating observations to the unknown parameters. For example, D. Cox et al. (1998, pp. 28–32) illustrated that for systems of equations with exact solution, the system becomes vulnerable to small errors introduced during root findings and in case of extending the partial solution to the complete solution of the system, the errors may accumulate and thus become so large. If the partial solution is derived by iterative procedures, then the errors incurred during the root-finding may blow up during the extension of the partial solution to the complete solution (back substitution). In some applications, symbolic rather than numerical solution are desired. In such cases, explicit procedures are usually employed. The resulting symbolic expressions often consists of univariate polynomials relating the unknown parameters (unknown variables) to the known variables (observations). By inserting known values into these univariate polynomials, numerical solutions are readily computed for the unknown variables. Advantages of explicit solutions are listed, e.g., by E. L. Merritt (1949) as; provision of satisfaction to the users (Photogrammetrist and Mathematicians) of the methods, provision of data tools for checking the iterative methods, desired by Geodesist whose task of control densification does not favor iterative procedures, provision of solace and the requirement of explicit solutions rather than iterative by some applications. In Geodesy for example, the Minimum Distance Mapping problem considered by E. Grafarend and P. Lohse (1991) relates a point on the Earth’s topographical surface uniquely (one-to-one) to a point on the International Reference Ellipsoid. The solution of such an optimization problem requires that the equations be solved explicitly. The draw back that was experienced with explicit solutions was that they were like rare jewels. The reason for this was partly because the methods required extensive computations for the results to be obtained, and partly because the resulting
15-2 Background to Algebraic Solutions
529
symbolic expressions were too large and required computers with large storage capacity. Until recently, the computers that were available could hardly handle large computations due to lack of faster Central Processing Unit (CPU), shortage of Random Access Memory (RAM) and limited hard disk space for storage. The other set back experienced by the explicit procedures was that some of the methods, especially those from algebraic fields, were formulated based on theoretical concepts that were hard to realize or comprehend without the help of computers. For a long time, these setbacks hampered progress of the explicit procedures. The advances made in computer technology in recent years, however, has changed the tides and led to improvements in explicit computational procedures which hitherto were difficult to achieve. Apart from the improvements in existing computational procedures, new computational techniques are continuously being added to the increasing list of computational methods with an aim of optimizing computational speed and effeiciency (see, e.g., Palancz and Awange 2012). In this category are the algebraic methods of Groebner bases and Multipolynomial resultants. This Chapter presents algebraic computational techniques of Groebner bases and Multipolynomial resultants as suitable tools for solving explicitly nonlinear systems of observation equations that have been converted into algebraic (polynomial) forms in Geodesy and Geoinformatics. The algebraic techniques of Groebner bases and Sylvester resultant (resultant of two polynomials) for solving polynomial equations in Geodesy have been mentioned and examples of their applications to the twodimensional case given in the work of P. Lohse (1994, pp. 36–39, 71–76). We will demonstrate using examples how the Groebner bases and Multipolynomial resultants (resultant of more than two polynomials) can be used to solve problems of three-dimensional nature inherent in Geodesy. E. Grafarend (1989) already suggested the use of Groebner bases approach in solving the perspective center problem in photogrammetry. Other than revolutionizing computation procedures, the advances made in computer technology have also led to improvement in instrumentation for data acquisition as exemplified in the case of GPS positioning satellites. Since its inception as a positioning tool, the Global Positioning System (GPS) – referred to by E. Grafarend and J. Shan (1996) as the Global Problem Solver – has revolutionized geodetic positioning techniques and maintained its supremacy as a positioning tool. These improvements on instrumentation for data acquisition have led to improved data collection procedures and increase in accuracy. Lengthy geodetic procedures such as triangulation that required a lot of time, intervisibility between stations, and a large manpower are rapidly being replaced by satellite positioning techniques, which require shorter observation periods, no intervisibility requirement, weather independent and less manpower leading to optimization of time and money. Such improvements can be seen by the number of Global Navigation Satellite Systems (GNSS) being launched by various countries. These includes; Chinese’s Compass/Beidou and Russian’s GLONASS (see, e.g., Wellenhof 2008; Awange 2012). Whereas improved instrumentation is applauded, it comes a long with its own difficulties. One of the difficulties is that a lot of data is collected than required
530
15 Algebraic Solutions of Systems of Equations
to determine unknown parameters leading to redundancies. In positioning with GPS for example, due to its constellation that offer a wider coverage, more than four satellites can be observed at any point of the earth. In the minimal case, only four satellites are required to fix the receiver position and the receiver clock bias assuming that the transmitter clock bias and the transmitter-receiver bias have been neglected for the pseudo-range type of observations. More than four satellites as is nowadays the case, therefore, leads to superfluous observations. In such cases, where n > m, the explicit solutions give way to optimization procedures such as least squares solution which work very well for linear models under specified conditions discussed in other chapters of this book. In Geodesy and Geoinformatics, however, the observation equations are normally nonlinear thus requiring the use of nonlinear Gauss–Markov model, which is normally solved either by first linearizing the observation equations using Taylor series expansion to the second order terms about approximate values of the unknowns then applying linear model estimation procedures, or by using iterative procedures such as the Gauss–Newton approach. The linearization approach has the disadvantage that the linearized approximation of the nonlinear models may still suffer from nonlinearity and thus resulting in the estimates of such models being far from the real estimates of the nonlinear models. This can easily be checked by re-substituting the estimates from the linearized model into the original nonlinear model. For the iterative procedures, the greatest undoing may be the requirement of the initial approximate values to start off the iterations, which may not be available for some applications. For simper models, the approximate initial values may be computed, for others however, the approximate values may be impossible to compute. Apart from the problem of getting the initial approximate values, there also exists the problem that poor choice of approximate values may lead to lack of convergence, or if the approximate values are far from the real solutions, then a large number of iterations may be required to get close to the real solutions thus rendering the whole procedure to be quite slow, especially where multiple roots are available. For other procedures, such as the 7-parameter datum transformation that requires linearization and iterative methods, it is not feasible to take into account the stochasticity of both systems involved. Clearly, a procedure for solving nonlinear Gauss–Markov model that can avoid the requirement of initial approximate starting values for iteration and linearization approaches, and also take into consideration the stochasticity of the systems involved is the desire of modern day geodesist and geoinformatist (see, e.g., J. L. Awange and E. W. Grafarend 2005; J. L. Awange et al. 2010). With this background, this chapter aims at answering the following questions: • For geodetic, geoinformatics and Earth Science related problems requiring explicit solutions, can the algebraic tools of Groebner bases and Multipolynomial resultants that have found applications in other fields such as Robotics (for kinematic modelling of robots), Visions, Computer Aided Design (CAD), Engineering (offset surface construction in solid modelling), Computer Science
15-2 Background to Algebraic Solutions
531
(automated theorem proving) be used to solve systems of nonlinear observation equations of algebraic (polynomial) type? • Is there any alternative for solving the nonlinear Gauss–Markov model without resorting to linearization or iterative procedures that require approximate starting values? To answer the first question, the chapter uses the Groebner bases and Multipolynomial resultants to solve explicitly three geodetic problems of; Minimum Distance Mapping, GPS four-point pseudo-ranging, and the 7-parameter transformation problem in Sect. 15-4. The answer to the second problem becomes clear once the first question has been answered. Should the algebraic techniques of Groebner bases or Multipolynomial resultants be successful in solving explicitly the selected geodetic problems, they are then used as the computational engine of the combinatorial algorithm that was first suggested by C.F. Gauss and later on by C. G. I. Jacobi (1841) and extended by P. Werkmeister (1920). We refer to this algorithm simply as the Gauss–Jacobi combinatorial algorithm. In attempting to answer the questions above, the objectives of this chapter are: (1) To help you, the reader, understand the algebraic computational procedures of type Groebner bases and Multipolynomial resultants as applied to solve explicitly (in closed form) geodetic nonlinear problems. In this respect, the Groebner bases and Multipolynomial resultants techniques are used to solve explicitly (symbolically) geodetic problems of GPS pseudo-ranging four-point P4P, Minimum Distance Mapping and the 7-parameter transformation. By converting the nonlinear observation equations of these selected geodetic problems into algebraic (polynomial), Groebner bases and Multipolynomial resultants techniques are used to eliminate several variables in a multivariate systems of nonlinear polynomial equations in such a manner that the end products from the initial systems of nonlinear observation equations result in univariate polynomials. The elimination procedure is similar to the Gauss elimination approach in linear systems. (2) To introduce to you, the reader, an algebraic approach of Gauss–Jacobian combinatorial for solving overdetermined problems that are traditionally solved using least squares, which peg their operations on assumptions of linearity and prior information about the unknown values. From the principle of weighted arithmetic mean and using the Gauss–Jacobi combinatorial lemma (C. G. I. Jacobi 1841), an adjustment procedure that neither linearizes the nonlinear observation equations nor uses iterative procedures to solve the nonlinear Gauss–Markov model is developed. Linearization is permitted only for nonlinear error propagation/variance-covariance propagation. Such a procedure is only realized, thanks to the univariate polynomial generated by the algebraic computational procedures of type Groebner bases or Multipolynomial resultants as the computing engine for its minimal combinatorial sets.
532
15 Algebraic Solutions of Systems of Equations
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations In the present Section, we depart from the traditional iterative procedures for estimating the unknown fixed parameters of the nonlinear Gauss–Markov model and present a combinatorial approach that traces its roots back to the work of C.F. Gauss. C.F. Gauss first proposed the combinatorial approach using the products of squared distances (from unknown point to known points) and the square of the perpendicular distances from the sides of the error triangle to the unknown point as the weights, see, e.g., Appendix D3. According to W.K. Nicholson (1999, pp. 272–273), the motto in Gauss seal read “pauca des matura” meaning few but ripe. This belief led C. F. Gauss not to publish most of his important contributions. For instance, W. K. Nicholson (1999, pp. 272–273) writes “Although not all his results were recorded in the diary (many were set down only in letters to friends), several entries would have each given fame to their author if published. Gauss new about the quartenions before Hamilton. . . ”. The combinatorial method, like many of his works, was later to be published after his death. Several years later, the method was to be developed further by C. G. I. Jacobi (1841) who used the square of the determinants as the weights in estimating the unknown parameters from the arithmetic mean. P. Werkmeister (1920) later established the relationship between the area of the error figure formed by the combinatorial approach and the standard error of the determined point. We will refer to this combinatorial approach as the Gauss–Jacobi combinatorial algorithm needed to solve an overdetermined system of equations algebraically. While the solution of the linear Gauss–Markov model by Best Linear Uniformly Unbiased Estimator (BLUUE) is straight forward, the solution of the nonlinear Gauss–Markov model has no straight forward procedure owing to the nonlinearity of the injective function (or map function) that maps Rm to Rn . The Gauss– Jacobi combinatorial algorithm is presented as a possible algebraic solution to the nonlinear Gauss–Markov model. In Sect. 15-31, we demonstrate how the procedure can be used to solve the nonlinear Gauss–Markov model.
15-31 Solution of Nonlinear Gauss–Markov Model The nonlinear Gauss–Markov model is solved in two steps: • Step 1: Combinatorial minimal subsets of observations are constructed and rigorously solved by means of the Multipolynomial resultant or Groebner basis (J.L . Awange and E.W. Grafarend 2005; J. L . Awange et al. 2010). • Step 2: The combinatorial solution points of step 1 are reduced to their final adjusted values by means of an adjustment procedure where the Best Linear Uniformly Unbiased Estimator (BLUUE) is used to estimate the vector of fixed
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
533
parameters within the linear Gauss–Markov model with the dispersion matrix of the real valued random vector of pseudo-observations from Step 1 generated via the nonlinear error propagation law also known in this case as the nonlinear variance-covariance propagation.
15-311 Construction and Solution of the Combinatorial Subsets Since n > m we construct a minimal combinatorial subsets comprising m equations solvable in closed form using either Groebner bases or Multipolynomial resultants which we present in Sects. 15-312 and 15-313. We begin by the following elementary definitions: Definition 15.1 (Permutation). Given a set S with elements fi; j; kg 2 S , the arrangement obtained by placing fi; j; kg 2 S in some sequence is called permutation. If we choose any of the elements say i first, then each of the remaining elements j; k can be put in the second position, while the third position is occupied by the unused letter either j or k. For the set S , the following permutations can be made: ijk ikj jik jki kij kji
# :
(15.1)
From (15.1), there exist three ways of filling the first position, two ways of filling the second position, and one way of filling the third position. Thus the number of permutations is given by 3 2 1 D 6. In general, for n different elements, the number of permutation is equal to n : : : 3 2 1 D nŠ Definition 15.2 (Combination). If for n elements only m elements are used for the permutation, then we have a combination of the mth order. If we follow the definition above, then the first position can be filled in n ways, the second in n 1 ways and the mth in n .m 1/ ways. In (15.1) above, the combinations are identical and contain the same elements in different sequences. If the arrangement is to be neglected, then we have for n elements, a combination of mth order being given by n.n 1/ : : : .n m C 1/ nŠ n D : (15.2) D m mŠ.n m/Š m321 n Given n nonlinear equations to be solved, we first form minimal comm binatorial subsets each consisting of m elements (where m is the number of the unknown elements). Each minimal combinatorial subset is solved using the algebraic procedures discussed in Sects. 15-312 and 15-313. In geodesy, the number of elements n normally consist of the observations in the vector y, while the number
534
15 Algebraic Solutions of Systems of Equations
of elements m normally consist of the unknown fixed parameters in the vector . In the following Sections, we present two algebraic algorithms that are used to solve the minimal combinatorial subsets (15.2) in closed form. We first present the approach based on the Groebner bases and thereafter consider the Multipolynomial resultants approach.
15-312 Groebner Basis Method Groebner basis is the greatest common divisors of a multivariate system of equations (J.L . Awange and E.W. Grafarend 2005; J. L . Awange et al. 2010). Its direct application is the elimination of variables in nonlinear systems of equations. Let us start by the problem of finding the greatest common divisors in Example 15.1: Example 15.1. (Greatest common divisors (gcd)): Given the numbers 12, 20, and 18, find their greatest common divisor. We proceed by writing the factors as 12 D 22 :31 :50
3
7 20 D 22 :30 :51 5 ! 21 :30 :50 D 2; 1
2
18 D 2 :3 :5
(15.3)
0
leading to 2 as the greatest common divisor of 12, 20 and 18. Next, let us consider the case of univariate polynomials f1 ; f2 2 kŒx in (15.4). f1 D 3x 4 3x 3 C 8x 2 C 2x 5 f2 D 5x 4 4x 2 9x C 21
# ! Euclidean algorithm D f 2 kŒx: (15.4)
Equation (15.4) employs the Euclidean algorithm which obtains one univariate polynomial as the gcd of the two univariate polynomials f1 and f2 . If on the other hand expressions in (15.4) were not univariate but multivariate, e.g., g1 ; g2 2 kŒx; y as in (15.5), then one applies the Buchberger algorithm which then computes the Groebner bases. For example, g1 D xy C x y 1 g2 D xy x y C 1
# ! Buchberger algorithm D Groebner basis: (15.5)
Groebner basis therefore, is the greatest common divisors of a multivariate system of polynomial equations fg1 ; g2 g. As stated earlier, it is useful for eliminating variables in nonlinear systems of equations. Gauss elimination technique on the other hand is applicable for linear cases as shown in Example 15.2.
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
535
Example 15.2. (Gauss elimination technique) Solve the linear system of equations 2
x C y C 2z D 2 6 4 3x y C z D 6
(15.6)
x C 3y C 4z D 4: The first step is to eliminate x in the second and third expressions of (15.6). This is achieved by multiplying the first expression by 3 and adding to the second expression to give the second expression of (15.7). The third expression of (15.7) is obtained by subtracting the first expression from the third expression in (15.6). 2
x C y C 2z D 2 6 4 2y C 7z D 12
(15.7)
2y C 2z D 2: The second step is to eliminate y in the second and third expressions of (15.7). This is achieved by subtracting the second expression from the third expression in (15.7) to give (15.8). 2 x C y C 2z D 2 6 (15.8) 4 2y C 7z D 12 5z D 10: The solution of z D 2 in (15.8) can now be substituted back into the second equation 2y C 7z D 12 to give the value of y D 1, which together with the value of z D 2 are substituted into the first equation to give the value of x D 1 to complete the Gauss elimination technique. In many applications however, equations relating unknown variables to the measured (observed) quantities are normally nonlinear and often consist of many variables (multivariate). In such cases, the Gauss elimination technique for the univariate polynomial equations employed in Example 15.2 gives way to Groebner basis as illustrated in Examples 15.3 and 15.4. In general, the Groebner basis algorithm reduces a system of multivariate polynomial equations. This is done by employing operations “addition” and “multiplication” on a polynomial ring (see, e.g., Awange and Grafarend 2005; Awange et al. 2010) to give more simplified expressions. Given a system of polynomial equations which are to be solved explicitly for unknowns, e.g., (15.5), Groebner basis algorithm is applied to reduce the set of polynomials into another set (e.g., from a system F .x; y; z/ to another system G.x; y; z/) of polynomials with suitable properties that allow solution. If F .x; y; z/ is a set of nonlinear system of polynomial equations, Groebner basis eliminates variables in a manner similar to Gauss elimination technique for linear cases to reduce it to G.x; y; z/. With Lexicographic ordering of the monomials (see Definition D1.2 in
536
15 Algebraic Solutions of Systems of Equations
Appendix D), one expression in G.x; y; z/ always turns out to be a univariate polynomial. Its roots are easily obtained using algebraic software of Matlab, Mathematica or Maple (see e.g., Appendix D3), and can be substituted in the other elements of the set G.x; y; z/ to obtain a complete solution which also satisfy the original set F .x; y; z/. Examples 15.3 and 15.4 elaborate on the application of Groebner basis. Example 15.3. (Groebner basis computation) Let us consider a simple example from Buchberger (2001). Consider a set F .x; y/ D ff1 ; f2 g to have as its elements "
f1 D xy 2y f2 D 2y 2 x 2 ;
(15.9)
where ff1 ; f2 g 2 I are the generators of the Ideal I (see definition of Ideal on p. 537). We now seek a simplified set of generators of this Ideal using Buchberger algorithm. By employing operations “addition” and “multiplication”, the Groebner basis algorithm (also called Buchberger algorithm) reduces the system of nonlinear equations (15.9) into another set G of F as G WD f2x 2 C x 3 ; 2y C xy; x 2 C 2y 2 g:
(15.10)
In Mathematica software, using the lexicographic order x > y, i.e., x comes before y, the Groebner basis could simply be computed by entering the command Groebner BasisŒF; fx; yg:
(15.11)
The set G in (15.10) contains one univariate polynomial 2x 2 Cx 3 , which can easily be solved using roots command in Matlab for solutions fx D 0; x D 0; x D 2g and substituted in any of the remaining elements of the set G to solve for y. The solutions of G, i.e., the roots fx D 0; x D 0; x D 2g/ and those of y satisfy polynomials in F . This can be easily tested by substituting these solutions into (15.9) to give 0. Let us consider as a second example an optimization problem. Example 15.4. (Minimum and maximization problem) Find the minimum and maximum of f .x; y; z/ D x 3 C 2xyz z2 , such that g.x; y; z/ D x 2 C y 2 C z2 1. First, we obtain the partial derivatives of f L g D 0 with respect to fx; y; z; Lg, where L is the Lagrangean multiplier as 2
3x 2 C 2yz 2xL D 0 6 6 2xz 2yL D 0 @f WD F D 6 6 2xy 2z 2zL D 0 @fx; y; z; Lg 4 x 2 C y 2 C z2 1 D 0:
(15.12)
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
537
Groebner basis is invoked in Mathematica by Groebner BasisŒfF g; fx; y; L; zg; which leads to 2
L 1:5x 1:5yz 43:7z6 62:2z4 17:5z2 D 0 6 2 6 x C y 2 C z2 1 D 0 GD6 6 y 2 z 1:8z5 C 2:8z3 z D 0 4 z7 1:5z5 C 0:6z3 0:04z D 0:
(15.13)
The solution of z in (15.13) can then be substituted into the third equation y 2 z 1:8z5 C 2:8z3 z D 0 to give the value of y. The obtained values of z and y are then substituted into the second equation to give the value of x, and thus complete the Groebner basis solution. Later in the chapter, we will introduce the reduced Groebner basis which can be used to obtain directly the last expression of (15.13), i.e., the univariate polynomial in z. The theory behind the operation of Groebner basis is however not so simple. In the remainder of this chapter, we will try to present in a simplified form the algorithm behind the computation of Groebner bases. The computation of Groebner basis is achieved by the capability to manipulate the polynomials to generate Ideals defined as Definition 15.3 (Ideal). An Ideal is generated by a family of generators as consisting of the set of linear combinations of these generators with polynomial coefficients. Let f1 ; : : : ; fs and c1 ; : : : ; cs be polynomials in k Œx1 ; : : : ; xn , then s X hf1 ; : : : ; fs i D ci fi : (15.14) i D1
In (15.14), hf1 ; : : : ; fs i is an Ideal and if a subset I k Œx1 ; : : : ; xn is an Ideal, it must satisfy the following conditions (Cox 1997, p. 29); • 02I • If f; g 2 I; then f C g 2 I (i.e., I is an additive subgroup of the additive group of the field k) • If f 2 I and c 2 k Œx1 ; : : : ; xn , then cf 2 I (i.e., I is closed under multiplication ring element). Example 15.5. (Ideal) Consider equations expressed algebraically as "
f1 WD .x1 x0 /2 C .y1 y0 /2 d12 f2 WD .x2 x0 /2 C .y2 y0 /2 d22 ;
(15.15)
538
15 Algebraic Solutions of Systems of Equations
where polynomials ff1 ; f2 g belong to the polynomial ring RŒx0 ; y0 . If the polynomials " c1 WD 4x0 C 6 (15.16) c2 WD x0 C y0 also belong to the same polynomial ring RŒx0 ; y0 , an Ideal is generated by a linear combination " hf1 ; f2 i D c1 f1 C c2 f2 I WD (15.17) D .4x0 C 6/f1 C .x0 C y0 /f2 : In this case, ff1 ; f2 g are said to be generators of the Ideal I . Definition (15.3) of an Ideal can be presented in terms of polynomial equations f1 ; : : : ; fs 2 k Œx1 ; : : : ; xn . This is done by expressing the system of polynomial equations as 2
f1 D 0 6 6 f2 D 0 6 6 : 6 6 6 : 4 fs D 0;
(15.18)
and using them to derive others by multiplying each individual equation fi by another polynomial ci 2 k Œx1 ; : : : ; xn and summing to get c1 f1 C c2 f2 C : : : C cs fs D 0 (cf., 15.14). The Ideal hf1 ; : : : ; fs i thus consists of a system of equations f1 D f2 D : : : D fs D 0, thus indicating that if f1 ; : : : ; fs 2 k Œx1 ; : : : ; xn , then hf1 ; : : : ; fs i is an Ideal generated by f1 ; : : : ; fs , i.e., being the basis of the Ideal I . In this case, a collection of these nonlinear algebraic equations forming Ideals are referred to as the set of polynomials generating the Ideal and forms the elements of this Ideal. Perhaps a curious reader may begin to wonder why the term Ideal is used. To quench this curiosity we refer to Nicholson (1999, p. 220) and quote from Becker (1993, p. 59) who wrote: “On the origin of the term Ideal, the concept is attributed to Dedekind who introduced it as a set theoretical version of Kummer’s “Ideal number” to circumvent the failure of unique factorization in certain natural extension of the domain Z. The relevance of Ideal in the theory of polynomial rings was highlighted by Hilbert Basis Theorem. The systematic development of Ideal theory; in more general rings is largely due to E. Noether . In the older literature, the term “module” is sometimes used for “Ideal” (cf., Macaulay, 1916). The term “ring” seems to be due to D. Hilbert ; Kronecker used the term “order” for ring”.
Example 15.6. (Ideal) Consider example (15.3) with polynomials in R Œx; y. The Ideal I D hxy 2y; 2y 2 x 2 i. The generators of an Ideal can be computed using the division algorithm defined as
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
539
Definition 15.4 (Division algorithm). Fix a monomial order of polynomials say x > y for polynomials F D .h1 ; : : : ; hs /. Then every f 2 k Œx; y can be written in the form f D a1 h1 C a2 h2 C : : : C as hs C r; where ai ; r 2 k Œx; y and either r D 0 or a linear combination with coefficients in k of monomials, none of which is divisible by any of LT .f1 /; : : : ; LT .fs / (see Definition D1.5 in Appendix D for leading term LT). Example 15.7. (Division algorithm in a univariate case) Divide the polynomial f D x 3 C2x 2 Cx C5 by h D x 2 2. We proceed as follows: 2
xC2 6 6 x 2 2 j x 3 C 2x 2 C x C 5 6 6 6 x 3 2x 6 6 6 6 2x 2 C 3x C 5 6 6 6 2x 2 4 4 3x C 1;
(15.19)
implying x 3 C 2x 2 C x C 5 D .x C 2/.x 2 2/ C .3x C 1/, with a D .x C 2/ and r D .3x C 1/. The division algorithm given in definition (15.4) fits well to the case of univariate polynomials as the remainder r can uniquely be determined. For multivariate polynomials, the remainder may not be uniquely determined as this depends on the order of the divisors. The division of the polynomial F by ff1 ; f2 g where f1 comes before f2 may not necessarily give the same remainder as the division of F by ff2 ; f1 g in whose case the order has been changed. This problem is overcome if we pass over to Groebner basis where the existence of every Ideal is assured by the Hilbert Basis Theorem (Cox 1997, pp. 47–61). The Hilbert Basis Theorem assures that every Ideal I k Œx1 ; : : : ; xn has a finite generating set, that is I D hg1 ; : : : ; gs i for some fg1 ; : : : ; gs g 2 I: The finite generating set G in Hilbert Basis Theorem is what is known as a basis. Suppose every non-zero polynomial is written in decreasing order of its monomials: n X
di xi ;
di ¤ 0; xi > xi C1 ;
(15.20)
i D1
if we let the system of generators of the Ideal be in a set G, a polynomial f is reduced with respect to G if no leading monomial of an element of G (LM (G)) divides the leading monomial of f (LM(f )). The polynomial f is said to be completely reduced with respect to G if no monomials of f is divisible by the leading monomial of an element of G (Davenport 1988, pp. 96–97). The basis G,
540
15 Algebraic Solutions of Systems of Equations
which completely reduces the polynomial f and uniquely determines the remainder r is also known as the Groebner basis and is defined as follows: Definition 15.5 (Groebner basis). A system of generators G of an Ideal I is called a Groebner basis (with respect to the order <) if every reduction of f 2 I to a reduced polynomial (with respect to G) always gives zero as a remainder. This definition is a special case of a more general definition given as: Fix a monomial order and let G D fg1 ; : : : ; gt g k Œx1 ; : : : ; xn : Given f 2 k Œx1 ; : : : ; xn ; then f reduces to zero Modulo G, written as f !G 0;
(15.21)
if f can be written in the form (cf., 15.18 on p. 538) f D a1 g1 C : : : C at gt ;
(15.22)
such that whenever ai gi ¤ 0; we have multideg(f ) multideg(ai gi ) (see Definition D1.5 in Appendix D for leading term LT, LM and Multideg). Following Definition 15.5, the reader can revisit Examples 15.3 and 15.4 which present the Groebner basis G of the original system F of equations. Groebner basis has become a household name in algebraic manipulations and finds application in fields such as mathematics and engineering for solving partial differential equations, e.g.,(Lidl 1998, p. 432). It has also found use as a tool for discovering and proving theorems to solving systems of polynomial equations as elaborated in publications by Buchberger and Winkler (1998). Groebner basis also give a solution to the Ideal membership problem. By reducing a given polynomial f with respect to the Groebner basis G, f is said to be a member of the Ideal if zero remainder is obtained. This implies that if G D fg1 ; : : : ; gs g is a Groebner basis of an Ideal I k Œx1 ; : : : ; xn and f 2 k Œx1 ; : : : ; xn a polynomial, f 2 I if and only if the remainder on division of f by G is zero. Groebner bases can also be used to show the equivalence of polynomial equations. Two sets of polynomial equations will generate the same Ideal if and only if their Groebner bases are equal with respect to any term ordering, e.g., the solutions of (15.10) satisfy those of (15.9). This property is important in that the solutions of the Groebner basis will satisfy the original system formed by the generating set of nonlinear equations. It implies that a system of polynomial equations f1 .x1 ; : : : ; xn / D 0; : : : ; fs .x1 ; : : : ; xn / D 0 will have the same solutions with a system arising from any Groebner basis of f1 ; : : : ; fs with respect to any term ordering. This is the main property of Groebner basis that is used to solve systems of polynomial equations as will be explained in the next section. B. Buchberger algorithm Given polynomials g1 ; : : : ; gs 2 I , the B. Buchberger algorithm seeks to derive the standard generators or the Groebner basis of this Ideal. Systems of equations
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
541
g1 D 0; : : : ; gs D 0 to be solved in practise are normally formed by these same polynomials which here generating the Ideal. The B. Buchberger algorithm computes the Groebner basis by making use of pairs of polynomials from the original polynomials g1 ; : : : ; gs 2 I and computes the subtraction polynomial known as the S -polynomial explained in D. Cox et al. (1997; p. 81) as follows: Definition 15.6 (S -polynomial). Let f; g 2 k Œx1 ; : : : xn be two non-zero polynomials. If multideg .f / D ˛ and multideg .g/ D ˇ, then let D 1 ; : : : ; n , where i D max f˛i ; ˇi g for each i . x is called the Least Common Multiple (LCM) of LM(f ) and LM(g) expressed as x D LCM fLM.f /; LM.g/g. The S -polynomial of f and g is given as S.f; g/ D
x x f g: LT .f / LT .g/
The expression above gives S as a linear combination of the monomials
(15.23) x , LT .f /
x with polynomial coefficients f and g and thus belongs to the Ideal LT .g/ generated by f and g. Definition 15.7 (Groebner basis in terms of S -polynomial). A basis G is Groebner basis if and only if for every pair of polynomials f and g of G, S.f; g/ reduces to zero with respect to G. More generally a basis G D fg1 ; : : : ; gs g for an Ideal I is a Groebner basis if and only if S.f; g/ !G 0; i ¤ j . The implication of Definition 15.7 is the following: Given two polynomials f; g 2 G such that LCM.LM.f /; LM.g// D LM.f /:LM.g/, the leading monomials of f and g are relatively prime leading to S.f; g/ !G 0: The concept of prime integer is clearly documented in K. Ireland and M. Rosen (1990 pp. 1–17). Example 15.8. (S -Polynomial ): Consider the two polynomials 2 4
g1 D x12 C 2a12 x1 x2 C x22 C aoo g2 D x22 C 2b23 x2 x3 C x32 C boo :
(15.24)
The S -polynomial can then be computed as follows: First we choose a lexicographic ordering fx1 > x2 > x3 g then
542
15 Algebraic Solutions of Systems of Equations
2
LM.g1 / D x12 ; LM.g2 / D x22 ; LT .g1 / D x12 ; LT .g2 / D x22
6 6 LCM.LM.g1 /; LM.g2 // D x12 x22 6 6 6 x 2x 2 x2x2 6 6 S.g1 ; g2 / D 1 2 2 .x12 C 2a12 x1 x2 C x22 C aoo / 1 2 2 .x22 C 2b23 x2 x3 C x32 C boo / 6 x1 x2 6 6 2 2 3 4 2 2 2 6 D x2 x1 C 2a12 x1 x2 C x2 C aoo x2 x1 x2 2b23 x12 x2 x3 x12 x32 boo x12 / 4 D boo x12 2b23 x12 x2 x3 x12 x32 C 2a12 x1 x23 C x24 C aoo x22
(15.25)
Example 15.9. (S -Polynomial ): Consider two polynomial equations given as 2 6 4
g3 D x12 2a0 x1 C x22 2b0 x2 C x32 2c0 x3 x42 C 2d0 x4 C a02 C b02 C c02 C d02 g4 D x12 2a1 x1 C x22 2b1 x2 C x32 2c1 x3 x42 C 2d1 x4 C a12 C b12 C c12 C d12 : (15.26)
By choosing the lexicographic ordering fx1 > x2 > x3 > x4 g, the S -polynomial is computed as follows 2
LM.g3 / D x12 ; LM.g4 / D x12 ; LT .g3 / D x12 ; LT .g4 / D x12 6 6 LCM.LM.g3 /; LM.g4 // D x 2 6 1 6 2 2 6 6 S.g ; g / D x1 .g / x1 .g / D g g 3 4 3 4 3 4 6 x12 x12 6 6 6 S.g3 ; g4 / D 2.a1 ao /x1 C 2.b1 bo /x2 C 2.c1 co /x3 C 4
(15.27)
C2.do d1 /x4 C ao a1 C bo b1 C co c1 C d0 d1 Example 15.10. (S -Polynomial ): As an additional example on the computation of S -polynomial, let us consider the minimum distance mapping problem. The last two polynomials equations are given as " g5 D x3 C b 2 x3 x4 Z (15.28) g6 D b 2 x12 C b 2 x22 C a2 x32 a2 b 2 : By choosing the lexicographic ordering fx1 > x2 > x3 > x4 g, the S -polynomial is computed as follows
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
543
2
LM.g5 / D x3 ; LM.g6 / D x12 ; LT .g5 / D x3 ; LT .g6 / D b 2 x12 6 6 LCM.LM.g5 /; LM.g6 // D x 2 x3 6 1 6 2 2 6 6 S.g ; g / D x1 x3 .x C b 2 x x Z/ x1 x3 .b 2 x 2 C b 2 x 2 C a2 x 2 a2 b 2 / 5 6 3 3 4 6 1 2 3 x3 b 2 x12 4 S.g5 ; g6 / D Zx12 C b 2 x12 x3 x4 x22 x3
a2 3 x b2 3
C a2 x3
(15.29)
Example 15.11. (computation of Groebner basis from the S -polynomials): By means of an Example given by J. H. Davenport et al. (1988, pp. 101–102), we illustrate how the B. Buchberger algorithm works. Let us consider the Ideal generated by the polynomial equations 2
g1 D x 3 yz xz2 6 6 g2 D xy 2 z xyz 4
(15.30)
g3 D x 2 y 2 z with the lexicographic ordering x > y > z adopted. The S -polynomials to be considered are S.g1 ; g2 /, S.g2 ; g3 / and S.g1 ; g3 /. We consider first S.g2 ; g3 / and show that the result is used to suppress g1 upon which any pair S.g1 ; gi / (e.g. S.g1 ; g2 / and S.g1 ; g3 /) containing g1 will not be considered. LT .g2 / D xy 2 z; LT .g3 / D x 2 y 2 ; then LCM.g2 ; g3 / D x 2 y 2 z respectively 2
x2 y 2 z x2y 2 z 6 S.g2 ; g3 / D xy 2 z g2 x 2 y 2 g3 6 6 6 D .x 2 y 2 z x 2 yz/ .x 2 y 2 z z2 / 4 D x 2 yz C z2
(15.31)
We immediately note that the leading term of the resulting polynomial LT(S.g2 ; g3 /) is not divisible by any of the leading terms of the elements of G, and thus the remainder upon the division of S.g2 ; g3 / by the polynomials in G is not zero (i.e. when reduced with respect to G) thus G is not a Groebner basis. This resulting polynomial after the formation of the S -polynomial is denoted g4 and its negative (to make calculations more reliable) added to the initial set of G leading to 2 g1 D x 3 yz xz2 6 6 g2 D xy 2 z xyz 6 (15.32) 6 6 g3 D x 2 y 2 z 4 g4 D x 2 yz z2 :
544
15 Algebraic Solutions of Systems of Equations
The S -polynomials to be considered are now S.g1 ; g2 /; S.g1 ; g3 /; S.g1 ; g4 /; S.g2 ; g4 / and S.g3 ; g4 /: In the set of G, one can write g1 D xg4 leading without any change in G to the suppression of g1 leaving only S.g2 ; g4 / and S.g3 ; g4 / to be considered. Then " S.g2 ; g4 / D xg2 yg4 (15.33) D x 2 yz C yz2 which can be reduced by adding g4 to give g5 D yz2 z2 , a non zero value thus the set G is not a Groebner basis. This value is added to the set of G to give 2
g2 D xy 2 z xyz;
6 6 g3 D x 2 y 2 z; 6 6 6 g4 D x 2 yz z2 ; 4
(15.34)
g5 D yz2 z2 ; the S -polynomials to be considered are now S.g3 ; g4 /; S.g2 ; g5 /; S.g3 ; g5 / and S.g4 ; g5 /: We then compute "
S.g3 ; g4 / D zg3 yg4 D yz2 z2
(15.35)
which upon subtraction from g5 reduces to zero. Further, 2 6 6 4
S.g2 ; g5 / D zg2 xyg5 D xyz2 C xyz2
(15.36)
D0 and
S.g4 ; g5 / D zg4 x 2 yg5 D x 2 z2 z3 ;
(15.37)
which is added to G as g6 giving 2
g2 6 6g 6 3 6 6 6 g4 6 6 6 g5 4 g6
D xy 2 z xyz; D x 2 y 2 z; D x 2 yz z2 ; D yz2 z2 ; D x 2 y 2 z3 ;
(15.38)
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
545
The S polynomials to be considered are now S.g3 ; g5 /; S.g2 ; g6 /; S.g3 ; g6 /; S.g4 ; g6 / and S.g5 ; g6 /: We now illustrate that all this S -polynomials reduces to zero as follows 2 S.g3 ; g5 / D z2 g3 x 2 yg5 D x 2 yz2 z3 zg4 D 0 6 6 S.g ; g / D xzg y 2 g D x 2 y 2 z2 C y 2 z3 C y 2 g D 0 6 2 6 2 6 4 6 6 2 2 2 3 3 (15.39) 6 S.g3 ; g6 / D z g3 y g6 D y z z .yz z/g5 D 0 6 6 6 S.g4 ; g6 / D zg4 yg6 D yz3 z3 zg5 D 0 4 S.g5 ; g6 / D x 2 g5 yg6 D x 2 z2 C yz3 C g6 zg5 D 0: Thus (15.39) comprise the Groebner basis of the original set in (15.30). The importance of the S -polynomials is that they lead to the cancelation of the leading terms of the polynomial pairs involved. In so doing the polynomial variables are systematically eliminated according to the polynomial ordering chosen. For example if the lexicographic ordering x > y > z is chosen, x will be eliminated first, followed by y and the final expression may consist only of the variable z. D. Cox et al. (1998, p.15) has indicated the advantage of lexicographic ordering as being the ability to produce Groebner basis with systematic elimination of variables. Graded lexicographic ordering on the other hand has the advantage of minimizing the amount of computational space needed to produce the Groebner basis. The procedure is thus a generalisation of the Gauss elimination procedure for linear systems of equations. If we now put our system of polynomial equations to be solved in a set G, S -pair combinations can be formed from the set of G as explained in the definitions above. The theorem, known as the Buchberger’s S -pair polynomial criterion, gives the criterion for deciding whether a given basis is a Groebner basis or not. It suffices to compute all the S -polynomials and check whether they reduce to zero. Should one of the polynomials not reduce to zero, then the basis fails to be a Groebner basis. Since the reduction is a linear combination of the elements of G, it can be added to the set G without changing the Ideal generated. B. Buchberger (1979) gives an optimisation criterion that reduces the number of the S -polynomials already considered in the algorithm. The criterion states that if there is an element p of G such that the leading monomial of p (LM(p )) divides the LCM(f; g 2 G), and if S.f; p/ ; S.p; g/ have already been considered, then there is no need of considering S.f; g/ as this reduces to zero. The essential observation in using the Groebner bases to solve a system of polynomial equations is that the variety (simultaneous solution of system of polynomial equations) does not depend on the original system of the polynomials F WD ff1 ; : : : ; fs g but instead on the Ideal I generated by F . This therefore means that for the variety V D V .I /, one makes use of the special generating set (Groebner basis) instead of the actual system F . Since the Ideal is generated by F , the solution obtained by solving for the affine variety of this Ideal satisfies the original system F of equations. B. Buchberger (1970) proved that V .I / is void,
546
15 Algebraic Solutions of Systems of Equations
and thus giving a test as to whether the system of polynomial F can be solved, if and only if the computed Groebner basis of polynomial Ideal I has f1g as its element. B. Buchberger (1970) further gives the criterion for deciding if V .I / is finite. If the system has been proved to be solvable and finite then F. Winkler (1996, theorem 8.4.4, p.192) gives a theorem for deciding whether the system has finitely or infinitely many solutions. The theorem states that if G is a Groebner basis, then a solvable system of polynomial equations has finitely many solutions if and only if for every xi ; 1 i n; there is a polynomial gi 2 G such that LM.gi / is a pure power of xi . The process of addition of the remainder after the reduction by the S -polynomials and thus expanding the generating set is shown by B. Buchberger (1970), D. Cox et al. (1997 p.88) and J. H. Davenport et al. (1988 p.101) to terminate. The B. Buchberger algorithm (see Appendix D2), more or less a generalization of the Gauss elimination procedure, makes use of the subtraction polynomials known as the S -polynomials in Definition 15.7 to eliminate the leading terms of a pair of polynomials. In so doing and if lexicographic ordering is chosen, the process end up with one of the computed S -polynomials being a univariate polynomial which can be solved and substituted back in the other S -polynomials using the Extension Theorem (D. Cox et al. 1998, pp. 25–26) to obtain the other variables. The Groebner bases approach adds to the treasures of methods that are used to solve nonlinear algebraic systems of equations in Geodesy, Photogrammetry, Machine Vision, Robotics and Surveying. Having defined the term Groebner basis and illustrated how the B. Buchberger algorithm computes the Groebner bases, we briefly mention here how the Groebner bases can be computed using algebraic softwares of Mathematica and Maple. In Mathematica Versions 8, the Groebner basis command is executed by writing In[1]:=GroebnerBasis[fpolynomoialsg,fvariablesg,felimsg] (where In[1]:= is the mathematica prompt) which computes the Groebner basis for the ideal generated by the polynomials with respect to the monomial order specified by monomial order options with the variables specified as in the executable command giving the reduced Groebner basis. Without specifying the elims part that indicates the elimination order, one gets too many elements of the Groebner basis which may not be relevant. In Maple Version 5 the command is accessed by typing > with (grobner); (where > is the Maple prompt and the semicolon ends the Maple command). Once the Groebner basis package has been loaded, the execution command then becomes > g basis (polynomials, variables, termorder) which computes the Groebner basis for the ideal generated by the polynomials with respect to the monomial ordering specified by termorder and variables in the executable command. Following suggestions from B. Buchberger (1999), Mathematica software is applied to the examples presented in Sect. 15-4.
15-313 Multipolynomial Resultants Method Whereas the resultant of two polynomials is well known and algorithms for computing it are well incorporated into computer algebra packages such as Maple,
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
547
the Multipolynomial resultant, i.e. the resultant of more than two polynomials still remain an active area of research. In Geodesy, the use of the two polynomial resultant also known as the Sylvester resultant is exemplified in the work of P. Lohse (1994, pp. 72–76). This section therefore extends on the use of Sylvester resultants to resultants of more than two polynomials of multiple variables (Multipolynomial resultant). J.L. Awange and E.W. Grafarend (2005) and J. L. Awange et al. (2010) illustrate how the tool can be exploited in Geodesy and Geoinformatics to solve nonlinear system of equations. The necessity of Multipolynomial resultant method in Geodesy is due to the fact that many geodetic problems involve the solution of more than two polynomials of multiple variables. This is true since we are living in a three-dimensional world. We shall therefore understand the term multipolynomial resultants to mean resultants of more than two polynomials. We treat it as a tool besides the Groebner bases and perhaps more powerful to eliminate variables in solution of polynomial systems. Publications on the subject can be found in the works of G. Salmon (1876), F. Macaulay (1902, 1916), A. L. Dixon (1908), C. Bajaj et al. (1988), J. F. Canny (1988), J. F. Canny et al. (1989), I. M. Gelfand et al. (1990, 1994), D. Manocha (1992, 1993, 1994a,b,c), D. Manocha and J.F. Canny (1991, 1992, 1993), G. Lyubeznik (1995), S. Krishna and D. Manocha (1995), J. Guckenheimer et al.(1997), B. Sturmfels (1994, 1998) and E. Cattani et al. (1998). Text books on the subject have been written by I. Gelfand et al. (1994), by D. Cox et al. (1998, pp.71–122) and, more recently, by J.L. Awange and E.W. Grafarend (2005), and J. L. Awange et al. (2010) who provides interesting material. In order to understand the Multipolynomial resultants technique, we first present the simple case; the resultant of two polynomial also known as the Sylvester resultant. Resultant of two polynomials Definition 15.8 (Homogeneous polynomial). If monomials of a polynomial p with non zero coefficients have the same total degree, the polynomial p is said to be homogeneous. Example 15.12. (Homogeneous polynomial equation): A homogeneous polynomial equation of total degree 2 is p D x 2 C y 2 C z2 C xy C xz C yz since the monomials fx; y; z; xy; xz; yzg all have the sum of their powers (total degree) being 2. To set the ball rolling, we examine next the resultant of two univariate polynomials p; q 2 kŒx of positive degree as p D k0 x i C : : : : C ki ; k0 ¤ 0; i > 0 q D l0 x j C : : : : C lj ; l0 ¤ 0; j > 0
(15.40)
548
15 Algebraic Solutions of Systems of Equations
the resultant of p and q, denoted Res.p; q/; is the .i C j / .i C j / determinant 2
k0 60 6 6 60 6 60 6 60 6 60 Res .p; q/ D det 6 6 l0 6 60 6 6 60 6 60 6 40 0
k1 k0 0 0 0 0 l1 l0 0 0 0 0
k2 k1 k0 0 0 0 l2 l1 l0 0 0 0
: k2 k1 k0 0 0 : l2 l1 l0 0 0
: : k2 k1 k0 0 : : l2 l1 l0 0
: : : k2 k1 k0 : : : l2 l1 l0
ki : : : k2 k1 lj : : : l2 l1
0 ki : : : k2 0 lj : : : l2
0 0 ki : : : 0 0 lj : : :
0 0 0 ki : : 0 0 0 lj : :
0 0 0 0 ki : 0 0 0 0 lj :
3 0 07 7 7 07 7 07 7 07 7 ki 7 7 07 7 07 7 7 07 7 07 7 05 lj
(15.41)
where the coefficients of the first polynomial p of (15.40) occupies j rows while those of the second polynomial q occupies i rows. The empty spaces are occupied by zeros as shown above such that a square matrix is obtained. This resultant is also known as the Sylvester resultant and has the following properties (B. Sturmfels 1998, D. Cox et al. 1998, 3.5) 1. Res.p; q/ is a polynomial in k0 ; : : : ; ki ; l0 ; : : : ; lj with integer coefficients 2. Res.p; q/ D 0 if and only if p.x/ and q.x/ have a common factor in QŒx 3. There exist a polynomial r; s 2 QŒx such that rp C sq D Res.p; q/ Sylvester resultants can be used to solve two polynomial equations as shown in the example below Example 15.13. (D. Cox et al.1998, p.72): Consider the two equations p WD xy 1 D 0
)
q WD x 2 C y 2 4 D 0:
(15.42)
In order to eliminate one variable, we use the hide variable technique i.e. we consider one variable say y as a constant (of degree zero). We then have the Sylvester resultant from (15.41) as 2
y 1
6 Res .p; q; x/ D det 6 40 y
0
3
7 4 2 1 7 5 D y 4y C 1
1 0 y2 4
(15.43)
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
549
which can be readily solved for the variable y and substituted back in any of the equations in (15.42) to get the values of the other variable x. For two polynomials, the construction of resultant is relatively simpler and algorithms for the execution are incorporated in computer algebra algorithms. For the resultant of more than 2 polynomials of multiple variables, we turn to the Multipolynomial resultants.
Multipolynomial resultants In defining the term multipolynomial resultant, D. Manocha (1994c) writes: “Elimination theory, a branch of classical algebraic geometry, deals with conditions for common solutions of a system of polynomial equations. Its main result is the construction of a single resultant polynomial of n homogeneous polynomial equations in n unknowns, such that the vanishing of the resultant is a necessary and sufficient condition for the given system to have a non-trivial solution. We refer to this resultant as the multipolynomial resultant and use it in the algorithm presented in the paper”. We present here two approaches for the formation of the design matrix whose determinant we need; first the approach based on F. Macaulay (1902) formulation and then a more modern approach based on B. Sturmfels (1998) formulation. Approach based on F. Macaulay (1902) formulation: With n polynomials, the construction of the matrix whose entries are the coefficients of the polynomials f1 ; : : : ; fn can be done in five steps as illustrated in the following approach of F. Macaulay (1902): Step 1: The given polynomials f1 D 0; : : : ; fn D 0 are considered to be homogeneous equations in the variables x1 ; : : : ; xn and if not, they are homogenized. Let the degree of the polynomial fi be di . The first step involves the determination of the critical degree given by C. Bajaj et al. (1988) as d D1C
X
.di 1/:
(15.44)
Step 2: Once the critical degree has been established, the given monomials of the polynomial equations are multiplied with each other to generate a set X whose elements consists of monomials whose total degree equals the critical degree. Thus if we are given the polynomial equations f1 D 0; : : : ; fn D 0, then each monomial of f1 is multiplied by those of f2 ; : : : ; fn , those of f2 are multiplied by those of f3 ; : : : ; fn until those of fn1 are multiplied by those of fn . The set X of monomials generated in this form is X d D fx d j d D ˛1 C ˛2 C C ˛n g and the variable x d D x1˛1 : : : xn˛n .
(15.45)
550
15 Algebraic Solutions of Systems of Equations
Step 3: The set X containing the monomials each of total degree d is now partitioned according to the following criteria (J. F. Canny 1988, p. 54) 8 ˆ X1d D ˆ ˆ ˆ ˆ ˆ ˆ X2d D ˆ ˆ ˆ ˆ ˆ <: :
fx ˛ 2 X d j ˛1 d1 g fx ˛ 2 X d j ˛2 d2 and ˛1 < d1 g :
ˆ ˆ : : : ˆ ˆ ˆ ˆ ˆ ˆ : : : ˆ ˆ ˆ ˆ : d Xn D fx ˛ 2 X d j ˛n dn and ˛i < di ; for i D 1; : : : ; n 1g:
(15.46)
The resulting sets of Xid are disjoint and every element of X d is contained in exactly one of them. Step 4: From the resulting subsets Xid X d , a set of polynomials Fi which are homogeneous in n variables are defined as follows Fi D
Xid xidi
fi
(15.47)
from which a square matrix A is now formed with the row elements being the coefficients of the monomials of the polynomials Fi ji D1;:::;n and the columns correspond of the set X d . The formed square matrix is of to the N monomials d Cn1 d Cn1 the order and is such that for a given polynomial Fi d d in (15.47), the row is made up of the symbolic coefficients of each polynomial. The square matrix A has a special property that the non trivial solution of the homogeneous equations Fi which also form the solution of the original equations fi are in its null space. This implies that the matrix must be singular or its determinant, det.A/, must be zero. For the determinant to vanish therefore, the original equations fi and their homogenized counterparts Fi must have the same non trivial solutions. Step 5: After computing the determinant of the square matrix A above, F. Macaulay (1902) suggests the computation of extraneous factor in order to obtain the resultant. D. Cox et al. (1998, Proposition 4.6, p.99) explains the extraneous factors to be integer polynomials in the coefficients of FN0 ; : : : ; FNn1 , where FNi D Fi .x0 ; : : : ; xn1 ; 0/ and is related to the determinant via determinant D Res.F1 ; : : : ; Fn /:Ext
(15.48)
with the determinant computed as in step 4, Res.F1 ; : : : ; Fn / being the Multipolynomial resultant and Ext the extraneous factor. This expression was established
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
551
as early as 1902 by F. Macaulay (1902) and this procedure of resultant formulation named after him. F. Macaulay (1902) determines the extraneous factor from the sub-matrix of the N N square matrix A and calls it a factor of minor obtained by deleting rows and columns of the N N matrix A. A monomial x ˛ of total degree d is said to be reduced if xidi divides x ˛ for exactly one i . The extraneous factor is obtained by computing the determinant of the sub-matrix of the coefficient matrix A obtained by deleting rows and columns corresponding to reduced monomials x ˛ : From the relationship of step 5, it suffices for our purpose to solve for the unknown variable hidden in the coefficients of the polynomials fi by obtaining the determinant of the N N square matrix A and equating it to zero neglecting the extraneous factor. This is because the extraneous factor is an integer polynomial and as such not related to the variable in the determinant of A. The existence of the nontrivial solution provides the necessary and sufficient conditions for the vanishing of the determinant. It should be pointed out that there exists several procedures for computing the resultants as exemplified in the works of D. Manocha (1992, 1993, 1994a,b,c) who solves the Multipolynomial resultants using the eigenvalueeigenvector approach, J. F. Canny (1988) who solves the resultant using the characteristic polynomial approach and B. Sturmfels (1994, 1998) who proposes a more compact approach for solving the resultants of a ternary quadric using the Jacobian determinant approach which we present below. Approach based on B. Sturmfels (1998, p.26) formulation: Given three homogeneous equations of degree two as follows F1 WD a11 x 2 C a12 y 2 C a13 z2 C a14 xy C a15 xz C a16 yz D 0 F2 WD a21 x 2 C a22 y 2 C a23 z2 C a24 xy C a25 xz C a26 yz D 0
(15.49)
F3 WD a31 x 2 C a32 y 2 C a33 z2 C a34 xy C a35 xz C a36 yz D 0 we compute the Jacobian determinants of (15.49) by 3 @F1 @F1 @F1 6 @x @y @z 7 7 6 7 6 7 6 6 @F2 @F2 @F2 7 7 J D det 6 6 @x @y @z 7 7 6 7 6 7 6 4 @F3 @F3 @F3 5 @x @y @z 2
(15.50)
which is a cubic polynomial in the coefficients fx; y; zg. Since the determinant polynomial J in (15.50) is a cubic polynomial, its partial derivatives will be quadratic polynomials in variables fx; y; zg and can be written in the form
552
15 Algebraic Solutions of Systems of Equations
@J WD b11 x 2 C b12 y 2 C b13 z2 C b14 xy C b15 xz C b16 yz D 0 @x @J WD b21 x 2 C b22 y 2 C b23 z2 C b24 xy C b25 xz C b26 yz D 0 @y
(15.51)
@J WD b31 x 2 C b32 y 2 C b33 z2 C b34 xy C b35 xz C b36 yz D 0: @z The coefficients bij in (15.51) are quadratic polynomials in aij of equation (15.49). The final step in computing the resultant of the initial system (15.49) involves the computation of the determinant of a 6 6 matrix given by 2
a11 a12 a13 a14 a15 a16
6 6 a21 6 6a 6 31 Res222 .F1 ; F2 ; F3 / D det 6 6 b11 6 6 4 b21
3
7 a22 a23 a24 a25 a26 7 7 a32 a33 a34 a35 a36 7 7 7: b12 b13 b14 b15 b12 7 7 7 b22 b23 b24 b25 b26 5
(15.52)
b31 b32 b33 b34 b35 a36 The resultant (15.52) vanishes if and only if (15.49) have a common solution fx; y; zg; where fx; y; zg are complex numbers or real numbers not all equal zero. In Sect. 15-4, we use the procedure to solve the GPS pseudo-range equations.
15-32 Adjustment of the combinatorial subsets Once the combinatorial minimal subsets have been solved using either the Groebner bases or the Multipolynomial resultant approach, the resulting sets of solution are considered as pseudo-observations. For each combinatorial, the obtained minimal subset solutions considered as pseudo-observations are used as the approximate values to generate the dispersion matrix via the nonlinear error propagation law/variance-covariance propagation (e.g. E. Grafarend and B. Schaffrin, 1993, pp. 469–471) as follows: From the nonlinear geodetic observation equations that have been converted into its algebraic (polynomial) form, the combinatorial minimal subsets will consist of polynomials f1 ; : : : ; fm 2 kŒx1 ; : : : ; xm with fx1 ; : : : ; xm g being the unknown variables (fixed parameters) to be determined and fy1 ; : : : ; yn g the known variables comprising the observations or pseudo-observations. We write the polynomials as
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
9 f1 WD g.x1 ; : : : ; xm ; y1 ; : : : ; yn / D 0 > > > > f2 WD g.x1 ; : : : ; xm ; y1 ; : : : ; yn / D 0 > > > = : > : > > > > : > > ; fm WD g.x1 ; : : : ; xm ; y1 ; : : : ; yn / D 0
553
(15.53)
which is expressed in matrix form as f WD g.x; y/ D 0
(15.54)
where the unknown variables fx1 ; : : : ; xm g are placed in a vector x and the known variables fy1 ; : : : ; yn g are placed in the vector y. Next, we implement the error propagation from the observations (pseudo-observations) fy1 ; : : : ; yn g to the parameters fx1 ; : : : ; xm g that are to be explicitly determined namely characterized by the first moments, the expectation Efxg D x and Efyg D y , as well as the second moments, the variance-covariance matrices/dispersion matrices Dfxg D ˙ x and Dfyg D ˙ y . From E. Grafarend and B. Schaffrin (1993, pp. 470–471), we have up to nonlinear terms 0 1 0 Dfxg D J 1 (15.55) x J y ˙ y J y .J x / with J x ; J y being the partial derivatives of (15.54) with respect to x,y respectively at the Taylor points .x ; y /. The approximate values of unknown parameters fx1 ; : : : ; xm g 2 x appearing in the Jacobi matrices J x ; J y are obtained from Groebner bases or Multipolynomial resultants solution of the nonlinear system of equations (15.53). 1 Given J i D J 1 xi J yi from the i th combination and J j D J xj J yj from the j th combinatorials, the correlation between the i th and j th combination is given by ˙ ij D J j ˙ yj yi J 0i
(15.56)
The sub-matrices variance-covariance matrix for the individual combinatorials ˙ 1 ; ˙ 2 ; ˙ 3 ; : : : ; ˙ k (where k is the number of combinations ) obtained via (15.55) and the correlations between combinatorials obtained from (15.56) form the variance-covariance/dispersion matrix 2
˙ 1 ˙ 12 : 6˙ 6 21 ˙ 2 : 6 ˙3 6 : ˙ D6 6 : 6 4 : : ˙ k1 :
3 : : ˙ 1k : : ˙ 2k 7 7 7 7 7 7 : 7 5 : : ˙k
(15.57)
for the entire k combinations. The obtained dispersion matrix ˙ is then used in the linear Gauss–Markov model to obtain the estimates O of the unknown parameters
554
15 Algebraic Solutions of Systems of Equations
with the combinatorial solution points in a polyhedron considered as pseudoobservations in the vector y of observations while the design matrix A comprises of integer values 1 being the coefficients of the unknowns as in (15.60). In order to understand the adjustment process, we consider the following example. Example 15.14. Consider that one has four observations in three unknowns in a nonlinear problem. Let the observations be given by fy1 ; y2 ; y3 ; y4 g leading to four combinations giving the solutions zI .y1 ; y2 ; y3 /, zII .y2 ; y3 ; y4 /, zIII .y1 ; y3 ; y4 / and zI V .y1 ; y2 ; y4 /. If the solutions are placed in a vector zJ D Œ zI zII zIII zI V 0 ; the adjustment model is then defined as EfzJ g D I 123 31 ; DfzJ g from Variance=Covariance propagation: Let
3 zI 6 zII 7 121 7 n D LzJ subject to zJ WD 6 4 zIII 5 2 R zIV
(15.58)
2
(15.59)
such that the postulations trDf n g D min i.e. “best” and Ef n g D for all n 2 Rm i.e. “uniformly unbiased” holds. We then have from (15.57), (15.58) and (15.59) the result O D .I 0312 ˙ zJ I 123 /I 0312 ˙ 1 zJ zJ
(15.60)
O D argft rDf n g D tr L˙ y L0 D min j UUEg L O of the estimates O is obtained. The shift from The dispersion matrix Dfg arithmetic weighted mean to the use of linear Gauss Markov model is necessitated as we do not readily have the weights of the minimal combinatorial subsets but their dispersion which we obtain via error propagation/Variance-Covariance propagation. If we employ the equivalence theorem of E. Grafarend and B. Schaffrin (1993, pp. 339–341), an adjustment using linear Gauss markov model instead of weighted arithmetic mean is permissible. Example 15.15. (error propagation for planar ranging problem): Let us consider a simple case of the planar ranging problem. From an unknown point P .X; Y / 2 E2 , let distances S1 and S2 be measured to two known points P1 .X1 ; Y1 / 2 E2 and P2 .X2 ; Y2 / 2 E2 respectively. We have the distance equations expressed as " 2 S1 D .X1 X /2 C .Y1 Y /2 (15.61) S22 D .X2 X /2 C .Y2 Y /2
15-3 Algebraic Methods for Solving Nonlinear Systems of Equations
555
which we express in algebraic form (15.53) as 2 4
f1 WD .X1 X /2 C .Y1 Y /2 S12 D 0 f2 WD .X2 X /2 C .Y2 Y /2 S22 D 0
:
(15.62)
On taking total differential of (15.62) we have 2
df1 WD 2.X1 X /dX1 2.X1 X /dX C 2.Y1 Y /d Y1 6 2.Y1 Y /d Y 2S1 dS1 D 0 6 6 6 4 df2 WD 2.X2 X /dX2 2.X2 X /dX C 2.Y2 Y /d Y2 2.Y2 Y /d Y 2S2 dS2 D 0
(15.63)
which on arranging the differential vector of the unknown terms fX; Y g D fx1 ; x2 g 2 x on the left hand side and that of the known terms fX1 ; Y1 ; X2 ; Y2 ; S1 ; S2 g D fy1 ; y2 ; y3 ; y4 ; y5 ; y6 g 2 y on the right hand side leads to 2
dS1
3
6 7 6 dX1 7 6 7 " # 6 dY 7 dX 6 17 Jx D Jy 6 7 6 dS2 7 dY 6 7 6 7 4 dX2 5
(15.64)
d Y2 with 2
@f1 6 @X 6 Jx D 6 4 @f 2 @X 2 @f 1 6 @S1 6 Jy D 6 4 0 " D
3 @f1 # " 2.X1 X / 2.Y1 Y / @Y 7 7 7D 2.X2 X / 2.Y2 Y / @f2 5 @Y and @f1 @X1
@f1 @Y1
0
2S1 2.X1 X / 2.Y1 Y / 0
0
0
0
0
@f2 @S2
@f2 @X2
0
0
0
(15.65)
3
3
7 7 7D @f2 5 @Y2 0
7 7 7 7 7 7 7 7 7 #7 7 5
2S2 2.X2 X / 2.Y2 Y / (15.66)
556
15 Algebraic Solutions of Systems of Equations
If we consider that " Dfxg D ˙ x D
X2 X Y
# (15.67)
YX Y2
and 2
S21
6 6 6 X1 S 1 6 6 6 Y1 S1 Dfyg D ˙ y D 6 6 6 S2 S1 6 6 6 X2 S1 4 Y2 S1
S1 X1 S1 Y1 S1 X2 S1 S2 S1 Y2 X2 1 Y1 X1 S2 X1 X2 X1 Y2 X1
3
7 X1 Y1 X1 S2 X1 X2 X1 Y2 7 7 7 7 2 Y1 Y1 S2 Y1 X2 Y1 Y2 7 7 7 S2 Y1 S22 S2 X2 S2 Y2 7 7 7 2 X2 Y1 X2 S2 X2 X2 Y2 7 5 Y2 Y1 Y2 S2 Y2 X2 Y22
(15.68)
we obtain with (15.64), (15.65) and (15.66) the dispersion (15.55) of the unknown variables fX; Y g D fx1 ; x2 g 2 x. The unknown values of fX; Y g D fx1 ; x2 g 2 x appearing in the Jacobi matrices (15.65) and (15.66) are obtained from Groebner bases or Multipolynomial resultants solution of the nonlinear system of equations (15.61).
15-4 Examples In this section, we present three examples from J.L. Awange and E.W. Grafarend (2005) and J. L. Awange et al. (2010) based on geodetic problems of Minimum Distance Mapping, GPS pseudo-ranging four-point P4P, and the 7-parameter transformation. Example 15.16. (Minimum distance mapping) In order to relate a point P on the Earth’s topographic surface to a point on the international reference ellipsoid E2a;a;b , a bundle of half straight lines so called projection lines which depart from P and intersect E2a;a;b either not at all or in two points are used. There is one projection line which is at minimum distance relating P to p. Figure 15.1 is an illustration of such a minimum distance mapping. Let us formulate such an optimization problem by means of the Lagrangean £.x1 ; x2 ; x3 ; x4 / in Solution 15.1.
15-4 Examples
557
Fig. 15.1 Minimum distance mapping of a point P on the Earth’s topographic surface to a point p on the international reference ellipsoid E2a;a;b
Solution 15.1 (Constraint minimum distance mapping in terms of Cartesian coordinates). 1 1 kX xk2 C x4 b 2 .x12 C x22 / C ax32 a2 b 2 2 2 1 D f.X x1 /2 C .Y x2 /2 C .Z x3 /2 2 C x4 b 2 .x12 C x22 / C ax32 a2 b 2 g
£.x1 ; x2 ; x3 ; x4 / WD
D
x 2 X WD fx 2 R3 j
mi n .x1 ; x2 ; x3 ; x4 /
(15.69)
x12 C x22 x32 C D 1g DW E2a;a;b a2 b2
(15.70)
In the first case, the Euclidean distance between points P and p in terms of Cartesian coordinates of P .X; Y; Z/ and of p.x1 ; x2 ; x3 / is represented. The Cartesian coordinates .x1 ; x2 ; x3 / of the projection point P are unknown. The constraint that the point p is an element of the ellipsoid of revolution E2a;a;b WD fx 2 R3 jb 2 .x12 C x22 / C a2 x23 a2 b 2 D 0; RC 3 a > b 2 RC g is substituted into the Lagrangean by means of the Lagrange multiplier x4 , which is unknown too. f.x1^ ; x2^ ; x3^ ; x4^ / D argf£.x1 ; x2 ; x3 ; x4 / D mi ng is the argument of the minimum of the constrained Lagrangean £.x1 ; x2 ; x3 ; x4 /. The result of the minimization procedure is presented by Lemma 15.10. Equation (15.71) provides the necessary conditions to constitute an extremum: The normal equations are of bilinear type. Products of the unknowns for instance x1 x4 ; x2 x4 ; x3 x4 and
558
15 Algebraic Solutions of Systems of Equations
squares of the unknowns, for instance x12 ; x22 ; x32 appear. Finally the matrix of second derivatives H3 in (15.73) which is positive definite constitutes the sufficient condition to obtain a minimum. Fortunately the matrix of second derivatives H3 is diagonal. Using (15.72i–15.72iv), together with (15.75) leads to (15.76), which are the eigenvalues of the Hesse matrix H3 . These values are 1 D 2 D X nx1^ ; 3 D Znx3^ and must be positive. Lemma 15.10 (Constrained minimum distance mapping). The functional £.x1 ; x2 ; x3 ; x4 / is minimal, if the conditions (15.71) and (15.73) hold. @£ ..x1^ ; x2^ ; x3^ ; x4^ // D 0 @xi
8
i=1,2,3,4:
(15.71)
On taking partial derivatives with respect to xi , we have 2
.i /
@£ D .X x1^ / C b 2 x1^ x4^ D 0 @.x1^ /
6 6 6 6 6 @£ ^ 2 ^ ^ 6 .ii/ ^ D .Y x2 / C b x2 x4 D 0 6 @.x / 2 6 6 6 6 @£ ^ 2 ^ ^ 6 .iii/ 6 ^ D .Z x3 / C a x3 x4 D 0 @.x 6 3 / 6 6 4 1 2 ^2 @£ ^2 2 ^2 2 2 .iv/ ^ D Œb .x1 C x2 / C a x3 a b D 0 @.x4 / 2 @2 £ .x ^ ; x ^ ; x ^ ; x ^ // > 0 @xi @xj 1 2 3 4
8
i,j 2 f1,2,3g:
@2 £ .x^ / @xi @xj 2 3 1 C b 2 x4^ 0 5 D4 0 0 1 C b 2 x4^ 0 0 1 C a2 x4^
(15.72)
(15.73)
H3 WD
2
R33
(15.74)
“eigenvalues” jH3 I3 j D 0
”
(15.75)
15-4 Examples
559
2
X Y 2 ^ 6 1 D 2 WD 1 C b x4 D x ^ D x ^ 6 1 2 4 Z 3 WD 1 C a2 x4^ D ^ x3
(15.76)
Without the various forward and backward reduction steps, we could automatically generate an equivalent algorithm for solving the normal equations (15.72i)– (15.72iv) in a closed form by means of Groebner basis approach. Groebner basis computation of the normal equations (15.72i)–(15.72iv) leads to 14 elements presented in Solution 15.2 interpreted as follows: The first expression is a univariate polynomial of order four (quartic) in the Lagrange multiplier, i.e., 2 c4 x44 C c3 x43 C c2 x42 C c1 x4 C co D 0 6 6 6 c4 D a6 b 6 6 6 6 c3 D .2a6 b 4 C 2a4 b 6 / 6 6 6 c2 D .a6 b 2 C 4a4 b 4 C a2 b 6 a4 b 2 X 2 a4 b 2 Y 2 a2 b 4 Z 2 / 6 6 6 c1 D .2a4 b 2 C 2a2 b 4 2a2 b 2 X 2 2a2 b 2 Y 2 2a2 b 2 Z 2 / 4
(15.77)
co D .a2 b 2 b 2 X 2 b 2 Y 2 a2 Z 2 /: With the admissible values x4 substituted in the linear equations (4),(8),(12) of the computed Groebner basis, i.e., 2
.1 C a2 x4 /x3 Z 4 .1 C b 2 x4 /x2 Y .1 C b 2 x4 /x1 X;
(15.78)
the values .x1 ; x2 ; x3 / D .x; y; z/ are finally produced. Solution 15.2 (Groebner basis MDM solution; see Awange et al. 2010 for further details and examples). (1) 2
a2 b 2 x44 C .2a6 b 4 C 2a4 b 6 /x43 C .a6 b 2 C 4a4 b 4 C a2 b 6 a4 b 2 X 2 a4 b 2 Y 2 6 6 a2 b 4 Z 2 /x 2 C C.2a4 b 2 C 2a2 b 4 2a2 b 2 X 2 2a2 b 2 Y 2 2a2 b 2 Z 2 /x4 4 4 C.a2 b 2 b 2 X 2 b 2 Y 2 a2 Z 2 /: 2 (2)
.a4 Z 2a2 b 2 Z C b 4 Z/x3 a6 b 6 x43 .2a6 b 4 C a4 b 6 /x42 6 6 .a6 b 2 C 2a4 b 4 a4 b 2 X 2 a4 b 2 Y 2 a2 b 4 Z 2 /x4 4 a2 b 4 C a2 b 2 X 2 C a2 b 2 Y 2 C 2a2 b 2 Z 2 b 4 Z 2 :
560
15 Algebraic Solutions of Systems of Equations
(3)
2
.2b 2 Z C b 4 x4 Z a2 Z/x3 C a4 b 6 x43 C .2a4 b 4 C a2 b 6 /x42 6 6 C.a4 b 2 C 2a2 b 4 a2 b 2 X 2 a2 b 2 Y 2 b 4 Z 2 /x4 4 Ca2 b 2 b 2 X 2 b 2 Y 2 2b 2 Z 2 : .1 C a2 x4 /x3 Z
(4) "
(5)
.a4 2a2 b 2 C b 4 /x32 C .2a2 b 2 Z 2b 4 Z/x3 a4 b 6 x42 2a4 b 4 x4 a4 b 2 C a2 b 2 X 2 C a2 b 2 Y 2 C b 4 Z 2 /:
(6)
(7)
2
.2b 2 a2 C b 4 x4 /x32 a2 Zx3 C a4 b 6 x43 C .2a4 b 4 C 2a2 b 6 /x42
6 4 C C .a4 b 2 C 4a2 b 4 a2 b 2 X 2 a2 b 2 Y 2 b 4 Z 2 /x4
C2a2 b 2 2b 2 X 2bY 2 2b 2 Z 2 : " .X 2 C Y 2 /x2 C a2 b 4 Y x42 C Y .a2 b 2 b 2 x32 b 2 Zx3 /x4 CY x32 Y 3 Y Zx3 YX 2 :
(8)
.1 C b 2 x4 /x2 Y
(9)
a2 x3 b 2 x3 C b 2 Z/x2 a2 x3 Y
(10)
Y x1 Xx2
(11) Xx1 C a2 b 4 x42 C .a2 b 2 C b 2 x32 b 2 Zx3 /x4 C x32 Zx3 C Y x2 X 2 Y 2 : (12)
.1 C b 2 x4 /x1 X
(13)
.a2 x3 b 2 x3 C b 2 Z/x1 a2 Xx3
(14)
x12 C a2 b 4 x42 C .2a2 b 2 C b 2 x32 b 2 Zx3 /x4 C 2x32 2Zx3 C x22 X 2 Y 2 :
Example 15.17. (GNSS positioning) E. Grafarend and J. Shan (1996) have defined the GPS pseudo-ranging fourpoint problem (“pseudo 4P”) as the problem of determining the four unknowns comprising the three components of the receiver position and the stationary receiver range bias from four observed pseudo-ranges to four satellite transmitter of given geocentric position. Geometrically, the four unknowns are obtained from the
15-4 Examples
561
intersection of four spherical cones given by the pseudo-ranging equations. Several procedures have been put forward for obtaining closed form solution of the problem. Amongst the procedures include the vectorial approach evidenced in the works of S. Bancroft (1985), P. Singer et al. (1993), H. Lichtenegger (1995) and A. Kleusberg (1994,1999). E. Grafarend and J. Shan (1996) propose two approaches. One approach is based on the inversion of a 3 3 coefficient matrix of a linear system formed by differencing of the nonlinear pseudo-ranging equations in geocentric coordinates, while the other approach uses the coefficient matrix from the linear system to solve the same equations in barycentric coordinates. In this example we present both the approaches of Groebner bases and Multipolynomial resultants (B. Sturmfel 1998 approach) to solve the same problem. We demonstrate our algorithms by solving the GPS Pseudo-ranging four-point problem already solved by A. Kleusberg (1994) and E. Grafarend and J. Shan (1996). Both the Groebner basis and the Multipolynomial resultant solve the same linear equations as those of E. Grafarend and J. Shan (1996) and lead to identical results (see also J. L . Awange and E. Grafarend 2002). We start with the pseudo-ranging equations written algebraically as 2
.x1 a0 /2 C .x2 b0 /2 C .x3 c0 /2 .x4 d0 /2 6 6 .x1 a1 /2 C .x2 b1 /2 C .x3 c1 /2 .x4 d1 /2 6 6 6 .x a /2 C .x b /2 C .x c /2 .x d /2 2 2 2 3 2 4 2 6 1 6 6 .x a /2 C .x b /2 C .x c /2 .x d /2 6 1 3 2 3 3 3 4 3 6 6 where x1 ; x2 ; x3 ; x4 2 6 6 6 6 .a0 ; b0 ; c0 / D .x 0 ; y 0 ; z0 / P 0 6 6 6 .a1 ; b1 ; c1 / D .x 1 ; y 1 ; z1 / P 1 6 6 6 .a2 ; b2 ; c2 / D .x 2 ; y 2 ; z2 / P 2 4
D0 D0 D0 D0 (15.79)
.a3 ; b3 ; c3 / D .x 3 ; y 3 ; z3 / P 3 ˚ with P 0 ; P 1 ; P 2 ; P 3 being the position of the four GPS satellites, their ranges to the stationary receiver at P given by fd0 ; d1 ; d2 ; d3 g. The parameters .fa0 ; b0 ; c0 g ; fa1 ; b1 ; c1 g ; fa2 ; b2 ; c2 g ; fa3 ; b3 ; c3 g ; fd0 ; d1 ; d2 ; d3 g/ are elements of the spherical cone that intersect at P to give the coordinates fx1 ; x2 ; x3 g of the receiver and the stationary receiver range bias x4 . The equations above is expanded and arranged in the lexicographic order fx1 > x2 > x3 > x4 g and re-written with the linear terms on one side and the nonlinear terms on the other as (Awange and Grafarend 2005; Awange et al. 2010)
562
15 Algebraic Solutions of Systems of Equations
2
x12 C x22 C x32 x42 6 2 6 x C x2 C x2 x2 2 3 4 6 1 6 6 x2 C x2 C x2 x2 6 1 2 3 4 4 2 2 2 x1 C x2 C x3 x42
D 2a0 x1 C 2b0 x2 C 2c0 x3 2d0 x4 C d02 a02 b02 c02 D 2a1 x1 C 2b1 x2 C 2c1 x3 2d1 x4 C d12 a12 b12 c12 D 2a2 x1 C 2b2 x2 C 2c2 x3 2d2 x4 C d22 a22 b22 c22 D 2a3 x1 C 2b3 x2 C 2c3 x3 2d3 x4 C d32 a32 b32 c32 : (15.80)
On subtracting (15.80 iv) from (15.80 i), (15.80 ii), and (15.80 iii), to three equations which are linear with four unknowns are derived (see, e.g., Awange et al. 2010, p. 177). Treating the unknown variable x4 as a constant, both Groebner bases and the Multipolynomial resultant techniques are applied to solve the linear system of equation for x1 D g.x4 /; x2 D g.x4 /; x3 D g.x4 /, where g.x4 / is a univariate polynomial (see Awange and Grafarend 2005 or Awange et al 2010, pp. 177–180 for further details). Example 15.18. (7-parameter datum transformation) Consider a case where coordinates have been given in two systems, A and B. For clarity purposes, let us assume the two coordinate systems to be e.g., photo image coordinates in system A and ground coordinates in system B. The ground coordinates fXi ; Yi ; Zi ji; : : : ; ng of the objects are obtained from, say, GPS measurements. Given the photo coordinates fai D xi ; bi D yi ; ci D f ji; : : : ; ng and their equivalent ground coordinates fXi ; Yi ; Zi ji; : : : ; ng, the 7-parameter datum transformation problem concerns itself with determining; (1) The scale parameter x1 2 R (2) Three translation parameters x2 2 R3 (3) The rotation matrix X3 2 R33 comprising three rotation elements. Once the listed unknowns have been determined, coordinates can subsequently be transformed from one system onto another. The nonlinear equations relating these unknowns and coordinates from both systems are given by 2
3 2 3 ai Xi 4 bi 5 D x1 X3 4 Yi 5 C x2 j i D 1; 2; 3; : : : ; n; ci Zi
(15.81)
subject to X03 X3 D I3 : (15.82) In (15.81), fai ; bi ; ci g and fXi ; Yi ; Zi g are the coordinates of the same points, e.g., in both photo and ground coordinate systems respectively. The determination of the unknowns x1 2 R, x2 2 R3 , X3 2 R33 require a minimum of three points in both
15-5 Notes
563
systems whose coordinates are known. Owing to the nonlinearity of (15.81), the solutions have always been obtained using a least squares approach iteratively. With this approach, (15.81) is first linearized and some initial approximate starting values of the unknown parameters used. The procedure then iterates, each time improving on the solutions of the preceding iteration step. This is done until a convergence criteria is achieved. Where the rotation angles are small e.g., in photogrammetry, the starting values of zeros are normally used. In other fields such as geodesy, the rotation angles are unfortunately not small enough to be initialized by zeros, thereby making the solution process somewhat difficult and cumbersome. Bad choices of initial starting values often lead to many iterations being required before the convergence criteria is achieved. In some cases, where the initial starting values are far from those of the unknown parameters, iteration processes may fail to converge. With these uncertainties in the initial starting values, the cumbersomeness of the linearization and iterations, procedures that would offer an exact solution to the 7-parameter datum transformation problem would be desirable. To answer this challenge, we propose algebraic approaches whose advantages over the approximate numerical methods have already been mentioned. Apart from the computational difficulties associated with numerical procedures, the 7-parameter datum transformation problem poses another challenge to existing algorithms. This is, the incorporation of the variance-covariance (weight) matrices of the two systems involved. In practice, users have been forced to rely on iterative procedures and linearized least squares solutions which are incapable of incorporating the variance-covariance matrices of both systems in play. This challenge is addressed algebraically in J.L. Awange and E.W. Grafarend (2005) or J.L. Awange et al. (2010, Chap. 17).
15-5 Notes The current known techniques for solving nonlinear polynomial equations can be classified into symbolic, numeric and geometric methods (D. Manocha 1994c). Symbolic methods, discussed in this chapter for solving closed form geodetic problems, apply the Groebner bases and the Multipolynomial resultants techniques to eliminate several variables in a multivariate system of equations in such a manner that the end product often consist of univariate polynomials whose roots can be determined by existing programs, however, such as the roots command in MATLAB. The current available programs however are efficient only for sets of low degree polynomial systems consisting of upto three to four polynomials due to the fact that computing the roots of the univariate polynomials can be ill conditioned for polynomials of degree greater than 14 or 15 (D. Manocha 1994c). Elaborate literature on Groebner bases can be found in the works of B. Buchberger (1965, 1970), J. H. Davenport et al. (1988, pp. 95–103), F. Winkler (1996), D. Cox et al. (1997, pp.47-99), H. M. Mller (1998), W. V. Vasconcelos
564
15 Algebraic Solutions of Systems of Equations
(1998), T. Becker and V. Weispfenning (1993,1998), B. Sturmfels (1996), G. Pistone and H. P. Wynn (1996), D. A. Cox (1998) and D. Cox et al.(1998, pp. 1–50), while literature on Multipolynomial resultants procedure include the works of G. Salmon (1876), F. Macaulay (1902, 1921), A. L. Dixon (1908), B. L. van Waerden (1950), C. Bajaj et al. (1988), J. F. Canny (1988), J. F. Canny et al. (1989), I. M. Gelfand et al. (1990), J. Weiss (1993), D. Manocha (1992, 1993, 1994a,b,c, 1998), D. Manocha and J. F. Canny (1991, 1992, 1993), I. M. Gelfand et al. (1994), G. Lyubeznik (1995), S. Krishna and D. Manocha (1995), J. Guckenheimer et al.(1997), B. Sturmfels (1994, 1998), E. Cattani et al. (1998) and D. Cox et al. (1998, pp.71–122). Besides the Groebner bases and resultant techniques, there exists another approach for variable elimination developed by WU Wen Ts¨un (W. T. Wu 1984) using the ideas proposed by J. F. Ritt (1950). This approach is based on Ritts characteristic set construction and was successfully applied to automated geometric theorem by Wu. This algorithm is referred by X. S. Gao and S. C. Chou (1990) as the Ritt-Wu’s algorithm (D. Manocha and F. Canny 1993). C. L. Cheng and J. W. Van Ness (1999) have presented polynomial measurement error models. Numeric methods for solving polynomials can be grouped into iterative and homotopy methods. For homotopy we refer to A. P. Morgan (1992). Also in this category are geometric methods which have found application in curve and surface intersection whose convergence are however said to be slow (D. Manocha 1994c). In general, for low degree curve intersection, the algebraic methods have been found to be the fastest in practice. In Sect. 15-312 and 15-313 we present in a nut shell the theories of Groebner bases and Multipolynomial resultants. The problem of nonlinear adjustment in Geodesy as in other fields continues to attract more attention from the modern day researchers as evidenced in the works of R. Mautz (2001) and L. Guolin (2000) who presents a procedure that tests using the F-distribution whether a nonlinear model can be linearized or not. The solution to the minimization problem of the nonlinear Gauss–Markov model unlike its linear counter part does not have a direct method for solving it and as such, always relies on the iterative procedures such as the Steepest-descent method, Newton’s method and the Gauss-Newton’s method discussed by P. Teunissen (1990). In particular, P. Teunissen (1990) recommends the Gauss-Newton’s method as it exploits the structure of the objective function (sum of squares) that is to be minimized. P. Teunissen and E. H. Knickmeyer (1988) considers in a statistical way how the nonlinearity of a function manifests itself during the various stages of adjustment. E. Grafarend and B. Schaffrin (1989, 1991) while extending the work of T. Krarup (1982) on nonlinear adjustment with respect to geometric interpretation have presented the necessary and sufficient conditions for least squares adjustment of nonlinear Gauss–Markov model and provided the geometrical interpretation of these conditions. Other geometrical approaches include the works of G. Blaha and R. P. Besette (1989) and K. Borre and S. Lauritzen (1983) while non geometrically treatment of nonlinear problems have been presented by K. K. Kubik (1967), T. Saito (1973), H. J. Schek and P. Maier (1976), A. Pope (1982) and H. G. B¨ahr (1988). A comprehensive review to the iterative procedures for solving the nonlinear
15-5 Notes
565
equations is presented in the work of P. Lohse (1994). M. Gullikson and I. Sderkvist (1995) have developed algorithms for fitting surfaces which have been explicitly or implicitly defined to some measured points with negative weights being acceptable by the algorithm. Our approach in the present chapter goes back to the work of C. F. Gauss (Published posthumously e.g. Awange et al. 2010) and C. G. I. Jacobi (1841). Within the framework of arithmetic mean, C. F. Gauss and C. G. I. Jacobi (1841) suggest that given n linear(ized) observation equations with m unknowns .n > m/; combinations, each consisting of m equations be solved for the unknown elements and the weighted arithmetic mean be applied to get the final solution. Whereas C.F. Gauss proposes weighting by using the products of the square of the measured distances from the unknown point to known points and the distances of the side of the error triangles, C. G. I. Jacobi (1841) proposed the use of the square of the determinants as weights. In tracing the method of least squares to the arithmetic mean, A. T. Hornoch (1950) shows that the weighted arithmetic mean proposed by C. G. I. Jacobi (1841) leads to the least squares solution only if the weights used are the actual weights and the pseudo-observations formed by the combinatorial pairs are uncorrelated. Using a numerical example, S. Wellisch (1910, pp. 41–49) has shown the results of least squares solution to be identical to those of the Gauss– Jacobi combinatorial algorithm once proper weighting is applied. P. Werkmeister (1920) illustrated that for planar resection case with three directional observations from the unknown point to three known stations, the area of the triangle (error figure) formed by the resulting three combinatorial coordinates of the new point is proportional to the determinant of the dispersion matrix of the coordinates of the new station. In this chapter, these concepts of the combinatorial linear adjustment were extended to nonlinear adjustment. Awange and Grafarend (2005) illustrated using a leveling network and planar ranging problem that the results of Gauss–Jacobi combinatorial algorithm are identical to those of linear Gauss–Markov model if the actual variance-covariance matrices are used. To test the algebraic computational tools of Groebner bases and Multipolynomial resultants, geodetic problems of Minimum Distance Mapping and GNSS pseudo-ranging four-point P4P, and 7-parameter transformation were solved. The procedures are, however, not limited only to these three examples but have been shown, e.g., in J.L . Awange and E.W. Grafarend (2005) and J. L . Awange et al. (2010) to be applicable to many problems in geodesy and geoinformatics. An example is the classical problem of resection. In general, the search towards the solution of the three-dimensional resection problem traces its origin to the work of a German mathematician J. A. Grunert (1841) whose publication appeared in the year 1841. J. A. Grunert (1841) solved the three-dimensional resection problem – what was then known as the “Pothenot’s” problem – in a closed form by solving an algebraic equation of degree four. The problem had hitherto been solved by iterative means mainly in Photogrammetry and Computer Vision. Procedures developed later for solving the three-dimensional resection problem revolved around the improvements of the approach of J. A. Grunert (1841) with the aim of searching
566
15 Algebraic Solutions of Systems of Equations
for the optimal means of distance determination. Whereas J. A. Grunert (1841) solves the problem by a substitution approach in three steps, the more recent desire has been to solve the distance equations in fewer steps, as exemplified in the works of S. Finsterwalder and W. Scheufele (1937), E. L. Merritt (1949), M. A. Fischler and R. C. Bolles (1981), S. Linnainmaa et al. (1988) and E. Grafarend, P. Lohse and B. Schaffrin (1989). Other research on the subject of resection includes the works of F. J. Müller (1925), E. Grafarend and J. Kunz (1965), R. Horaud et al. (1989), P. Lohse (1990), and E. Grafarend and J. Shan (1997a, 1997b). An extensive review of some of the procedures above is presented by F. J. Müller (1925) and R. M. Haralick et al. (1991, 1994). R. M. Haralick et al. (1994) reviewed the performance of six direct solutions (J. A. Grunert 1841; S. Finsterwalder and W. Scheufele 1937; E. L. Merritt 1949; M. A. Fischler and R. C. Bolles 1981; Linnainmaa et al. 1988; E. Grafarend, P. Lohse and B. Schaffrin 1989) with the aim of evaluating the numerical stability of the solutions and the effect of permutation within the various solutions. All six approaches follow the outline adopted by J. A. Grunert (1841), the major differences being the change of variables, the reduction of these variables, and the combination of different pairs of equations. The study revealed that the higher the order of the polynomials, the more complex the computations became and thus the less accurate the solutions were, owing to numerical instabilities. Consequently, S. Finsterwalder's (SF) procedure, which solves a third order polynomial, is ranked first; the solutions of J. A. Grunert (JG), Fischler and Bolles (FB), and Grafarend et al. (GLS) are ranked second; and the solution of Linnainmaa et al., which generates an eighth order polynomial, is ranked third. Though it does not solve the eighth order polynomial, the complexity of that polynomial is still found to be higher than those of the other procedures. An amazing result is that of Merritt's procedure, which is ranked last despite the fact that it leads to a fourth order polynomial and is similar to Grunert's approach except for the pairs of equations used. R. M. Haralick et al. (1994) attribute the poor performance of Merritt's procedure to the conversion procedure adopted by E. L. Merritt (1949) in reducing the equations from fourth to third order. For the planar resection problem, solutions have been proposed, e.g., by D. Werner (1913), G. Brandstätter (1974) and J. van Mierlo (1988). The overdetermined planar resection problem has been treated graphically by E. Hammer (1896), C. Runge (1900), P. Werkmeister (1916) and P. Werkmeister (1920). E. Gotthardt (1940, 1974) dealt with the overdetermined two-dimensional resection where more than four points were considered, with the aim of studying the critical configuration that would yield a solution. The work was later extended by K. Killian (1990). A special case of an overdetermined two-dimensional resection has also been considered by H. G. Bähr (1991), who uses six known stations and proposes the measurement of three horizontal angles which are related to the two unknown coordinates by nonlinear equations. By adopting approximate coordinates of the unknown point, an iterative adjustment procedure is performed to get improved two-dimensional coordinates of the unknown point. It should be noted that the procedure is based on the coordinate system of the six known stations. K. Rinner (1962) has also contributed to the problem of overdetermined two-dimensional resection.
In order to relate a point on the Earth's topographical surface uniquely (one-to-one) to a point on the International Reference Ellipsoid, E. Grafarend and P. Lohse (1991) have proposed the use of the Minimum Distance Mapping (i.e., Example 15.16). Other procedures that have been proposed are either iterative, approximate "closed", closed, or of higher order equations. Iterative procedures include the works of N. Bartelme and P. Meissl (1975), W. Benning (1987), K. M. Borkowski (1987, 1989), N. Croceto (1993), A. Fitzgibbon et al. (1999), T. Fukushima (1999), W. Gander et al. (1994), B. Heck (1987), W. Heiskannen and H. Moritz (1976), R. Hirvonen and H. Moritz (1963), P. Loskowski (1991), K. C. Lin and J. Wang (1995), M. K. Paul (1973), L. E. Sjoeberg (1999), T. Soler and L. D. Hothem (1989), W. Torge (1976), T. Vincenty (1978) and R. J. You (2000). Approximate "closed" procedures include B. R. Bowring (1976, 1985), A. Fotiou (1998), B. Hofman-Wellenhof et al. (1992), M. Pick (1985) and T. Vincenty (1980), while closed procedures include W. Benning (1987), H. Fröhlich and H. H. Hansen (1976), E. Grafarend et al. (1995) and M. Heikkinen (1982). Procedures that require the solution of higher order equations include M. Lapaine (1990), M. J. Ozone (1985), P. Penev (1978), H. Sünkel (1976), and P. Vaniceck and E. Krakiwski (1982). In this chapter, the Minimum Distance Mapping problem is solved using the Groebner bases approach. A univariate polynomial of fourth order is obtained together with 13 other elements of the Groebner basis. The obtained univariate polynomial and the linear terms are compared to those of E. Grafarend and P. Lohse (1991). Other references include E. Grafarend and W. Keller (1995) and K. Mukherjee (1996). The GNSS four-point pseudo-ranging problem presented in Example 15.17 is concerned with the determination of the coordinates of a stationary receiver together with its range bias. Several procedures have been put forward for obtaining a closed form solution of the problem. Among them is the vectorial approach, as evidenced in the works of L. O. Krause (1987), J. S. Abel and J. W. Chaffee (1991), P. Singer et al. (1993), J. W. Chaffee and J. S. Abel (1994), H. Lichtenegger (1995) and A. Kleusberg (1994, 1999). E. Grafarend and J. Shan (1996) propose two approaches: one is based on a closed form solution of the nonlinear pseudo-ranging equations in geocentric coordinates, while the other solves the same equations in barycentric coordinates. S. C. Han et al. (2001) have developed an algorithm for very accurate absolute positioning through Global Positioning System (GPS) satellite clock estimation, while S. Bancroft (1985) provides an algebraic closed form solution of the overdetermined GPS pseudo-ranging observations. In this chapter, the GPS four-point pseudo-ranging problem is solved using Groebner bases and Multipolynomial resultants. For literature on three-dimensional positioning models, we refer to E. W. Grafarend (1975), V. Ashkenazi and S. Grist (1982), G. W. Hein (1982a, 1982b), J. Zaiser (1986), F. W. O. Aduol (1989), U. Klein (1997) and S. M. Musyoka (1999). In J. L. Awange and E. W. Grafarend (2005) and J. L. Awange et al. (2010), two case studies are presented: the conversion of the geocentric GPS points of the Baltic Sea Level Project into Gauss–Jacobi ellipsoidal coordinates, and the determination of the seven datum transformation parameters. Datum transformation
models have been dealt with, e.g., in E. Grafarend and F. Okeke (1998), F. Krumm and F. Okeke (1995) and E. Grafarend and R. Syffus (1998). In summary, this chapter has demonstrated the power of the algebraic computational tools of Groebner bases and the Multipolynomial resultants in solving selected geodetic problems. In the case of the minimal combinatorial set, we have demonstrated how the problems of minimum distance mapping (Example 15.16) and the pseudo-ranging four-point problem (Example 15.17) can be solved in closed form using either the Groebner bases approach or the Multipolynomial resultants approach. We have succeeded in demonstrating that by converting the nonlinear observation equations of the selected geodetic problems above into algebraic (polynomial) form, the multivariate system of polynomial equations relating the unknown variables (indeterminates) to the known variables can be reduced to polynomial equations containing a univariate polynomial once a lexicographic monomial ordering is specified. We have therefore managed to provide symbolic solutions to the problems of the 7-parameter transformation, the minimum distance mapping and the GPS pseudo-ranging four-point problem (P4P) by obtaining in each case a univariate polynomial that can readily be solved numerically once the observations are available. Although the algebraic techniques of Groebner bases and Multipolynomial resultants were applied to these selected geodetic problems, the tools can be used to solve explicitly any closed form problem in Geodesy. The only limitation may be the storage and computational speed of the computers when compounded with problems involving many variables. The ability of the algebraic tools (Groebner bases and the Multipolynomial resultants) to deliver closed form solutions gave the Gauss–Jacobi combinatorial algorithm the required operational engine, as evidenced in the solution of the overdetermined GPS pseudo-ranging problem in J. L. Awange and E. W. Grafarend (2005) and J. L. Awange et al. (2010), where the solutions are obtained without reverting to iterative or linearization procedures. The results of J. L. Awange and E. W. Grafarend (2005) and J. L. Awange et al. (2010) compared well with the solutions obtained using the linearized Gauss–Markov model, giving legitimacy to the Gauss–Jacobi combinatorial procedure. For the case studies (see, e.g., J. L. Awange and E. W. Grafarend (2005) and J. L. Awange et al. (2010)), the Groebner bases algorithm successfully determines explicitly the 7 parameters of the datum transformation problem and shows the scale parameter to be represented by a univariate polynomial of fourth order, while the rotation parameters are represented by linear functions comprising the coordinates of the two systems and the scale parameter. The admissible value of the scale is readily obtained by solving the univariate polynomial and restricting the scale to a positive real value $x_1 \in \mathbb{R}^+$. This eliminates the negative components of the roots and the complex values. The admissible value $x_1 \in \mathbb{R}^+$ of the scale parameter is chosen and substituted in the linear functions that characterize the three elements of the skew-symmetric matrix $S$, leading to the solution of the elements of the rotation matrix $X_3$. The translation elements are then deduced from the transformation
equation. The advantage of using the Groebner bases algorithm is the fact that there is no requirement for prior knowledge of approximate values of the 7 transformation parameters, as is usually the case. The Groebner bases algorithm also solved the Minimum Distance Mapping problem and in so doing enabled the mapping of points from the topographical surface to the International Reference Ellipsoid of Revolution. The univariate polynomial obtained was identical to that obtained by Grafarend and Lohse (1991). This implies that the algebraic tools of Groebner bases and the Multipolynomial resultants can also be used to check the validity of existing closed form procedures in Geodesy. The Gauss–Jacobi combinatorial algorithm highlighted one important fact while solving the overdetermined 7-parameter transformation problem: the stochasticity of both systems involved can be taken into account. This has been the bottleneck of the problem of the 7 datum transformation parameters.
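To make the reduction to a univariate polynomial concrete, consider a minimal sketch using sympy, with a deliberately simple stand-in system of two polynomial "observation equations" (two circles) in place of the actual Minimum Distance Mapping or pseudo-ranging equations of this chapter; under a lexicographic monomial ordering the Groebner basis contains a polynomial in one unknown only, which can then be solved and back-substituted.

```python
# Minimal sketch: lexicographic Groebner basis reduction of a small
# polynomial system to a univariate polynomial. The system below is an
# illustrative stand-in, not the geodetic equations of this chapter.
from sympy import groebner, solve, symbols

x, y = symbols('x y')
f1 = x**2 + y**2 - 4          # stand-in observation equation 1
f2 = (x - 1)**2 + y**2 - 1    # stand-in observation equation 2

# lex order with y > x eliminates y first
G = groebner([f1, f2], y, x, order='lex')
print(G.exprs)                # [y**2, x - 2]; the last element is univariate
print(solve(G.exprs[-1], x))  # [2]; back-substitution then yields y = 0
```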
Appendix A
Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
The key word "algebra" is derived from the name and the work of the ninth-century scientist Mohammed ibn Mûsâ al-Khowârizmi, who was born in what is now Uzbekistan and worked in Bagdad at the court of Harun al-Rashid's son. "al-jabr" appears in the title of his book Kitab al-jabr wa'l muqabala, where he discusses symbolic methods for the solution of equations (K. Vogel: Mohammed Ibn Musa Alchwarizmi's Algorismus. Das früheste Lehrbuch zum Rechnen mit indischen Ziffern, 1461, Otto Zeller Verlagsbuchhandlung, Aalen 1963). Accordingly, what is an algebra, and how is it tied to the notion of a vector space and a tensor space, respectively?

By an algebra we mean a set $S$ of elements and a finite set $M$ of operations. Each operation $(\mathrm{opera})_k$ is a single-valued function assigning to every finite ordered sequence $(x_1, \ldots, x_n)$ of $n = n(k)$ elements of $S$ a value $(\mathrm{opera})_k(x_1, \ldots, x_n) = x_l$ in $S$. In particular, for $(\mathrm{opera})_k(x_1, x_2)$ the operation is called binary, for $(\mathrm{opera})_k(x_1, x_2, x_3)$ ternary, and in general for $(\mathrm{opera})_k(x_1, \ldots, x_n)$ $n$-ary. For a given set of operation symbols $(\mathrm{opera})_1, (\mathrm{opera})_2, \ldots, (\mathrm{opera})_k$ we define a word. In linear algebra the set $M$ has basically two elements, namely two internal relations: $(\mathrm{opera})_1$, worded "addition" (including the inverse of addition: subtraction), and $(\mathrm{opera})_2$, worded "multiplication" (including the inverse of multiplication: division). Here the elements of the set $S$ are vectors over the field $\mathbb{R}$ of real numbers as long as we refer to linear algebra. In contrast, in multilinear algebra the elements of the set $S$ are tensors over the field of real numbers $\mathbb{R}$. Only later are modules, as generalizations of the vectors of linear algebra, introduced, in which the "scalars" are allowed to be from an arbitrary ring rather than the field $\mathbb{R}$ of real numbers. Let us assume that you as a potential reader are in some way familiar with the elementary notion of a threedimensional vector space $X$ with elements called vectors $x \in \mathbb{R}^3$, namely the intuitive space "we locally live in". Such an elementary vector space $X$ is equipped with a metric, to be referred to as threedimensional Euclidean.
As a real threedimensional vector space we are going to give it a linear and multilinear algebraic structure. In the context of structure mathematics based upon

(a) Order structure
(b) Topological structure
(c) Algebraic structure

an algebra is constituted if at least two relations are established, namely one internal and one external. We start with multilinear algebra, in particular with the multilinearity of the tensor product, before we go back to linear algebra, in particular to Clifford algebra.
A-1 Multilinear Functions and the Tensor Space $T^p_q$
Let $X$ be a finite dimensional linear space, e.g. a vector space over the field $\mathbb{R}$ of real numbers; in addition denote by $X^*$ its dual space, such that $n = \dim X = \dim X^*$. Complex, quaternion and octonian numbers $\mathbb{C}$, $\mathbb{H}$ and $\mathbb{O}$ as well as rings will only be introduced later in the context. For $p, q \in \mathbb{Z}^+$ being elements of the positive integer numbers we introduce $T^p_q(X, X^*)$ as the $p$-contravariant, $q$-covariant tensor space or space of multilinear functions
\[
f: \underbrace{X^* \times \cdots \times X^*}_{p} \times \underbrace{X \times \cdots \times X}_{q} \to \mathbb{R}.
\]
If we assume $x_1, \ldots, x_p \in X$ and $x^1, \ldots, x^q \in X^*$, then $x_1 \otimes \cdots \otimes x_p \otimes x^1 \otimes \cdots \otimes x^q \in T^p_q(X, X^*)$ holds. Multilinearity is understood as linearity in each variable. "$\otimes$" identifies the tensor product, the Cartesian product of elements $(x_1, \ldots, x_p, x^1, \ldots, x^q)$.
Example A.1 (Bilinearity of the tensor product $x_1 \otimes x_2$):
For every $x_1, x_2 \in X$, $x, y \in X$ and $r \in \mathbb{R}$ bilinearity implies
\[
\begin{aligned}
(x + y) \otimes x_2 &= x \otimes x_2 + y \otimes x_2 &&\text{(internal left-linearity)}\\
x_1 \otimes (x + y) &= x_1 \otimes x + x_1 \otimes y &&\text{(internal right-linearity)}\\
(r x) \otimes x_2 &= r\,(x \otimes x_2) &&\text{(external left-linearity)}\\
x_1 \otimes (r y) &= r\,(x_1 \otimes y) &&\text{(external right-linearity)}.
\end{aligned}
\]
The generalization of the bilinearity of $x_1 \otimes x_2 \in T^2_0$ to the multilinearity of $x_1 \otimes \cdots \otimes x_p \otimes x^1 \otimes \cdots \otimes x^q \in T^p_q$ is obvious.

Definition A.1 (multilinearity of the tensor space $T^p_q$): For every $x_1, \ldots, x_p \in X$ and $x^1, \ldots, x^q \in X^*$ as well as $u, v \in X$, $x, y \in X^*$ and $r \in \mathbb{R}$ multilinearity implies
\[
\begin{aligned}
(u + v) \otimes x_2 \otimes \cdots \otimes x_p \otimes x^1 \otimes \cdots \otimes x^q
&= u \otimes x_2 \otimes \cdots \otimes x_p \otimes x^1 \otimes \cdots \otimes x^q\\
&\quad + v \otimes x_2 \otimes \cdots \otimes x_p \otimes x^1 \otimes \cdots \otimes x^q
&&\text{(internal left-linearity)}\\
x_1 \otimes \cdots \otimes x_p \otimes (x + y) \otimes x^2 \otimes \cdots \otimes x^q
&= x_1 \otimes \cdots \otimes x_p \otimes x \otimes x^2 \otimes \cdots \otimes x^q\\
&\quad + x_1 \otimes \cdots \otimes x_p \otimes y \otimes x^2 \otimes \cdots \otimes x^q
&&\text{(internal right-linearity)}\\
(r u) \otimes x_2 \otimes \cdots \otimes x_p \otimes x^1 \otimes \cdots \otimes x^q
&= r\,(u \otimes x_2 \otimes \cdots \otimes x_p \otimes x^1 \otimes \cdots \otimes x^q)
&&\text{(external left-linearity)}\\
x_1 \otimes \cdots \otimes x_p \otimes (r x) \otimes x^2 \otimes \cdots \otimes x^q
&= r\,(x_1 \otimes \cdots \otimes x_p \otimes x \otimes x^2 \otimes \cdots \otimes x^q)
&&\text{(external right-linearity)}.
\end{aligned}
\]
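In coordinates the tensor product of two vectors is the dyadic (outer) product introduced below, so the linearity rules of Definition A.1 can be checked numerically. A minimal sketch in Python (our own illustration, not part of the original exposition):

```python
# Numerical check of two of the (multi)linearity rules of Definition A.1,
# using the dyadic (outer) product as the coordinate form of the tensor
# product of two vectors; illustrative sketch only.
import numpy as np

x1, x2 = np.random.rand(3), np.random.rand(3)
x, y = np.random.rand(3), np.random.rand(3)
r = 2.5

# internal left-linearity: (x + y) (x) x2 = x (x) x2 + y (x) x2
assert np.allclose(np.outer(x + y, x2), np.outer(x, x2) + np.outer(y, x2))
# external right-linearity: x1 (x) (r y) = r (x1 (x) y)
assert np.allclose(np.outer(x1, r * y), r * np.outer(x1, y))
```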
A possible way to visualize the different multilinear functions which span $\{T^0_0, T^1_0, T^0_1, T^2_0, T^1_1, T^0_2, \ldots, T^p_q\}$ is to construct a hierarchical diagram or a special tree as follows.

(Tree of tensor spaces: the root $\{0\} \in T^0_0$ branches into $x_1 \in T^1_0$ and $x^1 \in T^0_1$; these branch further into $x_1 \otimes x_2 \in T^2_0$, $x_1 \otimes x^1 \in T^1_1$ and $x^1 \otimes x^2 \in T^0_2$; and on the next level into $x_1 \otimes x_2 \otimes x_3 \in T^3_0$, $x_1 \otimes x_2 \otimes x^1 \in T^2_1$, $x_1 \otimes x^1 \otimes x^2 \in T^1_2$ and $x^1 \otimes x^2 \otimes x^3 \in T^0_3$.)
When we first learnt about the tensor product symbolised by "$\otimes$" as well as its multilinearity, we were left with the problem of developing an intuitive understanding of $x_1 \otimes x_2$, $x_1 \otimes x^1$ and higher order tensor products. Perhaps it is helpful to represent "the involved vectors" in a contravariant or in a covariant basis. For instance,
\[
x_1 = e^1 x_1 + e^2 x_2 + e^3 x_3 \quad\text{or}\quad x^1 = e_1 x^1 + e_2 x^2 + e_3 x^3
\]
is a left representation of a three-dimensional vector in a 3-left basis $\{e^1, e^2, e^3\}_l$ of contravariant type or in a 3-left basis $\{e_1, e_2, e_3\}_l$ of covariant type. Think in terms of $x_1$ or $x^1$ as a three-dimensional position vector with right coordinates $\{x_1, x_2, x_3\}$ or $\{x^1, x^2, x^3\}$, respectively. Since the intuitive algebra of vectors is commutative, we may also represent the three-dimensional vector in a 3-right basis $\{e^1, e^2, e^3\}_r$ of contravariant type or in a 3-right basis $\{e_1, e_2, e_3\}_r$ of covariant type, such that
\[
x_1 = x_1 e^1 + x_2 e^2 + x_3 e^3 \quad\text{or}\quad x^1 = x^1 e_1 + x^2 e_2 + x^3 e_3
\]
is a right representation of a three-dimensional position vector, which coincides with its left representation thanks to commutativity. Further on, the tensor product $x \otimes y$ enjoys the left and right representations
\[
(e^1 x_1 + e^2 x_2 + e^3 x_3) \otimes (e^1 y_1 + e^2 y_2 + e^3 y_3) = \sum_{i=1}^{3}\sum_{j=1}^{3} e^i \otimes e^j\, x_i y_j
\]
and
\[
(x_1 e^1 + x_2 e^2 + x_3 e^3) \otimes (y_1 e^1 + y_2 e^2 + y_3 e^3) = \sum_{i=1}^{3}\sum_{j=1}^{3} x_i y_j\, e^i \otimes e^j,
\]
which coincide again since we assumed a commutative algebra of vectors. The product of coordinates $(x_i y_j)$, $i, j \in \{1, 2, 3\}$, is often called the dyadic product. Please do not miss the alternative covariant representation of the tensor product $x \otimes y$, which we introduced so far in the contravariant basis, namely
\[
(e_1 x^1 + e_2 x^2 + e_3 x^3) \otimes (e_1 y^1 + e_2 y^2 + e_3 y^3) = \sum_{i=1}^{3}\sum_{j=1}^{3} e_i \otimes e_j\, x^i y^j
\]
of left type and
\[
(x^1 e_1 + x^2 e_2 + x^3 e_3) \otimes (y^1 e_1 + y^2 e_2 + y^3 e_3) = \sum_{i=1}^{3}\sum_{j=1}^{3} x^i y^j\, e_i \otimes e_j
\]
of right type. In a similar way we produce
\[
x \otimes y \otimes z = \sum_{i,j,k=1}^{3,3,3} e^i \otimes e^j \otimes e^k\, x_i y_j z_k = \sum_{i,j,k=1}^{3,3,3} x_i y_j z_k\, e^i \otimes e^j \otimes e^k
\]
of contravariant type and
\[
x \otimes y \otimes z = \sum_{i,j,k=1}^{3,3,3} e_i \otimes e_j \otimes e_k\, x^i y^j z^k = \sum_{i,j,k=1}^{3,3,3} x^i y^j z^k\, e_i \otimes e_j \otimes e_k
\]
of covariant type. "Mixed covariant-contravariant" representations of the tensor product $x_1 \otimes y^1$ are
\[
x_1 \otimes y^1 = \sum_{i=1}^{3}\sum_{j=1}^{3} e^i \otimes e_j\, x_i y^j = \sum_{i=1}^{3}\sum_{j=1}^{3} x_i y^j\, e^i \otimes e_j
\]
or
\[
x^1 \otimes y_1 = \sum_{i=1}^{3}\sum_{j=1}^{3} e_i \otimes e^j\, x^i y_j = \sum_{i=1}^{3}\sum_{j=1}^{3} x^i y_j\, e_i \otimes e^j.
\]
In addition, we have to explain the notation $x_1 \in X$, $x^1 \in X^*$. While the vector $x_1$ is an element of the vector space $X$, $x^1$ is an element of its dual space. What is a dual space? Indeed, the dual space $X^*$ is the space of linear functions over the elements of $X$. For instance, if the vector space $X$ is equipped with an inner product, namely $\langle e_i \mid e_j \rangle = g_{ij}$, $i, j \in \{1, 2, 3\}$, with respect to the base vectors $\{e_1, e_2, e_3\}$ which span the vector space $X$, then
\[
e^j = \sum_{i=1}^{3} e_i\, g^{ij}
\]
transforms the covariant base vectors $\{e_1, e_2, e_3\}$ into the contravariant base vectors $\{e^1, e^2, e^3\} = \operatorname{span} X^*$ by means of $[g^{ij}] = G^{-1}$, the inverse of the matrix $[g_{ij}] = G \in \mathbb{R}^{3 \times 3}$. Similarly, the coordinates $g_{ij}$ of the metric tensor $g$ are used for "raising" or "lowering" the indices of the coordinates $x^i$, $x_j$, respectively, for instance
\[
x^i = \sum_{j=1}^{3} g^{ij} x_j, \qquad x_i = \sum_{j=1}^{3} g_{ij} x^j.
\]
In a finite dimensional vector space the power of a linear space $X$ and its dual $X^*$ does not show up. In contrast, in an infinite dimensional vector space $X$ the dual space $X^*$ is the space of linear functionals, which play an important role in functional analysis. While through the tensor product "$\otimes$", which operated on vectors, e.g. $x \otimes y$, we constructed the $p$-contravariant, $q$-covariant tensor space or space of multilinear functions $T^p_q(X, X^*)$, e.g. $T^2_0$, $T^0_2$, $T^1_1$, we shall generalise the representation of the elements of $T^p_q$ by means of
\[
f = \sum_{i_1, \ldots, i_p = 1}^{n = \dim X} e^{i_1} \otimes \cdots \otimes e^{i_p}\, f_{i_1 \ldots i_p} \in T^p_0
\]
\[
f = \sum_{i_1, \ldots, i_q = 1}^{n = \dim X} e_{i_1} \otimes \cdots \otimes e_{i_q}\, f^{i_1 \ldots i_q} \in T^0_q
\]
\[
f = \sum_{i_1, \ldots, i_p = 1}^{n = \dim X} \sum_{j_1, \ldots, j_q = 1}^{n = \dim X} e^{i_1} \otimes \cdots \otimes e^{i_p} \otimes e_{j_1} \otimes \cdots \otimes e_{j_q}\, f^{i_1, \ldots, i_p}_{j_1, \ldots, j_q} \in T^p_q,
\]
for instance
\[
f = \sum_{i,j=1}^{3} e^i \otimes e^j\, f_{ij} = \sum_{i,j=1}^{3} f_{ij}\, e^i \otimes e^j \in T^2_0
\]
\[
f = \sum_{i,j=1}^{3} e_i \otimes e_j\, f^{ij} = \sum_{i,j=1}^{3} f^{ij}\, e_i \otimes e_j \in T^0_2
\]
\[
f = \sum_{i,j=1}^{3} e^i \otimes e_j\, f_i^j = \sum_{i,j=1}^{3} f_i^j\, e^i \otimes e_j \in T^1_1.
\]
We have to emphasize that the tensor coordinates $f_{i_1 \ldots i_p}$, $f^{i_1 \ldots i_q}$, $f^{i_1 \ldots i_p}_{j_1 \ldots j_q}$ are in general no longer of dyadic or product type. For instance, for

(2,0)-tensors, i.e. bilinear functions,
\[
f = \sum_{i,j=1}^{n} e^i \otimes e^j\, f_{ij} = \sum_{i,j=1}^{n} f_{ij}\, e^i \otimes e^j \in T^2_0, \qquad f_{ij} \neq f_i f_j,
\]
(2,1)-tensors, i.e. trilinear functions,
\[
f = \sum_{i,j,k=1}^{n} e^i \otimes e^j \otimes e_k\, f_{ij}^{k} = \sum_{i,j,k=1}^{n} f_{ij}^{k}\, e^i \otimes e^j \otimes e_k \in T^2_1, \qquad f_{ij}^{k} \neq f_i f_j f^k,
\]
(3,1)-tensors, i.e. quadrilinear functions,
\[
f = \sum_{i,j,k,l=1}^{n} e^i \otimes e^j \otimes e^k \otimes e_l\, f_{ijk}^{l} = \sum_{i,j,k,l=1}^{n} f_{ijk}^{l}\, e^i \otimes e^j \otimes e^k \otimes e_l \in T^3_1, \qquad f_{ijk}^{l} \neq f_i f_j f_k f^l
\]
holds. Tables A.1–A.3 list $(p,q)$-tensors as they appear in various sciences. Of special importance is the decomposition of multilinear functions, as elements of the space $T^p_q$, into their symmetric, antisymmetric and residual constituents, which we are going to outline.
Table A.1 Various examples of tensor spaces $T^p_q$ ($p + q$: rank of tensor): (2,0) tensor, tensor space $T^2_0$

Metric tensor, Gauss curvature tensor, Ricci curvature tensor: differential geometry
Gravity gradient tensor: gravitation
Faraday tensor (Maxwell tensor), tensor of dielectric constant, tensor of permeability: electromagnetism
Strain tensor, stress tensor: continuum mechanics
Energy momentum tensor: mechanics, electromagnetism, general relativity
Second order multipole tensor: gravitostatics, magnetostatics, electrostatics
Variance-covariance matrix: mathematical statistics
Table A.2 Various examples of tensor spaces $T^p_q$ ($p + q$: rank of tensor): (2,1) tensor, tensor space $T^2_1$

Cartan torsion tensor: differential geometry
Third order multipole tensor: gravitostatics, magnetostatics, electrostatics
Skewness tensor, third moment tensor of a probability distribution: mathematical statistics
Tensor of piezoelectric constants: coupling of stress and electrostatic field
Table A.3 Various examples of tensor spaces $T^p_q$ ($p + q$: rank of tensor): (3,1) tensor, (2,2) tensor, tensor spaces $T^3_1$, $T^2_2$

Riemann curvature tensor: differential geometry
Fourth order multipole tensor: gravitostatics, magnetostatics, electrostatics
Hooke tensor: stress-strain relation, constitutive equation, continuum mechanics, elasticity, viscosity
Kurtosis tensor, fourth moment tensor of a probability distribution: mathematical statistics
A-2 Decomposition of Multilinear Functions into Symmetric Multilinear Functions, Antisymmetric Multilinear Functions and Residual Multilinear Functions $T^p_q = S^p_q \oplus A^p_q \oplus R^p_q$
$T^p_q$ as the space of multilinear functions admits the decomposition $T^p_q = S^p_q \oplus A^p_q \oplus R^p_q$ into the subspace $S^p_q$ of symmetric multilinear functions, the subspace $A^p_q$ of antisymmetric multilinear functions and the subspace $R^p_q$ of residual multilinear functions.

Box A.2i (Antisymmetry of the symbols $f_{i_1 \ldots i_p}$):
\[
f_{ij} = -f_{ji}
\]
\[
f_{ijk} = -f_{ikj}, \quad f_{jki} = -f_{jik}, \quad f_{kij} = -f_{kji}
\]
\[
f_{ijkl} = -f_{ijlk}, \quad f_{jkli} = -f_{jkil}, \quad f_{klij} = -f_{klji}, \quad f_{lijk} = -f_{likj}
\]

Box A.2ii (Symmetry of the symbols $f_{i_1 \ldots i_p}$):
\[
f_{ij} = f_{ji}
\]
\[
f_{ijk} = f_{ikj}, \quad f_{jki} = f_{jik}, \quad f_{kij} = f_{kji}
\]
\[
f_{ijkl} = f_{ijlk}, \quad f_{jkli} = f_{jkil}, \quad f_{klij} = f_{klji}, \quad f_{lijk} = f_{likj}
\]
Box A.2iii (The interior product of bases, $\operatorname{span} S^p$, $n = \dim X = \dim X^* = 3$):
\[
S^1: \ \frac{1}{1!}\, e^i
\]
\[
S^2: \ \frac{1}{2!}\,(e^i \otimes e^j + e^j \otimes e^i) =: e^i \vee e^j, \qquad e^i \vee e^j = + e^j \vee e^i
\]
\[
S^3: \ \frac{1}{3!}\,(e^i \otimes e^j \otimes e^k + e^i \otimes e^k \otimes e^j + e^j \otimes e^k \otimes e^i + e^j \otimes e^i \otimes e^k + e^k \otimes e^i \otimes e^j + e^k \otimes e^j \otimes e^i) =: e^i \vee e^j \vee e^k,
\]
\[
e^i \vee e^j \vee e^k = e^i \vee e^k \vee e^j = e^k \vee e^i \vee e^j = e^k \vee e^j \vee e^i = e^j \vee e^k \vee e^i = e^j \vee e^i \vee e^k
\]

Box A.2iv (The exterior product of bases, $\operatorname{span} A^p$, $n = \dim X = \dim X^* = 3$):
\[
A^1: \ \frac{1}{1!}\, e^i
\]
\[
A^2: \ \frac{1}{2!}\,(e^i \otimes e^j - e^j \otimes e^i) =: e^i \wedge e^j, \qquad e^i \wedge e^j = - e^j \wedge e^i
\]
\[
A^3: \ \frac{1}{3!}\,(e^i \otimes e^j \otimes e^k - e^i \otimes e^k \otimes e^j + e^j \otimes e^k \otimes e^i - e^j \otimes e^i \otimes e^k + e^k \otimes e^i \otimes e^j - e^k \otimes e^j \otimes e^i) =: e^i \wedge e^j \wedge e^k,
\]
\[
e^i \wedge e^j \wedge e^k = - e^i \wedge e^k \wedge e^j = + e^k \wedge e^i \wedge e^j = - e^k \wedge e^j \wedge e^i = e^j \wedge e^k \wedge e^i = - e^j \wedge e^i \wedge e^k
\]
Box A.2v ($S^p$, symmetric multilinear functions):
\[
T^1_0 \supset S^1 \ni f = \sum_{i=1}^{n = \dim X} e^i f_i
\]
\[
T^2_0 \supset S^2 \ni f = \frac{1}{2!} \sum_{i,j=1}^{n = \dim X} e^i \vee e^j\, f_{ij}
= \Big\{ \sum_{i \le j} e^i \vee e^j\, f_{(ij)} \ \Big|\ f_{(ij)} := \frac{1}{2!}\,(f_{ij} + f_{ji}) \Big\}
\]
\[
T^3_0 \supset S^3 \ni f = \frac{1}{3!} \sum_{i,j,k=1}^{n = \dim X} e^i \vee e^j \vee e^k\, f_{ijk}
= \Big\{ \sum_{i \le j \le k} e^i \vee e^j \vee e^k\, f_{(ijk)} \ \Big|\ f_{(ijk)} := \frac{1}{3!}\,(f_{ijk} + f_{ikj} + f_{jki} + f_{jik} + f_{kij} + f_{kji}) \Big\}
\]
\[
T^p_0 \supset S^p \ni f = \frac{1}{p!} \sum_{i_1, i_2, \ldots, i_p = 1}^{n = \dim X} e^{i_1} \vee e^{i_2} \vee \cdots \vee e^{i_p}\, f_{i_1 i_2 \ldots i_p}
= \Big\{ \sum_{i_1 \le i_2 \le \cdots \le i_p} e^{i_1} \vee e^{i_2} \vee \cdots \vee e^{i_p}\, f_{(i_1 i_2 \ldots i_p)} \ \Big|\ f_{(i_1 i_2 \ldots i_p)} := \frac{1}{p!}\,(f_{i_1 \ldots i_{p-1} i_p} + f_{i_1 \ldots i_p i_{p-1}} + \cdots + f_{i_p \ldots i_1 i_2} + f_{i_p \ldots i_2 i_1}) \Big\}
\]
Corollary: $\dim S^p = \binom{n+p-1}{p}$; in particular, if $n = p$, then $\dim S^p = \binom{2p-1}{p}$.
Box A.2vi ($A^p$, antisymmetric multilinear functions):
\[
T^1_0 \supset A^1 \ni f = \sum_{i=1}^{n = \dim X} e^i f_i
\]
\[
T^2_0 \supset A^2 \ni f = \frac{1}{2!} \sum_{i,j=1}^{n = \dim X} e^i \wedge e^j\, f_{ij}
= \Big\{ \sum_{i < j} e^i \wedge e^j\, f_{[ij]} \ \Big|\ f_{[ij]} := \frac{1}{2!}\,(f_{ij} - f_{ji}) \Big\}
\]
\[
T^3_0 \supset A^3 \ni f = \frac{1}{3!} \sum_{i,j,k=1}^{n = \dim X} e^i \wedge e^j \wedge e^k\, f_{ijk}
= \Big\{ \sum_{i < j < k} e^i \wedge e^j \wedge e^k\, f_{[ijk]} \ \Big|\ f_{[ijk]} := \frac{1}{3!}\,(f_{ijk} - f_{ikj} + f_{jki} - f_{jik} + f_{kij} - f_{kji}) \Big\}
\]
\[
T^p_0 \supset A^p \ni f = \frac{1}{p!} \sum_{i_1, i_2, \ldots, i_p = 1}^{n = \dim X} e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p}\, f_{i_1 i_2 \ldots i_p}
= \Big\{ \sum_{i_1 < i_2 < \cdots < i_p} e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p}\, f_{[i_1 i_2 \ldots i_p]} \ \Big|\ f_{[i_1 i_2 \ldots i_p]} := \frac{1}{p!}\,(f_{i_1 \ldots i_{p-1} i_p} - f_{i_1 \ldots i_p i_{p-1}} + \cdots + f_{i_p \ldots i_1 i_2} - f_{i_p \ldots i_2 i_1}) \Big\}
\]
Corollary: $\dim A^p = n!/(p!\,(n-p)!) = \binom{n}{p}$; in particular, if $n = p$, then $\dim A^p = 1$.
Box A.2vii ($A^p_q$, antisymmetric multilinear functions, exterior product). Proposition:
(a) For every $x^1, \ldots, x^{i-1}, x^{i+1}, \ldots, x^q \in X^*$ as well as $x, y \in X^*$ and $r, s \in \mathbb{R}$ multilinearity implies
\[
x^1 \wedge \cdots \wedge x^{i-1} \wedge (r x + s y) \wedge x^{i+1} \wedge \cdots \wedge x^q
= r\,(x^1 \wedge \cdots \wedge x^{i-1} \wedge x \wedge x^{i+1} \wedge \cdots \wedge x^q)
+ s\,(x^1 \wedge \cdots \wedge x^{i-1} \wedge y \wedge x^{i+1} \wedge \cdots \wedge x^q).
\]
(b) For every permutation $\pi$ of $\{1, 2, \ldots, q\}$ we have
\[
x^{\pi(1)} \wedge x^{\pi(2)} \wedge \cdots \wedge x^{\pi(q)} = \operatorname{sign}(\pi)\, x^1 \wedge x^2 \wedge \cdots \wedge x^q.
\]
(c) Let $A \in A^q(X)$, $B \in A^s(X)$; then $B \wedge A = (-1)^{qs}\, A \wedge B$.
(d) For every $q$, $0 \le q \le n$, the tensor space $A^q$ of antisymmetric multilinear functions has dimension
\[
\dim A^q = \binom{n}{q} = n!/(q!\,(n-q)!).
\]
As detailed examples we like to decompose $\{T^1_0, T^0_1, T^2_0, T^0_2\}$ in $\mathbb{R}^2$ and $\mathbb{R}^3$, respectively, into symmetric and antisymmetric constituents.

Example A.2 ($T^p_q = S^p_q \oplus A^p_q \oplus R^p_q$, decomposition of multilinear functions into symmetric and antisymmetric constituents):
As a first example of the decomposition of multilinear functions (tensor space) into symmetric and antisymmetric constituents we consider a linear space $X$ (vector space) of dimension $\dim X = n = 2$. Its dual space $X^*$, $\dim X^* = \dim X = n = 2$, is spanned by orthonormal contravariant base vectors $\{e^1, e^2\}$. Choose $q = 0$, $p = 1$ and $p = 2$.
\[
X = \operatorname{span}\{e_1, e_2\} \quad\text{versus}\quad X^* = \operatorname{span}\{e^1, e^2\}
\]
\[
T^1_0 = A^1 = S^1 \ni f = \sum_{i=1}^{2} e^i f_i = e^1 f_1 + e^2 f_2 \in X^*
\]
\[
T^2_0 = A^2 \oplus S^2
\]
\[
\begin{aligned}
T^2_0 \ni f &= \sum_{i,j=1}^{2} e^i \otimes e^j\, f_{ij}
= e^1 \otimes e^1 f_{11} + e^1 \otimes e^2 f_{12} + e^2 \otimes e^1 f_{21} + e^2 \otimes e^2 f_{22}\\
&= e^1 \otimes e^1 f_{11} + e^2 \otimes e^2 f_{22}
+ \tfrac{1}{2}(e^1 \otimes e^2 - e^2 \otimes e^1) f_{12}
+ \tfrac{1}{2}(e^1 \otimes e^2 + e^2 \otimes e^1) f_{12}\\
&\quad - \tfrac{1}{2}(e^1 \otimes e^2 - e^2 \otimes e^1) f_{21}
+ \tfrac{1}{2}(e^1 \otimes e^2 + e^2 \otimes e^1) f_{21}\\
&= e^1 \vee e^1 f_{11} + e^2 \vee e^2 f_{22} + e^1 \wedge e^2\,(f_{12} - f_{21}) + e^1 \vee e^2\,(f_{12} + f_{21})
\end{aligned}
\]
\[
\dim S^2 = 3, \qquad \dim A^2 = 1.
\]
As a second example of the decomposition of multilinear functions (tensor space) into symmetric and antisymmetric constituents we consider a linear space $X$ (vector space) of dimension $\dim X = n = 3$, spanned by orthonormal base vectors $\{e_1, e_2, e_3\}$. Choose $p = 0$, $q = 1$ and $2$.
\[
X = \operatorname{span}\{e_1, e_2, e_3\}
\]
\[
T^0_1 = A_1 = S_1 \ni f = \sum_{i=1}^{3} e_i f^i = e_1 f^1 + e_2 f^2 + e_3 f^3 \in X
\]
\[
T^0_2 = A_2 \oplus S_2
\]
\[
\begin{aligned}
T^0_2 \ni f &= \sum_{i,j=1}^{3} e_i \otimes e_j\, f^{ij}
= \sum_{i=1}^{3} e_i \otimes e_1 f^{i1} + \sum_{i=1}^{3} e_i \otimes e_2 f^{i2} + \sum_{i=1}^{3} e_i \otimes e_3 f^{i3}\\
&= e_1 \otimes e_1 f^{11} + e_2 \otimes e_2 f^{22} + e_3 \otimes e_3 f^{33}\\
&\quad + \tfrac{1}{2}(e_1 \otimes e_2 - e_2 \otimes e_1) f^{12} + \tfrac{1}{2}(e_1 \otimes e_2 + e_2 \otimes e_1) f^{12}\\
&\quad - \tfrac{1}{2}(e_1 \otimes e_2 - e_2 \otimes e_1) f^{21} + \tfrac{1}{2}(e_1 \otimes e_2 + e_2 \otimes e_1) f^{21}\\
&\quad + \tfrac{1}{2}(e_2 \otimes e_3 - e_3 \otimes e_2) f^{23} + \tfrac{1}{2}(e_2 \otimes e_3 + e_3 \otimes e_2) f^{23}\\
&\quad - \tfrac{1}{2}(e_2 \otimes e_3 - e_3 \otimes e_2) f^{32} + \tfrac{1}{2}(e_2 \otimes e_3 + e_3 \otimes e_2) f^{32}\\
&\quad + \tfrac{1}{2}(e_3 \otimes e_1 - e_1 \otimes e_3) f^{31} + \tfrac{1}{2}(e_3 \otimes e_1 + e_1 \otimes e_3) f^{31}\\
&\quad - \tfrac{1}{2}(e_3 \otimes e_1 - e_1 \otimes e_3) f^{13} + \tfrac{1}{2}(e_3 \otimes e_1 + e_1 \otimes e_3) f^{13}
\end{aligned}
\]
Since the subspaces $S^p_q$, $A^p_q$ and $R^p_q$ are independent, $S^p_q \oplus A^p_q \oplus R^p_q$ denotes the direct sum of the subspaces $S^p_q$, $A^p_q$ and $R^p_q$. Unfortunately, $T^p_q$ as the space of multilinear functions cannot be completely decomposed into the spaces of symmetric and antisymmetric multilinear functions: for instance, the dimension identities $\dim T^p = n^p$, $\dim S^p = \binom{n+p-1}{p}$, $\dim A^p = \binom{n}{p}$ apply with respect to a vector space $X$ of dimension $\dim X = n$, such that
\[
\dim R^p = \dim T^p - \dim S^p - \dim A^p = n^p - \binom{n+p-1}{p} - \binom{n}{p} > 0
\]
in general. There is one exception, namely the (2,0) or (1,1) or (0,2) tensor space, where the dimension of the subspace $R^2_0$, $R^1_1$ or $R^0_2$ of residual multilinear functions is zero. An example is
\[
\dim R^2 = n^2 - \binom{n+1}{2} - \binom{n}{2} = n^2 - (n+1)n/2 - n(n-1)/2 = 0.
\]
A-3 Matrix Algebra, Array Algebra, Matrix Norm and Inner Product

Symmetry and antisymmetry of the symbols $f_{i_1 \ldots i_p}$ can be visualized by the trees of Boxes A.2i and A.2ii. With respect to the symbols of the interior product "$\vee$" and the exterior product "$\wedge$" ("wedge product") we are able to redefine symmetric and antisymmetric functions according to Boxes A.2iii–vi. Note the isomorphism of the tensor algebra $T^p_q$ and array algebra, namely of

(a) $[f_i] \in \mathbb{R}^n$ (onedimensional array, "column vector", $\dim [f_i] = n \times 1$)
(b) $[f_{ij}] \in \mathbb{R}^{n \times n}$ (twodimensional array, column-row array, "matrix", $\dim [f_{ij}] = n \times n$)
(c) $[f_{ijk}] \in \mathbb{R}^{n \times n \times n}$ (threedimensional array, "indexed matrix", $\dim [f_{ijk}] = n \times n \times n$)
etc. For the base space $x \in X = \mathbb{R}^3$, taken to be threedimensional Euclidean, we had answered the question of how to measure the length of a vector ("norm") and the angle between two vectors ("inner product"). The same question will finally be raised for tensors $t^p_q \in T^p_q$. The answer is constructively based on the vectorization of the arrays $[f_{ij}], [f_{ijk}], \ldots, [f_{i_1 \ldots i_p}]$, by taking advantage of the symmetry-antisymmetry structure of the arrays and later on applying the Euclidean norm and the Euclidean inner product to the vectorized array. For a 2-contravariant, 0-covariant tensor we shall outline the procedure.

(a) Firstly, let $F = [f_{ij}]$ be the quadratic matrix of dimension $\dim F = n \times n$, an element of $T^2_0$. Accordingly, $\operatorname{vec} F$ is the vector
\[
\operatorname{vec} F = [f_{11}, \ldots, f_{n1},\ f_{12}, \ldots, f_{n2},\ \ldots,\ f_{1n}, \ldots, f_{nn}]^{\mathsf{T}}, \qquad \dim \operatorname{vec} F = n^2 \times 1,
\]
which is generated by stacking the elements of the matrix $F$ columnwise in a vector. The Euclidean norm and the Euclidean inner product of $\operatorname{vec} F$, $\operatorname{vec} G$, respectively, are
\[
\|\operatorname{vec} F\|^2 := (\operatorname{vec} F)^{\mathsf{T}} (\operatorname{vec} F) = \operatorname{tr} F^{\mathsf{T}} F, \qquad
\langle \operatorname{vec} F \mid \operatorname{vec} G \rangle := (\operatorname{vec} F)^{\mathsf{T}} \operatorname{vec} G = \operatorname{tr} F^{\mathsf{T}} G.
\]
(b) Secondly, let $F = [f_{ij}] = [f_{ji}]$ be the symmetric matrix of dimension $\dim F = n \times n$, an element of $S^2$. Accordingly, $\operatorname{vech} F$ (read "vector half") is
the $n(n+1)/2 \times 1$ vector which is generated by stacking the elements on and under the main diagonal of the matrix $F$ columnwise in a vector:
\[
F = [f_{ij}] = [f_{ji}] = F^{\mathsf{T}} \ \Longrightarrow\ \operatorname{vech} F := [f_{11}, \ldots, f_{n1},\ f_{22}, \ldots, f_{n2},\ \ldots,\ f_{nn}]^{\mathsf{T}}, \qquad \dim \operatorname{vech} F = n(n+1)/2 \times 1,
\]
\[
\operatorname{vech} F = H \operatorname{vec} F, \qquad \dim H = n(n+1)/2 \times n^2.
\]
The Euclidean norm and the Euclidean inner product of $\operatorname{vech} F$, $\operatorname{vech} G$, respectively, are
\[
F = F^{\mathsf{T}},\ G = G^{\mathsf{T}} \ \Longrightarrow\ \|\operatorname{vech} F\|^2 := (\operatorname{vech} F)^{\mathsf{T}} (\operatorname{vech} F), \qquad \langle \operatorname{vech} F \mid \operatorname{vech} G \rangle := (\operatorname{vech} F)^{\mathsf{T}} (\operatorname{vech} G).
\]
(c) Thirdly, let $F = [f_{ij}] = [-f_{ji}]$ be the antisymmetric matrix of dimension $\dim F = n \times n$, an element of $A^2$. Accordingly, $\operatorname{veck} F$ (read "vector skew") is the $n(n-1)/2 \times 1$ vector which is generated by stacking the elements under the main diagonal of the matrix $F$ columnwise in a vector:
\[
F = [f_{ij}] = [-f_{ji}] = -F^{\mathsf{T}} \ \Longrightarrow\ \operatorname{veck} F := [f_{21}, \ldots, f_{n1},\ f_{32}, \ldots, f_{n2},\ \ldots,\ f_{n,n-1}]^{\mathsf{T}}, \qquad \dim \operatorname{veck} F = n(n-1)/2 \times 1,
\]
\[
\operatorname{veck} F = K \operatorname{vec} F, \qquad \dim K = n(n-1)/2 \times n^2.
\]
The Euclidean norm and the Euclidean inner product of $\operatorname{veck} F$, $\operatorname{veck} G$, respectively, are
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
F D Œfij D Œfj i D F T G D Œgij D Œgj i D G T
# H)
kveck F k2 WD .veck F /T .veck F /
: hveck F jveck G i WD .veck F /T .veck G / You will find the forms (a) vec F and (b) Vech F in the standard book of (Harville, 1997, Chap. 16, 2001, Chap. 16) including many examples. Here we will present Example A.3 in computing the norm as well as the inner product of a 2-contravariant, 0-covariant tensor for the operator vec, vech and veck. The veck operator has been introduced by (Grafarend and Schaffrin, 1993, pp. 418–419). Example A.3. (Norm and inner product of a 2-contravariant, 0-covariant tensor): 2 3 a d g .a/ A WD 4 b e h 5 ; dim A D 3 3 H) vec A D Œa; b; c; d; e; f; g; h; kT ; c f k dim vec A D 9 1 k vec Ak2 D .vec A/T .vec A/ D tr AT A D a2 C : : : C k 2 2 3 a b c .b/ A WD 4 b d e 5 D AT ; dim A D 3 3 H) vech A D Œa; b; c; d; e; f T ; c e f dim vech A D 6 1 vech A D H vec A 3 2 1 0 0 0 0 0 0 0 0 6 0 1=2 0 1=2 0 0 0 0 07 7 6 6 0 0 1=2 0 0 0 0 0 07 7 6 8 H WD 6 7 ; dim H D 6 9 60 0 0 0 1 0 0 0 07 7 6 40 0 0 0 1=2 0 1=2 0 5 0 0 0 0 0 0 0 0 0 1 kvech Ak2 D .vech A/T .vech A/ D a2 C b 2 C c 2 C d 2 C e 2 C f 2 (Henderson and Searle (1978, p. 68–69)) 2 3 0 a b c 6 a 0 d e 7 T 7 .c/ A WD 6 4 b d 0 f 5 D A ; dim A D 3 3 H) c e f 0 veck A D Œa; b; c; d; e; f T ; dim veck A D 6 1 veck A D K vec A
\[
K := \frac{1}{2}
\begin{bmatrix}
0 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 1 & 0
\end{bmatrix},
\quad \dim K = 6 \times 16,
\]
\[
\|\operatorname{veck} A\|^2 = a^2 + b^2 + c^2 + d^2 + e^2 + f^2.
\]
A-4 The Hodge Star Operator, Self Duality

In Chap. 3 we took advantage of the Hodge star operator, for instance for transforming an overdetermined system of linear equations (an inconsistent system) into a system of condition equations. There we referred to the multilinear operations "join" and "meet". The most important operator of the algebra of antisymmetric multilinear functions is the "Hodge star operator", which we shall present finally. In addition, we shall bring to you the surprising special feature of skew algebra called "self-duality". The algebra $A^p_q$ of antisymmetric multilinear functions has been based on the exterior product "$\wedge$" ("wedge product"). There has been created a duality operator called the Hodge star operator, which is a linear map $A^p \to A^{n-p}$, where $n = \dim X = \dim X^*$ denotes the dimension of the base space $X = \mathbb{R}^3$. The basic idea of such a map of antisymmetric multilinear functions $f \in A^p$ into antisymmetric multilinear functions ${*f} \in A^{n-p}$ originates, according to Box A.2vii, from the following situation: the multilinear base of $A^p$ is spanned by $\{1,\ e^{i_1},\ e^{i_1} \wedge e^{i_2},\ \ldots,\ e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_p}\}$ once we focus on $p = 0, 1, 2, \ldots, n$, respectively. Obviously, for any dimension number $n$ and $p$-contravariant, $q$-covariant index of the skew tensor space $A^p_q$ there is an associated cobasis, namely
\[
n = 1,\ p = 0, 1: \quad \text{basis: } \{1,\ e^{i_1}\}, \qquad \text{associated cobasis: } \{e^{i_1},\ 1\};
\]
\[
n = 2,\ p = 0, 1, 2: \quad \text{basis: } \{1,\ e^{i_1},\ e^{i_1} \wedge e^{i_2}\}, \qquad \text{associated cobasis: } \{e^{i_1} \wedge e^{i_2},\ e^{i_2},\ 1\};
\]
\[
n = 3,\ p = 0, 1, 2, 3: \quad \text{basis: } \{1,\ e^{i_1},\ e^{i_1} \wedge e^{i_2},\ e^{i_1} \wedge e^{i_2} \wedge e^{i_3}\}, \qquad \text{associated cobasis: } \{e^{i_1} \wedge e^{i_2} \wedge e^{i_3},\ e^{i_2} \wedge e^{i_3},\ e^{i_3},\ 1\};
\]
in general, for arbitrary $n \in \mathbb{N}$ and $p = 0, 1, \ldots, n-1, n$:
\[
\text{basis: } \{1,\ e^{i_1},\ \ldots,\ e^{i_1} \wedge \cdots \wedge e^{i_n}\},
\]
\[
\text{associated cobasis: } \{e^{i_1} \wedge e^{i_2} \wedge \cdots \wedge e^{i_{n-1}} \wedge e^{i_n},\ e^{i_2} \wedge \cdots \wedge e^{i_{n-1}} \wedge e^{i_n},\ \ldots,\ e^{i_{n-1}} \wedge e^{i_n},\ e^{i_n},\ 1\},
\]
as long as we concentrate on the $p$-contravariant $A^p$ only. A similar set-up of basis and associated cobasis for the $q$-covariant $A_q$ and the mixed $A^p_q$ can be made. The linear map $A^p \to A^{n-p}$, the Hodge star operator,
\[
{*}\,(e^{i_1} \wedge \cdots \wedge e^{i_p}) := \frac{1}{(n-p)!}\, \epsilon^{i_1 \ldots i_p}{}_{i_{p+1} \ldots i_n}\, e^{i_{p+1}} \wedge \cdots \wedge e^{i_n},
\]
i ::: i
1 p ipC1 ::: in
C 1 for 6 of 6 6 WD 6 6 1 for 6 of 4
an even permutation f1; 2; : : : ; n 1; ng an odd permutation f1; 2; : : : ; n 1; ng
0 otherwise – sometimes called Eddington’s epsilons – on orthonormal (“unimodular”) base of Ap onto an orthonormal (“unimodular”) base of Anp . For antisymmetric multilinear functions also called antisymmetric tensor-valued functions represented in an orthonormal (“unimodular”) base the Hodge star operator is the following linear map 8 <1 p T0 Ap 3 f D : pŠ
p T0
8 <
nDdim XX
e i1 ^ : : : ^ e ip fi1 :::ip
i1 ;:::;ip D1
1 Anp 3 f WD :.n p/Š
nDdim XX nDdim XX ipC1 ;:::;in
i1 ;:::;ip
9 = ;
;
9 = 1 i1 :::ip ipC1 :::in e ipC1 ^ : : : ^ e in fi1 :::ip : ; pŠ
As soon as the base space $x \in X = \mathbb{R}^3$ is not covered by Cartesian coordinates, but rather by curvilinear coordinates, its coordinate base
\[
\{b^1, b^2, b^3\} = \{dy^1, dy^2, dy^3\} \quad\text{versus}\quad \{b_1, b_2, b_3\} = \Big\{\frac{\partial}{\partial y^1}, \frac{\partial}{\partial y^2}, \frac{\partial}{\partial y^3}\Big\}
\]
of contravariant versus covariant type is neither orthogonal nor normalized. It is for this reason that finally we present ${*f}$, the Hodge star operator of an antisymmetric multilinear function $f$, also called the dual of $f$, in a general coordinate base.

Definition A.2 (Hodge star operator, the dual of an antisymmetric multilinear function):
If an antisymmetric $(p, 0)$ multilinear function, an element of the skew algebra $A^p$ with respect to a general base $\{b^{i_1} \wedge \cdots \wedge b^{i_p}\}$, is given by
\[
f = \frac{1}{p!} \sum_{i_1, \ldots, i_p = 1}^{n = \dim X} b^{i_1} \wedge \cdots \wedge b^{i_p}\, f_{i_1 \ldots i_p},
\]
then the Hodge star operator, the dual of $f$, can be uniquely represented by
\[
\text{(a)}\quad {*f} = \frac{1}{(n-p)!} \frac{1}{p!} \sum_{i_{p+1}, \ldots, i_n = 1}^{n = \dim X}\ \sum_{i_1, \ldots, i_p = 1}^{n = \dim X}\ \sum_{j_1, \ldots, j_p = 1}^{n = \dim X} b^{i_{p+1}} \wedge \cdots \wedge b^{i_n}\ \sqrt{g}\ \epsilon_{i_1 \ldots i_p\, i_{p+1} \ldots i_n}\, g^{i_1 j_1} \cdots g^{i_p j_p}\, f_{j_1 \ldots j_p},
\]
\[
\text{(b)}\quad {*f} = \frac{1}{(n-p)!} \frac{1}{p!} \sum_{i_{p+1}, \ldots, i_n = 1}^{n = \dim X}\ \sum_{i_1, \ldots, i_p = 1}^{n = \dim X} b^{i_{p+1}} \wedge \cdots \wedge b^{i_n}\ \sqrt{g}\ \epsilon_{i_1 \ldots i_p\, i_{p+1} \ldots i_n}\, f^{i_1 \ldots i_p},
\]
\[
\text{(c)}\quad ({*f})_{k_1 \ldots k_{n-p}} = \frac{1}{p!} \sum_{i_1, \ldots, i_p = 1}^{n = \dim X} \sqrt{g}\ \epsilon_{i_1 \ldots i_p\, k_1 \ldots k_{n-p}}\, f^{i_1 \ldots i_p}
\]
as an element of the skew algebra $A^{n-p}$ in the general associated cobase $\{b^{i_{p+1}} \wedge \cdots \wedge b^{i_n}\}$ with respect to the base space $x \in X \subseteq \mathbb{R}^n$ of dimension $n = \dim X = \dim X^*$, with $[g^{kl}] = G^{-1} = \operatorname{adj} G / \det G$ and $g = |g_{kl}|$. If we extend the algebra $A^p$ of antisymmetric multilinear functions by ${*1} = e^1 \wedge \cdots \wedge e^n \in A^n$ and ${*}(e^1 \wedge \cdots \wedge e^n) = 1 \in A^0 = \mathbb{R}$, respectively, let us collect some properties of ${*f}$, the dual of $f$.
Proposition A.1 (Hodge star operator, the dual of an antisymmetric multilinear function):
Let the linearly ordered base $\{e^1, \ldots, e^n\}$ be orthonormal ("unimodular"). Then the Hodge star operator of an antisymmetric multilinear function $f$, the dual of $f$, with respect to $\{e^1, \ldots, e^n\}$ satisfies the following:
(a) $*$ maps antisymmetric $p$-contravariant tensor-valued functions to antisymmetric $(n-p)$-contravariant tensor-valued functions: ${*}: A^p \to A^{n-p}$.
(b) ${*1} = e^1 \wedge \cdots \wedge e^n =: e$ for every $1 \in A^0$, $e \in A^n$; ${*e} = 1$ for every $e \in A^n$, $1 \in A^0$.
(c) ${*}{*f} = (-1)^{p(n-p)}\, f$ for every $f \in A^p$.
(d) $f \wedge {*f} = \|f\|^2\, e^1 \wedge \cdots \wedge e^n$ with respect to the norm
\[
\|f\|^2 := \frac{1}{p!} \sum_{i_1, \ldots, i_p = 1}^{n = \dim X} f_{i_1 \ldots i_p}\, f^{i_1 \ldots i_p}.
\]
Example A.4 (Hodge star operator, $n = \dim X = \dim X^* = 3$, $\operatorname{span} X^* = \{e^1, e^2, e^3\}$, $A^p \to A^{n-p}$):
\[
n = 3,\ p = 0: \quad {*1} = e^1 \wedge e^2 \wedge e^3
\]
\[
n = 3,\ p = 1: \quad {*e^{i_1}} = \frac{1}{2}\, \epsilon^{i_1}{}_{i_2 i_3}\, e^{i_2} \wedge e^{i_3}
\quad\Longrightarrow\quad
\begin{cases}
{*e^1} = \frac{1}{2}(e^2 \wedge e^3 - e^3 \wedge e^2) = e^2 \wedge e^3\\[2pt]
{*e^2} = \frac{1}{2}(e^3 \wedge e^1 - e^1 \wedge e^3) = e^3 \wedge e^1\\[2pt]
{*e^3} = \frac{1}{2}(e^1 \wedge e^2 - e^2 \wedge e^1) = e^1 \wedge e^2
\end{cases}
\]
\[
n = 3,\ p = 2: \quad {*}(e^{i_1} \wedge e^{i_2}) = \epsilon^{i_1 i_2}{}_{i_3}\, e^{i_3}
\quad\Longrightarrow\quad
{*}(e^1 \wedge e^2) = e^3, \quad {*}(e^2 \wedge e^3) = e^1, \quad {*}(e^3 \wedge e^1) = e^2
\]
\[
n = 3,\ p = 3: \quad {*}(e^{i_1} \wedge e^{i_2} \wedge e^{i_3}) = 1
\]
Example A.5 (Hodge star operator of an antisymmetric tensor-valued function, $n = \dim X = \dim X^* = 3$, $A^p \to A^{n-p}$):
Throughout we apply the summation convention over repeated indices.
\[
n = 3,\ p = 0: \quad f \ \text{("0-differential form")}, \qquad {*f} = f\, dx^1 \wedge dx^2 \wedge dx^3 \ \text{("3-differential form")}
\]
\[
n = 3,\ p = 1: \quad f = dx^{i_1} f_{i_1} \ \text{("1-differential form")}, \qquad
{*f} = \frac{1}{2}\, \epsilon^{i_1}{}_{i_2 i_3}\, dx^{i_2} \wedge dx^{i_3}\, f_{i_1}
= f_1\, dx^2 \wedge dx^3 + f_2\, dx^3 \wedge dx^1 + f_3\, dx^1 \wedge dx^2 \ \text{("2-differential form")}
\]
\[
n = 3,\ p = 2: \quad f = \frac{1}{2}\, dx^{i_1} \wedge dx^{i_2}\, f_{i_1 i_2} \ \text{("2-differential form")}, \qquad
{*f} = \frac{1}{2}\, \epsilon^{i_1 i_2}{}_{i_3}\, dx^{i_3}\, f_{i_1 i_2}
= f_{23}\, dx^1 + f_{31}\, dx^2 + f_{12}\, dx^3 \ \text{("1-differential form")}
\]
\[
n = 3,\ p = 3: \quad f = \frac{1}{6}\, dx^{i_1} \wedge dx^{i_2} \wedge dx^{i_3}\, f_{i_1 i_2 i_3} \ \text{("3-differential form")}, \qquad
{*f} = \frac{1}{6}\, \epsilon^{i_1 i_2 i_3}\, f_{i_1 i_2 i_3} = f_{123} \ \text{("0-differential form")}
\]

Example A.6 (Hodge star operator, $n = \dim X = \dim X^* = 3$, "$\times$" product (cross product)):
By means of the Hodge star operator we are able to interpret the "$\times$" product ("cross product") in threedimensional vector space. If the vectors $x, y \in X$, $\dim X = 3$, are represented in the orthonormal ("unimodular") base $\{e^1, e^2, e^3\}$, the following equivalence between $x \wedge y$ and $x \times y$ holds ($i, j \in \{1, 2, 3\}$, summation convention):
\[
x = e^i x_i \in X, \qquad y = e^j y_j \in X
\]
\[
\Longrightarrow\quad x \wedge y = e^i \wedge e^j\, x_i y_j
= e^1 \wedge e^2\,(x_1 y_2 - x_2 y_1) + e^2 \wedge e^3\,(x_2 y_3 - x_3 y_2) + e^3 \wedge e^1\,(x_3 y_1 - x_1 y_3)
\]
\[
\Longrightarrow\quad {*}(x \wedge y) = {*}(e^1 \wedge e^2)(x_1 y_2 - x_2 y_1) + {*}(e^2 \wedge e^3)(x_2 y_3 - x_3 y_2) + {*}(e^3 \wedge e^1)(x_3 y_1 - x_1 y_3) = \epsilon_{ijk}\, e^k\, x_i y_j,
\]
\[
{*}(x \wedge y) = e^1\,(x_2 y_3 - x_3 y_2) + e^2\,(x_3 y_1 - x_1 y_3) + e^3\,(x_1 y_2 - x_2 y_1),
\]
\[
x \times y = e^i \times e^j\, x_i y_j, \qquad e^i \times e^j := \epsilon_{ijk}\, e^k
\quad\Longrightarrow\quad x \times y = {*}(x \wedge y).
\]
Example A.7 (Hodge star operator, $n = \dim X = \dim X^* = 4$, $X \in \mathbb{R}^4$, Minkowski space, self-duality (Atiyah et al., 1978)):
By means of Examples A.5–A.7 we like to make you familiar with (a) the Hodge star operator of an antisymmetric tensor-valued function over $\mathbb{R}^3$, (b) its equivalence to the "$\times$" product ("cross product") and (c) self-duality in a fourdimensional space. Such a self-duality plays a key role in differential geometry and physics, as is emphasized by Atiyah et al. (1978).

Historical Aside
Thus we have constructed an anticommutative algebra by implementing the "exterior product" "$\wedge$", also called "wedge product", initiated by H. Grassmann in "Ausdehnungslehre" (second version published in 1882). See also his collected works, H. Grassmann (1911). In addition, the work by G. Peano (Calcolo geometrico secondo l'Ausdehnungslehre di Grassmann, Fratelli Bocca Editori, Torino 1888) should be mentioned here. The historical development may be documented by the work of Forder (1960). A modern version of the "wedge product" is given by Berman (1961). In particular we mention the contribution by Barnabei et al. (1985) who, by avoiding the notion of the dual $X^*$ of a linear space $X$ and based upon operations like union, intersection, and complement as known in Boolean algebra, have developed a double algebra with exterior products of type one ("wedge product", "the join") and of type two ("the meet"), namely "to restore H. Grassmann's original ideas to full geometrical power". The star operator "$*$" has been introduced by W. V. D. Hodge, being implemented into algebra in the works Hodge (1941) and Hodge and Pedoe (1968, pp. 232–309). There the star operation has been called "dual Grassmann coordinates"; in addition, "intersections and joins" have been introduced.
A-5 Linear Algebra

Multilinear algebra is built on linear algebra, which we are going into now. At first we give a careful definition of a linear algebra, which secondly we deepen by the diagrams "Ass", "Uni" and "Comm". The subalgebra "ring with identity", which is of central importance for solving polynomial equations by means of Groebner bases, the Buchberger algorithm and the multipolynomial resultant method, is our third subject. Section 4 introduces the notion of a division algebra and of a non-associative algebra. Fifthly, we confront you with Lie algebra ("God is a Lie group"), in particular with Witt algebra. Section 6 compares Lie algebra and Killing analysis. Here we add some notes on the difficulties of a composition algebra in Sect. 7. Finally, in Sect. 8
matrix algebra is presented again, but this time as a division algebra. As examples of a division algebra as well as a composition algebra we introduce complex algebra (Clifford algebra $Cl(0,1)$) in Sect. 9 and quaternion algebra (Clifford algebra $Cl(0,2)$) in Sect. 10, which is followed by an interesting letter of W. R. Hamilton (16 October 1843) to his son, reproduced in Sect. 11. Octonian algebra (Clifford algebra with respect to $\mathbb{H} \oplus \mathbb{H}$) in Sect. 12 is an example of a "non-associative" algebra as well as of a composition algebra. Of course, we have reserved "Sect. 13" for the fundamental Hurwitz theorem of composition algebra and the fundamental Frobenius theorem of division algebra.
A-51 Definition of a Linear Algebra

Up to now we have succeeded to introduce the base space $X$ of vectors $x \in X = \mathbb{R}^3$ equipped with a metric and specialized to be threedimensional Euclidean. We have extended the base space to a tensor space, namely from vector-valued functions to tensor-valued functions on $\{\mathbb{R}^n, g_{ij}\}$. Now we proceed to give linear and multilinear functions an algebraic structure, namely by the definition of two internal binary operations $\alpha$ and $\gamma$ and one external binary operation $\beta$.

Definition A.51 (linear algebra over the field of real numbers, linearity of the vector space $X$):
Let $\mathbb{R}$ be the field of real numbers. A linear algebra over $\mathbb{R}$, or $\mathbb{R}$-algebra, consists of a set $X$ of objects, two internal relations (either "additive" or "multiplicative") and one external relation:
\[
(\mathrm{opera})_1 =: \alpha: X \times X \to X,
\]
\[
(\mathrm{opera})_2 =: \beta: \mathbb{R} \times X \to X \quad\text{or}\quad X \times \mathbb{R} \to X,
\]
\[
(\mathrm{opera})_3 =: \gamma: X \times X \to X.
\]
(1) With respect to the internal relation $\alpha$ ("join"), $X$ as a linear space is a vector space over $\mathbb{R}$, an Abelian group written "additively" or "multiplicatively": for $x, y, z \in X$,

additively written Abelian group, $\alpha(x, y) =: x + y$:
(G1+) $(x + y) + z = x + (y + z)$ (additive associativity)

multiplicatively written Abelian group, $\alpha(x, y) =: x \circ y$:
(G1$\circ$) $(x \circ y) \circ z = x \circ (y \circ z)$ (multiplicative associativity)
(G2+) $x + 0 = x$ (additive identity, neutral element); (G2$\circ$) $x \circ 1 = x$ (multiplicative identity, neutral element)
(G3+) $x + (-x) = 0$ (additive inverse); (G3$\circ$) $x \circ x^{-1} = 1$ (multiplicative inverse)
(G4+) $x + y = y + x$ (additive commutativity, Abelian axiom); (G4$\circ$) $x \circ y = y \circ x$ (multiplicative commutativity, Abelian axiom)
The triplet of axioms $\{(\text{G1+}), (\text{G2+}), (\text{G3+})\}$ or $\{(\text{G1}\circ), (\text{G2}\circ), (\text{G3}\circ)\}$ constitutes the set of group axioms.

(2) With respect to the external relation $\beta$ the following compatibility conditions are satisfied: for $x, y \in X$, $r, s \in \mathbb{R}$, $\beta(r, x) =: r\,x$:

(D1+) $r\,(x + y) = (x + y)\,r = r\,x + r\,y = x\,r + y\,r$ (1st additive distributivity)
(D1$\circ$) $r\,(x \circ y) = (x \circ y)\,r = (r\,x) \circ y = x \circ (y\,r)$ (1st multiplicative distributivity)
(D2+) $(r + s)\,x = x\,(r + s) = r\,x + s\,x = x\,r + x\,s$ (2nd additive distributivity)
(D2$\circ$) $(r \circ s)\,x = x\,(r \circ s) = r\,(s\,x) = (x\,r)\,s$ (2nd multiplicative distributivity)
(D3) $1\,x = x\,1 = x$ (left and right identity)

(3) With respect to the internal relation $\gamma$ ("meet") the following compatibility conditions are satisfied: for $x, y, z \in X$, $r \in \mathbb{R}$, $\gamma(x, y) =: x \cdot y$:

(G1$\gamma$) $(x \cdot y) \cdot z = x \cdot (y \cdot z)$ (associativity w.r.t. internal multiplication)
(D1$\gamma$+) $x \cdot (y + z) = x \cdot y + x \cdot z$, $(x + y) \cdot z = x \cdot z + y \cdot z$ (left and right additive distributivity w.r.t. internal multiplication)
(D1$\gamma\circ$) $x \cdot (y \circ z) = (x \cdot y) \circ z$, $(x \circ y) \cdot z = x \circ (y \cdot z)$ (left and right multiplicative distributivity w.r.t. internal multiplication)
(D2$\gamma$) $r\,(x \cdot y) = (r\,x) \cdot y$, $(x \cdot y)\,r = x \cdot (y\,r)$ (left and right distributivity of internal and external multiplication)

Please pay attention in the following to the three axiom sets of the linear algebra. We specify now the three axiomatic sets called "associativity" (Ass), "unity" (Uni) and "commutativity" (Comm).
A-52 The Diagrams "Ass", "Uni" and "Comm"

Conventionally, a linear algebra is minimally constituted by the triplet $(X, \alpha, \beta)$, where $X$ as a linear space is a vector space equipped with the linear maps $\alpha: X \times X \to X$ and $\beta: \mathbb{R} \times X \to X$ satisfying the axioms (Ass) and (Uni) according to the following diagrams:

(Ass): the square built from $\alpha \times \mathrm{id}$, $\mathrm{id} \times \alpha$ and $\alpha$, mapping $X \times X \times X$ via $X \times X$ to $X$, commutes.

(Uni): the diagram built from $\beta \times \mathrm{id}$, $\mathrm{id} \times \beta$ and $\alpha$, mapping $\mathbb{R} \times X$ and $X \times \mathbb{R}$ via $X \times X$ to $X$, commutes.

(Comm): the triangle built from the flip $\tau_{X,X}$ and $\alpha$, mapping $X \times X$ to $X$, commutes.
Axiom (Ass) expresses the requirement that the multiplication $\alpha$ is associative, whereas Axiom (Uni) means that the element $\beta(1)$ of $X$ is a left as well as a right unit for $\alpha$. The algebra $(X, \alpha, \beta)$ is commutative if in addition it satisfies the axiom (Comm), i.e. the triangle commutes, where $\tau_{X,X}$ is the flip switching the factors: $\tau_{X,X}(x \circ y) = y \circ x$. Indeed, we have expressed a set of axioms both explicitly as well as in a diagrammatic approach which minimally constitute a linear algebra $(X, \alpha, \beta)$. In addition, beside the first internal relation $\alpha$ called "join", we have experienced a second internal relation $\gamma$ called "meet", which had to be made compatible with the other relations $\alpha$ and $\beta$, respectively. Actually, the diagram for the axiom (Dis) is left as an exercise. Obviously we have experienced the words "addition" and "multiplication" for various binary operations. Note that in the linear algebra isomorphic to the vector space as its geometric counterpart we have not specified the inner multiplication $\gamma(x, y) \in X$. In a threedimensional vector space of Euclidean type, $\gamma(x, y) := {*}(x \wedge y) = x \times y$, namely the star of the exterior product $x \wedge y$, or the "cross product" $x \times y$ for $x \in \mathbb{R}^3$, $y \in \mathbb{R}^3$, is an example. Sometimes $\gamma(x, y) =: [x, y]$ is written with rectangular brackets.

Historical Aside
Following a proposal of L. Kronecker ("Über die algebraisch auflösbaren Gleichungen", Abhandlung, 1929), the axiom of commutativity (G4+) or (G4$\circ$) is named after N. H. Abel (Mémoire sur une classe particulière d'équations résolubles algébriquement, Crelle's Journal für die reine und angewandte Mathematik 4 (1828) 131–156; Oeuvres, vol. 1, pp. 478–514, vol. 2, pp. 217–243, 329–331, edited by S. Lie and L. Sylow, Christiania 1881), who dealt with a particular class of equations of all degrees which are solvable by radicals, e.g. the cyclotomic equation $x^n - 1 = 0$. N. H. Abel proved the following general theorem: If the roots of an equation are such that all roots can be expressed as rational functions of one of them, say $x$, and if any two of the roots, say $r_1 x$ and $r_2 x$, where $r_1$ and $r_2$ are rational functions, are connected in such a way that $r_2 r_1 x = r_1 r_2 x$, then the equation can be solved by radicals. Refer $r_2 r_1 x = r_1 r_2 x$ to (G1$\gamma$).
A-53 Ringed Spaces: The Subalgebra "Ring with Identity"

In (G2$\circ$) the neutral element 1, as well as in (G3$\circ$) the inverse element, has been multiplied from the right. Similarly, left multiplication by the neutral element 1 in (G2$\circ$) as well as by the inverse element in (G3$\circ$) is defined. Indeed, it can be shown that there exists exactly one neutral element which is both left-neutral and right-neutral, as well as exactly one inverse element which is both left-inverse and right-inverse. A subalgebra is called a "ring with identity" if the following seven conditions hold: the additive group axioms (G1+), (G2+), (G3+), the additive commutativity (G4+), the multiplicative associativity (G1$\circ$), the multiplicative identity (G2$\circ$) and the additive distributivity (D1$\gamma$+).
A ring with identity is a division ring if every nonzero element of the ring has a multiplicative inverse (G3$\circ$). A commutative ring is a ring with commutative multiplication (G4$\circ$). Modules are generalizations of the vector spaces of linear algebra in which the "scalars" are allowed to be from an arbitrary ring, rather than the field of real numbers. They will be discussed as soon as we introduce superalgebras. Now we take reference to

Lemma A.53 (anticommutativity): $x \circ x = 0$ for all $x \in X \iff x \circ y = -\,y \circ x$ for all $x, y \in X$. "$\circ$" is used in the notation "$\wedge$" accordingly.

Proof. "$\Longrightarrow$": $x \circ y + y \circ x = x \circ x + x \circ y + y \circ x + y \circ y = x \circ (x + y) + y \circ (x + y) = (x + y) \circ (x + y) = 0$.
"$\Longleftarrow$": $x = y \Longrightarrow x \circ x = -\,x \circ x \Longrightarrow x \circ x = 0$.

Later on we refer to the following algebras.
A-54 Definition of a Division Algebra and Non-Associative Algebra

Definition A.23 (division algebra): An $\mathbb{R}$-algebra is called a division algebra over $\mathbb{R}$ if all non-null elements of $X_0 := X \setminus \{0\}$ additionally form a group with respect to the inner multiplication $\gamma =: x \circ y$, namely for $x, y, z \in X \setminus \{0\}$:
(G1$\circ$) $(x \circ y) \circ z = x \circ (y \circ z)$ (associativity of inner multiplication)
(G2$\circ$) $x \circ 1 = x$ (identity of inner multiplication)
(G3$\circ$) $x \circ x^{-1} = 1$ (inverse of inner multiplication)

Definition A.24 (non-associative algebra): A weakening of an $\mathbb{R}$-algebra is the non-associative algebra over $\mathbb{R}$, for which the axioms (1), (2) and (3) of a linear algebra according to Definition A.21 hold with the exception of (G1$\circ$), that is, the associativity of inner multiplication is cancelled.
A-55 Lie Algebra, Witt Algebra

"Perhaps God is a Lie group." Many physicists believe that all of modern physics is based on the operations called "Lie algebra"; indeed, more than 1000 textbooks and papers have been written on this most important subject, and many physicists believe in "perhaps God is a Lie group". Here we can give only a very short notice of "Lie algebra". We skip a note of comparison between "Lie algebra" and its brother, "Killing analysis".

Definition A.25 (Lie algebra): A non-associative algebra is called a Lie algebra over $\mathbb{R}$ if the following operations with respect to the inner multiplication $\gamma := x \circ y$ hold:
(L1) $x \circ x = 0$
(L2) $(x \circ y) \circ z + (y \circ z) \circ x + (z \circ x) \circ y = 0$ (Jacobi identity)

The examples of Lie algebras are numerous. As a special Lie algebra we present the Witt algebra, which applies to Laurent polynomials.

Example A.8 (Witt algebra on the ring of Laurent polynomials (Chen, 1995)): The Witt algebra $W$ is the complex Lie algebra of polynomial vector fields on the unit circle $S^1$. An element of $W$ is a linear combination of elements of the form $e^{in\theta} \frac{\partial}{\partial \theta}$, where $\theta$ is a real parameter, and the Lie bracket of $W$ is given by
\[
\Big[ e^{im\theta} \frac{\partial}{\partial \theta},\ e^{in\theta} \frac{\partial}{\partial \theta} \Big] = i\,(n - m)\, e^{i(m+n)\theta} \frac{\partial}{\partial \theta}.
\]
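The bracket relation can be checked symbolically by applying both sides to a generic test function $\varphi(\theta)$; a minimal sketch using sympy (our own illustration):

```python
# Symbolic check of the Witt bracket, applying L_k = exp(i k theta) d/dtheta
# to a generic test function phi(theta); sketch only.
from sympy import I, Function, exp, simplify, symbols

theta, m, n = symbols('theta m n')
phi = Function('phi')(theta)

def L(k, f):
    # the vector field L_k = exp(i k theta) d/dtheta applied to f
    return exp(I * k * theta) * f.diff(theta)

bracket = L(m, L(n, phi)) - L(n, L(m, phi))
expected = I * (n - m) * L(m + n, phi)
print(simplify(bracket - expected))   # 0
```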
A-56 Definition of a Composition Algebra

Various algebras are generated by adding an additional structure to the minimal set of axioms of a linear algebra blocked by (a), (b) and (c). Later on we use such an additional structure for matrix algebra.

Definition A.26 (composition algebra): A non-associative algebra with 1 as identity of inner multiplication is called a composition algebra over $\mathbb{R}$ if there exists a regular quadratic form $Q: X \to \mathbb{R}$ which is compatible with the corresponding operations, that is, the following hold for $x, y, z \in X$, $r \in \mathbb{R}$:
(K1) $Q: X \to \mathbb{R}$ is a regular quadratic form:
\[
Q(r\,x) = r^2\, Q(x) \quad\text{(quadratic form)}
\]
\[
Q(x + y + z) = Q(x + y) + Q(x + z) + Q(y + z) - Q(x) - Q(y) - Q(z)
\]
\[
Q(r\,x + y) - r\,Q(x + y) = (r - 1)\,[\,r\,Q(x) - Q(y)\,]
\]
\[
Q(x) = 0 \iff x = 0 \quad\text{(regularity)}
\]
(K2) $Q(x \circ y) = Q(x)\,Q(y)$ (multiplicativity), $Q(1) = 1$.
The quadratic form introduced by Definition A.26 leads to the topological notions of scalar product, norm and metric we already used.

Lemma A.56 (scalar product, norm, metric): In a composition algebra with a positive-definite quadratic form, a scalar product ("inner product") is defined by the bilinear form $\langle \cdot \mid \cdot \rangle: X \times X \to \mathbb{R}$ with
\[
\langle x \mid y \rangle := \frac{1}{2}\,[\,Q(x + y) - Q(x) - Q(y)\,];
\]
a norm is defined by $\|\cdot\|: X \to \mathbb{R}$ with
\[
\|x\| := +[\,Q(x)\,]^{1/2},
\]
and a metric is defined by
\[
\rho(x, y) := +[\,Q(x - y)\,]^{1/2}.
\]
Thus to the algebraic structure a topological structure is added if, in addition,
\[
\text{(K3)}\quad Q(x) \ge 0
\]
for all $x \in X$ holds.

Proof. (i) scalar product: $\langle \cdot \mid \cdot \rangle: X \times X \to \mathbb{R}$ is a scalar product since
\[
(1)\quad \langle x \mid y \rangle = \frac{1}{2}\,[\,Q(x + y) - Q(x) - Q(y)\,] = \frac{1}{2}\,[\,Q(y + x) - Q(y) - Q(x)\,] = \langle y \mid x \rangle \quad\text{(symmetry)}
\]
\[
(2)\quad \langle x + y \mid z \rangle = \frac{1}{2}\,[\,Q(x + y + z) - Q(x + y) - Q(z)\,]
= \frac{1}{2}\,[\,Q(x + z) - Q(x) - Q(z)\,] + \frac{1}{2}\,[\,Q(y + z) - Q(y) - Q(z)\,]
= \langle x \mid z \rangle + \langle y \mid z \rangle \quad\text{(additivity)}
\]
\[
(3)\quad \langle r x \mid y \rangle = \frac{1}{2}\,[\,Q(r x + y) - Q(r x) - Q(y)\,] = \frac{r}{2}\,[\,Q(x + y) - Q(x) - Q(y)\,] = r\,\langle x \mid y \rangle \quad\text{(homogeneity)}
\]
\[
(4)\quad \langle x \mid x \rangle = \frac{1}{2}\,[\,Q(x + x) - Q(x) - Q(x)\,] = \frac{1}{2}\,[\,Q(2x) - 2\,Q(x)\,] = Q(x) \ge 0 \quad\text{(positivity)}
\]
(ii) norm: $\|\cdot\|: X \to \mathbb{R}$ is a norm since
\[
(\text{N1})\quad \|x\| = +[\,Q(x)\,]^{1/2} \ge 0 \ \text{(positivity)}, \quad\text{and}\quad \|x\| = 0 \iff x = 0,
\]
\[
(\text{N2})\quad \|r x\| = +[\,Q(r x)\,]^{1/2} = +[\,r^2\,Q(x)\,]^{1/2} = |r|\,\|x\| \quad\text{(homogeneity)},
\]
\[
(\text{N3})\quad \|x + y\| = +[\,Q(x + y)\,]^{1/2} = +[\,Q(x) + Q(y) + 2\,\langle x \mid y \rangle\,]^{1/2}
\le +\big[\,\|x\|^2 + \|y\|^2 + 2\,\|x\|\,\|y\|\,\big]^{1/2} = \|x\| + \|y\|
\]
(Cauchy-Schwarz inequality / "triangle inequality").
(iii) metric: $\rho: X \times X \to \mathbb{R}$ is a metric since
\[
(\text{M1})\quad \rho(x, y) = +[\,Q(x - y)\,]^{1/2} = \|x - y\| \ge 0 \ \text{(positivity)}, \quad\text{and}\quad \rho(x, y) = 0 \iff x - y = 0 \iff x = y,
\]
\[
(\text{M2})\quad \rho(x, y) = \|x - y\| = |-1|\,\|y - x\| = \rho(y, x) \quad\text{(symmetry)},
\]
\[
(\text{M3})\quad \rho(x, y) = \|x - y\| = \|(x - z) + (z - y)\| \le \|x - z\| + \|z - y\| = \rho(x, z) + \rho(z, y) \quad\text{(triangle inequality)}.
\]
We treat in the next section a special case of a division algebra, namely matrix algebra.
A-6 Matrix Algebra Revisited, Generalized Inverses

It is known that every nonsingular matrix $A$ has a unique inverse, usually denoted by $A^{-1}$, such that $A A^{-1} = A^{-1} A = I$, where $I$ is the unit matrix. A matrix has an inverse only if it is square, and even then only if it is nonsingular, or alternatively if its columns or rows are linearly independent. Over the past many years the need was felt for some kind of partial inverse of a matrix that is singular or even rectangular. By the term generalized inverse of a given matrix $A$ we shall mean a matrix $X$ associated in some way with the matrix $A$ that (a) exists for a class of matrices larger than the class of nonsingular matrices, (b) has some of the properties of the usual inverse, and (c) reduces to the usual inverse when $A$ is nonsingular. Some others have used the term "pseudoinverse" rather than generalized inverse, without specifying the chosen type of generalized inverse. For a given matrix $A$, the matrix equation $AXA = A$ alone characterizes those generalized inverses $X$ that are of use in analyzing the solutions of the linear system $Ax = b$. For other purposes, other relationships play an essential role. Thus, if we are concerned with least squares properties, the equation $AXA = A$ is not enough and must be supplemented by other relations. There results a more restricted class of generalized inverses. Here we are limited to generalized inverses of finite matrices, but extensions to infinite-dimensional spaces and to differential and integral operators are very well known. E. H. Moore is credited with having written one of the first papers on the topic of generalized inverses, called by him the "general reciprocal" of any finite matrix, square or rectangular. His first publication on the subject was an abstract of a talk given at a meeting of the American Mathematical Society which appeared in 1920. Details of his talk were published only in 1935 (Moore, 1920; Moore, 1935; Moore and Nashed, 1974), after Moore's death! Little note was taken of Moore's discovery for 30 years after his first publication, during which time generalized inverses were given for matrices by C. L. Siegel and for operators by Tseng (1936), Tseng (1949a)–Tseng (1949c), Tseng (1956), Murray and von Neumann (1936) and many others. A surprising revival of interest in generalized inverses appeared in the 1950s, when least squares properties were analyzed. The important properties were realized by Bjerhammar (1951a, 1951b, 1968), a Swedish colleague teaching LESS and geodetic applications of the subject. In 1955, R. Penrose extended A. Bjerhammar's results on linear systems and showed that the E. H. Moore inverse satisfies the four equations (a) $AXA = A$, (b) $XAX = X$, (c) $(AX)' = AX$, and (d) $(XA)' = XA$, nowadays called the axioms of the Moore-Penrose inverse, often denoted by $A^+$.
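The four axioms are easy to verify numerically; a minimal sketch using numpy's pseudoinverse for a rank-deficient rectangular matrix (our own illustration):

```python
# Check of the four Moore-Penrose axioms for a rectangular, rank-deficient
# matrix, using numpy's pseudoinverse; numerical illustration only.
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])        # rank 1, no ordinary inverse
X = np.linalg.pinv(A)               # the Moore-Penrose inverse A^+

assert np.allclose(A @ X @ A, A)        # (a) AXA = A
assert np.allclose(X @ A @ X, X)        # (b) XAX = X
assert np.allclose((A @ X).T, A @ X)    # (c) (AX)' = AX
assert np.allclose((X @ A).T, X @ A)    # (d) (XA)' = XA
```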
A-6 Matrix Algebra Revisited, Generalized Inverses
603
of Ben-Israel and Greville (1974), Bjerhammar (1973), Gere and Weaver (1965), Graybill (1963), Mardia et al. (1979), Nashed (1974), Rao and Mitra (1971), Rao and Rao (1998), Scarle (1982), Styan (1983). Here we will be unable to compete with the dense information given by these special texts. We assume that the reader is familiar with the standard notion of a matrix as a rectangular array of numbers offered by any undergraduate text of applied mathematics. Familiarity of various notion, for instance the dimension of a matrix of type n m, multiplication and addition of two matrices of type (a) “Cayley” or simply the matrix product C WD A:B, (b)“Kronecker-Zehfuss” denoted by C WD B ˝ A, (c) “Katri-Rao” denoted by C WD B ˇ A, and (d)“Hadamard” attributed by K WD G H. How are the various matrix product defined? Who defined the various algebras? Matrix algebra is the special division algebra over the field of real numbers. There is a natural generalization in terms of complex numbers for instance. The three axioms Let A D Œaij 2 Rnm be a rectangular matrix 1
A; B; C 2 Rnm ; first set of axiom ˛.A; B/ DW A C B .G 1C/ .A C B/ C C D A C .B C C /
2
.G 2C/
AC0 DA
.G 3C/
AA D 0
.G 4C/
ACB D B CA
A; B 2 Rnm ; second set of axiom r; s 2 R;
ˇ.r; A/ DW r A
.D1C/ r .A C B/ D r A C r B .D2C/ .r C s/ A D r A C sA .D3/ 3
1A DA
“multiplication of matrices” (a) “Cayley-product” (just “the matrix product”) A D Œaij 2 Rnl ; dim A D n l
#
B D Œbij 2 Rlm ; dim B D l m l P ai k bkl : dim C D n m cij WD kD1
H) C WD A B D Œcij 2 Rnm ;
604
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
The product was introduced by Cayley (1857); see also his Collected Works, vol. 2, 475–496. A historical perspective is given in Feldmann (1962). (b) “Kronecker-Zehfuß-product” # dim A D n m
A D Œaij 2 Rnm ; B D Œbij 2 Rkl
dim B D k l
H) C WD B ˝ A D Œcij 2 Rk nlm ; dim C D k n lm;
B ˝ A WD Œbij A:
The product was early referenced to L. Kronecker by Mac Duffee (1946). The other reference is Zehfuss (1858). See also Henderson et al. (1983) and Horn and Johnson (1990). Reference is also made to Steeb (1991) and Graham (1981). (c) “Khatri-Rao-product” (two rectangular matrices of identical column number) A D Œa1 ; : : : ; am 2 Rnm ;
dim A D n m
B D Œb1 ; : : : ; bm 2 Rkm ;
dim B D k m
#
H) C WD B ˇ A WD Œb1 ˝ a1 ; : : : ; bm ˝ am 2 Rk nm dim C D k n m: “two rectangular matrices of identical column numbers are multiplied.” The product was introduced by Khatri and Rao (1968). (d) “Hadamard product” (two rectangular matrices of the same dimension, elementwise product) G D Œgij 2 Rnm ; dim G D n m
#
H D Œhij 2 Rmm ; dim H D n m H) K WD G H D Œkij 2 Rnm ; .no summation/: dim K D n m; kij WD gij hij “two rectangular matrices of the same dimension define the elementwise product.” The product was introduced by Hadamard (1899). See also Moutard (1894), as well as Hadamard (1949) and Schur (1911) and Horn and Johnson (1991) and Styan (1973).
A-6 Matrix Algebra Revisited, Generalized Inverses
605
Example: (a)
123 AD 2 Z23 ; 456
2
3 23 B D 4 4 5 5 2 Z32 H) 67
(b) H) A B D
28 34 2 Z22 64 79
123 AD 2 Z23 ; 456
2
1 B D 43 5
.“integer numbers”/:
3 2 4 5 2 Z32 H) B ˝ A D Œbij A 6 3 2 4 6 8 10 12 7 7 7 7 4 8 12 7 7 2 Z66 16 20 24 7 7 7 6 12 18 5 24 30 36
2
1 2 3 6 4 5 6 6 22 3 3 6 12 6 6 3 6 8 D 4 4 3 4 5 ˝ A5 D 6 6 12 15 18 6 56 6 4 5 10 15 20 25 30 (c)
123 AD 2 Z23 ; 456
2
3 123 B D 4 4 5 6 5 2 Z33 789
22 3 2 3 2 3 2 3 2 3 2 33 1 1 2 2 3 3 H) B ˇ A D 44 4 5 ˝ 4 5 ; 4 5 5 ˝ 4 5 ; 4 6 5 ˝ 4 55 7 4 8 5 9 6 2
1 6 4 6 6 6 4 D6 6 16 6 4 7 23
4 10 10 25 16 40
3 9 18 7 7 7 18 7 7 2 Z63 36 7 7 27 5 54
606
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
(d) GD
123 2 Z23 ; 456
H D
H) G H D Œgij hij D
234 2 Z23 567
2 6 12 2 Z23 : 20 30 42
A-61 Special Matrices: Helmert, Hankel, and Vandemonte All the quoted textbooks review effectively special matrices of various types: symmetric, antisymmetric, diagonal, unity, zero, triangular, idempotent, normal, orthogonal, orthonormal, positive-definite, positive-semidefinite, permutation, and commutation matrices. Here, we review only orthonormal matrices of various representations. The Helmert representation of orthonomal matrix is given in various forms. In addition, we give examples for an orthogonal matrix, for instance the Hankel matrix of power sums and the Vandermonde matrix. Definition (orthogonal matrix) : The matrix A is called orthogonal if AA0 and A0 A are diagonal matrices. (The rows and columns of A are orthogonal.) Definition (orthonormal matrix): The matrix A is called orthonormal if AA0 D A0 A D I. (The rows and columns of A are orthonormal.) Facts (representation of a 22 orthonormal matrix) X 2 SO.2/: A 2 2 orthonormal matrix X 2 SO.2/ is an element of the special orthogonal group SO(2) defined by SO.2/ WD fX 2 R22 j X0 X D I2 and det X D C1g 8 < :
XD
x1 x2 x3 x4
9 ˇ 2 ˇ x1 C x22 D 1 = ˇ 2 R22 ˇˇ x1 x3 C x2 x4 D 0; x1 x4 x2 x3 D C1 ; ˇ x2 C x2 D 1 3 4
(a) XD
cos sin sin cos
2 R22 ;
is a trigonometric representation of X 2 SO.2/.
2 Œ0; 2
A-6 Matrix Algebra Revisited, Generalized Inverses
607
(b) " XD
px 1 x2
# p 1 x2 2 R22 ; x
x 2 ŒC1; 1
is an algebraic representation of X 2 SO.2/
p p 2 2 2 2 x11 C x12 D 1; x11 x21 C x12 x22 D x 1 x 2 C x 1 x 2 D 0; x21 C x22 D1 : (c) " XD
1x 2 1Cx 2 2x 1Cx 2
2x C 1Cx 2
#
1x 2 1Cx 2
2 R22 ;
x2R
is called a stereographic projection of X (stereographic projection of SO(2) S1 onto L1 ). (d) 1
X D .I2 C S/.I2 S/ ;
0 x SD ; x 0
where S D S0 is a skew matrix (antisymmetric matrix), is called a Cayley-Lipschitz representation of X 2 SO.2/. (e) X 2 SO.2/ is a commutative group (“Abel”) (Example: X1 2 SO.2/, X2 2 SO.2/, then X1 X2 D X2 X1 ) (SO(n) for n D 2 is the only commutative group, SO(nj n ¤ 2) is not “Abel”). Facts (representation of an n n orthonormal matrix) X 2 SO.n/: An n n orthonormal matrix X 2 SO.n/ is an element of the special orthogonal group SO(n) defined by SO.n/ WD fX 2 Rnn jX0 X D In and det X D C1g: As a differentiable manifold SO(n) inherits a Riemann structure from the ambient 2 2 space n with a Euclidean metric (vec X0 2 Rn ; dim vec X0 D n2 ). Any atlas of the special orthogonal group SO(n) has at least four distinct charts and there is one with exactly four charts. (“minimal atlas”: Lusternik-Schnirelmann category) (a) X D .In C S/.In S/1 ; where S D S0
608
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
is a skew matrix (antisymmetric matrix), is called a Cayley-Lipschitz representation of X 2 SO.n/. (nŠ=2.n 2/Š is the number of independent parameters/coordinates of X) (b) If each of the matrices R1 ; : : : ; Rk is an n n orthonormal matrix, then their product R1 R2 Rk1 Rk 2 SO.n/ is an n n orthonormal matrix. Facts (orthonormal matrix: Helmert representation): Let a0 D Œa1 ; : : : ; an represent any row vector such that ai ¤ 0 .i 2 f1; : : : ; ng/ is any row vector whose elements are all nonzero. Suppose that we require an n n orthonormal matrix, one row which is proportional to a0 . In what follows one such matrix R is derived. Let Œr0 1 ; : : : ; r0 n represent the rows of R and take the first row r0 1 to be the row of R that is proportional to a0 . Take the second row r0 2 to be proportional to the n-dimensional row vector Œa1 ; a21 =a2 ; 0; 0; : : : ; 0;
.H 2/
the third row r0 3 proportional to
a1 ; a2 ; .a21 C a22 /=a3 ; 0; 0; : : : ; 0
.H 3/
and more generally the first through nth rows r0 1 ; : : : ; r0 n proportional to " a1 ; a2 ; : : : ; ak1 ;
k1 X
# a2i =ak ; 0; 0; : : : ; 0
.Hn 1/
i D1
for k 2 f2; : : : ; ng; respectively confirm to yourself that the n-1 vectors (Hn1 ) are orthogonal to each other and to the vector a0 . In order to obtain explicit expressions for r0 1 ; : : : ; r0 n it remains to normalize a0 and the vectors (Hn1 ). The Euclidean norm of the kth of the vectors (Hn1 ) is 8 k1 <X :
i D1
a2i
C
k1 X
!2 a2i
i D1
91=2 ( ! k1 . = X 2 2 ak D ai ; i D1
k X
! a2i
.
) 1=2 a2k
i D1
Accordingly for the orthonormal vectors r0 1 ; : : : ; r0 n we finally find (1st row)
0
r1D
" n X i D1
#1=2 a2i
.a1 ; : : : ; an /
:
A-6 Matrix Algebra Revisited, Generalized Inverses
609
2
31=2
6 6 rk D6 6 k1 4 P
a2 !k
0
(kth row)
i D1
a2i
7 7 !7 7 k P 2 5 ai
i D1
a1 ; a2 ; : : : ; ak1 ;
i
i D1
31=2
2 .nth row/
6 6 r n D 6 n1 4 P
a2n
0
i D1
a2i
7 7 7 n P 2 5 ai :
"
!
k1 2 X a
a
k
; 0; 0; : : : ; 0
# n1 2 X ai a1 ; a2 ; : : : ; an1 ; : a i D1 n
i D1
The recipe is complicated: When a0 D Œ1; 1; : : : ; 1; 1, the Helmert factors in the 1st row, : : : ; kth row,. . . , nth row simplify to r0 1 D n1=2 Œ1; 1; : : : ; 1; 1 2n r0 k D Œk.k 1/1=2 Œ1; 1; : : : ; 1; 1 k; 0; 0; : : : ; 0; 0 2n r0 n D Œn.n 1/1=2 Œ1; 1; : : : ; 1; 1 n 2 Rn : The orthonormal matrix 2
r0 1 r0 2
3
6 7 6 7 6 7 6 0 7 6 r k1 7 6 0 7 2 SO.n/ 6 rk 7 6 7 6 7 6 7 4 r0 n1 5 r0 n is known as the Helmert matrix of order n. (Alternatively the transposes of such a matrix are called the Helmert matrix.) Example (Helmert matrix of order 3): 2
p p 3 p 1= 3 1= 3 1= 3 6 p 7 p 6 1= 2 1= 2 0 7 4 5 2 SO.3/: p p p 1= 6 1= 6 2= 6 Check that the rows are orthogonal and normalized.
610
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
Example (Helmert matrix of order 4): 2
3 1=2 1=2 1=2 1=2 p 6 p 7 6 1= 2 1= 2 7 0 0 6 p 7 2 SO.4/: p p 6 7 0 4 1= 6 1= 6 2= 6 5 p p p p 1= 12 1= 12 1= 12 3= 12 Check that the rows are orthogonal and normalized. Example (Helmert matrix of order n): 2
p 1= n p 1= 2 p 1= 6
6 6 6 6 6 6 6 6 6 6p 1 4 .n1/.n2/ p 1 n.n1/
p 1= n p 1= 2 p 1= 6
p 1= n
p 1= n
0 p 2= 6
p 1= n
0
0
0
0
1 1 p p .n1/.n2/ .n1/.n2/ p 1 n.n1/
p 1 n.n1/
1.n1/ p .n1/.n2/
p 1 n.n1/
p 3 1= n 7 0 7 7 0 7 7 7 2 SO .n/: 7 7 7 0 7 5
p 1n n.n1/
Check that the rows are orthogonal and normalized. An example is the nth row 1 .1 n/2 n1 1 2n C n2 1 CC C D C n.n 1/ n.n 1/ n.n 1/ n.n 1/ n.n 1/ D
n2 n n.n 1/ D D 1; n.n 1/ n.n 1/
where .n 1/ terms 1=Œn.n 1/ have to be summed. Definition (orthogonal matrix): A rectangular matrix A D Œaij 2 Rnm is called “a Hankel matrix” if the n C m 1 distinct elements of A, 3 2 a11 7 6 a 7 6 21 7 6 7 6 7 6 5 4 an11 an1 an2 anm only appear in the first column and last row.
A-6 Matrix Algebra Revisited, Generalized Inverses
611
Example: Hankel matrix of power sums Let A 2 Rnm be a nm rectangular matrix (n 6 m) whose entries are power sums. 2P n n P ˛i xi ˛i xi2 6 i D1 i D1 6 n n P 6P 6 ˛i xi2 ˛i xi3 6 i D1 A WD 6 i D1 :: :: 6 6 : : 6 n n 4P P ˛i xin ˛i xinC1 i D1
n P
:: :
i D1
3
7 7 7 ˛i ximC1 7 7 i D1 7 :: 7 7 : 7 n 5 P nCm1 ˛i xi i D1
˛i xim
n P
i D1
A is a Hankel matrix. Definition (Vandermonde matrix): Vandermonde matrix: V 2 Rnn 2 6 6 VWD6 6 4
1
1
1
x1 :: :
x2 xn :: :: :: : : :
3 7 7 7; 7 5
x1n1 x2n1 xnn1 Y .xi xj /: det V D i;j n i >j
Example: Vandermonde matrix V 2 R33 3 1 1 1 7 6 V WD 4 x1 x2 x3 5 ; det V D .x2 x1 /.x3 x2 /.x3 x1 /: x12 x22 x32 2
Example: Submatrix of a Hankel matrix of power sums Consider the submatrix P D Œa1 ; a2 ; : : : ; an of the Hankel matrix A 2 Rnm .n 6 m/ whose entries are power sums. The determinant of the power sums matrix P is det P D
n Y i D1
! ˛i
n Y
! xi .det V/2 ;
i D1
where det V is the Vandermonde determinant. Example: Submatrix P 2 R33 of a 3 4 Hankel matrix of power sums (n D 3, m D 4)
612
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra 2
˛1 x1 C ˛2 x2 C ˛3 x3 ˛1 x12 C ˛2 x22 C ˛3 x32 ˛1 x13 C ˛2 x23 C ˛3 x33 ˛1 x14 C ˛2 x24 C ˛3 x34
3
6 7 6 7 A D 6 ˛1 x12 C ˛2 x22 C ˛3 x32 ˛1 x13 C ˛2 x23 C ˛3 x33 ˛1 x14 C ˛2 x24 C ˛3 x34 ˛1 x15 C ˛2 x25 C ˛3 x35 7 4 5 ˛1 x13 C ˛2 x23 C ˛3 x33 ˛1 x14 C ˛2 x24 C ˛3 x34 ˛1 x15 C ˛2 x25 C ˛3 x35 ˛1 x16 C ˛2 x26 C ˛3 x36 2 3 ˛1 x1 C ˛2 x2 C ˛3 x3 ˛1 x12 C ˛2 x22 C ˛3 x32 ˛1 x13 C ˛2 x23 C ˛3 x33 6 7 6 7 P D Œa1 ; a2 ; a3 6 ˛1 x12 C ˛2 x22 C ˛3 x32 ˛1 x13 C ˛2 x23 C ˛3 x33 ˛1 x14 C ˛2 x24 C ˛3 x34 7 : 4 5 ˛1 x13 C ˛2 x23 C ˛3 x33 ˛1 x14 C ˛2 x24 C ˛3 x34 ˛1 x15 C ˛2 x25 C ˛3 x35
A-62 Scalar Measures of Matrices All reference textbooks review various scalar measures of matrices like • Linear independence • Column and row rank • Rank identities to which we refer. Permanently, we referred to a special technique called IPM (“Inverse Partitioned Matrix”) of a symmetric matrix we will review now. Facts: (Inverse Partitional Matrix/IPM/of a symmetric matrix): Let the symmetric matrix A be partitioned as " A WD
A11 A12
#
A0 12 A22
;
A0 11 D A11 ;
A0 22 D A22 :
Then its Cayley inverse A1 is symmetric and can be partitioned as well as 2 A
1
D4 2 D4
A11 A12 A0 12 A22
31 5
1 0 1 A0 A1 A1 A .A A0 A1 A /1 ŒI C A1 22 12 11 12 11 12 11 A12 .A22 A 12 A11 A12 / 11 12 1 A0 A1 .A22 A0 12 A1 12 11 11 A12 /
1 .A22 A0 12 A1 11 A12 /
3 5;
if A1 11 exists; 2 A
1
D4 2 D4
A11 A12 A0 12 A22
31 5
1 .A11 A0 12 A1 22 A12 /
1 A A1 .A11 A0 12 A1 12 22 22 A12 /
1 0 0 1 ŒI C A1 A0 .A A A1 A0 /1 A A1 A1 11 12 22 12 22 12 12 22 A 12 .A11 A 12 A22 A12 / 22
3 5;
A-6 Matrix Algebra Revisited, Generalized Inverses
613
if A1 22 exists: S11 WD A22 A0 12 A1 11 A12
and
S22 WD A11 A0 12 A1 22 A12
are the minors determined by properly chosen rows and columns of the matrix A called “Schur complements” such that " A
1
D
A11 A12
"
#1 D
A0 12 A22
1 0 1 1 1 .I C A1 11 A12 S11 A 12 /A11 A11 A12 S11 0 1 S1 11 A 12 A11
#
S1 11
if A1 11 exists; " A
1
D
A11 A12
#1
" D
A0 12 A22
S1 22
1 S1 22 A12 A22
#
0 1 1 0 1 1 A1 22 A 12 S22 ŒI C A22 A 12 S22 A12 A22
if A1 22 exists; are representations of the Cayley inverse partitioned matrix A1 in terms of “Schur complements”. The formulae S11 and S22 were first used by J. Schur (1917). The term “Schur complements” was introduced by Haynsworth (1968). Albert (1969) replaced the Cayley inverse A1 by the Moore-Penrose inverse AC . For a survey we recommend Cottle (1974), Ouellette (1981), Carlson (1986) and Styan (1985). Proof. For the proof of the “inverse partitioned matrix” A1 (Cayley inverse) of the partitioned matrix A of full rank we apply Gauss elimination (without pivoting). AA1 D A1 A D I " A11 2 Rmm ; A12 2 Rml A11 A12 0 0 AD ; A 11 D A11 ; A 22 D A22 0 A 12 A22 A0 12 2 Rlm ; A22 2 Rll " B11 2 Rmm ; B12 2 Rml B11 B12 1 0 0 ; B A D D B ; B D B 11 22 11 22 B0 12 B22 B0 12 2 Rlm ; B22 2 Rll 2
AA1
A11 B11 C A12 B0 12 D B11 A11 C B12 A0 12 D Im .1/ 6 A B C A B D B A C B A D 0 .2/ 12 22 11 12 12 22 6 11 12 D A1 A D I , 6 0 4 A 12 B11 C A22 B0 12 D B0 12 A11 C B22 A0 12 D 0 .3/ A0 12 B12 C A22 B22 D B0 12 A12 C B22 A22 D Il : .4/
614
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
Case (a) W A1 11 exists “forward step” A11 B11 C A12 B0 12 D Im .first left equation:multiply by A0 12 A1 11 / A0 12 B11 C A22 B0 12 D 0 .secondrightequation) ,
,
0 0 1 A0 12 B11 A0 12 A1 11 A12 B 12 D A 12 A11
#
A0 12 B11 C A22 B0 12 D 0 " A11 B11 C A12 B0 12 D Im
0 0 1 .A22 A0 12 A1 11 A12 /B 12 D A 12 A11
1 0 1 ) B0 12 D .A22 A0 12 A1 11 A12 / A 12 A11 0 1 B0 12 D S1 11 A 12 A11
or
"
Im
0
A0 12 A1 11 Il
#"
A11 A12 A0 12 A22
#
" D
A12
A11
0 A22 A0 12 A1 11 A12
# :
Note the “Schur complement” S11 WD A22 A0 12 A1 11 A12 . “backward step” A11 B11 C A12 B0 12 D Im
#
1 0 1 B0 12 D .A22 A0 12 A1 11 A12 / A 12 A11 0 0 1 ) B11 D A1 11 .Im A12 B 12 / D .Im B12 A 12 /A11 0 1 1 0 1 B11 D ŒIm C A1 11 A12 .A22 A 12 A11 A12 / A 12 A11 1 1 0 1 B11 D A1 11 C A11 A12 S11 A 12 A11
A11 B12 C A12 B22 D 0 .second left equation/ 1 0 1 1 ) B12 D A1 11 A12 B22 D A11 A12 .A22 A 12 A11 A12 / 1 , B22 D .A22 A0 12 A1 11 A12 /
B22 D S1 11 : Case (b) W A1 22 exists “forward step”
#
A-6 Matrix Algebra Revisited, Generalized Inverses
615
#
A11 B12 C A12 B22 D 0 .third right equation) A0 12 B12 C A22 B22 D Il .fourth left equation: multiply by A12 A1 22 / # A11 B12 C A12 B22 D 0 , 0 1 A12 A1 22 A 12 B12 A12 B22 D A12 A22 " A0 12 B12 C A22 B22 D Il , 0 1 .A11 A12 A1 22 A 12 /B12 D A12 A22 0 1 0 1 ) B12 D .A11 A12 A1 22 A 12 / A 12 A22 1 B12 D S1 or 22 A12 A22 " #" # Im A12 A1 A11 A12 22
0
Il
A0 12 A22
" D
0 A11 A12 A1 22 A 12 0
A0 12
A22
# :
Note the “Schur complement” 0 S22 WD A11 A12 A1 22 A 12 :
“backward step” A0 12 B12 C A22 B22 D Il
#
0 1 1 B12 D .A11 A12 A1 22 A 12 / A12 A22 0 0 0 1 ) B22 D A1 22 .Il A 12 B 12 / D .Il B 12 A12 /A22 0 1 0 1 1 B22 D ŒIl C A1 22 A 12 .A11 A12 A22 A 12 / A12 A22 1 0 1 1 B22 D A1 22 C A22 A 12 S22 A12 A22
A0 12 B11 C A22 B0 12 D 0 .third left equation/ 0 1 0 1 0 1 ) B0 12 D A1 22 A 12 B11 D A22 A 12 .A11 A12 A22 A 12 /
,
0 1 B11 D .A11 A12 A1 22 A 12 /
B11 D S1 22 :
|
The representations B11 ; B12 ; B21 D B0 12 ; B22 in terms of A11 ; A12 ; A21 D A0 12 ; A22 have been derived by Banachiewicz (1937). Generalizations are referred to Ando (1979), Brunaldi and Schneider (1963), Burns et al. (1974), Carlson (1980), Meyer (1973) and Mitra (1982), Li and Mathias (2000). We leave the proof of the following fact as an exercise.
616
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
Fact (Inverse Partitioned Matrix /IPM/ of a quadratic matrix): Let the quadratic matrix A be partitioned as " A WD
A11 A12 A21 A22
# :
Then its Cayley inverse A1 can be partitioned as well as " A1 D
A11 A12
#1
" D
A21 A22
1 1 1 1 1 A1 11 C A11 A12 S11 A21 A11 A11 A12 S11 1 S1 11 A21 A11
# ;
S1 11
if A1 11 exists " A
1
D
A11 A12 A21 A22
#1
" D
S1 22
1 S1 22 A12 A22
1 1 1 1 1 A1 22 A21 S22 A22 C A22 A21 S22 A12 A22
# ;
if A1 22 exists and the “Schur complements” are defined by S11 WD A22 A21 A1 11 A12
and
S22 WD A11 A12 A1 22 A21 :
Facts: Cayley Inverse, DG matrix identity, SMW matrix identity, sum of two matrices .DG id/ BD.A C CBD/1 D .B1 C DA1 C/1 DA1 (Duncan-Guttman matrix identity):
.SMW id/ .A C CBD/1 D A1 A1 C.B1 C DA1 C/1 DA1 (Sherman - Morrison - Woodbury matrix identity): Duncan (1944) calls (DG) the Sherman-Morrison-Woodbury matrix identity. If the matrix A is singular consult Henderson and Searle (1981a), Henderson and Searle (1981b), Ouellette (1981), Hager (1989), Stewart (1977) and Riedel (1992). (SMW) has been noted by Duncan (1944) and Guttman (1946): The result is directly derived from the identity .A C CBD/.A C CBD/1 D I ) A.A C CBD/1 C CBD.A C CBD/1 D I .A C CBD/1 D A1 A1 CBD.A C CBD/1
A-6 Matrix Algebra Revisited, Generalized Inverses
617
A1 D .A C CBD/1 C A1 CBD.A C CBD/1 DA1 D D.A C CBD/1 C DA1 CBD.A C CBD/1 DA1 D .I C DA1 CB/D.A C CBD/1 DA1 D .B1 C DA1 C/BD.A C CBD/1 .B1 C DA1 C/1 DA1 D BD.A C CBD/1 : Up to now we presented IPM of a symmetric matrix. Alternatively, we review rank factorization of a symmetric and positive semidefinite/definite matrix. Facts (rank factorization): (a) If the n n matrix is symmetric and positive semidefinite, then its rank factorization is " AD
# G1 G2
G0 1 G0 2 ;
where G1 is a lower triangular matrix of the order O.G1 / D rk A rk A with rk G2 D rk A, whereas G2 has the format O.G2 / D .n rk A/ rk A: In this case we speak of a Choleski decomposition. (b) In case that the matrix A is positive definite, the matrix block G2 is not needed anymore: G1 is uniquely determined. There holds 0 1 A1 D .G1 1 / G1 :
Various notions, for instance submatrices adjoint, traces and determinants, as scalar measures of a matrix are given by D. A. Harville (2001, examples, Sects. 2,5 and 13) to which we refer. We already introduced vector-valued matrix forms by means of the operations vec, vech and veck. Please pay attention to the vec-forms (a) vecA B C0 D .A ˝ C/vecB (b) .A/0 vecB D tr.A B0 / (c) vec xy0 D x ˝ y (d)
vec(A B/ D .B0 ˝ In /vecA D .B0 ˝ A/vecIm D .Iq ˝ A/vecB;
for all A 2 Rnm ;
B 2 Rmq
given by Pukelsheim (1977), pp. 324, 326 or Harville (2001), examples, Sect. 16, 22 pp.
618
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
A-63 Three Basic Types of Generalized Inverses In Chaps. 1, 3 and 5 we obtained the complete class of solution of equation “Ax D y00 and showed that every solution can be expressed in the form x D Gy where G is the properly chosen generalized inverse. Here we inquire to find solutions for such generalized inverses which are of the following type .i/
rk A D rk .AA0 / D n W “minimum norm”
Consistent system of observation equations: Underdetermine. Theorem (“minimum norm”: Rao and Mitra (1971), pp. 44–47) Let G be a generalized inverse of the matrix A such that Gy is a minimum norm solution of the equation Ax D y; rk A D rk .AA0 / D n. Then it is necessary and sufficient that Min or Ax D y (b) AGA D A and .GA/0 D GA. or 0 0 (c) A mr D A .AA / (reflexive generalized inverse which is minimum norm). n Min o If we use the weighted generalized inverse of the type kxk2 D x0 Gx Ax D y we get the Gx -MINOS solution of Chap. 1, case (a), case (b) and case (c). (a) kxk2 D kGy k2 D
.ii/
rk A D rk .A0 A/ D m W “generalized least-squares solution for Ax D y” Inconsistent system of linear observation equations: overdetermine.
Theorem (“Least-squares”: Rao and Mitra (1971), pp. 48–50) Let G be a matrix (not necessarily a generalized inverse) such that Gy is a leastsquares solution of Ax D y such that rk A D rk .A0 A/ D m. Then it is necessary and sufficient that (a) ky Axk2 D mi n .inconsistent/ or (b) AGA D A and .GA/0 D GA: or (c) Alr D .A0 A/ A (reflexive generalized inverse which is least-squares). nMin If we use the weighted generalized inverse of the type ky Axk2Gy D x o .y Ax/0 Gy .y Ax/ we enjoy receiving the Gy -LESS solution of Chap. 3, case (a), case (b) and case (c).
A-7 Complex Algebra, Quaternion Algebra, Octonian Algebra, Clifford Algebra
.iii/
619
rk A < mi n fm; ng W “generalized inverse for a minimum norm leastsquares solution”
Inconsistent system of linear observation equations with a datum defect: overdetermine-underdetermined. t u Here we start with a definition Definition (“MINOLESS”: Rao and Mitra (1971), p. 51) G is said to be a minimum norm, least-squares generalized inverse of the matrix A, if G is A l and for any y 2 Y kGy km kxkm
for all x 2 fx j ky Axkm ky Azkm
for all z 2 X
where k:kn and k:km are norms in Xm and Yn respectively. Finally we will refer to the basic results generating “MINOLESS”. Theorem (“MINOLESS”: Rao and Mitra (1971), pp. 50–54,64) Let G be a matrix such that Gy is a minimum norm, least-squares solution of the inconsistent equation Ax D y C i. Then it is necessary and sufficient that x D AC y namely, (a) AGA D A, (b) .AG/0 D AG, (c) GAG/ D G, (d) .GA/0 D GA (Penrose 1955). or AC D A0 AŒ.A0 A/2 A0 D .I C A0 A/ A0 ŒA0 A.I C A0 A/ A A li m
AC D k ! 1
k X
li m
A0 .I C AA0 /k D k ! 1
kD1 li m
k X
or
.I C A0 A/k A0
or
kD1 li m
AC D ! 0C A0 .I C AA0 /1 D ! 0C .I C A0 A/1 A0 If we use the weighted norms kxk2Px and ky Axk2Py , we receive the Gx -minimum norm, Gy - least squares solution of Chap. 5, case (a), case (b) and case (c).
A-7 Complex Algebra, Quaternion Algebra, Octonian Algebra, Clifford Algebra, Hurwitz Theorem The greatest advantage of Clifford algebra is the following fact: We are able to write down the field equations namely “rot” and “div” equation of gravitational field and electromagnetic field in one equation. The related literature is very rich and we could recommend the book by Meetz and Engl (1980). A more detailed introduction into Clifford algebra and its fascinating chess board as well as Clifford analysis is listed in Grafarend (2004) with more than 300 references. From some point of view, the variona algebra presented here are rather out of fashion. Indeed
620
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
• Complex algebra, Clifford algebra Cl.0; 1/ • Quaternion algebra, Clifford algebra Cl.0; 2/
A-71 Complex Algebra as a Division Algebra as well as a Composition Algebra, Clifford algebra C l.0; 1/ Here we present to you “complex algebra” and refer it to Clifford algebra C l.0; 1/. The “complex algebra” C due to C. F. Gauß is a division algebra over R as well as a composition algebra. x2C x D e 0x0 C e 1x1
subject to fx 0 ; x 1 g 2 R2
span C D f1; e 1 g
1
x; y; z 2 R2 ˛.x; y/ DW x C y: The axioms .G1C/, .G2C/, .G3C/, .G4C/ of an Abelian group apply.
2
x; y 2 R2 ; r; s 2 R ˇ.r; x/ DW r x: The axioms .D1C/, .D2C/, .D3/ of additive distributivity apply.
3
One way to explicitly describe a multiplicative group with finitely many elements is to give a table listing the multiplications just representing the map W C C ! C. multiplication diagram, Cayley diagram 1 e1 1 1 e1 e 1 e 1 1 Note that in the multiplication table each entry of a group appears exactly once in each row and column. The multiplication has to be read from left to right that is, the entry at the intersection of the row headed by e 1 and the column headed by e 1 is the product e 1 e 1 . Such a table is called a Cayley diagram of the multiplicative group.
A-7 Complex Algebra, Quaternion Algebra, Octonian Algebra, Clifford Algebra
621
Here note in addition the associativity of the internal multiplication given in the table. Such a “complex algebra” C is not a Lie algebra since neither x x D 0 .L1/ nor .x y/ zC C.y z/ x C .z x/ y D 0 (Jacobi identity) .L2/ hold. Just by means of the multiplication table compute x x D 1f.x 0 /2 .x 1 /2 g C 2e 1 x 0 x 1 6D 0: 4
Begin with the choice x 1 D
1 .1x 0 e 1 x 1 / .x 0 /2 C .x 1 /2
in order to end up with x x 1 D .1x 0 C e 1 x 1 /
1x 0 e 1 x 1 D1 .x 0 /2 C .x 1 /2
Accordingly .G 1/, .G 2/, .G 3/ of a division algebra apply. 5
Begin with the choice Q.x/ D Q.1x 0 C e 1 x 1 / WD .x 0 /2 C .x 1 /2 in order to prove (K1), (K2) and (K3). We only focus on (K2i): Q.x y/ D Q.x/ Q.y/ 0 0
.multiplicativity/ 1 1
Q.x y/ D Qf1.x y x y / C e 1 .x 1 y 0 C x 0 y 1 /g D .x 0 /.y 0 /2 C .x 1 /2 .y 1 /2 C .x 1 /2 .y 0 /2 C .x 0 /2 .y 1 /2 Q.x/ Q.y/ D f.x 0 /2 C .x 1 /2 g f.y 0 /2 C .y 1 /2 g D .x 0 /2 .y 0 /2 C .x 1 /2 .y 1 /2 C .x 1 /2 .y 0 /2 C .x 0 /2 .y 1 /2 Q.x y/ D Q.x/ Q.y/ q:e:d: How can be one dream about such a complex algebra C? Gauss (1887) had p been motivated in his number theory to introduce complex numbers with i WD 1 as the “imaginary unit”. Identify 1x 0 with the “real part” and e 1 x 1 D ix 1 with the “imaginary part” of x and we are left with the standard theory of complex numbers. x 1 is based upon the complex conjugate 1x 0 e 1 x 1 of x being divided by the norm of x. There is a remarkable isomorphism between complex numbers and complex algebra. The proper algebraic interpretation of complex numbers is in terms of Clifford algebra C l.0; 1/. Observe g.e 1 ; e 1 / D 1 which interprets the binary operation of the base vector which spans the vector part of a complex number. Now translate the multiplication table into the language of the Clifford product, namely
622
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
1 ^ 1 D 1;
1 ^ e1 D e1
e 1 ^ 1 D e 1 ; e 1 ^ e 1 D 1 in order to convince yourself that the Clifford algebra C l.0; 1/ is algebraically isomorphic to the space of complex numbers. | How can we relate complex numbers to Clifford algebra C l1 ? Observe g.e 1 ; e 1 / D 1 which interprets the binary operation of the base vector which spans the vector part of a complex number. While the scalar part of a complex number is an element of A 0 , its vector part can be considered to be an element of A 1 . The direct sum A0 ˚ A1 of spaces A 0 , A 1 is algebraically isomorphic to the space of complex numbers, being an element of the Clifford algebra C l1 . |
A-72 Quaternion Algebra as a Division Algebra as well as a Composition Algebra, Clifford algebra C l.0; 2/ Next we review “quaternion algebra” quoting W. R. Hamilton and refer it to Clifford algebra C l.0; 2/. The ”quaternion algebra” H due to Hamilton (1843) is a division algebra over R as well as a composition algebra: x2H x D 1x 0 C e 1 x 1 C e 2 x 2 C e 3 x 3
subject to
fx 0 ; x 1 ; x 2 ; x 3 g 2 R4
span H D f1; e 1 ; e 2 ; e 3 g
1
x; y; z 2 R4 ˛.x; y/ DW x C y: The axioms .G1C/, .G2C/, .G3C/, .G4C/ of an Abelian additive group apply.
2
x; y 2 R4 ;
r; s 2 R
ˇ.r; y/ DW r x: The axioms .D1C/, .D2C/, .D3/ of additive distributivity apply.
A-7 Complex Algebra, Quaternion Algebra, Octonian Algebra, Clifford Algebra
3
623
One way to explicitly describe a multiplicative group with finitely many elements is to give a table listing the multiplications just representing the map W H H ! H. multiplication table, Cayley diagram 1 e1 e2 e3
1 e1 e2 e3 1 e1 e2 e3 e 1 1 e 3 e 2 : e 2 e 3 1 e 1 e 3 e 2 e 1 1
Note that in the multiplication table each antry of a group appears exactly once in each row and column. The multiplication has to be read from left to right that is, the entry at the intersection of the row headed by e 1 and the column headed by e 2 is the product e 1 e 2 . Such a table is called a Cayley diagram of the multiplicative group. Here note in addition the associativity of the internal multiplication given by the table, e.g. e 1 .e 2 e 3 / D e 1 e 1 D 1 D e 3 e 3 D .e 1 e 2 / e 3 or ! 3 X X x y D 1 xıy ı xk y k C e k .x ı y k C x k y ı C ijk x i y j / kD1
i;j;k
such that .x y/ z D x .y z/. Such a “Hamilton algebra” H is not a Lie algebra since neither x x D 0 (L1) nor .x y/ z C .y z/ x C .z x/ y D 0 (L2) (Jacobi identity) hold. Just by means of the multiplication table compute x x D 1f.x ı /2 .x 1 /2 .x 2 /2 .x 3 /2 g C 2e 1 x ı x 1 C 2e 2 x ı x 2 C 2e 3 x ı x 3 6D 0 4
Begin with the choice x 1 D
1 .1x ı e 1 x 1 x 2 x 2 e 2 x 3 / .x ı /2 C .x 1 /2 C .x 2 /2 C .x 3 /2
in order to to end up with x x 1 D Œ1x ı C e 1 x 1 C e 2 x 2 C e 3 x 3
1x ı e 1 x 1 e 2 x 2 e 3 x 3 D 1: .x ı /2 C .x 1 /2 C .x 2 /2 C .x 3 /2
Accordingly .G1/, .G2/, .G3/ of a division algebra apply. 5
Begin with the choice
624
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
Q.x/ D Q.1 C e 1 x 1 C e 2 x 2 C e 3 x 3 / WD .x 0 /2 C .x 1 /2 C .x 2 /2 C .x 3 /2 in order to prove .K 1/. .K 2/ and .K 3/. The laborious proofs are left as an exercise. How can one dream about such a “quaternion algebra” H? Hamilton (16 October 1843) invented quaternion numbers as outlined in a letter (1865) to his son A. H. Hamilton for the following reason: “If I may be allowed to speak of myself in connexion with the subject, I might do so in a way which would bring you in, by referring to an antequaternionic time, when you were a mere child, but had caught from me the conception of a Vector, as represented by a Triplet; and indeed I happen to be able to put the finger of memory upon the year and month – October, 1843 – when having recently returned from visits to Cork and Parsonstown, connected with a Meeting of the British Association, the desire to discover the laws of the multiplication referred to regained with me a certain strength and carnestness, which had for years been dormant, but was then on the point of being gratified, and was occasionally talked of with you. Every morning in the early part of the above cited month, on my coming down to breakfast, your (then) little brother William Edwin, and yourself, used to ask me, “Well, Papa, can you multiply triplets”? Whereto I was always obliged to reply, with a sad shake of the head: “No, I can only add and subtract them.” But on the 16th day of the same month – which happened to be a Monday, and a Council day of the Royal Irish Academy – I was walking in to attend and preside, and your mother was walking with me, along the Royal Canal, to which she had perhaps driven; and although she talked with me now and then, yet an under-current of thought was going on in my mind, which gave at last a result, whereof it is not too much to say that I felt at once the importance. An electric circuit seemed to close; and a spark flashed forth. The herald (as I foresaw, immediately) of many long years to come of definitely directed thought and work, by myself if spared, and at all events on the part of others, if should even be allowed to live long enough distinctly to communicate the discovery. Nor could I resist the impulse – unphilosophical as it may have been – to cut with a knife on a stone of Brougham Bridge, as we passed it, the fundamental formula with the symbols, i; j; k; namely i 2 D j 2 D k2 D ijk D 1
A-7 Complex Algebra, Quaternion Algebra, Octonian Algebra, Clifford Algebra
625
which contains the Solution of the Problem, but of course, as an inscription, has long since moldered away. A more durable notice remains, however, on the Council Books of the Academy for that day (October 16th, 1843), which records the fact, that I then asked for and obtained base to read a Paper on Quaternion, at the First General Meeting of the Session: which reading took place accordingly, on Monday the 13th of the November following.” Obviously the vector part e 1 x 1 C e 2 x 2 C e 3 x 3 D i x 1 C j x 2 C kx 3 of a quaternion number replaces the imaginary part of a complex number, the scalar part 1x 0 the real part. The quaternian conjugate 1x ı
3 X
ek x k DW x
kD1
substitutes the complex conjugate of ta complex number, leading to the quaternion inverse x 1 D
x : Q.x/
The proper algebraic interpretation of quaternion numbers is in terms of Clifford algebra C l.0; 2/. If n D dim X D 2 is the dimension of the linear space X which we base the Clifford algebra on, its basis elements are
f1; e 1 ; e 2 ; e 1 ^ e 2 g subject to
e 1 ^ e 2 C e 2 ^ e 1 D 0;
e 1 ^ e 1 D g.e 1 ; e 1 /1 D 1 ; e 2 ^ e 2 D g.e 2 ; e 2 /1 D 1;
.e 2 ^ e 2 /2 D .e 1 ^ e 2 / ^ .e 1 ^ e 2 / D e 2 ^ .e 1 ^ e 1 / ^ e 1 D Ce 2 ^ e 2 D 1
.e 2 ^ e 2 / ^ e 1 D e 2 ^ .e 1 ^ e 1 / D e 2
.e 2 ^ e 2 / ^ e 2 D e 1 ^ .e 2 ^ e 2 / D e 1 and
e 3 WD e 1 ^ e 2 by classical notation. Obviously
x D 1x ı C e 1 x 1 C e 2 x 2 C e 1 ^ e 2 x 3 2 C l.0; 2/ is an element of Clifford algebra C l.0; 2/. There is an algebraic isomorphism between the quaternion algebra H of vectors and the quaternion algebra of matrices, namely either M.R44 / of 4 4 real matrices or M.C22 / of 2 2 complex matrices.
626
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
Firstly we define the 4 4 real matrix basis E and decompose it into the four constituents ˙ 0 , ˙ 1 , ˙ 2 , ˙ 3 of 4 4 real Pauli matrices which form a multiplicative group of the multiplication table of Hamilton type 3 2 1 e1 e2 e3 6 e 1 1 e 3 e 2 7 7 D 1˙ 0 C e 1 ˙ 1 C e 2 ˙ 2 C e 3 ˙ 3 2 R44 E WD 6 4 e 2 e 3 1 e 1 5 e 3 e 2
e1
1
2
2 3 3 1000 010 0 60 1 0 07 6 1 0 0 0 7 6 7 7 ˙0 D 6 4 0 0 1 0 5 ; ˙ 1 WD 4 0 0 0 1 5 ; 0001 001 0 2 2 3 3 0 010 00 01 6 0 0 0 17 6 0 0 1 0 7 6 7 7 ˙2 D 6 4 1 0 0 0 5 ; ˙ 3 WD 4 0 1 0 0 5 ; 0 1 0 0 1 0 0 0 multiplication table, Cayley diagram
˙0 ˙1 ˙2 ˙3
˙0 ˙1 ˙2 ˙3 ˙0 ˙1 ˙2 ˙3 ˙ 1 ˙ 0 ˙ 3 ˙ 2 ˙ 2 ˙ 3 ˙ 0 ˙ 1 ˙ 3 ˙ 2 ˙ 1 ˙ 0 or
˙ 0˙ 0 D ˙ 0; ˙ 0˙ i D ˙ i ˙ 0 D ˙ i ˙ i ˙ j D ıij ˙ 0 C ij k ˙ k
for all i 2 f1; 2; 3g
for all i; j; k 2 f1; 2; 3g
Let A, B, C 2 M (R44 , Hamilton) constituted by means of 2 3 a1 a2 a3 a4 6 a2 a1 a4 a3 7 6 7 4 a3 a4 a1 a2 5 DW A 2
a4 a3
a2
a1
3 b1 b2 b3 b4 6 b2 b1 b4 b3 7 6 7 4 b3 b4 b1 b2 5 DW B b4 b3 b2 b1
A-7 Complex Algebra, Quaternion Algebra, Octonian Algebra, Clifford Algebra
627
such that the Cayley-product 2
3 c1 c2 c3 c4 6 c2 c1 c4 c3 7 7 AB D 6 4 c3 c4 c1 c2 5 DW C 2 M c4 c3 c2 c1
.R44 ; Hamilton/
c1 D a1 b1 a2 b2 a3 b3 a4 b4 c2 D a1 b2 C a2 b1 C a3 b4 a4 b3 c3 D a1 b3 a2 b4 C a3 b1 C a4 b2 c4 D a1 b4 C a2 b3 a3 b2 C a4 b1 fulfilling the axioms .G1ı/, .G2ı/, .G3ı/ of a non-Abelian multiplicative group, namely .G1ı/ .AB/C D A.BC / .associativity/ .G2ı/ AI D A .identity/ .inverse/; .G3ı/ AA 1 D I but .G4ı/ does not apply, in particular det A D a12 C a22 C a32 C a42 ;
det B D b12 C b22 C b32 C b42
det .AB/ D .det A/.det B/: Secondly we define the 2 2 complex matrix basis E and decompose it into the four constituents ˙ 0 , ˙ 1 , ˙ 2 , ˙ 3 of 2 2 complex Pauli matrices which form a multiplicative group of the multiplication table of Hamilton type "
1 C i e1 e2 C i e3
#
D 1˙ 0 C e1˙ 1 C e 2 ˙ 2 C e 3 ˙ 3 2 C22 e 2 C i e 3 1 i e 1 10 i 0 ˙ 0 WD ; ˙ 1 WD ; 01 0 i 01 0i ; ˙ 3 WD ˙ 2 WD 10 i 0 E WD
multiplication table, Cayley diagram
628
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
0
˙ ˙1 ˙2 ˙3
˙0 ˙1 ˙2 ˙3 ˙0 ˙1 ˙2 ˙3 ˙ 1 ˙ 0 ˙ 3 ˙ 2 : ˙ 2 ˙ 3 ˙ 0 ˙ 1 ˙ 3 ˙ 2 ˙ 1 ˙ 0
Note that E 2 C22 is “Hermitean”. Let A, B, C 2 M .C22 , Hamilton) constituted by means of " "
a1 C i a2 a3 C i a4 a3 C i a4 a1 i a2 b1 C i b2 b3 C i b4
# DW A # DW B
b3 C i b4 b1 i b2 such that the Cayley-product "
c1 C i c2 c3 C i c4
#
DW C 2 M .C2 ; Hamilton/ c3 C i c4 c1 i c2 det .AB/ D .det A/.det B/ D a12 C a22 C a32 C a42 b12 C b22 C b32 C b42 AB D
D c12 C c22 C c32 C c42 c1 C i c2 D .a1 C i a2 /.b1 C i b2 / C .a3 C i a4 /.b3 C i b4 / D a1 b1 a2 b2 a3 b3 a4 b4 C i.a1 b2 C a2 b1 C a3 b4 a4 b3 / c3 C i c4 D .a1 C i a2 /.b3 C i b4 / C .a3 C i a4 /.b1 i b2 / D a1 b3 a2 b4 C a3 b1 C a4 b2 C i.a1 b4 C a2 b3 a3 b2 C a4 b1 / fulfilling the axioms .G1ı/, G2ı/, .G3ı/ of a non-Abelian multiplicative group. The spinor " s WD
1 C i e1 e2 C i e3
#
" D
s1 s2
#
as a vector of length zero relates to the 2 2 complex matrix basis by " ED
s1 s 2 s2 s1
# :
|
A-7 Complex Algebra, Quaternion Algebra, Octonian Algebra, Clifford Algebra
629
A-73 Octanian Algebra as a Non-Associative Algebra as well as a Composition Algebra, Clifford algebra with Respect to H H The octanian algebra O also called “the algebra of octaves” due to Graves (1843) and Cayley (1845) is a composition algebra over R as well as a non-associative algebra: x2O x D 1x ı C e 1 x 1 C e 2 x 2 C e 3 x 3 C e 4 x 4 C e 5 x 5 C e 6 x 6 C e 7 x 7
subject to
fx ı ; x 1 ; : : : ; x 6 ; x 7 g 2 R8 span O D f1; e 1 ; e 2 ; e 3 ; e 4 ; e 5 ; e 6 ; e 7 g x; y; z 2 R8
1
˛.x; y/ DW x C y The axioms .G1C/, .G2C/, .G3C/, .G4C/ of an Abelian additive group apply. 2 x; y 2 R8 ; r; s 2 R ˇ.r; x/ DW r x The axioms .D1C/, .D2C/, .D3/ of additive distributivity apply. 3 One way to explicitly describe a multiplicative group with finitely many elements is to give a table listing the multiplications just representing the map W O O ! O. multiplication table, Cayley diagram
1 e1 e2 e3 e4 e5 e6 e7
1 1 e1 e2 e3 e4 e5 e6 e7
e1 e1 1 e 3 e2 e 5 e4 e7 e 6
e2 e3 e2 e3 e 3 e 2 1 e 1 e 1 1 e 6 e 7 e 7 e 6 e 4 e 5 e5 e4
e4 e5 e6 e4 e5 e6 e 5 e 4 e 7 e 6 e 7 e 4 e 7 e 6 e 5 1 e 1 e 2 e 1 1 e 3 e 2 e 3 1 e 3 e 2 e 1
e7 e7 e6 e 5 e 4 e 3 e2 e 1 1
630
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
Note that in the multiplication table each entry of a group appears exactly once in each row and column. The multiplication has to be read from left to right that is, the entry at the intersection of the row headed by e 5 is the product e 3 e 5 . Such a table is called a Cayley diagram of the multiplicative group. Note the non-associativity of the internal multiplication given by the table, e.g. e 2 .e 3 e 4 / 6D .e 2 e 3 /e 4 , namely by means of e 3 e 4 D e 7 , e 2 .e 3 e 4 / D e 2 e 7 D e 5 versus e 2 e 3 D e 1 , .e 2 e 3 / e 4 D e 1 e 4 D Ce 5 . Such an “octonian algebra” O is not a Lie algebra since neither x x D 0 (L1) nor .x y/z C .y z/ x C .z x/ y D 0 (L2) (Jacobi identity) hold. Just by means of the multiplication table compute .e 2 e 3 / e 4 C .e 3 e 4 / e 2 C .e 4 e 2 / e 3 D e 5 6D 0: 4 does not apply. 5 Begin with the choice Q.x/ D Q.1x ı C e 1 x 1 C C e 6 x 6 C e 7 x 7 / D .x ı /2 C .x 1 /2 C C .x 6 /2 C .x 7 /2 in order to prove (K1), (K2), (K3). The laborious proofs are left as an exercise. The proper algebraic interpretation of octonian numbers is in terms of Clifford algebra, namely with respect to the eight dimensional set H H DW H2 where H is the usual skew field of Hamilton’s quaternions, algebraically isomorphic to C l.0; 2/. Indeed it would have been temptating to base “octonian algebra” O on C l.0; 3/, dim C l.0; 3/ D 23 D 8, but its generic elements
f1; e 1 ; e 2 ; e 3 ; e 1 ^ e 2 ; e 2 ^ e 3 ; e 3 ^ e 1 ; e 1 ^ e 2 ^ e 3 g are not representing the octonian multiplication table e.g.
e 1 ^ e 2 ^ e 3 /2 D g.e 1 ^ e 2 ^ e 3 ; e 1 ^ e 2 ^ e 3 /
D .e 1 ^ e 2 ^ e 3 / ^ .e 1 ^ e 2 ^ e 3 /
D e 1 ^ e 2 ^ e 1 ^ e 3 ^ e 2 ^ e 3 D e 1 ^ e 2 ^ e 1 ^ e 2 ^ .e 3 ^ e 3 /
D e 1 ^ e 2 ^ e 1 ^ e 2 D e 1 ^ .e 2 ^ e 2 / ^ e 1 D .e 2 ^ e 1 / D C1 In contrast, let us introduce the pair x WD .a; b/ 2 fX j a 2 H; b 2 Hg
A-7 Complex Algebra, Quaternion Algebra, Octonian Algebra, Clifford Algebra 0
x 2 H2 ;
1
x 2 H2 ;
631
00
x 2 H2
0
0
0
0
˛.x; x DW x C x / 0
x C x D .a C a ; b C b /: The axioms .G1C/, .G2C/, .G3C/, .G4C/ of an Abelian additive group apply. 0 0 2 x:x 2 H2 ; r; r 2 R ˇ.r; x/ DW r x r x D .r a; r b/: The axioms .D1C/, .D2C/, .D3/ of additive distributivity apply. 0
x 2 H H D H2 ;
3
x 2 H H D H2
0
.x; x / DW x x 0
0 0
0
0
0
x x DW .aa b b; b a C ba / 0
0
a, b denote the conjugate of a 2 H, b 2 H, respectively. If .a; b/, .a ; b / are represented by 1 0 1 0 3 3 3 3 X X X X 0 0 0 0 @1˛ ı C e i ˛ i ; 1ˇ ı C e j ˇ jA ; @1˛ ı C e i 0 ˛ i ; 1ˇ ı C e j 0 ˇ jA i D1
i 0 D1
j D1
j D1
respectively, where
span H D f1; e 1 ; e 2 ; e 1 ^ e 2 D e 3 g or
0
0
span H D f1 ; e 10 ; e 20 ; e 10 ^ e 20 D e 30 g D f1 ; e 5 ; e 6 ; e 7 g 0
the “octonian product” x x results in ' 3 P 0 ı 0ı ı 0ı k ‘k k 0k xx D 1 ˛ ˛ ˇ ˇ .˛ ˛ ˇ ˇ / 3 P
$
kD1
0
C &
3 P i;j;kD1
k
0
e k Œ˛ ˇ C ˛ k ˇ ı C ijk .˛ i ˛ j ˇ j ˇ i /; i;j;kD1 3 P 0 0 0 1 ˛ı ˇ ı ˛ ı ˇı .˛ k ˇ ‘k ˛ k ˇ k /
C
ı
kD1
ı
0k
k
0ı
e k Œ˛ ˇ C ˛ ˇ C
0 ijk .˛ j ˇ i
0j
!
i
˛ ˇ / : %
632
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
the axioms .D1 C/, .D2 / of distributivity apply. The pair .1; 0/ is the neutral element. 4 Begin with the choice of .1st/ the transpose x WD a; b/ of x 2 H2 .2nd/ x x D Q.x/ or x x D .aa C bb; 0/ .3rd/ x 1 D
x if x 6D 0 Q.x/
in order to end up with x x 1 D 1: A historical perspective of octonian numbers is given by Van Der Waerden (1950). Reference is made to Graves (1848) and A. Caley: Collected Mathematical Papers, vol. 1, p. 127 and vol. 11, pp. 368–371. | The exceptional role of the examples on complex, quaternion and octonian algebra illustrating Clifford algebra C l.0; 1/, C l.0; 2/ as well as Clifford algebra with respect to H2 is established by the following theorems: ' $ Theorem (“Hurwitz’ theorem of composition algebras”): A complete list of composition algebras over R consists of (a) the real numbers R, (b)
the complex numbers C,
(c)
the quaternions H,
(d) the octonians O . & '
% $
Theorem (“Frobenius’ theorem of division algebras”): The only finite-dimensional division algebra over R are (˛) the real numbers R, (ˇ) ( ) &
the complex numbers C, the quaternions H . % Historical Aside
For details consult the historical texts Hurwitz (1898), Frobenius (1878) as well as Hazlett (1917). A more recent reference is Jacobson (1974), pp. 425 and 430.
A-7 Complex Algebra, Quaternion Algebra, Octonian Algebra, Clifford Algebra
633
A-74 Clifford Algebra We already took advantage of the notion of Clifford algebra when we referred to complex algebra, quaternion algebra, and octonian algebra. Here we finally confront you with the definition of “orthogonal clifford algebra C l.p; q/”. But on our way to Clifford algebra we have to generalize at first the notion of a basis, in particular its bilinear form. ' $ Theorem (bilinear form): Suppose that the bracket hji or g.; / W XX ! R is a bilinear form on a finite dimensional linear space X, e. g. a vector space, over the field R of real numbers, in addition X its dual space such that n D dim X D dimX. There exists a basis fe 1 ; : : : ; e n g such that (a) he i j e j i D 0 or g.e i ; e j / D 0 for i 6D j 2 he i1 j e i1 i D C1 or g.e i1 ; e i1 / D C1 for 1 i1 < p ; 6 (b) 6 4 he i2 j e i2 i D 1 or g.e i2 ; e i2 / D 1 for p C 1 i2 p C q D r ; he i3 j e i3 i D 0 or g.e i3 ; e i3 / D 0 for r C 1 i3 n holds. &
%
The numbers r and p are determined exclusively by the bilinear form. r is called the rank, r p D q is called the index and the ordered pair .p; q/ the signature. The theorem assures that any two spaces of the same dimension with bilinear forms of the same signature are isometrically isomorphic. A scalar product (“inner product”) in this context is a nondegenerate bilinear form, i.e., a form with rank equal to the dimension of X. When dealing with low dimensional spaces as we do, we will often indicate the signature with a series of plus and minus signs and zeroes where appropriate. For example, the signature of R41 may be written .C C C/ instead of .3; 1/. If the bilinear form is nondegenerate, a basis with the properties listed in corresponding Theorem is called an orthonormal basis (“unimodular”) for X with respect to the bilinear form. Definition (orthogonal Clifford algebra C l.p; q/): The orthogonal Clifford algebra C l.p; q/ is the algebra of polynomials generated by the direct sum of the space of multilinear functions m
˚nmD0 ^ .X / on a linear space X, respectively its dual X over the field of real numbers Rnp of dimension
634
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra m
dim ˚nmD0 ^ .X / D 2n and signature .p; q/, namely nDdim nDdim P X i1 P X i1 i2 1f0 C e fi1 C e ^ e fi1 i2 nDdim PX
i1 D1
i1
i2
i1 ;i2 D1
i3
C e ^ e ^ e fi1 i2 i3 C C i1 ;i2 ;i3 D1
nDdim PX
i1 ;:::;in D1
e i1 ^ ^ e in fi1 in
subject to the Clifford product, also called the Clifford dualizer, #
.i/ ei ^ ej D ej ^ ei f or i 6D j 2 e ^ e D g.ei1 ; ei1 / D C1 f or 1 i1 p ; 6 i1 i1 .ii/ 6 4 ei2 ^ ei2 D g.ei2 ; ei2 / D 1 f or p C 1 i2 p C q D r ;
ei ^ ei3 D g.ei3 ; ei3 / D 0 f or r C 1 i3 n ; "3
!
or
e1 ^ ej C ej ^ ei
2 g.e i1 ; e i1 / D C1 for 1 i1 p 6 D 2g.e i ; e j /ıij 1 subject to 6 4g.e i2 ; e i2 / D 1 for p C 1 i2 p C q D r g.e i3 ; e i3 / D 0 for r C 1 i3 n
1 being the neutral element. If e k ^ e k D 0 or g.ek ; ek / D 0 holds uniformly the orthogonal Clifford algebra C l.p; q/ reduces to the polynomial algebra of antisym-metric multilinear functions ˚nmD0 Am D ˚nmD0 m .X / D .X / dim ˚nmD0 Am D dim ˚nmD0 m .X / D dim .X / D 2n represented by # 1 1f0 C 1Š nDdim XX
nDdim XX i1 D1
1 e fi1 C 2Š i1
nDdim XX i1 ;i2 D1
1 1 C e i1 ^ e i2 ^ e i3 fi1 i2 i3 C C 3Š i ;i ;i D1 nŠ 1 2 3 "
e i1 ^ e i2 fi1 i2
nDdim XX
i1 ;:::;in D1
e i1 ^ ^ e in fi1 in : !
A-7 Complex Algebra, Quaternion Algebra, Octonian Algebra, Clifford Algebra
635
'
$
Example: Clifford product: x y, x y 2 X; sign X D .3; 0/ Let x y 2 X be a real three-dimensional vector space X of signature (3,0). then x y (read: “x Clifford y”) with respect to a set of the bases fe1 ; e2 ; e3 g accounts for 3 3 X X
x^y D
ei ^ ej x i y j D
i D1 j D1
3 3 X X
x i y j ei ^ ej
i D1 j D1
x ^ y D .e1 x 1 C e2 x 2 C e3 x 3 / ^.e1 y 1 C e2 y 2 C e3 y 3 /
x ^ y D 1.x 1 y 1 C x 2 y 2 C x 3 y 3 / C e1 ^ e2 .x 1 y 2 x 2 y 1 / C e2 ^ e3 .x 2 y 3 x 3 y 2 / C e3 ^ e1 .x 3 y 1 x 1 y 3 /
Indeed x ^ y as a Clifford number is decomposed into a scalar part and an antisymmetric tensor part with respect to the bilinear basis fe1 ^ e2 ; e2 ^ e3 ; e3 ^ e1 g. & % Tensor algebra or the algebra of multilinear functions, namely
1f0 C
nDdim XX
i1
e fi1 C
i1 D1
C
nDdim XX
nDdim XX
e i ˝ e i2 fi1 i2
i1 ;i2 D1
e i1 ˝ e i2 ˝ e i3 fi1 i2 i3 C C
i1 ;i2 ;i3 D1
2
˚nmD0
nDdim XX
e i1 ˝ ˝ e in fi1 in
i1 ;:::;in D1
C
˝ .X / D ˝.X/ D T ˚ T " C T D ˚hD0 ˝2h .X/ .“even”/ subject to T D ˚kD0 ˝2kC1 .X/ .“odd”/ is the sum of two spaces, T C and T , respectively, in particular n n 1 X i1 i2 1 X i1 i2 i3 i4 e ^ e fi1 i2 C e ^ e ^ e ^ e fi1 i2 i3 i4 C ; 2Š i ;i D1 4Š i ;i ;i ;i 1 2 3 4 1 2 n n 1 X i1 1 X C l 3 e fi1 C e i1 ^ e i2 ^ e i3 fi1 i2 i3 C : 1Š i D1 3Š i ;i ;i D1
C l C 3 1f0 C
1
1 2 3
636
A Tensor Algebra, Linear Algebra, Matrix Algebra, Multilinear Algebra
Obviously C l C as well as C l are subalgebras of C l. Let the Clifford numbers z be divided into zC 2 C l C and z 2 C l , then the properties C
C
C
z ^ z 2 C l ;
C
z ^ z 2 Cl ;
C
z ^ z 2 Cl
prove that C l.p; q/ is graded over the cydric group Z2 D f0; 1g.
|
Appendix B
Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
We begin with sampling distributions by two sections on the transformation of random variables. A first confidence interval for Gauss-Laplace normal distribution observations 1 2 known is constructed, including its famous Three Sigma Rule. The second confidence interval of Gauss-Laplace i.i.d. observations is derived for the sample mean b BLUE of 1 when the variance is known. We take advantage of the forward-backward Helmert transformation leading to Helmert chi-square
2 probability distribution (Helmert, 1875). In contrast, the sampling distributions from Gauss-Laplace i.i.d. normal distributions within the third confidence interval is analyzed for the mean, but an unknown variance. Our basis is student’s p t-distribution on the random variable n.b /=b where the sample mean b – is BLUE of whereas the sample confidence for the “true” mean , variance 2 unknown which is based on student’s probability distribution. In detail, we derive the related Uncertainty Principle generated by the Magic Triangle of (a) length of the confidence interval, (b) coefficient of negative confidence called the uncertainty number and (c) the number of observations. For more details we refer to Lemma B.12 of W.S. Gosset. The sampling from the Gauss-Laplace normal distribution, namely the fourth confidence interval for the variance is reviewed: We introduce the missing confidence interval for the “true” variance 2 : Table B.18 as a flow chart summarizes the various steps in computing a confidence interval for the variance. This time the related Uncertainty Principle is built on (a) the coefficient of complementary confidence ˛, also called uncertainty number (b) the number of observations, n, and (c) the length ı 2 .n; ˛/ of the confidence interval the “true” variance 2 . The most detailed sampling theory from multidimensional Gauss-Laplace normal distribution, namely for the confidence region for the fixed parameters in the linear Gauss–Markov, is stated by an extensive Example B.19. We need the Principle Component Analysis PCA, also called Singular Value Decomposition (SVD) or Eigenvalue Analysis (EIGEN). Theorem B.17 summarizes the marginal distributions special linear Gauss–Markov model with datum defect. In our appendix we concentrate on multidimensional variance analysis, namely sampling form the multivariate Gauss-Laplace normal distribution. In short E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2, © Springer-Verlag Berlin Heidelberg 2012
637
638
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
introduction we review the distribution of sample mean and variance-covariance. We define the k-dimensional noncentral WHISHART distribution by (B377) and the characteristic function of a vector-valued random variate of type Gauss-Laplace distribution. Our highlight is the distribution of xN^ and SN of type WHISHART. Another highlight is the distribution related to the correlation coefficients originated by R.A. Fisher (1915), programmed by F.N. David (1954).
B-1 A First Vehicle: Transformation of Random Variables If the probability density function (pdf ) of a random vector y D Œy1 ; : : : ; yn 0 is known, but we want to derive the probability density function (pdf ) of a random vector x D Œx1 ; : : : ; xn 0 (pdf ) which is generated by an injective mapping x =g(y) or xi D gi Œy1 ; : : : ; yn for all i 2 f1; : : : ; ng we need the results of Lemma B.1. Lemma B.1. (transformation of pdf): Let the random vector y WD Œy1 ; : : : ; yn 0 be transformed into the random vector x D Œx1 ; : : : ; xn 0 by an injective mapping x D g.y/ or xi D gi Œy1 ; : : : ; yn for all i 2 f1; : : : ; ng which is of continuity class C1 (first derivatives are continuous). Let the Jacobi matrix Jx W D .@gi =@yi / be regular (det Jx ¤ 0), then the inverse transformation y D g 1.x/ or yi D gi1 Œx1 ; : : : ; xn is unique. Let fx .x1 ; : : : ; xn / be the unknown pdf, but fy .y1 ; : : : ; yn / the given pdf, then ˇ ˇ fx .x1 ; : : : ; xn / D f .g11 .x1 ; : : : ; xn /; : : : ; g11 .x1 ; : : : ; xn // ˇdet Jy ˇ with respect to the Jacobi matrix
@yi Jy WD @xj
@gi1 D @xj
for all i; j 2 f1; : : : ; ng holds. Before we sketch the proof we shall present two examples in order to make you more familiar with the notation. Example B.1. (“counter example”): The vector-valued random variable .y1 ; y2 / is transformed into the vector-valued random variable .x1 ; x2 / by means of x1 D y1 C y2 ; x2 D y12 C y22 @x @x1 =@y1 @x1 =@y2 1 1 D D Jx W D 2y1 2y2 @x2 =@y1 @x2 =@y2 @y 0
B-1 A First Vehicle: Transformation of Random Variables
639
x12 D y12 C 2y1 y2 C y22 ; x2 C 2y1 y2 D y12 C 2y1 y2 C y22 x12 D x2 C 2y1 y2 ; y2 D .x12 x2 /=.2y1 / x1 D y1 C y2 D y1 C
y1˙ y2˙
1 x12 x2 ; 2 y1
1 x1 y1 D y12 C .x12 x2 / 2
1 y12 x1 y1 C .x12 x2 / D 0 2 s x12 1 2 1 .x1 x2 / D x1 ˙ 2 4 2 s x12 1 2 1 .x1 x2 /: D x1
2 4 2
At first we have computed the Jacobi matrix Jx , secondly we aimed at an inversion of the direct transformation .y1 ; y2 / 7! .x1 ; x2 /. As the detailed inversion step proves, namely the solution of a quadratic equation, the mapping x D g.y/ is not injective. Example B.2. Suppose .x1 ; x2 / is a random variable having pdf fx .x1 ; x2 / D
exp .x1 x2 /; x1 0; x2 0 0 ; otherwise:
We require to find the pdf of the random variable .x1 C x2 ; x2 =x1 /: The transformation y1 D x1 C x2 ; y2 D
x2 x1
has the inverse x1 D
y1 y1 y2 ; x2 D : 1 C y2 1 C y2
The transformation provides a one-to-one mapping between points in the first quadrant of the .x1 ; x2 / – plane P2x and in the first quadrant of the .y1 ; y2 / – plane P2y . The absolute value of the Jacobian of the transformation for all points in the first quadrant is
640
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
ˇ ˇ 1 ˇ ˇ @.x1 ; x2 / ˇ ˇˇ @x @y1 ˇ ˇ ˇ @.y ; y / ˇ D ˇˇ @x2 1 2 @y 1
@x1 @y2 @x2 @y2
ˇ ˇ ˇ ˇ ˇ ˇ y1 .1 C y2 /2 ˇ ˇ .1 C y2 /1 ˇ ˇDˇ ˇ y2 .1 C y2 /1 y1 .1 C y2 /2 Œ.1 C y2 / y2 ˇ
D y1 .1 C y2 /3 C y1 y2 .1 C y2 /3 D
y1 : .1 C y2 /2
Hence we have found for the pdf of .y1 ; y2 / " fy .y1 ; y2 / D
y1 exp.y1 / .1Cy 2 ; y1 > 0; y2 > 0 2/ 0 ; otherwise:
Incidentally it should be noted that y1 and y2 are independent random variables, namely fy .y1 ; y2 / D f1 .y1 /f .y2 / D y1 exp.y1 /.1 C y2 /2 :
|
Proof. The probability that the random variables y1 ; : : : ; yn take on values in the region ˝y is given by Z
Z
fy .y1 ; : : : ; yn /dy1 dyn :
˝y
If the random variables of this integral are transformed by the function xi D gi .y1 ; : : : ; yn / for all i 2 f1; : : : ; ng which map the region ˝y onto the regions ˝x , we receive Z
Z ˝y
D
fy .y1 ; : : : ; yn /dy1 dyn
Z
Z
ˇ ˇ fy .g11 .x1 ; : : : ; xn /; : : : ; gn1 .x1 ; : : : ; xn // ˇdet Jy ˇ dx1 dxn
˝x
from the standard theory of transformation of hypervolume elements, namely dy1 dyn D j det Jy jdx1 dxn or .dy1 ^ ^ dyn / D j det Jy j .dx1 ^ ^ dxn /: Here we have taken advantage of the oriented hypervolume element dy1 ^ ^ dyn (Grassmann product, skew product, wedge product) and the Hodge star operator * applied to the n – differential form dy1 ^ ^ dyn 2 n (the exterior algebra n ).
B-1 A First Vehicle: Transformation of Random Variables
641
The star * : p ! np in Rn maps a p – differential form onto a .n p/ – differential form, in general. Here p D n, n p D 0 applies. Finally we define fx .x1 ; : : : ; xn / WD f .g11 .x1 ; : : : ; xn /; : : : ; gn1 .x1 ; : : : ; xn //j det Jy j as a function which is certainly non-negative and integrated over ˝x to one.
|
In applying the transformation theorems of pdf we meet quite often the problem that the function xi D gi .y1 ; : : : ; yn / for all i 2 f1; : : : ; ng is given but not the inverse function yi D gi1 .x1 ; : : : ; xn / for all i 2 f1; : : : ; ng. Then the following results are helpful. Corollary B.1. (Jacobian): If the inverse Jacobian j det Jx j D j det.@gi =@yj /j is given, we are able to compute j det Jy j D j det
@gi1 .x1 ; : : : ; xn / @g .y1 ; : : : ; yn / 1 j D j det Jj1 D j det i j : @xj @yj
Example B.3. (Jacobian): Let us continue Example B.2. The inverse map @y 1 1 x1 C x2 g11 .y1 ; y2 / ) 0 D D yD x2 = x12 1= x1 g21 .y1 ; y2 / x2 = x1 @x ˇ ˇ ˇ @y ˇ 1 x2 x1 C x2 jy D jJy j D ˇˇ 0 ˇˇ D C 2 D @x x1 x1 x12 ˇ ˇ ˇ @x ˇ x2 x12 jx D jJx j D jy1 D jJy j1 D ˇˇ 0 ˇˇ D D 1 @y x1 C x2 y1
allows us to compute the Jacobian Jx from Jy . The direct map xD
y1 =.1 C y2 / g1 .y1 ; y2 / x1 D D g2 .y1 ; y2 / y1 y2 =.1 C y2 / x2
leads us to the final version of the Jacobian. jx D jJx j D
y1 : .1 C y2 /2
For the special case that the Jacobi matrix is given in a partitioned form, the results of Corollary B.3 are useful.
642
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Corollary B.2. (Jacobian): If the Jacobi matrix Jx is given in the partitioned form jJx j D
@gi @yj
D
U ; V
then det Jx D
q
j det Jx J0x j D
p det.UU0 / detŒVV0 .VU0 /.UU0 /1 UV0
if det.UU0 / ¤ 0 q p det Jx D j det Jx J0x j D det.VV0 / detŒUU0 UV0 .VV0 /1 VU0 ; if det.VV0 / ¤ 0 j det Jy j D j det Jx j1 : Proof. The Proof is based upon the determinantal relations of a partitioned matrix of type det det
AU VD AU VD
AU det VD
D det A det.D VA1 U/ if det A ¤ 0 D det D det.A UD1 U/ if det D ¤ 0
D D det A V.adjA/U;
which have been introduced by Frobenius (1908) and Schur (1917).
B-2 A Second Vehicle: Transformation of Random Variables Previously we analyzed the transformation of the pdf under an injective map of random variables y 7! g.y/ D x. Here we study the transformation of polar coordinates Œ1 ; 2 ; : : : ; n1 ; r 2 Y as parameters of an Euclidian observation space to Cartesian coordinates Œy1 ; : : : ; yn 2 Y. In addition we introduce the hypervolume element of a sphere Sn1 Y; dim Y D n. First, we give three examples. Second, we summarize the general results in Lemma B.4.
B-2 A Second Vehicle: Transformation of Random Variables
643
Example B.4. (polar coordinates: “2d”): Table B.1 collects characteristic elements of the transformation of polar coordinates .1 ; r/ of type “longitude, radius” to Cartesian coordinates .y1 ; y2 /; their domain and range, the planar elements dy1 ; dy2 as well as the circle S1 embedded into E2 WD fR2 ; ıkl g, equipped with the canonical metric I2 D Œıkl and its total measure of arc !1 . Table B.1 Cartesian and polar coordinates of a two-dimensional observation space, total measure of the arc of the circle .1 ; r/ 2 Œ0; 2 0; 1Œ; .y1 ; y2 / 2 R2 dy1 dy2 D rdrd1 S1 WD fy 2 R2 jy12 C y22 D 1g Z2
!1 D
d1 D 2 : 0
Example B.5. (polar coordinates: “3d”): Table B.2 is a collectors’ item for characteristic elements of the transformation of polar coordinates .1 ; 2 ; r/ of type “longitude, latitude, radius” to Cartesian coordinates .y1 ; y2 ; y3 /; their domain and range, the volume element dy1 ; dy2 ; dy3 as well as of the sphere S2 embedded into E3 WD fR3 ; ıkl g equipped with the canonical metric I3 D Œıkl and its total measure of surface !2 . Table B.2 Cartesian and polar coordinates of a three-dimensional observation space, total measure of the surface of the circle y1 D r cos 2 cos 1 ; y2 D r cos 2 sin 1 ; y3 D r sin 2 .1 ; 2 ; r/ 2 Œ0; 2
; Œ 0; rŒ; .y1 ; y2 / 2 R2 2 2
.y1 ; y2 ; y3 /; 2 R3 dy1 dy2 dy3 D r 2 dr cos 2 d1 d2 S2 WD fy 2 R jy12 C y22 C y32 D 1g Z2
!2 D
C =2 Z
d1 0
d2 cos 2 D 4 : =2
644
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Example B.6. (polar coordinates: “4d”): Table B.3 is a collection of characteristic elements of the transformation of polar coordinates .1 ; 2 ; 3 ; r/ to Cartesian coordinates .y1 ; y2 ; y3 ; y4 /; their domain and range, the hypervolume element dy1 ; dy2 ; dy3 ; dy4 as well as of the 3 - sphere S3 embedded into E4 WD fR4 ; ıkl g equipped with the canonical metric I4 D Œıkl and its total measure of hypersurface. Table B.3 Cartesian and polar coordinates of a four-dimensional observation space total measure of the hypersurface of the 3-sphere y1 D r cos 3 cos 2 cos 1 ; y2 D r cos 3 cos 2 sin 1 ; y3 D r cos 3 sin 2 ; y4 D r sin 3 .1 ; 2 ; 3 ; r/ 2 Œ0; 2
; Œ ; Œ 0; 2 Œ 2 2 2 2
dy1 dy2 dy3 dy4 D r 3 cos2 3 cos 2 drd3 d2 d1 Jy WD 2
@.y1 ; y2 ; y3 ; y4 / @.1 ; 2 ; 3 ; r/
3 r cos 3 cos 2 sin 1 r cos 3 sin 2 cos 1 r sin 3 cos 2 cos 1 cos 3 cos 2 cos 1 6 7 6 Cr cos 3 cos 2 cos 1 r cos 3 sin 2 sin 1 r sin 3 cos 2 sin 1 cos 3 cos 2 sin 1 7 D6 7 4 5 0 Cr cos 3 cos 2 r sin 3 sin 2 cos 3 cos 2 0 0 r cos 3 sin 3 j det Jy j D r 3 cos2 3 cos 2 S3 WD fy 2 R4 jy12 C y22 C y32 C y42 D 1g !3 D 2 2 :
Lemma B.2. (polar coordinates, hypervolume element, hypersurface element): 2
3 3 2 y1 cos n1 cos n2 cos n3 cos 2 cos 1 6 y2 7 6 cos n1 cos n2 cos n3 cos 2 sin 1 7 6 7 7 6 6 y 7 7 6 cos n1 cos n2 cos n3 sin 2 6 3 7 7 6 6 y 7 7 6 cos 6 4 7 7 6 n1 cos n2 cos 3 6 7 7 6 Let 6 7 D r 6 7 6 7 7 6 6 yn3 7 7 6 cos n1 cos n2 sin n3 6 7 7 6 6 yn2 7 7 6 cos n1 cos n2 6 7 7 6 4 yn1 5 5 4 cos n1 sin n2 yn sin n1
B-2 A Second Vehicle: Transformation of Random Variables
645
be a transformation of polar coordinates .1 ; 2 ; : : :,n2 ; n1 ; r/ to Cartesian coordinates .y1 ; y2 ; : : :,yn1 ; yn /, their domain and range given by .1 ; 2 ; : : : ; n2 ; n1 ; r/ 2 Œ0; 2
C Œ ; C Œ 0; 1Œ ; 2 2 2
;C Œ ; 2 2 2
then the local hypervolume element dy1 dyn D r n1 dr cosn2 n1 cosn3 n2 : : : cos2 3 cos 2 dn1 dn2 : : : d3 d2 d1 ; as well as the global hypersurface element
!n1
2 .n1/=2 D WD . n1 2 /
C =2 Z
n2 cos n1 dn1
C =2 Z
=2
Z2
cos 2 d2
=2
d1 ; 0
where .X / is the gamma function. Before we care for the proof, let us define Euler’s gamma function. Definition B.1. (Euler’s gamma function): Z1 .x/ D
e t t x1 dt
.x > 0/
0
is Euler’s gamma function which enjoys the recurrence relation .x C 1/ D x .x/ subject to .1/ D 1 .2/ D 1 .n C 1/ D nŠ if n is integer, n 2 ZC
or
p 1 D
2 3 1 1 1p D D
2 2 2 2 p pq pq D q q q p if is a rational number, p=q 2 QC . q
646
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Example B.7. (Euler’s gamma function): (a) .1/ D 1
.i /
(b) .2/ D 1
.ii/
(c) .3/ D 1 2 D 2
.iii/
(d) .4/ D 1 2 3 D 6
.iv/
p 1 D
2 1 1 1p 3
D D 2 2 2 2 3 3p 3 5 D D
2 2 2 4 5 15 p 7 5 D D
: 2 2 2 8
Proof. Our proof of Lemma B.4 will be based upon computing the image of the tangent space Ty Sn1 En of the hypersphere Sn1 En . Let us embed the hypersphere Sn1 parameterized by .1 ; 2 ; : : : ; n2 ; n1 / in En parameterized by .y1 ; : : : ; yn /, namely y 2 En , y D e1 r cos n1 cos n2 cos 2 cos 1 C e2 r cos n1 cos n2 cos 2 sin 1 C C en1 r cos n1 sin n2 C en r sin n1 : Note that 1 is a parameter of type longitude, 0 1 2 , while 2 ; : : : ; n1 are parameters of type latitude, =2 < 2 < C =2; : : : ; =2 < n1 < C =2 (open intervals). The images of the tangent vectors which span the local tangent space are given in the orthonormal n- leg fe1 ; e2 ; : : : ; en1 ; en j0g by g1 WD D1 y D e1 r cos n1 cos n2 cos 2 sin 1 C e2 r cos n1 cos n2 cos 2 cos 1 g2 WD D2 y D e1 r cos n1 cos n2 sin 2 cos 1 e2 r cos n1 cos n2 sin 2 sin 1 C e3 r cos n1 cos n2 cos 2 ::: gn1 WD Dn1 y D e1 r sin n1 cos n2 cos 2 cos 1 en1 r sin n1 sin n2 C en r cos n1
B-2 A Second Vehicle: Transformation of Random Variables
647
gn WD Dn y D e1 r cos n1 cos n2 cos 2 cos 1 C e2 r cos n1 cos n2 sin 2 sin 1 C C en1 r cos n1 cos n2 C en r sin n1 D y=r: fg1 ; : : : ; gn1 g span the image of the tangent space in En . gn is the hypersphere normal vector, kgn k D 1. From the inner products hgi jgj i D gij , i; j 2 f1; : : : ; ng, we derive the Gauss matrix of the metric G WD Œgij . hg1 jg1 i D r 2 cos2 n1 cos2 n2 cos2 3 cos2 2 hg2 jg2 i D r 2 cos2 n1 cos2 n2 cos2 3 hgn1 jgn1 i D r 2 ; hgn jgn i D 1: The off-diagonal elements of the Gauss matrix of the metric are zero. Accordingly p p det Gn D det Gn1 D r n1 .cos n1 /n2 .cos n2 /n3 .cos 3 /2 cos 2 : The square root
p p det Gn , det Gn1 elegantly represents the Jacobian determinant Jy WD
p @.y1 ; y2 ; : : : ; yn / D det Gn : @.1 ; 2 ; : : : ; n1 ; r/
p Accordingly we have found the local hypervolume element det Gn dr dn1 dn2 d3 d2 d1 . For the global hypersurface element !n1 , we integrate Z2
d1 D 2
0 C =2 Z
C =2
cos 2 d2 D Œsin 2 =2 D 2 =2 C =2 Z
cos2 3 d3 D
=2
1 C =2 Œcos 3 sin 3 3 =2 D =2 2
648
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions C =2 Z
cos3 4 d4 D
=2 C =2 Z
:::
.cos n1 /n2 dn1 D
=2
1 4 C =2 Œcos2 4 sin 4 2 sin 4 =2 D 3 3
1 C =2 Œ.cos n1 /n3 =2 n2 1 C n3
C =2 Z
.cos n1 /n4 dn1
=2
recursively. As soon as we substitute the gamma function, we arrive at !n1 .
B-3 A First Confidence Interval of Gauss–Laplace Normally Distributed Observations , 2 Known, the Three Sigma Rule The first confidence interval of Gauss-Laplace normally distributed observations constrained to .; 2 / known, will be computed as an introductory example. An application is the Three Sigma Rule. In the empirical sciences, estimates of certain quantities derived from observations are often given in the form of the estimate plus or minus a certain amount. For instance, the distance between a benchmark on the Earth’s surface and a satellite orbiting the Earth may be estimated to be .20; 101; 104:132 ˙ 0:023/m with the idea that the first factor is very unlikely to be outside the range 20; 101; 104:155 m to 20; 101; 104:109m : A cost accountant for a publishing company in trying to allow for all factors which enter into the cost of producing a certain book, actual production costs, proportion of plant overhead, proportion of executive salaries, may estimate the cost to be 21 ˙ 1; 1 Euro per volume with the implication that the correct cost very probably lies between 19.9 and 22.1 Euro per volume. The Bureau of Labor Statistics may estimate the number of unemployed in a certain area to be 2:4 ˙ 0:3 million at a given time though intuitively it should be between 2.1 and 2.7 million. What we are saying is that in practice we are quite accustomed to seeing estimates in the form of intervals. In order to give precision to these ideas we shall
B-3 A First Confidence Interval of Gauss–Laplace Normally Distributed Observations
649
consider a particular example. Suppose that a random sample x 2 fR; pdf g is taken from a Gauss-Laplace normal distribution with known mean and known variance 2 . We ask the key question. What is the probability of the random interval . c; C c/ to cover the mean as a quantile c of the standard deviation ? To put this question into a mathematical form we write the probabilistic two-sided interval identity. P .x1 < X < x2 / D P . c < X < C c/ D ; x2 DCc Z
x1 Dc
1 1 p exp 2 .x /2 dx D 2 2
with a left boundary l D x1 and a right boundary r D x2 . The length of the interval is x2 x1 D r l. The center of the interval is .x1 C x2 /=2 or . Here we have taken advantage of the Gauss-Laplace pdf in generating the cumulative probability P.x1 < X < x2 / D F.x2 / F.x1 / F.x2 / F.x1 / D F. C c/ F. C c/: Typical values for the confidence coefficient are D 0:95 ( D 95% or 1 D 5% negative confidence), D 0:99 ( D 1% or 1 D 1% negative confidence) or D 0:999 ( D 999%O or 1 D 1%O negative confidence). Consult Fig. B.1 for a geometric interpretation. The confidence coefficient is a measure of the probability mass between x1 D c and x2 D C c. For a given confidence coefficient Zx2
f .xj; 2 /dx D
x1
establishes an integral equation. To make this point of view to be better under-stood let us transform the integral equations to its standard form. x 7! z D 1 .x / , x D C z x2 DCc Z ZCc 1 1 2 1 1 2 p exp 2 .x / dx D p exp z d z D 2 2 2
2
x1 Dc
Zx2 x1
c
ZCc 2 f .xj; /dx D f .zj0; 1/d z D : c
650
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
f(x)
μ–3σ μ–2σ μ–σ μ μ+σ μ+2σ μ+3σ x
Fig. B.1 Probability mass in a two-sided confidence interval x1 < X < x2 or c < X < C c , three cases: (a) c D 1, (b) c D 2 and (c) c D 3
The special Helmert transformation maps x to z, now being standard GaussLaplace normal: 1 is the dilatation factor, also called scale variation, but the translation parameter. The Gauss-Laplace pdf is symmetric, namely f .x/ D f .Cx/ or f .z/ D f .Cz/. Accordingly we can write the integral identity Zx2
Zx2
2
f .xj; /dx D 2 x1
f .xj; 2 /dx D
0
ZCc Zc , f .zj0; 1/d z D 2 f .zj0; 1/d z D : c
0
The classification of integral equations tells us that Zz .z/ D 2
f .z /d z
0
is a linear Volterra integral equation the first kind.
B-3 A First Confidence Interval of Gauss–Laplace Normally Distributed Observations
651
In case of a Gauss-Laplace standard normal pdf, such an integral equation is solved by a table. In a forward computation Zz F.z/ WD
Zz
f .z j0; 1/d z or ˚.z/ WD 1
1
1 2 1 exp z d z 2
2
are tabulated in a regular grid. For a given value F .z1 / or F .z2 /, z1 or z2 are determined by interpolation. C.F. Gauss did not use such a procedure. He took advantage of the Gauss inequality which has been reviewed in this context by Pukelsheim (1994). There also the Vysochanskii-Petunin inequality has been discussed. We follow here a two-step procedure. First, we divide the domain z 2 Œ0; 1 into two intervals z 2 Œ0; 1 and z 2 Œ1; 1. In the first interval f .z/ is isotonic, differentiable and convex, f 00 .z/ D f .z/.z2 1/ < 0; while in the second interval isotonic, differentiable and concave, f 00 .z/ D f .z/.z2 1/ > 0. z D 1 is the point of inflection. Second, we setup Taylor series of f .z/ in the interval z 2 Œ0; 1 at the point z D 0, while in the interval z 2 Œ1; 1 at the point z D 1 and z 2 Œ2; 1 at the point z D 2. Three examples of such a forward solution of the characteristic linear Volterra integral equation of the first kind will follow. They establish:
The One Sigma Rule
The Two Sigma Rule
The Three Sigma Rule
Box B.1. (Operational calculus applied to the Gauss-Laplace probability distribution): “generating differential equations” f 00 .z/ C 2f 0 .z/ C f .z/ D 0 subject to C1 Z
f .z/d z D 1 1
652
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
“recursive differentiation” 1 1 f .z/ D p exp z2 2 2
f 0 .z/ D zf .z/ DW g.z/ f 00 .z/ D g 0 .z/ D f .z/ zg.z/ D .z2 1/f .z/ f 000 .z/ D 2zf .z/ C .z2 1/g.z/ D .z3 C 3z/f .z/ f .4/ .z/ D .3z2 C 3/f .z/ C .z3 C 3z/g.z/ D .z4 6z2 C 3/f .z/ f .5/ .z/ D .4z3 12z/f .z/ C .z4 6z2 C 3/g.z/ D .z5 C 10z3 15z/f .z/ f .6/ .z/ D .5z4 C 30z2 15/f .z/ C .z5 C 10z3 15z/g.z/ D .z6 15z4 C 45z2 15/f .z/ f .7/ .z/ D .6z5 60z3 C 90z/f .z/ C .z6 15z4 C 45z2 15/g.z/ D .z7 C 21z5 105z3 C 105z/f .z/ f .8/ .z/ D .7z6 C 105z4 315z2 C 105/f .z/ C.z7 C 21z5 105z3 C 105z/f .z/ D .z8 28z6 C 210z4 420z2 C 105/f .z/ f .9/ .z/ D .8z7 168z5 C 840z3 840z/f .z/ C.z8 28z6 C 210z4 420z2 C 105/g.z/ D .z9 C 36z7 378z5 C 1260z3 945z/f .z/ f .10/ .z/ D .9z8 C 252z6 1890z4 C 3780z2 945/f .z/ C C.z9 C 36z7 378z5 C 1260z3 945/g.z/ D .z10 45z8 C 630z6 3150z4 C 4725z2 945/f .z/ “upper triangle representation of the matrix transforming f .z/ ! f n .z/” 2
3 2 f .z/ 1 6 f 0 .z/ 7 6 0 6 7 6 6 f 00 .z/ 7 6 1 6 7 6 6 000 7 6 6 f .z/ 7 6 0 6 .4/ 7 6 6 f .z/ 7 6 3 6 7 6 6 f .5/ .z/ 7 D f .z/ 6 0 6 7 6 6 f .6/ .z/ 7 6 15 6 7 6 6 .7/ 7 6 6 f .z/ 7 6 0 6 .8/ 7 6 6 f .z/ 7 6 105 6 7 6 4 f .9/ .z/ 5 4 0 f .10/ .z/ 945
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 3 0 1 0 0 0 0 0 0 6 0 1 0 0 0 0 15 0 10 0 1 0 0 0 0 45 0 15 0 1 0 0 105 0 105 0 21 0 1 0 0 420 0 210 0 28 0 1 945 0 1260 0 378 0 36 0 0 4725 0 3150 0 630 0 45
0 0 0 0 0 0 0 0 0 1 0
32 3 1 0 6 7 07 76z 7 6 2 7 07 76z 7 76 3 7 076z 7 76 4 7 6 7 07 76z 7 6 5 7 07 76z 7 : 7 6 7 076 6z 7 76 7 7 076z 7 76 8 7 6 7 07 76z 7 0 5 4 z9 5 z10 1
B-3 A First Confidence Interval of Gauss–Laplace Normally Distributed Observations
653
B-31 The Forward Computation of a First Confidence Interval of Gauss–Laplace Normally Distributed Observations: ; 2 Known We can avoid solving the linear Volterra integral equation of the first kind if we push forward the integration for a fixed value z. Example B.8. (Series expansion of the Gauss-Laplace integral, first interval): Let us solve the integral Z1 .z D 1/ WD 2
f .z /d z
0
in the first interval 0 z 1 by Taylor expansion with respect to the successive differentiation of f .z/ outlined in Box B.1 and the specific derivatives f n .0/ given in Table B.4. Based on those auxiliary results, Box B.2 presents us the detailed interpretation. First, we expand exp.z2 =2/ up to order O.14/. The specific Taylor series are uniformly convergent. Accordingly, in order to compute the integral, second we integrate termwise up to order O.15/. For the specific value z D 1, we have computed the coefficient of confidence .1/ D 0:683. The result P. < X < C / D 0:683 is known as the One Sigma Rule. 68.3% of the sample are in the interval 1; C 1 Œ, 0.317% outside. If we make three experiments, one experiment is outside the one interval. Box B.2. A specific integral “expansion of the exponential function” exp x D 71 C
x2 x3 xn x C C CC C O.n/ 8 jxj <1 1Š 2Š 3Š nŠ
1 x WD z2 2 1 1 1 1 1 8 1 10 1 12 z z C z C O.14/ exp z2 D 1 z2 C z4 z6 C 2 2 8 48 384 3840 46080
654
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
“the specific integral” Zz 0
1 f .z /d z D p 2
Zz
2
exp z 0
1 dz D p 2
1 1 1 7 z z3 C z5 z 6 40 336
1 9 1 11 1 z z C z13 C O.15/ C 3456 42240 599040
“the specific values z D 1” Z1 f .z/d z
.1/ D 2 0
1 1 1 1 1 2 1 C C C O.15/ D p 1 C 6 40 336 3456 42240 599040 2
2 D p .1 0:166;667 C 0:025;000/ 2
0:002;976 C 0:000;289 0; 000;024 C 0:000;002/ 2 D p 0:855;624 D 0:682;689 2
D 0:683 “coefficient of confidence” 1 0:683 D 1 317:311 103 D 1 : 3 Table B.4 (Special values of derivatives- Gauss-Laplace probability distribution): zD0 1 p D 0:398; 942 2
1 f .0/ D p ; f 0 .0/ D 0; 2
1 00 1 f 00 .0/ D p ; f .0/ D 0:199; 471 2 2Š
B-3 A First Confidence Interval of Gauss–Laplace Normally Distributed Observations
655
1 .4/ 3 f .0/ D C0:049; 868 f 000 .0/ D 0; f .4/ .0/ D C p ; 4Š 2
1 .6/ 15 f .5/ .0/ D 0; f .6/ .0/ D p ; f .0/ D 0:008; 311: 2 6Š
Example B.9. (Series expansion of the Gauss-Laplace integral, 2nd interval): Let us solve the integrals Z2 .z D 2/ WD 2
Z1
f .z /d z D 2 0
f .z /d z C2 0
Z2 .z D 2/ D .z D 1/ C 2
Z2
f .z /d z
1
f .z /d z ;
1
namely in the second interval 1 z 2. First, we setup Taylor series of f .z/ “around the point z D 1”. The derivatives of f .z/ “at the point z=1” are collected up to order 10 in Table B.5. Second, we integrate the Taylor series termwise and receive the specific integral of Box B.3. Note that termwise integrated is permitted since the Taylor series are uniformly convergent. The detailed computation up to order O(12) has led us to the coefficient of confidence .2/ D 0:954. The result P. 2 < X < C 2/ D 0:954 is known as the Two Sigma Rule. 95.4% of the sample interval 2; C 2 Œ, 0.046% outside. If we make 22 experiments, one experiment is outside the 2 interval. Box B.3. (A specific integral): “expansion of the exponential function” 1 2 1 f .z/ WD p exp z 2 2
f .z/ D f .1/ C
1 0 1 1 10 f .1/.z 1/ C f 00 .1/.z 1/2 C C f .1/.z 1/10 1Š 2Š 10Š
C O.11/
656
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
“the specific integral” Z2
f .z /d z D f .1/.z 1/ C
1
11 0 1 1 00 f .1/.z 1/2 C f .1/.z 1/3 2 1Š 3 2Š
1 1 000 1 1 .4/ f .1/.z 1/4 C f .1/.z 1/5 4 3Š 5 4Š 1 1 .5/ 1 1 .10/ f .1/.z 1/6 C C f C .1/.z 1/11 C O.12/ 6 5Š 11 10Š C
case z D 2 Z2 .2/ D .1/ C 2
f .z/d z D.1/ C 2.0:241; 971 1
0:120; 985 C 0:020; 122 0:004; 024 0:002; 016 C 0:000; 768 C 0:000; 120 0:000; 088 0:000; 002 0:000; 050 C O.12// D 0:682; 690 C 0:271; 632 D 0:954 “coefficient of confidence” 0:954 D 1 45:678 103 D 1
1 : 22
Table B.5 (Special values of derivatives – Gauss-Laplace probability distribution): zD1
1 1 D 0:398; 942; exp D 0:606; 531 p 2 2
1 1 f .1/ D p exp D 0:241; 971 2 2
1 1 exp D 0:241; 971; f 00 .1/ D 0 f 0 .1/ D f .1/ D p 2 2
1 2 000 exp D 0:482; 942 f .1/ D 2f .1/ D p 2 2
B-3 A First Confidence Interval of Gauss–Laplace Normally Distributed Observations
657
1 000 f .1/ D C0:080; 490 3Š f .4/ .1/ D 2f .1/ D 0:482; 942 1 .4/ f .1/ D 0:020; 122 4Š
1 6 exp D 1:451; 826 f .5/ .1/ D 6f .1/ D p 2 2
1 .5/ f .1/ D 0:012; 098 5Š 1 16 .6/ exp D 3:871; 536 f .1/ D 16f .1/ D p 2 2
1 .6/ f .1/ D 0:005; 377 6Š 20 1 D 4:829; 420 f .7/ .1/ D 20f .1/ D p exp 2 2
1 .7/ f .1/ D 0:000; 958 7Š f .8/ .1/ D 132f .1/ D 31:940; 172 1 .8/ f .1/ D 0:000; 792 8Š f .9/ .1/ D 28f .1/ D 6:775; 188 1 .9/ f .1/ D 0:000; 019 9Š f .10/ .1/ D 8234f .1/ D 1992:389 1 .10/ f .1/ D 0:000; 549: 10Š Example B.10. (Series expansion of the Gauss-Laplace integral, third interval): Let us solve the integrals Z3 .z D 3/ WD 2
Z1 f .z/d z D 2
0
Z2 f .z/d z C 2
0
Z3 f .z/d z C 2
1
Z3 .z D 3/ D .z D 1/ C .z D 2/ C 2
f .z/d z; 2
f .z/d z 2
658
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
namely in the third interval 2 z 3. First, we setup Taylor series of f(z) “around the point z D 2”. The derivatives of f .z/ “at the point z D 2” are collected up to order 10 in Table B.6. Second, we integrate the Taylor series termwise and receive the specific integral of Box B.4. Note that termwise integration is permitted since the Taylor series are uniformly convergent. The detailed computation up to order O(12) leads us to the coefficient of confidence .3/ D 0:997. The result P . 3 < X < C 3/ D 0:997 is known as the Three Sigma Rule. 99.7% of the sample are in the interval 3; C 3Œ, 0.003% outside. If we make 377 experiments, one experiment is outside the 3 interval. Table B.6 (Special values of derivatives Gauss-Laplace probability distribution): zD2 1 D 0:398; 942; exp .2/ D 0:135; 335 p 2
1 f .2/ D p exp .2/ D 0:053; 991 2
f 0 .2/ D 2f .2/ D 0:107; 982 1 00 f .2/ D C0:080; 986 2Š 1 000 f 000 .2/ D 2f .2/; f .2/ D 0:017; 997 3Š 1 .4/ f .2/ D 0:011; 248 f .4/ .2/ D 5f .2/; 4Š 1 .5/ f .2/ D C0:008; 099 f .5/ .2/ D 18f .2/; 5Š 1 .6/ f .2/ D 0:000; 825 f .6/ .2/ D 11f .2/; 6Š 1 .7/ f .7/ .2/ D 86f .2/; f .2/ D 0:000; 921 7Š 1 .8/ f .2/ D C0:000; 333 f .8/ .2/ D C249f .2/; 8Š 1 .9/ f .2/ D C0:000; 028 f .9/ .2/ D 190f .2/; 9Š 1 .10/ f .10/ .2/ D 2621f .2/; f .2/ D 0:000; 039: 10Š f 00 .2/ D 3f .2/;
B-3 A First Confidence Interval of Gauss–Laplace Normally Distributed Observations
659
Box B.4 (A specific integral) “expansion of the exponential function” 1 1 2 f .z/ WD p exp z 2 2
f .z/ D f .2/ C
1 1 .10/ 1 0 f .2/.z 2/ C f 00 .2/.z 2/2 C C f .2/.z 2/10 1Š 3Š 10Š
C O.11/ “the specific integral” Zz
f .z /d z D f .2/.z 2/ C
2
11 0 1 1 00 f .2/.z 2/2 C f .2/.z 2/3 2 1Š 3 2Š
1 1 000 1 1 .4/ f .2/.z 2/4 C f .2/.z 2/5 4 3Š 5 4Š 1 1 .5/ 1 1 .10/ f .2/.z 2/6 C C f C .2/.z 2/11 C O.12/ 6 5Š 11 10Š C
case z D 3 Z3 .3/ D .1/ C .2/ C 2
f .z/d z 2
D 0:682; 690 C 0:271; 672 C 2.0:053; 991 0:053; 991 C 0:026; 995 0:004; 499 0:002; 250 C 0:001; 350 0:000; 118 C 0:000; 037 C 0:000; 003 0:000; 004 C O.12// D 0:682; 690 C 0:271; 632 C 0:043; 028 D 0:997 “coefficient of confidence” 0:997 D 1 2:65 103 D 1
1 : 377
B-32 The Backward Computation of a First Confidence Interval of Gauss–Laplace Normally Distributed Observations: ; 2 Known Finally we solve the Volterra integral equation of the first kind by the technique of series inversion, also called series reversion. Let us recognize that the interval
660
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
integration of a Taylor series expanded Gauss-Laplace normal density distribution led us to a univariate homogeneous polynomial of arbitrary order. Such a univariate homogeneous polynomial y D a1 x C a2 x 2 C C an x n (“input”) can be reversed as a univariate homogeneous polynomial x D b1 y C b2 y 2 C C bn y n (“output”) as outlined in Table B.7. Consult Abramowitz and Stegun (1965), p. 16 for a review, but Grafarend (1996) for a derivation based upon Computer Algebra. Table B.7 (Series inversion (Grafarend, 1996)): “input: univariate homogeneous polynomial” y D a1 x C a2 x 2 C a3 x 3 C a4 x 4 C a5 x 5 C a6 x 6 C a7 x 7 C O.x 8 / “output: reverse univariate homogeneous polynomial” x D b1 y C b2 y 2 C b3 y 3 C b4 y 4 C b5 y 5 C b6 y 6 C b7 y 7 C O.y 8 / “coefficient relation” (a) a1 b1 D 1 (b) a13 b2 D a2 (c) a15 b3 D 2a22 a1 a3 (d) a17 b4 D 5a1 a2 a3 a12 a4 5a23 (e) a19 b5 D 6a12 a2 a4 C 3a12 a32 C 14a24 a13 a5 21a1 a22 a3 (f) a111 b6 D 7a13 a2 a5 C 7a13 a2 a4 C 84a1 a23 a3 a14 a6 28a1 a2 a32 42a25 28a12 a22 a4 (g) a113 b7 D 6a14 a2 a6 C 8a14 a2 a5 C 4a14 a42 C 120a12 a23 a4 C 180a12 a22 a32 C 132a26 a15 a7 36a13 a22 a5 72a13 a2 a3 a4 12a13 a33 330a1 a24 a3 . Example B.11. (Solving the Volterra integral equation of the first kind): Let us define the coefficient of confidence D 0:90 or 90%. We want to know the quantile c0:90 which determines the probability identity P . c < X < C c/ D 0:90: If you follow the detailed computation of Table B.5, namely the input as well as the output data up to order O(5), you find the quantile c0:90 D 1:64; as well as the confidence interval P . 1:64 < X < C 1:64/ D 0:90:
B-3 A First Confidence Interval of Gauss–Laplace Normally Distributed Observations
661
90% of the sample are in the interval 1:64; C 1:64Œ, 10% outside. If we make 10 experiments one experiment is outside the 1.62 interval. Table B.8 (Series inversion quantile c0:90 ): (i) input Z1
f .z /d z C 2
.z/ D 2 0
C
Zz
f .z /d z D 0:682; 689 C 2Œf .1/.z 1/
0
11 0 1 1 f .1/.z 1/2 C C f .n1/ .1/.z 1/n C O.n C 1/ 2 1Š n .n 1/Š
11 0 1 Œ.z/ 0:682; 689 D f .1/.z 1/ C f .1/.z 1/2 C 2 2 1Š 1 1 f .n1/ .1/.z 1/n C O.n C 1/ C n .n 1/Š y D a1 x C a2 x 2 C C an x n C O.n C 1/ x WD z 1 y WD .0:900; 000 0:682; 689/=2 D 0:108; 656 a1 WD f .1/ D 0:241; 971 11 0 f .1/ D 0:241; 971 2 1Š 1 1 00 f .1/ D 0 a3 WD 3 2Š 1 1 000 a4 WD f .1/ D 0:020; 125 4 3Š 1 1 .4/ f .1/ D 0:004; 024 a5 WD 5 4Š a2 WD
an WD
1 1 f .n1/ .1/: n .n 1/Š
(ii) output b1 D
1 D 4:132; 726 a1
b2 D
1 a2 D 8:539; 715 a13
662
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
b3 D
1 .2a22 a1 a3 / D 35:292; 308 a15
b4 D
1 .5a1 a2 a3 a12 a4 5a23 / D 158:070 a17
b5 D
1 .6a12 a2 a4 C 3a12 a32 C 14a24 a13 a5 21a1 a22 a3 / D 475:452; 152 a19
b1 y D 0:449; 045; b2 y 2 D 0:100; 821; b3 y 3 D 0:045; 273; b4 y 4 D 0:022; 032; b5 y 5 D 0:007; 201 x D b1 y C b2 y 2 C b3 y 3 C b4 y 4 C b5 y 5 C O.6/ D 0:624; 372 z D x C 1 D 1:624; 372 D c0:90 : At this end we would like to give some sample references on computing the “inverse error function” Zx y D F .x/ WD 0
1 1 p exp z2 d z DW erfx 2 2
versus
x D F 1 .y/ D i nverfy; namely Carlitz (1963) and Strecok (1968).
B-4 Sampling from the Gauss–Laplace Normal Distribution: A Second Confidence Interval for the Mean, Variance Known The second confidence interval of Gauss-Laplace i.i.d. observations will be constructed for the mean b BLUUE of , when the variance 2 is known. “n” is the size of the sample, namely to agree to the number of observations. Before we present the general sampling distribution we shall work through two examples. Example B.12 has been chosen for a sample size n D 2, while Example B.12 for n D 3 observations. Afterwards the general result is obvious and sufficiently motivated. Example B.12. (Gauss-Laplace i.i.d. observations, observation space Y, dim Y) In order to derive the marginal distributions of b BLUUE of and b 2 BIQUUE 2 of for a two dimensional Euclidean observation space we have to introduce
B-4 Sampling from the Gauss–Laplace Normal Distribution:
663
f2(x) = f2(σ ˆ 2 / σ2 )
x: = σˆ 2 / σ2
Fig. B.2 Special Helmert pdf of 2p for one degree of freedom, p D 1
various images. First, we define the probability function of two Gauss-Laplace i.i.d. observations and consequently implement the basic .b ; b 2 / decomposition into the 0 pdf. Second, we separate the quadratic form .y 1/ .y 1/ into the quadratic form .y 1b /0 .y 1b /, the vehicle to introduce b 2 BIQUUE of 2 , and into the 2 quadratic form .b / the vehicle to bring in b BLUUE of . Third, we aim at transforming the quadratic form .y 1b /0 .y 1b / D y0 My; rk M D 1, into the 2 special the marginal distributions ˇ form 1=2.y1 y2 / DW x. Fourth, we generate f1 .b ˇ; 2 =2 / of the mean b BLUUE of and f2 . 2 / of the sample variance b 2 2 BIQUUE of . The basic results of the example are collected in Corollary B.6. The special Helmert pdf 2 with one degree of freedom is plotted in Fig. B.2, but the special Gauss-Laplace normal pdf of variance 2 =2 in Fig. B.3. The first action item Let us assume an experiment of two Gauss-Laplace i.i.d. observations. Their pdf is given by f .y1 ; y2 / D f .y1 /f . y2 /; 1 1 2 2 exp 2 Œ.y1 / C .y2 / ; f .y1 ; y2 / D 2 2 2 1 1 0 f .y1 ; y2 / D exp 2 .y 1/ .y 1/ : 2 2 2
664
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions 0.4
0.3
f1(z | 0, 1)
0.2
0.1
0 –5
–4
–3
–2
–1
0
1
2
3
4
5
z = (mˆ − m)/ s 2
2
Fig. B.3 Special Gauss-Laplace normal pdf of .b /= 2 =2
The second action item The coordinates of the observation vector have been denoted by Œy1 ; y2 0 D y 2 Y; dim Y D 2. The quadratic form .y 1/0 .y 1/ allows the fundamental decomposition y 1 D .y 1b / C 1.b / .y 1/0 .y 1/ D .y 1b /0 .y 1b / C 10 1.b /2 .y 1/0 .y 1/ D b 2 C 2.b /2 : Here, b is BLUUE of and b 2 BIQUUE of 2 . The detailed computation proves our statement. / C .b /2 C Œ.y2 b / C .b /2 Œ.y1 b D .y1 b /2 C .y2 b /2 C 2.b /2 C 2.b /Œ.y1 b / C .y2 b / /2 C .y2 b /2 C 2.b /2 C 2b .y1 b / C 2b .y2 b / D .y1 b 2.y1 b / 2.y2 b /
B-4 Sampling from the Gauss–Laplace Normal Distribution:
665
1 .y1 C y2 / 2 b 2 D .y1 b /2 C .y2 b /2 : b D
As soon as we substitute b and b 2 we arrive at .y1 /2 C .y2 /2 D Œ.y1 b / C .b /2 C Œ.y2 b / C .b /2 Db 2 C 2.b /2 ; since the residual terms vanish: 2b .y1 b / C 2b .y2 b / 2.y1 b / 2.y2 b / D 2b y1 2b 2 C 2b y2 2b 2 2y1 C 2b 2y2 C 2b 2 2.y1 C y2 / C 4b D 2b .y1 C y2 / 4b D 4b 2 4b 2 4b C 4b D 0: The third action item The cumulative pdf d F D f .y1 ; y2 /dy1 dy2 D f1 .b /f2 .x/db dx D f1 .b /f2
b 2 2
db d
b 2 2
has to be decomposed into the first pdf f1 .b j ; 2 =n/ representing the pdf of the sample mean b and the second pdf f2 .x/ of the new variable x WD .y1 y2 /2 =.2 2 / D b 2 = 2 representing the sample variance b 2 , normalized by 2 . How can the second decomposition f1 f2 be understood? Let us replace the quadratic form .y 1/0 .y 1/ D b 2 C 2.b /2 in the cumulative pdf 1 1 0 d F D f .y1 ; y2 /dy1 dy2 D exp 2 .y 1/ .y 1/ dy1 dy2 2 2 2 1 1 2 2 D exp 2 Œb C 2.b / dy1 dy2 2 2 2 d F D f .y1 ; y2 /dy1 dy2 1 1 1b 1 1 1 .b /2 2 exp exp dy1 dy2 : Dp p p 2 2 =2 2 2 2 p2 2
2
666
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
The quadratic form b 2 , conventionally given in terms of the residual vector y 1b will be rewritten in terms of the coordinates Œy1 ; y2 0 D y of the observation vector. 2 2 1 1 b 2 D .y1 b /2 C .y2 b /2 D y1 .y1 C y2 / C y2 .y1 C y2 / 2 2 b 2 D
1 1 1 .y1 y2 /2 C .y2 y1 /2 D .y1 y2 /2 : 4 4 2
The fourth action item The new variable x WD 21 2 .y1 y2 /2 will be introduced in the cumulative pdf dF D f .y1 ; y2 /dy1 dy2 . The new surface element db dx will be related to the old surface element dy1 dy2 . ˇ " #ˇ ˇ Dy1 b Dy2 b ˇˇ ˇ db dx D ˇ det ˇ dy1 dy2 D jJj dy1 dy2 ˇ Dy x Dy x ˇ 1
2
Dy1 b WD
@b 1 D ; @y1 2
Dy1 x WD
@x y1 y2 D ; @y1 2
Dy2 b WD
@b 1 D @y2 2
Dy2 x WD
@x y1 y2 D : @y2 2
The absolute value of the Jacobi determinant amounts to ˇ " 1 #ˇ 1 ˇ ˇ y1 y2 2 ˇ ˇ 2 2 1 D jJj D ˇ det y y ; jJj D : ˇ 1 2 2 ˇ ˇ 2 y1 y2 y1 y 2 2
In consequence, we have derived dy1 dy2 D
2 db dx D p p db dx y1 y2 2 x
based upon xD
p 1 1 .y1 y2 /2 ) 2x D .y1 y2 /: 2 2
In collecting all detailed partial results we can formulate a corollary.
B-4 Sampling from the Gauss–Laplace Normal Distribution:
667
Corollary B.3. (marginal probability distributions of b ; 2 given, and b 2 ): The cumulative pdf of a set of two observations is represented by d F D f .y1 ; y2 /dy1 dy2 D f1 .b j; 2 =2/f2 .x/db dx subject to /2 1 1 .b 1 j; ; 2 =2/ WD p f1 .b exp 2 2 =2 2 p2 2 b 1 1 1 f2 .x/ D f2 subject to WD p p exp x 2 2 2 x C1 Z f1 .b / db D 1 1
and
C1 Z f2 .x/ dx D 1: 0
f1 .b / is the pdf of the sample mean b D .y1 C y2 /=2 and f2 .x/ the pdf of the sample variance b 2 D .y1 y2 /2 =2 D 2 x. f1 .b / is a Gauss-Laplace pdf with mean and variance 2 =n, while f2 .x/ is a Helmert 2 with one degree of freedom. Example B.13. (Gauss-Laplace i.i.d. observations, observation space Y, dim Y D 3): In order to derive the marginal distribution of b BLUUE of and b 2 BIQUUE of 2 for a three-dimensional Euclidean observation space, we have to act in various scenes. First, we introduce the probability function of three Gauss-Laplace i.i.d. observations and consequently implement the .b ; b 2 / decomposition in the pdf. 0 Second, we force the quadratic form .y 1b / .y 1b / to be decomposed into b 2 2 2 2 and .b / , actually a way to introduce the sample variance b BIQUUE of b and the sample mean b BLUUE of . Third, we succeed to transform the quadratic form .y 1b /0 .y 1b / D yMy; rkM D 2 into the canonical form z21 C z22 by means of 0 0 Œz1 ; z2 D HŒy1 ; y2 ; y3 0 . Fourth, we produce the right inverse H k D H in order to 0 0 0 invert H, namely Œy1 ; y2 ; y3 D H Œz1 ; z2 . Fifth, in order to transform the original quadratic form 2 .y 1b /0 .y 1b / into the canonical form z21 C z22 C z23 we review the general Helmert transformation z D 1 H.y / and its inverse y D H0 z subject to H 2 SO.3/ and identify its parameters translation, rotation and dilatation (scale). Sixth, we summarize the marginal probability distributions f1 .b j ; 2 =3/ 2 2 of the sample mean b BLUUE of and f2 .2b = / of the sample variance b 2 2 2 BIQUUE of . The special Helmert pdf with two degrees of freedom is plotted in Fig. B.4 while the special Gauss-Laplace normal pdf of variance 2 =3 in Fig. B.5. The basic results of the example are collected in Corollary B.7.
668
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
f2(2sˆ 2 | s2)
x := 2s2 | s2
Fig. B.4 Special Helmert pdf of 2p for two degrees of freedom, p D 2
f1(mˆ | m, s ) 3 2
s2 (mˆ − m) / 3 2
Fig. B.5 Special Gauss-Laplace normal pdf of .b /= 3
B-4 Sampling from the Gauss–Laplace Normal Distribution:
669
The first action item Let us assume an experiment of three Gauss-Laplace i.i.d. observations. Their pdf is given by f .y1 ; y2 ; y3 / D f .y1 /f . y2 /f . y3 /; 1 2 2 2 3=2 3 f .y1 ; y2 ; y3 / D .2 / exp 2 Œ.y1 / C .y2 / C .y3 / ; 2 1 f .y1 ; y2 ; y3 / D .2 /3=2 3 exp 2 .y 1/0 .y 1/ : 2 The coordinates of the observation vector have been denoted by Œy1 ; y2 ; y3 0 D y 2 Y, dim Y D 2. The second action item The coordinates of the observation vector have been denoted by D y 2 Y, dim Y D 2. The quadratic form .y1 1/0 .y2 1/ allows the fundamental decomposition y 1 D .y 1b / C 1.b /; 0
.y 1/ .y 1/ D .y 1b /0 .y 1b / C 10 1.b /2 .y 1/0 .y 1/ D 2b 2 C 3.b /2 : Here, b is BLUUE of and b 2 BIQUUE of 2 . The detailed computation proves our statement. / C .b /2 C Œ.y2 b / C .b /2 C Œ.y3 b / C .b /2 Œ.y1 b D .y1 b /2 C .y2 b /2 C .y3 b /2 C 3.b /2 C 2b .y1 b / C 2b .y2 b / C 2b .y3 b / 2.y1 b / 2.y2 b / 2.y3 b / 1 .y1 C y2 C y3 / 3 1 b 2 BIQUUE of 2 W b 2 D Œ.y1 b /2 C .y2 b /2 C .y3 b /2 : 2 b BLUUE of W b D
As soon as we substitute b and b 2 we arrive at .y1 /2 C .y2 /2 C .y3 /2 D Œ.y1 b /2 C .b /2 C Œ.y2 b /2 C .b /2 2 C Œ.y3 b /2 C .b /2 2 D 2b 2 C 3.b /2 :
670
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
The third action item Let us begin with transforming the cumulative probability function d F D f .y1 ; y2 ; y3 /dy1 dy2 dy3 1 D .2 /3=2 3 exp 2 .y 1/0 .y 1/ dy1 dy2 dy3 2 1 3=2 3 2 2 D .2 / exp 2 Œ2b C 3.b / dy1 dy2 dy3 2 into its canonical form. In detail, we transform the quadratic form b 2 BIQUUE of 2 canonically. 1 y1 b D y1 y1 3 1 y2 b D y2 y2 3 1 y3 b D y3 y3 3
1 2 1 2 1 1 .y2 C y3 / D y1 .y2 C y3 / D y1 y2 y3 3 3 3 3 3 3 1 1 2 1 .y1 C y3 / D y1 C y2 y3 3 3 3 3 1 1 1 2 .y1 C y2 / D y1 y2 C y3 3 3 3 3
as well as 4 2 y C 9 1 1 .y2 b /2 D y12 C 9 1 .y3 b /2 D y12 C 9
/2 D .y1 b
1 2 y C 9 2 4 2 y C 9 2 1 2 y C 9 2
1 2 4 2 y3 y1 y2 C y2 y3 9 9 9 1 2 4 4 y y1 y2 y2 y3 C 9 3 9 9 4 2 2 4 y3 C y1 y2 y2 y3 9 9 9
4 y3 y1 9 2 y3 y1 9 4 y3 y1 9
and .y1 b /2 C .y2 b /2 C .y3 b /2 D
2 2 .y C y22 C y32 y1 y2 y2 y3 y3 y1 /: 3 1
We shall prove / D y0 My D z21 C z22 ; .y 1b /0 .y 1b
rkM D 2;
that the symmetric matrix M, 2
3 2 1 1 M D 4 1 2 1 5 1 1 2
M 2 R33
B-4 Sampling from the Gauss–Laplace Normal Distribution:
671
has rank 2 or rank deficiency 1. Just compute jMj D 0 as a determinant identity as well as the subdeterminant ˇ ˇ ˇ 2 1 ˇ ˇ ˇ ˇ 1 2 ˇ D 3 ¤ 0 2 32 3 2 1 1 y1 1 / D y0 My D Œy1 ; y2 ; y3 4 1 2 1 5 4 y2 5 : .y 1b /0 .y 1b 3 1 1 2 y3 How to transform a degenerate quadratic form to a canonical form? Helmert (1875) had the bright idea to implement what we call nowadays the forward Helmert transformation 1 .y1 y2 / z1 D p 12 1 z2 D p .y1 C y2 2y3 / 23
or
z21 D
1 2 .y 2y1 y2 C y22 / 2 1
z22 D
1 2 .y C y22 C 4y32 C 2y1 y2 4y2 y3 4y3 y1 / 6 1
z21 C z22 D
2 2 .y C y22 C y32 y1 y2 y2 y3 y3 y1 / : 3 1
Indeed we found .y 1b /0 .y 1b / D .y1 b /2 C .y2 b /2 C .y3 b /2 D z21 C z22 : In algebraic terms, a representation of the rectangular Helmert matrix is
z zD 1 z2
"
D
p1 12 p1 23
p112 p1 23
" z D H23 y;
H23 WD
p1 12 p1 23
#2y 3
0
1
p2
23
p1
12 p1 23
4 y2 5 y3
0 p223
# :
672
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
The rectangular Helmert matrix is right orthogonal, " H23 H0 23 D
p1 12 p1 23
p1 12 p1 23
0 p223
#
2
p1 6 12 6 p1 4 12
0
p1 23 p1 23 p2 23
3
7 7 D 1 0 D I2 : 5 01
The fourth action item By means of the forward Helmert transformation we could prove that z21 C z22 represents the quadratic form .y1b /0 .y1b /. Unfortunately, the forward Helmert transformation only allows indirectly by a canonical transformation. What would be needed is the inverse Helmert transformation z 7! y, also called backward Helmert transformation. The rectangular Helmert matrix has the disadvantage to have no Cayley inverse. Fortunately, its right inverse 0 0 1 D H0 23 2 R32 H R WD H 23 .H23 H 23 /
solves our problem. The inverse Helmert transformation 0 y D H R z D H 23 z
brings / D .y 1b /0 .y 1b
2 2 .y C y22 C y32 y1 y2 y2 y3 y3 y1 / 3 1
via 2
0
y D H23 z
y1 D p1 z1 C p1 z2 12 23 6 p1 z1 C p1 z2 y D or 6 2 4 12 23 y3 D p2 z2 ; 23
y12 C y22 C y32 D
1 2 1 2 1 1 1 1 2 z1 C z2 C p z1 z2 C z22 C z22 p z1 z2 C z22 ; 2 6 2 6 3 3 3
1 1 1 1 1 1 y1 y2 C y2 y3 C y3 y1 D z21 C z22 C p z1 z2 z22 p z1 z2 z22 ; 2 6 3 2 3 3 2 2 2 3 2 3 2 .y1 C y22 C y32 y1 y2 y2 y3 y3 y1 / D z1 C z2 D z21 C z22 ; 3 3 2 2 into the canonical form.
B-4 Sampling from the Gauss–Laplace Normal Distribution:
673
The fifth action item Let us go back to the partitioned pdf in order to inject the canonical representation of the deficient quadratic form y0 My; M 2 R33 , rkM D 2. Here we meet first the problem to transform 1 dF D .2 /3=2 3 exp 2 Œ2b 2 C 3.b /2 dy1 dy2 dy3 2 by an vector Œz1 ; z2 ; z3 0 DW z into the canonical form dF D .2 /3=2 extended 1 2 2 exp 2 .z1 C z2 C z23 / d z1 d z2 d z3 , which is generated by the general forward Helmert transformation z D 1 H.y 1/ 2 1 3 p z1 6 112 1 4 z2 5 D 6 p 4 123 z3 p 2
3
p1 12 p1 23 p1 3
3
3 2 y1 7 p223 7 4 y2 5 5 y3 p1 0
3
or its backward Helmert transformation, also called the general inverse Helmert transformation 3 2 1 2 3 2 3 p p1 p1 12 23 3 z1 y1 7 6 1 4 y2 5 D 6 p p1 p1 7 4 z 5 : 2 4 12 23 35 y3 z3 0 p2 p1 23
3
y 1 D H0 z thanks to the orthonormality of the quadratic Helmert matrix H3 , namely H3 H0 3 D 0 H0 3 H3 D I3 or H1 3 D H 3. Secondly, notice the transformation of the volume element dy1 dy2 dy3 D d.y1 /d.y2 /d.y3 / D 3 d z1 d z2 d z3 ; which is due to the Jacobi determinant ˇ ˇ J D 3 ˇH0 ˇ D 3 jHj D 3 : Let us prove that z3 WD
p 1 b 3 .b / D p
3
674
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
brings the first marginal density 1 1 2 1 WD p exp 2 3.b f1 D b j; /2 3 2 2 p3 into the canonical form 1 1 f1 .z3 j0; 1 / D p exp z23 : 2 2
Let us compute b as well as 3.b /2 which concludes the proof. z3 D 1
2 3 y 1 1 4 1 1 p ;p ; p y2 5 3 3 3 y3
3 y1 C y2 C y3 3 p 7 3 7 5 y1 C y2 C y3 Db ) y1 C y2 C y3 D 3b 3 b 1 ) z3 D 3 1 p ; z23 D 2 3.b /2 : 3
z3 D 1
Indeed the extended Helmert matrix H3 is ingenious to decompose 1 .y 1b /0 .y 1b / D z21 C z22 C z23 2 into a canonical quadratic form relating z21 C z22 to b 2 and z23 to .b /2 . At this point, we have to interpret the general Helmert transformation z D 1 H.y /: Structure elements of the Helmert transformation 1
scale or dilatation rotation translation
H
1 2 RC produces a dilatation or a scale change, H 2 SO.3/ WD fH 2 R33 jH0 H D I3 and jHj D C1g a rotation (3 parameters) and 1 2 R3 a translation. Please, prove for yourself that the quadratic Helmert matrix is orthonormal, that is HH0 D H0 H D I3 and jHj D C1.
B-4 Sampling from the Gauss–Laplace Normal Distribution:
675
The sixth action item / Finally we are left with the problem to split the cumulative pdf into one part f1 .b which is a marginal distribution of the arithmetic mean b BLUUE of and another part f2 .b / which is a marginal distribution of the standard deviation b , b 2 BIQUUE 2 2 of , Helmert’s 2 with two degrees of freedom. First let us introduce polar coordinates .1 ; r/ which represent the Cartesian coordinates z1 D r cos 1 , z2 D r sin 1 . The index 1 is needed for later generalization to higher dimension. As a longitude, the domain of 1 is 1 2 Œ0; 2 or 0 1 2 . The new random variable z21 C z22 D kzk2 DW x or radius r relates to Helmert’s
2 D z21 C z22 D
2b 2 1 D 2 .y 1b /0 .y 1b / : 2
Secondly, the marginal distribution of the arithmetic mean b , b BLUUE of , D f1 .z3 j0; 1 /d z3 N ; p j ; p db f1 b 3 3 is a Gauss-Laplace normal distribution with mean and variance 2 =3 generated by dF1 D f1 b j ; p db D f1 .z3 j0; 1 /d z3 3 D .2 /
1=2
C1 C1 Z Z 1 2 1 exp z3 d z3 .2 /1 exp .z21 C z22 / d z1 d z2 2 2
or
1 1
p 3 3 1=2 2 D .2 / exp 2 .b j ; p db : / db f1 b 2 3 Third, the marginal distribution of the sample variance 2b 2 = 2 D z21 C z22 DW x, 2 Helmert’s distribution for p D n 1 D 2 degrees of freedom, f2 .2b 2 = 2 / D f2 .x/ D
x p 1 2 1 exp x p 2p=2 . 2 / 2
is generated by C1 Z Z2
1 1 2 1 1=2 dF2 D .2 / exp z3 d z3 d1 .2 /1 exp x dx 2 2 2 1
0
676
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
1 1 dF2 D .2 /1 !2 exp x dx 2 2
Z2
subject to !2 D
d1 D 2
0
1 1 dF2 D exp x dx 2 2 and x WD z21 C z22 D r 2 ) dx D 2rdr; d z1 d z2 D rdrd1 D
dr D
dx 2r
1 dxd1 2
is the transformation of the surface element d z1 d z2 . In collecting all detailed results let us formulate a corollary. Corollary B.7. (marginal probability distributions of b , 2 given, and of b 2 ): The cumulative pdf of a set of three observations is represented by 2 f2 .x/db j; dx dF D f .y1 ; y2 ; y3 /dy1 dy2 dy3 D f1 b 3 subject to 1 /2 2 1 1 .b WD p j; f1 b p exp 3 2 2 =3 2 = 3 1 1 f2 .x/ D exp x 2 2 subject to x WD z21 C z22 D 2b 2 = 2 ;
dx D
2 db 2 2
and Z1
Z1 f1 .b / db D 1 versus
1
f2 .x/ dx D 1: 0
f1 .b / is the pdf of the sample mean b D .y1 C y2 C y3 /=3 and f2 .x/ the pdf of the sample variance b 2 D .y0 1b /0 .y 1b /=2 normalized by 2 . f1 .b / is a
B-4 Sampling from the Gauss–Laplace Normal Distribution:
677
Gauss-Laplace pdf with mean b and variance 2 =3, while f2 .x/ a Helmert 2 with two degrees of freedom. In summary, an experiment with three Gauss-Laplace i.i.d. observations is characterized by two marginal probability densities, one for the mean b BLUUE of and another one for b 2 BIQUUE of 2 : Marginal probability densities n D 3, Gauss-Laplace i.i.d. observations j ; p db b W f1 b 3 /2 1 1 .b 1=2 db D .2 / p exp 2 2 =3 = 3 or f1 .z3 j0; 1 /d z3 1 D .2 /1=2 exp z23 d z3 2 z3 WD
b 3
b 2 W f2 .2b 2 = 2 /db 2 2 b 2 b D 2 exp 2 db 2 or f2 .x/dx 1 1 D exp x dx 2 2 x WD 2
b 2 2
b, 2 B-41 Sampling Distributions of the Sample Mean 2 b Known, and of the Sample Variance The two examples have prepared us for the general sampling distribution of the sample mean b , 2 known, and of the sample variance b 2 for Gauss-Laplace i.i.d. observations, namely samples of size n. By means of Lemma B.8 on the rectangular Helmert transformation and Lemma B.9 on the quadratic Helmert transformation we prepare for Theorem B.10 which summaries both the pdfs for b BLUUE of , 2 known, for the standard deviation b and for b 2 BIQUUE of 2 . Corollary B.11 focusses on the pdf of D qb where is an unbiased estimation of the standard deviation , namely Ef g D . Lemma B.3. (rectangular Helmert transformation): The rectangular Helmert matrix Hn1;n 2 Rn.n1/ transforms the degenerate quadratic form .n 1/b 2 WD .y 1b /0 .y 1b / D y0 My; rkMD n1 subject to b D
1 0 1y n
678
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
into the canonical form .n 1/b 2 D z0 n1 zn1 D z21 C C z2n1 : The special Helmert transformation yn 7! Hn1;n yn D zn1 is represented by 2
p1 12 p1 23
6 6 6 6 6 6p 1 4 .n1/.n2/ p 1 n.n1/
p1 12 p1 23
p
0
Hn1;n WD 0
p223
0
0
0
0
0
n1 p.n1/.n2/
1 1 1 p p .n1/.n2/ .n1/.n2/ .n1/.n2/ p 1 p 1 p 1 n.n1/ n.n1/ n.n1/
p 1 n.n1/
0 n pn.n1/
3 7 7 7: 7 7 7 5
0 0 The inverse Helmert transformation z 7! y D H R z D H z or yn D H n1n zn is based on its right inverse which thanks to the right orthogonality Hn1;n H0 n;n1 D In1 coincides with its transpose, 0 0 1 D H0 2 Rn.n1/ : H R D H .HH /
Lemma B.4. (quadratic Helmert transformation): The quadratic Helmert matrix H 2 Rnn , also called extended Helmert matrix or augmented Helmert matrix, transforms the quadratic form 1 1 .y 1/0 .y 1/ D 2 Œ.y 1b / C 1.b /0 Œ.y 1b / C 1.b / 2 subject to b D
1 0 1y n
by means of z D 1 H.y 1/ or y D H0 .z 1/ into the canonical form n1
X 1 0 2 2 2 .y 1/ .y 1/ D z C C z C z D z2j C z2n 1 n1 n 2 j D1 z2n D n.b /2 :
B-4 Sampling from the Gauss–Laplace Normal Distribution:
679
Such a Helmert transformation y 7! z D 1 H.y 1/ is represented by displaylines H WD 2
p1 12 1 p 23 p1 34
6 6 6 6 6 6 6 6 6 6 6p 1 6 .n1/.n2/ 6 6 6 p 1 n.n1/ 4 p1 n
p1 12 1 p 23 p1 34
0 p2
23
p1 34
0
0
0
0
p334
0
p
1 1 1 p p .n1/.n2/ .n1/.n2/ .n1/.n2/
n1 p.n1/.n2/
p 1 n.n1/
p 1 n.n1/
p 1 n.n1/
p 1 n.n1/
p1 n
p1 n
p1 n
p1 n
0
3
7 7 7 7 7 0 7 7 7 7 7 7 0 7 7 7 n 7 pn.n1/ 5 0
p1 n
Since the quadratic Helmert matrix is orthonormal, the inverse Helmert transformation is generated by z 7! y 1 D H0 z: The proofs for Lemmas B.8 and B.9 are based on generalizations of the special cases for n D 2, Example B.11, and for n D 3, Example B.12. Any proof will be omitted here. The highlight of this paragraph is the following theorem. Theorem B.10. (marginal probability distribution of .b ; 2 / and b 2 ): The cumulative pdf of a set of n observations is represented by dF D f .y1 ; : : : ; yn /dy1 dyn D f1 .b /f4 .b /db db D f1 .b /f2 .b 2 /db db 2 as the product of the marginal pdf f1 .b / of the sample mean b D n1 1y and the marginal pdf f2 .b / of the sample standard deviation b D p .y 1b /0 .y 1b /=.n 1/, also called r.m.s. (root mean square error), or the marginal pdf f2 .b 2 / of the sample variance b 2 . Those marginal pdfs are represented by dF1 D f1 .b /db dF4 D f4 .b /db and dF2 D f2 .b 2 /db 2;
680
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
(a) sample mean b 2 WD j; / D f1 b f1 .b n
1 /2 1 .b p exp 2 2 =n p 2
n
p
n 1 2 1 z WD .b / W f1 .z/d z D p exp z d z 2 2
(b) sample r.m.s. b p WD n 1 dF4 D f4 .b /db 2 2p p pb p1 f4 .b / D p p=2 p b exp 2 .2/ 2 2 p p p b b x WD n 1 D
2 1 2 p1 f4 .x/dx D p=2 p x exp x dx 2 .2/ 2 dF4 D f4 .x/dx (c) sample variance b 2 p WD n 1 2/ D f2 .b
2 1 b p=2 p2 p p ; b exp p 2p=2 . p2 / 2 2 1
b 2 p 2 D 2b W 2 p 1 1 1 2 f2 .x/dx D p=2 p x exp x dx: 2 .2/ 2 x WD .n 1/
2
f1 .b j; n / as the marginal pdf of the sample mean BLUUE of is a Gaussp Laplace pdf with mean and variance 2 =n. f2 .x/; x W D pb =, is the standard pdf of the normalized root-mean-square error with p degrees of freedom. In contrast, f2 .x/; x W D pb 2 = 2 is a Helmert Chi Square 2p pdf with p D n 1 degrees of freedom. Before we present a sketch of a proof of Theorem B.10 which will be run with five action items and a special reference to the first and second vehicle, we give
B-4 Sampling from the Gauss–Laplace Normal Distribution:
681
some historical comments. Kullback (1934) refers the marginal pdf f1 .b / of the “arithmetic mean” b to Poisson (1827), Haussdorff (1901) and Irwins (1937). He has also solved the problem to find the marginal pdf of the “geometric mean”. The marginal pdf f2 .b 2 / of the sample variance b 2 has been originally derived by Helmert (1875), Helmert (1876a), Helmert (1876b). A historical discussion of Helmert’s distribution is offered by David (1957), Kruskal (1946), Lancaster (1965), Lancaster (1966), Pearson (1931) and Sheynin (1995). The marginal pdf f4 .b / has not found any interest in practice so far. The reason may be found in the effect that b is not an unbiased estimate of the standard deviation , namely Efb g D . According to Cz¨uber (1891), p. 162, Ros`een (1948), p. 37, Schmetterer (1956), p. 203, Stoom (1967), p. 199, 218, Fisz (1971), p. 240 and Richter and Mammitzsch (1973), p. 42 have documented that r
D
p 2
p 2 b D qb pC1 2
is an unbiased estimation of the standard deviation , namely Efb g D . p D n 1 again denotes the number of degrees of freedom. Schaffrin (1979), p. 240 has proven that r p p 1 .y 1b /0 .y 1/ 2p b p WD Db p r p 2 2p 1 1 p 2 is an asymptotic (“from above”) unbiased estimation b p of the standard deviation . Let us implement BLUUE of into the marginal pdf f .b /:
Corollary B.8. (marginal probability distributions of for Gauss-Laplace i.i.d. observations, Ef g D ):
The marginal pdf of , an unbiased estimation of the standard deviation, is represented by
dF4 D f4 . /d 0 0 12 1 pC1 pC1 B B C 2C 2p p p1 2 2 B C C
f4 . / D p p=2 p exp B B C
p A A @ @ 2 p 2 pC1 p 2 2 2
p
682
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Fig. B.6 Marginal pdf for the sample standard deviation (r.m.s.)
and dF4 D f4 .x/dx pC1 p 2 2
p b x WD 2 2 1 2 p1
p x f4 .x/ D exp x 2 2p=2 2 subject to pC1 2
p : 2
p Efxg D 2
Figure B.6 illustrates the marginal pdf for “degrees of freedom” p 2 f1; 2; 3; 4; 5g.
B-4 Sampling from the Gauss–Laplace Normal Distribution:
683
Proof. The first action item The pdf of n Gauss-Laplace i.i.d. observations is given by f .y1 ; : : : ; yn / D f .y1 / f .yn / D
n Y
f .yi /
i D1
1 f .y1 ; : : : ; yn / D .2 /n=2 n exp 2 .y 1/0 .y 1/ : 2 The coordinates of the observation space Y have been denoted by Œy1 ; : : : ; yn 0 D y. Note dim Y D n. The second action item The quadratic form .y 1/0 .y 1/ > 0 allows the fundamental decomposition y 1 D .y 1b / C 1.b /; 0
.y 1/ .y 1/ D .y 1b /0 .y 1b / C 10 1.b /2 10 1 D n .y 1/0 .y 1/ D .n 1/b 2 C n.b /2 : Here, b is BLUUE of and b 2 BIQUUE of 2 . The decomposition of the quadratic 0 .y 1/ .y 1/ into the sample variance b 2 and the square .b /2 of the shifted sample mean b has already been proved for n D 2 and n=3. The general result is obvious. The third action item Let us transform the cumulative probability into its canonical forms. dF D f .y1 ; : : : ; yn /dy1 dyn 1 D .2 /n=2 n exp 2 .y 1/0 .y 1/ dy1 dyn 2
1 2 C n.b D .2 /n=2 n exp 2 .n 1/b /2 dy1 dyn 2 z D 1 H.y 1/ or y 1 D H0 z 1 .y 1/0 .y 1/ D z21 C z22 C C z2n1 C z2n : 2 Here, we have substituted the divert Helmert transformation (quadratic Helmert matrix H) and its inverse. Again 1 is the scale factor, also called dilatation,
684
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
H an orthonormal matrix, also called rotation matrix, and 1 2 Rn the translation, also called shift. dF D f .y1 ; : : : ; yn /dy1 dyn 1 2 1 1 1 2 2 exp .z1 C C zn / dz1 dz2 dzn1 dzn D p exp zn 2 .2 /.n1/=2 2 2
based upon dy1 dy2 dyn1 dyn D n d z1 d z2 d zn1 d zn J D n jH0 j D n jHj D n : J again denotes the absolute value of the Jacobian determinant introduced by the first vehicle. The fourth action item First, we identify the marginal distribution of the sample mean b . 2 dF1 D f1 b db : j; n According to the specific structure of the quadratic Helmert matrix zn is generated by
zn D 1
2
3 y1 1 y1 C C yn n 1 6 7 p ; : : : ; p 4 ::: 5 D 1 p ; n n n yn
upon substituting b D
y1 C C yn 1 0 1yD ) y1 C C yn D nb ) n n
zn D 1
n.b / .b /2 1p p ; d z D n db : ) z2n D n n 2 n
Let us implement d zn in the marginal distribution. 1 1 dF1 D p exp z2n d zn 2 2
B-4 Sampling from the Gauss–Laplace Normal Distribution:
685
C1 C1 Z Z 1 .2 /.n1/=2 exp .z21 C C z2n1 / d z1 d zn1 ; 2
1
1
C1 C1 Z Z 1 .2 /.n1/=2 exp .z21 C C z2n1 / d z1 d zn1 D 1 2
1
1
1 1 2 dF1 D p exp zn d zn 2 2
! /2 2 1 1 1 .b db D f1 b db : j; dF1 D p exp 2 2 n 2 pn n
The fifth action item Second, we identify the marginal distribution of the sample variance b 2 . We depart the ansatz /db D f2 .b 2 /db 2 dF2 D f2 .b in order to determine the marginal distribution f2 .b / of the sample root-meansquare errors b and the marginal distribution f2 .b 2 / of the sample variance b 2. A first version of the marginal probability distribution dF2 is Z1 1 1 2 exp zn d zn dF2 D p 2 2
1 1 1 2 2 .z exp C C z / d z1 d zn1 : n1 .2 /.n1/=2 2 1 Transform the Cartesian coordinates .z1 C C zn1 / 2 Rn1 to spherical coordinates .˚1 ; ˚2 ; : : : ; ˚n2 ; rn1 /. From the operational point of view, p D n1, the number of “degrees of freedom”, is an optional choice. Let us substitute the global hypersurface element !n1 or !p into dF2 , namely 1 p 2
dF2 D
Z1 1
1 2 exp zn d zn D 1 2
1 2 n2 r r exp dr 2.n1/=2 2 1
686
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions C =2 Z
n3
cos
C =2 Z
n2 dn2
=2
p2
cos
=2
cos
C =2 Z
n3 dn3
=2
C =2 Z
p1
p3
cos
C =2 Z
p2 dp2
2
C =2 Z
Z2
cos2 d2 =2
d1 0
2 2 .n1/=2 D p p=2 n1 2 2
p2
cos
d1 0
cos 3 d3
=2
!n1 D !p D
D
cos2 d2
=2
=2
Z =2
Z2
1 1 2 p1 r dr r exp .2 /p=2 2
dF2 D C =2 Z
n4
Z =2 p1 dp1
=2
cosp3 p2 dp2
=2
Z =2
2
Z =2
cos 3 d3 =2
=2
Z2
cos2 d2
!p 1 2 p1 dF2 D r exp r dr .2 /p=2 2 dF2 D
d'1 0
2 1 p r p1 exp r 2 dr : 2 2p=2 2
The marginal distribution of the r.m.s. f2 .b / is generated as soon as we substitute q p 2 2 the radius r D z1 C C zn1 D n 1 b 1 =. Alternatively the marginal
2 / is produced when we substitute the radius distribution of the sample variance f2 .b square r 2 D z21 C C z2n1 D .n 1/b 2 = 2 . Project A p p p p b db n1 b = D p ) dr D dF2 D f2 .b /db 2p p p1 1 p 2 f2 .b / D p p=2 b exp 2 b : 2 2 rD
B-4 Sampling from the Gauss–Laplace Normal Distribution:
687
Indeed, f2 .b / establishes the marginal distribution of the root-mean-square error b with p D n 1 degrees of freedom. Project B 2
x WD r 2 D .n 1/
dx D 2rdr b b 6 2 D p DW
) 4 dx 1 dx p 2 2 dr D D p 2r 2 x 2
2
1 p 1 x 2 dx 2 dF2 D f2 .x/dx
r p1 dr D
p 1 1 1 2 f2 .x/ WD p=2 p x exp x : 2 .2/ 2
Finally, we have derived Helmert’s Chi Square 2p distribution f2 .x/dx D dF2 by substituting r 2 , r p1 and dr in factor of x WD r 2 and dx D 2rdr. Project C 2 = 2 D pb 2 = 2 by rescaling Replace the radical coordinate squared r 2 D .n 1/b 2 on the basis p= x D r 2 D z21 C C z2n1 D z21 C C z2p D .n 1/ dx D
p db 2
b 2 p 2 D 2b 2
within Helmert’s 2p with p D n 1 degrees of freedom 2 /db 2 dF2 D f2 .b f2 .b 2/ D
1 p 2 p=2 p2 p b exp b : p 2p=2 . p2 / 2 2 1
Recall that f2 .b 2 / establishes the marginal distribution of the sample variance b 2 with p D n 1 degrees of freedom. Both, the marginal pdf f2 .b / of the sample standard deviation b , also called root-mean-square error, and the marginal pdf f2 .b 2 / of the sample variance b 2 document the dependence on the variance 2 and p its power . “Here is our journey’s end.”(W. Shakespeare: Hamlet) |
688
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
B-42 The Confidence Interval for the Sample Mean, Variance Known An application of Theorem B.10 is Lemma B.12 where we construct the confidence interval for the sample mean b , BLUUE of , variance 2 known, on the basis of its sampling distribution. Example B.14 is an example of a random sample of size n D 4. Lemma B.12. (confidence interval for the sample mean, variance known): The sampling distribution of the sample mean b D n1 10 y, BLUUE of , is Gauss 2 Laplace normal, N .b j; =n/, if the observations yi ; i 2 f1; : : : ; ng, are Gauss-Laplace i.i.d. The “true” mean is an element of the two-sided confidence interval. 2b p c1˛=2 ; b C p c1˛=2 Œ n n with confidence P b p c1˛=2 < < b C p c1˛=2 D 1 ˛ n n of level 1 ˛. For three values of the coefficient of confidence D 1 ˛, Table B.9 is a list of associated quantiles c1˛=2 . Example B.14. (confidence interval for the sample mean b , 2 known): Suppose that a random sample 2
3 2 3 y1 1:2 6 y2 7 6 3:4 7 7 6 7 D 2:7; 2 D 9; r:m:s: W 3 y WD 6 4 y3 5 D 4 0:6 5 ; b y4
5:6
of four observation is known from a Gauss-Lapace normal distribution with unknown mean and a known standard deviation D 3. b BLUUE of is the arithmetic mean. We intend to determine upper and lower limits which are rather certain to contain the unknowns parameter between them. Previously, for samples of size 4 we have known that the random variable zD
b p n
D
b 3 2
B-4 Sampling from the Gauss–Laplace Normal Distribution:
689
is normally distributed with mean zero and unit variance. b is the sample mean p 2.7 and 3/2 is = n. The probability D 1 ˛ that z will be between any two arbitrarily chosen numbers c1 D c and c2 D c is Zc2 P fc1 < z < Cc2 g D
f .z/d z D D 1 ˛; c1
ZCc P fc < 2 < Ccg D f .z/d z D D 1 ˛; c
Zc f .z/d z D 1 1
Zc f .z/d z D 1
˛ ; 2
˛ : 2
is the coefficient of confidence, ˛ the coefficient of negative confidence, also called complementary coefficient of confidence. The four representations of the probability D 1 ˛ to include z in the confidence interval c < z < Cc have led to the linear Voltera integral equation of the first kind Zc f .z/d z D 1 1
1C ˛ D : 2 2
Three values of the coefficient of confidence or its compliment ˛ are popular and listed in Table B.9. Table B.9 (Values of the coefficient of confidence): ˛ 1
˛ 2
˛ 2
D
0:950 0:050 0:025 0:975
1C 2
0:990 0:010 0:005 0:995
0:999 0:001 0:000; 5 0:999; 5
In solving the linear Voltera integral equation of the first kind Zz 1
1 1 f .z /d z D 1 ˛.z/ D Œ1 C .z/: 2 2
Table B.10 collects the quantiles c1˛=2 given the coefficients of confidence on their complements which we listed in Table B.10.
690
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Table B.10 (Quantiles for the confidence interval of the sample mean, variance unknown): D 1C 2 0:975 0:995 0:999; 5
1
˛ 2
0:95 0:99 0:999
˛ 0:05 0:01 0:001
c1˛=2 1:960 2:576 3:291
Given the quantiles c1˛=2 , we are going to construct the confidence interval for 2 the sample mean b , the variance p to be known. For this purpose, we convert the forward transformations b ! z D n.b /= to . b p zD n
b 1 WD b p c1˛=2 < < b C p c1˛=2 DW b 2 : n n The interval b 1 < < b 2 for the fixed value z D c1˛=2 contains the “true” mean with probability D 1 ˛. P b p c1 ˛2 < < b C p c1 ˛2 n n 2 Zc Zb 2 D f .z/d z D f b db D D 1˛ j; n c b 1 since 2 Zb
1
C p n c1˛=2 b c1˛=2 Z Z ˛ f .b /db D f .b /db D f .b /db D1 : 2 1
1
An animation of the coefficient of confidence and the probability functions of a confidence interval is offered by Figs. B.7 and B.8. b 1 D b c1˛=2 p ; n
C c1˛=2 p b 2 D b n
Let us specify all the integrals to our example c1˛=2
Z
f .z/d z D 1 ˛=2 1
B-4 Sampling from the Gauss–Laplace Normal Distribution: Fig. B.7 Two-sided confidence interval 1 ; b 2 Œ; f .b j; 2 =n/ 2b pdf :
691
f(mˆ | m, σ2)
f(mˆ | m, σ2) a/2
μ
mˆ1
mˆ 1 = mˆ − c1−a /2 σ n
a/2
g = 1-a
m
mˆ 2
mˆ 1
mˆ 1 = mˆ + c1−a /2
σ n
Fig. B.8 Two-sided confidence interval, quantile c1˛=2
1:960 Z
2:576 Z
f .z/d z D 0:975; 1
3:291 Z
f .z/d z D 0:995; 1
f .z/d z D 0:999; 5: 1
Those data lead to a triplet of confidence intervals. case .a/ D 0:95; D 0:05; c1˛=2 D 1:960 3 3 P 2:7 1:96 < < 2:7 C 1:96 D 0:95 2 2 P f0:24 < < C5:64g D 0:95 case .b/ D 0:99; D 0:01; c1˛=2 D 2:576 3 3 P 2:7 2:576 < < 2:7 C 2:576 D 0:99 2 2 P f1:164 < < 6:564g D 0:99 case .c/ D 0:999; D 0:001; 3 P 2:7 3:291 < < 2:7 C 2
c1˛=2 D 3:291 3 3:291 D 0:999 2
P f2:236 < < 7:636g D 0:999 : With probability 95% the “true” mean is an element of the interval ]0.24, C5.64[. In contrast, with probability 99% the “true” mean is an element of the larger interval ]1.164, C6.564[. Finally, with probability 99.9% the “true” mean is an element of the largest interval [2.236, C7.636].
692
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
B-5 Sampling from the Gauss–Laplace Normal Distribution: A Third Confidence Interval for the Mean, Variance Unknown In order to derive the sampling distributions for the sample mean, variance unknown, of Gauss-Laplace i.i.d. observation B51 introduces two examples (two and three observations, respectively, for generating Student’s tp distribution. Lemma B.13 reviews Student’s t-distribution of the random variable n0 .b /=b where the sample mean b is BLUUE of , whereas the sample variance b 2 is BIQUUE of 2 . B52 by means of Lemma B.13 introduces the confidence interval for the “true” mean variance 2 unknown, which is based on Student’s probability distribution. For easy computation, Table B.12 is its flow chart. B53 discusses The Uncertainly Principle generated by The Magic Triangle of (a) length of confidence interval, (b) coefficient of negative confidence, also called the uncertainty number, and (c) the number of observations. Various figures and examples pave the way for the routine analyst’s use of the confidence interval for the mean, variance unknown.
B-51 Student’s Sampling Distribution of the Random Variable b /= b . Two examples for n D 2 or n D 3 Gauss-Laplacep i.i.d. observations keep us to derive Student’s t-distribution for the random variable n.b /=b where b is BLUUE of , whereas p D n1 is BIQUUE of 2. Lemma B.12 and its proof is the highlight of this paragraph in generating the sampling probability distribution of Student’s t. Example B.15 (Student’s t-distribution for two Gauss-Laplace i.i.d. observations): First, assume an experiment of two Gauss-Laplace i.i.d. observations called y1 and y2 : We want to prove that .y1 C y2 /=2 and .y1 y2 /2=2 or the sample mean b and the sample variance b 2 are stochastically independent. y1 and y2 are elements of the joint pdf. f .y1 ; y2 / D f .y1 /f .y2 / 1 1 2 2 exp 2 Œ.y1 / C .y2 / f .y1 ; y2 / D 2 2 2 1 1 0 f .y1 ; y2 / D exp .y 1/ .y 1/ : 2 2 2 2 2, The quadratic form .y 1/0 .y 1/ is decomposed into the sample variance b 2 BIQUUE of , and the deviate of the sample mean b , BLUUE of , from by means of the fundamental separation
B-5 Sampling from the Gauss–Laplace Normal Distribution: A Third Confidence
693
y 1 D y 1b C 1.b / /0 .y 1b / C 10 1.b /2 .y 1/0 .y 1/ D .y 1b .y 1/0 .y 1/ D b 2 C 2.b /2 f .y1 ; y2 / D
2 1 1 2 2 exp : exp b .b / 2 2 2 2 2 2
The joint pdf f .y1 ; y2 / is transformed into a special form if we replace b D .y1 C y2 /=2; /2 C .y2 b /2 D b 2 D .y1 b
1 .y1 y2 /2 ; 2
namely 1 y1 C y2 1 1 1 2 2 / : f .y1 ; y2 / D exp 2 .y1 y2 / exp 2 . 2 2 2 2 2 Obviously the product decomposition of the joint pdf documented that (y1+y2)/2 and (y1-y2)2/2 or b and b 2 are independent random variables. p Second, we intend to derive the pdf of Student’s random variable t WD 2.b /=b , the deviate of the sample mean b form the “true” mean , normalized by the sample standard deviations b . Let us introduce the direct Helmert transformation 1 y1 y2 b z1 D p D ; 2
p 2b 2 y1 C y2 p ; z2 D D 2 2
or
z1 z2
1 D
"
p1 2 p1 2
p1 p1 2
2
#
y1 ; y2
as well as the inverse
" 1 p y1 2 D y2 p1
p1 2 p1 2 2
#
z1 ; z2
which brings the joi ntpdf dF D f .y1 ; y2 /dy1 dy2 D f .z1 ; z2 /d z1 d z2 into the canonical form.
694
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
1 2 1 2 1 1 dF D p exp z1 p exp z2 d z1 d z2 2 2 2
2
ˇ ˇ 1 ˇ p p1 ˇ 1 ˇ 2 2ˇ dy1 dy2 D ˇ 1 1 ˇ d z1 d z2 D d z1 d z2 : ˇp p ˇ 2 2
The Helmert random variable x WD z21 or z1 D such that
2
p x replaces the random variable z1
1 1 p dxd z2 2 x 1 1 1 1 1 2 dF D p p exp x p exp z2 dxd z2 2 2 2 2 x 2
d z1 d z2 D
is the joint pdf of x and z2 . Finally, we introduce Student’s random variable p b 2 b p 2 .b / ) b D p z2 z2 D 2 p p b D z1 D x z1 D x D ) b
t WD
p z2 t D p , z2 D x t ) z22 D x t 2 : x Let us transform dF D f .x; z2/dxd z2 to dF D f .t; x/dtdx, namely from the joint pdf of the Helmert random variable x and the Gauss-Laplace normal variate z2 to the joint pdf of the Student random variable t and the Helmert random variable x. ˇ ˇ ˇD z D z ˇ ˇ t 2 x 2ˇ d z2 dx D ˇ ˇ dt dx ˇ Dt x Dx x ˇ ˇp p ˇ ˇ ˇ 2 ˇ x 2 x 3=2 t ˇ d z2 dx D ˇ ˇ dt dx ˇ 0 ˇ 1 p d z2 dx D x dt dx dF D f .t; x/dtdx 1 1 1 exp .1 C t 2 /x : f .t; x/ D 2 2 2
B-5 Sampling from the Gauss–Laplace Normal Distribution: A Third Confidence
695
The marginal distribution of Student’s random variable t, namely dF3 D f3 .t/dt, is generated by 1 1 f3 .t/ WD 2 2
Z1 0
1 2 exp .1 C t /x dx 2
subject to the standard integral Z1 0
1 1 1 exp.ˇx/dx D exp .ˇx/ D ˇ ˇ 0 ˇ WD
1 .1 C t 2 /; 2
2 1 D ˇ 1 C t2
such that f3 .t/ D dF3 D
1 1 ; 2 1 C t 2 1 1 dt 2 1 C t 2
and characterized by a pdf f3 .t/ which is reciprocal to .1 C t 2 /. Example B.16. (Student’s t-distribution for three Gauss-Laplace i.i.d. observations): First, assume an experiment of three Gauss-Laplace i.i.d. observations called y1 ; y2 and y3 : We want to derive the joint pdf f .y1 ; y2 ; y3 / in terms of the sample mean b , BLUUE of , and the sample variance b 2 , BIQUUE of 2 . f .y1 ; y2 ; y3 / D f .y1 /f .y2 /f .y3 / 1 1 2 2 2 exp Œ.y / C .y / C .y / f .y1 ; y2 ; y3 / D 1 2 3 .2 /3=2 3 2 2 1 1 0 f .y1 ; y2 ; y3 / D exp .y 1/ .y 1/ : .2 /3=2 3 2 2 2 The quadratic form .y 1/0 .y 1/ is decomposed into the sample variance b and the deviate sample mean b from the “true” mean by means of the fundamental separation
696
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
y 1 D y 1b C 1.b / /0 .y 1b / C 10 1.b /2 .y 1/0 .y 1/ D .y 1b .y 1/0 .y 1/ D 2b 2 C 3.b /2 dF D f .y1 ; y2 ; y3 /dy1 dy2 dy3 3 1 1 2 2 exp dy1 dy2 dy3 : D exp 2b .b / .2 /3=2 3 2 2 2 2 p Second, we intend to derive the pdf of Student’s random variable t WD 3.b /=b , the deviate of the sample mean b from the “true” mean , normalized by the sample standard deviation b . Let us introduce the direct Helmert transformation 1 1 p .y1 C y2 / D p .y1 C y2 2/ 2 2 1 1 z2 D p .y1 C y2 2y3 C 2/ D p .y1 C y2 2y3 / 23 23 1 1 z3 D p .y1 C y2 C y3 / D p .y1 C y2 C y3 3/ 3 3 z1 D
or 2 1 3 p z1 6 112 1 4 z2 5 D 6 p 4 23 z3 p1 2
3
p1 12 p1 23 p1 3
3
2 3 y1 7 4 y2 5 p2 7 23 5 y3 p1 0
3
as well as its inverse 2 1 3 p p1 23 y1 6 112 4 y2 5 D 6 p p1 4 12 23 y3 0 p2 2
p1 3 p1 3 p1 23 3
3
2 3 7 z1 7 4 z2 5 5 z3
in general z D 1 H.y /
versus .y 1/ D H0 z;
which help us to bring the joint pdf dF .y1 ; y2 ; y3 /dy1 dy2 dy3 Df .z1 ; z2 ; z3 /d z1 d z2 d z3 into the canonical form.
B-5 Sampling from the Gauss–Laplace Normal Distribution: A Third Confidence
697
1 1 2 1 2 2 dF D exp .z1 C z2 / exp z3 d z1 d z2 d z3 .2 /3=2 2 2 ˇ 1 1 ˇp ˇ 12 p23 ˇ 1 dy1 dy2 dy3 D ˇˇ p112 p123 3 ˇ ˇ 0 p2 23
p1 3 p1 3 p1 3
ˇ ˇ ˇ ˇ ˇ d z1 d z2 d z3 D d z1 d z2 d z3 : ˇ ˇ ˇ
q p The Helmert random variable x WD z21 C z22 or x D z21 C z22 replaces the random q variable z21 C z22 as soon as we introduce polar coordinates z1 D r cos 1 , z2 D r sin 1 , z21 C z22 D r 2 DW x and compute the marginal pdf
Z2
1 2 1 1 1 1 exp x dx d1 ; dF .z3 ; x/ D p exp z3 d z3 2 2 2 2 2
0
by means of x WD z21 C z22 D r 2 ) dx D 2rdr;
dr D
dx ; 2r
1 d z1 d z2 D rdrd1 D dxd1 ; 2 1 1 1 1 2 dF .z3 ; x/ D exp x p exp z3 dxd z3 ; 2 2 2 2
the joint pdf of x and z. Finally, we inject Student’s random variable t WD
p b ; 3 b
decomposed into p 3 z3 D .b / ) b D p z3 3 q q p p b p D p z21 C z22 D x D 2 ) b z21 C z22 D p x 2 2 z3 p 1 p 1 tDp 2 , z3 D p x t ) z23 D x t 2 : 2 x 2
698
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Let us transform dF D f .x; z3 /dxd z3 to dF D f .t; x/dtdx. Alternatively we may say that we transform the joint pdf of the Helmert random variable x and the GaussLaplace normal variate z3 to the joint pdf of the Student random variable t and the Helmert random variable x. ˇ ˇ ˇD z D z ˇ ˇ t 3 x 3ˇ d z3 dx D ˇ ˇ dtdx ˇ Dt x Dx x ˇ ˇp ˇ ˇ px p ˇ 1 ˇ 2 2 2 x 3=2 t ˇ d z3 dx D ˇ ˇ dtdx ˇ 0 ˇ 1 p x d z3 dx D p dtdx 2 dF D f .t; x/dtdx 1 p t2 1 1 f .t; x/ D p p x exp .1 C /x : 2 2 2 2 2
The marginal distribution of Student’s random variable t, namely dF3 D f3 .t/dt, is generated by 1 1 f3 .t/ WD p p 2 2 2
Z1 0
p t2 1 1C x dx x exp 2 2
subject to the standard integral Z1
x ˛ exp.ˇx/dx D
0
.˛ C 1/ ; ˇ ˛C1
˛D
1 ; 2
ˇD
1 t2 1C 2 2
such that Z1 0
p . 32 / t2 1 1C x dx D 23=2 x exp 2 2 2 .1 C t2 /3=2 1p 3 D
2 2 3 1 p1 f3 .t/ D ; 2 .1C t 2 /3=2 2 2 p 2 dF3 D dt: 2 4.1 C t2 /3=2
Again Student’s t-distribution is reciprocal t.1 C t2=2/3=2.
B-5 Sampling from the Gauss–Laplace Normal Distribution: A Third Confidence
699
Lemma B.13. Student’s t-distribution for the derivate of the mean .b /=b (Gosset, 1908)): Let the random vector of observations y D Œy1 ; : : : ; yn 0 be Gauss-Laplace i.i.d. Student’s random variable t WD
b p n; b
where the sample mean b is BLUUE of and the sample variance b 2 is BIQUUE 2 of ; associated to the pdf f .t/ D
. pC1 1 1 2 / : p . p2 / p .1 C t 2 /.pC1/=2 p
p D n 1 is the “degree of freedom” of Student’s distribution fp .t/. Gosset p (1908) published the t-distribution of the ratio n.b /=b under the pseudonym “Student”: The probable error of a mean. Proof.
p /= and The joint probability distribution of the random variable zn WD n.b the Helmert random variable x WD z21 C C z2n1 D .n 1/b 2 = 2 represented by dF D f1 .zn /f2 .x/d zn dx and .n 1/b 2 are stochastically due to the effect that zn and z21 C C z2n1 or b independent. Let us take reference to the specific pdfs with n and n 1 D p degrees of freedom dF1 D f1 .zn /d zn
and dF2 D f2 .x/dx; p=2 1 1 1 1 1 .p2/=2 x ; f1 .zn / D p exp z2n and f2 .x/ D x exp 2 . p2 / 2 2 2
or dF1 D f1 .b /db and dF2 D f2 .b 2 /db 2; p n .b /2 n f1 .b / D p exp 2 2 2
f2 .b 2/ D and
1 p p=2b p2 p 2p=2 .p=2/ 1 p 2 exp 2 b 2
we derived earlier. Here, let us introduce Student’s random variable
700
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
zn p b p t WD p n1 D n b x
p p x x or zn D p t D p t: p n1
By means of the Jacobi matrix J and its absolute Jacobi determinant jJj
dt dx
Dzn t Dx t D Dzn x Dx x
p 3 12 x 3=2 zn p 5; JD4 0 1 2p
p p x
d zn dx
d zn DJ dx
p p jJj D p ; x
jJj
1
p x Dp p
we transform the surface element d zn dx to the surface element jJj1 dtdx, namely p x d zn dx D p dtdx: p Lemma B.14. (Gamma Function): Our final action item is to calculate the marginal probability distributions dF3 D f3 .t/dt of Student’s random variable t, namely dF3 D f3 .t/dt 1 1 f3 .t/ WD p 2p .p=2/
p=2 Z1 1 t2 1 .p1/=2 1C x dx: x exp 2 2 p 0
Consult (Gr¨obner and Hofreiter, 1973, p. 55), for the standard integral Z1 0
Z1 0
x ˛ exp.ˇx/dx D
˛Š .˛ C 1/ D ˇ ˛C1 ˇ ˛C1
and
. pC1 t2 1 2 / 1C x dx D 2.pC1/=2 x .p1/=2 exp ; 2 t 2 p .1 C /.pC1/=2 p
where p D n 1 is the rank of the quadratic Helmert matrix Hn . Notice a result p1 of the gamma function . pC1 2 / D . 2 /Š. In summary, substituting the standard integral
B-5 Sampling from the Gauss–Laplace Normal Distribution: A Third Confidence
701
dF3 D f3 .t/dt f3 .t/ D
. pC1 / 1 1 2 p p 2 t .2/ p .1 C /.pC1/=2 p
resulted in Student’s t distribution, namely the pdf of Student’s random variable .b /=b .
B-52 The Confidence Interval for the Mean, Variance Unknown Lemma B.12 is the basis for the construction of the confidence interval of the “true” mean, variance unknown, which we summarize in Lemma B.13, Example B.17 contains all details for computing such a confidence interval, namely Table B.11, a collection of the most popular values of the coefficient of confidence, as well as Tables B.12, B.13 and B.14, listing the quantiles for the confidence interval of the Student random variable with p D n 1 degrees of freedom. Figures B.9 and B.10 illustrate the probability of two-sided confidence interval for the mean, variance unknown, and the limits of the confidence interval. Table B.15 as a flow chart paves the way for the “fast computation” of the confidence interval for the “true” mean, variance unknown. Lemma B.15. (confidence interval unknown):
for the sample mean,
variance
p /=b characterized by the ratio of the deviate The random variable t D n.b of the sample mean b D n1 10 y, BLUUE of , from the “true” mean and the standard deviation b , b 2 D .y 1b /0 .y 1b /=.n 1/ BIQUUE of the “true” 2 variance , has the Student t-distribution with p D n 1 “degrees of freedom”. The “true” mean is an element of the two-sided confidence interval b 2b p c1˛=2 ; n
b b C p c1˛=2 Œ n
with confidence b b C p c1˛=2 D 1 ˛ P b p c1˛=2 < < b n n
of level 1˛. For three values of the coefficient of confidence D 1˛, Table B.12 is a list of associated quantiles c1˛=2 .
702
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Example B.17. (confidence interval for the example mean b , 2 unknown): Suppose that a random sample 2 3 2 3 y1 1:2 6 y2 7 6 3:4 7 6 7D6 7 4 y3 5 4 0:6 5 ; y4
b 2 D 5:2;
b D 2:7;
b D 2:3
5:6
of four observations is characterized byp the sample mean b D 2:7 and the sample variance b 2 D 5:2. 2.2:7 /=2:3 D n.b /=b D t has Student’s pdf with p D n1 D 3 degrees of freedom. The probability D 1˛ that t will be between any two arbitrarily chosen numbers c1 D c and c2 D Cc is Zc2 Pfc1 < t < c2 g D
f .t/dt D D 1 ˛ c1
ZCc Pfc < t < cg D f .t/dt D D 1 ˛ c
Zc Pfc < t < cg D
Zc f .t/dt
1
Zc f .t/dt D 1 1
f .t/dt D D 1 ˛
1
˛ ; 2
Zc f .t/dt D 1
˛ 2
is the coefficient of confidence, ˛ the coefficient of negative confidence, also called complementary coefficient of confidence. The four representations of the probability D 1 ˛ to include t in the confidence interval c < t < Cc have led to the linear Volterra integral equation of the first kind Zc f .t/dt D 1 1
1 ˛ D .1 C /: 2 2
Three values of the coefficient of confidence or its complement ˛ are popular and listed in Table B.10. Table B.11 (Values of the coefficient of confidence): ˛ 1
˛ 2
˛ 2
D
1C 2
0:950 0:050 0:025 0:975
0:990 0:010 0005 0:995
0:999 0:001 0:000; 5 0:999; 5
B-5 Sampling from the Gauss–Laplace Normal Distribution: A Third Confidence
703
In solving the linear Volterra integral equation of the first kind Zt 1
1 1 f .t /dt D 1 ˛.t/ D Œ1 C .t/; 2 2
which depends on the pdegrees of freedom p D n 1, Tables B.12–B.14 collect the quantiles c1˛=2 = n given the coefficients of confidence or their complements which we listed in Table B.11. p Table B.12 (Quantiles c1˛=2 = n for the confidence interval of the Student random variable with p D n 1 degrees of freedom): 1 ˛=2 D .1 C /=2 D 0:975; p
n
1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10
p1 c1˛=2 n
8:99 2:48 1:59 1:24 1:05 0:925 0:836 0:769 0:715
p
D 0:95; n
14 15 19 20 24 25 29 30 39 40 49 50 99 100 199 200 499 500
˛ D 0:05
p1 c1˛=2 n
0:554 0:468 0:413 0:373 0:320 0:284 0:198 0:139 0:088
p Table B.13 (Quantiles c1˛=2 = n for the confidence interval of the Student random variable with p D n 1 degrees of freedom): 1 ˛=2 D .1 C /=2 D 0:995; p
n
1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10
p1 c1˛=2 n
45:01 5:73 2:92 2:06 1:65 1:40 1:24 1:12 1:03
p
D 0:990; n
14 15 19 20 24 25 29 30 39 40 49 50 99 100 199 200 499 500
˛ D 0:010
p1 c1˛=2 n
0:769 0:640 0:559 0:503 0:428 0:379 0:263 0:184 0:116
704
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
p Table B.14 (Quantiles c1˛=2 = n for the confidence interval of the Student random variable with p D n 1 degrees of freedom): 1 ˛=2 D .1 C /=2 D 0:999:5; D 0:999; ˛ D 0:001 p p p n c1˛=2 c1˛=2 = n p n c1˛=2 c1˛=2 = n 1 2 636:619 450:158 14 15 4:140 1:069 2 3 31:598 18:243 19 20 3:883 0:868 3 4 12:941 6:470 24 25 3:725 0:745 4 5 8:610 3:851 29 30 3:659 0:668 5 6 6:859 2:800 30 41 3:646 0:655 6 7 5:959 2:252 40 41 3:551 0:555 7 8 5:405 1:911 60 61 3:460 0:443 8 9 5:041 1:680 120 121 3:373 0:307 9 10 4:781 1:512 1 1 3:291 0 p Since Student’s pdf depends on p D n 1, we have tabulated c1˛=2 = n for the p confidence interval we are going to construct. The Student’s random variable t D n.b /=b is solved for , evidently our motivation to introduce the confidence interval b 1 < < b 2 tD
p b b b )b D p t ) Db n p t b n n
b b p c1˛=2 < < b C p c1˛=2 DW b 2 : b 1 WD b n n The interval b 1 < < b 2 for the fixed value t D c1˛=2 contains the “true” mean with probability D 1 ˛.
b b P b p c1˛=2 < < b C p c1˛=2 n n ZCc D f .t/dt D D 1 ˛ c
because of cZ 1˛=2
Zc f .t/dt D 1
f .t/dt D 1 1
˛ : 2
B-5 Sampling from the Gauss–Laplace Normal Distribution: A Third Confidence Fig. B.9 Two-sided confidence interval 1 ; b 2 Œ, Student’s pdf 2b for p D 3 degrees of freedom 1 Dp .n D 4/ b b b c1˛=2 = n, p b 2 D b Cb c1˛=2 = n
f(t)
g = 1–a
a/2
t
−c1−a / 2 mˆ – sˆ c1−a /2 / n
705
a/2
+c1−a / 2
mˆ + sˆ c1−a /2 / n
m
mˆ 1
mˆ 2
Fig. B.10 Two-sided confidence interval for the “true” mean , quantile c1˛=2
Figures B.9 and B.10 illustrate the coefficient of confidence and the probability function of a confidence interval. Let us specify all the integrals to our example. cZ 1˛=2
f .t/dt D 1 ˛=2 1 cZ0:975
cZ 0:999;5
cZ0:995
f .t/dt D 0:975; 1
f .t/dt D 0:995; 1
f .t/dt D0:999; 5: 1
These data substituted into Tables B.12–B.14 lead to the triplet of confidence intervals for the “true” mean. case .a/ W D 0:95; p D 3;
˛ D 0:05;
n D 4;
1 ˛=2 D 0:975 p c1˛=2 = n D 1:59
Pf2:7 2:3 1:59 < < 2:7 C 2:3 1:59g D 0:95 Pf0:957 < < 6:357g D 0:95: case .b/ W D 0:99; p D 3;
˛ D 0:01;
n D 4;
1 ˛=2 D 0:995 p c1˛=2 = n D 2:92
706
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Pf2:7 2:3 2:92 < < 2:7 C 2:3 2:92g D 0:99 Pf4:016 < < 9:416g D 0:99: case .c/ W D 0:999;
˛ D 0:001;
1 ˛=2 D 0:999; 5
p p D 3; n D 4; c1˛=2 = n D 6:470 Pf2:7 2:3 6:470 < < 2:7 C 2:3 6:470g D 0:999 Pf12:181 < < 17:581g D 0:999: The results may be summarized as follows: With probability 95% the “true” mean is an element of the interval 0:957, C6:357Œ. In contrast, with probability 99% the “true” mean is an element of the larger interval 4:016, C9:416Œ. Finally, with probability 99.9is an element of the largest interval 12:181, C17:581Œ. If we compare the confidence intervals for the mean , 2 known, versus 2 unknown, we realize much larger intervals for the second model. Such a result is not too much a surprise, since the model “ 2 unknown” is much weaker than the model “ 2 known”. Table B.15 (Flow chart: Confidence interval for the mean , 2 unknown): g
Choose a coefficient of confidence g according toTable B11.
Solve the linear Volterra integral equation of the first kind F(c1−a / 2) = 1−a / 2 = (1 + g ) / 2 by Table B12, Table B13 or Table B14 for a Student pdf with p = n−1 degrees of freedom: read c1−α / 2 / n.
ˆ Compute the sample mean mˆ and the sample standard deviation s. ˆ 1−a / 2 / n and m−sc ˆ ˆ 1−a / 2 / n, mˆ + sc ˆ 1−a / 2 / n. Compute sc
B-5 Sampling from the Gauss–Laplace Normal Distribution: A Third Confidence
707
B-53 The Uncertainty Principle Figure B.11 is the graph of the function p .nI ˛/ WD 2b c1˛=2 = n; where is the length of the confidence interval of the “true” mean ; 2 unknown. The independent variable of the function is the number of observations n. The function .nI ˛/ is plotted for fixed values of the coefficient of complementary confidence ˛, namely ˛ D 10%; 5%; 1%: For reasons given later the coefficient of complementary confidence ˛ is called uncertainty number. The graph function .nI ˛/ illustrates two important facts. Fact #1: For a constant uncertainty number ˛, the length of the confidence interval is smaller the larger the number of observations n is chosen. Fact #2: For a constant number of observations n, the smaller the number of uncertainly ˛ is chosen, the larger is the confidence interval . Evidently, the diversive influences of (a) the length of the confidence interval ; (b) the uncertainly number ˛ and (c) the number of observations n, which we collected in The Magic Triangle of Figure B.12, constitute The Uncertainly Principle, in formula .n 1/ W k˛ ;
Fig. B.11 Length of the confidence interval for the mean against the number of observations
708
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions length of the confidence interval
number of observations n
uncertainty number a
Fig. B.12 The Magic Triangle, constituents: (a) uncertainty number ˛ (b), number of observations n, (c) length of the confidence interval
where k˛ is called the quantum number for the mean which depends on the uncertainty number ˛. Table B.15.2 is a list of those quantum numbers. Let us interpret the uncertainty relation .n 1/ k˛ . The product .n 1/ defines geometrically a hyperbola which we approximated out of the graph of Fig. B.12. Given the uncertainty number ˛, the product .n 1/ has a smallest number, here denoted by k˛ . For instance, choose ˛ D 1% such that k˛ =.n 1/ or 16:4=.n1/ . For n taken values 2, 11, 101, we get the inequalities 8:2 , 1:64 , 0:164 . Table B.15.2 (Coefficient of complementary confidence ˛, uncertainty number ˛, versus quantum number of the mean k˛ (Grafarend, 1970)): ˛ k˛ 10% 6:6 5% 9:6 1% 16:4 !0 !1
B-6 Sampling from the Gauss–Laplace Normal Distribution: A Fourth Confidence Interval for the Variance Theorem B.10 already supplied us with the sampling distribution of the sample variance, namely with the probability density function f .x/ of Helmert’s random variable x D .n 1/b 2 = 2 of Gauss-Laplace i.i.d. observations n of sample vari2 ance b , BIQUUE of 2 . B61 introduces accordingly the so far missing confidence interval for the “true” variance 2 . Lemma B.15 contains the details, followed by Example B.16. Tables B.17–B.19 contain the properly chosen coefficients of complementary confidence and their quantiles c1 .pI ˛=2/; c2 .pI 1 ˛=2/;
B-6 Sampling from the Gauss–Laplace Normal Distribution: A Fourth
709
dependent on the “degrees of freedom” p D n 1. Table B.20 as a flow chart summarizes various steps in computing a confidence interval for the variance 2 . B62 reviews The Uncertainty Principle which is built on (a) the coefficient of complementary confidence ˛, also called uncertainty number, (b) the number of observations n and (c) the length 2 .nI ˛/ of the confidence interval for the “true” variance 2 .
B-61 The Confidence Interval for the Variance Lemma B.15 summaries the construction of a two-sided confidence interval for the “true” variance based upon Helmert’s Chi Square distribution of the random variable .n1/b 2 = 2 where b 2 is BIQUUE of 2 . Example B.18 introduces a random sample of size n D 100 with an empirical variance b 2 D 20:6. Based upon coefficients of confidence and complementary confidence given in Table B.14, related confidence intervals are computed. The associated quantiles for Helmert’s Chi Square distribution are tabulated in Table B.15 . D 0:95/, Table B.18 . D 0:99/ and Table B.19 . D 0:998/: Finally, Table B.20 as a flow chart pares the way for the “fast computation” of the confidence interval for the “true” variance 2 . Lemma B.16. (confidence interval for the variance): The random variable x D .n 1/b 2 = 2 , also called Helmert’s 2 , characterized by the ration of the sample variance b 2 BIQUUE of 2 , and the “true” variance 2 2 , has the ' pdf of p D n 1 degrees of freedom, if the random observations yi ; i 2 f1; : : : ng are Gauss-Laplace i.i.d. The “true” variance 2 is an element of the two-sided confidence interval 2 2
.n 1/b 2 .n 1/b 2 ; Œ c2 .pI 1 ˛=2/ c1 .pI ˛=2/
with confidence P
.n 1/b 2 .n 1/b 2 < 2 < c2 .pI 1 ˛=2/ c1 .pI ˛=2/
D 1˛ D
of level 1 ˛. Tables D.8, D.18 and D.16 list the quantiles c1 .pI 1 ˛=2/ and c2 .pI ˛=2/ associated to three values of Table B.19 of the coefficient of complementary confidence 1 ˛. In order to make yourself more familiar with Helmert’s Chi Square distribution we recommend to solve the problems of Exercise B.1.
710
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Exercise B.1. (Helmert’s Chi Square 2' distribution): 2 = 2 has the non-symmetric Helmert’s random variable x WD .n 1/b 2 = 2 D pb 2
' pdf. Prove that the first four central moments are, (a) (b) (c) (d)
1
2
3
4
D 0, Efxg D D p D x2 D 2p p D .2p/3=2 .8=p/1=4 (coefficient of skewness 32 = 22 D 8=p/ D 6 4 .1 C 2p/ (coefficient of curtosis 4 = 22 3 D 3 C 12=p/.
Guide .a/
.b/
3 p 2 Efb g 5 ) Efxg D p 2 2 Efb g D 2 .unbiasedness/ 3 2 2 4 D 4 Dfb 2g D n1 p 7 7 ) Dfxg D 2p: 5 2 p 2 Dfxg D 4 Dfb g Efxg D
Example B.18. (confidence interval for the sample variance b 2 ): Suppose that a random sample .y1 ; : : : yn / 2 Y of size n D 100 has led to an empirical variance b 2 D 20:6. xD .n1/b 2 = 2 or 99 20:6= 2 D 2039:4= 2 has Helmert’s pdf with p D n 1 D 99 degrees of freedom. The probability D 1 ˛ that x will be between c1 .pI ˛=2/ and c2 .pI 1 ˛=2/ is Zc2 Pfc1 .pI ˛=2/ < x < c2 .pI 1 ˛=2/g D
f .x/dx D D 1 ˛ c1
Zc2 Pfc1 .pI ˛=2/ < x < c2 .pI 1 ˛=2/g D
Zc1 f .x/dx
0
f .x/dx D D 1 ˛ 0
Zc2 f .x/dx D F.c2 / D 1 ˛=2 D .1 C /=2; 0
Zc1 f .x/dx D F.c1 / D ˛=2 D .1 /=2 0
Pfc1 .pI ˛=2/ < x < c2 .pI 1 ˛=2/g D F.c2 / F.c1 / D .1 C /=2 .1 /=2 D c1 .pI ˛=2/ < x < c2 .pI 1 ˛=2/ , c1 .pI ˛=2/ < .n 1/
b 2 < c2 .pI 1 ˛=2/ 2
B-6 Sampling from the Gauss–Laplace Normal Distribution: A Fourth
711
or 2 1 .n 1/b 2 .n 1/b 2 1 2 < < , < < c2 .n 1/b 2 c1 c2 c1 2 2 .n 1/b .n 1/b D 1 ˛ D : P < 2 < c2 .pI 1 ˛=2/ c1 .pI ˛=2/ Since Helmert’s pdf f .xI p/ is now-symmetric there arises the question how to distribute the confidence or the complementary confidence ˛ D 1 on the confidence interval limits c1 and c2 , respectively. If we setup F.c1 / D ˛=2, we define a cumulative probability half of the complementary confidence. If F.c2 / F.c1 / D Pfc1 .pI ˛=2/ < x < c2 .pI 1 ˛=2/g D 1 ˛ D is the cumulative probability contained in the interval c1 < x < c2 , we derive F.c2 / D 1 ˛=2. Accordingly c1 .pI ˛=2/ < x < c2 .pI 1 ˛=2/ is the confidence interval based upon the quantile c1 with cumulative probability ˛=2 and the quantile c2 with cumulative probability 1 ˛=2. The four representations of the cumulative probability of the confidence inter-val c1 < x < c2 establish two linear Volterra integral equations of the first kind Zc1
Zc2 f .x/dx D ˛=2 and
0
f .x/dx D 1 ˛=2; 0
dependent on the degree of freedom p D n 1 of Helmert’s pdf f .x; p/. As soon as we have established the confidence interval c1 .pI ˛=2/ < x < c2 .pI 1 ˛=2/ for Helmert’s random variable x D .n1/b 2 = 2 D pb 2 = 2 , we are left with the problem of how to generate a confidence interval for the “true” variance 2 , the sample variance b 2 given. If we take the reciprocal interval c21 < x 1 < c11 for Helmert’s inverse random variable 1=x D 2 =Œ.n 1/b 2 , we are able to multiply 2 both sides by .n 1/b . In summary, a confidence interval which corresponds to c1 < x < c2 is .n 1/b 2 =c2 < 2 < .n 1/b 2 =c1 . Three values of the coefficient of confidence or of complementary confidence ˛ D 1 which are most popular we list in Table B.16. Table B.16 (Values of the coefficient of confidence ,c1 .pI ˛=2/ and c2 .pI 1 ˛=2/): ˛ ˛=2 ˛=2 1 ˛=2
0:950 0:050 0:025 0:975
0:990 0:010 0:005 0:995
0:998 0:002 0:001 0:999
712
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
In solving the linear Volterra integral equations of the first kind Zc1
Zc2 f .x/dx D F.c1 / D ˛=2 and
0
f .x/dx DF.c2 / D 1 ˛=2; 0
which depend on the degrees of freedom p D n 1, Tables B.17–B.19 collect the quantiles c1 .pI ˛=2/ and c2 .pI 1 ˛=2/ for given values p and ˛. Table B.17 (Quantiles c1 .pI ˛=2/, c2 .pI 1 ˛=2/ for the confidence interval of the Helmert random variable with p D n 1 degrees of freedom): ˛=2 D 0:025; p 1 2 3 4 5 6 7 8 9
1 ˛=2 D 0:975;
n c1 .pI ˛=2/ c2 .pI 1 ˛=2/ 2 0:000 5:02 3 0:506 7:38 4 0:216 9:35 5 0:484 11:1 6 0:831 12:8 7 1:24 14:4 8 1:69 16:0 9 2:18 17:5 10 2:70 19:0
˛ D 0:05;
D 0:95
p n c1 .pI ˛=2/ c2 .pI 1 ˛=2/ 14 15 5:63 26:1 19 20 8:91 32:9 24 25 12:4 39:4 29 30 16:0 45:7 39 40 23:7 58:1 49 50 31:6 70:2 99 100 73:4 128
Table B.18 (Quantiles c1 .pI ˛=2/, c2 .pI 1 ˛=2/ for the confidence interval of the Helmert random variable with p D n 1 degrees of freedom): ˛=2 D 0:005; p 1 2 3 4
n 2 3 4 5
5 6 7
6 7 8
1 ˛=2 D 0:995;
˛ D 0:01;
D 0:99
c1 .pI ˛=2/ c2 .pI 1 ˛=2/ p n c1 .pI ˛=2/ c2 .pI 1 ˛=2/ 0:000 7:88 8 9 1:34 22:0 0:010 9 10 1:73 23:6 0:072 0:207 14 15 4:07 31:3 19 20 6:84 38:6 0:412 24 25 9:89 45:6 0:676 29 30 13:1 52:3 0:989 39 40 20 65:5 49 50 27:2 78:2 99 100 66:5 139
B-6 Sampling from the Gauss–Laplace Normal Distribution: A Fourth
713
0.5
f(x)
0
c1(p;a / 2)
c2 (p;1−a / 2)
Fig. B.13 Two-sided confidence interval 2 2p 2 =c2 ; p 2 =c1 Œ Helmert’s pdf, f(x)
Table B.19 (Quantiles c1 .pI ˛=2/, c2 .pI 1 ˛=2/ for the confidence interval of the Helmert random variable with p D n 1 degrees of freedom): ˛=2 D 0:001; p 1 2 3 4 5 6 7 8
1 ˛=2 D 0:999;
n c1 .pI ˛=2/ c2 .pI 1 ˛=2/ 2 0:00 10:83 3 0:00 13:82 4 0:02 16:27 5 0:09 18:47 6 0:21 20:52 7 0:38 22:46 8 0:60 24:32 9 0:86 26:13
˛ D 0:002;
D 0:998
p n c1 .pI ˛=2/ c2 .pI 1 ˛=2/ 9 10 1:15 27:88 14 15 3:04 36:12 19 20 5:41 43:82 24 25 8:1 51:2 29 30 11:0 58:3 50 51 24:7 86:7 70 71 39:0 112:3 99 100 60:3 147:4 100 101 61:9 149:4
Those data collected in Tables B.17–B.19 lead to the triplet of confidence intervals for the “true” variance case .a/ W D 0:95;
˛ D 0:05;
p D 99;
c1 .pI ˛=2/ D 73:4;
n D 100;
˛=2 D 0:025;
1 ˛=2 D 0:975
c2 .pI 1 ˛=2/ D 128
99 20:6 99 20:6 < 2 < P 128 73:4
D 0:95
Pf15:9 < 2 < 27:8g D 0:95:
714
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Fig. B.14 Two-sided confidence interval for the “true” variance 2 , quantiles c1 .pI ˛=2/ and c2 .pI 1 ˛=2/
case .b/ W D 0:99;
˛ D 0:01;
p D 99;
c1 .pI ˛=2/ D 66:5;
n D 100; P
˛=2 D 0:005;
1 ˛=2 D 0:995
c2 .pI 1 ˛=2/ D 139
99 20:6 99 20:6 < 2 < 139 66:5
D 0:99
Pf14:7 < 2 < 30:7g D 0:99: case .c/ W D 0:998; p D 99;
n D 100; P
˛ D 0:002;
˛=2 D 0:001;
c1 .pI ˛=2/ D 60:3;
1 ˛=2 D 0:999
c2 .pI 1 ˛=2/ D 147:3
99 20:6 99 20:6 < 2 < 147:3 60:3
D 0:996
Pf13:8 < 2 < 33:8g D 0:998: The results can be summarized as follows. With probability 95%, the “true” variance 2 is an element of the interval [15.9, 27.8]. In contrast, with probability 99%, the “true” variance is an element of the larger interval [14.7, 30.7]. Finally, with probability 99.8% the “true” variance is an element of the largest interval [13.8, 33.8]. If we compare the confidence intervals for the variance 2 , we realize much larger intervals for smaller complementary confidence namely 5%, 1% and 0.2%. Such a result is subject of The Uncertainty Principle.
B-6 Sampling from the Gauss–Laplace Normal Distribution: A Fourth
715
Table B.20 (Flow chart – confidence interval for the variance 2 ): Choose a coefficient of confidence g according to Table B16. Solve thelinear Volterra integral equation of the first kind F(c1(p;a /2)) – a / 2, F(c2(p;1−a / 2))−1−a / 2 by Table B17, Table B18 or Table B19 for a Helmert pdf with p = n−1 degrees of freedom.
Compute the sample variance sˆ 2and the quantiles s2 ∈ ]
(n−1)sˆ 2
(n−1)sˆ 2
c2(p;a / 2)
c1(p;a / 2)
[.
B-62 The Uncertainty Principle Figure B.15 is the graph of the function 2 .nI ˛/ WD .n 1/b 2
1 1 ; c1 .n 1I ˛=2/ c2 .n 1I 1 ˛=2/
where 2 is the length of the confidence interval of the “true” variance 2 . The independent variable of the functions is the number of observations n. The function 2 .nI ˛/ is plotted for fixed values of the coefficient of complementary confidence ˛, namely ˛ D 5%, 1%, 0.2%. For reasons given later on the coefficient of complementary confidence ˛ is called uncertainty number. The graph of the function 2 .nI ˛/ illustrates two important facts. • Fact #1: For a contrast uncertainty number ˛; the length of the confidence interval 2 is smaller, the larger number of observations n is chosen. • Fact #2: For a contrast number of observations n, the smaller the number of uncertainty ˛ is chosen, the larger is the confidence interval 2 . Evidently, the divisive influences of (a) the length of the confidence interval 2 , (b) the uncertainty number ˛ and (c) the number of observations n, which we collect in The Magic Triangle of Fig. B.16, constitute The Uncertainty Principle, formulated by the inequality 2 .n 1/ k 2 ˛ ;
716
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Fig. B.15 Length of the confidence interval for the variance against the number of observations length of the confidence interval
number of observations n
uncertainty number a
Fig. B.16 The magic triangle, constituents: (a) uncertainty number ˛ (b), number of observations n, (c) length of the confidence interval 2
where k 2 ˛ is called quantum number for the variance 2 . The quantum number depends on the uncertainty number ˛. Let us interpret the uncertainty relation 2 .n 1/ k 2 ˛ . The product 2 .n 1/ defines geometrically a hyperbola which we approximated to the graph of Fig. B.15. Given the uncertainty number ˛, the product 2 .n 1/ has a smallest number denoted by k 2 ˛ . For instance, choose ˛ D 1% such that k 2 ˛ =.n 1/ 2 or 42=.n 1/ 2 . For the number of observations n, for instance n D 2; 11; 101, we find the inequalities 42 2 ; 4:2 2 ; 0:42 2 . Table B.21 (Coefficient of complementary confidence ˛, uncertainty number ˛, versus quantum number of the mean k 2 ˛ (Grafarend, 1970): ˛ k 2 ˛ 10% 19:5 5% 25:9 1% 42:0 !0 !1 At the end, pay attention to the quantum numbers k 2 ˛ listed in Table B.21.
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution
717
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution: The Confidence Region for the Fixed Parameters in the Linear Gauss–Markov Model
Example B.19. (special linear Gauss–Markov model, marginal probability distributions): For a simple linear Gauss–Markov model Efyg D A, A 2 Rnm , rkA D m, namely n D 3, m D 2, Dfyg D I 2 of Gauss-Laplace i.i.d. observations, we are going to compute • The sample pdf of b BLUUE of • The sample pdf of b 2 BIQUUE of 2 . We follow the action within seven items. First, we identify the pdf of Gauss-Laplace BLUUE of i.i.d. observations y 2 Rn . Second, we review the estimations of b as well as b 2 BIQUUE of 2 . Third, we decompose the Euclidean norm of the observation space ky Efygk2 into the Euclidean norms ky Ab k2 and kb k2 . 0 Fourth, we present the eigenspace A A analysis and the eigenspace synthesis of the associated matrices M WD In A.A0 A/1 A0 and N WD A0 A within the Eulidean k2A0 A D .b /0 N.b /, respectively. norms ky Ak2 D y0 My and kb The eigenspace representation leads us to canonical random variables .z1 ; : : : ; znm / relating to the norm ky Ak2 D y0 My and .znmC1 ; : : : ; zn / relating to the norm kb k2N which are standard Gauss-Laplace normal. Fifth, we derive the cumulative probability of Helmert’s random variable x WD .n rkA/b 2 = 2 D .n m/b 2 = 2 b and of the unknown parameter vector BLUUE of or its canonical counterpart b BLUUE of , multivariate Gauss-Laplace normal. Action six generates the marginal pdf of b or b , both BLUUE of or , respectively. Finally, action seven leads us to Helmert’s Chi Square distribution 2p with p D n rkA D n m (here: n m D 1) “degrees of freedom”. The first action item Let us assume an experiment of three Gauss-Laplace i.i.d. observations Œy1 ; y2 ; y3 0 D y which constitute the coordinates of the observation space Y, dim The observationsyi , i 2 f1; 2; 3g are related to a parameter space with coordinates Œ1 ; 2 D in the sense to generate a straight line yfkg D 1 C 2 k. The fixed effects j ; j 2 f1; 2g, define geometrically a straight line, statistically a special linear Gauss–Markov model of one variance component, namely
718
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
the first moment 82 39 2 2 3 3 10 1 k1 < y1 = Efyg D A E 4 y2 5 D 4 1 k2 5 1 D 4 1 1 5 1 ; : ; 2 2 y3 1 k3 12
rkA D 2:
The central second moment 82 39 2 3 100 < y1 = Dfyg D In 2 Dfyg D D 4 y2 5 D 4 0 1 0 5 2 ; ; : 001 y3
2 < 0:
k represents the abscissa as a fixed random, y the ordinate as the observation, naturally a random effect. Samples of the straight line are taken at k1 D 0, k2 D 1, k3 D 2, this calling for y.k1 / D y.0/ D y1 , y.k2 / D y.1/ D y2 , y.k3 / D y.2/ D y3 , respectively. Efyg is a consistent equation. Alternatively, we may say Efyg 2 R.A/. The matrix A 2 R32 is rank deficient by p D nrkA D 1, also called “degree of freedom”. The dispersion matrix Dfyg, the central moment of second order, is represented as a linear model, too, namely by one-variance component 2 . The joint probability function of the three Gauss-Laplace i.i.d. observations dF D f .y1 ; y2 ; y3 /dy1 dy2 dy3 D f .y1 /f .y2 /f .y3 /dy1 dy2 dy3 1 f .y1 ; y2 ; y3 / D .2 /3=2 3 exp 2 .y Efyg/0 .y Efyg/ 2 will be transformed by means of the special linear Gauss–Markov model with onevariance component. The second action item For such a transformation, we need b BLUUE of and b 2 BIQUUE 2 . 2
b D .A0 A/1 A0 y; 6 6 6 A0 A D 3 3 ; .A0 A/1 D 1 5 3 ; 6 35 6 3 3 b BLUUE of W 6 2 3 6 y1 6 6 1 5 2 1 4 5 4b D y2 ; 6 3 0 3 y3 2 1 .y A/0 .y A/ b 2 D 6 n rkA 6 6 1 0 0 1 0 b 2 BIQUUE of 2 W 6 6 D n rkA y .In A.A A/ A /y; 6 4 1 b 2 D .y12 4y1 y2 C 2y1 y3 C 4y22 4y2 y3 C y32 /: 6
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution
719
The third action item The quadratic form .y Efyg/0 .y Efyg/ allows the fundamental decomposition y Efyg D y A D y A C A.b / .y Efyg/0.y Efyg/ D .y A/0 .y A/ D .y A/0 .y A/ C .b /0 A0 A.b / .y Efyg/0 .y Efyg/ D .n rkA/b 2 C .b /0 A0 A.b / 33 b . /: .y Efyg/0 .y Efyg/ D b 2 C .b /0 35 The fourth action item 2C In order to bring the quadratic form .y Efyg/0.y Efyg/ D .n rkA/b 0 0 b b C. / A A. / into a canonical form, we introduce the generalised forward and backward Helmert transformation HH0 D In z D 1 H.y Efyg/ D 1 H.y A/
and
y Efyg D H0 z .y Efyg/0 .y Efyg/ D 2 z0 HH0 z D 2 z0 z 1 .y Efyg/0.y Efyg/ D z0 z D z21 C z22 C z23 : 2 How to relate the sample variance b 2 and the sample quadratic form .b 0 0 / to the canonical quadratic form z0 z? / A A.b Previously, for the example of direct observations in the special linear Gauss– Markov model Efyg D 1, Dfyg D In 2 we succeeded to relate z21 C C z2n1 to b 2 and z2n to .b /2 . Here the sample variance b 2 , BIQUUE 2 , as well as the quadratic form of the deviate of the sample parameter vector b from the “true” parameter vector have been represented by b 2 D
1 1 .y A/0 .y A/ D y0 ŒIn A.A0 A/1 A0 y n rkA n rkA rkŒIn A.A0 A/1 A0 D n rkA D n m D 1 versus .b /0 A0 A.b /; rk.A0 A/ D rkA D m:
720
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
The eigenspace of the matrices M and N, namely M WD In A.A0 A/1 A 2 3 1 2 1 1 D 4 2 4 2 5 6 1 2 1
and N WD A0 A D
33 ; 35
will be analyzed. The eigenspace analysis of the
The eigenspace analysis of the
matrix M j 2 f1; : : : ; ng
matrices N; N 1 i 2 f1; : : : ; mg
V0 MV
U0 NU D Diag.1 ; : : : ; m / D N
D
"
V0 1
#
V0 2
M ŒV1 ; V2
U0 N1 U D Diag.1 ; : : : ; m / D N1 1 ; : : : ; m D 1 1 1 D ; : : : ; m D 1 rkN D rkN1 D m 1 D
D Diag.1 ; : : : ; nm ; 0; : : : ; 0/ D M rkM D n m
Orthonormality of the eigencolumns V0 1 V1 D Inm ;
0
v01 v1 D 1 v01 v2 D 0 ::: 0 vn1 vn D 0 v0n vn D 1 V 2 SO.n/ W eigencolumns W .M j In /vj D 0 eigenvalues jM j In j D 0 in particular
Orthonormality of the eigencolumns
V0 2 V2 D I m
V0 1 V2 D 0 2 R.nm/m 3 3 2 h i V 0 I 4 1 5 V V D 4 nm 5 1 2 V0 2 0 Im 2
1 m 1 m
U0 U D I m
u01 u1 D 1 u01 u2 D 0 ::: 0 um1 um D 0 u0m um D 1 U 2 SO.m/ W eigencolumns W .N j In /uj D 0 eige nvalues jN i Im j D 0
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution
eigenspace analysis of the matrix M;
721
eigenspace analysis of the matrix N;
rkM D n m; M 2 R33 ; A 2 R32 ; rkM D 1
rkN D m; N 2 R22 ; rkN D 2
M D Diag.1; 0; 0/ 3 2 0:4082 0:7024 0:5830 7 6 7 V D ŒV1 ; V2 D 6 4 0:8165 0:5667 0:1109 5 0:4082 0:4307 0:8049
N D Diag.0:8377; 7:1623/ " # 0:8112 0:5847 UD 0:5847 0:8112 U 2 R22
V1 2 R31 V2 2 R32
to be completed by eigenspace synthesis of the matrix M M D VM V0 " M D ŒV1 ; V2 M
V0 1 V0
eigenspace synthesis of the matrix N; N 1
#
N D UN U0 ;
2
"
D ŒV1 ; V2 Diag.1 ; : : : ; nm ; 0; : : : ; 0/ M D V1 Diag.1 ; : : : ; nm
/V0
V0 1 V
0
#
N1 D UN1 U0
N D U Diag.1 ; : : : ; m /U0 N 1 D U Diag.1 ; : : : ; m /U0
2
1
in particular 2
"
3
0:4082 h i 6 7 M D 4 0:8165 5 1 0:4082 0:8165 0:4082 0:4082 1 D 1
versus
N1 D
0:8112 0:5847
#"
0:5847 0:8112 # ND " 0:8112 0:5847
1 0
#
0 2
0:5847 0:8112 1 D 0:8377;
2 D 7:1623
0:8112 0:5847 0:5847 0:8112
1 0 0 2
0:8112 0:5847 0:5847 0:8112
1 D 1 versus 1 D 11 D 1:1937;
2 D 21 D 0:1396:
The non-vanishing eigenvalues of the matrix M have been denoted by .1 ; : : : ; nm /, m eigenvalues are zero such that eigen, 0; : : : ; 0/. The eigenvalues of the regular matrix N span eigen .N/ D .1 ; : : : ; m /. Since the dispersion matrix Dfb g D .A0 A/1 2 D N1 2 is generated by the inverse of the matrix A0 A D N, we have computed, in addition, the eigenvalues of the matrix N1 by means of eige n.N1 / D .1 ; : : : ; m /. The eigenvalues of N and N1 , respectively, are related by 1 1 D 1 1 ; : : : ; m D m
or
1 D 11 ; : : : ; m D m1 :
722
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
In the example, the matrix M had only one non-vanishing eigenvalue 1 D 1. In contrast, the regular matrix N was characterized by two eigenvalues 1 D 0:8377 and 2 D 7:1623, its inverse matrix N1 by 1 D 1:1937 and 2 D 0:1396. The two quadratic forms, namely y0 My D y0 VM V0 y
versus .b /0 N.b / D .b /0 UM U0 .b /;
build up the original quadratic form 1 1 1 .y Efyg/0 .y Efyg/ D 2 y0 My C 2 .b /0 N.b / 2 1 1 D 2 y0 VM V0 y C 2 .b /0 UN U0 .b / in terms of the canonical random variables V0 y D y , y D Vy
and U0 .b / D b
such that 1 1 1 .y Efyg/0 .y Efyg/ D 2 .y /0 M y C 2 .b /N .b / 2 nm m 1 X 2 1 X 1 0 .y Efyg/ .y Efyg/ D .y / C .b i /2 i j 2 2 j D1 j 2 i D1 i 1 .y Efyg/0 .y Efyg/ D z21 C C z2nm C z2nmC1 C C z2n : 2 The quadratic form z0 z splits up into two terms, namely z21 C C z2nm D
nm 1 X 2 .y / j 2 j D1 j
z2nmC1 C C z2n and
D
m 1 X .b i i /2 j 2 i D1
here 1 2 b 2 y D and 1 2 2 1 z22 C z23 D 2 Œ.b 1 1 /2 1 C .b 2 2 /2 2 ; or 1 z21 D .y 2 4y1 y2 C 2y1 y3 C 4y22 4y2 y3 C y32 / 6 2 1 z21 D
and
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution
z22
C
z23
723
1 1 b 2 2 0 3 3 b . /: D 2 Œ0:8377.b 1 1 / C 7:1623.b 2 2 / D 2 . / 35
The fifth action item We are left with the problem to transform the cumulative probability dF D f .y1 ; y2 ; y3 /dy1 dy2 dy3 into the canonical form dF D f .z1 ; z2 ; z3 /d z1 d z2 d z3 . Here we take advantage of Corollary B.3. First, we introduce Helmert’s random variable x WD z21 and the random variables b 1 and b 2 of the unknown parameter vector b of fixed 0 0 effects .b / A A.b / D z22 C z23 D kzk2A0 A if we denote z WD Œz2 ; z3 0 . d z1 d z2 D
p p det A0 A db 1 db 2 D 6 db 1 db 2;
according to Corollary B.3 is special representation of the surface element by means of the matrix of the metric A0 A. In summary, the volume element dx p det A0 A db 1 db 2 d z1 d z2 d z3 D p x leads to the first canonical representation of the cumulative probability 1 dx jA0 Aj1=2 1 1 b 0 0 b dF D p exp x p exp . / A A. / db 1 db 2: 2 2 2 x 2 2 2
The left pdf establishes Helmert’s pdf of x D z21 D b 2 = 2 , dx D 2 db 2 . In contrast, the right pdf characterizes the bivariate Gauss-Laplace pdf of kb k2 . 0 Unfortunately, the bivariate Gauss-Laplace A A normal pdf is not given in the canonical form. Therefore, second we do correlate kb k2A0 A by means of eigenspace synthesis. 1 1 A A D UA0 A U D U Diag.1 ; 2 /U D U Diag U0 ; 1 2 1 1 U0 .b kb k2A0 A WD .b /0 A0 A.b / D .b /0 U Diag ; / 1 2 0
jA0 Aj1=2 D
0
0
1 p 1 2 D p 1 2
b WD U0 .b / , b D U.b / 1 1 2 0 b .b / D kb k2D / Diag ; k kA0 A D .b 1 2
724
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
2 0 3 3 b b .b / k kA0 A D . / 35 2 1 3 0 6 7 D .b /0 4 1:1937 / D kb k2D : 5 .b 1 0 0:1396 By means of the canonical variables b D U0b we derive the cumulative probability 1 1 1 1 dF D p p exp x dx p 2 2 2 1 2 2 x 1 .b 2 2 /2 .b 1 1 /2 db 1 db exp 2 C 2 2 1 2
or
dF D f .x/f .b 1 /f .b 2 /dxdb 1 db 2 : Third, we prepare ourselves for the cumulative probability dF D f .z1 ; z2 ; z3 /d z1 d z2 d z3 . We depart from the representation of the volume element dx p det A0 Adb 1 db 2 d z1 d z2 d z3 D p 2 x subject to x D z21 D
1 2 1 b D 2 y0 My 2
b D .A0 A/1 A0 y: The Helmert random variable is a quadratic form of the coordinates .y1 ; y2 ; y3 / of the observation vector y. In contrast, b BLUUE of is a linear form of the coordinates .y1 ; y2 ; y3 / of observation vector y. The transformation of the volume element dxdb 1 db 2 D jJx jdy1 dy2 dy3 is based upon the Jacobi matrix Jx .y1 ; y2 ; y3 / 3 " # D1 x D2 x D3 x 0 a 13 7 6 b Jx D 4 D1 1 D2b 1 D3b 1 5 D 23 .A0 A/1 A0 2 D2b 2 D3b 2 D1b 2
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution
725
2
2 3 3 D1 x y1 2y2 C y3 1 2 4 2y1 C 4y2 2y3 5 a D 4 D2 x 5 D 2 My D 3 2 D3 x y1 2y2 C y3 1 0 1 y My D .y 2 4y1 y2 C 2y1 y3 C 4y22 4y2 y3 C y32 / 2 6 2 1 1 5 2 1 .A0 A/1 A0 D 6 3 0 3 xD
2 3 2y 4y2 C 2y3 4y1 C 8y2 4y3 2y1 4y2 C 2y3 1 4 1 5 Jx D 5 2 1 6 2 3 0 3 det Jx D
p det.A0 A/1 detŒa0 a a0 A.A0 A/1 A0 a p
det Jx D
detŒa0 a a0 A.A0 A/1 A0 a p det.A0 A/
4 0 0 4 y M My D 4 y0 My 4 4 a0 A.A0 A/1 A0 a D 4 y0 M0 A.A0 A/1 A0 My a0 a D
4 0 y ŒI3 A.A0 A/1 A0 A.A0 A/1 A0 ŒI3 A.A0 A/1 A0 y D 0 4 p 2 y0 My det Jx D p 2 det.A0 A/ D
j det Jy j D j det Jx j
1
2 D 2
p det.A0 A/ : p 0 y My
The various representations of the Jacobian will finally lead us to the special form of the volume element p 0 1 y My 1 dx db 1 db 2 D 2 0 1=2 dy1 dy2 dy3 2 jA Aj and the cumulative probability
726
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
1 1 2 2 2 dF D 3 exp .z1 C z2 C z3 / d z1 d z2 d z3 .2 /3=2 2 1 1 1 b dx jA0 Aj1=2 0 0 b D p exp x p exp . / A A. / db 1 db 2 2 2 2 x 2 2 2
1 1 0 D 3 exp .y Efyg/ .y Efyg/ dy1 dy2 dy3 : .2 /3=2 2 The sixth action item The first target is to generate the marginal pdf of the unknown parameter vector b , BLUUE of . Z1 1 b 1 1 dx jA0 Aj1=2 0 0 b dF1 D p exp x p exp 2 . / A A. / db 1 db 2 2 2 x 2 2 2
0
! Z1 2 1 1 dx 1 i i /2 1 X .b dF1 D p exp x p p exp 2 2 db 1 db 2 2 i D1 i x 2 2 1 2 2
0
Let us substitute the standard integral Z1 0
1 1 dx p exp x p D 1; 2 x 2
in order to have derived the marginal probability j; .A0 A/1 2 /db 1 db 2 dF1 D f1 .b jA0 Aj1=2 1 b 0 0 b b f1 ./ WD exp 2 . / A A. / 2 2 2 dF1 D f1 .b j; N1 2 /db 1 db 2 2
1 i i /2 1 1 X .b f1 .b / D p exp 2 2 1 2 2 i D1 i
! :
The seventh action item The second target is to generate the marginal pdf of Helmert’s random variable x WD b 2 = 2 , b 2 BIQUUE 2 , namely Helmert’s Chi Square pdf 2p with p D n rkA (here p D 1) “degree of freedom”.
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution
727
C1 Z C1 Z dx jA0 Aj1=2 1 1 b 1 2 b b dF2 D p exp x p exp k k A0 A d 1 d 2 : 2 2 2 2 2 x 2
1 1
Let us substitute the integral C1 Z C1 Z
1 1
jA0 Aj1=2 1 b 2 b b exp k k A0 A d 1 d 2 D 1 2 2 2 2
in order to have derived the marginal distribution dF2 D f2 .x/dx;
0x1 p2 1 1 2 x exp x ; f2 .x/ D p=2 2 .p=2/ 2 p D n rkA D n m;
subject to
here W p D 1;
1 1 1 f2 .x/ D p p exp x : 2 2 x
p 1 D
2
The results of the example will be generalized in Lemma B. Theorem B.11. (marginal probability distributions, special linear GaussMarkov model): Efyg D A Dfyg D In 2
" subject to
A 2 Rnm ; 2 2 RC
rkA D m;
Efyg 2 R.A/
defines a special linear Gauss–Markov model of fixed effects 2 Rm and 2 2 RC based upon Gauss-Laplace i.i.d. observations y WD Œy1 ; : : : ; yn 0 . " b D .A0 A/1 A0 y
subject to
Efb g D b Dfg D .A0 A/1 2
and 2 b 2 D
1 .y A/0 .y A/ n rkA
subject to 4
2 Efb 2g D b Dfb 2g D
2 4 n rkA
728
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
identify b BLUUE and b 2 BIQUUE of 2 . The cumulative pdf of the multidimensional Gauss-Laplace probability distribution of the observation vector y D Œy1 ; : : : ; yn 0 2 Y f .yjEfyg; Dfyg D In 2 /dy1 dyn 1 1 0 D exp 2 .y Efyg/ .y Efyg/ dy1 dyn .2 /n=2 n 2 D f1 .b /f2 .b 2 /db 1 db m db 2 can be split into two marginal pdfs f1 .b / of b , BLUUE of , and f2 .b 2 / of b 2, BIQUUE of 2 . (i) b BLUUE of The marginal pdf of b ; BLUUE of , is represented by (1st version) dF1 D f1 .b /d 1 d m f1 .b / D
ˇ 0 ˇ1=2 0 0 ˇA Aˇ exp 1 .b b / A A. / d 1 d m .2 /m=2 m 2 2 1
or
(2nd version) /d1 dm dF1 D f1 .b m i /2 1 1 X .b 1=2 f1 .b / D . / exp 1 2 m1 m .2 /m=2 . 2 /m=2 2 2 i D1 i
! ;
by means of Principal Component Analysis PCA also called Singular Value Decomposition (SVD) or Eigenvalue Analysis (EIGEN) of .A0 A/1 , " U0 .A0 A/1 U D D Diag.1 ; 2 ; : : : ; m1 ; m / D U0 subject to 0 b b D U U0 U D Im ; det U D C1 j; 2 / D f .b 1 /f .b 2 / f .b m1 /f .b m / f1 .b 1 .b i i /2 1 f .b i / D p p exp 2 2 i i 2
8i 2 f1; : : : ; mg:
The transformed fixed effects .b 1 ; : : : ;b m /, BLUUE of .1 ; : : : ; m /, are mutually independent and Gauss-Laplace normal b i N .i j 2 i /
8i 2 f1; : : : ; mg:
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution
729
(third version) b i 1 1 W f1 .zi /d zi D p exp z2i d zi zi WD p 2 2
2 i
8i 2 f1; : : : ; mg:
(ii) b 2 BIQUUE 2 The marginal pdf of b 2 , BIQUUE 2 , is represented by (first version) p D n rkA dF2 D f2 .b 2 /db 2 f2 .b 2/ D
2 1 1 b p=2 p2 p p : b exp p 2p=2 .p=2/ 2 2
(second version) dF2 D f2 .x/dx p 2 1 b 2 D 2b D 2 .y A/0 .y A/ 2 p 1 1 1 2 x f2 .x/ D p=2 exp x : 2 .p=2/ 2 x WD .n rkA/
f2(x) as the standard pdf of the normalized sample variance is a Helmert Chi Square 2p pdf with p D n rkA “degree of freedom”. Proof. The first action item
1
First, let us decompose the quadratic form kyEfygk2 into estimates E fyg of Efyg.
1 1
y Efyg D y E fyg C .E fyg Efyg/ y Efyg D y Ab C A.b / and
1
1 1 1 1 1 D ky E fygk C kE fyg Efygk
.y Efyg/0 .y Efyg/ D .y E fyg/0 .y E fyg/ C .E fyg Efyg/0.E fyg Efyg/ ky Efygk2
2
2
.y Efyg/0 .y Efyg/ D .y A/0 .y A/ C .b /0 A0 A.b / ky Efygk2 D ky Ak2 C kb k2A0 A : Here, we took advantage of the orthogonality relation.
730
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
.b /0 A0 .y A/0 D .b /0 A0 .In A.A0 A/1 A0 /y D .b /0 .A0 A0 A.A0 A/1 A0 /y D 0: The second action item Second, we implement b 2 BIQUUE of 2 into the decomposed quadratic form. ky Ak2 D .y A/0 .y A/ D y0 .In A.A0 A/1 A0 /y D y0 My D .n rkA/b 2 ky Efygk2 D .n rkA/b 2 C .b /0 A0 A.b / ky Efygk2 D y0 My C .b /0 N.b /: The matrix of the normal equations N WD A0 A; rkN D rkA0 A D rkA D m, and the matrix of the variance component estimation M WD In A.A0 A/1 A0 , rkM D n rkA D n m have been introduced since their rank forms the basis of the generalized forward and backward Helmert transformation.
zD
1
HH0 D In H.y Efyg/ D 1 H.y A/ y Efyg D H0 z
and
1 .y Efyg/0.y Efyg/ D z0 H0 Hz D z0 z 2 1 ky Efygk2 D kzk2 : 2 The standard canonical variable z 2 Rn has to be associated with norms ky Ab k and kb kA0 A . The third action item Third, we take advantage of the eigenspace representation of the matrices .M; N/ and their associated norms. y0 My D y0 VM V0 y
versus .b /0 N.b / D .b /0 UN U0 .b /
M D Diag.1 ; : : : ; nm ; 0; : : : ; 0/ n
2R DR
nm
R
m
versus
N D Diag.1 ; : : : ; m /: 2 Rm
m eigenvalues of the matrix M are zero, but nrkA D nm is the number of its nonvanishing eigenvalues which we denote by .1 ; : : : ; nm /. In contrast, m D rkA is
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution
731
the n-m number of eigenvalues of the matrix N, all non-zero. The canonical random variable V0 y D y , y D Vy
and U0 .b / D b
lead to 1 1 1 .y Efyg/0 .y Efyg/ D 2 .y /M y C 2 .b /0 N .b / 2 nm m 1 X 2 1 X 1 0 .y Efyg/ .y Efyg/ D .y / C .b i i /2 i j 2 2 j D1 j 2 i D1 1 .y Efyg/0 .y Efyg/ D z21 C C z2nm C z2nmC1 C C z2n 2 subject to z21 C C z2nm D
1 2
nm P j D1
.yj /2 j
and
z2nmC1 C C z2n D
1 2
m P i D1
.b i i /2 i
kzk2 D z0 z D z21 C C z2nm C z2nmC1 C C z2n 1 0 1 y My C 2 .b /0 N.b / 2 1 1 D 2 ky Efygk2 D 2 .y Efyg/0.y Efyg/:
D
Obviously, the eigenspace synthesis of the matrices N D A0 A and M D In A.A0 A/1 A0 has guided us to the proper structure synthesis of the generalized Helmert transformation. The fourth action item Fourth, the norm decomposition unable us to split the cumulative probability dF D f .y1 ; : : : ; yn /dy1 dyn into the pdf of the Helmert random variable x WD z21 C C z2nm D 2 .n rkA/b 2 D 2 .n m/b 2 and the pdf of the difference random parameter vector 2 2 2 b znmC1 C C zn D . /0 A0 A.b /. dF D f .z1 ; : : : ; znm ; znmC1 ; : : : ; zn /d z1 d znm d znmC1 d zn
732
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
1 1 0 f .z1 ; : : : ; zn / D exp z z .2 /n=2 2 nm m2 2 1 1 2 1 2 D exp .z1 C C znm / 2 2 2
1 exp .z2nmC1 C C z2n / : 2 Part A x WD r 2 D z21 C C z2nm ) dx D 2.z1 d z1 C C znm d znm / z1 D r cos nm1 cos nm2 cos 2 cos 1 z2 D r cos nm1 cos nm2 cos 2 sin 1 znm1 D r cos nm1 sin nm2 znm D r sin nm1 2
3 z1 4 5 D 1 Diag.p1 ; : : : ; pnm /V0 1 y znm V D ŒV1 ; V2 ;
V0 1 V1 D Inm ;
V0 2 V2 D Im ;
VV0 D In ; V 2 Rnn ; V1 2 Rn.nm/ ; 2 3 znmC1 4 5 D 1 Diag.p1 ; : : : ; pm /U0 .b / zn
V0 1 V2 D 0
V2 2 Rnm
and
altogether 3 z1 7 6::: 7 6 " # p p 7 6 Diag. 1 ; : : : ; nm /V0 1 y 6 znm 7 1 : 7D 6 p p 0 b 6 znmC1 7 Diag. ; : : : ; /U . / 1 m 7 6 5 4::: zn 2
The partitioned vector of the standard random variable z is associated with the norm kznm k2 and kzm k2 , namely
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution
733
kznm k2 C kzm k2 D z21 C C z2nm C z2nmC1 C C z2n 1 0 p p p p y V1 Diag. 1 ; : : : ; nm /Diag. 1 ; : : : ; nm /V0 1 y 2 1 p p p p C 2 .b /0 U Diag. 1 ; : : : ; m /Di ag. 1 ; : : : ; m /U0 .b / 1 1 /0 U Diag.1 ; : : : ; m /U0 .b / D 2 y0 V1 Diag.1 ; : : : ; nm /V0 1 y C 2 .b
D
D d z1 d z2 d znm1 d znm D r nm1 dr.cos nm1 /nm1 .cos nm2 /nm2 cos2 3 cos 2 dnm1 dnm2 d3 d2 d1 : The representation of the local .n m/ dimensional hypervolume element in terms of polar coordinates .1 ; 2 ; : : : ; nm1 ; r/ has already been given by Lemma B.4. Here, we only transform the random variable r to Helmert’s random variable x. x WD r 2 W dx D 2rdr;
dx dr D p ; 2 x
r nm1 dr D
r nm1 D x .nm1/=2
1 .nm1/=2 x dx: 2
Part A concludes with the representation of the left pdf in terms of Helmert’s polar coordinates dF` D
1 2
nm 2
exp 12 .z21 C C z2nm /d z1 d znm
nm 2 1 1 nm2 D x 2 dx.cos nm1 /nm1 .cos nm2 /nm2 2 2
cos2 3 cos 2 dnm1 dnm2 d3 d2 d1 : Part B Part B focuses on the representation of the right pdf, first in terms of the random variables b , second in terms of the canonical random variables b . dFr D
1 2
m2
1 exp .z2nmC1 C C z2n / d znmC1 d zn 2
z2nmC1 C C z2n D d znmC1 d zn D
1 b . /0 A0 A.b / 2
1 jA0 Aj1=2 db 1 db m: m=2
734
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
The computation of the local m-dimensional hyper volume element d znmC1 d zn has followed Corollary B.3 which is based upon the matrix of the metric 2 A0 A. The first representation of the right pdf is given by m 1 2 1 2 2 dFr D exp .znmC1 C C zn / d znmC1 d zn 2
2 m2 1 b jA0 Aj1=2 1 0 0 b D exp . / A A. / db 1 db m: 2
m=2 2 2
Let us introduce the canonical random variables .b 1 ; : : : ;b m / which are generated by the correlating quadratic form kb k2A0 A . 1 1 0 0 0 b b b . / A A. / D . / U Diag U0 .b ;:::; /: 1 m Here, we took advantage of the eigenspace synthesis of the matrix A0 A DW N and .A0 A/1 DW N1 . Such an inverse normal matrix is the representing dispersion matrix Dfb g D .A0 A/1 2 D N1 2 . UU0 D Im U 2 SO.m/ WD fU 2nm jUU0 D Im ; jUj D C1g N WD A0 A D U Diag.1 ; : : : ; m /U0 N
1
0
WD .A A/
1
D U Diag.1 ; : : : ; m /U0
1 1 D 1 1 ; : : : ; m D m
jA0 Aj1=2 D
p
versus subject to
or 1 D 11 ; : : : ; m D m1
1 1 m D p 1 m
b WD U0 .b / , b WD U0 .b / kb k2A0 A DW .b /0 A0 A.b / D .b /0 Diag
1 1 ;:::; 1 m
.b /:
The local m-dimensional hypervolume element db 1 db m is transformed to the local m-dimensional hypervolume element db 1 db m by db 1 db m D jUjdb 1 db m D db 1 db m : Accordingly we have derived the second representation of the right pdf f .b /. dFr D
1 2
m2 m=2
1 p 1 m
1 1 1 0 .b / db 1 db exp 2 .b / Diag ;:::;y m : 2 1 m
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution
735
Part C Part C is an attempt to merge the left and right pdf nm 2 1 1 x .nm2/=2 dxd!nm1 2 2
m2 1 jA0 Aj1=2 1 b 0 0 b exp A A. / db 1 db m or . / 2
m 2 nm 2 1 1 dF D dF` dFr D x .nm2/=2 dxd!nm1 2 2
m2 1 1 1 1 1 0 p exp .b / Diag ; : : : ; 1 db m : .b / db 2 m 1 m 2 2 1 m dF D dF` dFr D
The local .n m 1/-dimensional hypersurface element has been denoted by d!nm1 according to Lemma B.4. The fifth action item Fifth, we are going to compute the marginal pdf of b BLUUE of . dF1 D f1 .b /db 1 db m as well as /db 1 db m dF1 D f1 .b include the first marginal pdf f1 .b / and f1 .b /, respectively. The definition f1 .b / WD
Z1
Z dx
0
1 2
1 d!nm1 2
m2
1 2
nm 2
x .nm2/=2
jA0 Aj1=2 1 b 0 0 b exp . / A A. / . 2 /m=2 2
subject to Z
Z1 dx
d!nm1
0
1 2
1 2
nm 2
x .nm2/=2 D 1
leads us to f1 .b / D
1 2
m2
jA0 Aj1=2 1 b 0 0 b . / exp A A. / : m 2
736
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Unfortunately, such a general multivariate Gauss-Laplace normal distribution cannot be tabulated. An alternative is offered by introducing canonical unknown parameters b as random variables. The definition Z
Z1 f1 .b / WD
dx
d!nm1
0
1 2
1 2
nm 2
x .nm2/=2
m 1 2 1 .1 2 m1 m /1=2 2
m 1 1 1 .b / exp 2 .b /0 Diag ;:::; 2 1 m
subject to Z
Z1 dx 0
1 d!nm1 2
1 2
nm 2
x .nm2/=2 D 1
alternatively leads us to ! m 1 1 i /2 1 X .b f1 .b / D p exp 2 m 1 m 2 i D1 i f1 .b 1 ; : : : ;b m / D f1 .b 1 / f1 .b m / 2 i / 1 .b 1 8i 2 f1; : : : ; mg: f1 .b i / WD p p exp 2 i i 2
1 2
m2
Obviously the transformed random variables .b 1 ; : : : ; yb m / BLUUE of .1 ; : : : ; m / are mutually independent and Gauss-Laplace normal. The sixth action item Sixth, we shall compute the marginal pdf of Helmert’s random variable x D 2 = 2 , b 2 BIQUUE 2 , .n rkA/b 2 = 2 D .n m/b dF2 D f2 .x/dx includes the second marginal pdf f2 .x/. The definition Z f2 .x/ WD
1 d!nm1 2
1 2
nm 2
x
.nm2/=2
1 exp 2 .b /0 A0 A.b / 2
C1 C1 m2 Z Z 1 jA0 Aj1=2 b b d 1 d m 2
m 1
1
B-7 Sampling from the Multidimensional Gauss–Laplace Normal Distribution
737
subject to Z !nm1 D
d!nm1 D
2 .nm1/=2 ; . nm1 / 2
according to Lemma B.4 C1 C1 m2 Z Z 1 jA0 Aj1=2 1 b 0 0 b db 1 db m exp . / A A. / 2
m 2 2
1
1
C1 m2 Z 1 1 d z1 d zm exp .z21 C C z2m / 2
2
C1 Z
D 1
1
leads us to
p
p WD n rkA D n m
n m
p 1 nm1 nm1 D D D 2 2 2 2 2 f2 .x/ D
p 1 1 2 1 exp x x ; 2p=2 .p=2/ 2
namely the standard pdf of the normalised sample variance, known as Helmert’s Chi Square pdf 2p with p D n rkA D n m “degree of freedom”. If you substitute 2 = 2 , dx D .nrkA/ 2 db 2 D .nm/ 2 db 2 x D .nrkA/b 2 = 2 D .nm/b 2 we arrive at the pdf of the sample variance b , in particular dF2 D f2 .b 2 /db 2
2 1 1 b p=2 p2 p b f2 .b / D r p=2 exp p 2 : 2 .p=2/ 2 2
Here is our proof’s end. Theorem B.12. (marginal probability distributions, special linear Gauss– Markov model with datum defect):
Efyg D A Dfyg D V 2
A 2nm ; r WD rkA < minfn; mg subject to Efyg 2 R.A/ V 2nn ;
rkV D n
738
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
defines a special linear Gauss–Markov model with datum defect of fixed effects 2 Rm and a positive definite variance-covariance matrix Dfyg D ˙ y of multivariate Gauss-Laplace distributed observations y WD Œy1 ; : : : ; yn 0 . 2
b D Ly “linear” 6 C b D A y subject to 4 kLA Im k2 D min “minimum bias” trDfb g D tr L˙ y L0 D min “best” and 2 Efb 2g D 2 1 6 2 0 1 .y Ab / ˙ y .y Ab b D / subject to 4 2 4 n rkA Dfb 2g D n rkA identify b BLIMBE of and b 2 BIQUUE of 2 . Part A The cumulative pdf of the multivariate Gauss-Laplace probability distribution of the observation vector y D Œy1 ; : : : ; yn 0 2 Y f .yjEfyg;
Dfyg D ˙ y /dy1 dyn 1 1 1 0 1 D exp .y Efyg/ ˙ y .y Efyg/ dy1 dyn .2 /n=2 j˙ y j1=2 2
is transformed by p p p p ˙ y D W0 Diag.1 ; : : : ; n /W D W0 Diag. 1 ; : : : ; n / Diag. 1 ; : : : ; n /W p p ˙ 1=2 n /W y WD Diag. 1 ; : : : ; 0 1=2 versus ˙ y D .˙ 1=2 y / ˙y 1 1 0 D W Diag ; : : : ; ˙ 1 W y 1 n 1 1 1 1 D W0 Diag p ; : : : ; p Diag p ; : : : ; p W 1 n 1 n 1 1 ˙ 1=2 WD Diag p ; : : : ; p W 1 n
/0 ˙ 1=2 ˙ 1 D .˙ 1=2 y y
subject to the orthogonality condition WW0 D In ky Efygk2˙ 1 WD .y Efyg/0˙ 1 y .y Efyg/ y
B-8 Multidimensional Variance Analysis, Sampling from the Multivariate
1 1 D .y Efyg/ W Diag p ; : : : ; p 1 n 0
0
739
1 1 Diag p ; : : : ; p 1 n
W.y Efyg/0 D .y Efyg/0 .y Efy g/ DW ky Efy gk2In subject to the star or canonical coordinates 1 1 Wy D ˙ 1=2 y D Diag p ; : : : ; p y y 1 n f .yjEfyg; Dfy g D In /dy1 dyn 1 1 0 .y D exp Efy g/ .y Efy g/ dy1 dyn .2 /n=2 2 1 1 2 D exp ky Efy gk dy1 dyn .2 /n=2 2 into the canonical Gauss-Laplace pdf. Part B The marginal pdf of b BLUUE of , is represented by (1st version) dF1 D f1 .b /d 1 d n :
B-8 Multidimensional Variance Analysis, Sampling from the Multivariate Gauss–Laplace Normal Distribution Let x1 ; : : : ; xN denote a random sample from as k-variate Gauss-Laplace normal distribution with as the mean vector and ˙ as the variance-covariance matrix, such that .B 370/
xi WD .Xi;1 ; : : : ; Xi;k /0
for i 2 f1; : : : ; N g. Let X1 WD .x1 ; : : : ; xN /. The joint density of the random sample is generated by the product of N multivariate normal densities. Let us denote the parameter space by f; ˙ I 2 Rk ; ˙ positive definiteg DW ˝. Here we describe estimation of the parameters from a multivariate Gauss-Laplace normal distribution and describe properties of these estimates. In addition, we care for inference for a simple, multiple, and partial correlation coefficients based on the normal random sample. Of course, we have to proof that we can assume a multivariate GaussLaplace normal distribution. We add some methods for assessing Gauss-Laplace normality and describe suitable transformation to normality.
740
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
B-81 Distribution of Sample Mean and Variance-Covariance By the representation of the sample mean of the random sample x1 ; : : : ; xN by (B371) and the sample variance-covariance matrix by (B372)–(B375), namely the k k dimensional matrix SN =.N 1/ D fSj l g for all j; l 2 f1; : : : ; kg subject N P Xi;j =N . The sample variance-covariance matrix has k variances Sjj for to X D i D1
j 2 f1; : : : ; kg and k.k 1/=2 possibly distinct covariances. Sj l for j; l; j; l 2 f1; : : : ; kg. The generalized sample variance is the scalar quality jSN =.N 1/j which determines the degree of “peakedness” of the joint density of x1 ; : : : ; xN and is a normal summary measure of variability in the sample.
Table B.1 Sample mean, sample variance-covariance matrix Sample mean N P x^ xi N WD
(B371)
iD1
Sample variance-covariance matrix N N P P ^ ^0 SN WD .xi x^ xi xi0 N xi x^ N /.xi xN /0 D N xi xN iD1
iD1
SN =.N a/ DW fSjl g for j; l 2 f1; : : : ; kg Sjl WD
N P
(B372) (B373)
.Xi;j X^ j /.Xil Xl /=.N 1/
(B374)
Xi;j N
(B375)
iD1
Subject to X^ j D
N P iD1
Given a univariate random sample X1 ; : : : ; XN from a N.; 2 / population, that the sample mean X^ has a normal distribution with mean and variance 2 =N , the static .N 1/S2 = 2 is distributed as a 2 variable and the two distributions are independent. The corresponding distributional result for the k-variable situations is given by our previous result. We first define a multivariate distribution called the Wishart distribution which is derived from the multivariate Gauss-Laplace normal distribution as its sampling distribution of the sample statistics N X ^ 0 .Xi X^ N /.Xi XN /
.B 376/
i D1
The Wishart distribution is a multivariate extension of the chi-square distribution. Definition B.18. Wishart distribution A random k-dimensional matrix W is said to follow a k-dimensional noncentral Wishart distribution. Wk .˙ ; m; x/ with m degrees of freedom, parameter ˙ ,
B-8 Multidimensional Variance Analysis, Sampling from the Multivariate
741
P and noncentrality parameter WD 0 1 0 =2 if W can be represented as, m P P W WD xi xj where xj for all j 2 f1; : : : ; mg are i:i:i:Nk .; / vectors. Provide j D1
m:k the density function of W is f .WI ; ˙ / D
jWj.mk1/=2 expftr.˙ 1W=2/g 2km=2 j˙ jm=2 k .m=2/
.B 377/
where k .m=2/ D k.k1/=4 ˘jkD1 . 12 .m C 1 j // is the multivariate Gamma functions. If D 0, we say that W follows a central Wishart distribution Wk .˙ ; m/; since ˙ is p.d., it is clear that D 0 if and only if D 0. The additivity property of Wishart matrices states that if Wj i nd Wk .˙ ; mj /; j D 1; : : : ; J , then J J P P Wj Wk ˙ ; mj .
j D1
j D1
End of Definition B.18: Wishart distribution Intermezzo Before we proceed we must introduce results about the Fourier transform of a random vector, namely called “characteristic function” being generalized in Appendix C to the “characteristic functional” for random functions, namely stochastic processes. We will introduce some lemmas and definitions. Definition B.19. Characteristic function of a random vector The characteristic function of a random vector X is Efexp i t 0 xg define for every vector t. Definition B.20. Decomposition of a complex-valued function Let the complex-valued function g.x/ be written as g.x/ D g1 .x/ C ig2 .x/
.B 378/
where g1 .x/ and g2 .x/ are real-valued. Then the expected value of g.X/ is Efg.X/g D Efg1 .x/g C i Efg2 .x/g
.B 379/
in particular, Efexp.i t 0 x/g D Efcos.t 0 x/g C i Efsin.t 0 x/g
.B 380/
742
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Lemma B.18. Product rule for independent variables Let X0 WD .X.1/0 X.2/0 /. If X.1/ and X.2/ are independent and g.x/ D g.1/.x.1/ /g.2/ .x.2/ /, then Efg.X/g D Efg .1/ .X.1/ /gEfg .2/ .X.2/ /g
.B 381/
Lemma B.19. Components of independent distribution data If the component of X are independently distributed, then Efexp.i t 0 X^ N /g D
N Y
Efi tj Xj g
.B 382/
j D1
Theorem B.20. Characteristic function of a vector-valued random variant of type Gauss-Laplace distribution The characteristic function of X which is Gauss-Laplace distribution according to N.; ˙ / is
1 Efexp.i t X/g D exp t C t 0 ˙ t 2 0
0
.B 383/
for any real vector t. Theorem B.21. Characteristic function .t/ If the random vector X has the density f .x/ and the characteristic function f .t/, then 1 f .x/ D .2 /
C1 C1 Z Z ::: exp.i t 0 x/ N .t/d!1 : : : d! 1
.B 384/
1
and C1 C1 Z Z ::: exp.Ci !0 x/f .x/dx1 : : : dx f .!/ D EfCi ! Xg D N
0
1
1
illustrating synthesis and analysis Here ends our intermezzo
.B 385/
B-8 Multidimensional Variance Analysis, Sampling from the Multivariate
f N .!/ D EfCi !0 Xg D
C1 C1 Z Z ::: exp.Ci !0 x/f .x/dx1 : : : dx 1
743
.B 386/
1
Results: Distribution of x^ N and SN Let x1 ; : : : ; xN denote a random sample from the Gauss-Laplace normal distribution Nk .; /. (a) The distribution of x^ N is Nk . =N / (b) For N 2, SN follows a Wishart distribution Wk .˙ 01 N 1/ (c) x^ N and SN are independently distributed. Proof
Efexp.i t 0 x^ N /g D
N Y
1 Efexp.i tj0 xj /=N g D exp.t 0 C t 0 .˙ =N /t 2 j D1
.B 387/
for N D 2 it follows 0 0 1 1 1 1 S2 D x1 .x1 C x2 / x1 .x1 C x2 / C x2 .x1 C x2 / x2 .x1 C x2 / 2 2 2 2 1 .x1 x2 /.x1 x2 /0 .B 388/ 2 p Obviously .x1 x2 / 2 is distributed according to Nk .0; ˙ / and S2 according to Wk .˙ ; 1/ ) S2 D
0 1 1 S3 D x1 .x1 C x2 C x3 / x1 .x1 C x2 C x3 / 3 3 0 1 1 C x2 .x1 C x2 C x3 / x2 .x1 C x2 C x3 / 3 3 0 1 1 C x3 .x1 C x2 C x3 / x3 .x1 C x2 C x3 / 3 3 2 ^ 0 S3 D S2 C .x3 x^ 2 /.x3 x2 / 3
.B 390/
subject to x^ 2 D
1 .2x^ 2 C x3 / 3
.B 391/
.B 389/
744
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
in general 1 ^ .mxm C xmC1 / and mC1 m ^ 0 .xmC1 x^ D Sm C m /.xmC1 xm / mC1
x^ m D SmC1
.B 392/
B-82 Distribution Related to Correlation Coefficients xi for all i 2 f1; : : : ; m C 1g are mutually independent and therefore Since ^ x^ ; mC1 Sm are functions only of xi for i 2 f1; : : : ; mg, it follows that xmC1 ; Sm ^ are independent of xmC1 . By the hypothesis of induction we enjoy the independence of x^ m and Sm x^ mC1 ; Sm
and xmC1
.B 393/
are mutually independent. Thus, Sm is independent of x^ m ;xmC1 and therefore mC1 mC1 xmC1 x^ independent of xmC1 x^ m which follows a Nk 0; m ˙ m .xmC1 m ^ 0 xm / are distributed as Wk .˙; 1/. Note that ^ cov xi x^ (B 394) m ; xn D 0 h i0 ^ 0 Which implies that x1 x^ ; : : : ; x x is uncorrelated and therefore N N N ^ independent of xN . In consequence, Sn is distributed independently of x^ N. P Results: Maximum likelihood estimation if and : : : ; xN from a Nk .; ˙/ distribution, the maximum Based on a random sample x1 ;P likelihood estimates of and are b ML D
N 1 X xi D x^ N; N i D1
.B 395/
and N X
N 1 ^ 0 b ML D 1 SN xi x^ x X D ˙ i N N N i D1 N
.B 396/
Proof The likelihood function L .; ˙I x1 ; : : : ; xN / has the form shown on the right b ML , and are the side. The MLE’s of and ˙ are denoted by b ML and ˙
B-8 Multidimensional Variance Analysis, Sampling from the Multivariate
745
1 values that maximize is p.d., the distance ^ ^ 0 1 L .; ˙I x1 ; : : : ; xN /. Since ˙ xN in the exponent of the likelihood function is positive xN ˙ unless D XN . The MLE of is then x^ N , since it is the value of that maximizes the likelihood function. We next maximize the following function with respect to ˙:
exp L .b ML ; ˙/ D
12 tr
PN ^ 0 1 ^ x i XN ˙ i D1 xi xN .2 /N k=2 j˙jN=2
.B 397/
P ^ 0 ^ With A D ˙; B D N xi XN , and b D N=2, we can show that i D1 xi xN
P ^ 0 ^ b ML D 1 N ˙ x i XN maximizes L .b ML ; ˙/. The maximum i D1 xi xN N likelihood is
b ML D L b ML ; ˙
exp N2k b ML jN=20 .2 /N k=2 j˙
.B 397/
which completes the proof. Corollary: Invariance property of MLE Using the invariance property
ofMLE’s which states that the MLE of a function g ./ of a parameter is g b ML , we see that b 1 b 1. The MLE of the function 0˙ 1 is b 0ML ˙ , P ML ML p p 2. The MLE of lj , the (l,j)th element of is given by b lj , the square root of P b the MLE of the(l,j)th entry in ML b ML is a biased Although the estimator b ML is an unbiased estimator of ; ˙ estimator of ˙. analogous to the univariate case, an unbiased estimator of ˙ is b D 1 SN . The unbiased estimators of abd ˙ denoted by b b respectively ˙ and ˙ N 1 are complete sufficient statistics. Of course, you have realized that we gave no space to derive the Wishart distribution. Instead we refer to his works in Wishart (1928), Wishart (1931), Wishart (1955) as well as Wishart and Bartlett (1932). the general theory of multivariate statistical analysis including the Gauss-Laplace multivariate normal distribution, the estimation of the mean vector and the variance-covariance matrix the distribution of sample correlation coefficients, the generalized t 2 statistics and the distribution of the sample covariance matrix can be taken from the excellent review of Anderson (1958) including 413 references. finally we advice the readers to consider the works of Koch (1988a), Koch (1988b), pp. 160–173 B.8 Distributions related to correlation coefficients Traditionally we divide simple, multiple and partial correlations. Here we restrict our study to the analysis of the simple correlation based on i id bivariate Gauss-Laplace normal samples.
746
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
Xi;j ; Xi;l
for all i 2 f1; : : : ; N g
.B 399/
th th where Xi;j and Xi;l denote the j and l components of the k - l dimensional vector x. Suppose E Xi;j D j ; E .Xi;l / D l , var Xi;j D jz ; var .Xi;l / D lz and corr Xi;j ; Xi;l D j l . The exact sampling distribution p of was p first derived by Fisher (1955) who showed that for D 0 the statistic N 2 ^ 1 is distributed as t with (N-2) degrees of freedom. He introduced the Z-transformation,
Z WD
1 1 C ^ =1 ^ 2
.B 400/
and proved that even for relative small samples it is approximately Gauss-Laplace normally distributed, and that as N increases the variance of the Z-transformation rapidly becomes nearly independent of converging to 1=.N 3/. In 1938, F. N. David (1938) published tables of the exact sampling distribution of ^ for D 0.0:1/0:9, N D B.1/25; 50; 100; 200; 400; ^ D 1.0:01/C1, and investigated the accuracy of the approximate distributions suggested by R.H. Fisher. She found approximate formulas for the mean/variance of Z: They were “extraordinarily accurate even for the low values of N provided j j is not too near unity”. Hotdling (1953) intensively studied the mathematical properties of the distribution function of ^ and the suggested improvements of the Z-transformation in terms of a hypergeometric function. Beside discussing methods of interpolating the density he also calculated moments of ^ . From the vast reference list of the subject we like to mention the contribution of Soper et al. (1955), Kraemer (1973), Wilcox (1997) and Green (1952) as well as Khan (1994). Example B: Correlation coefficient, “iid” Gauss-Laplace two dimensional normal We start from the two dimensional Gauss-Laplace normal observations .xi ; yi / for N-independent identical distributed data .“i id 00 / estimated by BLUUE and BIQUUE. By a special decomposition we shift the sample distribution into two normal distributions relating to 2 2 1b .b 1 ; b 2 / on the one side and to b 21b 22b 12 or b 2 12 on the other side. b 12 indicates the sample correlation coefficient via the characteristic function based on Fourier-Stieltjes integration we derive the sampling distribution of the correlation coefficient b 12 . 12 / using tables of David (1954). “N independently, identically distributed observations : iid” “Gauss-Laplace normal, two-dimensional observations: .xi ; yi / for all i 2 f1; : : : ; N g”
B-8 Multidimensional Variance Analysis, Sampling from the Multivariate
747
“Probability density”
f .x; y/ D
1 1 1 p exp 2 1 2 1 2 2 .1 2 / " # .x 1 / .xi1 / .y 2 / .y 2 /2 C2 C 12 1 2 22
f .x1 ; : : : ; xN ; y1 ; : : : ; yN / D
D
1 p 2 1 2 1 2
!N
N Y
.B 31/
f .xi ; yi / D f .x1 ; y1 / .xN ; yN /
i D1
!! N X 1 .xi 1 / .yi 2 / .y i mu2 /2 .xi 1 /2 exp 2 C 2 .1 2 / i D1 1 2 12 22 .B 32/ BLUUE W ^ 1 D
N 1 X xi D k0 x=N N i D1
.B 33/
BLUUE W ^ 2 D
N 1 X yi D k0 y=N N i D1
.B 34/
BIQUUE: N
1^2 D
2 0 1 X xi ^ x k^ D x k^ 1 1 1 = .N 1/ N 1 i D1
2^2 D
2 0 1 X yi ^ y k^ D y k^ 1 1 2 = .N 1/ N 1 i D1
^ 12 D
.B 35/
N
N 1 X ^ ^ 0 ^ .xi ^ 1 /.yi 2 / D .x k1 / .y k2 /=.N 1/ N 1 i D1
“Decomposition”
.B 36/
.B 37/
748
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions
N N X X 2 ^2 ^ 2 .xi 1 /2 D Œ.xi b 1 /.1 ^ 1 / D .N1/1 CN.1 1 / i D1
N X
2
.yi 2 / D
N X
i D1 N X
.B39/
i D1
2
2 ^ ^ 2 Œ.yi b 2 /.2 ^ 2 / D .N 1/2 C N.2 2 /
.B40/
i D1 ^ ^ ^ ^ .xi 2 /.yi 2 / D .N 1/ 12 1 2 C N.1 ^ 1 /.2 2 /
.B41/
i D1
!N
1 p
f .x1 ; : : : ; xN ; y1 ; : : : ; yN / D
2 1 2 1 2
.1 b 1 /2 2 12 .1 b 1 /.2 b 2 / 2 /2 N .2 b C 2.1 2 / 1 2 12 22 2 b 22 b 1 2 12b N 1 12b 1b 2 .B42/ exp C 2 1 2 2.1 12 / 12 22
exp
“Product of two normal distributions” First part, marginal distribution 0 C1 Z C1 Z B @ 1 1
1 1 q
2 2 1 2 1 12
C A exp
" N 1 /2 .1 b 2 2.1 12 / 12
# ! 1 /.2 b 2 / 2 /2 2 12 .1 b .2 b C 2 db 1 db 1 2 22
.B43/
“Fourier-Stieltjes analysis” f .!/ D
Z .exp i !x/f .x/dx ,
1 , f .x/ D 2
.B44/
C1 Z .exp i !x/f .x//f ^ .!/d! 1
First part of the marginal normal distribution
.B45/
B-8 Multidimensional Variance Analysis, Sampling from the Multivariate
1 p 2 1 2 1 12
C1 Z Z db 1 db 2 exp 1
749
1 i t1 N .1 b 1 /2 2.1 12 /2 12
exp C
N 1 i t2 2 12 .1 b 1 /.2 b 2 /2 2 1 2 2.1 12 /
exp
1 i t3 N .2 b 2 /2 2.1 12 /2 22
.B46/
“abbreviations”
1 WD
1 /2 .1 b N D 1 .b 1 / 2 2 2.1 12 / 1
exp Cit1 1 D exp it1 2 WD
.B47/
N 1 /2 .1 b 2 2.1 12 / 12
.1 b 1 /.2 b 2 / N 2 12 D 2 .b 1 ; b 2 / 2 1 2 2.1 12 /
exp Cit2 2 D exp Cit2
.B48/
N .1 b 1 /.2 b 2 / 2 12 2 1 2 2.1 12 /
“First part of the marginal normal distribution” 1 q 2 2 1 2 1 12
C1 Z Z exp 1
1 i !1 N .1 b 1 /2 2 2.1 12 / 12
2 12 .1 C i !2 / .1 b 1 /.2 b 2 / 1 2 1 i !3 2 db 1 d mu C . b / c2 2 2 22
D
2 1=2 .1 12 / 2 Œ.1 i !1 /.1 i !3 / 12 .1 C i !2 /2 1=2
“Check yourself the integral
’
.B49/
expŒ˛x 2 C ˇxy y 2 dxdy”
“Second part of the marginal normal distribution”
750
B Sampling Distributions and Their Use: Confidence Intervals and Confidence Regions 2 .1 12 /
N 1 2
2 Œ.1 it1 /.1 it3 / 12 .1 C it2 /
N 1 2
2 b 22 1 2 12b 12b 1b 2 N 1 b exp C 2 1 2 2.1 12 / 12 22 .B50/
f .x1 ; : : : ; nN ; y1 ; : : : ; yN /dx1 : : : dxN dy1 : : : dyN ) f .x1 ; y1 / : : : f .xN ; yN /dx1 dy1 : : : dxN dyN f .b 1 ; b 2 /f .b 1; b 2; b 12 /db 1 db 2 db 1 db 2 db 12
.B51/
f .x1 ; : : : ; xN ; y1 ; : : : ; yN / D p1 .b 1 ; b 2 /p2 .b 1; b 2; b 12 / “first” f1N .t1 ; t2 ; t3 /
f1 .b 1 ; b 2 /
p1N .t1 ; t2 ; t3 / D
2 .1 12 / 2 2 Œ.1 i t1 /.1 i t3 / 12 .1 C i t2 /2
N 1
.B52/
“second” 2 / .1 12 p.g1 ; g2 ; g3 / D .2n/3
N 1 2
C1 ZZ Z
1
exp.i !1 1 i !/2 2 i !3 3 / 2 Œ.1 i t1 /.1 i t3 / 12 .1 C i t2 /2
d!1 d!2 d!3 p.b 12 / D
N 1 2
0 N 2 N 2
2 / .1 12 N 4 d .1 b 212 / 2
.N 3/Š d.b 12 /
1.B53/
12 / C B arccos. 12b @ q A 2 2 1 12b 12
(B.1)
End of Lemma Example As a specific example, we consider the confidence interval of the confidence coefficient 0.95 on the correlation of 0.782 observed by a sample of ten. Using Graph 4 of David (1954) we find the two limits are 0.34 and 0.98 . This result leads us to the interval 0:34 < 12 < 0:98 (Anderson 1958, p 72). Lemma B (probability interval for the correlation coefficient, functional determinant: David 1954, pp. xxxv–xxxviii): ˇ ˇ ˇ ˇ @.x1 ; : : : ; xN ; y1 ; : : : ; yN / ˇDN ˇ ˇ @.b 1 ; b 2 I 1 ; : : : ; N 1 ; 1 ; : : : ; N 1 / ˇ subject to the Helmert transformation
B-8 Multidimensional Variance Analysis, Sampling from the Multivariate
1 1 1 1 C p 2 C C p x1 D b 1 C p N 1 21 32 N.N 1/ 1 1 1 1 C p 2 C C p x2 D b 1 p N 1 21 32 N.N 1/ x3 D b 1
xN D b 1
1 N 1 2 C C p 32 N.N 1/ ::: 1 N 1 p N.N 1/
p
2
1 1 1 1 C p 2 C C p y1 D b 2 C p N 1 21 32 N.N 1/ 1 1 1 1 C p 2 C C p N 1 y2 D b 2 p 21 32 N.N 1/ 1 N 1 2 C C p 32 N.N 1/ ::: 1 yN D b 2 p N 1 N.N 1/ ˇ ˇ ˇ ˇ @.xi ; yi / ˇ ˇ p.b 1 ; b 2 ; j ; j / D p.xi ; y; i / ˇ @.b 1 ; b 2 ; j ; j / ˇ y3 D b 2
p
2
for all j D 1; 2; : : : ; N 1I
i D 1; 2; : : : ; N
751
Appendix C
Statistical Notions, Random Events and Stochastic Processes
Definitions and lemmas relating to statistics, random events as well as stochastic processes are given, neglecting their proofs. First, we review statistical moments of a probability distribution of random vectors and list the Gauss-Laplace normal distribution by introducing the notion of a quasi-normal distribution. As the end of Appendix C1 we take reference to two lemmas about the Gauss-Laplace 3 rule, namely the Gauss-Laplace inequality and the Vysochainskii-Potunin inequality. Appendix C2 reviews the spatial linear error propagation as well as the general nonlinear error propagation based on y D g.x/ introducing the moments of second, third and fourth order. The special role Jacobi matrix as well as the Hesse matrix is classified. Appendix C3 reviews useful identities like Efyy 0 ˝j g and Efyy 0 ˝yy 0 g as well as D Ef.y Efyg/.y Efyg/0 ˝ .y Efyg/g relating to the matrix of obliquity and WD Ef.y Efyg/.y Efyg/0 ˝ .y Efyg/.y Efyg/0 g C : : : relating to the matrix of courtosis. Appendix C.4 newly introduces scalar-valued stochastic processes of one parameter enriched by Appendix C.5 reviewing characteristics of a one parameter stochastic processes like non-stationary, stationary and cryodic stochastic processes. Extensive simple examples are presented in Appendix C.6 including ARIMA processes namely. Appendix C.7 introduce the important WIENER process of type ORNSTEIN- UHLENBECK, WIENER process with drift integrated WIENER process. The spectral analysis of one parameter is enriched by 14 examples like the DIRAC representation, spectral densities, band limited white noise. An extensive introduction into scalar-, vector-, and tensor-valued stochastic processes in Appendix C.9 refers to the characteristic functional the moment representations of stochastic processes for scalar-valued and vector-valued qualities, for statistical homogeneous and isotropic fields of multi-point systems. The Appendix is specified by Example C.9.1 on the topic of two-dimensional Euclidean networks under the postulates of homogeneity and isotropy in the statistical sense, by Example C.9.2 on criterion matrices for absolute coordinates in a space time astronomic frame of reference, for instance represented in scalar-valued spherical harmonic, by Example C.9.3 on space gravity spectroscopy illustrating the benefits of E. Grafarend and J. Awange, Applications of Linear and Nonlinear Models, Springer Geophysics, DOI 10.1007/978-3-642-22241-2, © Springer-Verlag Berlin Heidelberg 2012
753
754
C Statistical Notions, Random Events and Stochastic Processes
TAYLOR-KARMAN structured criterion matrices, by Example C.9.4 on Nonlocal prediction and Example C.9.5 on Nonlocal Time Series Analysis.
C-1 Moments of a Probability Distribution, the Gauss–Laplace Normal Distribution and the Quasi-Normal Distribution First, we define the moments of a probability distribution of a vector valued random function and explain the notion of a Gauss-Laplace normal distribution and its moments. Especially we define the terminology of a quasi - Gauss-Laplace normal distribution. Definition C.1. (statistical moments of a probability distribution): (1) The expectation or first moment of a continuous random vector ŒXi for all i D 1; : : : ; n of a probability density f .x1 ; : : : xn / is defined as the mean vector WD Œ1 ; : : : ; n of EfXi g D i Z i WD EfXi g D
Z
C1 1
:::
C1
1
xi f .x1 ; : : : ; xn /dx1 : : : dxn :
(C1)
The first moment related to the random vector Œei WD ŒXi EfXi g is called first central moment
i WD EfXi i g D EfXi g i D 0: (C2) (2) The second moment of a continuous random vector ŒXi for all i D 1; : : : ; n of a probability density f .x1 ; : : : ; xn / is the mean matrix Z ij WD EfXi Xj g D
Z
C1 1
:::
C1 1
xi xj f .x1 ; : : : ; xn /dx1 dxn :
(C3)
The second moment related to the random vector Œei WD ŒXi EfXi g is called variance - covariance matrix or dispersion matrix Œij , especially variance or dispersion 2 for i D j or covariance for i ¤ j :
i i D i2 D V fXi g D DfXi g WD Ef.Xi i /2 g
(C4)
ij D ij D C fXi ; Xj g WD Ef.Xi i /.Xj j /g
(C5)
Dfxg D Œij D ŒC fXi ; Xj g D Ef.x Efxg/.x Efxg/0 g:
(C6)
x WD ŒX1 ; : : : ; Xn 0 is a collection of the n 1 random vector. The random variables Xi ; Xj for i ¤ j are called with respect to the central moment of second order uncorrelated if ij D 0fori ¤ j . (3) The third central moment with respect to the random vector Œei WD ŒXi EfXi g defined by
C-1 Moments of a Probability Distribution, the Gauss–Laplace Normal
ij k WD Efei ej ek g D Ef.Xi i /.Xj j /.Xk k /g
755
(C7)
contains for i D j D k the vector of obliquity with the components i WD Efe3i g
(C8)
Uncorrelation with respect to the central moment up to third order is defined by Efeni11 eni22 eni33 g D
3 Y j D1
2 n Efeijj g 4
81 i1 ¤ i2 ¤ i3 n and 0 n1 C n2 C n3 3:
(C9)
(4) The fourth central moment relative to the random vector Œei WD ŒXi EfXi g
ij k` WD Efei ej ek e` g D Ef.Xi i /.Xj j /.Xk k /.X` ` /g
(C10)
leads for i1 D i2 D i3 D i4 to the vector of curtosis with the components i WD Efe4i g 3fi2 g2
(C11)
Uncorrelation with respect the central moments up the fourth order is defined by Efeni11 eni22 eni33 eni44 g D
4 Y j D1
2 n Efeijj g
4
81 i1 ¤ i2 ¤ i3 ¤ i4 n und 0 n1 C n2 C n3 C n4 4 :
(C12)
(5) The central moments of the nth order relative to the random vector Œei WD ŒXi EfXi g are defined by
i1 :::in WD Efei1 : : : ein g D Ef.Xi1 i1 / : : : .Xin in /g
(C13)
A special distribution is the Gauss-Laplace normal distribution of random vectors in Rn : Note that alternative distributions on manifolds exist in large numbers, for instance the von Mises distribution on S2 or the Fisher distribution on S3 of Chap. 7. Definition C.2. (Gauss-Laplace normal distribution): An n 1 random vector x WD ŒX1 ; : : : ; Xn 0 is a Gauss-Laplace normal distribution if its probability density f .x1 ; : : : xi / has the representation 1 f .x/ D .2 /n=2 j˙ j1=2 expŒ .x Efxg/0˙ 1 .x Efxg/ 2
(C14)
756
C Statistical Notions, Random Events and Stochastic Processes
Symbolically we can write x N .; ˙ /
(C15)
with the moment of first order or mean vector WD Efxg and the central moment of second order or variance – covariance matrix ˙ WD Ef.x Efxg/.x Efxg/0 g: The moments of the Gauss-Laplace normal distribution are given next. Lemma C.3. (moments of the Gauss-Laplace normal distribution): Let the n 1 random vector x WD ŒX1 ; : : : ; Xn 0 follow a Gauss-Laplace normal distribution. Then all central moments of odd order disappear and the central moments of even order are product sums of the central moments of second order exclusively. Efei1 : : : ein g D 08n D 2m C 1; m D 1; : : : ; 1 Efei1 : : : ein g D
fct.i21 ; i1 i2 ; : : : ; i22 ; : : : ; i2n /8n
(C16)
D 2m; m D 1; : : : ; 1 (C17)
ij D ij ; i i D i2 ;
(C18)
ij k D 0;
(C19)
ij k` D ij k` C i k j ` C i ` j k ;
(C20)
i ijj D i i jj C 2 ij ij D i2 j2 C 2ij2 ; i i i i D 3.i2 /2
(C21)
ij k`m D 0;
(C22)
i1 i2 :::i2m2 i2m1 i2m D i1 i2 :::i2m2 i2m1 i2m C i1 i2 :::i2m3 i2m1 i2m2 i2m C : : : C i2 i3 :::i2m1 i1 i2m :
(C23)
The vector of obliquity WD Œ 1 ; : : : ; m 0 and the vector of curtosis WD 0 Œ1 ; : : : ; m vanish. A weaker assumption compared to the Gauss-Laplace normal distribution is the assumption that the central moments up to the order four are of the form (C18)– (C21). Thus we allow for a larger class of distributions which have a similar structure compared to (C18)–(C21). Definition C.4. (quasi-Gauss-Laplace normal distribution): A random vector x is quasi-Gauss-Laplace normally distributed if it has a continuous symmetric probability distribution f .x/ which allows a representation of its central moments up to the order four of type (C18)–(C21). Of special importance is the computation of error bounds for the Gauss-Laplace normal distribution, for instance. As an example, we have the case called the 3
C-2 Error Propagation
757
rule which states that the probability for the random variable for X falling away, from its mean by more than three standard deviations (SDs) is at most 5%, P fjX j 3g
4 < 0:05: 81
(C24)
Another example is the Gauss-Laplace inequality, that bounds the probability for the deviation from the mode . Lemma C.5. (Gauss inequality): The expected squared deviation from the mode is p 4 2 forallr 4=3 9r 2 p p P fjX j rg 1 .r= 3/forallr 4=3
P fjX j rg
(C25) (C26)
subject to 2 WD Ef.X /2 g. Alternatively, we take advantage of the Vysochanskii–Potunin inequality. Lemma C.6. (Vysochanskii–Potunin inequality): The expected squared deviation from an arbitrary point ˛ 2 is p 4 2 forallr 8=3 9r 2 p 4 2 1 P fjX ˛j rg 2 forallr 8=3 3r 3
P fjX ˛j rg
(C27) (C28)
subject to 2 WD Ef.X ˛/2 g: References about the two inequalities are Gauss (1823), Pukelsheim (1994).
C-2 Error Propagation At the beginning we note some properties of operators “expectation E” and “dispersion D”. Those derivatives can be taken from Pukelsheim (1993) and Teuniseen (1989b, 1990). Afterwards we review the special and general, in particular nonlinear error propagation. Lemma C.7. (expectation operators EfXg ): E is defined as a linear operator in the space of random variables in Rn , also called expectation operator. For arbitrary constants ˛; ˇ; ı 2 R there holds the identity
758
C Statistical Notions, Random Events and Stochastic Processes
Ef˛Xi C ˇXi C ıg:
(C29)
Let A and B be two m n and m ` matrices and ı an m 1 vector of constants x WD ŒXi ; : : : ; Xn 0 and y WD ŒYi ; : : : ; Yn 0 two n 1 and ` 1 random vectors such that EfAX C BX C ıg D AEfxg C BEfyg C ı (C30) holds. The expectation operator E is not multiplicative that is EfXi Xj g D EfXi gEfXj g C C fXi ; Xj g ¤ EfXi gEfXj g;
(C31)
if Xi and Xj are correlated. Lemma C.8. (special error propagation): Let y be an n 1 dimensional random vector which depends linear of the m 1 dimensional random vector x by means of a constant n m dimensional matrix A and of a constant dimensional vector ı of the form y WD Ax C ı. Then hold the “error propagation law” Dfyg D DfAx C ıg D ADfxgA0 :
(C32)
The dispersion function D is not addition, in consequence a nonlinear operator that is DfXi C Xj g D DfXi g C DfXj g C 2C fXi ; Xj g ¤ DfXi g C DfXj g;
(C33)
if Xi and Xj are correlated. The “special error propagation law” holds for a linear transformation x ! y D Ax C ı. The “general nonlinear error propagation” will be presented by Lemmas C.7 and C.8. The detailed proofs are taken from Chap. 8, Examples. Corollary C.9. (“nonlinear error propagation”): Let y D g.x/ be a scalar valued function between one random variable x and one random variable y. g(x) is assumed to allow a Taylor expansion around the fixed approximation point 0 W g.x/ D g.0 / C g 0 .0 /.x 0 / C 12 g 00 .0 /.x 0 /2 C O.3/ D 0 C 1 .x 0 / C 2 g.0 /2 C O.3/ :
(C34)
C-2 Error Propagation
759
Then the expectation and dispersion identities hold 1 Efyg D g.x / C g 00 .0 /x2 C O .3/ D g.x / C 2 x2 C O .3/; 2
(C35)
Dfyg D 12 x2 C 4x2 Œ1 2 .x 0 / C 22 .x 0 /2 22 x4 CEf.x x /3 gŒ21 2 C 422 .x 0 / C Ef.x x /4 g22 C O .3/: (C36) For the special case of a fixed approximation point 0 chosen to coincide with mean value x D 0 and x being quasi - Gauss-Laplace normal distributed we arrive at the identities 1 Efyg D g.x / C g 00 .x /x2 C O .4/; 2 1 Dfyg D Œg 0 ./2 x2 C Œg 00 .x /2 x4 C O .3/: 2
(C37) (C38)
The representation (C37) and (C38) characterize the nonlinear error propagation which is in general dependent of the central moments of the order two and higher, especially of the obliquity and the curtosis. Lemma C.10. (“nonlinear error propagation”): Let y D f.x/ be a vector-valued function between the m 1 random vector x and the n 1 random vector y. g.x/ is assumed to allow a Taylor expansion around the m 1 fixed approximation vector 0 W g.x/ D g. 0 / C g0 . 0 /.x 0 / C 12 g00 . 0 /Œ.x 0 / ˝ .x 0 / C O .3/ D 0 C J.x 0 / C 12 HŒ.x 0 / ˝ .x 0 / C O.3/: (C39) With the n m Jacobi matrix J WD ŒJj gi . 0 / and the n m2 Hesse matrix H WD ŒvecH1 ; : : : ; vecHn 0 ; Hi WD Œ@j @k gi . 0 /.i D 1; : : : ; nI j; k D 1; : : : ; m/
(C40)
there hold the following expectation and dispersion identities (“nonlinear error propagation”) 1 (C41) Efyg D y D g.x / C Hvec˙ C O .3/ 2
760
C Statistical Notions, Random Events and Stochastic Processes
1 Dfyg D † y D J †J 0 C J Œ† ˝ .x 0 /0 C .x 0 /0 ˝ †H0 2 1 C HŒ† ˝ .x 0 / C .x 0 / ˝ †J0 2 1 C H Œ† ˝ .x 0 /.x 0 /0 C .x 0 /.x 0 /0 ˝ † 4 C.x 0 / ˝ † ˝ .x 0 /0 C .x 0 /0 ˝ † ˝ .x 0 /H0 1 C J Ef.x x /.x x /0 ˝ .x x /0 gH0 2 1 C H Ef.x x / ˝ .x x /.x x /0 gJ0 2 1 C H ŒEf.x x /.x x /0 ˝ .x x /g .x 0 /0 4 C.x 0 / Ef.x x /0 ˝ .x x /.x x /0 g C.I ˝ .x 0 // Ef.x x /.x x /0 ˝ .x x /0 g CEf.x x / ˝ .x x /.x x /0 g ..x 0 /0 / ˝ I/H0 1 C H f.x x /.x x /0 ˝ .x x /.x x /0 g 4 vec†.vec†/0 H0 C O .3/: (C42) In the special case that the fixed approximations vector 0 coincides to the mean vector x D 0 and the random vector x is quasi-Gauss-Laplace normally distributed the following identities hold: Efyg D y D g.x / C 12 Hvec† C O .4/ .C 43/ Dfyg D † y D J †J 0 14 Hvec†.vec†/0 H0 1 C H Ef.x x /.x x /0 ˝ .x x /.x x /0 g H0 C O .3/ 4 1 D J†J0 C HŒ† ˝ † C Ef.x x /0 ˝ † ˝ .x x /g H0 C O .3/: 4 (C44)
C-3 Useful Identities Notable identities about higher order moments are the following.
C-3 Useful Identities
761
Lemma C.11. (identities: higher order moments): (a) Kronecker-Zehfuss products #1 Efyy0 g D Ef.y Efyg/.y Efyg/0g C EfygEfyg0
(C45)
Efyy0 ˝ yg D Ef.y Efyg/.y Efyg/0 ˝ .y Efyg/g C Efyy0g ˝ Efyg C Efy ˝ Efyg0 ˝ yg CEfyg ˝ Efyy0 g 2Efyg ˝ Efyg0 ˝ Efyg 0
0
0
(C46)
0
Efyy ˝ yy g D G ‰ ˝ Efyg Efyg ˝ ‰ ‰ 0 ˝ Efyg Efyg ˝ ‰ 0 C Efyy0 g ˝ Efyy0g CEfy0 ˝ Efyy0 g ˝ yg C Efy ˝ ygEfy ˝ yg0 2EfygEfyg0 ˝ EfygEfyg0:
(C47)
(b) Kronecker products #2 ‰ W D Ef.y Efyg/.y Efyg/0 ˝ .y Efyg/g
(C48)
G W D Ef.y Efyg/.y Efyg/0 ˝ .y Efyg/.y Efyg/0g Ef.y Efyg/.y Efyg/0 ˝ Ef.y Efyg/.y Efyg/0 g Ef.y Efyg/0 ˝ Ef.y Efyg/.y Efyg/0 g ˝ .y Efyg/g Ef.y Efyg/ ˝ .y Efyg/gEf.y Efyg/0 ˝ .y Efyg/0g: (C49) The n2 n matrix ‰ contains the components of obliquity, the n2 n2 matrix G the components of curtosis relative to the n 1 central random vector y Efyg. (c) Covariances between linear and quadratic forms C fF1 y C d1 ; F2 y C d2 g WD Ef.F1 y C d1 EfF1 y C d1 g/.F2 y C d2 (C50) EfF2 y C d2 g/0 g D F1 †F 2 (linear error propagation) CfFy C d; y0 Hyg WD Ef.Fy C d EfFy C d g/.y0 Hy Efy0 Hyg/g D FŒEfyy0 ˝ y0 g EfygEfy0 ˝ y0 gvecH D
1 F‰ 0 vec.H C H0 / C F†.H C H0 /Efyg 2
(C51)
762
C Statistical Notions, Random Events and Stochastic Processes
C fy0 Gy; y0 Hyg WD Ef.y0 Gy Efy0 Gyg/.y0 Hy Efy0 Hyg/g D .vecG/0 ŒEfyy0 ˝ yy0 g Efy ˝ ygEfy0 ˝ y0 gvecH D
1 Œvec.G C G0 /0 Gvec.H C H0 / 4 1 Œvec.G C G0 /0 .‰ 0 ˝ Efyg C ‰ ˝ Efyg0 /vec.H C H0 / 2 1 C trŒ.G C G0 /†.H C H0 /† C Efyg0.G C G0 /†.H C H0 /Efyg: 2 (C52) (d) quasi-Gauss-Laplace-normally distributed data C fF1 y C ı; F2 y C ı2 g D F1 †F 2
(C53)
(independent from any distribution) C fFy C ı; y0 Hyg D F †.H C H0 /Efyg C fy0 Gy; y0 Hyg D
1 trŒ.G C G0 /˙ .H C H0 /˙ 2 CEfyg0 .G C G0 /˙ .H C H0 /Efyg:
(C54)
(C55)
C-4 Scalar – Valued Stochastic Processes of One Parameter We have previously reviewed the moments of a probability distribution up to order four. The deviation was based on integration from minus infinity to plus infinity, A different concept is needed if we work with distributions on manifolds. For instance, we are confronted with the problem to find a distributions for data on a circle or a sphere or on a hypersphere. As mentioned before, directional measurements, also called angular observations or longitudinal data are not Gauss-Laplace distributed, but enjoy a von Mises-Fisher distribution in S R C1 Example C.4.1: p=1 (von Mises, 1918) f . j; / D Œ2 I0 ./1 expŒ cos. /
Example C.4.2: p=2, (Fisher, 1953)
(C56)
C-4 Scalar – Valued Stochastic Processes of One Parameter
763
expŒcos ˚ cos ˚ cos. / C sin ˚ sin ˚ 4 sinh exp < jX > (C57) D 4 sinh
f . ; ˚ k ; ˚ ; / D
Here, we have introduced the circular normal distribution CN.; / parameterized by the mean direction .0 2 / and by the concentration parameter . > 0/, the reciprocal of a dispersion measure. The spherical normal distribution CN. ; ˚ ; / is parameterize by the “longitudinal direction, lateral mean direction . ; ˚ /” and the concentration parameter , the reciprocal dispersion measure. We note that the p-spherical normal distribution is an element of the exponential class. Most important examples of the circular normal (R. von Mises) distributions are phase observation within the Global Positioning System (“Global Problem Solver”) according to Cai et al.(2007) and the spherical normal (Fisher) distribution for spherical harmonic analysis-synthesis (Fourier-Legendre functions) describing the gravitational field within gravities, electrostatics and magnetostaties according to Fisher et al. (1981), Kent (1983), Mardia (1972), Man (2005), Mc Fadden and Jones (1981) and Roberts and Ursell (1960). There have been made studies to consider an ellipsoidal reference to analyze the proper reference field of celestial bodies like the Earth, the Moon, or any other planetary body. The ellipsoid-of-revolution is the equilibrium figure of type MacLaurin which represents to first order of the Earth, the Moon, Mars or other planets even the Jacobi triaxial reference figure is a better approximation following, for instance Chandra Sekhar (1969). We leave the question: what is the characteristic distribution of a ellipsoid-of-revolution or a triaxial ellipsoid open! A random event X is the result of a random experiment under a given condition. If these conditions change it might have an influence on the result of random experiment, namely to the probability distribution. If we consider the varying conditions on the random quantities X D X.t/, we arrive at the random effects which depend on the deterministic parameter t. We illustrate the new detailed definition of one parameter stochastic process. Example C.4.1 We measure the temperature of a point continuously by a sensor for 3 years. Naturally we model the sensor recording by a graph shown in Fig. C.1. Actually we find a random variable which is function of the continuous parameter “time t”, namely X.t/. Every year we receive a temperature profile which is a deterministic function of time t W x D x.t/; 0 t 1. In this form x.t/ is understood as realization of the random variable X.t/. Neglecting the influence of climate changes the function x D x.t/ is cyclic. We introduce the generalized random experiment “continuous measurement of temperature over one year” and refer it to fX.t/j0 t tg. Actually one could represent climate changes by a trend function. The character of a generalized random experiment is very obvious since for closely measuring points ti and ti C1 range in the second up to the minute.
764
C Statistical Notions, Random Events and Stochastic Processes
Fig. C.1 Graph of the function f(t) over a period of 1 year or 12 months. Two years recording: trajectories of the stochastic process
Example C.4.2 Alternatively we consider the length L=60 mm of Nylon wires and the design condition of a variance of 0.5 mm produced by one machine under equal conditions. If we analyzed the dependence of the diameter of a function of the distance t from the initial point, we receive for any measurement of the wire a function x D x.t/; 0 t L. These functions differ from wire to wire, but these is no significant difference between the functional different wires. Indeed we assume changes to the constant condition of the wire. The random variations of the diameter are in the range of ˙0.02 mm and caused by the experimental conditions. Let us refer to the random wire diameter as a function of the one parameter t, we are able to define by a “continuous measurement of the various diameter of a wire as a dependence of the initial point” the generalized random experiment and formalize it by fX.t/; 0 t Lg. In contrast to random experiments which resulted in real numbers, such as generalized random experiment generates random functions, also called stochastic process or random process. We have called our parameter “t” derived from the term “time”, but it can be any parameter for instance “position”, “placement”,“pressure” etc. Finally we are able to define a stochastic process of one parameter. Definition C.4.2: Stochastic process
C-5 Characteristic of One Parameter Stochastic Processes
765
A stochastic process of one parameter t 2 T , the parameter space, and the state z 2 Z, the state space, is defined as the set of random quantities fX.t/; t 2 T g . If the set of parameter is finite or measurable infinite, we refer to a stochastic process with discrete time. Such processes are summarized as a sequence of random quantities fX1 ; X2 ; : : : g. If the parameter t 2 T is interval, we refer to stochastic process with continuous time. A stochastic process fX.t/; t 2 Tg is called discrete if the state space z is a finite or measurable infinite. In contrast we refer to a continuous stochastic process if he state space is an interval.
Stochastic process
HH @ @HH @ HH HH @ HH @ R j H
discrete stochastic process with discrete time
discrete stochastic process with continuous time
continuous stochastic process with discrete time
continuous stochastic process with continuous time
If we consider various realization of X.t/ for all t 2 T , we call them trajectory of the stochastic process. For a discrete stochastic process with continuous time the trajectories are “step functions”
C-5 Characteristic of One Parameter Stochastic Processes The most important function in connection with stochastic process is the distribution function of Xt ; t 2 T, namely F .x/ D P .Xt x/. But with the set of the distributions fFt .x/; t 2 Tg a stochastic ic process is not uniquely determined. A complete characterization of a stochastic process it is necessary to have information on all n-type ft1 ; t2 ; ; tn1 ; tn g for the discrete set N WD f1; 2; ; n 1; ng with ti 2 T and the information of the distribution function of random vectors fX.t1 /; X.t2 /; ; X.tn1 /; X.tn /g, namely Ft1 ;t2 ;tn1 ;tn .x1 ; x2 ; ; xn1 ; xn / WD P .X.t1 / x1 ; ; X.tn / xn / The set of distribution functions will be introduced by Table C.5.1 as well as the trend function, the covariance function, the variance function, the correlation function and the symmetry condition.
766
C Statistical Notions, Random Events and Stochastic Processes
Table C.5.1 Characteristics of one parameter stochastic process “Distribution functions of an n-dimensional random vector” Ft1 ;t2 ;tn1 ;tn .x1 ; x2 ; ; xn1 ; xn / WD P fX.t1 / x1 ; X.t2 / x2 ; ; X.tn1 / xn1 ; X.tn / xn g “Trend function” EfXt g D .t/ for t 2 T C1 R xft .x/dx .t/ D
(C58)
(C59) (C60)
1
“Covariance function and stochastic process” C ov.s; t/ WD C ovfXs ; Xt g D EfŒXs EfXs gŒXt EfXt gg for all s; t 2 T
(C61)
C ov.s; t/ D EfXs ; Xt g .s/.t/
(C62)
“Variance function” VarfXs g DW C ov.t; t/
(C63)
“Correlation function” p p .s; t/ DW C ovfXs ; Xt g . VarfXs g VarfXt g/
(C64)
“Symmetry” C ov.s; t/ DW C ov.t; s/ lim C ov.s; t/ D lim .s; t/ D 0
jt sj!1
jt sj
(C65) (C66)
End of Table C.5.2 Of special importance are those stochastic processes which do not depend on the absolute values of ti , but are functions of the distances between the points. They are considered as functions of the relative positions. Definition 5.1 (Stationary in the narrow sense) A stochastic process fXt ; t 2 Tg is stationary in the narrow sense if for all n 2 f1; 2; g for arbitrary subject to ti 2 T and ti C h 2 T holds
C-5 Characteristic of One Parameter Stochastic Processes
767
Ft1 ;t2 ;tn1 ;tn .x1 ; x2 ; ; xn1 ; xn / D Ft1 Ch;t2 Ch;tn1 Ch;tn Ch .x1 ; x2 ; ; xn1 ; xn / (C67) A special property is for .t/ D EfX.t/g D .const/ VarfX.t/g D .const/
(C68) (C69)
End of Definition 5.1 Special case n D 2, for instance, t1 D 0; t2 D t s; h D s will lead us to F0;t s .X1 x2 / D Fs;t .x1 x2 / D F ./ for WD t s
(C70)
Cov.s; t/ D Cov.s; s C / D Cov./ for WD t s
(C71)
“Symmetry”:Cov./ D Cov./ D Cov.jj/
(C72)
lim Cov./ D 0
(C73)
j j!1
Another generalization is achieving of the treat stochastic process ”of higher order” Definition 5.2: (stationary in the wider sense): A stochastic process of second order is stationary in the wider sense if the conditions 1. .t/ D const and 2. Cov.jj/ D CovfXs ; Xt g End of Definition 5.2 Besides the condition of stationary another support property of a stochastic process is the set of conditions which hold for increments of stochastic process fX.t/; t 2 Tg in the interval Œt1 ; t2 for t1 < t2 , namely the random difference Xt 2 Xt 1 In general such increments may be negative. The concept of incremental stochastic processes will lead us later to the concept of Kolmogov’s structure functions which is very popular in the applied sciences. Definition 5.3: (Stationary increments): A stochastic process fX.t/; t 2 Tg is called homogenous or stationary incremental, if the increments fX.t2 C /g fX.t1 C /g for all have identical probability distributions for any fixed t1 and t2 End of Definition 5.3 The great advantage of the incremental stochastic process in applications has the origin in the fact that a stationary stochastic process is not necessary.
768
C Statistical Notions, Random Events and Stochastic Processes
Definition 5.4: (Stochastic processes with incremental properties): A stochastic process fX.t/; t 2 Tg has independent increment for all n D 3; 4; and values ft1 ; t2 ; ; tn1 ; tn g subject to the order ft1 < t2 < < tn1 < tn g and ti 2 T with increments Xt2 Xt1 ; Xt3 Xt2 ; ; Xtn1 Xtn2 ; Xtn Xtn1
(C74)
being independent. End of Definition 5.4 Try to consider that the point tn1 is placed in the future the point tn at present time, is general for n > 1 all points ft1 ; t2 ; ; tn1 ; tn g in the past. The event at tnC1 for a Markov process depends only on the events at tn : Definition 5.5: (Simple Markov process): A stochastic process fX.t/jt 2 Tg has the property of a Markov process is for all n 2 f1; 2; ; g if for all n 2 f1; 2; g.n C 1/ tuple ft1 ; t2 ; ; tn1 ; tn g ordered according to ft1 < t2 < < tn1 < tn g the probability relation P fXtnC1 2 ZnC1 jX.tn / 2 Zn ; X.tn1 / 2 Zn1 ; ; X.t1 / 2 Z1 g D P fX.tn1 / 2 Zn1 jX.tn / 2 Zn g
(C75)
End of Definition 5.5 Stochastic processes with the property of a simple Markov process is characterized by independent increments.
Lemma 5.6: Markov process A Markov process is in narrow sense stationary if and only if the one-dimensional probability is independent of the time such that F .x/ D P fXt tg D x for all t 2 T
(C76)
End of Lemma 5.6 Markov process with a finite state space are called Markov chains, otherwise they are called continuous Markov chains. Definition 5.7: (Continuity in the squared mean): A stochastic process of second order fX.t/jt 2 Tg located in the point t0 is in the squared mean continuous if
C-6 Simple Examples of One Parameter Stochastic Processes
lim E.ŒX.t0 C h/ X.t0 /2 / D 0
h!0
769
(C77)
The stochastic process fX.t/jt 2 Tg is in the domain T0 D T in the quadratic mean continuous if the stochastic process has this property for all t 2 T0 End of Definition 5.7 Lemma 5.8: Squared mean and its covariance function A stochastic process of second order fX.t/jt 2 Tg is continuous in the point t0 if its covariance function C ov.s; t/ is characterized by .s; t/ D .t0 t0 / and its trend function .t/ is continuous in the point t D t0 . End of Lemma 5.8
C-6 Simple Examples of One Parameter Stochastic Processes Simple examples of one parameter stochastic process are introduced divided in two types of stochastic processes, first with continuous time and second with discrete time: (i) (ii) (iii) (iv) (v) (vi) (vii) (viii) (ix) (x) (xi) (xii) (xiii)
Process with linear realizations Cosine oscillations with random amplitude Cosine oscillations with random amplitude and random phase Superposition of two uncorrelated random functions Impulse modulation Randomly retarded pulse modulation Random sequences with discrete signals Random sequences of moving averages of order n: MA.n/ Random sequences of moving averages of order n: M A.n/ Random sequences of unlimited order Autoregressive sequences of first order AR.1/ Autoregressive sequences of first order r: AR.r/ ARMA (r,s) models.
We present here simple examples of stochastic processes. Statements with respect to stationarity relate always to stationarity in the wide sense.
Ex 1: Processes with Linear Scalisations Examples of stochastic processes with continuous time
770
C Statistical Notions, Random Events and Stochastic Processes
Example 6.1 Process with linear realizations Many linear models can be characterized by the simple linear model X.t/ D V t C W , V and W are random effects with the mean EfX.t/jt 0g called trend functions and the variance-covariance function EfŒX.s/ EfX.s/gŒX.t/ EfX.t/g D EfŒVs C W ŒV t C wg .s/.t/ DW ˙.s; t/. A first result is the variance-covariance function oriented to our model. ˙.s; t/ D st Var.V / C .s C t/ Cov.V; W / C Var.W /
(C78)
A second result is the special case W = constant. In such a case we find the variancecovariance function ˙.s; t/ D st Var.V / End of Example 6.1
Ex 2: Cosine Oscillations with Random Amplitudes Example 6.2 Cosine oscillation with random amplitude Here we assume the cosine model X.t/ D A cos !t, where A is a non-negative random effect, the amplitude subject to EfAg < 1. Here we realize the trend function of oscillations of type EfX.t/g D EfAg cos !t. The variance-covariance function for such a simple model is given by EfŒX.s/EfX.s/gŒX.t/EfX.t/g D 2 EfŒA cos !sŒA cos !tg .s/.t/ D ŒEfAg2 EfAg .cos !s/.cos !t/ DW ˙.s; t/ Let us denote the variance 2 D Var.A/ such that we receive the variancecovariance function ˙.s; t/, namely ˙.s; t/ D 2 .cos !s/.cos !t/
(C79)
End of Example 6.2
Ex 3: Cosine Oscillations with Random Amplitudes and Phase Example 6.3 Cosine oscillation with random amplitude and random phase This time we assume the cosine model X.t/ D A cos.!t C ˚/, where A is a nonnegative random effect and finite variance. The phase is a random effect, equally distributed ˚ 2 Œ0; 2 , The two random effects A and ˚ are independently from each other in the statistical sense. Such a stochastic process fX.t/ 2 .1; C1/g can be considered as the outputs of the larger number of similar oscillators, arbitrarily chosen when applied at different time instances. Because of
C-6 Simple Examples of One Parameter Stochastic Processes
Efcos.!t C ˚/g D D
1 2
771
Z2
cos.!t C ˚/d˚ 0
1 Œsin.!t C ˚/2
0 D0 2
(C80)
We experience a zero trend function. According the variance-covariance function we may represent by ˙.s; t/ D EfŒA cos.!s C ˚/ŒA cos.!t C ˚/gd˚ 1 D EfA g 2
2
1 D EfA g 2
2
Z2
cos.!s C ˚/ cos.!t C ˚/d˚ 0
Z2
0
1 fcos.t s/ cosŒ!.s C t/ C 2˚gd˚ 2
(C81)
The first term characterizes a constant, while the second term approaches zero such the resulting variance-covariance function is proportional to the difference DW t s ˙./ D ˙.t s/ D
1 EfA2 g cos !t 2
(C82)
Such a process is stationary. End of Example 6.3
Ex 4: Superposition of Two Uncorrelated Random Function Example 6.4 Superposition of two uncorrelated random functions Let A and B be two uncorrelated random functions characterized by EfAg D EfBg D 0 and var.A/ D var.B/ D 2 < 1, The derived stochastic process is defined by fX.t/; t 2 .1; C1/g, namely by X.t/ D A cos !t C B sin !t
(C83)
Because of var.X.t// D 2 < 1 we have an example of random process of second order. Due to EfX.t/g D EfAg cos !t C EfBg sin !t D 0 (C84)
772
C Statistical Notions, Random Events and Stochastic Processes
Fig. C.2 Pulse demodulation of a binary system
We are able to compute the variance-covariance function ˙.s; t/ D EfX.t/X.s/g assuming EfA; Bg D EfAgEfBg D 0 due to uncorrelation ˙.s; t/ D EfA2 cos !s cos !t C B 2 sin !s sin !tg C EfAB cos !s sin !t C sin !s cos !tg D 2 .cos !s cos !t C sin !s sin !t/ C EfABg.cos !s sin !t C sin !s cos !t/ ˙.s; t/ D 2 cos !t for DW t s
(C85) (C86)
Again we gain a stochastic process which depends only on the difference t s DW which is stationary. End of Example 6.4
Ex 5: Pulse Modulation Example 6.5 Pulse modulation A source generates at each time interval T time units of independent from each other a sign “1” or “0” with the probability 1-p or . The transfer of a “1” or “0” appears in this way that T time units produce a pulse with the amplitude A. In this way they generate a random signal, namely a stochastic process fX.t/; t 2 .1; C1/g, with this property: X.t/ D
0 with probability A with probability .1 /:
We start from the partial sequence, for instance 1 0 1 1 0 1, illustrated by Fig. C.2. We have chosen the initial value t D 0 in such a way that we start the transmission of a message. Finally we review the trend function of the process as a constant and the covariance function as a non stationary process.
C-6 Simple Examples of One Parameter Stochastic Processes
773
Fig. C.3 Randomly retarded pulse demodulation of a binary system
EfX.t/g D AP fX.t/ D Ag C OP fX.t/ D 0g D A.1 p/
(C87)
nT s; t < .n C 1/T; n D 0; ˙1; ˙2; EfX.s/; X.t/g D EfX.s/; X.t/jX.s/ D AgP fX.s/jX.s/ D Ag C EfX.s/; X.t/jX.s/ D 0gP fX.s/jX.s/ D 0g D A2 .1 p/
(C88)
For m ¤ n, the process X.s/ and X.t/ are independent due to mT s .m C 1/T and nT t < .n C 1/T . Accordingly the variance-covariance function is not stationary: ˙.s; t/ D Cov.s; t/ 2 A p.1 p/ for all nT t < .n C 1/T; for all n D 0; ˙1; ˙2; D 0 otherwise: (C89) End of Example 6.5
Ex 6: Randomly Retarded Pulse Modulation

Example 6.6 (Randomly retarded pulse modulation). With regard to our previous example of a stochastic process {X(t), t ∈ (−∞,+∞)}, we consider another stochastic process {Y(t) = X(t − D), t ∈ (−∞,+∞)}, assuming that D ∈ [0,T] is a random variable: a trajectory of {X(t), t ∈ (−∞,+∞)} is shifted to the right by D, producing the other stochastic process {Y(t), t ∈ (−∞,+∞)}. We illustrate the transfer by the random sequence 101101 under the constraint D = d. The trend function of such a process is characterized by E{X(t − D)} = A(1−p), as we have seen before. Y(s) and Y(t) are independent if and only if the inequality |t − s| > T holds, or if s and t are separated by one of the jump points nT + D for n = 0, ±1, ±2, .... Let us call this random event B; the complementary event is denoted by B̄. The related probability values are P(B) = (t−s)/T and P(B̄) = 1 − (t−s)/T. For all |t−s| ≤ T the variance-covariance function is represented by

Σ(s,t) = E{Y(s)Y(t) | B}P{B} + E{Y(s)Y(t) | B̄}P{B̄} − E{Y(s)}E{Y(t)}
       = E{Y(s)Y(t) | B}P{B} + E{[Y(s)]²}P{B̄} − E{Y(s)}E{Y(t)}
       = [A(1−p)]²|t−s|/T + A²(1−p)(1 − |t−s|/T) − [A(1−p)]²   (C90)

Σ(τ := t−s) = A²p(1−p)(1 − |τ|/T) for |τ| ≤ T; 0 otherwise.   (C91)

Fig. C.4 Covariance function of a randomly retarded pulse modulation

The random process of second order {Y(t), t ∈ (−∞,+∞)} is therefore stationary, illustrated by its variance-covariance function in Fig. C.4.

Examples of stochastic processes with discrete time: stochastic processes with a discrete time scale can be understood as random sequences. Here we restrict ourselves to stationary random sequences as a first order model. They play a dominant role in signal transformation and signal transfer.
End of Example 6.6
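The triangular covariance (C91) can be reproduced by a direct simulation. Below is a small Python sketch (an illustration only; the slot bookkeeping via floor indices is our own construction, and all parameter values are arbitrary). Each realization draws a uniform delay D, assigns an independent Bernoulli amplitude to each pulse slot, and the empirical Cov(τ) is compared with A²p(1−p)(1 − |τ|/T).

import numpy as np

rng = np.random.default_rng(2)
A, p, T = 2.0, 0.3, 1.0
n_real = 400_000
t = 5.0                                      # arbitrary evaluation epoch

def sample_pair(tau):
    D = rng.uniform(0.0, T, n_real)          # random delay, uniform on [0, T]
    k1 = np.floor((t - D) / T).astype(int)   # pulse-slot index at time t
    k2 = np.floor((t + tau - D) / T).astype(int)
    # independent Bernoulli amplitude per slot; reuse the value when slots agree
    b1 = rng.random(n_real) < (1 - p)
    b2 = np.where(k1 == k2, b1, rng.random(n_real) < (1 - p))
    return A * b1, A * b2

for tau in (0.0, 0.25, 0.5, 1.5):
    y1, y2 = sample_pair(tau)
    emp = np.mean(y1 * y2) - (A * (1 - p))**2
    theo = A**2 * p * (1 - p) * max(0.0, 1 - abs(tau) / T)
    print(f"tau={tau:4.2f}  empirical {emp:7.4f}  theoretical {theo:7.4f}")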
Ex 7: Random Sequences with Discrete Signals

Example C.6.7 (Random sequences with discrete signals). Let {..., X₋₂, X₋₁, X₀, X₁, X₂, ...} be a random sequence of uncorrelated, identically distributed random variables characterized by E{Xᵢ} = 0, Var{Xᵢ} = σ² < ∞, i ∈ {0, ±1, ±2, ...}. Due to the restricted range of variances we model a random process of second order; its trend function is zero:

E{Xᵢ} = 0 for all i ∈ {0, ±1, ±2, ...}, cov(τ) = σ² for τ = 0; 0 for τ ≠ 0.   (C92)

Its covariance function for integer indices s and t is characterized by the form Cov(s,t) = E{XₛXₜ}, such that Cov(s,t) = E{Xₛ}E{Xₜ} = 0 for s ≠ t and Cov(s,t) = E{X²} = σ² for s = t. Obviously, besides the property of stationarity of random sequences, the property of independent increments is important.
End of Example C.6.7
Ex 8: Random Sequences of Moving Averages of Order n: MA(n)

Example C.6.8 (Random sequences of moving averages of order n: MA(n)). A random signal may be given by

Y_t = Σ_{i=0}^{n} c_i X_{t−i} for all t ∈ {0, ±1, ±2, ...}

where {Xᵢ} is a random sequence of independent effects with parameters E{Xᵢ} = 0, Var{Xᵢ} = σ². The natural number n and the sequence of bounded real numbers c₀, c₁, ..., c_{n−1}, c_n are given. The construction of Y_t is influenced by the momentary datum X_t as well as the n previous values X_{t−1}, ..., X_{t−n}. Such a relation is called the principle of moving averages, characterized by the process of second order {Y_t, t ∈ {0, ±1, ±2, ...}}:

Var{Y_t} = σ² Σ_{i=0}^{n} c_i² < ∞ for all t ∈ {0, ±1, ±2, ...}   (C93)

"trend function": E{Y_t} = 0 for all t ∈ {0, ±1, ±2, ...}   (C93)

"covariance function":

cov(s,t) = E{Y_s Y_t} = E{[Σ_{i=0}^{n} c_i X_{s−i}][Σ_{j=0}^{n} c_j X_{t−j}]} = E{Σ_{i=0}^{n} Σ_{j=0}^{n} c_i c_j X_{s−i} X_{t−j}}   (C94)

"if |t−s| > n, then E{X_{s−i}X_{t−j}} = 0 because s−i ≠ t−j"; "if s−i = t−j, then"

cov(s,t) = E{Σ_{0≤i≤n, 0≤|t−s|+i≤n} c_i c_{|t−s|+i} X²_{s−i}} = σ² Σ_{i=0}^{n−|t−s|} c_i c_{|t−s|+i}   (C95)

As a summary, the covariance function cov(s,t) = cov(τ) subject to τ := t − s is, as for a sequence of moving averages, stationary:

cov(τ) = σ²(c₀c_{|τ|} + c₁c_{|τ|+1} + ⋯ + c_{n−|τ|}c_n) for 0 ≤ |τ| ≤ n; 0 for |τ| > n.   (C96)

End of Example C.6.8
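A quick numerical cross-check of (C96) (an illustration only; the coefficients are hypothetical): generate a long MA(n) sequence and compare the empirical autocovariances with the finite sum σ² Σ cᵢc_{|τ|+i}.

import numpy as np

rng = np.random.default_rng(3)
sigma = 1.0
c = np.array([0.7, -0.4, 0.25, 0.1])          # hypothetical coefficients, n = 3
n = len(c) - 1
N = 1_000_000

X = rng.normal(0.0, sigma, N + n)
Y = np.convolve(X, c, mode="valid")           # MA(n) sequence of length N

for tau in range(n + 2):
    emp = np.mean(Y[: len(Y) - tau] * Y[tau:])
    theo = sigma**2 * np.sum(c[: n + 1 - tau] * c[tau:]) if tau <= n else 0.0
    print(f"tau={tau}: empirical {emp:8.5f}  theoretical {theo:8.5f}")

(The autocovariance of a moving average is unchanged by reversing the coefficient order, so the convolution orientation does not matter here.)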
Ex 9: Random Sequences with Constant Coefficients of Moving Averages of Order n: MA(n)

Example C.6.9 (Random sequences of moving averages of order n with constant coefficients). A special case arises if the coefficients cᵢ of our previous example are set to cᵢ = 1/(n+1) for i ∈ {0, 1, ..., n−1, n}. The sequence of moving averages of order n is then defined by

Y_t = 1/(n+1) Σ_{i=0}^{n} X_{t−i} for all t ∈ {0, ±1, ±2, ...}   (C97)

cov(τ) = σ²/(n+1) · (1 − |τ|/(n+1)) for 0 ≤ |τ| ≤ n; 0 otherwise.   (C98)

"MA" abbreviates "moving average" of order n.
End of Example C.6.9
Ex 10: Random Sequences of Unlimited Order

Example C.6.10 (Random sequences of unlimited order). Let there be given the random sequence of unlimited order of type

Y_t = Σ_{i=0}^{∞} c_i X_{t−i} for all t ∈ {0, ±1, ±2, ...}   (C99)

for real numbers cᵢ. In order to guarantee the convergence of the series we assume

Σ_{i=0}^{∞} c_i² < ∞   (C100)

Its related covariance function can be represented as

Cov(τ) = Σ(τ) = σ² Σ_{i=0}^{∞} c_i c_{|τ|+i} for all τ ∈ {0, ±1, ±2, ...}   (C101)

Var(Y_t) = Σ(0) = σ² lim_{J→∞} Σ_{i=0}^{J} c_i² for all t ∈ {0, ±1, ±2, ...}   (C102)

The construction extends to coefficients running over all integers,

Y_t = Σ_{i=−∞}^{+∞} c_i X_{t−i} for all t ∈ {0, ±1, ±2, ...}, Cov(s,t) = Σ(τ) = σ² Σ_{i=−∞}^{+∞} c_i c_{|τ|+i}.

The result is a two-sided sequence of moving averages.
End of Example C.6.10
Ex 11: Autoregressive Sequence of First Order: AR(1)

Example C.6.11 (Autoregressive sequences of first order: AR(1)). The starting point is the autoregressive sequence of first order AR(1) of type

Y_t = aY_{t−1} + bX_t for all t ∈ {0, ±1, ±2, ...}   (C103)

limited by |a| < 1 and b, as well as the random sequence {X_t}. The instant state Y_t depends only on the previous state Y_{t−1} and a random disturbance X_t subject to E{X_t} = 0 and E{(X_t − E{X_t})²} = σ², so that Var{bX_t} = b²σ². Applying the recursion n times we gain

Y_t = aⁿY_{t−n} + b Σ_{i=0}^{n−1} aⁱ X_{t−i}   (C104)

lim_{n→∞} aⁿ = 0 ⇒ Y_t = b Σ_{i=0}^{∞} aⁱ X_{t−i}   (C105)

"Special case" cᵢ := baⁱ:

lim_{J→∞} Σ_{i=0}^{J} (baⁱ)² = b² lim_{J→∞} Σ_{i=0}^{J} a²ⁱ = b²/(1−a²) < ∞   (C106)

For an autoregressive sequence of first order of a stationary random process we arrive at the characteristic covariance function

Cov(τ) = (bσ)² lim_{J→∞} Σ_{i=0}^{J} aⁱ a^{|τ|+i} = a^{|τ|}(bσ)² lim_{J→∞} Σ_{i=0}^{J} a²ⁱ   (C107)

Cov(τ) = (bσ)² a^{|τ|}/(1−a²) for τ = 0, ±1, ±2, ...   (C108)

End of Example C.6.11
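The geometric decay (C108) can be verified by simulating the recursion (C103) directly. A minimal Python sketch, with arbitrary parameter values:

import numpy as np

rng = np.random.default_rng(4)
a, b, sigma = 0.8, 1.0, 1.0
N, burn = 1_000_000, 1_000

X = rng.normal(0.0, sigma, N + burn)
Y = np.empty(N + burn)
Y[0] = 0.0
for t in range(1, N + burn):                 # Y_t = a Y_{t-1} + b X_t, see (C103)
    Y[t] = a * Y[t - 1] + b * X[t]
Y = Y[burn:]                                 # discard the initial transient

for tau in range(4):
    emp = np.mean(Y[: N - tau] * Y[tau:])
    theo = (b * sigma) ** 2 * a ** tau / (1 - a ** 2)    # (C108)
    print(f"tau={tau}: empirical {emp:7.4f}  theoretical {theo:7.4f}")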
Ex 12: Autoregressive Sequence of Order r: AR(r)

Example C.6.12 (Autoregressive sequences of order r: AR(r)). An autoregressive sequence of order r, called "AR(r)", is the random sequence Y_t, t = 0, ±1, ±2, ..., defined over a set of real numbers a₁, a₂, ..., a_{r−1}, a_r, namely

Y_t + a₁Y_{t−1} + a₂Y_{t−2} + ⋯ + a_{r−1}Y_{t−(r−1)} + a_rY_{t−r} = bX_t   (C109)

The set {Xᵢ} is a random sequence with parameters E{Xᵢ} = 0, Var{Xᵢ} = σ². Is the setup Y_t = lim_{J→∞} Σ_{i=0}^{J} cᵢX_{t−i} with Σ_{i=0}^{∞} cᵢ² < ∞, defining a stationary sequence, transformable into an autoregressive sequence of order r? For solving this problem we have to solve the relations between the constants:

c₀ = b
c₁ + a₁c₀ = 0
c₂ + a₁c₁ + a₂c₀ = 0
⋯
c_r + a₁c_{r−1} + ⋯ + a_rc₀ = 0
cᵢ + a₁c_{i−1} + ⋯ + a_rc_{i−r} = 0 for i > r   (C110)

A non-trivial solution of these equations leads us to the algebraic equation

y^r + a₁y^{r−1} + ⋯ + a_{r−1}y + a_r = 0   (C111)

Example: y² + a₁y + a₂ = 0. For an autoregressive sequence of second order, namely r = 2, we study the equation of quadratic order, namely its eigenvalues λ₁ and λ₂. For the case λ₁ ≠ λ₂ we compute the covariance function

Cov(τ) = Cov(0) · [(1−λ₂²)λ₁^{|τ|+1} − (1−λ₁²)λ₂^{|τ|+1}] / [(λ₁−λ₂)(1+λ₁λ₂)]   (C112)

Alternatively, for the case λ₁ = λ₂ = λ we calculate the covariance function

Cov(τ) = Cov(0) · [1 + ((1−λ²)/(1+λ²))|τ|] λ^{|τ|} for τ = 0, ±1, ±2, ...   (C113)

subject to Cov(0) = Var{Y_t},

Cov(0) = b²σ²(1+a₂) / {(1−a₂)[(1+a₂)² − a₁²]}   (C114)

If λ₁ and λ₂ are complex, then there exist ρ and ω subject to

λ₁ = ρ exp(iω), λ₂ = ρ exp(−iω)   (C115)

Cov(τ) = Cov(0) · α ρ^{|τ|} sin(ω|τ| + β) for τ = 0, ±1, ±2, ...   (C116)

α := 1/sin β, β = arctan[((1+ρ²)/(1−ρ²)) tan ω]   (C117)

Example (numerical): Let us introduce the numerical example for the autoregressive sequence

Y_t − 0.6Y_{t−1} + 0.05Y_{t−2} = 2X_t for t = 0, ±1, ±2, ...   (C118)

subject to

Var{X_t} = 1   (C119)

Obviously the impact of the term at the time instant t−2 on Y_t is small, at least compared to the term at the time instant t−1. The related algebraic equation of second order can be represented by

y² − 0.6y + 0.05 = 0   (C120)

Its solutions are λ₁ = 0.1 and λ₂ = 0.5. Indeed, compared to one, they are small. The solution leads to a stationary autoregressive sequence characterized by the covariance function

Cov(τ) = 7.017 · (0.5)^{|τ|} − 1.063 · (0.1)^{|τ|} for τ = 0, ±1, ±2, ..., Cov(0) = Var{Y_t} = 5.954   (C121)

More details can be taken from Andel (1984).
End of Example C.6.12
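The printed numbers can be reproduced from the reconstructed closed forms (C112) and (C114). The following short computation (our own check, not from the book) recovers the roots 0.5 and 0.1, the weights 7.017 and 1.063, and Cov(0) = 5.954:

import numpy as np

a1, a2, b, var_x = -0.6, 0.05, 2.0, 1.0
roots = np.sort(np.roots([1.0, a1, a2]).real)   # roots of y^2 + a1 y + a2 = 0
l2, l1 = roots                                  # 0.1 and 0.5

cov0 = b**2 * var_x * (1 + a2) / ((1 - a2) * ((1 + a2)**2 - a1**2))   # (C114)
c1 = cov0 * (1 - l2**2) * l1 / ((l1 - l2) * (1 + l1 * l2))            # weight of l1^|tau|
c2 = cov0 * (1 - l1**2) * l2 / ((l1 - l2) * (1 + l1 * l2))            # weight of l2^|tau|
print(f"roots {l1:.2f}, {l2:.2f};  Cov(0) = {cov0:.3f}")              # 0.50, 0.10; 5.954
print(f"Cov(tau) = {c1:.3f}*(0.5)^|tau| - {c2:.3f}*(0.1)^|tau|")      # 7.017, 1.063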
Ex 13: ARMA(r,s) Models

Example C.6.13 (ARMA(r,s) models).
As a combination of random sequences of moving averages and of autoregressive sequences we introduce the ARMA(r,s) model of order (r,s):

Y_t + a₁Y_{t−1} + ⋯ + a_{r−1}Y_{t−(r−1)} + a_rY_{t−r} = b₀X_t + b₁X_{t−1} + ⋯ + b_{s−1}X_{t−(s−1)} + b_sX_{t−s}   (C122)

In practice, ARMA models are used to model time series, i.e. sequences of real numbers determined by observing state variables at discrete time instants. Very often, time series are not realizations of stationary random sequences of type {Y_t, t = 0, ±1, ±2, ...} with constant trend functions E{Y_t} = μ(t), but the transformation

Z(t) = Y(t) − E{Y(t)}   (C123)

with reduced mean

E{Z_t} = E{Y_t − E{Y_t}} = 0   (C124)

and reduced covariance matrix

D{Z} = E{[Y_{t₁} − E{Y_{t₁}}][Y_{t₂} − E{Y_{t₂}}]} = cov(t₁, t₂)   (C125)

leads approximately to a proper fit of ARMA models. They are called trend-reduced autoregressive moving average sequences, being "nearly stationary". Very efficient software exists nowadays.
End of Example C.6.13: ARMA(r,s)
C-7 Wiener Processes

First, we define a Wiener process by three axioms, namely by stationary increments of a random function, together with path continuity. The relation to a random walk is outlined. Especially the probability density of an ordered n-dimensional Wiener process is reviewed. Secondly, special Wiener processes of type (a) Ornstein-Uhlenbeck, (b) Wiener process with drift and (c) integrated Wiener process are introduced.

C-71 Definition of the Wiener Processes

The English botanist R. Brown published in the year 1828 the random movement of microscopically small organic and inorganic particles suspended in liquids, illustrated in Fig. C.5. We observe a totally irregular motion of particles, nowadays called Brownian motion. Its first analytical treatment we owe to Bachelier (1900) and
Fig. C.5 Trajectory of a two dimensional Brownian motion of 30 s distance (Perrin 1916)
Einstein (1905). Both found that a two-dimensional Gauss-Laplace distribution can model the typical Brownian motion. They spoke of a chaotic movement of microscopically small particles. Wiener (1923) built up his theory of stochastic processes with independent increments to model this Brownian motion, which we will define first.

Definition C.7.1 (Wiener process): A stochastic process {X(t), t ≥ 0} with continuous time and the state space Z = (−∞,+∞) is called a Wiener process when it can be described as a path-continuous process by the following axioms:

(i) X(0) = 0   (C126)
(ii) {X(t), t ≥ 0} has stationary independent increments   (C127)
(iii) for all t > 0, X(t) is Gauss-Laplace normally distributed   (C128)

End of Definition C.7.1: Wiener process

Because of the stationarity of the increments, the differences X(t) − X(s) for all s, t ≥ 0 are Gauss-Laplace normally distributed with expectation E{X(t) − X(s)} = 0 and variance σ²|t−s|. Since the increments are independent, the Wiener process is a Markov process. If σ = 1, then we call the process a "standard Wiener process".

Continuity and differentiability:

E{|X(t) − X(s)|²} = Var{X(t) − X(s)} = σ²|t−s|   (C129)

lim_{h→0} E{|X(t+h) − X(t)|²} = lim_{h→0} σ²|h| = 0   (C130)

"In the mean square sense the Wiener process is continuous."

Wiener process and the random walk
There is an intimate relation between the Wiener process and the "random walk" of a particle. The random walk of a particle can be described by

X(t) = (X₁ + X₂ + ⋯ + X_{[t/Δt]})Δx   (C131)

"Starting at x = 0, in each time step Δt the particle moves by the length unit Δx to the left or to the right with probability 1/2 each":

Xᵢ = +1 if the jump is to the right, −1 if the jump is to the left   (C132)

"[t/Δt]" is the largest integer smaller than or equal to t/Δt.

P(Xᵢ = +1) = P(Xᵢ = −1) = 1/2   (C133)

E{Xᵢ} = 0 and Var{Xᵢ} = 1 ⇒ E{X(t)} = 0 and Var{X(t)} = (Δx)²[t/Δt]

Δt → 0, Δx → 0 with (Δx)²/Δt = σ² ⇒ E{X(t)} = 0, Var{X(t)} = σ²t
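A small numerical sketch of this diffusion limit (not from the book; parameter values are arbitrary): with the scaling (Δx)²/Δt = σ², the variance of the walk at a fixed time t approaches σ²t as Δt → 0.

import numpy as np

rng = np.random.default_rng(6)
sigma, t = 1.3, 2.0
n_real = 200_000

for dt in (0.1, 0.01, 0.001):
    n_steps = int(t / dt)
    dx = sigma * np.sqrt(dt)                  # keeps (dx)^2 / dt = sigma^2
    k = rng.binomial(n_steps, 0.5, n_real)    # number of +1 jumps among n_steps
    X_t = dx * (2.0 * k - n_steps)            # walk position after n_steps jumps
    print(f"dt={dt:6.3f}: Var(X(t)) = {X_t.var():.4f}  (target {sigma**2 * t:.4f})")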
Multidimensional distribution and conditional probability: Let {X(t), t ≥ 0} be a Wiener process. We ask the question what is the distribution of the sum of independent Gauss-Laplace normally distributed incremental quantities of type

X(tᵢ) = X(t₁) + (X(t₂) − X(t₁)) + ⋯ + (X(tᵢ) − X(t_{i−1})) subject to i = 2, 3, ..., n−1, n.

The result will lead us to the special Wiener process of type Gauss process.

Lemma C.7.2 (Probability density of an n-dimensional Wiener process): Let us assume 0 < t₁ < t₂ < ⋯ < t_{n−1} < t_n for an ordered n-dimensional Wiener process of the random vector {X(t₁), X(t₂), ..., X(t_{n−1}), X(t_n)}, characterized by independent Gauss-Laplace normally distributed increments of a Wiener process. Then the probability density has the property

f_{t₁,t₂,...,t_{n−1},t_n}(x₁, x₂, ..., x_{n−1}, x_n) = f_{t₁}(x₁) f_{t₂−t₁}(x₂−x₁) ⋯ f_{t_{n−1}−t_{n−2}}(x_{n−1}−x_{n−2}) f_{t_n−t_{n−1}}(x_n−x_{n−1})   (C134)

and has the structure

f_{t₁,t₂,...,t_{n−1},t_n}(x₁, x₂, ..., x_{n−1}, x_n)
= exp{−(1/(2σ²))[x₁²/t₁ + (x₂−x₁)²/(t₂−t₁) + ⋯ + (x_{n−1}−x_{n−2})²/(t_{n−1}−t_{n−2}) + (x_n−x_{n−1})²/(t_n−t_{n−1})]}
/ [(2π)^{n/2} σⁿ √(t₁(t₂−t₁) ⋯ (t_{n−1}−t_{n−2})(t_n−t_{n−1}))]   (C135)

End of Lemma C.7.2: n-dimensional Wiener process

Proof of Lemma C.7.2: For the proof we depart from the one-dimensional probability density of {X_t} of Wiener type.

First step:

f_t(x) = exp[−x²/(2σ²t)] / (σ√(2πt))   (C136)

Second step: Let f_{s,t}(x₁, x₂) be the probability density of the random vector (X(s), X(t)) subject to the order 0 < s < t, characterized by

f_{s,t}(x₁, x₂)dx₁dx₂ = P(X(s) = x₁, X(t) = x₂)dx₁dx₂   (C137)

P(X(s) = x₁, X(t) = x₂) = P(X(s) = x₁, X(t) − X(s) = x₂ − x₁)   (C138)

f_{s,t}(x₁, x₂)dx₁dx₂ = P(X(s) = x₁)P(X(t) − X(s) = x₂ − x₁)dx₁dx₂   (C139)

("independent increments")

f_{s,t}(x₁, x₂) = f_s(x₁)f_{t−s}(x₂ − x₁)   (C140)

Third step:

f_{s,t}(x₁, x₂) = exp{−[tx₁² − 2sx₁x₂ + sx₂²]/(2σ²s(t−s))} / (2πσ²√(s(t−s)))   (C141)
"Correlation coefficient, covariance function":

ρ(s,t) = +√(s/t)   (C142)

Cov{X(s), X(t)} = σ²s for 0 < s ≤ t, Cov{X(s), X(t) − X(s)} = 0   (C143)

Cov{X(s), X(t)} = Cov{X(s), X(s) + X(t) − X(s)} = Cov{X(s), X(s)} + Cov{X(s), X(t) − X(s)} = Cov{X(s), X(s)} = Var{X(s)} = σ²s   (C144)

"Conditional probability density", condition X(t) = b:

f_{X(s)}(x | X(t) = b) = f_{s,t}(x, b) / f_t(b)   (C145)

f_{X(s)}(x | X(t) = b) = exp{−(x − (s/t)b)² / (2σ²(s/t)(t−s))} / √(2πσ²(s/t)(t−s))   (C146)

E{X(s) | X(t) = b} = (s/t)b, Var{X(s) | X(t) = b} = σ²(s/t)(t−s)   (C147)

max_s Var{X(s) | X(t) = b} ⟺ s = t/2   (C148)

Finally, a generalization from f_t, f_{s,t} to f_{t₁,t₂,...,t_{n−1},t_n} leads us to Lemma C.7.2.
End of proof for Lemma C.7.2

At this end we will define the Gauss process.

Definition C.7.3 (Gauss process): A stochastic process {X(t), t ∈ T} is a Gauss process of type (C135) if for arbitrary epochs (t₁, t₂, ..., t_{n−1}, t_n), ordered as t₁ < t₂ < ⋯ < t_{n−1} < t_n, and n = 1, 2, ..., the random vectors {X(t₁), X(t₂), ..., X(t_{n−1}), X(t_n)} have an n-dimensional Gauss-Laplace normal distribution.
End of Definition C.7.3: Gauss process
C-72 Special Wiener Processes: Ornstein-Uhlenbeck, Wiener Processes with Drift, Integrated Wiener Processes

Here we only introduce special Wiener processes of type
(i) Ornstein-Uhlenbeck, (ii) Wiener process with drift and (iii) integrated Wiener process.

It has to be remembered that the trajectories of a Wiener process are nowhere differentiable: particles in motion would have an infinite velocity when modeled by a Wiener process. In order to overcome this difficulty, Ornstein and Uhlenbeck developed a stochastic model for the velocity of particles, an important issue in continuum mechanics.

Definition 7.3 (Ornstein-Uhlenbeck process): Let {X(t), t ≥ 0} be a Wiener process with the parameter σ. Then we call

V(t) = exp(−αt)X(exp(2αt))   (C149)

subject to α > 0, a stochastic process {V(t), −∞ < t < +∞}, an Ornstein-Uhlenbeck process.
End of Definition: Ornstein-Uhlenbeck process

What are the characteristic properties of the Ornstein-Uhlenbeck process?

Property 1: probability density

f_{V(t)}(x) = exp(−x²/(2σ²)) / (σ√(2π))   (C150)

Property 2: expectations

E{V(t)} = 0, Var{V(t)} = σ²   (C151)

Cov(s,t) = σ² exp(−α(t−s)) for s ≤ t   (C152)

Proof for the covariance function:

Cov(s,t) = Cov(V(s), V(t)) = E{V(s)V(t)} = exp[−α(s+t)]E{X(exp(2αs))X(exp(2αt))}
= exp[−α(s+t)]Cov(X(exp(2αs)), X(exp(2αt))) = exp[−α(s+t)]σ² exp(2αs) = σ² exp(−α(t−s))   (C153)

It has to be mentioned that the Ornstein-Uhlenbeck process is stationary in the wider sense; as a Gauss process it is also stationary in the narrow sense. In summary, the Ornstein-Uhlenbeck process is an instationary Wiener process which becomes a stationary process due to the action of time transformation and normalization. In comparison to the Wiener process, the Ornstein-Uhlenbeck process has the following differences:
1. The Ornstein-Uhlenbeck process has no independent increments.
2. The trajectories of an Ornstein-Uhlenbeck process are differentiable in the quadratic mean.
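Assuming the reconstructed time transformation (C149), the stationary covariance (C152) can be checked by sampling the Wiener process at the two transformed epochs. A minimal Python sketch with arbitrary parameter values:

import numpy as np

rng = np.random.default_rng(7)
alpha, sigma = 0.7, 1.2
s, t = 0.4, 1.3
n_real = 1_000_000

u1, u2 = np.exp(2 * alpha * s), np.exp(2 * alpha * t)            # transformed epochs
X_u1 = rng.normal(0.0, sigma * np.sqrt(u1), n_real)              # X(u1) ~ N(0, sigma^2 u1)
X_u2 = X_u1 + rng.normal(0.0, sigma * np.sqrt(u2 - u1), n_real)  # independent increment

V_s, V_t = np.exp(-alpha * s) * X_u1, np.exp(-alpha * t) * X_u2  # (C149)
emp = np.mean(V_s * V_t)
theo = sigma**2 * np.exp(-alpha * (t - s))                       # (C152)
print(f"empirical {emp:.4f}  vs  theoretical {theo:.4f}")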
Wiener process with drift

Another example for the transformation of an instationary process into a stationary process is the Wiener process with drift.

Definition 7.4 (Wiener process with drift): A stochastic process {W(t), t ≥ 0} with independent increments is called a Wiener process with drift if the following properties hold:

(i) W(0) = 0   (C154)
(ii) each increment W(t) − W(s) is Gauss-Laplace normally distributed with second order statistics
(iii) E{W(t) − W(s)} = μ(t−s)   (C155)
(iv) Var{W(t) − W(s)} = σ²|t−s|   (C156)

End of Definition 7.4: drifted Wiener process

An equivalent formulation is: "{W(t), t ≥ 0} is a Wiener process with drift if it has the structure

W(t) = μt + X(t)   (C157)

where {X(t), t ≥ 0} is a Wiener process subject to σ² = Var{X(1)} and μ is an arbitrary constant, called the drift parameter." Obviously, a Wiener process with drift is generated by the additive superposition of a Wiener process with a deterministically growing or decreasing part. The deterministic part is a trend function of type μ(t) = μt. The one-dimensional probability density of a "Wiener process with drift" can be characterized by

f_{W(t)}(x) = exp[−(x − μt)²/(2σ²t)] / (σ√(2πt)), −∞ < x < +∞   (C158)
In practice, drift parameters are daily experience. We restrict ourselves to only one example.

Example (Geometric Wiener process with drift): Let us introduce a stochastic process of type {Z(t), t ≥ 0} characterized by

Z(t) = exp W(t)   (C159)

called a "geometric Wiener process with drift". Since W(t) has by definition a Gauss-Laplace normal distribution, the expectation of Zˢ(t) equals the moment generating function of W(t). For instance,

E{exp sW(t)} = exp{st(μ + σ²s/2)}   (C160)

s = 1 ⇒ E{Z(t)} = exp[t(μ + σ²/2)]   (C161)

s = 2 ⇒ E{Z²(t)} = exp[2t(μ + σ²)]   (C162)

⇒ Var{Z(t)} = exp[t(2μ + σ²)][exp(tσ²) − 1]   (C163)
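A quick Monte-Carlo check of (C161) (an illustration only, arbitrary parameter values), using W(t) ~ N(μt, σ²t):

import numpy as np

rng = np.random.default_rng(8)
mu, sigma, t = 0.1, 0.4, 2.0
n_real = 2_000_000

W_t = mu * t + rng.normal(0.0, sigma * np.sqrt(t), n_real)  # W(t) ~ N(mu t, sigma^2 t)
emp = np.exp(W_t).mean()
theo = np.exp(t * (mu + sigma**2 / 2))                      # (C161)
print(f"empirical {emp:.4f}  vs  theoretical {theo:.4f}")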
Integrated Wiener process

By transforming a Wiener process we receive a stochastic process which is very important in the analysis. We will summarize these transformations in three cases in our

Lemma 7.5 (Elementary transformations of a Wiener process): Let {X(t), t ≥ 0} be a standard Wiener process. Then the following stochastic processes are standard Wiener processes as well:

(i) {U(t), t ≥ 0} subject to U(t) = cX(t/c²), c > 0   (C164)
(ii) {V(t), t ≥ 0} subject to V(t) = X(t+h) − X(h), h > 0   (C165)
(iii) {W(t), t ≥ 0} subject to W(t) = tX(1/t) for all t > 0 and W(0) = 0   (C166)
End of Lemma 7.5

We skip the proof. Instead let {X(t), t ≥ 0} be a Wiener process. Accordingly "nearly all" trajectories x = x(t) are continuous in the squared mean. In consequence, there exist the integrals u(t) = ∫₀ᵗ x(y)dy for "nearly all" trajectories x = x(t). These integrals are realizations of the random integral

U(t) = ∫₀ᵗ X(y)dy   (C167)

U(t) is the symbol for the integral acting on "nearly all" trajectories of a Wiener process; subject to its probability distribution, {U(t), t ≥ 0} defines a stochastic process called the integrated Wiener process.

Stochastic integration: Assume the Riemann integration for 0 = t₀ < t₁ < ⋯ < t_{n−1} < t_n = t subject to Δtᵢ = tᵢ − t_{i−1} for forward computations and i = 1, 2, ..., n−1, n:

u(t) = lim_{n→∞, Δtᵢ→0} Σ_{i=1}^{n} x(t_{i−1})Δtᵢ   (C168)
u(t), as the limit of a sum of independent Gauss-Laplace normally distributed random variables, is Gauss-Laplace normally distributed as well. An integrated Wiener process is analyzed by

(a) trend function:

E{∫₀ᵗ X(y)dy} = ∫₀ᵗ E{X(y)}dy = 0   (C169)

(b) covariance function:

Cov{U(s), U(t)} = E{∫₀ˢ X(z)dz ∫₀ᵗ X(y)dy} = ∫₀ˢ ∫₀ᵗ E{X(z)X(y)} dy dz
= σ² ∫₀ˢ ∫₀ᵗ min(y,z) dy dz = σ² ∫₀ˢ ∫₀ʸ z dz dy + σ² ∫₀ˢ ∫_y^t y dz dy
= σ² ∫₀ˢ [y²/2 + y(t−y)] dy   (C170)
Cov{U(s), U(t)} = σ²(3t − s)s²/6 for s ≤ t   (C171)

Var{U(t)} = σ²t³/3 for s = t   (C172)
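The cubic growth (C172) of Var{U(t)} can be verified by approximating the path integral (C167) with a Riemann sum over simulated Wiener paths, a sketch under arbitrary parameter choices:

import numpy as np

rng = np.random.default_rng(9)
sigma, t = 1.0, 2.0
n_steps, n_real = 1_000, 10_000
dt = t / n_steps

increments = rng.normal(0.0, sigma * np.sqrt(dt), (n_real, n_steps))
X_paths = np.cumsum(increments, axis=1)        # Wiener paths sampled on the grid
U_t = X_paths.sum(axis=1) * dt                 # Riemann sum for int_0^t X(y) dy
print(f"Var(U(t)) = {U_t.var():.4f}  (target {sigma**2 * t**3 / 3:.4f})")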
An integrated Wiener process is not stationary in general. But it can be proven that the stochastic process V(t) := U(t+τ) − U(t), V(t) ∈ {V(t), t ≥ 0}, is stationary for all τ > 0.

White noise

A stochastic process Z(t) ∈ {Z(t), t ≥ 0} subject to

Z(t) := dX(t)/dt = Ẋ(t)   (C173)

or

dX(t) := Z(t)dt   (C174)

cannot be computed by standard differentiation: we have to introduce stochastic integration, for instance

∫ₐᵇ f(t)dX(t) = lim_{n→∞, Δtᵢ→0} Σ_{i=1}^{n} f(t_{i−1})[X(tᵢ) − X(t_{i−1})]   (C175)

∫ₐᵇ f(t)dX(t) = f(b)X(b) − f(a)X(a) − ∫ₐᵇ X(t)df(t)   (C176)

The representation reminds us of "partial integration". As the limit of a sum of independent Gauss-Laplace normally distributed random variables we have to compute the expectation of type

E{∫ₐᵇ f(t)dX(t)} = 0   (C177)
Var{Σ_{i=1}^{n} f(t_{i−1})[X(tᵢ) − X(t_{i−1})]} = Σ_{i=1}^{n} f²(t_{i−1})Var{X(tᵢ) − X(t_{i−1})}
= σ² Σ_{i=1}^{n} f²(t_{i−1})(tᵢ − t_{i−1}) = σ² Σ_{i=1}^{n} f²(t_{i−1})Δtᵢ   (C178)

The limit approaches the variance of the stochastic integral of type

Var{∫ₐᵇ f(t)dX(t)} = σ² ∫ₐᵇ f²(t)dt   (C179)
This result leads to the definition of white noise.

Definition C.7.6 (White noise): A stochastic process {Z(t), t ≥ 0} is called "white noise" if for any arbitrary interval [a,b] and all continuously differentiable functions f(t)

∫ₐᵇ f(t)Z(t)dt = f(b)X(b) − f(a)X(a) − ∫ₐᵇ X(t)df(t)   (C180)

where {X(t), t ≥ 0} is a Wiener process.
End of Definition C.7.6: White noise

Why do we call this stochastic process "white noise"? The origin of the question is the definition of the "generalized derivative". To this end we define "the Dirac delta function" as a "generalized function":

δ(t) := lim_{h→0} {1/h for −h/2 ≤ t ≤ +h/2; 0 otherwise}   (C181)

or

δ(t) := {∞ for t = 0; 0 otherwise}   (C182)
Properties of Dirac's delta function:

(i) ∫_{−∞}^{+∞} f(t)δ(t − t₀)dt = f(t₀)   (C183)

(ii) Heaviside function:

H(t) := {1 for t ≥ 0; 0 for t < 0}   (C184)

"The Dirac delta function is the formal derivative of the Heaviside function":

δ(t) = ∂H(t)/∂t   (C185)

(iii) Covariance function (for a standard Wiener process, σ = 1):

Cov{Z(s), Z(t)} = Cov[∂X(s)/∂s, ∂X(t)/∂t] = (∂/∂s)(∂/∂t)Cov{X(s), X(t)} = (∂/∂s)(∂/∂t)min(s, t) = (∂/∂s)H(s − t)   (C186)

Cov{Z(s), Z(t)} = δ(s − t)   (C187)

The covariance function is a measure of the statistical dependence between Z(s) and Z(t) as a function of the distance between the points s and t, namely |s − t|.

"How can we illustrate the Dirac function?" Let {N(t), t ≥ 0} be an arbitrary numbering process. The jump instants of the process at the points T₁, T₂, ... admit a representation in terms of the Heaviside function, and its derivative N′(t) in terms of the Dirac function:

N(t) = Σ_{i=1}^{∞} H(t − Tᵢ), ∂N(t)/∂t = Σ_{i=1}^{∞} δ(t − Tᵢ)   (C188)

Figure C.6 gives an illustration of the numbering process. The theory of stochastic integration has led us to the concept of generalized functions, namely the Dirac function and the Heaviside function, and to the notion of white noise. More general notions lead to the concept of colored noise and finally to the notion of its related stochastic processes.
Fig. C.6 Dirac numbering system
C-8 Spectral Analysis of One Parameter Stationary Stochastic Processes

Stochastic processes are conventionally divided into three classes we have already touched: class one, general instationary processes; class two, stationary processes; class three, ergodic stationary processes.

In real life, of course, all natural processes are instationary. It is the task of modeling real world processes, for instance by a first order trend model or by taking increments of higher order, to come closer to the case of stationary processes for higher order increments. We will come back to this problem. For formal reasons we first introduce real-valued and complex-valued stochastic processes. Second we study the structure of processes with a discrete spectrum. At the end, we introduce stochastic processes with a continuous spectrum, enriched by examples.

C-81 Foundations: Ergodic and Stationary Processes

We move on towards complex-valued stochastic processes defined by the following axioms:
(a) X(t) = Y(t) + iZ(t) subject to i = √(−1)   (C189)
(b) "real part": {Y(t), t ∈ [−∞,+∞]}
(c) "imaginary part": {Z(t), t ∈ [−∞,+∞]}
(d) z̄ = a − ib, |z| = √(z z̄) = √(a² + b²) "absolute value"   (C190)
(e) "trend function": E{X(t)} = E{Y(t)} + iE{Z(t)}
(f) "covariance function":

Cov{X(s), X(t)} = E{conj[X(s) − E{X(s)}] · [X(t) − E{X(t)}]}   (C191)

where conj(·) denotes complex conjugation.

(g) stationarity in the wider sense:
(g1) E{X(t)} = μ(t) = μ (constant)   (C192)
(g2) Cov{X(s), X(t)} = Cov(0; t−s), τ := t − s   (C193)

Complex-valued stochastic process of second order:

E{|X(t)|²} < ∞ for all t ∈ [−∞,+∞]   (C194), (C195)
Ergodicity

A stochastic process {X(t), t ∈ [−∞,+∞]}, stationary in the narrow sense, is ergodic if for all trajectories x(t) = y(t) + iz(t) there holds

E{X(t)} = E{X(t₀)} = μ := lim_{T→∞} (1/2T) ∫_{−T}^{+T} x(t)dt for t₀ ∈ ℝ   (C196)

In this integral relation, in order to compute the expectation value E{X(t₀)}, we use only the information which is located in one realization of the stochastic process. In contrast, in the general case we would have to use the information of all realizations of the stochastic process:

E{X(t₀)} = μ = lim_{N→∞} (1/N) Σ_{k=1}^{N} x_k(t₀)   (C197)

Such a definition has a simple physical interpretation: the mean at a fixed instant equals the mean over a long time interval; the ensemble average is identical to the time average. In essence, this is the property of ergodic stationary processes. Analogously we can define the covariance function for such processes, namely

Cov(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{+T} conj[x(t) − μ] · [x(t+τ) − μ] dt   (C198)

The treatment of ergodic stationary processes is reasonable for cases in which we have only one time-like realization of the stochastic process. Of course, increasing the length of the interval [−T,+T] improves the estimation of E{X(t₀)} = μ. Here we also make reference to Euler's representations:

exp(±ix) = cos x ± i sin x   (C199)

sin x = [exp(ix) − exp(−ix)]/(2i)   (C200)

cos x = [exp(ix) + exp(−ix)]/2   (C201)
C-82 Processes with Discrete Spectrum

We start with the general structure of stationary processes with discrete spectrum; we assume that all stationary processes considered are ergodic.
Ex 1: Special Stationary Ergodic Processes

Example 8.1 (Special stationary ergodic processes). Let {X(t), t ≥ 0} be a stochastic process structured by

X(t) = XΩ(t), X a complex random variable   (C202)

E{X} = 0, E{|X|²} < ∞, E{conj(X(t))X(t+τ)} = conj(Ω(t))Ω(t+τ)E{|X|²}   (C203)

τ = 0: Ω(t)conj(Ω(t)) = |Ω(t)|² = constant   (C204)

"ansatz": Ω(t) = |Ω(t)| exp[iω(t)] for ω(t) a real function   (C205)

"ansatz": ω(t+τ) − ω(t) ≠ f(t): d[ω(t+τ) − ω(t)]/dt = 0   (C206)
or

dω(t)/dt = constant   (C207)

"ansatz": ω(t) is a linear function of two constants ω and φ:   (C208)

Ω(t) = |Ω(t)| exp[i(ωt + φ)]   (C209)

End of Example 8.1: Special stationary ergodic processes

Lemma 8.1 (decomposition of a complex stationary ergodic process): A stochastic process {X(t), t ≥ 0} of the structure X(t) = XΩ(t) is stationary in the wider sense if and only if

X(t) = X exp(iωt)   (C210)

subject to E{X} = 0, E{|X|²} < ∞, s := E{|X|²}, Cov(τ) = s exp(iωτ).
End of Lemma 8.1

The real part of such a complex stochastic process describes a cosine oscillation with a random amplitude and phase:

y(t) = a cos(ωt + φ)   (C211)

The parameter ω is called frequency. We generalize to two stationary processes as a linear combination: we refer to X₁ and X₂ as two complex random variables with zero expectation and two constant frequencies ω₁ and ω₂. We assume that the two complex random variables X₁ and X₂ are uncorrelated such that E{conj(X₁)X₂} = 0. Our intention is to get the form of the covariance function:

X(t) = X₁ exp(iω₁t) + X₂ exp(iω₂t)   (C212)

Cov{X(t), X(t+τ)} = E{conj[X₁ exp(iω₁t) + X₂ exp(iω₂t)] · [X₁ exp(iω₁(t+τ)) + X₂ exp(iω₂(t+τ))]}
= E{|X₁|²} exp(iω₁τ) + E{|X₂|²} exp(iω₂τ) + E{conj(X₁)X₂} exp[i(ω₂−ω₁)t + iω₂τ] + E{conj(X₂)X₁} exp[i(ω₁−ω₂)t + iω₁τ]   (C213)

s₁ := E{|X₁|²}, s₂ := E{|X₂|²}   (C214)

Cov(τ) = s₁ exp(iω₁τ) + s₂ exp(iω₂τ)   (C215)
Ex 2: Special Correlation Function of a Stationary Stochastic Process

Example 8.2 (Special correlation function of a stationary stochastic process). A special complex stationary stochastic process may be real-valued. For instance we define

X₁ = (A + iB)/2 and X₂ := conj(X₁) = (A − iB)/2   (C216)

where A and B are two real-valued random variables. In consequence of our representation X(t) = X₁ exp(iωt) + X₂ exp(−iωt) we specialize here to

X(t) = A cos ωt − B sin ωt   (C217)

A, B are uncorrelated. We receive the covariance function

Cov(τ) = 2s cos ωτ   (C218)

subject to s := E{|X₁|²} = E{|X₂|²}.
End of Example 8.2: Special correlation function

For pairwise uncorrelated stationary stochastic processes, the general ansatz of type X(t) = Σ_{k=1}^{n} X_k exp(iω_k t) subject to ω_j ≠ ω_k for j ≠ k, j, k = 1, 2, ..., n−1, n, and E{X_k} = 0, leads us to the covariance function

Cov(τ) = Σ_{k=1}^{n} s_k exp(iω_kτ) with s_k = E{|X_k|²} for k ∈ {1, 2, ..., n−1, n}   (C219)

in particular

Cov(τ = 0) = E{|X(t)|²} = Σ_{k=1}^{n} s_k   (C220)

In summary, the oscillation X(t) is an additive superposition of harmonic oscillations. The average energy per time unit is the sum of the single average energies per time unit, an appropriate interpretation of our last representation. In order to achieve a real-valued function, we have to specialize to (a) an even number n and (b) pairs of values X_k being conjugate complex. Let the set {ω₁, ω₂, ...} define the spectrum. We have to divide the various spectra into two classes, small band processes and broad band processes, an entry into filter theory with many applications in all sciences.
Ex 3: Dirac Representation of a Covariance Function

Example 8.3 (Dirac representation of a covariance function). We use the infinitely generalized covariance function for an application of a Dirac function representation:

Cov(τ) = lim_{K→∞} Σ_{k=1}^{K} s_k ∫_{−∞}^{+∞} exp(iωτ)δ(ω − ω_k)dω   (C221)

Cov(τ) = ∫_{−∞}^{+∞} exp(iωτ)s(ω)dω   (C222)

subject to

s(ω) = lim_{K→∞} Σ_{k=1}^{K} s_k δ(ω − ω_k)   (C223)

The generalized function s(ω) is denoted the "spectral density" of a stationary stochastic process, formally equivalent to a Fourier transform.
End of Example 8.3: spectral density
C-83 Processes with Continuous Spectrum

Here we present first the spectral decomposition of the covariance function of a stochastic process with continuous spectrum. Second we present Kolmogorov's spectral decomposition theorem.

C8-31 Spectral decomposition of a covariance function

Let {X(t), t ∈ ℝ} be a complex-valued stationary stochastic process described by its covariance function. Then there exists a real-valued function S(ω) such that Cov(τ) enjoys the representation

Cov(τ) = ∫_{−∞}^{+∞} exp(iωτ)dS(ω)   (C224)

S(ω) is called the spectral function of the process. Note the domain t ∈ [−∞,+∞]:

Cov(0) = S(+∞) − S(−∞) = E{|X|²} < ∞   (C225)
The spectral function is determined only up to an additive constant c. Most commonly c is chosen such that S(−∞) = 0. If the first derivative s(ω) = dS(ω)/dω exists, we call it the spectral density of the stochastic process: the covariance function of this stationary process is the Fourier transform of the spectral density. Since S(ω) has to be a non-decreasing function we also find its properties:

Cov(τ) = ∫_{−∞}^{+∞} exp(iωτ)s(ω)dω   (C226)

s(ω) ≥ 0   (C227)

∫_{−∞}^{+∞} s(ω)dω < ∞   (C228)

The set {ω: s(ω) > 0} defines the continuous spectrum of the stationary stochastic process. The term

Cov(0) = S(+∞) − S(−∞) = ∫_{−∞}^{+∞} s(ω)dω   (C229)

is called the average power of the spectrum. If we describe a stationary stochastic process X_t with discrete time by harmonic oscillations with random amplitude and random phase, we are able to exploit the periodicity

exp(iωt) = exp[it(ω + 2πk)] for all k ∈ {±1, ±2, ...} and restrict ω to [−π, +π]   (C230)

Cov(τ) = ∫_{−π}^{+π} exp(iωτ)s(ω)dω for τ = 0, ±1, ±2, ...   (C231)

If ∫_{−∞}^{+∞} |Cov(τ)|dτ < ∞ holds for the stationary stochastic process, we are able to introduce the inverse Fourier transform by

s(ω) = (1/2π) ∫_{−∞}^{+∞} exp(−iωτ)Cov(τ)dτ   (C232)
S(ω₂) − S(ω₁) = (i/2π) ∫_{−∞}^{+∞} [exp(−iω₂τ) − exp(−iω₁τ)]/τ · Cov(τ)dτ   (C233)
Ex 4: Fourier Spectral Density

Example 8.4 (Fourier spectral density). Assume real-valued constants a and c with |a| < 1. Here we analyze the covariance function Cov(τ) = c·a^{|τ|} of an autoregressive stationary sequence of first order and compute its spectral density:

s(ω) = (1/2π) Σ_{τ=−∞}^{+∞} Cov(τ) exp(−iωτ)
= (c/2π) [Σ_{τ=1}^{∞} a^{τ} exp(iωτ) + Σ_{τ=0}^{∞} a^{τ} exp(−iωτ)]   (C234)

s(ω) = (c/2π) [a exp(iω)/(1 − a exp(iω)) + 1/(1 − a exp(−iω))]   (C235)

End of Example 8.4: Fourier spectral density
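As a numerical cross-check of (C235) (our own illustration, arbitrary values of a and c): summing the truncated series (1/2π) Σ_τ c a^{|τ|} e^{−iωτ} reproduces the closed form, which can also be written as (c/2π)(1 − a²)/|1 − a e^{−iω}|².

import numpy as np

a, c = 0.6, 1.0
omega = np.linspace(-np.pi, np.pi, 5)
tau = np.arange(-200, 201)                    # truncation; a^200 is negligible

series = np.array([(c / (2 * np.pi)) * np.sum(a ** np.abs(tau) * np.exp(-1j * w * tau))
                   for w in omega]).real
closed = ((c / (2 * np.pi)) *
          (a * np.exp(1j * omega) / (1 - a * np.exp(1j * omega)) +
           1 / (1 - a * np.exp(-1j * omega)))).real    # (C235)
print(np.allclose(series, closed, atol=1e-12))         # True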
Assume a real-valued stochastic process whose covariance function is even, Cov(τ) = Cov(−τ), namely Cov(τ) = [Cov(τ) + Cov(−τ)]/2. It can be represented by

Cov(τ) = ∫_{−∞}^{+∞} cos(ωτ)s(ω)dω   (C236)

Cov(τ) = 2 ∫_{0}^{+∞} cos(ωτ)s(ω)dω (since cos(−ωτ) = cos(ωτ))   (C237)

s(ω) = (1/2π) ∫_{−∞}^{+∞} cos(ωτ)Cov(τ)dτ   (C238)

s(ω) = (1/π) ∫_{0}^{+∞} cos(ωτ)Cov(τ)dτ   (C239)
Correlation time

In practice the notion of "correlation time" or "correlation length" is very appropriate:

τ₀ := (1/Cov(0)) ∫_{0}^{+∞} Cov(τ)dτ   (C240)

or

τ₀ := πs(0) / (2 ∫_{0}^{+∞} s(ω)dω)   (C241)

For |τ| ≤ τ₀ there is a noticeable correlation or statistical dependence between X(t) and X(t+τ). This correlation decreases for growing |τ|, |τ| > τ₀.
Ex 5: Stationary Infinite Random Sequence

Example 8.5 (Stationary infinite random sequence). The infinite random sequence {..., X₋₁, X₀, X₁, ...} may be structured according to X_t = a_t X subject to t = 0, ±1, ±2, .... The complex random variable X is structured according to E{X} = 0 as well as Var{X} = σ². The random sequence {..., X₋₁, X₀, X₁, ...} is wider-sense stationary if and only if, for some constant ω, the setup

a_t = exp(itω) for t = 0, ±1, ±2, ...   (C242)

is chosen. In the stationary case, the harmonic oscillations X_t are given by the random amplitude and the random phase as well as a constant frequency ω. For such a case the setup of the covariance function can be calculated, exploiting the periodicity

exp(itω) = exp[it(ω + 2πk)] for t, k = 0, ±1, ±2, ...   (C243)

Cov(τ) = ∫_{−π}^{+π} exp(iτω)s(ω)dω for τ = 0, ±1, ±2, ... ⟺   (C244)

⟺ s(ω) = (1/2π) Σ_{τ=−∞}^{+∞} exp(−iτω)Cov(τ)   (C245)

If we specialize to X_t real-valued and uncorrelated we achieve a pure random sequence subject to

Cov(0) = σ², Cov(τ) = 0 for τ ≠ 0 ⟺ s(ω) = σ²/(2π)   (C246)

End of Example 8.5: random sequence
Ex 6: Spectral Density of a Random Telegraph Signal

Example 8.6 (Spectral density of a random telegraph signal). Given the covariance function, we intend to find a representation of the spectral density of a random telegraph signal:

Cov(τ) = a exp(−b|τ|) for a > 0, b > 0   (C247)

s(ω) = (1/2π) ∫_{−∞}^{+∞} exp(−iωτ)a exp(−b|τ|)dτ
= (a/2π) {∫_{−∞}^{0} exp[(b − iω)τ]dτ + ∫_{0}^{+∞} exp[−(b + iω)τ]dτ}
= (a/2π) [1/(b − iω) + 1/(b + iω)]   (C248)

s(ω) = ab/(π(ω² + b²))   (C249)

The correlation time is calculated as τ₀ = 1/b. This result is illustrated by Figs. C.7 and C.8, namely by the covariance function and its spectral density.
End of Example 8.6: spectral density of a telegraph signal
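The pair (C247), (C249) can be checked by discretizing the Fourier integral (C232) on a sufficiently long grid. A short Python sketch (an illustration only, arbitrary values of a and b):

import numpy as np

a, b = 2.0, 1.5
tau = np.linspace(-40.0, 40.0, 400_001)        # |tau| large enough for exp(-b|tau|) ~ 0
dtau = tau[1] - tau[0]

for omega in (0.0, 0.5, 2.0):
    integrand = np.exp(-1j * omega * tau) * a * np.exp(-b * np.abs(tau))
    s_num = (integrand.sum() * dtau / (2 * np.pi)).real   # discretized (C232)
    s_theo = a * b / (np.pi * (omega**2 + b**2))          # (C249)
    print(f"omega={omega:3.1f}: numeric {s_num:.6f}  closed form {s_theo:.6f}")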
Fig. C.7 Covariance of Example 8.6
Fig. C.8 Spectral density of Example 8.6
Ex 7: Spectral Representation of a Pulse Modulation

Example 8.7 (Spectral density of a pulse modulation). We start with the covariance function of a pulse modulation and study its spectral density:

Cov(τ) = a(T − |τ|) for |τ| ≤ T; 0 for |τ| > T   (C250)

s(ω) = (a/2π) ∫_{−T}^{+T} exp(−iωτ)(T − |τ|)dτ
= (a/2π) {T ∫_{−T}^{+T} exp(−iωτ)dτ − ∫_{−T}^{0} (−τ)exp(−iωτ)dτ − ∫_{0}^{+T} τ exp(−iωτ)dτ}
= (a/2π) {2T sin(ωT)/ω − 2 ∫_{0}^{+T} τ cos(ωτ)dτ}   (C251)

s(ω) = a(1 − cos ωT)/(πω²)   (C252)

Figures C.9 and C.10 illustrate the covariance function and the spectral density of a pulse modulation. A slight modification of our example proves that a function of this type is not automatically the covariance function of a stationary process:
Fig. C.9 Covariance of Example 8.7
Fig. C.10 Spectral density of Example 8.7
Cov(τ) = a(T² − τ²) for |τ| ≤ T; 0 for |τ| > T, with a > 0, T > 0   (C253)

The corresponding s(ω) is not everywhere nonnegative and therefore cannot be the spectral density of a stationary stochastic process. Alternatively, we let the covariance function Cov(τ) be a Dirac delta function δ(τ). In this case we find

δ(τ) = (1/2π) ∫_{−∞}^{+∞} exp(iωτ)dω   (C254)

s(ω) = (1/2π) ∫_{−∞}^{+∞} exp(−iωτ)δ(τ)dτ = 1/(2π)   (C255)

by formal inversion.
End of Example 8.7: pulse modulation
Fig. C.11 Spectral density of a harmonic function
Ex 8: Spectral Density and Spectral Function of a Harmonic Oscillation

Example 8.8 (Spectral density and spectral function of a harmonic oscillation). Given a harmonic oscillation with a given covariance function, we will derive the structure of the spectral density function as well as the spectral function:

Cov(τ) = a cos ω₀τ   (C256)

s(ω) = (a/2π) ∫_{−∞}^{+∞} exp(−iωτ)cos(ω₀τ)dτ
= (a/4π) ∫_{−∞}^{+∞} exp(−iωτ)[exp(iω₀τ) + exp(−iω₀τ)]dτ
= (a/4π) {∫_{−∞}^{+∞} exp[−i(ω − ω₀)τ]dτ + ∫_{−∞}^{+∞} exp[−i(ω + ω₀)τ]dτ}   (C257)

s(ω) = (a/2){δ(ω − ω₀) + δ(ω + ω₀)}   (C258)

S(ω) = 0 for ω < −ω₀; a/2 for −ω₀ ≤ ω < ω₀; a for ω ≥ ω₀ (step function)   (C259)

We illustrate the spectral density as well as the spectral function of a harmonic oscillation by Fig. C.11 as well as Fig. C.12. It has to be emphasized that all three functions Cov(τ), s(ω) and S(ω) belong to a stationary process.
End of Example 8.8: Spectral function of a harmonic oscillation
Fig. C.12 Spectral function of type step function for a harmonic oscillation
Ex 9: Spectral Density Function of a Damped Harmonic Oscillation

Example 8.9 (Spectral density function of a damped harmonic oscillation). Given the covariance function of a damped harmonic oscillation, we want to find the spectral density of this stationary stochastic process:

Cov(τ) = a exp(−b|τ|)cos ω₀τ   (C260)

s(ω) = (a/π) ∫_{0}^{∞} exp(−bτ)cos(ωτ)cos(ω₀τ)dτ
= (a/2π) ∫_{0}^{∞} exp(−bτ)[cos((ω − ω₀)τ) + cos((ω + ω₀)τ)]dτ   (C261)

s(ω) = (ab/2π) [1/(b² + (ω − ω₀)²) + 1/(b² + (ω + ω₀)²)]   (C262)

ω₀ is called the eigenfrequency of this stationary stochastic process. A typical example is the fading of a radio signal.
End of Example 8.9: Damped harmonic oscillation
Ex 10: Spectral Density of White Noise

Example 8.10 (Spectral density of white noise). Previously we introduced the concept of "white noise" as a real-valued stationary process with continuous time, of Wiener process type. Given its covariance function, we are interested in the spectral density function:

Cov(τ) = 2πs₀δ(τ)   (C263)

s(ω) = (1/2π) ∫_{−∞}^{+∞} exp(−iωτ)2πs₀δ(τ)dτ = s₀   (C264)

A comparative formulation of a white noise process is the following: white noise is a real-valued process whose spectral density is a constant. Comment: the spectral density of a white noise process has only the property s(ω) ≥ 0, but not the property ∫_{−∞}^{+∞} s(ω)dω < ∞. Accordingly, a white noise process, whose average power spectrum fulfils

∫_{−∞}^{+∞} s(ω)dω = ∞   (C265)

exists only in theory, not in practice. A comparison is "the mass point" or "the point mass", which also exists only "in theory". Nevertheless it is a reasonable approximation of reality. A stationary process can always be considered an excellent approximation of white noise if the covariance between Z(t) and Z(t+τ) goes to zero extremely fast for growing values |τ|.

Example: Let us denote by Z(t) the variations of the absolute value of the force acting on a particle of a liquid medium at the time t and leading to a Brownian motion. Per second, the resulting force leads to approximately 10²¹ collisions with other particles. Therefore the stochastic variables Z(t) and Z(t+τ) are stochastically independent ("uncorrelated") if |τ| is of the order of 10⁻¹⁸ seconds. If we assume a covariance function of type Cov(τ) = exp(−b|τ|) for b > 0, the number b has to be larger than about 10¹⁷ per second.
End of Example 8.10: White noise
Ex 11: Band-Limited White Noise

Example 8.11 (Band-limited white noise). A stationary stochastic process with a constant spectral density function s(ω) = s₀ for ω ∈ (−∞,+∞) does not exist in practice. But a stationary stochastic process with the spectral density

s(ω) = s₀ for −Ω/2 ≤ ω ≤ +Ω/2; 0 otherwise   (C266)

does exist:

Fig. C.13 Spectral density of band limited white noise process
Fig. C.14 Covariance function of a band limited white noise process

Cov(τ) = ∫_{−Ω/2}^{+Ω/2} exp(iωτ)s₀ dω = 2s₀ sin(Ωτ/2)/τ   (C267)

The average power of the process is proportional to Cov(0) = s₀Ω. The parameter Ω is called the band width of the process. A white noise process is generated in the limit Ω → ∞ of a band-limited white noise, illustrated by Figs. C.13 and C.14.
End of Example 8.11: band-limited white noise
C-84 Spectral Decomposition of the Mean and Variance-Covariance Function

The decomposition of the spectrum of a stationary process {Y(x), x ∈ ℝ} subject to ℝ ∈ {−∞,+∞}, applied to orthogonal increments of various order, has many representations if the increments do not overlap in the intervals [x₁,x₂], [x₃,x₄] or [x₁,x₂,x₃], [x₄,x₅,x₆] etc.
in the formulations of double, triple etc. differences. Examples are

E{[Y(x₂) − Y(x₁)][Y(x₄) − Y(x₃)]}   (C268)

E{[Y(x₃) − 2Y(x₂) + Y(x₁)][Y(x₆) − 2Y(x₅) + Y(x₄)]}   (C269)

Accordingly, a real-valued stochastic process with independent increments whose trend function E{Y(x)} = 0 is zero has orthogonal increments.

Example 8.12 (Spatial best linear uniformly unbiased prediction: spatial BLUUP). Spatial BLUUP depends on the exact knowledge of a proper covariance function. In general, the variance-covariance function depends on three parameters {δ₁, δ₂, δ₃} called {nugget, sill, range}:

Cov(τ) = γ(‖x₁ − x₂‖)   (C270)

We assume stationarity and isotropy for the mean function as well as for the variance-covariance function. (In the next section we define isotropy.)

Model: Z(x) = f(x)′β + e(x)   (C271)

subject to E{e(x)} = 0.

Postulate one: "mean square continuity":

lim_{y→x} E{[Z(y) − Z(x)]²} = 0   (C272)

Postulate two: "mean square differentiability": the difference quotient

[Z(x + hₙu) − Z(x)]/hₙ converges in the mean square sense as hₙ → 0, for any unit direction u   (C273)

If we assume "mean square differentiability", then Z′ has a variance-covariance function; if "mean square differentiability" holds, it holds along any direction. Here we introduce various classes of variance-covariance functions used in this differential calculus:

(i) Diggle class: Cov(h) = c exp(−a|h|^δ)   (C274)

(ii) Matérn class: Cov(h) = c(α|h|)^ν H_ν(α|h|)   (C275)

where "H_ν" denotes the modified Bessel function of order ν

(iii) Whittle class: spectral density of type c(α² + ω²)^{−ν/2}   (C276)
(iv) Stochastic processes on curved spaces like the sphere or the ellipsoid: no Matérn classes; the characteristic functions have a discrete spectrum, for instance in terms of spherical harmonic functions.
End of Example 8.12: spatial BLUUP

Let us give some references: Brown et al. (1994), Brus (2007), Bueso et al. (1999), Christenson (1991), Diggle and Lophaven (2006), Fedorov and Müller (2007), Groeningen et al. (1999), Kleijnen (2004), Müller et al. (2004), Müller and Pázman (2003), Müller (2005), Omre (1987), Omre and Halvorsen (1989), Pázman and Müller (2001), Pilz (1991), Pilz et al. (1997), Pilz and Spöck (2006), Pilz and Spöck (2008), Spöck (1997), Spöck (2008), Stein (1999), Yaglom (1987), Zhu and Stein (2006).

Up to now we assumed a stochastic process of type {X(t), t ∈ ℝ}, ℝ = {−∞,+∞}, as a complex stationary process of second order. Then there exists a stochastic process of second order {U(ω), ω ∈ ℝ} with orthogonal increments which can be represented by

X(t) = ∫_{−∞}^{+∞} exp(iωt)dU(ω)   (C277)

The process {U(ω), ω ∈ ℝ} of the process {X(t), t ∈ ℝ} is called the related spectral process. Its unknown additive constant we fix by the postulate

P{U(−∞) = 0} = 1   (C278)

subject to

E{U(ω)} = 0   (C279)
E{|U(ω)|²} = S(ω)   (C280)
E{|dU(ω)|²} = dS(ω)   (C281)

Our generalization was introduced by A. N. Kolmogorov. Compare this representation with our previous model

X(t) = lim_{K→∞} Σ_{k=1}^{K} exp(iω_k t)X_k   (C282)

and

X(t) = ∫_{−∞}^{+∞} exp(iωt)dU(ω)   (C283)

The orthogonality of the increments of the Kolmogorov spectral process {U(ω)} corresponds, in the case of a discrete spectrum, to the concept of uncorrelatedness. Convergence in the squared mean of type

∫_{−∞}^{+∞} exp(iωt)dU(ω) = lim_{a→−∞, b→+∞, Δω_k→0} Σ_{k=1}^{n} exp(iω_{k−1}t)[U(ω_k) − U(ω_{k−1})]   (C284)

emphasizes the various weighting functions of frequencies: the frequency ω_{k−1} in the range [ω_{k−1}, ω_k] receives the random weight U(ω_k) − U(ω_{k−1}). An analogue version of the inversion process leads to

U(ω₂) − U(ω₁) = (i/2π) ∫_{−∞}^{+∞} [exp(−iω₂t) − exp(−iω₁t)]/t · X(t)dt   (C285)

In practice, Kolmogorov stochastic processes are applied in linear filters for prognosis or prediction.
C-9 Scalar-, Vector-, and Tensor-Valued Stochastic Processes of Multi-Parameter Systems

We have experienced the generalization of a scalar-valued stochastic process due to A. N. Kolmogorov. Another generalization due to him is the concept of structure functions for more general vector-valued and tensor-valued increments of various order as well as their power spectrum for multi-parameter systems. First we introduce the notion of the characteristic functional, replacing the notion of the characteristic function of a random function. Second, the moment representation of stochastic processes is outlined. Third, we generalize the notion from scalar-valued and vector-valued stochastic processes to tensor-valued stochastic processes of multi-point parameter systems. In particular, we introduce the notion
of homogeneous and isotropic stochastic fields and extend the Taylor-Karman structured criterion matrices to distributions on the sphere.

C-91 Characteristic Functional

Let us begin with the definition of the characteristic functional:

Φ_Y[λ] = E{exp(iY(λ))} = E{exp(i ∫ₐᵇ λ(x)Y(x)dx)}   (C286)

The characteristic functional Φ_Y[λ] relates uniquely to the random function Y(x) for a fixed test function λ. If the characteristic functional of a specific random function is known, we are able to derive the probability density distribution f_{x₁,...,x_N}(Y_{x₁}, ..., Y_{x_N}).

Example: If a scalar-valued random function in a three-dimensional field x := (x₁, x₂, x₃) exists, define its characteristic functional by

Φ_Y[λ] := E{exp(i ∫∫∫_{−∞}^{+∞} Y(x₁, x₂, x₃)λ(x₁, x₂, x₃)dx₁dx₂dx₃)}   (C287)

The domain of integration is very often chosen as spherical coordinates or ellipsoidal coordinates (u, v, w) for u ∈ {0, 2π}, v ∈ {−π/2 < v < +π/2} and w ∈ {0 < w < ∞} as a first chart. A second chart is needed to cover totally the intersection surfaces, for instance by meta-longitude and meta-latitude.
End of Example

Probability distributions for random functions and the characteristic functional: As a generalization of the characteristic function for random events, the concept of the characteristic functional for random functions or stochastic processes has been developed. Here we aim at reviewing the theory. We apply the concept of random functions to multi-point functions of type scalar-valued and vector-valued functions, which have first been applied in the statistical theory of turbulence. What is the characteristic functional? We summarize in Table C.9.1 the detailed notion of a scalar-valued as well as a vector-valued characteristic functional, illustrated by an example introducing the "Fourier-Stieltjes functional".
Table C.9.1 Probability distribution of a random function and the characteristic functional

"Scalar-valued characteristic functionals"

"Random function over an interval [a,b] with N variables": {Y(x), x ∈ ℝᴺ} is called a random function if the density function

f_{x₁,x₂,...,x_{N−1},x_N}[Y(x₁), Y(x₂), ..., Y(x_{N−1}), Y(x_N)]   (C288)

exists at the points (x₁, x₂, ..., x_{N−1}, x_N).

"Fourier representation of the distribution function":

Φ_{Y;x₁,...,x_N}(λ₁, ..., λ_N) := E{exp[i Σ_{k=1}^{N} λ_k Y_k]}
= ∫...∫_{−∞}^{+∞} exp{i Σ_{k=1}^{N} λ_k Y_k} f_{x₁,...,x_N}(Y₁, ..., Y_N)dY₁...dY_N   (C289)

subject to

Y₁ := Y(x₁), ..., Y_k := Y(x_k)   (C290)

"Existence condition": the functional limit Φ_Y[λ₁, ..., λ_N] → Φ_Y[λ] requires the existence of ∫ₐᵇ λ(x)Y(x)dx   (C291)

"Vector-valued characteristic functionals": at any point Pᵢ of the domain D place a set of base vectors e_α(Pᵢ) at fixed points Pᵢ for α = 1, 2, ..., n−1, n; i = 1, 2, ..., N−1, N:

Yᵢ^α = <Y(Pᵢ) | e_α(Pᵢ)>   (C292)

Φ_Y[λ] with λ(x, y, z)Y = Σ_{α=1}^{n} λ_α(Pᵢ)Y_α(Pᵢ)   (C293)

End of Table C.9.1: characteristic functional
C-92 The Moment Representation of Stochastic Processes for Scalar-Valued and Vector-Valued Quantities

The moment representation of stochastic processes is the standard tool of statistical analysis. If a random function is given by its probability function, then we find a unique representation of its moments. But the inverse process, converting a given moment representation into its probability function, is not unique. Therefore we present only the direct representation of moments for a given probability function. At first we derive for a given scalar-valued probability function its moment representation in Table C.9.2. In Table C.9.3 we generalize this result to stochastic processes which are vector-valued. An example is presented for second order statistics, namely for the Gauss-Laplace normal distribution and its first and second order moments.

Table C.9.2 Moment representation of stochastic processes for scalar quantities

"scalar-valued stochastic processes", "moment representation"

Y(x), x ∈ ℝᴺ is a random function, for instance applied in turbulence theory. Define the set of random variables {x₁, x₂, ..., x_{N−1}, x_N} ∈ ℝ with the probability distribution f(x₁, x₂, ..., x_{N−1}, x_N). Then the moment representation is the polynomial

M^{p₁,p₂,...,p_{N−1},p_N} := E{x₁^{p₁} x₂^{p₂} ⋯ x_{N−1}^{p_{N−1}} x_N^{p_N}}
= ∫∫...∫∫_{−∞}^{+∞} x₁^{p₁} x₂^{p₂} ⋯ x_{N−1}^{p_{N−1}} x_N^{p_N} f(x₁, x₂, ..., x_{N−1}, x_N) dx₁dx₂...dx_{N−1}dx_N   (C294)

(nonnegative integer numbers p₁, ..., p_N: p₁, p₂, ..., p_{N−1}, p_N ∈ ℕᴺ)

[p] := p₁ + p₂ + ⋯ + p_{N−1} + p_N is called the order of the moment representation   (C295)
Table C.9.2 Continued

"central moment representation":

{x₁ − E{x₁}, x₂ − E{x₂}, ..., x_{N−1} − E{x_{N−1}}, x_N − E{x_N}}   (C296)

B^{p₁p₂...p_{N−1}p_N} := E{(x₁ − E{x₁})^{p₁}(x₂ − E{x₂})^{p₂} ⋯ (x_{N−1} − E{x_{N−1}})^{p_{N−1}}(x_N − E{x_N})^{p_N}}   (C297)

"Characteristic function":

Φ(λ₁, ..., λ_N) := ∫∫...∫∫_{−∞}^{+∞} exp{i Σ_k λ_k x_k} f(x₁, x₂, ..., x_{N−1}, x_N)dx₁dx₂...dx_{N−1}dx_N   (C298)

"differentiate at the point λ₁ = 0, ..., λ_N = 0":

(−i)^{[p]} M^{p₁p₂...p_{N−1}p_N} = [∂^{[p]}/(∂λ₁^{p₁}...∂λ_N^{p_N}) Φ(λ₁, ..., λ_N)]_{λ₁=0,...,λ_N=0}   (C299)

"semi-invariants":

Ψ := ln Φ   (C300)

(−i)^{[p]} S^{p₁,...,p_N} := [∂^{[p]}/(∂λ₁^{p₁}∂λ₂^{p₂}...∂λ_{N−1}^{p_{N−1}}∂λ_N^{p_N}) Ψ(λ₁, ..., λ_N)]_{λ₁=0,...,λ_N=0}   (C301)

"transformation of semi-invariants to moments of order [p]":

S₁ = M₁, S₂ = M₂ − M₁² = B₂, S₃ = M₃ − 3M₂M₁ + 2M₁³ = B₃,
S₄ = B₄ − 3B₂², S₅ = B₅ − 10B₃B₂   (C302)

End of Table C.9.2: Moment representation of stochastic processes
Table C.9.3 Moment representation of stochastic processes for vector-valued quantities

"Vector-valued stochastic processes", "moment representation"

"multi-linear functions of a vector field, Gauss-Laplace distributed":

Φ_Y[λ] with λ(x)Y = Σ_{α=1}^{n} λ_α(Pᵢ)Y_α(Pᵢ)   (C303)

for a set of base vectors e_α(Pᵢ) at fixed points P_α, α = 1, 2, ..., n−1, n; i = 1, 2, ..., N−1, N.

"one-point n-dimensional Gauss-Laplace distribution":

Y(P_α) ~ f(P_α)

f(P_α) = C exp(−(1/2) Σ_{i,j=1}^{N} g_{ij}(P_α)(Y_α^i − E{Y_α^i})(Y_α^j − E{Y_α^j}))   (C304)

"gauge":

∫∫...∫∫_{−∞}^{+∞} f(P_α)dY₁dY₂...dY_{N−1}dY_N := 1   (C305)

"two-point n-dimensional Gauss-Laplace distribution: covariance function":

B^{α,β}(P_α, P_β) = E{[Y^α(P_α) − E{Y^α}][Y^β(P_β) − E{Y^β}]}   (C306)

Y(P_α), Y(P_β) ~ f(P_α, P_β)   (C307)

f(P_α, P_β) = D exp(−(1/2) Σ_{i,j=1}^{N} g_{ij}(P_α, P_β)(Y_α^i − E{Y_α^i})(Y_β^j − E{Y_β^j}))   (C308)

Example: Gauss-Laplace normal distribution, characteristic functional "ansatz"
Table C.9.3 Continued

A(λ) = E{Y(λ)} := ∫ₐᵇ λ(x)E{Y(x)}dx   (C309)

B(λ) = ∫ₐᵇ ∫ₐᵇ B(x₁, x₂)λ(x₁)λ(x₂)dx₁dx₂   (C310)

"multi-point functions":

Φ_Y[λ(x)] := exp{i ∫ₐᵇ λ(x)E{Y(x)}dx − (1/2) ∫ₐᵇ ∫ₐᵇ B(x₁, x₂)λ(x₁)λ(x₂)dx₁dx₂}   (C311)

In general: multi-point functional series, Fourier-Stieltjes functional series. Gauss-Laplace normal distribution functional: first and second order moments. An example is the central moment of fourth order of type two-points subject to a Gauss-Laplace normal distribution:

B^{αβ,γδ}(P₁, P₂) = B^{α,β}(P₁)B^{γ,δ}(P₂) + B^{αγ}(P₁, P₂)B^{βδ}(P₁, P₂) + B^{αδ}(P₁, P₂)B^{βγ}(P₁, P₂)   (C312)

The general rule is due to Isserlis: On a formula for the product-moment coefficient in any number of variables, Biometrika 12 (1918) 134-139.

First and second order moments of a probability density and the uncertainty relation:

∫ₐᵇ [Y(x) − E{Y(x)}]′λ(x)dx = 0   (C313)

∫ₐᵇ ∫ₐᵇ B(x₁, x₂)λ(x₁)λ(x₂)dx₁dx₂ = 0   (C314)
C-93 Tensor-Valued Statistical Homogeneous and Isotropic Fields of Multi-Point Systems

First, we introduce the ideal structure of a variance-covariance matrix of multi-point systems in a Euclidean space, called homogeneous and isotropic in the statistical sense. Such a notion generalizes the concept of stationary functions introduced so far. Throughout we use the terminology "signal" and "signal analysis". Let us begin with the notion of an ideal structure of the variance-covariance matrix of a scalar-valued signal, a multi-point function. An example is the gravity force as a one-point function, the distance between two points as a two-point function, or the angle between a reference point and two targets as a three-point function. Here we are interested in a criterion matrix of vector-valued signals which are multi-point functions. An example is the placement vector of network points determined by an adjustment process, namely a vector-valued one-point function, or the difference vector between network points, a vector-valued two-point function.

Let S(X₁, ..., X_α) be a scalar-valued function of placement vectors X₁, ..., X_α within a three-dimensional Euclidean space. The covariance of the scalar-valued signal at placement vectors X₁, ..., X_α and X_{α+1}, ..., X_β is defined by

Σ(X₁, ..., X_α; X_{α+1}, ..., X_β) := E{[S(X₁, ..., X_α) − E{S(X₁, ..., X_α)}][S(X_{α+1}, ..., X_β) − E{S(X_{α+1}, ..., X_β)}]}   (C315)

For the special case X₁ = X_{α+1}, ..., X_α = X_β the multi-point function Σ(X₁, ..., X_α; X_{α+1}, ..., X_β) agrees with the variance, in the other case with the covariance.

Definition C.9.1 (Homogeneity and isotropy of a scalar-valued multi-point function): The scalar-valued multi-point function Σ(X₁, ..., X_α; X_{α+1}, ..., X_β) is called homogeneous in the statistical sense if it is translation invariant, namely

Σ(X₁ + t, ..., X_α + t; X_{α+1} + t, ..., X_β + t) = Σ(X₁, ..., X_α; X_{α+1}, ..., X_β)   (C316)

t is called the translation vector. The function Σ is called isotropic if it is rotation invariant in the sense of

Σ(RX₁, ..., RX_α; RX_{α+1}, ..., RX_β) = Σ(X₁, ..., X_α; X_{α+1}, ..., X_β)   (C317)

R is called the rotation matrix.
End of Definition C.9.1: homogeneity and isotropy

The following representation theorems apply.

Lemma C.9.2 (homogeneity and isotropy of a scalar-valued multi-point function): The scalar-valued multi-point function is homogeneous in the statistical sense if and only if it is a function of the difference vectors X₂ − X₁, ..., X_β − X_{β−1}:

Σ(X₁, ..., X_α; X_{α+1}, ..., X_β) = Σ(X₂ − X₁, ..., X_{α+1} − X_α, ..., X_β − X_{β−1})   (C318)

This function is isotropic in the statistical sense if

Σ(X₁, ..., X_α; X_{α+1}, ..., X_β) = Σ(|X₁|, <X₁, X₂>, <X₁, X₃>, ..., <X_{β−1}, X_β>, |X_β|)   (C319)

is a function of all scalar products including the lengths of the vectors X₁, ..., X_α, X_{α+1}, ..., X_β.

The function Σ(X₁, ..., X_α; X_{α+1}, ..., X_β) is homogeneous and isotropic in the statistical sense if

Σ(X₁, ..., X_α; X_{α+1}, ..., X_β) = Σ(|X₂ − X₁|, <X₂ − X₁ | X₃ − X₂>, ..., <X_β − X_{β−1} | X_{β−1} − X_{β−2}>, |X_β − X_{β−1}|)   (C320)

is a function of all scalar products including the lengths of the difference vectors X₂ − X₁, ..., X_{α+1} − X_α, ..., X_β − X_{β−1}.

End of Lemma C.9.2: homogeneity and isotropy

As a basic result we take advantage of the property that a scalar-valued function of multi-point type is homogeneous and isotropic in the statistical sense if

Σ(X₁, ..., X_α; X_{α+1}, ..., X_β) = Σ(|X₂ − X₁|, |X₃ − X₁|, ..., |X_β − X_{β−2}|, |X_β − X_{β−1}|)   (C321)

is a function of the lengths of all difference vectors X_γ − X_δ for all 1 ≤ δ < γ ≤ β. For a proof we need the decomposition
820
C Statistical Notions, Random Events and Stochastic Processes
< X; Y > D D
1 .jXj2 C jYj2 jX Yj2 / 2
1 .jX C Yj2 jX Yj2 / 2
(C322)
Example: four-point function, var-cov function Choose a scalar-values two-point function as the distance S.X1 ; X2 / between two points X1 and X2 . The criterion matrix of a variance-covariance P function under the postulate of homogeneity and isotropy in the statistical sense .X1 ; X2 ; X3 ; X4 / as a four-point function described by ˇ.ˇ 1/=2 D 6 quantities, for instance P
.X1 ; X2 ; X3 ; X4 / X D .jX2 X1 j; jX3 X1 j; jX4 X1 j; jX3 X2 j; jX4 X2 j; jX4 X3 j/ (C323) End of example: var-cov function, 4 points
Another example is a scalar-valued three-point function of the type angular observation A.X1 ; X2 ; X3 / between a fixed point X1 and the two moving points X2 and X3 . The situation of vector-valued functions between placement vectors X1 ; ; X˛ in a three-dimensional Euclidian space based on the representation S.X1 ; ; X˛ / D
3 X
Si .X1 ; ; X˛ /ei .X1 ; ; X˛ /
(C324)
i D1
is slightly more complicated. the Euclidean space is spanned by the base vectors fe 1 ; e 2 ; e 3 g fixed to a reference point. The variance-covariance of a vector signals at placements X1 ; ; X˛ and X˛C1 ; ; Xˇ is defined as the second order tensor X
WD
X
.X1 ; ; X˛ ; X˛C1 ; ; Xˇ /ei .X1 ; ; X˛ / ˝ ej .X˛C1 ; ; Xˇ /
i;j
(C325) when we have applied the sum notion convention over repeated indices. P i;j
.X1 ; ; X˛ ; X˛C1 ; ; Xˇ / WD EfŒSi .X1 ; ; X˛ /EfSi .X1 ; ; X˛ /g ŒSj .X˛C1 ; ; Xˇ / EfSj .X˛C1 ; ; Xˇ /gg
(C326)
We remark that all Latin indices run 1; 2; 3 but all Greek indices label the number of multi-points. For the case X1 D X˛C1 ; ; X˛ D Xˇ we denote the multi-point P function i;j .X1 ; ; X˛ ; X˛C1 ; ; Xˇ / as the local variance covariance matrix for the general case nonlocal covariance
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
821
Definition C.9.3: Homogeneity and isotropy in the statistical sense of a tensor valued multi-point function P The multi point tensor valued function .X1 ; ; X˛ ; X˛C1 ; ; Xˇ / is called homogenous in the statistical sense, if it is translational invariant in the sense of X X .X1 ; ; X˛ ; X˛C1 ; ; Xˇ / .X1 C t; ; X˛ C t; X˛C1 C t; ; Xˇ C t/ D (C327) Where t denotes the translational vector P In contrast the multi-point tensor-valued function .X1 ; ; X˛ ; X˛C1 ; ; Xˇ / is called isotropic in the statistical sense if it is rotational invariant in the sense of X X .RX1 ; ; RX˛ ; RX˛C1 ; ; RXˇ / D R .X1 ; ; X˛ ; X˛C1 ; ; Xˇ /R (C328) where R denotes the rotation matrix element of the SO.3/ group End of tensor-valued multi-point functions of key importance are the following representations for tensor-valued multi-point homogenous and isotropic functions Lemma C.9.4: homogeneity and isotropy in the statistical sense of tensor-valued symmetric multi-point functions A second order symmetric tensor-valued multi-point function is homogenous if, X X .X1 ; ; X˛ ; X˛C1 ; ; Xˇ / D .X2 X1 ; ; X˛C1 X˛ ; ; Xˇ Xˇ1 / (C329) is a function of difference vector .X2 X1 ; ; X˛C1 X˛ ; ; Xˇ Xˇ1 /. This function is isotropic if P
.X1 ; ; X˛ ; X˛C1 ; ; Xˇ /
D 0 .jX1 j; < X1 ; X2 >; < X1 ; X3 >; ; < Xˇ1 ; Xˇ >; jXˇ j/ıij ei ˝ ej Xˇ Xˇ C .jX1 j; < X1 ; X2 >; < X1 ; X3 >; ; < Xˇ1 ; Xˇ >; jXˇ j/ D1
Œ.X ˝ X / C .X ˝ X /
(C330)
for a certain isotropic scalar-valued function 0 and for all 1 ˇ Finally such a function is homogenous and isotropic in the statistical sense, if and only if
822
C Statistical Notions, Random Events and Stochastic Processes
P
.X1 ; ; X˛ ; X˛C1 ; ; Xˇ /
D 0 .jX2 X1 j; jX3 X1 j; ; jXˇ Xˇ2 j; jXˇ Xˇ1 j/ıij .ei ˝ ej / Xˇ Xˇ .jX2 X1 j; jX3 X1 j; ; jXˇ Xˇ1 j/ C D2
Œ.X X 1 / ˝ .X X1 / C .X X1 / ˝ .X X 1 /
(C331)
holds for a certain homogenous and isotropic scalar-valued functions 0 and for all 2 ˇ End of Lemma C.9.4: tensor-valued function var-cov functions For the applications the Taylor-Karman structure for the case ˇ D 2 is most important: Corollary C.9.5: ˇ D 2, Taylor-Karman structure for two-point tensor function homogeneity and isotropy A symmetric second order two-point tensor-function is homogenous and isotropic in the statistical sense if and only if
˙ .X1 ; X2 / D 0 .jX2 X1 j/ıij .ei ˝ ej / C 22 .jX2 X1 j/Œ.jX1 X2 j/ ˝ .jX2 X1 j/ D 0 .jX2 X1 j/ıij C 22 .jX2 X1 j/ < X2 X1 ; ei >< X2 X1 ; ej > D ˙ij .X1 ; X2 /.ei ˝ ej /
(C332)
(summation convention) and
˙ij D 0 .jX2 X1 j/ıij C Œ22 .jX2 X1 j/jX2 X1 j2 C 0 .jX2 X1 j/ 0 .jX2 X1 j/
Xi Xi jX2 X1 j2
(C333)
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
823
˙ij .X1 X2 / D ˙m .jX2 X1 j/ıij C fŒ˙l .jX2 X1 j/ ˙m .jX2 X1 j/
(C334) Xi Xi jX2 X1 j2
˙m and ˙l are the homogenous and isotropic scalar-values functions End of Corollary C.9.5: ˇ D 2, Taylor-Karman structure The representation of the symmetric second-order two-point tensor-valued function under the postulate of homogeneity and isotropy has been analyzed by G. J. Taylor (1935) and T. Karman (1937) dealing with problems of Hydrodynamics. It was introduced into Geodetic sciences by E. Grafarend(1970, 1972). Most often the homogenous and isotropic scalar-valued function ˙m and ˙l are called longitudinal and lateral correlation function. It has been C.C. Wang who wrote first the symmetric second order two-point tensor-valued function being homogenous and isotropic if ˙.X1 X2 / D a1 .X2 X1 /ıij Œe i X1 ˝ e j X2 C a2 .X2 X1 /Œ.X2 X1 / ˝ .X2 X1 / (C335) holds. This representations refers to a1 .X2 X1 / D b1 .jX2 X1 j/ and a2 .X2 X1 / D b2 .jX2 X1 j/ according to C. C. Wang (1970 p, 214/215) as scalar-valued isotropic functions. In case we decompose the difference vector X2 X1 D < X2 X1 je i .X1 / >< X2 X1 ; e j .X2 / > Œe i X1 ˝ e j X2 D D Xi Xj Œe i X1 ˝ e j X2
(C336)
˙ij .X1 X2 / D b1 .jX2 X1 j/ıij C b2 .jX2 X1 j/Xi Xj for any choice of the basis e i1 ; e j2 . A special choice is to put parallel to the basic vector X2 X1 such that ˙11 D b1 .jX2 X1 j/ C b2 .jX2 X1 j/jX2 X1 j2
(C337)
˙22 D ˙22 D b1 .jX2 X1 j/
(C338)
The base vector e 2 and e 3 are orthogonal to the “longitudinal direction” X2 X1 . For such a reason b1 .jX2 X1 j/ WD ˙m .jX2 X1 j/ (C339) and
824
C Statistical Notions, Random Events and Stochastic Processes
Fig. C15 Longitudinal and lateral correlation: Two-point function
b2 .jX2 X1 j/ WD
1 jX2 X1 j2
f˙l .jX2 X1 j/ ˙m .jX2 X1 j/g
(C340)
of the scalar-valued isotropic functions ˙l .jX2 X1 j/ and ˙m .jX2 X1 j/. The result is illustrated in Fig. C.9.1 A two-dimensional Euclidean example is the vector-valued one-point function of a network point. Let 2
x1 x1 6 y1 x1 ˙ D6 4 x x 2 1 y2 x1
x1 y1 y1 y1 x2 y1 y2 y1
x1 x2 y1 x2 x2 x2 y2 x2
3 x1 y2 y1 y2 7 7 x2 y2 5 y2 y2
(C341)
be the variance-covariance matrix of the two-dimensional network of two points with Cartesian coordinates. The variance-covariance matrix will be characterized by the variance x1 x1 D x21 of the X-coordinate of the first point, the covariance x1 y1 of the x; y coordinates of the first point and the second point and so on. The discrete variance-covariance matrix ˙ D EfŒX EfXgŒX EfXgg
(C342)
or ˙ij .X1 ; X2 / D EfŒxi .X1 / Efxi X1 gŒxj Efxj X2 gg or xi .X1 / D fx1 X1 ; x2 X1 g DW fx1 ; y1 g xj .X2 / D fx1 X2 ; x2 X2 g DW fx2 ; y2 g
(C343)
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
825
Fig. C16 Homogenous and isotropic dispersion situation: (a) general, (b)homogenous and (c) homogenous-isotropic
are the estimated coordinates of the Cartesian vectors of the first network point X1 or the second network point X2 . The two-point functions ˙ij .X1 ; X2 / are the coordinates labeling the tensor of second order. The local variances and covariances are usually illustrated by local error ellipses: ˙ij .X1 ; X2 D X1 /
(C344)
and ˙ij .X1 D X2 ; X2 /;
x1 x1 x1 y1 y1 x1 y1 y1
and
x2 x2 x2 y2 y2 x2 y2 y2
(C345)
The nonlocal covariances are contained in the function ˙ij .X1 ; X2 / for the case X1 ¤ X2 , in particular
x1 x2 x1 y2 y1 x1 y1 y2
(C346)
Figure C.9.2 illustrates the postulates of homogeneity and isotropy for variancecovariance matrices of cartesian coordinates left, we illustrate the general dispersion situation in two-dimensional networks, of course those local dispersion ellipses. In the center, the homogenous dispersion situation: At each network points, due the translation invariance all dispersion ellipses are equal. In the right side, the homogenous and isotropic dispersion situation of type Taylor-Karman is finally illustrated. “All dispersion ellipses are equal and circular”
826
C Statistical Notions, Random Events and Stochastic Processes
Fig. C17 Correlation figures of type (a) ellipse and (b) hyperboloid
Finally we illustrate“longitudinal” and “lateral” within homogenous and isotropic networks by Fig. C16, namely the characteristic figures of type (a) ellipse and (b) hyperboloid. Those characteristic functions ˙l and ˙m are described the eigen values of the variance-covariance matrix. If the eigenvalues are positive, then the correlation figure is an ellipse. But if the two eigen values change sign (either “C” or “” or “” or “C”) the characteristic function has to be illustrated by a hyperboloid.
Ex 1: Two-Dimensional Euclidean Networks Under the Postulates of Homogeneous and Isotropy in the Statistical Sense Example C.9.1: Two-dimensional Euclidean networks under the postulates of homogeneity and isotropy in the statistical sense Two-dimensional network in a Euclidean space are presented in terms of the postulates homogeneity and isotropy in the statistical sense. At the beginning, we assume a variance-covariance matrix of absolute cartesian net coordinates of type homogenous and isotropic, in particular with Taylor Karman structure. Given such a dispersion matrix, we derive the associated variance-covariance matrix of coordinate differences, a special case of the Kolmogorov structure function, and directions, angles and distances. In order to write more simple formulae, we assume quantities of type EfYg D 0 Definition C.9.6: Variance covariance matrix of Cartesian coordinate, coordinate differences, azimuths, angular observations and distances in two-dimension Euclidean networks
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
827
Let ıxi .X1 / be absolute Cartesian coordinates ıxi .X1 / ıxi .X2 / relative cartesian coordinates or coordinate differences, ı˛.X1 ; X2 / the azimuths of a line observation point X1 to a target point X2 , the difference ı˛.X1 ; X2 / to ı˛X1 ; X3 / as angular observations or azimuth differences and ıs.X1 X2 / the distances from the point X1 to the point X2 . Then ˙ij .X1 ; X2 / D Efıxi .X1 /ıxj .X2 /g
(C347)
˚ij .X1 ; X2 ; X3 ; X4 / D EfŒıxi .X1 / ıxi .X2 /Œıxj .X3 / ıxj .X4 /g (C348) .X1 ; X2 ; X3 / D EfŒı˛.X1 ; X2 /ı˛.X3 ; X4 /g
(C349)
.X1 ; X2 ; X3 ; X4 ; X5 ; X6 / D EfŒı˛.X1 ; X2 / ı˛.X1 ; X3 / Œı˛.X4 ; X5 / ı˛.X4 ; X6 /g ˘.X1 ; X2 ; X3 ; X4 / D Efıs.X1 ; X2 / ıs.X3 ; X4 /g
(C350)
are the derived variance-covariance matrices for the increments Y.x1 ; ; xi / y.x1 ; ; xi / D ıy.x1 ; ; xi /fıxi :ıxi ıxj ; ı˛; ı˛i ı˛j ; ısg with respect to a model y. End of Definition C: derived var-cov matrices Let the variance-covariance matrix ˙ij .X1 ; X2 / of absolute Cartesian coordinates of the homogenous and isotropic type the characteristic isotropic functions ˙l and ˙m given. Then we derive the variance-covariance of Lemma C.9.7: Representation of derived variance-covariance matrices of TaylorKarman structured variance-covariance of absolute Cartesian coordinates. Let the variance-covariance matrix of absolute Cartesian ij .X1 ; X2 / be isotropic and homogenous in terms of the Taylor-Karman structure be given. Then the derived variance-covariance matrices are structured according to ˚ij .X1 ; X2 ; X3 ; X4 / D ˙ij .X1 ; X3 / ˙ij .X1 ; X4 / ˙ij .X2 ; X3 / ˙ij .X2 ; X4 /
(C351)
.X1 ; X2 ; X3 ; X4 / D ai .X1 ; X2 /aj .X3 ; X4 /˚ij .X1 ; X2 ; X3 ; X4 / (C352) .X1 ; X2 ; X3 ; X4 ; X5 ; X6 / D .X1 ; X2 ; X4 ; X5 / .X1 ; X2 ; X4 ; X6 / .X1 ; X3 ; X4 ; X5 / C .X1 ; X3 ; X4 ; X6 / (C353) ˘.X1 ; X2 ; X3 ; X4 / D bi .X1 ; X2 /bj .X3 ; X4 /˚ij .X1 ; X2 ; X3 ; X4 / (C354) subject to
828
C Statistical Notions, Random Events and Stochastic Processes
˙ij .Xˇ ; X˛ / D ˙m .jXˇ X˛ j/ıij C Œ˙l .jXˇ X˛ j/ ˙m .jXˇ X˛ j/bi .Xˇ ; X˛ /bj .Xˇ ; X˛ / P ˛ / y.X P ˇ / ŒPs .Xˇ ; X˛ / a1 .X˛ ; Xˇ / WD Œy.X 2
(C355) (C356)
P ˛ / x.X P ˇ / ŒPs .x˛ ; xˇ / a2 .X˛ ; Xˇ / WD CŒx.X
(C357)
b1 .X˛ ; Xˇ / WD CŒx.X P ˛ / x.X P ˇ / ŒPs 2 .x˛ ; xˇ / D cos ˛P ˇ˛
(C358)
2
P ˛ / y.X P ˇ / ŒPs .x˛ ; xˇ / D sin ˛P ˇ˛ b2 .X˛ ; Xˇ / WD CŒy.X
(C359)
End of Lemma C.9.7: Taylor-Karman derived structures For the proof, we apply summation convention with respect to two identical indices. In addition, the derivations are based on ı˛.X1 ; X2 / D a1 .X1 ; X2 /Œıx.X1 / ıx.X2 / C a2 .X1 ; X2 /Œıy.X1 / ıy.X2 / (C360) ıs.X1 ; X2 / D b1 .X1 ; X2 /Œıx.X1 / ıx.X2 / C b2 .X1 ; X2 /Œıy.X1 / ıy.X2 / (C361) In order to be informed about local variance-covariance we need special results about the situation X1 D X3 ; X2 D X4 : Lemma C.9.8: derived local variance-covariance matrices of a Taylor-Karman structure variance-covariance matrix of absolute Cartesian coordinates ˚ij .X1 ; X2 / D 2Œ˙m .0/ıij .jX2 X1 j/ D ˚m .jX2 X1 j/ıij C Œ˚l .jX2 X1 j/ ˚m .jX2 X1 j/bi .X1 ; X2 /bj .X1 ; X2 /
(C362)
˚l .jX2 X1 j/ D 2Œ˙l .0/ ˙l .jX2 X1 j/
(C363)
˚m .jX2 X1 j/ D 2Œ˙m .0/ ˙m .jX2 X1 j/
(C364)
.X1 ; X2 / D 2ai .X1 ; X2 /aj .X1 ; X2 /Œ˙m .0/ıij ˙.X1 ; X2 / D Œ˚m .jX2 X1 j/ ŒjX2 X1 j2
(C365)
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
829
.X1 ; X2 ; X3 / D .X1 ; X2 / 2.X1 ; X2 ; X1 ; X3 / C .X1 ; X3 / D
˚m .jX2 X1 j/ 2
jX2 X1 j
C
˚.jX3 X1 j/ jX3 X1 j2
ai .X1 ; X2 /aj .X1 ; X3 /
Œ˚ij .X1 ; X2 / ˚ij .X2 ; X3 / ˚ij .X1 ; X3 / D ˚m .jX2 X1 j/
< X3 X2 ; X3 X1 jX2 X1 j2 jX3 X1 j2
˚m .jX3 X2 j/ C˚m .jX3 X2 j/
< X2 X1 ; X3 X2 > jX2 X1 j2 jX3 X2 j2 < X2 X1 ; X3 X1 > jX2 X1 j2 jX3 X1 j2
CŒ˚m .jX2 X1 j/ ˚.jX3 X2 j/
ŒjX2 X1 j2 jX3 X1 j2 < X2 X1 ; X3 X1 > ŒjX2 X1 j2 jX3 X1 j2 jX3 X2 j2
˘.X1 ; X2 / 2bi .X1 ; X2 /j˙m .0/ıij ˙ij .X1 ; X2 / D ˚m .jX2 X1 j/
(C366)
(C367)
End of Lemma C 9.8: Derived local var-cov matrices The results can be interpreted as following. First, the scalar-valued functions .X1 ; X2 /; .X1 ; X2 ; X3 / and ˘.X1 ; X2 / are homogenous and isotropic. Second, of the local variance-covariance matrix of Cartesian coordinate differences has the structure of nonlocal covariance matrix of absolute Cartesian coordinates as long as we depart from a Taylor-Karman matrix: Third, not every homogenous and isotropic local variance-covariance matrix of Cartesian coordinate differences produces derived homogenous and isotropic functions as we illustrated in our next example. Of course, not every homogenous and isotropic local variance-covariance matrix of type coordinate difference has the structure of being derived by the transformation of absolute coordinates into coordinate difference. For instance, for the special case X1 D X2 the variance-covariance matrix ˙ij .X1 ; X2 / due to ˙m .0/ D ˙l .0/ can be characterized by circles. The limits ˙ij .X1 ; X1 / D ˙ij .X1 ; lim X2 / D lim ˙ij .X1 ; X2 / X2 !X1
X2 !X1
(C368)
are fixed for any direction of approximation. However, due to ˚m .jX2 X1 j/ ¤ ˚l .jX2 X1 j/ lead us to symmetry breaking in a two-dimensional EUCLIDEAN
830
C Statistical Notions, Random Events and Stochastic Processes
space: Though we have started from a homogenous and isotropic variancecovariance matrix of absolute Cartesian coordinates into relative coordinates or coordinate differences the variance-covariance matrix ˚ij .X1 ; X2 ; X3 ; X4 / is not homogenous and isotropic, a result mentioned first by W. Baarda (1977): The error eclipses are not circular. In a three dimensional Euclidean space. In contrast, we receive rotational symmetric ellipsoids based on the difference vector X2 X1 with respect to the axis of rotation and circles as intersection surfaces orthogonal to the vector X2 X1 . In order to summarize our results we present Lemma C.9.9: Circular two-point functions The image of the two-point function ˚ij .X1 ; X2 / D ˚.X1 ; X2 ; X1 ; X2 / is a circle if and only if ˚l .jX2 X1 j/ D ˚m .jX2 X1 j/ or ˙l .jX2 X1 j/ D m .jX2 X1 j/ End of Lemma C.9.9: Circular two-point functions The result is illustrated by an example: Assume network points with a distance function jX2 X1 j D 1. In this distance the longitudinal and the lateral correlation functions are assumed to the values 1/2 and 1/4. Then there holds
˙l .0/ D ˙m .0/ D 1 1 1 ; ˙m D 2 4 3 3 ˚l D 1; ˚m D ; .1/ D ; ˘.1/ D 1 2 2
˙l D
(C369) (C370) (C371)
Lemma C.9.10: Derived local variance-covariance matrix for a variancecovariance matrix of absolute Cartesian coordinate of Special Taylor-Karman structure ˙l D ˙m Let the variance-covariance matrix of absolute Cartesian coordinates with Special Taylor-Karman structure ˙l D ˙m for homogenous and isotropic signals given. Then the derived variance-covariance matrices are structured accordingly to ˚ij .X1 ; X2 / D 2Œ˙m .0/ ˙m .jX2 X1 j/ıij D ˚.jX2 X1 j/ıij
(C372)
˚l .jX2 X1 j/ D ˚m .jX2 X1 j/ D 2Œ˙m .0/ ˙m .jX2 X1 j/ (C373)
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
.X1 ; X2 / D
2 jX2 X1 j2 D
.X1 ; X2 ; X3 / D
831
Œ˙m .0/ ˙m .jX2 X1 j/
1 ˚m .jX2 X1 j/ jX2 X1 j
(C374)
1 2
jX2 X1 j jX3 X1 j2 Œ˚m .jX2 X1 j/ < X3 X2 jX3 X1 >
˚m .jX3 X1 j/ < X2 X1 ; X3 X2 > C˚m .jX3 X2 j/ < X3 X1 ; X2 X1 >
(C375)
˘.X1 ; X2 / D 2Œ˙m .0/ ˙m .jX2 X1 j/ D ˚m .jX2 X1 j/ (C376) End of Lemma C 9.10: Special Taylor-Karman matrix Our example is characterized now by a homogenous and isotropic variancecovariance matrix of Cartesian absolute coordinates which are the gradient of a homogenous and isotropic scalar-valued function. In addition, we assume that the scalar valued function is an element of a two-dimensional Markov process of first order. Its variance-covariance matrix is characterized by normalized longitudinal and lateral correlation function which possess the distance function s WD jX2 X1 j between network points d the characteristic distance, K0 and K1 the modified Bessel function of the second kind of zero and first order. The characteristic distance d is the only variable or degree of freedom. Within the parameter d we depend on our experience with geodetic networks. Due to the property ˙l .0/ D ˙l .0/ we call the longitudinal and lateral normalized. In order to find the scale for the variancecovariance matrix we have to multiply by the variance 2 . The modified Bessel function K0 and K1 are tabulated in Table C18 in the range 0.1(0.1)8 where we followed M. Abramowtz and I. Stegun (1970 p. 378-379). In Fig. C18 we present the characteristic functions of type absolute coordinates ˙l .X / D 4x 2 C 2K0 .X / C 4x 1 K1 .X / C 2xK1 .X / and ˙m .X / D C4x 2 2K0 .X /2K0 .X /4x 1 K1 .X /. Next is the representation of the derived functions ˚l .X/ D ˘.X / D 2˙m .0/ 2˙l .X /; ˚m .X / D 2˙m .0/ 2˙m .X / and .X / D 2x 2 j˙m .0/ ˙m .X /j D x 2 ˚m .X /. Figures C19 and C20 illustrate the various variance-covariance functions. An explicit example is a two-dimensional network in Fig. C21. Let point one be the center of a fixed (x,y) Cartesian coordinate system. In consequence, we measure nine points by distance observations assuming the distance between point one and point two is unity, namely p p p 1; 2; 2; 5; 8 The related characteristic functions for a characteristic distance d=1 are approximately given in Table C18. The variance-covariance matrix of absolute Cartesian
832
C Statistical Notions, Random Events and Stochastic Processes
Fig. C18 Modified Bessel function K0 .X/ and K1 .X/ longitudinal and lateral function of absolute Cartesian coordinates f1 .X/ DW ˙l .X/ and f2 .X/ DW ˙m .X/, longitudinal and lateral correlation of relative Cartesian coordinates f3 WD ˚m .X/ and f4 WD ˚l .X/, variances of directions f5 WD .X/ and distances ˘.X/ DW .X/ of a two-dimensional conservation Markov process of first order
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
833
Fig. C19 Characteristic functions l and ˙m for Taylor-Karman structured varaince-covariance matrix of Cartesian coordinates
coordinates is relative Cartesian coordinates or coordinate differences is computed with respect to 2 D 1:09; y1 y2 D 0:50; x1 x2 ;y1 y2 D 0 We added by purpose according to Fig. C19 the characteristic functions for the special homogenous and isotropic variance-covariance matrix given by ˙l D ˙m . The associated variance-covariance matrix for absolute and relative Cartesian coordinates is given in Tables C25 and C26, structured by
˙l D
4d s 4d 2 s s s K1 . / C 2. /K1 . / C 2K0 . / C s2 d s d d d
(C377)
4d 4d 2 s s K1 . / C 2K0 . / C 2 s d s d
(C378)
˙m D
Finally we present the derived variance-covariance matrices for relative Cartesian coordinates, direction s, direction differences namely angles and distances, for the special case ˙l D ˙m .
834
C Statistical Notions, Random Events and Stochastic Processes
Fig. C20 Derived variance-covariance functions for Taylor-Karman structured variance-covariance matrix of Cartesian coordinates
˚l .s/ D ˘.s/ D 2˙m .0/ 2˙m .s/; 2
s 8d
s
s
s 8d D 2C K 4K 4 K 0 1 1 s2 d s d d d (C379) ˚m .s/ D ˘.s/ D 2˙m .0/ 2˙m .s/; 2
s 8d
s 8d D 2 C K1 ; 4K 0 2 s d s d 2 Œ˙m .0/ ˙m .s/ ; s2 2
s
s
s 4d 2 C 4 K 2K D 2 1 0 1 s s2 d d d
(C380)
.s/ D ˘.s/ D
(C381)
Example 9.1 is based on our contribution E. Grafarend and B. Schaffrin (1979) on the subject of criterion matrices of type homogenous and isotropic. Other statistical properties like anisotropic or non-isotropic correlation functions were treated by G. K. Batchelor (1946), S. Chandrasekhar (1950) and E. Grafarend (1971, 1972) on the subject of axisymmetric turbulence.
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes Fig. C21 Two dimensional network
835
836
C Statistical Notions, Random Events and Stochastic Processes
Fig. C22 Characteristic functions of a network example
Fig. C23 Cartesian coordinate data for a network example
Of course there is a rich list of references for “criterion matrices”. We list only few of them. J. E. Alberta (1974), A. B. Amiri (2004), W. Baarda (1973, 1977), K. Bill (1985), R. Bill et. al. (1986), Bischoff et. al. (2006), G. Boedecker (1977), F. Bossler et. al. (1973), F. Comte et. al. (2002), P. Cross et. al. (1978), G. Gaspari et. al. (1999), E. Grafarend (1971, 1972, 1975, 1976), E. Grafarend, F. Krumm and B. Schaffrin (1985), E. Grafarend and B. Schaffrin (1979), T. Karman (1937), T. Karman et. al. (1939), R. Kelm (1976), R. M. Kerr (1985), S. Kida (1989), F. Krumm et. al (1982, 1986), S Meier (1976, 1977, 1978, 1981), S. Meier and W. Keller (1990), J. van Mierlo (1982), M. Molenaar (1982), A. B. Obuchow (1958), T. Reubelt et. al. (2003), H. P. Robertson (1940), B. Schaffrin and E. Grafarend (1977, 1981), B. Schaffrin and E. Grafarend (1982), B. Schaffrin et. al. (1977), G. Schmitt (1977, 1978 i, ii), J. F. Schoeberg (1938), J. P. Shkarofky (1968), V. I. Tatarski (1961), A. I. Taylor (1935, 1938), C. C. Wang (1970), H. Wimmer (1972, 1978), M. J. Yadrenko (1983) and A. A. Yaglom (1987).
Ex 2: Criterion Matrices for Absolute Coordinates in a Space Time Holonomic Frame of Reference, for Instance Represented in Scalar-Valued Spherical Harmonics Example C.9.2: Criterion matrices for absolute coordinates in a space time holonomic frame of reference, for instance represented in scalar-valued spherical harmonics
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
837
Fig. C24 Derived variance-covariance functions for a special Taylor-Karman structure variance-covariance matrix of type ˙l D ˙m for Cartesian coordinates
Conventional geodetic networks have been adjusted by methods of Total Least Squares and optimized in different reference frames. The advantage of a rigourously three dimensional adjustment in a uniform reference frame is obvious, especially under the influence of subdtic geodetic networks within various Global Positioning Systems (GPS, Global Problem Solver). Numerical results in a three dimensional, Total Least Squares approach are presented for instance by K. Bergbauer et. al. (1979) K. Euler et. al. (1982), E. Grafarend (1991), G. Hein and H. Landau (1983), and by W. Jorge and H. G. Wenzel (1973). The diagnosis of a geodetic network is rather complicated since the measure of quality, namely the variance-covariance matrix of three dimensional coordinates of Cartesian type contains 3g.3g C 1/=2 independent elements. The integer number g accounts for the number of ground stations. Example: g = 100
838 Fig. C25 Homogenous isotropic normalized variance-covariance matrix of absolute coordinates from a network example
C Statistical Notions, Random Events and Stochastic Processes
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
839
Fig. C26 Special homogenous isotropic normalized variance-covariance matrix of absolute coordinates for a network example case ˙l D ˙m
Fig. C27 Special homogenous isotropic normalized variance-covariance matrix of relative coordinates for a network example case ˙l D ˙m
The number of independent elements with in the coordinate variance-covariance matrix amounts to 45,150. Example: g = 1,000 The number of independent elements within the coordinate variance-covariance matrix amounts to 4,501,500. Example: g = 5,000 (GPS)
840
C Statistical Notions, Random Events and Stochastic Processes
The number of independent elements within the coordinate variance-covariance matrix amounts to 125,507,500 ! In a definition, the same number of unknown variance-covariance elements of vertical deflections and gravity disturbances. The open big problem In light of the great number of elements of the variance-covariance matrix it is an open question how to achieve any measure of quality of geodetic network based on GPS. We think the only way to get away from this number of millions of unknowns is to design an idealized variance-covariance matrix again called criterion matrix. Comparing the real situation of the variance-covariance with an idealized one, we receive an objective measure of quality of the geodetic network. Here we are aiming for designing a three-dimensional criterion matrices for absolute coordinates in a holonomic frame of reference and for functionals of the “disturbing potential”. In a first section we extensively discuss the notion of criterion matrix on a surface of a moving frame of type homogenous and isotropic on the sphere which is a generalization of the classical Taylor-Karman structure for a vectorized stochastic process. The vector values stochastic process on the sphere is found. In detail we review the representation of the criterion matrix in scalar- and vector-valued spherical harmonics. In section two we present the criterion matrices for functionals of the disturbing potential. The property of harmonicity is taken into account for the case that a representation in terms of scalar-valued spherical harmonics exists. At the end, we present a numerical example. Example C.9.21: Criterion matrices for absolute coordinates in a space time holonomic frame if reference The variance-covariance matrix of absolute Cartesian coordinates in the fixed orthonormal frame of equatorial type, holonomic in spacetime, is transformed into a moving frame, e.g. of spherical type. In the moving frame longitudinal and lateral characteristic functions are derived. The postulates of spherical homogenity and spherical isotropy are reviewed extending to the classical Taylor-Karman structure with respect to three dimensional Cartesian coordinates into three dimensions. Finally the spherical harmonic representation of the coordinate criterion matrix is introduced. Transformation of the coordinate criterion matrix from a fixed into a moving frame of reference F ! e W We depart from the fixed orthonormal frame fF 1 ; F 2 ; F 3 g which we transform e e e into a moving one fe 1 ; e 2 ; e 3 ; g by e e e
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
F ! e D RE .; ; 0/F
841
(C382)
RE .; ; 0/ WD R3 .0/R2 . 2 /R3 ./ is the three dimensional euler rotation matrix including geometric longitude and geometric latitude . As an example assume a representation of a placement vector x by spherical coordinates .; ; r/ such that e x D F1 r cos cos C F2 r sin cos C F3 r sin
(C383)
e1 D x; kx; k D e e2 D x; kx; k D e e3 D x;r kx;r k D er
(C384)
x; denotes ıx=ı etc. e ! f Now let us rotate the orthonormal frame fe1 ; e2 ; e3 ; g into the orthonormal frame ff1 ; f2 ; f3 ; g by means of the South azimuth ˛ 2
3 f1 e ! f D R3 .˛/e 4 f2 5 f3 2 32 3 cos ˛ sin ˛ 0 e1 D 4 sin ˛ cos ˛ 0 5 4 e2 5 0 0 1 e3
(C385)
The transformation e ! f on the unit sphere orientates f1 into the direction of the great circle connecting the points .; ; r/ and .0 ; 0 ; r 0 / For this reason the unit vector f1 will be called “longitudinal direction”. In contrast, f2 points in direction orthogonal to f1 , but still in the tangential plane of the unit sphere. Thus we call the unit vector f2 “lateral direction”. Obviously f3 D e3 holds F ! g ! f There is an alternative way to connect the triads F and f . Consider the spherical position of a point in the “orbit” of the great circle passing through the points x; x0 , respectively. For a given set of “Keplerian elements” f˝; i; !g - ˝ the right ascension of the ascending node, i the inclination and ! the angle measured in the “orbit” - we transform the fixed orthonormal frame fF1 ; F2 ; F3 g into the orthonormal “great circle moving frame” fg1 ; g2 ; g3 g D ff3 ; f1 ; f2 ; g by F ! g D R3 .!/R1 .i /R3 .˝/F
(C386)
g ! F D R3 .˝/R1 .i /R3 .!/F
(C387)
842
C Statistical Notions, Random Events and Stochastic Processes
Fig. C28 Moving and fixed frame of reference on the sphere : two point correlation function
or explicitly 2
3 2 3 g1 cos ! cos ˝ sin ! cos i sin ˝ cos ! sin ˝ C sin ! cos i cos ˝ C sin ! sin i 4 g2 5 D 4 sin ! cos ˝ cos ! cos i sin ˝ sin ! sin ˝ C cos ! cos i cos ˝ cos ! sin i 5 g3 sin i sin ˝ sin i cos ˝ cos i (C388) 2
3
F1 : 4 F2 5 F3
(C389)
g3 D sin i sin ˝F1 sin i cos ˝F2 C cos i F3
(C390)
2
represents the “great circle normal” parameterized by the longitude ˝ and the latitude 2 i . g1 and g2 span the “great circle plane”, namely g1 is directed towards the spherical point x. Figure C28 illustrates the various fames of type “fixed” and “moving”. fg1 ; g2 ; g3 g D ff3 ; f1 ; f2 ; g, g2 is the vector product of g3 and g1 Transformation of variance-covariance matrices Next we compute the Cartesian coordinates variance-covariance matrix.
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
843
†.x; x0 / D † i j .x; x0 /fFi ˝ Fj g †.x; x0 / D † i j .x; x0 /ffi .x/ ˝ fj .x0 /g †.x; x0 / D † ij .x; x0 /fgi .x/ ˝ gj .x0 /g
(C391)
wherei; j"f1; 2; 3g holds 3 ˙x;x0 ˙x;y0 ˙x;z0 0 .˙ ij / D 4 ˙y;x0 ˙y;y0 ˙y;z0 5 ˙z;x0 ˙z;y0 ˙z;z0 2
0
0
0
(C392)
0
.˙ ij / D EfŒx i Efx i gŒX j EfX j gg D Ef"i "j g ˙.x; x0 / WD Ef".x/"T .x0 /g ij 0
(C393) (C394)
0
Note that the variance-covariance matrix ˙ .x; x / is a two-point tensor field paying attention to the fact that the coordinate errors ".x/ and ".x/ are functions of the point x; x0 , respectively. Now let us transform ˙ $ ˙ $ ˙. D RE .; ; ˛/˙33 RTE .0 ; 0 ; ˛ 0 / ˙30
D R3 .˝/R1 .i /R3 .!/˙33 R3 .! 0 /RT1 .i 0 /RT3 .˝ 0 / D RE .; ; ˛/˙ RTE .0 ; 0 ; ˛ 0 / ˙30
(C395)
D RE .; ; ˛/˙ R.!; i; ˝/˙33 RT .! 0 ; i 0 ; ˝ 0 / RTE .0 ; 0 ; ˛ 0 / ˙30 D RT .! 0 ; i 0 ; ˝ 0 /˙33 R.!; i; ˝/ D RT .! 0 ; i 0 ; ˝ 0 /RTE .; ; ˛/˙33 RE .0 ; 0 ; ˛ 0 / R.! 0 ; i 0 ; ˝ 0 /
A detailed write-up of the variance-covariance matrix ˙ is 3 ˙l l 0 ˙lm0 ˙lr 0 .˙ / D 4 ˙ml 0 ˙mm0 ˙mr 0 5 ˙rl 0 ˙rm0 ˙rr 0 2
ij 0
(C396)
indicating the “longitudinal” base vector f1 D fl , “lateral” base vector f2 D fm and the “radical” base vector f3 D fr D er . In terms of spherical coordinates ˙ ; ˙, respectively, can be written ˙30 .x; x0 / D ˙ .; ; r; 0 ; 0 ; r 0 /
(C397)
˙30 .x; x0 / D ˙.!; i; ˝; r; ! 0 ; i 0 ; ˝ 0 ; r 0 /
(C398)
0
˙l l 0 D Ef"l .x/"l 0 .x /g; ; ˙l l 0 .x; x0 /ffl .x/ ˝ fl 0 .x0 /g
844
C Statistical Notions, Random Events and Stochastic Processes
˙lm0 D Ef"l .x/"m0 .x0 /g; ; ˙lm0 .x; x0 /ffl .x/ ˝ fm0 .x0 /g ˙lr 0 D Ef"l .x/"r 0 .x0 /g; ; ˙lr 0 .x; x0 /ffl .x/ ˝ fr 0 .x0 /g
(C399)
The characteristic function ˙l l 0 .x; x0 / is called longitudinal correlation function, ˙mm0 .x; x0 / lateral correlation function, ˙lm0 .x; x0 / cross correlation function between the longitudinal error "l .x/ at the point x and the lateral error "m0 .x0 / at the point x0 . Spherically homogenous and spherically isotropic coordinate criterion matrices in a surface moving frame terrestrial geodetic networks constitute points on the earths surface. Once we want to develop an idealized variance-covariance matrix of the coordinates of the network points, we have to reflect the restriction given by the closed surface. A three dimensional criterion matrix of Taylor-Karman type is therefore not suited since it assumes extension of a network in three dimension up to infinity. a more suitable concept of a coordinate criterion matrix starts from the representation where the radial part is treated separately from the longitudinal-lateral part. Again the nature of the coordinate criterion matrix as a second order two-point tensor field will help us to construct homogenous and isotropic representations. The spherically homogeneity-isotropy-postulate reflects the intuitive notion os an idealized variance-covariance matrix. Definition (spherical homogeneity and spherical isotropy of a tensor-valued twopoint function): The two-point second order tensor function ˙.x; x0 / is called spherically homogenous, if it is translational invariant in the sense of † 33 .! C ; i; ˝; r; ! 0 C ; i 0 ; ˝ 0 ; r 0 / D † 33 .!; i; ˝; r; ! 0 ; i 0 ; ˝ 0 ; r 0 / (C400) for all 2 Œ0; 2 ; i D i 0 ; ˝ D ˝ 0 ; r D r 0 , where ! 0 D ! C spherically isotropic, if it is rotational invariant in the sense of 0
0
holds. It is called
0
˙33 .! !s ; i ; ˝ ; r; ! !s ; i ; ˝ ; r 0 / D R3 .ıs /˙33 .! !s ; i; ˝; r; ! 0 !s ; i 0 ; ˝ 0 ; r 0 /R3 .ıs / 0
(C401)
for all ı"Œ0; 2 ; r D r 0 ; where ! 0 D ! C , ! D ! C holds. R3 .ıs / indicates a local rotation around the intersection points xs of the great circle through x; x0 0 and through x ; x , respectively. xs has the common coordinates .˝; i; !s and .˝ ; i ; !s with respect to the locally rotated frames. End of Definition
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
845
Corollary (spherical homogeneity and spherical isotropy of a tensor-valued twopoint function): The two-point second order tensor function ˙.x; x0 / is spherically homogenous, if it is a function of . ; i; ˝; r/ only: ˙.!; i; ˝; r; ! 0 ; i 0 ; ˝ 0 ; r/ D ˙.!; i; ˝; r; ! !s ; ; r/
(C402)
It is spherically homogenous and isotropic, if it is a function of . ; r/ only ˙.!; i; ˝; r; ! 0 ; i 0 ; ˝ 0 ; r/ D ˙. ; i; ˝; r/
(C403)
such that ˙.x; x0 / D f˙0 . ; r/ı ij C ˙22 . ; r/ < x0 xjfi > < x0 xjfi >g.fi ˝ fj /
(C404)
holds for some spherically isotropic and homogenous scalar-valued functions ˙0 and ˙22 End of Corollary Let us interpret the definition of the corollary, especially with respect to Fig. C28. Two points x; x0 are connected to the sphere of radius r D r 0 D r0 by a great circle such that D 2 arcsin.kx; x0 k=2r0/ is the spherical distance if these points. Obviously a “translation” of the configuration x; x0 is generated by ! ! ! C; ! ! ! 0 C where the location of x; x0 in the great circle / “orbit” plane is measured by !; ! 0 . The “rotation” of the configuration x; x0 is more complicated. Assume 0 therefore an alternative configuration x ; x of two point connected by another 0 great circle. Note the spherical distances of x; x0 and x ; x respectively, coincide. The great circles intersect at xs , : The location of Xs within the plane x; x0 is 0 described by !s with respect to the ascending node, while within the plane x ; x by !s . the angle between the two great circles is denoted by ıs . 0
The configuration x; x0 and x ; x is described with respect to xs by 0 0 0 ! !s ; i; ˝; ! 0 !s ; i 0 ; ˝ 0 and ! !s ; i ; ˝ ; ! !s ; i ; ˝ respectively. Now it should be obvious that the two configurations under the spherical isotropy postulate coincide once we rotate around xs by the angle ıs . We expressed ˙ . given in the F -frame in terms of the g-frame, namely by ˙ D R3T .!/R1T .i /R3T .˝/˙ R3 .˝/R1 .i /R3 .!/
(C405)
Introducing ! D .! !s / C !s we compute the variance-covariance matrix ˙
846
C Statistical Notions, Random Events and Stochastic Processes
˙ D RT3 .! !s /RT3 .!s /RT1 .i /RT3 .˝/˙ R3 .˝ 0 /R1 .i 0 /R3 .!s /R3 .! 0 !s / (C406) ˙ D RT3 .! !s /˙s R3 .! 0 !s /
(C407)
in terms of the variance-covariance matrix ˙s in the g.xs /-frame. Now we are prepared to introduce Corollary (spherical Taylor-Karman structure): 0
Denote the variance-covariance matrix ˙ i j in the f - and e - frame by f˙ e˙
i j 0
i j 0
22 0 ˙l l 0 . ; r/ f˙ 0 ˙mm0 . ; r/ 22 ˙110 ˙120 e˙ D ˙210 ˙220
D
(C408) (C409)
and transform f ˙ ! e ˙ by e˙ D
cos ˛ sin ˛ cos ˛ 0 sin ˛ 0 f˙ D sin ˛ cos ˛ sin ˛ 0 cos ˛ 0
(C410)
Then ˙110 D ˙l l 0 . ; r/ cos ˛ cos ˛ 0 C ˙mm0 . ; r/ sin ˛ sin ˛ 0 ˙120 D ˙l l 0 . ; r/ cos ˛ sin ˛ 0 ˙mm0 . ; r/ sin ˛ cos ˛ 0 ˙210 D ˙l l 0 . ; r/ sin ˛ cos ˛ 0 ˙mm0 . ; r/ cos ˛ sin ˛ 0 ˙220 D ˙l l 0 . ; r/ sin ˛ sin ˛ 0 C ˙mm0 . ; r/ cos ˛ cos ˛ 0
(C411)
˙lm0 D ˙ml 0 D 0
(C412)
holds for spherically homogenous and isotropic scalar-valued functions ˙l l 0 . ; r/, ˙mm0 . ; r/ called longitudinal and lateral correlation functions on the sphere. End of Corollary The corollary extends results of G.J. Taylor (1935), T. Karman (1937) and C. Wang (1970 p. 214-215) and has been given for a special variance-covariance two-point tensor field of potential type by H. Moritz (1972 p. 111). Here it has been generalized for any vector-valued field. Note that incase of homogenity and isotropy the cross-correlation functions ˙lm0 and ˙ml 0 may vanish. The classical planar Taylor-Karman structure is constrained within.
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
847
Let us put ˛ D ˛ 0 such that cos ˛ cos ˛ 0 D cos2 ˛ D 1 sin2 ˛; sin ˛ sin ˛ 0 D sin2 ˛ etc. can be rewritten ˙110 D ˙mm0 . / C Œ˙l l 0 . / ˙mm0 . / cos2 ˛ ˙120 D Œ˙l l 0 . / ˙mm0 . / sin ˛ cos ˛ D ˙210 ˙220 D ˙mm0 . / C Œ˙l l 0 . / ˙mm0 . / sin2 ˛
(C413)
which is the standard Taylor-Karman form Representation of the coordinate criterion matrix in scalar spherical harmonics Let us assume a coordinate variance-covariance matrix which is homogenous and isotropic in the sphere. With reference to (C412) we postulate a coordinate vector of potential type, in addition, namely " D gradR "i D @i R .i D 1; 2; 3/
(C414)
@ @ @ where .@i / D . @x ; @y ; @z / nd R the scalar potential. Now the variance-covariance matrix of absolute coordinates can be represented by 0
0
˙ ij D Ef"i .x/"j .x0 /g D @i @j 0 EfR.x/R.x0/g
(C415)
where we used the commutator relation @i E D E@i of differentiation and integration operators. Denote the scalar-valued variance-covariance function EfR.x/R.x0/g DW K.x; x0 /. Once we postulate homogeneity and isotropy of two-point function K.x; x0 / on the sphere we arrive at K.x; x0 / D K.r; r 0 ; /
(C416)
For a harmonic coordinate error potential, @i @i R D 0, we find for the covariance function K.r; r 0 ; / @i @i K.r; r 0 ; / D Ef@i @i R.x/R.x0 /g D 0 0
0
@j 0 @j 0 K.r; r ; / D EfR.x/@j 0 @j 0 R.x /g D 0
(C417) (C418)
Case 1 .kxk D kx0 k/ On a sphere .kxk D kx0 k/ D r0 the scalar-valued variance-covariance function K. / which fulfils (C414 - C 417) may be represented by K. / D
1 X nD0
kn Pn .cos /
(C419)
848
C Statistical Notions, Random Events and Stochastic Processes
where Pn are the Legendre polynomials. The constants kn depend, of course, on the choice of r0 Case 2 .kxk ¤ kx0 k/ Denote .kxk D r; kx0 k D r 0 /. the scalar-valued variance-covariance function K.r; r 0 ; / which fulfils (C414 - C 417) may be represented by K D K.r; r 0 ; / D
1 X nD0
1 kn .
r02 / rr 0
nC1
Pn .cos / C
1 X
2 kn .
nD0
rr 0 n / Pn .cos / r0
(C420)
where the coefficients 1kn ; 2kn represent the influence of the decreasing and increasing solution, by Legendre polynomials with respect to r; r 0 , respectively. We can only hope that the covariances K.r; r 0 ; / decrease with respect to r; r 0 , respectively in order to be allowed to discriminate 2kn D 0. In the following we mae this assumption, thus representing K.x; x0 / D K.r; r 0 ; / D
1 X nD0
kn .
r02 / rr 0
nC1
Pn .cos /
(C421)
Next let us assume the coordinate variance-covariance matrix (C420) is of potential type. Then longitudinal and lateral correlation functions on the sphere if radius r can be represented by ˙l l 0 . / D ˙mm0 . / D ˙l l 0 . / D
d2 K. / d 2 1 sin
1 K. / d
d .˙mm0 sin d
(C422) (C423) (C424)
if .x; x0 / is homogenous and isotropic on the sphere. End of Corollary Note that the result ˙l l 0 D d .s˙mm0 / ds q
d d
.sin ˙mm0 / for the sphere corresponds to ˙l l 0 D
for a harmonic field in the plane.
.x 0 x/2 C .y 0 y/2 is the Euclidean distance in the plane which s D corresponds to at the distance on the sphere.
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
849
Corollary (longitudinal and lateral correlation functions on the sphere of harmonic type): Let us assume that the coordinate variance-covariance matrix id homogenous, isotropic and harmonic on the sphere in the sense of the definition. Longitudinal and lateral correlation functions on the sphere can be represented by ˙l l 0 . / D
1 X
kn
nD0
.n2 C 3n C 2/cos2 sin2
C .n C 1/ :
2 cos Pn .cos / Œn C 3n C 2 2 sin 2
C Œn2 C 3n C 2 ˙mm0 . / D
X nD0
1kn
1 sin2
nC1 sin2
PnC1 .cos /
PnC2 .cos /g
(C425)
Œcos Pn .cos /
PnC1 .cos /
(C426)
End of Corollary For the proof of the above corollary we have made successive use of the recurrence relation .x 2 1/dPn .x/=dx D .n C 1/ŒxPn .x/ PnC1 .x/; x D cos ; d=d D .dx=d /.d=dx/ D sin d=dx Representation of the coordinate criterion matrix in vector spherical harmonics Previously we restricted the coordinate criterion matrix being homogenous and isotropic on the sphere to be of potential type. Here we generalize the coordinate criterion matrix to be of potential and “turbulent” type. Thus according to the Lame lemma – for more details see A. Ben Menahem - S. J. Singh (1981) - we represent the coordinate vector " D gradR C rotA (C427) where we gauge the vector potential A by d i vA D 0. In vector analysis gradR is called the longitudinal vector field, rotA the transversal vector field. In order to base the analysis only on scalar potentials, the source-free vector field rotA is decomposed S C T where S is the poloidal part while T is the toroidal part such that " D gradR C rot rot.er S/ C rot.er T/
(C428)
.R; S; T / are the three scalar potentials we were looking for. The variancecovariance matrix of absolute coordinates can now be structured according to
850
C Statistical Notions, Random Events and Stochastic Processes
˙.x; x0 / D Ef".x/g D grad ˝ grad 0 EfR.x/R.x0/g Crot rot ˝ rot 0 rot 0 er er 0 EfS.x/S.x0/g Crot ˝ rot 0 er er 0 EfT .x/T .x0/g
(C429)
where we have assumed EfR.X/S.X0 /g D 0; EfR.X/T .X0 /g D 0; EfS.X/T .X0/g D 0 holds for a coordinate variance-covariance matrix ˙.xx0 / homogenous and isotropic on the sphere. Thus with respect to the more general representation is
˙.x; x0 / D grad ˝ grad 0 K C rot rot ˝ rot 0 rot 0 er er 0 L C rot ˝ rot 0 er er 0 M
(C430)
where K WD EfR.X/R.X0 /g; L WD EfS.X/S.X0 /g; M WD EfT .X/T .X0 /g are supposed to be the scalar-valued two point covariance functions depending on .r; r 0 ; / for .˙.x; x0 // homogenous and isotropic on the sphere. The representation of the criterion matrix .˙.x; x0 // in terms of vector spherical harmonics is achieved once we introduce the spherical surface harmonics. 2q
.nm/Š 2.2n C 1/ .nCm/Š Pnm .sin / cos m 8m > 0 6 6 6p enm ./ D 6 8m D 0 6 2.2n C 1/Pn .sin / 6 4q .njmj/Š 2.2n C 1/ .nCjmj/Š Pnjmj .sin / sin jmj 8m < 0:
(C431)
where Pnm .sin / are the associated Legendre functions nCm 1 n 2 m=2 d .1 D x / .x 2 1/ (C432) n nCm 2 nŠ dx Note that the quantum numbers run n D 0; 1; ; 1I m D n; n C 1; ; 0; ; n 1; n. The base functions enm .; / are orthonormal in the sense
Pnm .x/ D
1 4
Z2
C =2 Z
d 0
d cos ekl enm D ık n ılm =2
(C433)
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
851
Finally a vector field " can be represented in terms of vector spherical harmonics by " D Unm anm C Vnm bnm C Wnm cnm 2 anm WD er enm .; / 6 6 6 6 bnm WD r grad enm .; / 6 4 cnm WD x grad enm .; /:
(C434)
(C435)
For more details on vector spherical harmonics we refer to standard textbooks, here e.g. to E. Grafarend (1982). ˙.x; x0 / D Ef".x/ ˝ ".x0 /g D Unmn0 m0 anm ˝ an0 m0 C Vnmn0 m0 bnm ˝ bn0 m0 C Wnmn0 m0 cnm ˝ cn0 m0
(C436)
which is due to < ajb >D< ajc >D< bjc >D 0. Unmn0 m0 etc. are spherical covariance functions.
Example C 9.22: Towards criterion matrices for functionals of the distributing potential in a space-time holonomic frame if reference The geodetic observational equations in their linearized form contain beside unknown Cartesian coordinate corrections in an orthonormal reference frame, being holonomic in space-time, the disturbance of the potential and its derivatives as unknowns . for a complete set-up of a criterion matrix of geodetic unknowns we have therefore present thoughts about a homogenous and isotropic variancecovariance matrix of the potential disturbance and its derivatives, The postulates of homogeneity and isotropy will lead again to the extension of an classical TalyorKarman structure. finally the representation of the criterion matrix of potential related quantities in terms of scalar spherical harmonics is introduced. Since the potential disturbance is harmonic function, vector spherical harmonics are not needed. We begin the transformation of the gravity criterion matrix from spherical into Cartesian coordinates. b ; ı c ; With reference to the linearized observational equations the unknowns .ı b / have to be equipped with a criterion matrix, reasonable to be homogenous and ı
852
C Statistical Notions, Random Events and Stochastic Processes
isotropic on the sphere. In order to take advantage of lemma and corollaries of the second chapter, we better transform the spherical coordinates of the disturbance gravity vector ı into Cartesian ones and then apply the variance-covariance “propagation” on the transformation formula. 2
3 2 3 cos cos ˚ cos cos ıi WD i i D 4 sin cos ˚ 5 4 sin cos 5 sin ˚ sin 2 3 . C ı / cos. C ı / cos. C ı / D 4 . C ı / sin. C ı / cos. C ı / 5 . C ı / sin. C ı / 2 3 cos cos 4 sin cos 5
(C437)
sin
2 3 . C ı /.cos ı sin /.cos ı sin / : 4 ıi D . C ı /.sin C ı cos /.cos ı sin / 5 . C ı /.sin C ı cos / 2 3 cos cos 4 sin cos 5 (C438) sin 3 2 ı cos cos C ı sin cos C ı cos sin : 4 ıi D ı sin cos ı cos cos ı sin sin 5 ı sin ı cos (C439) 3 ı1 C sin cos C cos sin cos cos ı : 4 ı2 5 D 4 cos cos sin sin sin cos 5 4 ı 5 ı3 0 cos sin ı 2
3
2
2
3
2
ı1 : 4 ı2 5 D ı3
32
1=2 2 C1 2 .12 C 22 / 6 1=2 4 C1 2 3 .12 C 22 / 1=2 0 .12 C 22 /
32
(C440) 3
1 1 ı 7 2 1 5 4 ı 5 ı 3 1
b
(C441)
ci .x/; ı j 0 .x0 /g as a function of the varianceThe variance-covariance matrix Dfı b ; ı c ; ı b /.x/; .ı b ; ı c ; ı b /.x0 /g based on the transcovariance matrix Df.ı formation ( C 438) is given. finally the Cartesian variance-covariance matrix ci .x/; ı j 0 .x0 /g in the fixed frame F should be transformed into the moving Dfı frame f according to (C381).
b
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
Fig. C29 Normalized covariance function K. / D
Thus we refer to
5 P nD0
kn Pn .cos / for kn D
˙ ˙ ˙ ˙ ˙ l˙l 0 ; lm 0 ; lr 0 ; ml 0 ; mm0 ; ; rr 0
853
1 nC2
(C442)
˙ as the longitudinal correlation function l˙l 0 , the lateral correlation function mm 0 of the gravity disturbance vector ıi .
etc.
In order to construct a homogenous and isotropic criterion matrix for functionals of the disturbing potential in a surface moving frame we refer to the introduction which applies here, too. Again we make use of the definition of homogeneity and isotropy 0 of the tensor-valued two-point function ˙.X;X / on the sphere, the basic lemma and the corollary for the spherical Taylor-Karman structure. Just by adding the left hand index on ˙ within formula (C410) (C411), we arrive at the corresponding formulae for longitudinal, lateral and cross-correlation of the gravity disturbance vector ı . For the representation of the criterion matrix for functionals of the disturbing potential in scalar spherical harmonics we once more refer to (C417). All results can be transferred, especially (C 420 - C 425) once we write K.r; r 0 ; / as the c E.ıw/xŒ c ıw c E.ıw/x c 0 g homogenous scalar-valued two-point function EfŒıw and isotropic on the sphere. Besides the first-order functionals of the potential ıw and their variance-covariance matrices, higher order functionals, e.g. for gravity gradients, and their variance-covariance matrices can be easily obtained, a project for the future.
854
C Statistical Notions, Random Events and Stochastic Processes
Fig. C30 Normalized covariance function K. / D
5 P nD0
kn Pn .cos / for kn D
1 .nC2/.n2 2nC2/
A numerical example of the homogenous and isotropic two-point function K.x; x0 ; / on the space For demonstration purpose plots C29 - C31 show the normalized covariance functions K.X; X0 / D K.r; r 0 ; / D K. / D
5 P nD0
kn Pn .cos / for case 1
.kXk D kmathbf X0 k D r0 / and three different choices of the coefficients kn . The coefficients are chosen in such a manner that their sum equals the unity (so as to normalize the covariance function) and the covariance function approaches zero as tends to . As an example for the construction of a criterion matrix, the coefficients kn could be estimated for an ideal network and its variance-covariance matrix; that is derived covariance function K.x; x0 /. We have taken a part of Appendix C 9.2 from our contribution E. Grafarend, I. Krumm and B. Schaffrin (Manuscripta geodetica 10 (1988) 3-22). Spherical homogeneity and isotropy in tensor-valued two-point functions have been introduced by E. Grafarend (1976) and H. Moritz (1972, 1978). Ex 3: Space Gravity Spectroscopy: The Benefits of Taylor–Karman Structured Criterion Matrices Example C.9.3. Space gravity spectroscopy: the benefits of Taylor-Karman structured criterion matrices (eg, P. Marinkovice et. al. (2003))
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
Fig. C31 Normalized covariance function K. / D
5 P nD0
kn Pn .cos / for kn D
855
1 n2 C3
As soon as the space spectroscopy was successfully performed, for instance by means of semi-continuous ephemeris of LEO-GPS tracked satellites, the problem of data validation appeared. It is for this purpose that a stochastic model for the homogenous isotropic analysis of measurements, obtained as “directly” measured values in LEO satellite missions (CHAMP, GRACE, GOCE), is studied. An isotropic analysis is represented by the homogenous distribution os measured values and the statistical properties of the model are calculated. In particular a correlation structure function is defined by the third order tensor (Taylor-Karman tensor), for the ensemble average of a set of incremental differences in measured components. Specifically, Taylor-Karman correlation tensor is calculated with the assumption that the analyzed random function is of a “potential type”. The special class of homogenous and isotropic correlation functions is introduced. Finally, a successful application of the concept is presented in the case study CHAMP and a comparison between modeled and estimated correlations is performed. Example C.9.31. Introduction A significant problem of LEO satellites, both in geometry and gravity space, is the association of quality standards to Cartesian ephemeris in terms of variance-covariance matrix valued-functions. as a pessimistic measure of the quality standards os LEO satellite ephemeris, a three dimensional Taylor-Karman structured criterion matrix has been proposed, named in honor of Taylor (1938) and Karman (1938), the founders of the statistical theory of turbulence.
856
C Statistical Notions, Random Events and Stochastic Processes
The concept of the Taylor-Karman criterion matrices was first introduced by E. Grafarend (1979) and subsequently further developed by B. Schaffrin and E. Grafarend (1982), H. Wimmer (1982), and E. Grafarend et. al. (1985, 1986). with this contribution we extend the application of the Taylor-Karman structured matrices to the third-dimension, namely to the long-arc orbit analysis. If we assume the vector-valued stochastic process to be the gradient of a random scalar-valued potential, in particular its longitudinal and lateral correlation function or the “correlation function along-track and across-track”, what would be the structure of a three-dimensional Taylor-Karman variance-covariance matrix. In order to answer this question, a three-dimensional correlation tensor and its decomposition in the case of homogeneity and isotropy is studied with the emphasis on R3 . Additionally, we deal with a special class of homogenous and isotropic tensorvalued correlation functions. They are derived, analyzed and applied to the data validation process. The theoretical concept for the application of the previously discussed criterion matrix in the geometric and gravitational analysis of LEO satellite ephemeris is presented. Finally the case study CHAMP is used for the application of our theory concept followed by the results and conclusion. Example C.9.32: Homogenous and isotropic variance-covariance tensor and correlation functions in a three-dimensional Euclidean space Example C.9.321: Notions of homogeneity and isotropy The notions of homogeneity and isotropy for functions on R are briefly explained as the following. The general context for these two definitions involves the action of a transitive group of motions on a homogenous and space belongs to the extensive theory of Lie groups (F. W. Warner (1983), A. M. Yaglom (1987)). However, it is important to clarify the different notions of homogeneity and isotropy exist due to the variety of homogenous spaces and transitive group actions on these spaces. The terminology introduced associates the notion of homogeneity and isotropy with
Fig. C32 Graphical representation of the longitudinal and lateral correlation functions
C-9 Scalar-, Vector-, and Tensor Valued Stochastic Processes
857
the functions on R that are invariant under the translation group acting on R . The notion of isotropy is defined for functions on R that are invariant under the orthogonal group acting on R . Table C.9.9: Tensor-valued correlation function of second-order “two-point correlation function” ˙.x ; x / D
3 P i;j D1
e i e j ˙ij .x ; x / D
3 P i;j D1
˙ij .x ; x /e i e j
(C443)
r WD kx ; x k and r WD x ; x ˙ij .r/ D ˙m .r/ıij C Œ˙l .r/ ˙m .r/
xi xj r2
; i; j 2 f1; 2; 3g
(C444)
“Cartesian versus spherical coordinates” x1 D x WD .x1 x1 / D r cos ˇ cos ˛ x2 D y WD .x2 x2 / D r cos ˇ sin ˛
(C445)
x3 D z WD .x3 x3 / D r sin ˇ “continuity of a potential type” ˙l .r/ D End of Table C.9.9
d Œr˙m .r/ dr
m .r/ D ˙m .r/ C r d˙dr
(C446)
Example C.9.322: Homogenous and isotropic variance-covariance tensor (correlation tensor) Taylor (1938) proved that he homogenous random field X.t/ correlation function ˙.r/ D< X.t C r/X.t/ > depends only on the length r D krk of the vector r and not on its direction, where < ::: > denoted the ensemble average. If the correlation function ˙.r/ of the homogenous random field X.t/ in R has the property, then the field X.t/ is said to be an isotropic random field in R . The corresponding correlation function ˙.r/ is then called an isotropic correlation function in R ( or an n-dimensional isotropic correlation function). Process, which satisfy the introduced postulates of homogeneity and isotropy, are said to be (widely) stationary (M. J. Yadrenko (1983)). Note that for an isotropic random field in R , all directions in space are obviously equivalent. The decomposition of a homogenous and isotropic variance-covariance tensorvalued function, shown in box 1, was introduced by von Karman and Howarth
858
C Statistical Notions, Random Events and Stochastic Processes
Fig. C33 Graphical representation of the correlation tensor transformation
(1938) by means of a more general and direct method than the one used by G. J. Taylor(1938). Additionally, Robertson (1940) refined and reviewed the T. Karman and L. Howard equation in the light of a classical invariant tensor theory. The decomposition of ˙ij .kx x k/, with a special emphasis on n D 3, is performed in terms of Cartesian coordinates and with respect to the orthonormal frame of reference fe1 ; e2 ; e3 j0g attached to the origin 0 of a three-dimensional Euclidean space. kx x k denotes the Euclidean distance r between the two points x and x of E WD fR ; ıij . Longitudinal and lateral correlation functions ˙l and ˙m , are the structural elements of such a homogenous and isotropic tensorvalued variance-covariance function which appear in the spherical tensor ˙m .r/ıij as well as in the oriented tensor Œ˙l .r/ ˙m .r/xi xj =r 2 for all i; j 2 f1; 2; 3g, see (C443) and Fig. C.32. ıij denotes the Kronecker or unit matrix, xi the Cartesian coordinate difference between the points x and x . These differences are also represented in terms of relative spherical coordinates .˛; ˇ; r/, (C444). Finally the continuity condition of a potential type is formulated by (C445) which provides the unique relation between ˙l and ˙m (G. J. Taylor (1938); H. M. Obuchow (1958); E. Grafarend (1979)). Due to its complexity, it is necessary to further elaborate on the previous equation set. The correlation tensor ˙ij .r/ was transformed to a special coordinate system O 0 x10 x20 x20 in R instead of the initial set Ox1 x2 x3 . The new set O 0 x10 x20 x30 is selected in such a way so that its origin O 0 is shifted by the vector x with respect to the origin O, as illustrated in Fig. C15. This means that O 0 coincides with the terminal point of the vector x that refers to the initial coordinates, while the axis O 0 x10 lies along the vector x x .
Fig. C34 The behavior of Tatarski's correlation function for different values of the shape parameter $\nu$ ($d = 900$) and of the scale parameter $d$ ($\nu = 3/2$)
The functions $\Sigma'_{ij}(r)$, introduced here as auxiliary functions, are the components of the correlation tensor $\Sigma_{ij}(\mathbf{r})$ in the new set of coordinates. The functions $\Sigma'_{ij}(r)$ clearly depend only on the length $r = \|\mathbf{r}\|$ of the vector $\mathbf{r}$, since the direction of $\mathbf{r}$ is fixed in the new set of coordinates. In the space $\mathbb{R}^3$ there exists a reflection which leaves the points $x$ and $x' = (O')$ unmoved and replaces the axis $O'x_j'$ by $-O'x_j'$, where $j \neq 1$ is a fixed number, while it does not change the directions of all other coordinate axes $O'x_l'$, $l \in \{1, 2, 3\}$ and $l \neq j$. It follows that
$$\Sigma'_{ij}(r) = -\Sigma'_{ij}(r) = 0 \quad \text{for } i \neq j \tag{C447}$$
Hence, only the diagonal elements of $\Sigma'_{ij}(r)$ can differ from zero. Furthermore, if $i \neq j$ and $j \neq 1$, then the axis $O'x_i'$, by rotation around the axis $O'x_j'$, can be transformed into the axis $O'x_1'$. Hence
$$\Sigma'_{22}(r) = \Sigma'_{33}(r) \tag{C448}$$
The tensor $\Sigma'_{ij}$ and, consequently, $\Sigma_{ij}$ are symmetric, and their components $\Sigma'_{ij}(r)$ can take at most two distinct non-zero values, namely the already introduced longitudinal correlation function $\Sigma'_{11}(r) = \Sigma_l(r)$ and the lateral correlation function $\Sigma'_{22}(r) = \Sigma'_{33}(r) = \Sigma_m(r)$, which specify the correlation tensor in a unique way. In order to obtain the explicit form of $\Sigma_{ij}(\mathbf{r})$ as a function of $\Sigma_l(r)$ and $\Sigma_m(r)$, the unit vectors of the old coordinate axes $Ox_1, Ox_2, Ox_3$ must be resolved along the axes of the new system $O'x_1', O'x_2', O'x_3'$; then $\Sigma_{ij}(\mathbf{r})$ can be represented as a linear combination of the functions $\Sigma'_{kl}(r)$, $(k \neq i$, $l \neq j$ and $k, l \in \{1, 2, 3\})$, which leads to (C443) and (C444).
Example C.1: Homogeneous and isotropic correlation functions

It was shown in the previous section that for a homogeneous and isotropic random field defined on the Euclidean space $\mathbb{R}^n$, the correlation between $x$ and $x'$ depends only on the Euclidean distance $\|x' - x\|$. Therefore, as shown in Table C.9.10, we can describe a homogeneous and isotropic correlation function on $\mathbb{R}^n$ by the real-valued function $\Sigma(r)$ defined on $[0, \infty)$, and we denote by $\Phi_n$ the class of all continuous permissible functions $\Sigma(r): [0, \infty) \to \mathbb{R}$ such that $\Sigma(0) = 1$ (we are working in terms of correlation, not covariance) and the symmetric function $\Sigma(\|\cdot\|)$ is positive definite on $\mathbb{R}^n$. The characterization of the classes $\Phi_n$, also shown in Table C.9.10, is a well-known result of I. J. Schoenberg (1938). The function $\Sigma(r): [0, \infty) \to \mathbb{R}$ is an element of $\Phi_n$ if and only if it admits a representation in the form of (C450), where $W$ is a probability measure on $[0, \infty)$, $J_{(n-2)/2}$ is the Bessel function of the first kind of order $(n-2)/2$ and $\Gamma$ stands for the Gamma function. Hence, in the spaces $\mathbb{R}^2$ and $\mathbb{R}^3$ relevant to geodesy, (C450) reduces to the forms presented in (C452).

Table C.9.10: Isotropic correlation and Schoenberg's characterization

"homogeneous and isotropic correlation function"
$$\Sigma(x, x') = \Sigma(\|x' - x\|), \quad x, x' \in \mathbb{R}^n, \qquad r := \|x' - x\| \tag{C449}$$

"the characterization of the $\Phi_n$ class"
$$\Sigma_n(r) = \int_{[0,\infty)} \Omega_n(\lambda r)\,dW(\lambda) \tag{C450}$$
$$\Omega_n(r) = \Gamma(n/2)\left(\frac{2}{r}\right)^{(n-2)/2} J_{(n-2)/2}(r) \tag{C451}$$

"reduction of $\Omega_n$ for $n = 2$ and $n = 3$"
$$\Omega_2(r) = J_0(r) \quad \text{and} \quad \Omega_3(r) = r^{-1}\sin r \tag{C452}$$

End of Table C.9.10
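Schoenberg's kernels (C451)-(C452) can be checked numerically. A minimal sketch, assuming SciPy's Bessel function jv is available, verifies the reductions for $n = 2$ and $n = 3$:

    import numpy as np
    from scipy.special import jv, gamma

    def omega(n, r):
        # Omega_n(r) = Gamma(n/2) * (2/r)^((n-2)/2) * J_{(n-2)/2}(r), cf. (C451)
        return gamma(n/2) * (2.0/r)**((n - 2)/2) * jv((n - 2)/2, r)

    r = np.linspace(0.1, 20.0, 50)
    print(np.allclose(omega(2, r), jv(0, r)))        # (C452), n = 2
    print(np.allclose(omega(3, r), np.sin(r)/r))     # (C452), n = 3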
Table C.9.11: Special classes of correlation functions

"Tatarski's class"
$$\Sigma_{[\nu]}(r) = \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{r}{d}\right)^{\nu} K_{\nu}\!\left(\frac{r}{d}\right) \tag{C453}$$

"Shkarofsky's class"
$$\Sigma_{[\nu,\delta]}(r) = \frac{\left(\dfrac{r^2+\delta^2}{d^2}\right)^{\nu/2} K_{\nu}\!\left(\left(\dfrac{r^2+\delta^2}{d^2}\right)^{1/2}\right)}{\delta^{\nu}\,K_{\nu}(\delta)} \tag{C454}$$

End of Table C.9.11
Example C.2: Tatarski's class of homogeneous and isotropic correlation functions
Many analytical candidate models for $\Sigma$ have been suggested in the literature (for example, Buell (1972); Haslett (1989)), but we refer to Tatarski (1962) as the first who elaborated on such correlation functions fulfilling all the conditions presented in the previous section. Tatarski's correlation function class is shown in Table C.9.11 and illustrated by Fig. C34. In addition to V. I. Tatarski's class, a very general family of correlation function models due to I. P. Shkarofsky (1968) is introduced, which generalizes Tatarski's correlation function family. These two classes, which have been proved to be members of the class $\Phi_3$ (Shkarofsky (1968)) and of the class $\Phi_\infty$ (Gneiting et al. (1999)), can be applied to many geodetic problems (e.g. Grafarend (1979); Meier (1982); Wimmer (1982); Grafarend (1985)). In the equations of Table C.9.11, $K_\nu$ stands for the modified Bessel function of order $\nu$, $d > 0$ is a scale parameter, and $\delta > 0$ and $\nu$ are shape parameters. In the case $\delta = 0$, I. P. Shkarofsky's class reduces to V. I. Tatarski's class. A special case of Tatarski's class appears if the shape parameter $\nu$ is the sum of a non-negative integer $k$ and $1/2$. Then the right-hand side of the equation can be written as a product of $\exp(-r/d)$ and a polynomial of degree $k$ in $r/d$ (e.g. T. Gneiting (1999)). In particular, in the case of an $n = 3$ dimensional Markov process of order $p = 1$, shown in Fig. C35, the shape parameter is expressed as $\nu = (2p+1)/(n-1) = 3/2$ and results in the following simplification of (C453):
$$\Sigma_{[3/2]}(r) = \left(1 + \frac{r}{d}\right)\exp\!\left(-\frac{r}{d}\right) \tag{C455}$$
Example C.3: Three-dimensional Taylor-Karman criterion matrix in the geometric and gravitational analysis of LEO satellite ephemeris

We have so far analyzed the theoretical background and solution for the design of the homogeneous and isotropic Taylor-Karman correlation tensor. The question is how this theoretical concept applies to the geometric and gravitational analysis of LEO satellite ephemeris. The basic idea is that the errors in the position vectors of LEO satellites constitute a vector-valued stochastic process. Following this concept, a satellite orbit of LEO GPS-tracked satellites is a homogeneous and anisotropic field of error vectors, and the error situation is described by the covariance function. As is well known, the error situation of a newly determined position in a three-dimensional Euclidean space is best when the error ellipsoid is a sphere (isotropy) with the
minimal radius and when the error situation is uniform over the complete satellite orbit (homogeneity). This "ideal" situation can be expressed by the three-dimensional Taylor-Karman structured criterion matrix of W. Baarda - E. Grafarend (potential) type. The correlations between the vectors of pseudo-observations of satellite ephemeris are then described by the longitudinal and lateral correlation functions. The characteristic correlation functions can be estimated by matching the correlation tensor with a three-dimensional Markov process of 1st order and by introducing some additional information about the underlying process. The correlation analysis is performed under the assumption that the vector-valued three-dimensional random function is of "potential type" (E. Grafarend (1979)), i.e. the gradient of a random scalar function, the "signal $s(x)$". The structure of the random function $s(x)$ is outlined as an $n$-dimensional Markov process of $p$-th order. Figure C35 illustrates the case $n = 3$ and $p = 1$. One of the simplest differential equations of such a process has the form
$$(\nabla^2 - \alpha^2)^p\,s(x) = e(x) \tag{C456}$$
where $e(x)$ is white noise. If the Laplace operator is applied to the homogeneous random scalar function, it transforms this function into a new homogeneous random function whose spectral density corresponds to the spectral density of the correlation functions (C453) and (C454), see P. Whittle (1954), V. Heine (1955) and P. Whittle (1963). Hence the homogeneous solution of (C456) (if it exists) must have the spectral density that corresponds to the spectral density of the correlation function. Table C.9.12 summarizes the representation of the homogeneous and isotropic correlation functions of type (i) signal correlation function $\Sigma$, (ii) longitudinal correlation function $\Sigma_l$ and (iii) lateral correlation function $\Sigma_m$.

Example C.9.34: Results and conclusions. Case study: CHAMP
As the numerical test in this study, we processed two data sets: two three-dimensional Cartesian ephemeris time series $\{x(t_k), y(t_k), z(t_k)\}$ and $\{x(t_d), y(t_d), z(t_d)\}$ of the CHAMP satellite orbit for the test period from day 140 to 150 of 2001 (20 May to 30 May, both inclusive). We analyzed in total 27,360 triplets of satellite positions. Both time series are indexed with a 30 s sampling rate and referenced to a kinematic (index $k$) and a dynamic CHAMP orbit (index $d$). The dynamic orbit, used as a reference, provides us with ephemeris differences between the two orbits. The estimation of ("real") auto- and cross-correlations between the
Fig. C35 The six-point interaction in the grid; the three-dimensional Markov process of the 1st order (autoregressive process); the gray rectangle represents the same process in two dimensions
vectors of pseudo-observations as functions of time can be performed following Priestley (1981). According to Tables C.9.10 and C.9.11, the Taylor-Karman structured ("ideal") correlations are computed from the three-dimensional $\{x(t_k), y(t_k), z(t_k)\}$ time series. The adopted scale parameter is $d = (2/3)R_{\mathrm{char}}$, where $R_{\mathrm{char}}$ is the characteristic length of the process. The characteristic length is defined by the arc length of 2400 seconds (80 points for the 30 second sampling rate). Both parameters are estimated experimentally. For further details on the scale parameter and the characteristic length, please see Wimmer (1982). The numerical results of the study are graphically presented in Fig. C37. The gray area represents the estimated ("real") correlation situation along the satellite arc as presupposed by Austen et al. (2001). The high auto- and low cross-correlations between CHAMP satellite positions for approximately 20 min of an orbit arc are very evident. The Taylor-Karman structured correlation (black line), as theoretically assumed, gives an upper bound of the "real" correlation situation along the satellite orbit.
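The empirical autocorrelation estimation used in this case study can be sketched as follows. The CHAMP data are not reproduced here, so a synthetic first-order autoregressive series with an 80-sample characteristic length serves as a hypothetical stand-in for one ephemeris-difference component:

    import numpy as np

    def autocorr(x, max_lag):
        # biased sample autocorrelation for lags 0..max_lag
        x = np.asarray(x) - np.mean(x)
        c0 = np.dot(x, x)/len(x)
        return np.array([np.dot(x[:len(x)-k], x[k:])/(len(x)*c0)
                         for k in range(max_lag + 1)])

    rng = np.random.default_rng(1)
    e = np.zeros(2000)
    phi = np.exp(-1.0/80.0)          # ~80-sample characteristic length, 30 s sampling
    for t in range(1, 2000):
        e[t] = phi*e[t-1] + rng.standard_normal()

    rho = autocorr(e, 120)           # estimated ("real") correlation along the arc
    print(np.round(rho[:5], 3))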
Fig. C36 The behavior of the longitudinal and lateral correlation functions, (C458) and (C459), for different values of the scale parameter $d$ ($\nu = 3/2$)
Table C.9.12: Longitudinal and lateral correlation functions for a homogeneous and isotropic vector-valued random field of potential type, 1st order Markov process

"condition for a process of potential type"
$$\Sigma(x) \;\longrightarrow\; \Sigma_m(r) = \frac{2}{r^2}\int_0^r x\,\Sigma(x)\,dx \;\longrightarrow\; \Sigma_l(r) = \frac{d}{dr}\,[r\,\Sigma_m(r)]$$

"input"
$$\Sigma(r) = \frac{2^{-1/2}}{\Gamma(3/2)}\left(\frac{r}{d}\right)^{3/2} K_{3/2}\!\left(\frac{r}{d}\right) = \left(1 + \frac{r}{d}\right)\exp\!\left(-\frac{r}{d}\right) \tag{C457}$$

"output"
$$\Sigma_l(r) = -6\left(\frac{r}{d}\right)^{-2} + \exp\!\left(-\frac{r}{d}\right)\left[4 + 2\left(\frac{r}{d}\right) + 6\left(\frac{r}{d}\right)^{-1} + 6\left(\frac{r}{d}\right)^{-2}\right] \tag{C458}$$
$$\Sigma_m(r) = 6\left(\frac{r}{d}\right)^{-2} - \exp\!\left(-\frac{r}{d}\right)\left[2 + 6\left(\frac{r}{d}\right)^{-1} + 6\left(\frac{r}{d}\right)^{-2}\right] \tag{C459}$$

End of Table C.9.12
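The chain $\Sigma \to \Sigma_m \to \Sigma_l$ of Table C.9.12 can be verified numerically. A minimal sketch, assuming SciPy's quad for the integral in the potential-type condition and a central difference for the derivative (C446):

    import numpy as np
    from scipy.integrate import quad

    d = 1.0
    sig = lambda x: (1.0 + x/d)*np.exp(-x/d)            # input (C457)

    def sigma_m(r):                                     # (2/r^2) int_0^r x Sigma(x) dx
        val, _ = quad(lambda x: x*sig(x), 0.0, r)
        return 2.0*val/r**2

    def sigma_l(r, h=1e-5):                             # d/dr [ r Sigma_m(r) ]
        return ((r + h)*sigma_m(r + h) - (r - h)*sigma_m(r - h))/(2*h)

    def sigma_m_closed(r):
        u = r/d
        return 6/u**2 - np.exp(-u)*(2 + 6/u + 6/u**2)   # (C459)

    def sigma_l_closed(r):
        u = r/d
        return -6/u**2 + np.exp(-u)*(4 + 2*u + 6/u + 6/u**2)  # (C458)

    for r in (0.5, 1.0, 2.0, 5.0):                      # differences are ~0
        print(r, sigma_m(r) - sigma_m_closed(r), sigma_l(r) - sigma_l_closed(r))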
Fig. C37 Graphical representation of the numerical study results (20 minute correlation length, 40 points for the 30 second sampling rate)
Ex 4: Nonlinear Prediction
Example C.9.4: Nonlinear Prediction
For direct interpolation, extrapolation or filtering without trend analysis, linear prediction gives poor results, asking for a more general concept, namely nonlinear prediction. A set of functional series of Volterra type for nonlinear prediction of scalar-, vector-, in general, tensor-valued signals is set up. Optimizing the Helmert or Werkmeister point error, a set of linear integral equations generalizing the Wiener-Hopf integral equations is obtained. Multi-point correlation functions (MPCF) are characteristic for those equations. A criterion for linear prediction, namely the vanishing of all MPCFs besides the two-point correlation, is formulated; solutions of the discrete integral equations and the prediction error tensor of rank two are reviewed. A structure analysis of MPCFs of homogeneous and isotropic vector fields proves the number of characteristic functions to be 2 (n = 2), 4 (n = 3), 10 (n = 4) and 26 (n = 5). The methods of statistical prediction relating to stochastic processes were developed by two famous scientists, the Russian A. N. Kolmogorov (1941) and the American N. Wiener (1941), nearly at the same time during the Second World War.
The world-wide resonance to these milestones began rather late: the work of A. N. Kolmogorov remained nearly unknown owing to the poor documentation of Eastern publications in Western countries, while the great works of N. Wiener appeared as a classified report of section D2 of the American National Defence Research Committee. At that time, the difficult formal apparatus made acceptance rather impossible. Nowadays these difficulties are only history, and the number of publications about linear filtering, interpolation and extrapolation runs into the ten thousands. For example, the American journal IRE Transactions on Information Theory, New York, appears regularly every year with a literature report on the state of the art. Notable are the works of R. Kalman (1960), who succeeded in elegantly solving the celebrated prediction equation, the Wiener-Hopf integral equation, by transforming it into a differential equation which is much easier to solve. The method became very well known as a method of data analysis applied to the navigation of the Apollo spacecraft. In practice we begin with a two-step procedure. First, we analyze the observed values with respect to trend: we fit the data set to a harmonic function, an exponential function or a polynomial with respect to "position". Second, we subject the trend-reduced data to a Wiener-Kolmogorov prediction assuming a linear relation between the predicted signal and the reduced unknown signal. The coefficients in this linear relation depend on the distance between the points and its direction. It is well known how to operate with the discrete approximation of the classical Wiener-Hopf integral equation. The two-step procedure accounts for the typical result that the direct application of a linear Kolmogorov-Wiener prediction gives too inconsistent results, namely too large mean-square errors of the predicted signal. The remarkable disadvantage of the two-step procedure, in contrast, is the immense computational work. Thus the central question is: is the two-step procedure necessary when a concept for nonlinear prediction exists? For all statistical prediction concepts, the correlation function between various points is the key quantity. In practice, the computational problems are greatly reduced if we are able to find out whether the predicted signal is subject to a statistically homogeneous and isotropic process. It is a key piece of information whether a correlation function depends only on the distance between the involved points or is a function of the difference vector, namely isotropy versus homogeneity, our special topic in a special section.

Example C.9.41: The hierarchy of the equations in nonlinear prediction of scalar-valued signals
We shortly review the characteristic equations which arise in nonlinear prediction. Beforehand we agree on the following notation: all Greek indices run over $\{1, 2, \ldots, N-1, N\}$, counting the number of points $P_\alpha$ with a signal $s(P_\alpha)$. The
signal may be a scalar signal, a vector signal or a tensor signal. A natural assumption is the "simple prediction model" that the predicted signal $s(P)$ at the point $P$ is a linear function of the observed signals $s(P_\alpha)$ for $\alpha \in \{1, 2, \ldots, N-1, N\}$. The coefficients which describe the linear relation are denoted by $a(P, P_\alpha)$. The inconsistency between the predicted signal and the observed signal is denoted by the letter $\varepsilon$, leading to the prediction variance $E\{\varepsilon(P)\varepsilon(P)\}$, namely assuming that the difference $s(P) - \sum_{\alpha=1}^{N} a(P, P_\alpha)s(P_\alpha)$ is random.
Nonlinear prediction is defined by a nonlinear relation between the predicted signal and the observed signal, for instance the sum of a linear and a quadratic relation described by our previous formula. The postulate of a minimum mean square error leads to the problem of estimating the coefficients $a$ and $b$. Of course, we could continue with signal functions which depend on arbitrary power series. The open question is which signal relation of arbitrary powers to choose. In order to give an answer to our key question, we have to change our notion from a discrete to an integral concept. We arrive at this concept if we switch over to a continuous distribution of points $P$, namely from $\{P, P_\alpha\}$ to $\{r, r'\}$. In a one-dimensional problem we integrate over a measured line, in a two-dimensional problem we integrate over a surface. The integral set-up is reviewed by the formulae (C461)-(C465); $O(s^3)$ expresses the order of neglected terms in the functional series $S[s(r')]$. We refer to a type of Taylor series in a generalized Hilbert space going back to V. Volterra (1959), called a Volterra functional series. The coefficient functions given by (C463) are called variational derivatives in a generalized Hilbert space. A special criticism originates from the continuous version of the Volterra series. Help comes from the application of functional analysis, namely the Dirac function as a generalized function, illustrated by formulae (C464) and (C465). It is now easy to switch from the continuum to the discrete, namely for a regular grid. Nowadays the integral version is "simple" to end up with a discrete formulation, introduced already by N. Wiener. Up to now we have introduced a set-up of our prediction formulae for scalar signals. In relating our work to practical problems we have to generalize to vector-valued functions.
Table C.9.13: Discrete and continuous Volterra series for scalar signals

"discrete 2nd order Volterra series"
$$s(P) = \sum_{\alpha=1}^{N} a(P, P_\alpha)s(P_\alpha) + \sum_{\alpha=1}^{N}\sum_{\beta=1}^{N} b(P, P_\alpha, P_\beta)s(P_\alpha)s(P_\beta) + \varepsilon(P) \tag{C460}$$

"continuous 2nd order Volterra series"
$$s(r) = \int d^n r'\,a(r, r')s(r') + \int d^n r'\!\int d^n r''\,a(r, r', r'')s(r')s(r'') + O(s^3) + \varepsilon(r) \tag{C461}$$

"continuous 3rd order Volterra series"
$$\begin{aligned}
s(r) ={}& \int d^n r'\,a(r, r')s(r') + \int d^n r'\!\int d^n r''\,a(r, r', r'')s(r')s(r'')\\
&+ \int d^n r'\!\int d^n r''\!\int d^n r'''\,a(r, r', r'', r''')s(r')s(r'')s(r''') + O(s^4) + \varepsilon(r)
\end{aligned} \tag{C462}$$

"variational derivatives"
$$a(r, r') = \frac{\delta s(r)}{\delta s(r')}, \quad a(r, r', r'') = \frac{1}{2!}\frac{\delta^2 s(r)}{\delta s(r')\,\delta s(r'')}, \quad a(r, r', r'', r''') = \frac{1}{3!}\frac{\delta^3 s(r)}{\delta s(r')\,\delta s(r'')\,\delta s(r''')} \tag{C463}$$

"from continuous to discrete via Dirac functions: example"
$$\int_{-\infty}^{+\infty} d^n r'\,f(r')\,\delta(r, r') = f(r), \qquad \delta(r, r') := \begin{cases} \to\infty & \text{for } r = r', \;\; \displaystyle\int_{-\infty}^{+\infty} d^n r'\,\delta(r, r') = 1\\[4pt] 0 & \text{otherwise} \end{cases} \tag{C464}$$
$$s(P) = \int d^n r'\,a(P, r')s(r') = \sum_{\alpha=1}^{N} a(P, P_\alpha)s(P_\alpha) \tag{C465}$$

End of Table C.9.13
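The discrete second-order Volterra series (C460) can be fitted by least squares once training values of the predicted signal are available. A minimal sketch under assumed synthetic data; the generating function s_true is hypothetical and chosen only so that the recovered coefficients can be checked:

    import numpy as np

    def volterra2_design(S):
        # linear terms s_a plus quadratic terms s_a*s_b (a <= b), cf. (C460)
        N = len(S)
        quad = np.outer(S, S)[np.triu_indices(N)]
        return np.concatenate([S, quad])

    rng = np.random.default_rng(0)
    N, M = 4, 200
    S_train = rng.standard_normal((M, N))
    s_true = lambda S: 0.5*S[0] - 0.2*S[1]*S[3] + 0.1*S[2]**2   # hypothetical signal law
    y = np.array([s_true(S) for S in S_train]) + 0.01*rng.standard_normal(M)

    A = np.array([volterra2_design(S) for S in S_train])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # estimates a(P,P_a) and b(P,P_a,P_b)
    print(np.round(coef, 2))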
Example C.9.42: The hierarchy of the equations in nonlinear prediction of vector-valued signals
In order to take care of practice, we will finally present the concept of prediction of vector-valued signals, set up by vector-valued Volterra series of type (C466). The partial variational derivatives of the vector-valued functions, also called the signal $s_i(r)$, are given by (C467) up to order $O(s^3)$. We let the indices of Latin type run over $1, 2, \ldots, n-1, n$, where $n$ is the maximum dimension number. We apply the Einstein summation convention, namely summation over identically numbered indices. In our applications the signals depend on the position $r$ of the measurements. In order to determine the unknown coefficients of the nonlinear prediction "ansatz", we choose a risk function to be optimized. For instance, we may choose the trace of the variance function $T_1 = \mathrm{tr}\,\sigma_{ij}(r) = \min$, called the Helmert point error, or the determinant of the variance function $T_2 = \det\sigma_{ij}(r) = \min$, called the Werkmeister
point error, as two basic invariants called Hilbert invariants. Such an optimal design leads to a linear integral equation, a system of the first kind, for the unknown tensors $a_{ij}(r, r')$, $a_{ijk}(r, r', r'')$ etc. If we only refer to terms up to second order, namely quadratic signal terms, we have eliminated the cubic, biquadratic etc. terms; dropping also the quadratic terms leads back to the standard concept of linear prediction. The first nonlinear terms, the quadratic terms, lead to the concept of
two-point correlation functions: $\Phi_{ij}(r, r') := E\{s_i(r)s_j(r')\}$,
three-point correlation functions: $\Phi_{ijk}(r, r', r'') := E\{s_i(r)s_j(r')s_k(r'')\}$,
and four-point correlation functions: $\Phi_{ijkl}(r, r', r'', r''') := E\{s_i(r)s_j(r')s_k(r'')s_l(r''')\}$.
The term "multipoint functions" is to be understood in the sense that tensor-valued functions between multiple points appear as correlation functions. Compared with linear prediction, three-point and four-point correlation functions newly appear. Let us formulate the discrete version of the integral equations: here all barred indices run over $(1, 2, \ldots, Nn)$ up to the number $Nn$. Indeed, we approximate the integrals (C470), (C471) etc. by (C472), (C473) etc.; we switch from the continuum to the discrete by our summation convention, neglecting the summation signs. For instance, the two-point and three-point functions $a_{im}(r, r_1)$, $a_{imn}(r, r_1, r_2)$ are written $\bar a_{\bar i\bar m}$, $\bar b_{\bar i\bar m\bar n}$ in the discrete versions. In summary, our matrices have $Nn[(Nn)^2 + (Nn)]$ elements. They correspond to the correlation functions $\Phi_{\bar i,\bar j}$, $\Phi_{\bar i,\bar j,\bar k}$ and $\Phi_{\bar i,\bar j,\bar k,\bar l}$. Finally, we restrict ourselves to prediction terms up to second order when we write (C473)-(C476). Here we take advantage of two conventions: the "extensor" calculus developed by H. V. Craig (1930-1939), which abbreviates a series of indices $i_1, i_2, \ldots$ as a multiindex, and summation over indices in brackets. In the version with H. V. Craig's multiindex we gain our characteristic equations (C476)-(C479). The coefficients $a_{i(i)}$ have their origin in the generalized Taylor series, with the condition that higher-order coefficients become smaller with increasing multiindex degree. If we apply the fundamental prediction equation (C475), we have to sum up the coefficients $a_{\bar i, i}$, $a_{\bar i, i_1, i_2}$, $a_{\bar i, i_1, i_2, i_3}$ etc. together with the generalized variance-covariance matrix, a function of the coordinates of the position vector $r$. The Helmert point error reduces proportionally to the number of prediction coefficients as long as the inequality (C482) holds. If we fix the coefficient functions $a_{ij}(r, r')$ for distance zero to one, the following coefficients $a_{ijk}(r, r', r'')$, $a_{ijkl}(r, r', r'', r''')$ are smaller than one.
Table C.9.14: Discrete and continuous Volterra series for vectors

"continuous 2nd order Volterra series"
$$s_i(r) = \sum_{j=1}^{n}\int d^n r'\,a_{ij}(r, r')s_j(r') + \sum_{j=1}^{n}\sum_{k=1}^{n}\int d^n r'\!\int d^n r''\,a_{ijk}(r, r', r'')s_j(r')s_k(r'') + O(s^3) \tag{C466}$$

$$a_{ij}(r, r') = \frac{\delta s_i(r)}{\delta s_j(r')}, \qquad a_{ijk}(r, r', r'') = \frac{\delta^2 s_i(r)}{\delta s_j(r')\,\delta s_k(r'')} \tag{C467}$$

"variance function as a tensor of 2nd order"
$$\begin{aligned}
\sigma_{ij}(r) = E\Big\{&\Big[s_i(r) - \int d^n r_1\,a_{ii_1}(r, r_1)s_{i_1}(r_1) - \int d^n r_1\!\int d^n r_2\,a_{ii_1i_2}(r, r_1, r_2)s_{i_1}(r_1)s_{i_2}(r_2)\Big]\\
\times\,&\Big[s_j(r) - \int d^n r_1\,a_{jj_1}(r, r_1)s_{j_1}(r_1) - \int d^n r_1\!\int d^n r_2\,a_{jj_1j_2}(r, r_1, r_2)s_{j_1}(r_1)s_{j_2}(r_2)\Big]\Big\}
\end{aligned} \tag{C468}$$

$$J_1 = \mathrm{tr}\,\sigma_{ij}(r) = \min, \qquad J_2 = \det\sigma_{ij}(r) = \min \tag{C469}$$

"from continuous to discrete: multipoint correlation functions"
$$\Phi_{ij}(r, r') = \int d^n r_1\,a_{im}(r, r_1)\Phi_{jm}(r', r_1) + \int d^n r_1\!\int d^n r_2\,a_{imn}(r, r_1, r_2)\Phi_{jmn}(r', r_1, r_2) \tag{C470}$$

$$\Phi_{ijk}(r, r', r'') = \int d^n r_1\,a_{im}(r, r_1)\Phi_{jkm}(r', r'', r_1) + \int d^n r_1\!\int d^n r_2\,a_{imn}(r, r_1, r_2)\Phi_{jkmn}(r', r'', r_1, r_2) \tag{C471}$$

$$\Phi^{\bar\alpha}_{jk} = \bar a_{im}\,\Phi_{jm} + \bar b_{imn}\,\Phi_{jmn}, \qquad \Phi^{\bar\alpha\bar\beta}_{ijk} = \bar a_{im}\,\Phi_{jkm} + \bar b_{imn}\,\Phi_{jkmn} \tag{C472}$$

$$\Phi_{jk} = a_{im}\,\Phi_{jm} + b_{imn}\,\Phi_{jmn}, \qquad \Phi_{ijk} = a_{im}\,\Phi_{jkm} + b_{imn}\,\Phi_{jkmn} \tag{C473}$$

"relabeling of the prediction coefficients"
$$\begin{aligned}
&a_{i1} = c_{i1},\; a_{i2} = c_{i2},\; \ldots,\; a_{im} = c_{im},\; \ldots,\; a_{iNn} = c_{iNn}\\
&b_{i11} = c_{i,Nn+1},\; b_{i12} = c_{i,Nn+2},\; \ldots,\; b_{i1n} = c_{i,Nn+n},\; \ldots,\; b_{i1Nn} = c_{i,2Nn}\\
&b_{i21} = c_{i,2Nn+1},\; b_{i22} = c_{i,2Nn+2},\; \ldots,\; b_{i2n} = c_{i,2Nn+n},\; \ldots,\; b_{i2Nn} = c_{i,3Nn}\\
&b_{i31} = c_{i,3Nn+1},\; b_{i32} = c_{i,3Nn+2},\; \ldots,\; b_{i3n} = c_{i,3Nn+n},\; \ldots,\; b_{i3Nn} = c_{i,4Nn}\\
&\;\;\vdots\\
&b_{im1} = c_{i,mNn+1},\; b_{im2} = c_{i,mNn+2},\; \ldots,\; b_{imn} = c_{i,mNn+n},\; \ldots,
\end{aligned}$$
$$b_{imNn} = c_{i,mNn+Nn},\; \ldots,\; b_{iNnNn} = c_{i,(Nn)^2+Nn} \tag{C474}$$

$$\Phi_{00'} = C_{00'}\,\Phi_{0'0'}, \qquad C_{00'} = \Phi_{00'}\,\Phi^{-1}_{0'0'} \tag{C475}$$

$$\Phi_{i(j)} - (a_{i(i)},\,\Phi_{(j)(i)}) = 0 \tag{C476}$$

$$\Phi_{ij_1} - (a_{ii_1},\,\Phi_{j_1i_1}) - (a_{ii_1i_2},\,\Phi_{j_1i_1i_2}) = 0 \tag{C477}$$

$$\Phi_{ij_1j_2} - (a_{ii_1},\,\Phi_{j_1j_2i_1}) - (a_{ii_1i_2},\,\Phi_{j_1j_2i_1i_2}) = 0 \tag{C478}$$

$$(a_{ii_1},\,\Phi_{j_1i_1}) = \int d^n r_1\,a_{ii_1}(r, r_1)\,\Phi_{j_1i_1}(r', r_1) \tag{C479}$$

$$(a_{ii_1i_2},\,\Phi_{j_1j_2i_1i_2}) = \int d^n r_1\!\int d^n r_2\,a_{ii_1i_2}(r, r_1, r_2)\,\Phi_{j_1j_2i_1i_2}(r', r'', r_1, r_2) \tag{C480}$$

$$\sigma_{ij}(r) = ((\delta_{i(i)} - a_{i(i)}),\,\Phi_{j(i)}) - ((\delta_{i(i)} - a_{i(i)}),\,a_{j(j)}\Phi_{(i)(j)}) \tag{C481}$$

$$2\,\mathrm{tr}(a_{i(i)},\,\Phi_{j(i)}) > \mathrm{tr}(a_{i(i)},\,a_{j(j)},\,\Phi_{(i)(j)}) \tag{C482}$$
End of Table C.9.14: Volterra series
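The linear part of the prediction, (C475), amounts to solving a discrete Wiener-Hopf system. A minimal sketch, assuming the Markov-type correlation function (C455) as the two-point model on a line of observation points (the point coordinates here are illustrative):

    import numpy as np

    corr = lambda r, d=1.0: (1 + r/d)*np.exp(-r/d)      # two-point model, cf. (C455)

    x_obs = np.array([0.0, 0.7, 1.5, 2.2])              # observation points
    x_new = 1.0                                          # prediction point

    Phi = corr(np.abs(x_obs[:, None] - x_obs[None, :]))  # Phi_{0'0'}
    phi = corr(np.abs(x_obs - x_new))                    # Phi_{00'}
    a = np.linalg.solve(Phi, phi)                        # C_{00'} = Phi_{00'} Phi_{0'0'}^{-1}
    print(a, 1.0 - phi @ a)                              # weights and prediction variance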
Example C.4: Structure analysis
Here we analyze the structure of multipoint functions of higher order with respect to the statistical assumptions of homogeneity and isotropy. These assumptions make our prediction formulae much simpler. If we did not assume this statistical property, we would have to compute an independent higher-order correlation function for every point and every direction. For instance, if we had to calculate two-point correlation functions in two dimensions every 1 gon, we would have to live with 1600 different correlation functions. If we are able to assume isotropy, the number of characteristic functions reduces to two functions only! In the case of three-point functions the reduction ranges from 3200 non-isotropic functions to four in the case of isotropy. Of course, we will prove these results. The structure of these multi-point functions of statistically homogeneous and isotropic vector-valued signals follows from the form invariance of the functions with respect to rotations and mirror reflections of the coordinate system. Alternatively, we have to postulate invariance against transformations of the complete orthogonal group O(n). Form invariance of a multi-point correlation function is to be
Fig. C38 Projections: longitudinal and lateral components for two-, three- and four-point correlation functions, isotropy-homogeneity assumption
understood with respect to base functions which lead to scalar quantities. Our next section offers a better understanding of these theoretical results. As a short introduction to two-point correlations of vector-valued signals, we illustrate the theory of characteristic functions for two-point functions, three-point functions and four-point functions by Fig. C38. In the case of vector-valued functions, the structure of the correlation function used for linear prediction is generated by the coordinates $s_i(r)$ at the point $r$ or $P_1$ and $s_j(r')$ at the point $r'$ or $P_2$, and by the projected coordinate components along the connection line $P_1P_2$ (along track) and orthogonal to the line $P_1P_2$ (cross track), called "longitudinal" and "lateral" components $s_p$ and $s_n$ ("parallel" $p$, "normal" $n$). The characteristic functions $\Phi_{nn}$ resp. $\Phi_{pp}$ given by the two-point correlation functions are illustrated in Fig. C38.
Table C.15: Characteristic functions for two-point and three-point correlation functions of vector-valued signals

"homogeneous and isotropic two-point function"
$$\Phi_{nn}(r) = E\{s_n(r)s_n(r')\}, \quad \Phi_{pp}(r) = E\{s_p(r)s_p(r')\}, \quad \Phi_{pn}(r) = \Phi_{np}(r) = 0 \tag{C483}$$
$$\Phi_{ij}(r, r') = E\{s_i(r)s_j(r')\} = \Phi_{nn}\,\delta_{ij} + [\Phi_{pp}(r) - \Phi_{nn}(r)]\,\frac{x_i x_j}{r^2} \tag{C484}$$

"homogeneous and isotropic three-point function"
$$\left.\begin{aligned}
\Phi_{nnp}(r_1, r_2) &= E\{s_n(r)\,s_n(r')\,s_p(r'')\}\\
\Phi_{npn}(r_1, r_2) &= E\{s_n(r)\,s_p(r')\,s_n(r'')\}\\
\Phi_{pnn}(r_1, r_2) &= E\{s_p(r)\,s_n(r')\,s_n(r'')\}\\
\Phi_{ppp}(r_1, r_2) &= E\{s_p(r)\,s_p(r')\,s_p(r'')\}
\end{aligned}\right\} \tag{C485}$$

$$\begin{aligned}
\Phi_{ijk}(r, r', r'') ={}& \frac{x_i' - x_i}{|r - r'|}\,\delta_{jk}\,\Phi_{pnn}(r_1, r_2) + \frac{x_j' - x_j}{|r - r'|}\,\delta_{ik}\,\Phi_{npn}(r_1, r_2) + \frac{x_k'' - x_k}{x_1'' - x_1}\,\delta_{ij}\,\Phi_{nnp}(r_1, r_2)\\
&+ \frac{(x_i' - x_i)(x_j' - x_j)(x_k'' - x_k)}{|r - r'|^2\,(x_1'' - x_1)}\,\{\Phi_{ppp}(r_1, r_2) - \Phi_{pnn}(r_1, r_2) - \Phi_{npn}(r_1, r_2) - \Phi_{nnp}(r_1, r_2)\}
\end{aligned} \tag{C486}$$
End of Table: characteristic functions

In Table C.16 we present only results. How are these results proved in detail? That is our central interest here. We begin with the definition of the familiar scalar product and the norm of a Hilbert space by equations (C487) and (C488), extended by the variance-covariance function (C489) in Craig calculus; generalizations of a Hilbert space are given by (C490) and (C491). In terms of Craig calculus we compute Helmert's optimal design of the unknown coefficients by (C490)-(C493) and, in contrast, Werkmeister's optimal design, again of the unknown coefficients, by (C494)-(C496).

Table C.16: Optimal variance function

Hilbert space: scalar product, norm
$$(f, g) = \int_\Omega f(\omega)g(\omega)\,dP(\omega) \tag{C487}$$
$$\|f\|_2 = \sqrt{(f, f)} = \left[\int_\Omega |f(\omega)|^2\,dP(\omega)\right]^{1/2} \tag{C488}$$
variance function of vector-valued signals, Craig calculus
$$\sigma_{ij}(r) = (\varepsilon_i(r), \varepsilon_j(r)) = ((\delta_{i(i)} - a_{i(i)}),\,\Phi_{j(i)}) - ((\delta_{i(i)} - a_{i(i)}),\,a_{j(j)}\Phi_{(i)(j)}) \tag{C489}$$

Helmert optimal design, Craig calculus
$$\left.\begin{aligned}
&J_1 = \mathrm{tr}\,\sigma_{ij}(r) = ((\delta_{i(i)} - a_{i(i)}),\,\Phi_{j(i)}) - ((\delta_{i(i)} - a_{i(i)}),\,a_{j(j)}\Phi_{(i)(j)}) = \min\\
&\delta J_1[a_{i(i)}] = 0, \qquad \delta^2 J_1[a_{i(i)}] > 0
\end{aligned}\right\} \tag{C490}$$

$$\left.\begin{aligned}
&\left(\frac{dJ_1[a_{i(i)} + \varepsilon b_{i(i)}]}{d\varepsilon}\right)_{\varepsilon=0} = 0, \qquad \left(\frac{d^2J_1[a_{i(i)} + \varepsilon b_{i(i)}]}{d\varepsilon^2}\right)_{\varepsilon=0} > 0\\
&J_1[a_{i(i)} + \varepsilon b_{i(i)}] = \mathrm{tr}\,((\delta_{i(i)} - (a_{i(i)} + \varepsilon b_{i(i)})),\,\Phi_{j(i)})\\
&\qquad\qquad\qquad\quad - \mathrm{tr}\,((\delta_{i(i)} - (a_{i(i)} + \varepsilon b_{i(i)})),\,(a_{j(j)} + \varepsilon b_{j(j)})\Phi_{(i)(j)})
\end{aligned}\right\} \tag{C491}$$

$$\left(\frac{dJ_1}{d\varepsilon}\right)_{\varepsilon=0} = 0:\; \mathrm{tr}\,(b_{i(i)},\,(\Phi_{j(i)} - a_{j(j)}\Phi_{(i)(j)})) = 0, \qquad \left(\frac{d^2J_1}{d\varepsilon^2}\right)_{\varepsilon=0} > 0:\; \mathrm{tr}\,(b_{i(k)},\,b_{j(l)}\Phi_{(k)(l)}) > 0 \tag{C492}$$

positive definiteness of the correlation function
$$\mathrm{tr}\int d^n r'\!\int d^n r''\,b_{ik}(r, r')\,b_{jl}(r, r'')\,\Phi_{kl}(r', r'') > 0 \tag{C493}$$
Werkmeister optimal design, Craig calculus
$$\left.\begin{aligned}
J_2 = \det\sigma_{ij}(r) ={}& \{((\delta_{1(i)} - a_{1(i)}),\,\Phi_{1(i)}) - ((\delta_{1(i)} - a_{1(i)}),\,a_{1(j)}\Phi_{(i)(j)})\}\\
&\times \{((\delta_{2(i)} - a_{2(i)}),\,\Phi_{2(i)}) - ((\delta_{2(i)} - a_{2(i)}),\,a_{2(j)}\Phi_{(i)(j)})\}\\
&- \{((\delta_{1(i)} - a_{1(i)}),\,\Phi_{2(i)}) - ((\delta_{1(i)} - a_{1(i)}),\,a_{2(j)}\Phi_{(i)(j)})\}^2
\end{aligned}\right\} \tag{C494}$$

$$\left.\begin{aligned}
&\delta J_2[a_{i(i)}] = 0, \qquad \delta^2 J_2[a_{i(i)}] > 0\\
&\left(\frac{dJ_2[a_{i(i)} + \varepsilon b_{i(i)}]}{d\varepsilon}\right)_{\varepsilon=0} = 0, \qquad \left(\frac{d^2J_2[a_{i(i)} + \varepsilon b_{i(i)}]}{d\varepsilon^2}\right)_{\varepsilon=0} > 0\\
&J_2[a_{i(i)} + \varepsilon b_{i(i)}] = A_0 + A_1\varepsilon + A_2\varepsilon^2 + A_3\varepsilon^3 + A_4\varepsilon^4\\
&\left(\frac{dJ_2}{d\varepsilon}\right)_{\varepsilon=0} = 0:\; A_1 = 0
\end{aligned}\right\} \tag{C495}$$

$$\left(\frac{d^2J_2}{d\varepsilon^2}\right)_{\varepsilon=0} > 0:\; A_2 > 0 \tag{C496}$$

End of Table: Craig optimal design
Finally, we study in more detail the concept of structure functions. We illustrate the form invariance of their base invariants, which are invariant under rotational and mirror symmetry. Let us call $e_i(r)$ the base vector relating to the signal $s_i(r)$ at the placement vector $r$, $e_{j'}(r')$ the base vector relating to the signal $s_j(r')$ at the placement vector $r'$, etc. The set-up of the base invariants is given by (C497) and that of the corresponding base functions by (C498). They lead us to the combinations of base invariants $\Phi_1, \Phi_2$, following Taylor (1935) and Kármán (1937), and $\Phi_3, \Phi_4, \Phi_5$ etc. in (C499)-(C503). Finally, we summarize up to order three the various isotropic and homogeneous correlation functions in Table C.18, (C504)-(C509). They are based on the work of Kármán and Howarth (1938) and Obuchow (1958). Let us finally present the number of characteristic functions for homogeneous and isotropic correlation functions of higher order for vector-valued signals, collected in Table C.19.

Table C.17: Scalar products of base invariants

base invariants
$$\left.\begin{aligned}
&x_i x_i,\; x_{j'} x_{j'},\; x_{k''} x_{k''},\; x_{l'''} x_{l'''},\; \text{etc.}\\
&e_i e_{i'},\; e_i e_{i''},\; e_i e_{i'''},\; \text{etc.}\\
&e_{i'} e_{i''},\; e_{i'} e_{i'''},\; \text{etc.}\\
&x_i e_i,\; x_i e_{i'},\; x_i e_{i''},\; x_i e_{i'''},\; \text{etc.}\\
&x_{i'} e_{i'},\; x_{i'} e_{i''},\; x_{i'} e_{i'''},\; \text{etc.}\\
&x_{i''} e_{i''},\; x_{i''} e_{i'''},\; \text{etc.}
\end{aligned}\right\} \tag{C497}$$

base functions
$$\left.\begin{aligned}
\Phi_1(e_i) &= \Phi_i e_i\\
\Phi_2(e_i, e_{j'}) &= e_i e_{j'}\,\Phi_{ij'}(r, r')\\
\Phi_3(e_i, e_{j'}, e_{k''}) &= e_i e_{j'} e_{k''}\,\Phi_{ij'k''}(r, r', r'')\\
\Phi_4(e_i, e_{j'}, e_{k''}, e_{l'''}) &= e_i e_{j'} e_{k''} e_{l'''}\,\Phi_{ij'k''l'''}(r, r', r'', r''')
\end{aligned}\right\} \tag{C498}$$

combinations of base invariants
$$\Phi_1 = \Phi_1(e_i) = \Phi_i e_i \tag{C499}$$
$$\Phi_2(e_i, e_{j'}) = e_i e_{j'}\,\Phi_{ij'}(r, r') = e_i e_{j'}\{x_i x_{j'}\,a_1(r) + \delta_{ij'}\,a_2(r)\} \tag{C500}$$
(Taylor (1935), Kármán (1937))

$$\begin{aligned}
\Phi_3(e_i, e_{j'}, e_{k''}) &= e_i e_{j'} e_{k''}\,\Phi_{ij'k''}(r, r', r'')\\
&= e_i e_{j'} e_{k''}\{x_i x_{j'} x_{k''}\,a_1(r_1, r_2) + x_i\,\delta_{j'k''}\,a_2(r_1, r_2) + x_{j'}\,\delta_{ik''}\,a_3(r_1, r_2) + x_{k''}\,\delta_{ij'}\,a_4(r_1, r_2)\}
\end{aligned} \tag{C501}$$

$$\left.\begin{aligned}
\Phi_4(e_i, e_{j'}, e_{k''}, e_{l'''}) ={}& e_i e_{j'} e_{k''} e_{l'''}\,\Phi_{ij'k''l'''}(r, r', r'', r''')\\
={}& e_i e_{j'} e_{k''} e_{l'''}\{x_i x_{j'} x_{k''} x_{l'''}\,a_1(r_1, r_2, r_3) + x_i x_{j'}\,\delta_{k''l'''}\,a_2(r_1, r_2, r_3)\\
&+ x_{j'} x_{l'''}\,\delta_{ik''}\,a_3(r_1, r_2, r_3) + x_i x_{k''}\,\delta_{j'l'''}\,a_4(r_1, r_2, r_3)\\
&+ x_i x_{l'''}\,\delta_{j'k''}\,a_5(r_1, r_2, r_3) + x_{j'} x_{k''}\,\delta_{il'''}\,a_6(r_1, r_2, r_3)\\
&+ x_{k''} x_{l'''}\,\delta_{ij'}\,a_7(r_1, r_2, r_3) + \delta_{ij'}\,\delta_{k''l'''}\,a_8(r_1, r_2, r_3)\\
&+ \delta_{ik''}\,\delta_{j'l'''}\,a_9(r_1, r_2, r_3) + \delta_{j'k''}\,\delta_{il'''}\,a_{10}(r_1, r_2, r_3)\}
\end{aligned}\right\} \tag{C502}$$

$$\left.\begin{aligned}
\Phi_5(e_i, e_{j'}, e_{k''}, e_{l'''}, e_{m''''}) ={}& e_i e_{j'} e_{k''} e_{l'''} e_{m''''}\,\Phi_{ij'k''l'''m''''}(r, r', r'', r''', r'''')\\
={}& e_i e_{j'} e_{k''} e_{l'''} e_{m''''}\{x_i x_{j'} x_{k''} x_{l'''} x_{m''''}\,a_1(r_1, r_2, r_3, r_4)\\
&+ x_i x_{j'} x_{k''}\,\delta_{l'''m''''}\,a_2 + x_i x_{j'} x_{m''''}\,\delta_{k''l'''}\,a_3 + x_i x_{j'} x_{l'''}\,\delta_{k''m''''}\,a_4\\
&+ x_i x_{l'''} x_{m''''}\,\delta_{j'k''}\,a_5 + x_i x_{k''} x_{m''''}\,\delta_{j'l'''}\,a_6 + x_i x_{k''} x_{l'''}\,\delta_{j'm''''}\,a_7\\
&+ x_{k''} x_{l'''} x_{m''''}\,\delta_{ij'}\,a_8 + x_{j'} x_{l'''} x_{m''''}\,\delta_{ik''}\,a_9 + x_{j'} x_{k''} x_{m''''}\,\delta_{il'''}\,a_{10}\\
&+ x_{j'} x_{k''} x_{l'''}\,\delta_{im''''}\,a_{11} + x_i\,\delta_{j'k''}\,\delta_{l'''m''''}\,a_{12} + x_{j'}\,\delta_{ik''}\,\delta_{l'''m''''}\,a_{13}\\
&+ x_{k''}\,\delta_{ij'}\,\delta_{l'''m''''}\,a_{14} + x_{l'''}\,\delta_{ij'}\,\delta_{k''m''''}\,a_{15} + x_{m''''}\,\delta_{ij'}\,\delta_{k''l'''}\,a_{16}\\
&+ x_i\,\delta_{j'l'''}\,\delta_{k''m''''}\,a_{17} + x_{j'}\,\delta_{il'''}\,\delta_{k''m''''}\,a_{18} + x_{k''}\,\delta_{il'''}\,\delta_{j'm''''}\,a_{19}\\
&+ x_{l'''}\,\delta_{ik''}\,\delta_{j'm''''}\,a_{20} + x_{m''''}\,\delta_{ik''}\,\delta_{j'l'''}\,a_{21} + x_i\,\delta_{k''l'''}\,\delta_{j'm''''}\,a_{22}\\
&+ x_{j'}\,\delta_{im''''}\,\delta_{k''l'''}\,a_{23} + x_{k''}\,\delta_{im''''}\,\delta_{j'l'''}\,a_{24} + x_{l'''}\,\delta_{im''''}\,\delta_{j'k''}\,a_{25}\\
&+ x_{m''''}\,\delta_{il'''}\,\delta_{j'k''}\,a_{26}\}
\end{aligned}\right\} \tag{C503}$$

End of Table: Scalar products of base invariants
Table C.18: Isotropic and homogeneous correlation functions, two- and three-point functions, normal and longitudinal components

$$\left.\begin{aligned}
\Phi_{11}(r, r') &= a_1(r)\,r^2 + a_2(r) = \Phi_{pp}\\
\Phi_{12}(r, r') &= \Phi_{13}(r, r') = \Phi_{pn} = \Phi_{np} = 0\\
\Phi_{22}(r, r') &= \Phi_{33}(r, r') = \Phi_{nn}
\end{aligned}\right\} \tag{C504}$$

$$\left.\begin{aligned}
\Phi_{111} &= r_1^2(x_1'' - x_1)\,a_1(r_1, r_2) + r_1[a_2(r_1, r_2) + a_3(r_1, r_2)] + (x_1'' - x_1)\,a_4(r_1, r_2) = \Phi_{ppp}\\
\Phi_{112} &= r_1^2(x_2'' - x_2)\,a_1(r_1, r_2) + (x_2'' - x_2)\,a_4(r_1, r_2)\\
\Phi_{113} &= r_1^2(x_3'' - x_3)\,a_1(r_1, r_2) + (x_3'' - x_3)\,a_4(r_1, r_2)\\
\Phi_{121} &= 0,\quad \Phi_{122} = r_1 a_2 = \Phi_{pnn},\quad \Phi_{123} = 0\\
\Phi_{131} &= 0,\quad \Phi_{132} = 0,\quad \Phi_{133} = r_1 a_2 = \Phi_{pnn}\\
\Phi_{211} &= 0,\quad \Phi_{212} = r_1 a_3 = \Phi_{npn},\quad \Phi_{213} = 0\\
\Phi_{231} &= \Phi_{232} = \Phi_{233} = 0\\
\Phi_{311} &= \Phi_{312} = 0,\quad \Phi_{313} = r_1 a_3 = \Phi_{npn}\\
\Phi_{321} &= \Phi_{322} = \Phi_{323} = 0\\
\Phi_{331} &= (x_1'' - x_1)\,a_4 = \Phi_{nnp},\quad \Phi_{332} = (x_2'' - x_2)\,a_4,\quad \Phi_{333} = (x_3'' - x_3)\,a_4
\end{aligned}\right\} \tag{C505}$$

$$\left.\begin{aligned}
&a_1(r) = \frac{\Phi_{pp} - \Phi_{nn}}{r^2}, \qquad a_2(r) = \Phi_{nn}\\
&a_1(r_1, r_2) = \frac{\Phi_{ppp} - \Phi_{pnn} - \Phi_{npn} - \Phi_{nnp}}{r_1^2\,(x_1'' - x_1)}\\
&a_2(r_1, r_2) = \frac{\Phi_{pnn}}{r_1}, \qquad a_3(r_1, r_2) = \frac{\Phi_{npn}}{r_1}, \qquad a_4(r_1, r_2) = \frac{\Phi_{nnp}}{x_1'' - x_1}
\end{aligned}\right\} \tag{C506}$$

$$\Phi_{ij}(r, r') = \Phi_{nn}(r)\,\delta_{ij} + [\Phi_{pp}(r) - \Phi_{nn}(r)]\,\frac{x_i x_j}{r^2} \tag{C507}$$

$$\begin{aligned}
\Phi_{ijk}(r, r', r'') ={}& \frac{x_i}{r_1}\,\delta_{jk}\,\Phi_{pnn} + \frac{x_j}{r_1}\,\delta_{ik}\,\Phi_{npn} + \frac{x_k'' - x_k}{x_1'' - x_1}\,\delta_{ij}\,\Phi_{nnp}\\
&+ \frac{x_i x_j\,(x_k'' - x_k)}{r_1^2\,(x_1'' - x_1)}\,(\Phi_{ppp} - \Phi_{pnn} - \Phi_{npn} - \Phi_{nnp})
\end{aligned} \tag{C508}$$

$$\begin{aligned}
\Phi_{ij'k'}(r, r' = r'') ={}& x_i x_{j'} x_{k'}\,a_1(r_1) + x_i\,\delta_{j'k'}\,a_2(r_1) + (x_{j'}\,\delta_{ik'} + x_{k'}\,\delta_{ij'})\,a_3(r_1)\\
={}& x_i x_{j'} x_{k'}\,\frac{\Phi_{ppp}(r_1) - \Phi_{pnn}(r_1) - 2\Phi_{nnp}(r_1)}{r_1^3} + \Phi_{pnn}(r_1)\,\frac{x_i}{r_1}\,\delta_{j'k'}\\
&+ \Phi_{nnp}(r_1)\,\frac{x_{j'}\,\delta_{ik'} + x_{k'}\,\delta_{ij'}}{r_1}
\end{aligned} \tag{C509}$$

End of Table: Isotropy and homogeneity
Table C.19: Number of characteristic functions for multi-point functions based on vector-valued signals

    multi-point correlation functions    number of characteristic functions
    2                                    2
    3                                    4
    4                                    10
    5                                    26

End of Table: characteristic functions
Ex 5: Nonlocal Time Series Analysis

Example: Nonlocal time series analysis
Classical time series analysis, for instance of Earth tide measurements, is based on local harmonic and disharmonic effects. Here we present a nonlocal concept to analyze, on a global scale, measurements at different locations. The tool of generalized harmonic analysis is based on correlation functions of higher order between vector-valued signals at different points and at different times. Its polyspectrum is divided into poly-coherence and poly-phase. A 3D-filter of Kolmogorov-Wiener type is designed for an array analysis. The optimal distance between stations and the optimal registration time - the Nyquist frequency - are given to avoid aliasing. Standard procedures of signal analysis treat the registered time series of one station only. It is our target here to combine various observation stations: we are interested in the global picture handled by array analysis. We refer to applications in seismology and tidal analysis, and to the great success of tomography. A 3D- or multichannel filter makes it possible to arrange an optimal signal-to-noise ratio in terms of an optimal design of the station distance and an optimal design of the registration time, namely the statistical accuracy of the numbers which decide upon a linear or nonlinear filter. In the first section we introduce the classical nonlocal analysis for N = 2 stations for a vector-valued signal. We assume no correlation between the signal components. The mono-spectrum, namely the Fourier transform of scalar-valued correlation functions, is separated into coherence- and phase-graphs. The second section is a straightforward approach to vector-valued signals registered at N = 2 stations. It has its origin in the analysis of the coherence surface and of the phase surface of Fourier-transformed correlation tensors. Poly-spectra of signal space-time functions form the basis of the N-station analysis in the third section. We
separate real and imaginary parts in terms of a poly-coherence graph and of a poly-phase graph, both in matrix form. An extended array analysis is based on a Kolmogorov-Wiener filter in the fourth section. We give a simple analysis to predict an optimal registration and an optimal station distance. At the end we give references to (a) array analysis, (b) mono-spectra and (c) bi- and poly-spectra.

Scalar signals, "classical analysis"
In terms of "classical time series analysis", registrations at two different places are to be combined: Which are the frequency bands in which we can detect coherences, and which is the spectral phase-dislocation between the two registrations? We have to keep in mind the Fundamental Lemma of N. Wiener that a frequency assured by coherence has the property of maximal probability, characterized by signal time-functions at two stations. We sum up the following operations:
(i) Calculation of auto- and cross-correlation functions of scalar-valued signals,
(ii) Calculation of the Fourier transforms of the auto- and cross-correlation functions, separation of the mono-spectra into active and passive (cos versus sin) components,
(iii) Plots of coherence graphs $H^2(k_1, \omega_1)$ and phase graphs $\Theta(k_1, \omega_1)$.
Here we depart from a scalar space-time representation of a signal, namely a signal $s(r, t)$ at the placement $r$ and the time $t$ and a signal $s(r', t')$ at the placement $r'$ and the time $t'$, correlated in detail by (C510)-(C521). $f[s(r, t), s(r', t')]$ denotes the probability density of meeting the signals $s$ and $s'$ at the placements $r, r'$ and the times $t, t'$. We intend to transform the probability integral into a space-time integral, assuming homogeneity and ergodicity, by (C512) and (C513). The space-time complex mono-spectrum $\Phi(r_1, \tau_1) \to \tilde\Phi(k_1, \omega_1)$ is defined by (C515), direct and inverse by (C516), and $\tilde\Phi(k_1, \omega_1)$ is decomposed into a real-valued part and an imaginary part. The tilde denotes the Fourier transform, $i := \sqrt{-1}$, and $n$ indicates the dimension. In analogy to wave propagation in electrodynamics, we call $\mathrm{Re}\{\tilde\Phi\} =: \tilde\Psi(k_1, \omega_1)$ the active spectrum or co-spectrum, but $\mathrm{Im}\{\tilde\Phi\} =: \tilde\Lambda(k_1, \omega_1)$ the quadrature or "blind" spectrum. The backward transformations of $\Phi(r_1, \tau_1)\cos(\omega_1\tau_1 - \langle k_1|r_1\rangle)$ and $\Phi(r_1, \tau_1)\sin(\omega_1\tau_1 - \langle k_1|r_1\rangle)$ lead to $\{\tilde\Psi(k_1, \omega_1), \tilde\Lambda(k_1, \omega_1)\}$. Let us finally define $H^2 := \tilde\Psi^2(k_1, \omega_1) + \tilde\Lambda^2(k_1, \omega_1)$ as the space-time coherence graph, and $\tan\Theta(k_1, \omega_1) := \tilde\Lambda(k_1, \omega_1)/\tilde\Psi(k_1, \omega_1)$ the phase graph, both very important in defining "coherent light". Figure C39 illustrates the phase graph, $\sqrt{\tilde\Psi^2 + \tilde\Lambda^2}$ being the length of the complex vector in polar coordinates. The auto-correlation functions $\Phi(r, r)$ and $\Phi(r', r')$ clearly lead us to "angle" and "length" in the complex plane. Fig. C40 illustrates the fundamental coherence- and phase-functions. Finally we refer to the contributions of Blackman (1965); Granger and Hatanaka (1964); Harris (1967); Kertz (1969, 1971); Lee (1960); Pugachev (1965); Robinson (1967), and Robinson and Treitel (1964).
Table C.20: Variance-covariance function of a scalar signal, direct and inverse Fourier transform, space-time representation

$$E[s(r, t)\,s(r', t')] := \Phi(r, r', t, t') \tag{C510}$$
$$:= \int_{-\infty}^{+\infty} ds(r, t) \int_{-\infty}^{+\infty} ds(r', t')\; s(r, t)\,s(r', t')\,f[s(r, t), s(r', t')] \tag{C511}$$

$$\Phi(r' - r,\, t' - t) = \lim_{T, x, y \to \infty} \frac{1}{8Txy} \int_{-T}^{+T} dt \int_{-x}^{+x} dx \int_{-y}^{+y} dy \tag{C512}$$
$$\qquad s(x, y, t)\; s[x + (x' - x),\, y + (y' - y),\, t + (t' - t)] \tag{C513}$$

$$r' - r := r_1, \qquad t' - t := \tau_1 \tag{C514}$$

$$\tilde\Phi(k_1, \omega_1) = (2\pi)^{-\frac{n+1}{2}} \int_{-\infty}^{+\infty} d^n r_1 \int_{-\infty}^{+\infty} d\tau_1\; \Phi(r_1, \tau_1)\,\exp[+i(\omega_1\tau_1 - k_1 r_1)] \tag{C515}$$

$$\Phi(r_1, \tau_1) = (2\pi)^{-\frac{n+1}{2}} \int_{-\infty}^{+\infty} d^n k_1 \int_{-\infty}^{+\infty} d\omega_1\; \tilde\Phi(k_1, \omega_1)\,\exp[-i(\omega_1\tau_1 - k_1 r_1)] \tag{C516}$$

$$\tilde\Phi(k_1, \omega_1) = \mathrm{Re}[\tilde\Phi] + i\,\mathrm{Im}[\tilde\Phi] = \tilde\Psi(k_1, \omega_1) + i\,\tilde\Lambda(k_1, \omega_1) \tag{C517}$$

$$\tilde\Psi(k_1, \omega_1) := (2\pi)^{-\frac{n+1}{2}} \int_{-\infty}^{+\infty} d^n r_1 \int_{-\infty}^{+\infty} d\tau_1\; \Phi(r_1, \tau_1)\cos(\omega_1\tau_1 - k_1 r_1) \tag{C518}$$

$$\tilde\Lambda(k_1, \omega_1) := (2\pi)^{-\frac{n+1}{2}} \int_{-\infty}^{+\infty} d^n r_1 \int_{-\infty}^{+\infty} d\tau_1\; \Phi(r_1, \tau_1)\sin(\omega_1\tau_1 - k_1 r_1) \tag{C519}$$

$$H^2 := \tilde\Psi^2(k_1, \omega_1) + \tilde\Lambda^2(k_1, \omega_1) \tag{C520}$$

$$\tan\Theta(k_1, \omega_1) := \frac{\tilde\Lambda(k_1, \omega_1)}{\tilde\Psi(k_1, \omega_1)} \tag{C521}$$

End of Table: variance-covariance functions
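The co-spectrum, quadrature spectrum, coherence and phase of (C517)-(C521) have discrete analogues that can be computed by FFT. A minimal sketch for two scalar time series; the series are synthetic stand-ins sharing one oscillation with an imposed 0.8 rad phase shift, and the cross-spectrum is segment-averaged (Welch-type):

    import numpy as np

    rng = np.random.default_rng(2)
    t = np.arange(4096)
    f0 = 8/256                                   # bin-aligned frequency for 256-sample segments
    s1 = np.sin(2*np.pi*f0*t) + 0.5*rng.standard_normal(t.size)
    s2 = np.sin(2*np.pi*f0*t - 0.8) + 0.5*rng.standard_normal(t.size)

    def cross_spectrum(x, y, nseg=16):
        L = len(x)//nseg
        segs = [np.fft.rfft(x[i*L:(i+1)*L])*np.conj(np.fft.rfft(y[i*L:(i+1)*L]))
                for i in range(nseg)]
        return np.mean(segs, axis=0)

    Pxy = cross_spectrum(s1, s2)
    Pxx = cross_spectrum(s1, s1).real
    Pyy = cross_spectrum(s2, s2).real
    psi, lam = Pxy.real, Pxy.imag                # co-spectrum and quadrature spectrum
    H2 = (psi**2 + lam**2)/(Pxx*Pyy)             # squared coherence, cf. (C520)
    theta = np.arctan2(lam, psi)                 # phase graph, cf. (C521)
    print(H2[8], theta[8])                       # high coherence; phase near the imposed
                                                 # 0.8 rad shift (sign depends on convention)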
Fig. C39 Decomposition into real and imaginary components: Gauss representation $\{\tilde\Psi, \tilde\Lambda\}$
Fig. C40 Scalar coherence and phase graph
C.9.52 First generalization: vector-valued classical analysis
At first, we deal with the separate single components of a general three-component tidal recording. We develop a joint analysis of all three components observed at two points on the surface of the Earth, for example. Its signal functions are correlated with each other, in the special case here the vertical component as well as the horizontal components. We receive a correlation tensor of second degree, namely in the form of the coherence matrix and the phase matrix. In detail, the analysis technique is based on the following operations:
(i) Computation of auto- and cross-correlations of the vector-valued signal,
(ii) Computation of the Fourier transforms of these functions, separation within the mono-spectral matrix into "active" and "blind" spectra, cosine- and sine-Fourier transforms,
(iii) Plots of the coherence matrix $H_{ij}(k_1, \omega_1)$ as well as the phase matrix $\Theta_{ij}(k_1, \omega_1)$.

Theoretical Background
Let $s_i(x, t) = \delta g_i(x, t)$ be a vector-valued space-time signal consisting of vertical and horizontal components subject to $i = 1$ or $x$, $i = 2$ or $y$ and $i = 3$ or $z$. The signal function $s_i(x, t)$ at the placement $x$ and the time instant $t$ and the signal function $s_j(x', t')$ at the placement $x'$ and the time instant $t'$ are correlated by means of
$$E\{s_i(x, t)\,s_j(x', t')\} =: \Phi_{ij}(x, x', t, t') \quad \text{for } i, j \in \{1, 2, 3\} \tag{C522}$$
As an analogue of our previous formulae we define the coherence matrix, namely the coherence tensor $H_{ij}$, and the phase matrix $\Theta_{ij}$:
$$H_{ij}(k_1, \omega_1) := \tilde\Psi^2_{ij}(k_1, \omega_1) + \tilde\Lambda^2_{ij}(k_1, \omega_1) \tag{C523}$$
$$\tan\Theta_{ij}(k_1, \omega_1) = \frac{\tilde\Lambda_{ij}(k_1, \omega_1)}{\tilde\Psi_{ij}(k_1, \omega_1)} \tag{C524}$$
The coherence matrix as well as the phase matrix are represented by special surfaces, for instance coherence ellipses, coherence hyperbolae etc.

C.9.53 Second generalization: poly-spectra
Up to now we analyzed signal registrations at only two different points by correlation, namely by coherence and phase. Here we introduce the concept of poly-spectra for the purpose of analyzing registrations of Earth tides at all points at which we have registrations. In detail, the analysis technique is based on the following operations:
(i) Computation of multi-point correlation functions of a vector-valued tidal signal,
(ii) Computation of their Fourier transforms, called polyspectra. The concept is able to separate mono-, bi-, tri-, in short poly-spectra into "active" and "blind" spectra, cosine- and sine-Fourier transforms,
(iii) Plots of the generalized coherence graphs
$$H_{i_1 i_2 \cdots i_{N-1} i_N}(k_1, k_2, \ldots, k_{N-1};\, \omega_1, \omega_2, \ldots, \omega_{N-1}) \tag{C525}$$
and the generalized phase graphs
$$\Theta_{i_1 i_2 \cdots i_{N-1} i_N}(k_1, k_2, \ldots, k_{N-1};\, \omega_1, \omega_2, \ldots, \omega_{N-1}) \tag{C526}$$
at $N$ tidal stations, being functions of the wave number vectors $k_1, k_2, \ldots, k_{N-1}$ and the frequencies $\omega_1, \omega_2, \ldots, \omega_{N-1}$.

Theoretical Background
Let $s_{i_1}(x, t) = \delta g_{i_1}(x, t)$ be a tensor-valued space-time tidal signal at the first tidal station, $s_{i_2}(x', t')$ be a tensor-valued tidal signal at the second tidal station. If we
correlate all signal functions with each other, we receive a multi-point correlation function of order equal to the number of tidal stations:
$$E\{s_{i_1}(x, t)\,s_{i_2}(x', t') \cdots s_{i_N}(x^{(N)}, t^{(N)})\} =: \Phi_{i_1 i_2 \cdots i_{N-1} i_N} \tag{C527}$$
In case of space-like homogeneity and time-like stationarity we are able to represent the various correlation functions by
$$\Phi_{i_1 i_2 \cdots i_{N-1} i_N}(x_1, x_2, \ldots, x_{N-1};\, \tau_1, \tau_2, \ldots, \tau_{N-1}) \tag{C528}$$
subject to
$$x_1 := x' - x,\quad x_2 := x'' - x,\quad \text{etc.}, \qquad \tau_1 = t' - t,\quad \tau_2 = t'' - t,\quad \text{etc.} \tag{C529}$$
and their polyspectra
$$\begin{aligned}
&\tilde\Phi_{i_1 i_2 \cdots i_{N-1} i_N}(k_1, k_2, \ldots, k_{N-1};\, \omega_1, \omega_2, \ldots, \omega_{N-1}) =\\
&\quad (2\pi)^{-\frac{n(N-1)+(N-1)}{2}} \int_{-\infty}^{+\infty} d^{n(N-1)}r \int_{-\infty}^{+\infty} d^{N-1}\tau\;\, \Phi_{i_1 i_2 \cdots i_{N-1} i_N}(r_1, \ldots, r_{N-1};\, \tau_1, \ldots, \tau_{N-1})\\
&\quad \times \exp[+i(\omega_1\tau_1 + \omega_2\tau_2 + \cdots + \omega_{N-1}\tau_{N-1} - k_1 r_1 - k_2 r_2 - \cdots - k_{N-1} r_{N-1})]
\end{aligned} \tag{C530}$$
$n$ is the dimension number, $N$ the number of tidal stations. For the case $N = 3$ we use the term "bispectrum", for the case $N = 4$ the term "trispectrum", in general "polyspectrum". The common separation of $\tilde\Phi_{i_1 i_2 \cdots i_{N-1} i_N}$ into real and imaginary parts leads us to the poly-coherence graph as well as the poly-phase graph:
$$H_{i_1 i_2 \cdots i_{N-1} i_N} = \mathrm{Re}^2[\tilde\Phi_{i_1 i_2 \cdots i_{N-1} i_N}] + \mathrm{Im}^2[\tilde\Phi_{i_1 i_2 \cdots i_{N-1} i_N}] \tag{C531}$$
$$\tan\Theta_{i_1 i_2 \cdots i_{N-1} i_N} = \frac{\mathrm{Im}[\tilde\Phi_{i_1 i_2 \cdots i_{N-1} i_N}]}{\mathrm{Re}[\tilde\Phi_{i_1 i_2 \cdots i_{N-1} i_N}]} \tag{C532}$$
Numerical results are presented in Brillinger (1964), Grafarend (1972), Hasselmann et al. (1963), Rosenblatt (1965, 1966), Sinai (1963 i, ii), Tukey (1959), and Van Ness (1965).
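For $N = 3$ in the purely temporal case, (C530) reduces to the bispectrum of a single series. A direct, segment-averaged estimate can be sketched as follows; the test signal, with quadratic phase coupling between bin-aligned frequencies, is a synthetic stand-in:

    import numpy as np

    def bispectrum(x, nseg=32):
        # direct estimate B(f1, f2) = <X(f1) X(f2) X*(f1+f2)> averaged over segments
        L = len(x)//nseg
        nf = L//2 + 1
        B = np.zeros((nf, nf), dtype=complex)
        for i in range(nseg):
            X = np.fft.fft(x[i*L:(i+1)*L])
            for f1 in range(nf):
                for f2 in range(nf - f1):
                    B[f1, f2] += X[f1]*X[f2]*np.conj(X[f1 + f2])
        return B/nseg

    rng = np.random.default_rng(3)
    t = np.arange(8192)
    f1, f2 = 12/256, 28/256        # whole bins for 256-sample segments
    # quadratic phase coupling: a third component at f1 + f2 with matched phase
    x = (np.cos(2*np.pi*f1*t) + np.cos(2*np.pi*f2*t)
         + 0.5*np.cos(2*np.pi*(f1 + f2)*t) + rng.standard_normal(t.size))
    B = bispectrum(x - x.mean())
    print(np.unravel_index(np.abs(B).argmax(), B.shape))   # peak near (12, 28)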
C.9.54 Generalized array analysis on the basis of a Kolmogorov-Wiener filter (3D-filter)
We do not attempt to review "filter theory"; instead we present array analysis for the purpose of optimally reducing noise in tidal measurements at an optimal station density. The method of analysis is based on the following operations:
(i) Computation of signal correlation functions from tidal registrations at multiple stations, and afterwards of noise correlation functions from the analysis of a noise-characteristic multi-dimensional stochastic process, for instance of Markov type, homogeneous and isotropic,
(ii) Computation of spectra from the diverse correlation functions,
(iii) Solving the linear matrix equations of Wiener-Hopf type for the optimal filter functions.
The optimal station density is obtained from the postulate $\Delta\langle k|x\rangle \leq 1$, the optimal registration time from the postulate $\Delta\omega\,\Delta t \leq 1$. $\Delta x$ is the distance of the tidal stations in position space and $\Delta k$ the corresponding distance in Fourier space, for instance $\Delta k = 2\pi/\Delta x$, the wavelength aliasing; $\Delta t$ is the discretization parameter in the time domain and $\Delta\omega$ the corresponding frequency distance in Fourier space, for instance $\Delta\omega = 2\pi/T$, the frequency aliasing.

Example
$$\Delta r \cong \frac{1}{\Delta k\,\cos\angle(k, r)}, \qquad \Delta\omega \cong 1/\Delta t$$
$$\angle(k, r) = 0,\;\; \lambda = 10^3\ \mathrm{km}:\ \Delta r \cong 140\ \mathrm{km}; \qquad \angle(k, r) = 30^\circ,\;\; \lambda = 10^3\ \mathrm{km}:\ \Delta r \cong 330\ \mathrm{km}$$
$$\Delta t = 1\ \mathrm{sec},\;\; \Delta\omega = 1\ \mathrm{Hz}:\ T \cong 6\ \mathrm{sec}; \qquad \Delta t = 1\ \mathrm{msec},\;\; \Delta\omega = 10^3\ \mathrm{Hz}:\ T \cong 6\ \mathrm{msec}$$

Theoretical Background
Let us assume that the vector-valued registration signal consists of two parts, the pure signal part and the superimposed noise:
$$\delta g_i(x, t) = s_i(x, t) + n_i(x, t) \tag{C533}$$
The filtered signal $\delta g_i'(x, t)$ has the linear structure
$$\delta g_i'(x, t) = s_i'(x, t) + n_i'(x, t) \tag{C534}$$
The filter error $\varepsilon_i(x, t) = [s_i'(x, t) - s_i(x, t)] + [n_i'(x, t) - n_i(x, t)]$ is designed by $I_1 = \mathrm{tr}\,\sigma_{ij}(x, t) = \min$ or $I_2 = \det\sigma_{ij}(x, t) = \min$, subject to the error tensor $\sigma_{ij}(x, t) := E\{\varepsilon_i(x, t)\varepsilon_j(x, t)\} = \min$. The filter $f_{ij}(x, t)$ is defined by the linear Volterra set-up
$$n_i'(x, t) := \int_{-\infty}^{+\infty} d^n x_1 \int_{-\infty}^{+\infty} d\tau_1\; f_{ij}(x_1, \tau_1)\,n_j(x - x_1,\, t - \tau_1) \tag{C535}$$
$J[\sigma_{ij}(f_{kl}(x, t))] = \min$ leads to the "necessary" generalized Wiener-Hopf equations
$$\Phi_{s_is_j}(r, r', t, t') = \int_{-\infty}^{+\infty} d^n r'' \int_{-\infty}^{+\infty} dt''\; f_{ik}(r'', t'')\,\Phi_{jk}(r - r'',\, t - t'') \tag{C536}$$
$$\tilde f_{ij} = \tilde\Phi_{s_is_j}(k_1, \omega_1)\,[\tilde\Phi_{s_is_k} + \tilde\Phi_{n_in_k}]^{+} \tag{C537}$$
$[\cdot]^{+}$ identifies the generalized inverse of minimum norm least squares type.
$$\tilde F_{[ii]} = \tilde\Phi_{ss[ii]}\,[\tilde\Phi_{ss[ii]} + \tilde\Phi_{nn[ii]}]^{+} \tag{C538}$$
The result of the 3D or linearized multi-channel filter leads to an easy solution. If we care for a nonlinear filter subject to the multi-indices $J := (j_1, j_2, \ldots, j_\alpha)$ and $K := (k_1, k_2, \ldots, k_\beta)$, we receive the solution
$$\Phi_{ij} = (f_{ik},\,\Phi_{jk}) \tag{C539}$$
with respect to the scalar product $(\cdot, \cdot)$ of the corresponding Hilbert space. Numerical examples can be found in C. B. Archambeau et al. (1965), J. W. Birtill and F. E. Whiteway (1965), H. W. Briscoe and P. L. Fleck (1965), J. P. Burg (1964), J. F. Claerbout (1964), P. Embree et al. (1963), E. Grafarend (1972), A. N. Kolmogorov (1941), and N. Wiener (1949, 1964, 1968).

Comments
In the experimental sciences, namely geosciences like geophysics and geodesy, and the environmental sciences - weather, air pollution, rainfall, oceanography, temperature-affected sciences - processes act on spatially and temporally large to very large scales. For this reason, they can never be considered stationary or homogeneous-isotropic, but non-stationary, inhomogeneous or anisotropic in the statistical sense (Spöck and Pilz (2008)); see also Guttorp and Sampson (1994) as one of the attempts to model non-stationarity. Other examples are geodetic networks
in two- and three-dimensional Euclidean space and on two-, three- and four-dimensional manifolds including the sphere, the ellipsoid, or General Relativity as an excellent example of the inhomogeneity and anisotropy presented here. C. Calder and N. Cressie (2007) review different directions in spectral and convolution-based Gaussian non-stationary random field modeling. Spöck and Pilz (2008) present a spectral decomposition without any restriction to the Gaussian approach. They advise the reader to apply a special Generalized Linear Mixed Model (GLMM), the topic of our book, namely one derived from the spectral decomposition of non-stationary random fields.

Topics
(i) Stationary and isotropic random fields versus
(ii) Non-stationary, locally isotropic random fields
(iii) Interpretation in the context of linear mixed models (LMMs), $Y = F\beta + Ab + \varepsilon_0$ (see the sketch after this list):
  a. $F\beta$: design matrix of the first kind; result: regression function, trend behaviour; $F$ fixed design matrix
  b. $Ab$: non-deterministic random fluctuations around the trend functions; $A$ fixed design matrix
  c. $\varepsilon_0$: independent, homoscedastic error term centered at zero
(iv) Relation to non-stationary, locally isotropic random fields
(v) Generalized linear mixed models (GLMMs)
(vi) Extending generalized linear mixed models (DHGLMMs)
(vii) Linear and mixed models and Hermite polynomials
(viii) Anisotropic stochastic processes, axisymmetric: Chandrasekhar (1950), Grafarend (1972)
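A minimal simulation of the linear mixed model $Y = F\beta + Ab + \varepsilon_0$ of topic (iii), with GLS estimation of $\beta$ and the BLUP of $b$ under assumed (known) variance components; the dimensions and parameter values are illustrative only:

    import numpy as np

    rng = np.random.default_rng(5)
    m, q = 200, 5
    F = np.column_stack([np.ones(m), np.linspace(0, 1, m)])  # fixed-effects design
    A = rng.standard_normal((m, q))                          # random-effects design
    beta = np.array([1.0, -2.0]); G = 0.5*np.eye(q); s2 = 0.1

    b = rng.multivariate_normal(np.zeros(q), G)
    y = F@beta + A@b + np.sqrt(s2)*rng.standard_normal(m)    # Y = F beta + A b + eps_0

    Sig = A@G@A.T + s2*np.eye(m)                             # Sigma = A G A' + s2 I
    W = np.linalg.inv(Sig)
    beta_hat = np.linalg.solve(F.T@W@F, F.T@W@y)             # GLS estimate of beta
    b_hat = G@A.T@W@(y - F@beta_hat)                         # BLUP of the random effects
    print(np.round(beta_hat, 2), np.round(b_hat - b, 2))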
Appendix D
Basics of Groebner Basis Algebra
D-1 Definitions
To enhance the understanding of the theory of Groebner bases presented in Chap. 15, the following definitions, supplemented with examples, are presented. Three commonly used monomial ordering systems are: lexicographic ordering, graded lexicographic ordering and graded reverse lexicographic ordering. First we define the monomial ordering before considering the three types.
Definition D.1. (Monomial ordering): A monomial ordering on $k[x_1, \ldots, x_n]$ is any relation $>$ on $\mathbb{Z}^n_{\geq 0}$, or equivalently any relation on the set of monomials $x^\alpha$, $\alpha \in \mathbb{Z}^n_{\geq 0}$, satisfying the following conditions:
(a) $>$ is a total (or linear) ordering on $\mathbb{Z}^n_{\geq 0}$,
(b) if $\alpha > \beta$ and $\gamma \in \mathbb{Z}^n_{\geq 0}$, then $\alpha + \gamma > \beta + \gamma$,
(c) $>$ is a well-ordering on $\mathbb{Z}^n_{\geq 0}$. This condition is satisfied if and only if every strictly decreasing sequence in $\mathbb{Z}^n_{\geq 0}$ eventually terminates.

Definition D.2. (Lexicographic ordering): This is akin to the ordering of words used in dictionaries. If we define a polynomial ring in three variables as $P = k[x, y, z]$ and specify an ordering $x > y > z$, i.e., $x$ comes before $y$ and $y$ comes before $z$, then any term with $x$ will supersede that of $y$, which in turn supersedes that of $z$. If the powers of the variables for the respective monomials are given as $\alpha = (\alpha_1, \ldots, \alpha_n)$ and $\beta = (\beta_1, \ldots, \beta_n)$, $\alpha, \beta \in \mathbb{Z}^n_{\geq 0}$, then $\alpha >_{lex} \beta$ if, in the vector difference $\alpha - \beta \in \mathbb{Z}^n$, the left-most non-zero entry is positive. For the same variables this subsequently means $x^\alpha >_{lex} x^\beta$.
Example D.1. $x >_{lex} y^5z^9$ is an example of lexicographic ordering. As a second example, consider the polynomial $f = 2x^2y^8 - 3x^5yz^4 + xyz^3 - xy^4$; we have the lexicographic order $f = -3x^5yz^4 + 2x^2y^8 - xy^4 + xyz^3 \mid x > y > z$.
Definition D.3. (Graded lexicographic ordering): In this case, the total degree of the monomials is taken into account. First, one considers which monomial has the highest total degree before looking at the lexicographic ordering. This ordering looks at the left-most (or largest) variable of a monomial and favours the largest power. Let $\alpha, \beta \in \mathbb{Z}^n_{\geq 0}$; then $\alpha >_{grlex} \beta$ if
$$|\alpha| = \sum_{i=1}^n \alpha_i > |\beta| = \sum_{i=1}^n \beta_i, \quad \text{or} \quad |\alpha| = |\beta| \text{ and } \alpha >_{lex} \beta,$$
i.e., in $\alpha - \beta \in \mathbb{Z}^n$ the left-most non-zero entry is positive.

Example D.2. $x^8y^3z^2 >_{grlex} x^6y^2z^3 \mid (8, 3, 2) >_{grlex} (6, 2, 3)$, since $|(8, 3, 2)| = 13 > |(6, 2, 3)| = 11$ and $\alpha - \beta = (2, 1, -1)$. Since the left-most term of the difference (2) is positive, the ordering is graded lexicographic. As a second example, consider the polynomial $f = 2x^2y^8 - 3x^5yz^4 + xyz^3 - xy^4$; we have the graded lexicographic order $f = -3x^5yz^4 + 2x^2y^8 - xy^4 + xyz^3 \mid x > y > z$.
Definition D.4. (Graded reverse lexicographic ordering): In this case, the total degree of the monomials is taken into account as in the case of graded lexicographic ordering. First, one considers which monomial has the highest total degree before looking at the lexicographic ordering. In contrast to the graded lexicographic ordering, one looks at the right-most (or smallest) variable of a monomial and favours the smallest power. Let $\alpha, \beta \in \mathbb{Z}^n_{\geq 0}$; then $\alpha >_{grevlex} \beta$ if
$$|\alpha| = \sum_{i=1}^n \alpha_i > |\beta| = \sum_{i=1}^n \beta_i, \quad \text{or} \quad |\alpha| = |\beta| \text{ and, in } \alpha - \beta \in \mathbb{Z}^n, \text{ the right-most non-zero entry is negative.}$$

Example D.3. $x^8y^3z^2 >_{grevlex} x^6y^2z^3 \mid (8, 3, 2) >_{grevlex} (6, 2, 3)$, since $|(8, 3, 2)| = 13 > |(6, 2, 3)| = 11$ and $\alpha - \beta = (2, 1, -1)$. Since the right-most term of the difference ($-1$) is negative, the ordering is graded reverse lexicographic. As a second example,
consider the polynomial $f = 2x^2y^8 - 3x^5yz^4 + xyz^3 - xy^4$; we have the graded reverse lexicographic order $f = 2x^2y^8 - 3x^5yz^4 - xy^4 + xyz^3 \mid x > y > z$.

If we consider a non-zero polynomial $f = \sum_\alpha a_\alpha x^\alpha$ in $k[x_1, \ldots, x_n]$ and fix the monomial order, the following additional terms can be defined:

Definition D.5.
Multidegree of $f$: $\mathrm{multideg}(f) = \max(\alpha \in \mathbb{Z}^n_{\geq 0} \mid a_\alpha \neq 0)$
Leading coefficient of $f$: $\mathrm{LC}(f) = a_{\mathrm{multideg}(f)} \in k$
Leading monomial of $f$: $\mathrm{LM}(f) = x^{\mathrm{multideg}(f)}$ (with coefficient 1)
Leading term of $f$: $\mathrm{LT}(f) = \mathrm{LC}(f)\cdot\mathrm{LM}(f)$.

Example D.4. Consider the polynomial $f = 2x^2y^8 - 3x^5yz^4 + xyz^3 - xy^4$ with respect to the lexicographic order $\{x > y > z\}$; we have
$\mathrm{multideg}(f) = (5, 1, 4)$
$\mathrm{LC}(f) = -3$
$\mathrm{LM}(f) = x^5yz^4$
$\mathrm{LT}(f) = -3x^5yz^4$
The definitions of polynomial ordering above have been adopted from Cox et al. (1997), pp. 52-58.
D-2 Buchberger Algorithm
The Buchberger algorithm computes the Groebner basis using the computer algebra systems (CAS) Mathematica and Maple. With Groebner bases, most nonlinear equations that are encountered in geodesy and geoinformatics can be solved. All that is required of the user is to write algorithms that can easily run in Mathematica or Maple using the steps that are discussed in Sects. D-21 and D-22 below.
D-21 Mathematica Computation of Groebner Basis

Groebner bases can be computed using the algebraic software Mathematica (e.g., version 7 onwards). In Mathematica, the Groebner basis command is executed by writing

In[1] := GroebnerBasis[{polynomials}, {variables}],   (D.1)
where In[1]:= is the Mathematica prompt. The command computes the Groebner basis for the ideal generated by the polynomials with respect to the monomial order specified by the monomial order options.

Example D.5. (Mathematica computation of Groebner basis): In Example 15.3 on p. 536, the system of polynomial equations was given as
f₁ = xy − 2y, f₂ = 2y² − x².
The Groebner basis of this system would be computed by

In[1] := GroebnerBasis[{f1, f2}, {x, y}],   (D.2)
leading to the same values as in (15.10) on p. 536.
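Written out with the polynomials of Example D.5, the call reads as follows; the Out line shows the reduced lexicographic basis obtained by hand from Buchberger's algorithm, so it is indicative only (Mathematica's sign and ordering conventions may differ between versions):

In[1] := GroebnerBasis[{x y - 2 y, 2 y^2 - x^2}, {x, y}]
Out[1] = {y^3 - 2 y, x y - 2 y, x^2 - 2 y^2}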
With this approach, however, one obtains too many elements of the Groebner basis, which may not be relevant to the task at hand. In a case where the solution for a specific variable is desired, one can avoid computing the undesired variables, and alleviate the need for back-substitution, by simply computing the reduced Groebner basis. In this case (D.1) modifies to

In[1] := GroebnerBasis[{polynomials}, {variables}, {elims}],   (D.3)
where elims is the elimination order, i.e., the list of variables to be eliminated. Whereas the term reduced Groebner basis is widely used in the Groebner basis literature, we point out that within the Mathematica software the concept of a reduced basis has a different technical meaning. In this book, as in its predecessor, and to keep with tradition, we maintain the use of the term reduced Groebner basis.

Example D.6. (Mathematica computation of reduced Groebner basis): For the system of Example D.5, one would compute the reduced Groebner basis using (D.3) as

In[1] := GroebnerBasis[{f1, f2}, {x, y}, {y}],   (D.4)
which will return only −2x² + x³. Note that this form is correct only when the retained and the eliminated variables are disjoint. If they are not, there is absolutely no guarantee as to which category a variable will be put in! As an example, consider the solution of the three polynomials x² + y² + z² − 1, xy − z + 2, and z² − 2x + 3y. In the approach presented in (D.4), the solution would be

In[1] := GroebnerBasis[{x^2 + y^2 + z^2 - 1, x y - z + 2, z^2 - 2 x + 3 y}, {x, y, z}, {x, y}],   (D.5)

leading to
{1024 − 832z − 215z² + 156z³ − 25z⁴ + 24z⁵ + 13z⁶ + z⁸}.

Lichtblau (priv. comm.), however, suggests that the retained variable, in this case z, and the eliminated variables x, y be separated, i.e.,

In[1] := GroebnerBasis[{x^2 + y^2 + z^2 - 1, x y - z + 2, z^2 - 2 x + 3 y}, {z}, {x, y}].   (D.6)

The results of (D.6) are the same as those of (D.5), but with the advantage of a vastly better speed if one uses an ordering better suited to the task at hand, e.g., elimination of variables in this example. For the problems that are solved in our books, the condition that the retained and the eliminated variables be disjoint holds, and hence the approach in (D.5) has been adopted. The univariate polynomial −2x² + x³ is then solved for x using the roots command in Matlab (see, e.g., Hanselman and Littlefield (1997), p. 146) by

roots([1 -2 0 0]).   (D.7)
The values of the row vector in (D.7) are the coefficients of the cubic polynomial x³ − 2x² + 0x + 0 = 0 obtained from (D.4). The values of y from Example 15.3 on p. 536 can equally be computed from (D.4) by replacing y in the option part with x, thus removing the need for back-substitution. We leave it to the reader to compute the values of y from Example 15.3, and also those of z in Example 15.3, using the reduced Groebner basis (D.3) as an exercise. The reader should confirm that the solution for y leads to y³ − 2y, with the roots y = 0 or y = ±1.4142. From experience, we recommend the use of the reduced Groebner basis for applications in geodesy and geoinformatics. This will speed up the computations, save on computer space, and alleviate the need for back-substitution.
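As a partial check on this exercise, the following sketch eliminates x instead of y, in the style of (D.4), and then solves the resulting univariate polynomial numerically; the outputs shown are indicative only:

In[1] := GroebnerBasis[{x y - 2 y, 2 y^2 - x^2}, {x, y}, {x}]
Out[1] = {y^3 - 2 y}
In[2] := (* numerical roots of the univariate polynomial in y *)
         NSolve[y^3 - 2 y == 0, y]
Out[2] = {{y -> -1.41421}, {y -> 0.}, {y -> 1.41421}}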
D-22 Maple Computation of Groebner Basis

In Maple version 10 the command is accessed by typing

> with(Groebner);

where Basis(WL, T) computes the Groebner basis and gbasis(WL, T) the reduced Groebner basis, WL being a list or set of polynomials and T a monomial order description, e.g., plex(variables) for pure lexicographic order (> is the Maple prompt, and the semicolon ends the Maple command). Once the Groebner basis package has been loaded, the execution command then becomes

> gbasis(polynomials, variables);

which computes the Groebner basis for the ideal generated by the polynomials with respect to the monomial ordering specified by variables in the executable command. We should point out that, at the time of writing this book, May 2012, version 16 of Maple had been released.
D.3 Gauss Combinatorial Formulation
References
Aarts, E.H. and Korst, J. (1989): Simulated Annealing and Boltzmann Machines, Wiley, New York
Abbe, E. (1906): Über die Gesetzmäßigkeit in der Verteilung der Fehler bei Beobachtungsreihen, Gesammelte Abhandlungen, Vol. II, Jena 1863 (1906)
Abdous, B. and R. Theodorescu (1998): Mean, median, mode IV, Statistica Neerlandica 52 (1998), 356-359
Abel, J.S. and Chaffee, J.W. (1991): Existence and uniqueness of GPS solutions, IEEE Transactions on Aerospace and Electronic Systems 27 (1991) 952-956
I. Abhandlung, Monatsberichte der Akademie der Wissenschaften 1853, Werke 4 (1929) 1-11
Absil, P.A., Mahony, R., Sepulchre, R. and Van Dooren, P. (2002): A Grassmann-Rayleigh quotient iteration for computing invariant subspaces, SIAM Review 44 (2002), 57-73
Abramowitz, M. and I.A. Stegun (1965): Handbook of mathematical functions, Dover Publications, New York 1965
Adams, M. and V. Guillemin (1996): Measure theory and probability, 2nd edition, Birkhäuser-Verlag, Basel Boston Berlin 1996
Abdelmalek, N.N. (1974): On the discrete linear L1 approximation and L1 solutions of overdetermined linear equations, J. Approximation Theory 11 (1974), 38-53
Adler, R. and Pelzer, H. (1994): Development of a long term geodetic monitoring system of recent crustal activity along the Dead Sea Rift, Final scientific report for the German-Israel Foundation GIF
Aduol, F.W.O. (1989): Integrierte geodätische Netzanalyse mit stochastischer Vorinformation über Schwerefeld und Referenzellipsoid, DGK, Reihe C, Heft Nr. 351
Afifi, A.A. and V. Clark (1996): Computer-aided multivariate analysis, Chapman and Hall, Boca Raton 1996
Agnew, D.C. (1992): The time-domain behaviour of power-law noises, Geophysical Research Letters 19(4):333-336, 1992
Agostinelli, C. and M. Markatou (1998): A one-step robust estimator for regression based on the weighted likelihood reweighting scheme, Stat. & Prob. Letters 37 (1998), 341-350
Agrò, G. (1995): Maximum likelihood estimation for the exponential power function parameters, Comm. Statist. Simul. Comput. 24 (1995), 523-536
Aickin, M. and C. Ritenbaugh (1996): Analysis of multivariate reliability structures and the induced bias in linear model estimation, Statistics in Medicine 15 (1996), 1647-1661
Aitchison, J. and I.R. Dunsmore (1975): Statistical prediction analysis, Cambridge University Press, Cambridge 1975
Aitken, A.C. (1935): On least squares and linear combinations of observations, Proc. Roy. Soc. Edinburgh 55 (1935), 42-48
Airy, G.B. (1861): On the algebraical and numerical theory of errors of observations and the combination of observations, Macmillan Publ., London 1861
Aki, K. and Richards, P.G. (1980): Quantitative seismology, W.H. Freeman, San Francisco
Albert, A. (1969): Conditions for positive and nonnegative definiteness in terms of pseudo inverses, SIAM J. Appl. Math. 17 (1969), 434-440
Alberda, J.E. (1974): Planning and optimization of networks: some general considerations, Bollettino di Geodesia e Scienze Affini 33 (1974) 209-240
Albertella, A. and F. Sacerdote (1995): Spectral analysis of block averaged data in geopotential global model determination, Journal of Geodesy 70 (1995), 166-175
Alcala, J.T., Cristobal, J.A. and W. Gonzalez-Manteiga (1999): Goodness-of-fit test for linear models based on local polynomials, Statistics & Probability Letters 42 (1999), 39-46
Aldrich, J. (1999): Determinacy in the linear model: Gauss to Bose and Koopmans, International Statistical Review 67 (1999), 211-219
Alefeld, G. and J. Herzberger (1974): Einführung in die Intervallrechnung, Bibliographisches Institut, Mannheim 1974
Alefeld, G. and J. Herzberger (1983): Introduction to interval computation. Computer science and applied mathematics, Academic Press, New York London 1983
Ali, A.K.A., Lin, C.Y. and E.B. Burnside (1997): Detection of outliers in mixed model analysis, The Egyptian Statistical Journal 41 (1997), 182-194
Allende, S., Bouza, C. and I. Romero (1995): Fitting a linear regression model by combining least squares and least absolute value estimation, Questiio 19 (1995), 107-121
Allmer, F. (2001): Louis Krüger (1857-1923), 25 pages, Technical University of Graz, Graz 2001
Alzaid, A.A. and L. Benkherouf (1995): First-order integer-valued autoregressive process with Euler marginal distributions, J. Statist. Res. 29 (1995), 89-92
Amiri-Simkooei, A.R. (2004): A new method for second order design of geodetic networks: aiming at high reliability, Survey Review 37 (2004), pp. 552-560
Amiri-Simkooei, A.R. (2007): Least-squares estimation of variance components, theory and GPS applications, Ph.D. Thesis, Delft Univ. of Tech., Delft, The Netherlands 2007
An, H.-Z., F.J. Hickernell and L.-X. Zhu (1997): A new class of consistent estimators for stochastic linear regressive models, J. Multivar. Anal. 63 (1997), 242-258
Andel, J. (1984): Statistische Analyse von Zeitreihen, Akademie Verlag, Stuttgart 1984
Anderson, P.L. and M.M. Meerschaert (1997): Periodic moving averages of random variables with regularly varying tails, Ann. Statist. 25 (1997), 771-785
Anderson, R.D., Henderson, H.V., Pukelsheim, F. and Searle, S.R. (1984): Best estimation of variance components from balanced data, with arbitrary kurtosis, Mathematische Operationsforschung und Statistik, Series Statistics 15, pp. 163-176
Anderson, T.W. (1945): Non-central Wishart distribution and its application to statistics, doctoral dissertation, Princeton University
Anderson, T.W. (1946): The non-central Wishart distribution and certain problems of multivariate statistics, Ann. Math. Stat. 17, 409-431
Anderson, T.W. (1948): The asymptotic distributions of the roots of certain determinantal equations, J. Roy. Stat. Soc. B 10, 132-139
Anderson, T.W. (1951): Classification by multivariate analysis, Psychometrika 16, 31-50
Anderson, T.W. (1973): Asymptotically efficient estimation of covariance matrices with linear structure, The Annals of Statistics 1 (1973), 135-141
Anderson, T.W. (1984): An introduction to multivariate statistical analysis, 2nd ed., J. Wiley, New York 1984
Anderson, T.W. (2003): An introduction to multivariate statistical analysis, J. Wiley, Stanford CA, 2003
Anderson, T.W. and M.A. Stephens (1972): Tests for randomness of directions against equatorial and bimodal alternatives, Biometrika 59 (1972), 613-621
Anderson, W.N. and R.J. Duffin (1969): Series and parallel addition of matrices, J. Math. Anal. Appl. 26 (1969), 576-594
Ando, T. (1979): Generalized Schur complements, Linear Algebra Appl. 27 (1979), 173-186
Andrews, D.F. (1971): Transformations of multivariate data, Biometrics 27 (1971), 825-840
Andrews, D.F. (1974): Technometrics 16 (1974), 523-531
Andrews, D.F., Bickel, P.J. and F.R. Hampel (1972): Robust estimates of location, Princeton University Press, Princeton 1972
Angelier, J., Tarantola, A., Valette, B. and S. Manoussis (1982): Inversion of field data in fault tectonics to obtain the regional stress - I. Single phase fault populations: a new method of computing the stress tensor, Geophys. J. R. astr. Soc. 69, 607-621
Angulo, J.M. and Bueso, M.C. (2001): Random perturbation methods applied to multivariate spatial sampling design, Environmetrics 12, 631-646
Anh, V.V. and T. Chelliah (1999): Estimated generalized least squares for random coefficient regression models, Scandinavian J. Statist. 26 (1999), 31-46
Anido, C. and T. Valdés (2000): Censored regression models with double exponential error distributions: an iterative estimation procedure based on medians for correcting bias, Revista Matemática Complutense 13 (2000), 137-159
Anscombe, F.J. (1960): Rejection of outliers, Technometrics 2 (1960), pp. 123-147
Ansley, C.F. (1985): Quick proofs of some regression theorems via the QR algorithm, The American Statistician 39 (1985), 55-59
Anton, H. (1994): Elementary linear algebra, J. Wiley, New York 1994
Arcadis (1997): Ingenieurgeologischer Situationsbericht für 1997, Bericht des Pumpspeicherwerks Hohenwarthe, VEAG, 1997
Archambeau, C.B., Bradford, J.C., Broome, P.W., Dean, W.C. and Flinn, E.A. (1965): Data processing techniques for the detection and interpretation of teleseismic signals, Proc. IEEE 53 (1965), 1860-1865
Arellano-Valle, R., Bolfarine, H. and Lachos, V. (2005): Skew-normal linear mixed models, Journal of Data Science 3, pp. 415-438
Arnold, B.C. and R.M. Shavelle (1998): Joint confidence sets for the mean and variance of a normal distribution, J. Am. Statist. Ass. 52 (1998), 133-140
Arnold, B.F. and P. Stahlecker (1998): Prediction in linear regression combining crisp data and fuzzy prior information, Statistics & Decisions 16 (1998), 19-33
Arnold, B.F. and P. Stahlecker (1999): A note on the robustness of the generalized least squares estimator in linear regression, Allg. Statistisches Archiv 83 (1999), 224-229
Arnold, K.J. (1941): On spherical probability distributions, Ph.D. Thesis, M.I.T., Boston 1941
Arnold, S.F. (1981): The theory of linear models and multivariate analysis, J. Wiley, New York 1981
Arrowsmith, D.K. and C.M. Place (1995): Differential equations, maps and chaotic behavior, Chapman and Hall, London 1995
Arun, K.S. (1992): A unitarily constrained total least squares problem in signal processing, SIAM J. Matrix Anal. Appl. 13 (1992), 729-745
Ashkenazi, V. and Grist, S. (1982): Inter-comparison of 3-D geodetic network adjustment models, International symposium on geodetic networks and computations of the I.A.G., Munich, August 31 to September 5, 1981
Atiyah, M.F., Hitchin, N.J. and Singer, J.M. (1978): Self-duality in four-dimensional Riemannian geometry, Proc. Royal Soc. London A362 (1978) 425-461
Austen, G., Grafarend, E.W. and Reubelt, T. (2001): Analysis of the Earth's gravitational field from semi-continuous ephemerides of a low Earth orbiting GPS-tracked satellite of type CHAMP, GRACE or GOCE, in: Vistas for Geodesy in the New Millennium, eds. J. Adam and K.P. Schwarz, International Association of Geodesy Symposia 125, 309-315, 2001
Aven, T. (1993): Reliability and risk analysis, Chapman and Hall, Boca Raton 1993
Awange, J.L. (1999): Partial Procrustes solution of the three dimensional orientation problem from GPS/LPS observations, Quo vadis geodesia...? Festschrift to E.W. Grafarend on the occasion of his 60th birthday, Eds. F. Krumm and V.S. Schwarze, Report Nr. 1999.6-1
Awange, J.L. (2002): Gröbner bases, multipolynomial resultants and the Gauss-Jacobi combinatorial algorithms - adjustment of nonlinear GPS/LPS observations, Schriftenreihe der Institute des Studiengangs Geodäsie und Geoinformatik, Report 2002.1
Awange, J. and E.W. Grafarend (2002): Linearized least squares and nonlinear Gauss-Jacobi combinatorial algorithm applied to the 7-parameter datum transformation C7(3) problem, Zeitschrift für Vermessungswesen 127 (2002), 109-116
Awange, J.L. (2012): Environmental monitoring using GNSS. Global Navigation Satellite Systems, Springer, Berlin-New York
Azzalini, A. (1996): Statistical inference, Chapman and Hall, Boca Raton 1996
Azzam, A.M.H. (1996): Inference in linear models with non stochastic biased factors, Egyptian Statistical Journal, ISSR - Cairo University 40 (1996), 172-181
Azzam, A., Birkes, D. and J. Seely (1988): Admissibility in linear models with polyhedral covariance structure, Probability and Statistics, essays in honor of Franklin A. Graybill, J.N. Srivastava, Ed., Elsevier Science Publishers, B.V. (North-Holland), 1988
Baarda, W. (1967): Statistical concepts in geodesy, Netherlands Geodetic Commission, Publications on Geodesy, New Series, Vol. 2, No. 4, Delft
Baarda, W. (1967): A generalization of the concept strength of the figure, Publications on Geodesy, New Series 2, Delft 1967
Baarda, W. (1968): A testing procedure for use in geodetic networks, Netherlands Geodetic Commission, New Series, Delft, Netherlands, 2 (5), 1968
Baarda, W. (1973): S-transformations and criterion matrices, Netherlands Geodetic Commission, Vol. 5, No. 1, Delft 1973
Baarda, W. (1977): Optimization of design and computations of control networks, F. Halmos and J. Somogyi (eds.), Akademiai Kiado, Budapest 1977, 419-436
Babai, L. (1986): On Lovasz' lattice reduction and the nearest lattice point problem, Combinatorica 6 (1986), 1-13
Babu, G.J. and E.D. Feigelson (1996): Astrostatistics, Chapman and Hall, Boca Raton 1996
Bai, J. (1994): Least squares estimation of a shift in linear processes, Journal of Time Series Analysis 15 (1994), 453-472
Bognarova, M., Kubacek, L. and Volaufova, J. (1996): Comparison of MINQUE and LMVQUIE by simulation, Acta Univ. Palacki. Olomuc., Fac. Rerum Nat., Math. 35 (1996), 25-38
Bachelier, L. (1900): Théorie de la spéculation, Ann. Sci. Éc. Norm. Supér. 17 (1900) 21-86
Bähr, H.G. (1988): A quadratic approach to the non-linear treatment of non-redundant observations, Manuscripta Geodaetica 13 (1988) 191-197
Bähr, H.G. (1991): Einfach überbestimmtes ebenes Einschneiden, differentialgeometrisch analysiert, Zeitschrift für Vermessungswesen 116 (1991) 545-552
Bai, W.M., Vigny, C., Ricard, Y. and Froidevaux, C. (1992): On the origin of deviatoric stresses in the lithosphere, J. Geophys. Res. B97, 11728-11737
Bai, Z.D., Chan, X.R., Krishnaiah, P.R. and L.C. Zhao (1987): Asymptotic properties of the EVLP estimation for superimposed exponential signals in noise, Technical Report 87-19, CMA, University of Pittsburgh, Pittsburgh 1987
Bai, Z.D. and Y. Wu (1997): General M-estimation, J. Multivar. Anal. 63 (1997), 119-135
Bajaj, C.L. (1989): Geometric modelling with algebraic surfaces, in: The Mathematics of Surfaces III, Ed.: D.C. Handscomb, Clarendon Press, Oxford
Bajaj, C., Garrity, T. and Warren, J. (1988): On the applications of multi-equational resultants, Department of Computer Science, Purdue University, Technical Report CSD-TR-826, pp. 1-22, 1988
Baksalary, J.K. and Kala, R. (1981): Linear transformations preserving best linear unbiased estimators in a general Gauss-Markoff model, Ann. Stat. 9 (1981), 913-916
Baksalary, J.K. and A. Markiewicz (1988): Admissible linear estimators in the general Gauss-Markov model, J. Statist. Planning and Inference 19 (1988), 349-359
Baksalary, J.K. and Puntanen, S. (1989): Weighted least-squares estimation in the general Gauss-Markov model, in: Statistical Data Analysis and Inference, Pap. Int. Conf., Neuchatel/Switzerland (Y. Dodge, ed.), North-Holland, Amsterdam, pp. 355-368
Baksalary, J.K., Liski, E.P. and G. Trenkler (1989): Mean square error matrix improvements and admissibility of linear estimators, J. Statist. Planning and Inference 23 (1989), 313-325
Baksalary, J.K. and F. Pukelsheim (1989): Some properties of matrix partial orderings, Linear Algebra and its Appl. 119 (1989), pp. 57-85
Baksalary, J.K. and F. Pukelsheim (1991): On the Löwner, minus and star partial orderings of nonnegative definite matrices and their squares, Linear Algebra and its Applications 151 (1991), 135-141
Baksalary, J.K., Rao, C.R. and Markiewicz, A. (1992): A study of the influence of the "natural restrictions" on estimation problems in the singular Gauss-Markov model, J. Stat. Plann. Inference 31, 3 (1992), 335-351
Balakrishnan, N. and Basu, A.P. (eds.) (1995): The exponential distribution, Gordon and Breach Publishers, Amsterdam 1995
Balakrishnan, N. and R.A. Sandhu (1996): Best linear unbiased and maximum likelihood estimation for exponential distributions under general progressive type-II censored samples, Sankhya 58 (1996), 1-9
Bamler, R. and P. Hartl (1998): Synthetic aperture radar interferometry, Inverse Problems 14 (1998), R1-R54
Banachiewicz, T. (1937): Zur Berechnung der Determinanten, wie auch der Inversen, und zur darauf basierten Auflösung der Systeme linearer Gleichungen, Acta Astronom. Ser. C3 (1937), 41-67
Bancroft, S. (1985): An algebraic solution of the GPS equations, IEEE Transactions on Aerospace and Electronic Systems AES-21 (1985) 56-59
Bandemer, H. (2005): Mathematik und Ungewissheit: Drei Essais zu Problemen der Anwendung, Edition am Gutenbergplatz, Leipzig 2005
Bandemer, H. et al. (1976): Optimale Versuchsplanung, 2. Auflage, Akademie-Verlag, Berlin 1976
Bandemer, H. et al. (1977): Theorie und Anwendung der optimalen Versuchsplanung I, Akademie-Verlag, Berlin
Bandemer, H. and W. Näther (1980): Theorie und Anwendung der optimalen Versuchsplanung II, Akademie-Verlag, Berlin 1980
Bandemer, H. and S. Gottwald (1993): Einführung in die Fuzzy-Methoden, Akademie-Verlag, Berlin 1993
Bahndorf, J. (1991): Zur Systematisierung der Seilnetzberechnung und zur Optimierung von Seilnetzen, Deutsche Geodätische Kommission, Reihe C, Nr. 373, München
Bansal, N.K., Hamedani, G.G. and H. Zhang (1999): Non-linear regression with multidimensional indices, Statistics & Probability Letters 45 (1999), 175-186
Barankin, E.W. (1949): Locally best unbiased estimates, Ann. Math. Statist. 20 (1949), 477-501
Barber, B.C. (1993): The non-isotropic two-dimensional random walk, Waves in Random Media 3 (1993) 243-256
Barham, R.H. and W. Drane (1972): An algorithm for least squares estimation of nonlinear parameters when some of the parameters are linear, Technometrics 14 (1972), 757-766
Bartlett, M. (1933): On the theory of statistical regression, Proc. Roy. Soc. Edinb. 53, 260-283
Bartlett, M. (1937): Property of sufficiency and statistical tests, Proc. Roy. Soc. A 160, 268-282
Bartlett, M. (1947): Multivariate analysis, J. Roy. Stat. Soc. Suppl. 9, 176-197
Barlow, R.E., Clarotti, C.A. and F. Spizzichino (eds.) (1993): Reliability and decision making, Chapman and Hall, Boca Raton 1993
Barlow, R.E. and F. Proschan (1966): Tolerance and confidence limits for classes of distributions based on failure rate, Ann. Math. Statist. 37 (1966), 1593-1601
Barnard, J., McCulloch, R. and X.-L. Meng (2000): Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage, Statistica Sinica 10 (2000), 1281-1311
Barnard, M.M. (1935): The secular variations of skull parameters in four series of Egyptian skulls, Ann. Eugen. 6 (1935), 352-371
Barndorff-Nielsen, O. (1978): Information and exponential families in statistical theory, J. Wiley, New York, 238 pp.
Barndorff-Nielsen, O.E., Cox, D.R. and C. Klüppelberg (2001): Complex stochastic systems, Chapman and Hall, Boca Raton, Florida 2001
Barnett, V. and Lewis, T. (1994): Outliers in statistical data, John Wiley, New York 1994
Barnett, V. (1999): Comparative statistical inference, 3rd ed., J. Wiley, New York 1999
Barone, J. and A. Novikoff (1978): A history of the axiomatic formulation of probability from Borel to Kolmogorov, Part I, Archive for History of Exact Sciences 18 (1978), 123-190
Barrlund, A. (1998): Efficient solution of constrained least squares problems with Kronecker product structure, SIAM J. Matrix Anal. Appl. 19 (1998), 154-160
Barrodale, I. and D.D. Olesky (1981): Exponential approximation using Prony's method, in: The Numerical Solution of Nonlinear Problems, eds. Baker, C.T.H. and C. Phillips, 258-269
Bartelme, N. and Meissl, P. (1975): Ein einfaches, rasches und numerisch stabiles Verfahren zur Bestimmung des kürzesten Abstandes eines Punktes von einem sphäroidischen Rotationsellipsoid, Allgemeine Vermessungs-Nachrichten 82 (1975) 436-439
Bartlett, M.S. (1980): An introduction to stochastic processes with special reference to methods and applications, paperback edition, University Press, Cambridge 1980
Bartlett, M.S. and D.G. Kendall (1946): The statistical analysis of variance-heterogeneity and the logarithmic transformation, Queen's College Cambridge, Magdalen College Oxford, Cambridge/Oxford 1946
Bartoszynski, R. and M. Niewiadomska-Bugaj (1996): Probability and statistical inference, J. Wiley, New York 1996
Barut, A.O. and R.B. Haugen (1972): Theory of the conformally invariant mass, Annals of Physics 71 (1972), 519-541
Bassett Jr., G. and R. Koenker (1978): Asymptotic theory of least absolute error regression, J. Am. Statist. Ass. 73 (1978), 618-622
Basawa, I.V. (2001): Inference in stochastic processes, in: D.N. Shanbhag and C.R. Rao, Handbook of Statistics, pp. 55-77, Vol. 19, Elsevier Science 2001
Basch, A. (1913): Über eine Anwendung der graphostatischen Methode auf den Ausgleich von Beobachtungsergebnissen, ÖZfV 11, 11-18, 42-46
Baselga, S. (2007): Global optimization solution of robust estimation, J. Surv. Eng. 2007, 133, 123-128
Basu, S. and Mukhopadhyay, S. (2000): Binary response regression with normal scale mixture links, in: Dey, D.K., Ghosh, S.K. and Mallick, B.K., eds., Generalized linear models: A Bayesian perspective, Marcel Dekker, New York
Batchelor, G.K. (1946): Theory of axisymmetric turbulence, Proc. Royal Soc. A186 (1946) 480
Batchelor, G.K. and A.A. Townsend (1950): The nature of turbulent motion at large wave numbers, Fluid Mech. (1952) 238-255
Batchelor, G.K. (1953): The theory of homogeneous turbulence, Cambridge University Press, Cambridge 1953
Batchelor, G.K. (1967): The theory of homogeneous turbulence, Cambridge (1967)
Baselga, S. (2007): Critical limitation in use of t test for gross error detection, J. Surv. Eng. 2007, 133, 52-55
Bateman, H. (1910): The transformation of the electrodynamical equations, Proc. London Math. Soc. 8 (1910), 223-264, 469-488
Bateman, H. (1910): The transformation of coordinates which can be used to transform one physical problem into another, Proc. London Math. Soc. 8 (1910), 469-488
Bates, D.M. and D.G. Watts (1980): Relative curvature measures of nonlinearity (with discussion), J. Roy. Statist. Soc. B42 (1980), 1-25
Bates, D.M. and M.J. Lindstrom (1986): Nonlinear least squares with conditionally linear parameters, Proceedings of the Statistical Computation Section, American Statistical Association, Washington 1986
Bates, D.M. and D.G. Watts (1988a): Nonlinear regression analysis and its applications, J. Wiley, New York 1988
Bates, D.M. and D.G. Watts (1988b): Applied nonlinear regression, J. Wiley, New York 1988
Bates, R.A., Riccomagno, E., Schwabe, R. and H.P. Wynn (1998): Lattices and dual lattices in optimal experimental design for Fourier models, Computational Statistics & Data Analysis 28 (1998), 283-296
Batschelet, E. (1965): Statistical methods for the analysis of problems in animal orientation and certain biological rhythms, Amer. Inst. Biol. Sciences, Washington
Batschelet, E. (1971): Recent statistical methods for orientation (Animal Orientation Symposium 1970 on Wallops Island), Amer. Inst. Biol. Sciences, Washington, D.C., 1971
Batschelet, E. (1981): Circular statistics in biology, Academic Press, New York London 1981
Bauer, H. (1992): Maß- und Integrationstheorie, 2. Auflage, Walter de Gruyter, Berlin New York 1992
Bauer, H. (1996): Probability theory, de Gruyter Verlag, Berlin New York 1996
Bayen, F. (1976): Conformal invariance in physics, in: Cahen, C. and M. Flato (eds.), Differential geometry and relativity, Reidel Publ., 171-195, Dordrecht 1976
Beale, E.M. (1960): Confidence regions in non-linear estimation, J. Roy. Statist. Soc. B22 (1960), 41-89
Beaton, A.E. and J.W. Tukey (1974): Technometrics 16 (1974), 147-185
Becker, T. and Weispfenning, V. (1993): Gröbner bases. A computational approach to commutative algebra, Graduate Texts in Mathematics 141, Springer-Verlag, New York 1993
Becker, T., Weispfenning, V. and H. Kredel (1998): Gröbner bases: a computational approach to commutative algebra, corr. 2nd print., Springer-Verlag, Heidelberg Berlin New York 1998
Beckermann, B. and E.B. Saff (1999): The sensitivity of least squares polynomial approximation, Int. Series of Numerical Mathematics, Vol. 131: Applications and computation of orthogonal polynomials (eds. W. Gautschi, G.H. Golub, G. Opfer), 1-19, Birkhäuser-Verlag, Basel Boston Berlin 1999
Beckers, J., Harnad, J., Perroud, M. and P. Winternitz (1978): Tensor fields invariant under subgroups of the conformal group of space-time, J. Math. Phys. 19 (1978), 2126-2153
Beder, C. and Förstner, W. (2006): Direct solutions for computing cylinders from minimal sets of 3D points, European Conference on Computer Vision ECCV 2006, Graz, Austria 2006, S. 135-146
Behnken, D.W. and N.R. Draper (1972): Residuals and their variance, Technometrics 11 (1972), 101-111
Behrens, W.A. (1929): Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen, Landwirtschaftliche Jahrbücher 68 (1929), 807-837
Belikov, M.V. (1991): Spherical harmonic analysis and synthesis with the use of column-wise recurrence relations, Manuscripta Geodaetica 16 (1991), 384-410
Belikov, M.V. and K.A. Taybatorov (1992): An efficient algorithm for computing the Earth's gravitational potential and its derivatives at satellite altitudes, Manuscripta Geodaetica 17 (1992), 104-116
Ben-Menahem, A. and Singh, S.J. (1981): Seismic waves and sources, Springer-Verlag, New York 1981
Ben-Israel, A. and T. Greville (1974): Generalized inverses: Theory and applications, J. Wiley, New York 1974
Benbow, S.J. (1999): Solving generalized least-squares problems with LSQR, SIAM J. Matrix Anal. Appl. 21 (1999), 166-177
Benciolini, B. (1984): Continuous networks II, Erice Lecture Notes, this volume, Heidelberg
Benda, N. and R. Schwabe (1998): Designing experiments for adaptively fitted models, in: MODA 5 - Advances in model-oriented data analysis and experimental design, Proceedings of the 5th International Workshop in Marseilles, eds. Atkinson, A.C., Pronzato, L. and H.P. Wynn, Physica-Verlag, Heidelberg 1998
Bennett, B.M. (1951): Note on a solution of the generalized Behrens-Fisher problem, Ann. Inst. Stat. Math. 2, 87-90
Bennett, R.J. (1979): Spatial time series, Pion Limited, London 1979
Benning, W. (1974): Der kürzeste Abstand eines in rechtwinkligen Koordinaten gegebenen Außenpunktes vom Ellipsoid, Allgemeine Vermessungs-Nachrichten 81 (1974) 429-433
Benning, W. (1987): Iterative ellipsoidische Lotfußpunktberechnung, Allgemeine Vermessungs-Nachrichten 94 (1987) 256-260
Benning, W. and Ahrens, B. (1979): Konzept und Realisierung eines Systems zur automatischen Fehlerlokalisierung und automatischen Berechnung von Näherungskoordinaten, Nachrichten aus dem öffentlichen Vermessungsdienst Nordrhein-Westfalen, Heft 2, 107-124
Beran, R.J. (1968): Testing for uniformity on a compact homogeneous space, J. App. Prob. 5 (1968), 177-195
Beran, R.J. (1979): Exponential models for directional data, Ann. Statist. 7 (1979), 1162-1178
Beran, J. (1994): Statistical methods for long memory processes, Chapman and Hall, Boca Raton 1994
Berberan, A. (1992): Outlier detection and heterogeneous observations - a simulation case study, Australian Journal of Geodesy, Photogrammetry and Surveying 56 (1992), 49-61
Berberan, A. (1995): Multiple outlier detection - a real case study, Surv. Rev. 1995, 33, 41-49
Bergbauer, K., Lechner, W., Schuchert, M. and Walk, R. (1979): Anlage, Beobachtung und Ausgleichung eines dreidimensionalen Testnetzes, Diplomarbeit, Hochschule der Bundeswehr, München 1979
Berger, M.P.F. and F.E.S. Tan (1998): Optimal designs for repeated measures experiments, Kwantitatieve Methoden 59 (1998), 45-67
Berman, A. and R.J. Plemmons (1979): Nonnegative matrices in the mathematical sciences, Academic Press, New York London 1979
Bertsekas, D.P. (1996): Incremental least squares methods and the extended Kalman filter, SIAM J. Opt. 6 (1996), 807-822
Berry, J.C. (1994): Improving the James-Stein estimator using the Stein variance estimator, Statist. Probab. Lett. 20 (1994), 241-245
Bertuzzi, A., Gandolfi, A. and C. Sinisgalli (1998): Preference regions of ridge regression and OLS according to Pitman's criterion, Sankhya: The Indian Journal of Statistics 60 (1998), 437-447
Bessel, F.W. (1838): Untersuchungen über die Wahrscheinlichkeit der Beobachtungsfehler, Astronomische Nachrichten 15 (1838), 368-404
Betensky, R.A. (1997): Local estimation of smooth curves for longitudinal data, Statistics in Medicine 16 (1997), 2429-2445
Betti, B., Crespi, M. and Sanso, F. (1993): A geometric illustration of ambiguity resolution in GPS theory and a Bayesian approach, Manuscr. Geod. 18:317-330
Beylkin, G. and N. Saito (1993): Wavelets, their autocorrelation function and multidimensional representation of signals, in: Proceedings of SPIE - The International Society of Optical Engineering, Vol. LB26, Int. Soc. for Optical Engineering, Bellingham 1993
Bezdek, J.C. (1973): Fuzzy mathematics in pattern classification, Ph.D. Thesis, Cornell University, Ithaca, NY
Bhatia, R. (1996): Matrix analysis, Springer-Verlag, Heidelberg Berlin New York 1996
Bhattacharyya, D.P. and R.D. Narayan (1939): Moments of the D²-statistic for populations with unequal dispersions, Sankhya 5, 401-412
Bhattacharya, R.N. and R. Ranga Rao (1976): Normal approximation and asymptotic expansions, J. Wiley, New York 1976
Bhimasankaram, P. and Sengupta, D. (1996): The linear zero functions approach to linear models, Sankhya, Ser. B 58, 3 (1996), 338-351
Bhimasankaram, P. and SahaRay, R. (1997): On a partitioned linear model and some associated reduced models, Linear Algebra Appl. 264 (1997), 329-339
Bhimasankaram, P., Sengupta, D. and Ramanathan, S. (1995): Recursive inference in a general linear model, Sankhya, Ser. A 57, 2 (1995), 227-255
Biacs, Z.F., Krakiwsky, E.J. and Lapucha, D. (1990): Reliability analysis of phase observations in GPS baseline estimation, J. Surv. Eng. 1990, 116, 204-224
Biancale, R., Balmino, G., Lemoine, J.-M., Marty, J.-M., Moynot, B., Barlier, F., Exertier, P., Laurain, O., Gegout, P., Schwintzer, P., Reigber, Ch., Bode, A., König, R., Massmann, F.-H., Raimondo, J.C., Schmidt, R. and Zhu, S.Y. (2000): A new global Earth's gravity field model from satellite orbit perturbations: GRIM5-S1, Geophys. Res. Lett. 27, 3611-3614
Bibby, J. (1974): Minimum mean square error estimation, ridge regression, and some unanswered questions, Colloquia Mathematica Societatis Janos Bolyai, Progress in Statistics, ed. J. Gani, K. Sarkadi, I. Vincze, Vol. I, Budapest 1972, North Holland Publication Comp., Amsterdam 1974
Bickel, P.J., Doksum, K. and J.L. Hodges (1982): A Festschrift for Erich L. Lehmann, Chapman and Hall, Boca Raton 1982
Bickel, P.J. and K.A. Doksum (1981): An analysis of transformations revisited, J. Am. Statist. Ass. 76 (1981), 296-311
Bierman, G.J. (1977): Factorization methods for discrete sequential estimation, Academic Press, New York London 1977
Bill, R. (1985a): Kriteriummatrizen ebener geodätischer Netze, Ph.D. Thesis, Akademie der Wissenschaften, Deutsche Geodätische Kommission, München
Bill, R. (1985b): Kriteriummatrizen ebener geodätischer Netze, Deutsche Geodätische Kommission, München, Reihe A, No. 102
Bill, R. and Kaltenbach, H. (1986): Kriteriummatrizen ebener geodätischer Netze - zur Schätzung von Korrelationsfunktionen mittels Polynomen und Splines, Allg. Vermessungs-Nachrichten (AVN) 96 (1986) 148-156
Bilodeau, M. and D. Brenner (1999): Theory of multivariate statistics, Springer-Verlag, Heidelberg Berlin New York 1999
Bingham, C. (1964): Distributions on the sphere and projective plane, Ph.D. Thesis, Yale University 1964
Bingham, C. (1974): An antipodally symmetric distribution on the sphere, Ann. Statist. 2 (1974), 1201-1225
Bingham, C., Chang, T. and D. Richards (1992): Approximating the matrix Fisher and Bingham distributions: Applications to spherical regression and Procrustes analysis, Journal of Multivariate Analysis 41 (1992), 314-337
Bini, D. and V. Pan (1994): Polynomial and matrix computations, Vol. 1: Fundamental Algorithms, Birkhäuser-Verlag, Basel Boston Berlin 1994
Bill, R. (1984): Eine Strategie zur Ausgleichung und Analyse von Verdichtungsnetzen, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Report C295, 91 pages, München 1984
Birtill, J.W. and Whiteway, F.E. (1965): The application of phased arrays to the analysis of seismic body waves, Phil. Trans. Roy. Soc. London A 258 (1965), 421-493
Bischof, C.H. and G. Quintana-Orti (1998): Computing rank-revealing QR factorizations of dense matrices, ACM Transactions on Mathematical Software 24 (1998), 226-253
Bischoff, W. (1992): On exact D-optimal designs for regression models with correlated observations, Ann. Inst. Statist. Math. 44 (1992), 229-238
Bischoff, W., Heck, B., Howind, J. and Teusch, A. (2005): A procedure for testing the assumption of homoscedasticity in least squares residuals: a case study of GPS carrier-phase observations, Journal of Geodesy 78:397-404, 2005
Bischoff, W., Heck, B., Howind, J. and Teusch, A. (2006): A procedure for estimating the variance function of linear models and for checking the appropriateness of estimated variances: a case study of GPS carrier-phase observations, J. Geod. 79(12):694-704
Bradley, J.V. (1968): Distribution-free statistical tests, Englewood Cliffs, Prentice-Hall 1968
Briscoe, H.W. and Fleck, P.L. (1965): Data recording and processing for the experimental large aperture seismic array, Proc. IEEE 53 (1965), 1852-1859
Bjerhammar, A. (1951): Rectangular reciprocal matrices with special reference to geodetic calculations, Bull. Géodésique 20 (1951), 188-210
Bjerhammar, A. (1951): Application of calculus of matrices to the method of least-squares with special reference to geodetic calculations, Trans. RIT, No. 49, Stockholm 1951
Bjerhammar, A. (1955): En ny matrisalgebra, SLT 211-288, Stockholm 1955
Bjerhammar, A. (1958): A generalized matrix algebra, Trans. RIT, No. 124, Stockholm 1958
Bjerhammar, A. (1969): Linear prediction and filtering, Royal Inst. Techn. Div. Geodesy, Stockholm 1969
Bjerhammar, A. (1973): Theory of errors and generalized matrix inverses, Elsevier, Amsterdam 1973
Björck, A. (1967): Solving linear least squares problems by Gram-Schmidt orthogonalization, Nordisk Tidskr. Informationsbehandling (BIT) 7 (1967), 1-21
Björck, A. (1990): Least squares methods, in: Handbook of Numerical Analysis, Eds.: P.G. Ciarlet, J.L. Lions, North Holland, Amsterdam
Björck, A. (1996): Numerical methods for least squares problems, SIAM, Philadelphia 1996
Björck, A. and G.H. Golub (1973): Numerical methods for computing angles between linear subspaces, Mathematics of Computation 27 (1973), 579-594
Björkström, A. and R. Sundberg (1999): A generalized view on continuum regression, Scandinavian Journal of Statistics 26 (1999), 17-30
Blackman, R.B. (1965): Linear data - smoothing and prediction in theory and practice, Reading 1965
Blaha, G. (1987): Nonlinear parametric least-squares adjustment, AFGL Technical Report 87-0204, Air Force Geophysics Laboratory, Bedford, Mass.
Blaha, G. (1991): Non-iterative geometric approach to nonlinear parametric least-squares adjustment with or without constraints, Report No. PL-TR-91-2136, Phillips Laboratory, Air Force Systems Command, Hanscom Air Force Base, Massachusetts
Blaha, G. and Besette, R.P. (1989): Nonlinear least squares method via an isomorphic geometrical setup, Bulletin Geodesique 63 (1989) 115-138
Blaker, H. (1999): Shrinkage and orthogonal decomposition, Scandinavian Journal of Statistics 26 (1999), 1-15
Blaschke, W. and Leichtweiss, K. (1973): Elementare Differentialgeometrie, Springer-Verlag, Berlin Heidelberg
Blewitt, G. (2000): Geodetic network optimization for geophysical parameters, Geophysical Research Letters 27 (2000), 2615-2618
Bloomfield, P. and W.L. Steiger (1983): Least absolute deviations - theory, applications and algorithms, Birkhäuser-Verlag, Basel Boston Berlin 1983
Bobrow, J.E. (1989): A direct minimization approach for obtaining the distance between convex polyhedra, Int. J. Robotics Research 8 (1989), 65-76
Boedecker, G. (1977): The development of an observation scheme for the variance-covariance matrix of the unknowns, Proceedings International Symposium on Optimization of Design and Computation of Control Networks, Sopron 1977
Boggs, P.T., Byrd, R.H. and R.B. Schnabel (1987): A stable and efficient algorithm for nonlinear orthogonal distance regression, SIAM J. Sci. Stat. Comput. 8 (1987), 1052-1078
Boik, R.J. (1986): Testing the rank of a matrix with applications to the analysis of interaction in ANOVA, J. Am. Stat. Assoc. 81 (1986), 243-248
Bollerslev, T. (1986): Generalized autoregressive conditional heteroskedasticity, J. Econometrics 31 (1986), 307-327
Boogaart, K.G. van den and A. Brenning (2001): Why is universal kriging better than IRFk-kriging: estimation of variograms in the presence of trend, Proceedings of 2001 Annual Conference of the International Association for Mathematical Geology, September 6-12, 2001, Cancun, Mexico
Boogaart, K.G. van den and H. Schaeben (2002a): Kriging of regionalized directions, axes, and orientations I. Directions and axes, Mathematical Geology, to appear
Boogaart, K.G. van den and H. Schaeben (2002b): Kriging of normalised directions and axes, Math. Geology 34 (2002) 479
Boon, F. and Ambrosius, B. (1997): Results of real-time applications of the LAMBDA method in GPS based aircraft landings, in: Proceedings KIS97, pp. 339-345
Boon, F., de Jonge, P.J. and Tiberius, C.C.J.M. (1997): Precise aircraft positioning by fast ambiguity resolution using improved troposphere modelling, in: Proceedings ION GPS-97, Vol. 2, pp. 1877-1884
Booth, J.G. and J.P. Hobert (1996): Standard errors of prediction in generalized linear mixed models, J. Am. Statist. Ass. 93 (1996), 262-272
Bopp, H. and Krauss, H. (1978): Strenge oder herkömmliche bedingte Ausgleichung mit Unbekannten bei nichtlinearen Bedingungsgleichungen?, Allgemeine Vermessungsnachrichten, 85. Jg., Heft 1, 27-31
Bordes, L., Nikulin, M. and V. Voinov (1997): Unbiased estimation for a multivariate exponential whose components have a common shift, J. Multivar. Anal. 63 (1997), 199-221
Borg, I. and P. Groenen (1997): Modern multidimensional scaling, Springer Verlag, Berlin New York 1997
Borkowski, K.M. (1987): Transformation of geocentric to geodetic coordinates without approximation, Astrophys. Space Sci. 139 (1987) 1-4
Borkowski, K.M. (1989): Accurate algorithm to transform geocentric to geodetic coordinates, Bull. Geod. 63 (1989) 50-56
Borovkov, A.A. (1998): Mathematical statistics, Gordon and Breach Science Publishers, Amsterdam 1998
Borre, K. (1977): Geodetic elasticity theory, Bulletin Geodesique, 63-71
Borre, K. (1978): Error propagation in absolute geodetic networks - a continuous approach, Studia Geophysica et Geodaetica, 213-223
Borre, K. (1979a): Covariance functions for random error propagation in large geodetic networks, Scientific Bulletin of the Stanislaw Staszic University of Mining and Metallurgy, Krakow, 23-33
Borre, K. (1979b): On discrete networks versus continua, 2. Int. Symp. weitgespannte Flächentragwerke, Berichtsheft 2, p. 79-85, Stuttgart
Borre, K. (1980): Elasticity and stress-strain relations in geodetic networks, in: Proceedings of the International School of Advanced Geodesy, ed. E. Grafarend and A. Marussi, p. 335-374, Florence
Borre, K. (2001): Plane networks and their applications, Birkhäuser-Verlag, Basel Boston Berlin 2001
Borre, K. and Meissl, P. (1974): Strength analysis of leveling type networks, an application of random walk theory, Geodaetisk Institut Meddelelse No. 50, Copenhagen
Borre, K. and T. Krarup (1974): Foundation of a theory of elasticity for geodetic networks, 7. nordiske geodætmøde i København
Borre, K. and S.L. Lauritzen (1983): Some geometric aspects of adjustment, in: Festschrift to Torben Krarup, Eds. E. Kejlso et al., Geod. Inst., No. 58, pp. 70-90
Bos, M.S., Fernandes, R.M., Williams, S.D.P. and L. Bastos (2008): Fast error analysis of continuous GPS observations, J. Geodesy 82 (2008) 157-166
Bose, R.C. (1944): The fundamental theorem of linear estimation, Proc. 31st Indian Scientific Congress (1944), 2-3
Bose, R.C. (1947): The design of experiments, Presidential Address to Statistics Section, 34th session of the Indian Science Congress Assoc. 1947
Bossler, J. (1973): A note on the meaning of generalized inverse solutions in geodesy, J. Geophys. Res. 78 (1973), 2616
Bossler, J., Grafarend, E. and R. Kelm (1973): Optimal design of geodetic nets II, J. Geophys. Res. 78 (1973), 5887-5897
Boulware, D.G., Brown, L.S. and R.D. Peccei (1970): Deep inelastic electroproduction and conformal symmetry, Physical Review D2 (1970), 293-298
Bowring, B.R. (1976): Transformation from spatial to geographical coordinates, Survey Review 23 (1976) 323-327
Bowring, B.R. (1985): The accuracy of geodetic latitude and height equations, Survey Review 28 (1985) 202-206
Box, G.E.P. and D.R. Cox (1964): An analysis of transformations, J. Roy. Statist. Soc. B26 (1964), 211-252
Box, G.E.P. and G. Tiao (1973): Bayesian inference in statistical analysis, Addison-Wesley, Reading 1973
Box, G.E.P. and N.R. Draper (1987): Empirical model-building and response surfaces, J. Wiley, New York 1987
Box, M.J. (1971): Bias in nonlinear estimation, J. Roy. Statist. Soc. B33 (1971), 171-201
Branco, M.D. (2001): A general class of multivariate skew-elliptical distributions, J. Multivar. Anal. 79 (2001), 99-113
Brandt, S. (1992): Datenanalyse. Mit statistischen Methoden und Computerprogrammen, 3. Aufl., BI Wissenschaftsverlag, Mannheim 1992
Brandt, S. (1999): Data analysis: statistical and computational methods for scientists and engineers, 3rd ed., Springer-Verlag, Heidelberg Berlin New York 1999
Brandstätter, G. (1974): Notiz zur analytischen Lösung des ebenen Rückwärtsschnittes, Österreichische Zeitschrift für Vermessungswesen 61 (1974) 134-136
Braess, D. (1986): Nonlinear approximation theory, Springer-Verlag, Heidelberg Berlin New York 1986
Branham, R.L. (1990): Scientific data analysis. An introduction to overdetermined systems, Springer-Verlag, New York
Brauner, H. (1981): Differentialgeometrie, Vieweg, Wiesbaden
Breckling, J. (1989): Analysis of directional time series: application to wind speed and direction, Springer-Verlag, Heidelberg Berlin New York 1989
Brémaud, P. (1999): Markov chains, Gibbs fields, Monte Carlo simulation and queues, Springer-Verlag, New York 1999
Breslow, N.E. and D.G. Clayton (1993): Approximate inference in generalized linear mixed models, J. Am. Statist. Ass. 88 (1993), 9-25
Brill, M. and E. Schock (1987): Iterative solution of ill-posed problems - a survey, in: Model optimization in exploration geophysics, ed. A. Vogel, Vieweg, Braunschweig 1987
Brillinger, D.R. (1964): An introduction to polyspectra, Economic Research Program Memorandum No. 67, Princeton 1964
Bro, R. and S. de Jong (1997): A fast non-negativity-constrained least squares algorithm, Journal of Chemometrics 11 (1997), 393-401
Brock, J.E. (1968): Optimal matrices describing linear systems, AIAA J. 6 (1968), 1292-1296
Brockwell, P. and Davis, R. (2003): Time series: theory and methods, Springer, New York
Brokken, F.B. (1983): Orthogonal Procrustes rotation maximizing congruence, Psychometrika 48 (1983) 343-352
Brovelli, M.A., Sanso, F. and G. Venuti (2003): A discussion on the Wiener-Kolmogorov prediction principle with easy-to-compute and robust variants, Journal of Geodesy 76 (2003), 673-683
Brown, B. and R. Mariano (1989): Measures of deterministic prediction bias in nonlinear models, Int. Econ. Rev. 30 (1989), 667-684
Brown, B.M., P. Hall and G.A. Young (1997): On the effect of inliers on the spatial median, J. Multivar. Anal. 63 (1997), 88-104
Brown, H. and R. Prescott (1999): Applied mixed models in medicine, J. Wiley, New York 1999
Brown, K.G. (1976): Asymptotic behavior of MINQUE-type estimators of variance components, The Annals of Statistics 4 (1976), 746-754
Brown, P.J., Le, N.D. and J.V. Zidek (1994): Multivariate spatial interpolation and exposure to air pollutants, Can. J. Stat. 2:489-509
Brown, W.S. (1971): On Euclid's algorithm and the computation of polynomial greatest common divisors, Journal of the ACM 18(4), 478-504
Brualdi, R.A. and H. Schneider (1983): Determinantal identities: Gauss, Schur, Cauchy, Sylvester, Kronecker, Jacobi, Binet, Laplace, Muir and Cayley, Linear Algebra Appl. 52/53 (1983), 765-791
Brunk, H.D. (1958): On the estimation of parameters restricted by inequalities, Ann. Math. Statist. 29 (1958), 437-454
Brunner, F.K. (1979): On the analysis of geodetic networks for the determination of the incremental strain tensor, Survey Review 25 (1979) 146-162
Brunner, F.K. (1999): Zur Präzision von GPS-Messungen, Geowiss. Mitteilungen, TU Wien, Heft 50:1-10, 1999
Brunner, F.K., Hartinger, H. and L. Troyer (1999): GPS signal diffraction modelling: the stochastic sigma-model, Journal of Geodesy 73 (1999), 259-267
Brunner, F.K. (2009): Raum-Zeit-Analyse von GPS-Messungen mittels Turbulenztheorie, First Colloquium E. Grafarend, Stuttgart 2009
Bruno, A.D. (2000): Power geometry in algebraic and differential equations, Elsevier, Amsterdam Lausanne New York Oxford Shannon Singapore Tokyo 2000
Brus, D.J. and de Gruijter, J.J. (1997): Random sampling or geostatistical modeling? Choosing between design-based and model-based sampling strategies for soil (with discussion), Geoderma 80:1-44
Brus, D.J. and Heuvelink, G.B.M. (2007): Optimization of sample patterns for universal kriging of environmental variables, Geoderma 138: 86-95
Brzeźniak, Z. and T. Zastawniak (1999): Basic stochastic processes, Springer-Verlag, Heidelberg Berlin New York 1999
Buchberger, B. (1965a): Ein Algorithmus zum Auffinden der Basiselemente des Restklassenringes nach einem nulldimensionalen Polynomideal, Dissertation, Universität Innsbruck
Buchberger, B. (1965b): An algorithm for finding a basis for the residue class ring of a zero dimensional polynomial ideal (German), Ph.D. thesis, University of Innsbruck, Institute of Mathematics
Buchberger, B. (1970): Ein algorithmisches Kriterium für die Lösbarkeit eines algebraischen Gleichungssystems, Aequationes Mathematicae 4 (1970) 374-383
Buchberger, B. (1979): A criterion for detecting unnecessary reductions in the construction of Groebner bases, Proceedings of the 1979 European Symposium on Symbolic and Algebraic Computation, Springer Lecture Notes in Computer Science 72, Springer-Verlag, pp. 3-21, Berlin-Heidelberg-New York 1979
Buchberger, B. (1985): Groebner bases: An algorithmic method in polynomial ideal theory, in: Multidimensional Systems Theory, Ed.: N.K. Bose, D. Reidel Publishing Company, Dordrecht-Boston-Lancaster
Buchberger, B. and Kutzler, B. (1986): Computer-Algebra für den Ingenieur, in: Mathematische Methoden in der Technik, Hrsg.: J. Lehn, H. Neunzert, H. Wacker, Band 4, Rechnerorientierte Verfahren, B.G. Teubner, Stuttgart
Buchberger, B. (1999): Personal communication
Buchberger, B. and Winkler, F. (1998): Groebner bases and applications, London Mathematical Society Lecture Note Series 251, Cambridge University Press, 1998
Buell, C.E. (1972): Correlation functions for wind and geopotential on isobaric surfaces, Journal of Applied Meteorology, 1972
Bueso, M.C., Angulo, J.M., Qian, G. and Alonso, F.J. (1999): Spatial sampling design based on stochastic complexity, J. Multivar. Anal. 71: 94-110
Buhmann, M.D. (2001): Approximation and interpolation with radial functions, in: Multivariate Approximation and Applications, Cambridge University Press, Cambridge 2001, 25-43
Bunday, B.D., S.M.H. Bokhari and K.H. Khan (1997): A new algorithm for the normal distribution function, Sociedad de Estadistica e Investigacion Operativa 6 (1997), 369-377
Bunke, H. and O. Bunke (1974): Identifiability and estimability, Math. Operationsforschg. Statist. 5 (1974), 223-233
Bunke, H. and O. Bunke (1986): Statistical inference in linear models, J. Wiley, New York 1986
Bunke, H. and O. Bunke (1989): Nonlinear regression, functional relations and robust methods, Vol. 2, Wiley 1989
Bunke, O. (1977): Mixed models, empirical Bayes and Stein estimators, Math. Operationsforschg. Ser. Statistics 8 (1977), 55-68
Buonaccorsi, J., Demidenko, E. and T. Tosteson (2000): Estimation in longitudinal random effects models with measurement error, Statistica Sinica 10 (2000), 885-903
Burg, J.P. (1964): Three-dimensional filtering with an array of seismometers, Geophysics 29 (1964), 693-713
Burns, F., Carlson, D., Haynsworth, E. and T. Markham (1974): Generalized inverse formulas using the Schur complement, SIAM J. Appl. Math. 26 (1974), 254-259
Businger, P. and G.H. Golub (1965): Linear least squares solutions by Householder transformations, Numer. Math. 7 (1965), 269-276
Butler, N.A. (1999): The efficiency of ordinary least squares in designed experiments subject to spatial or temporal variation, Statistics & Probability Letters 41 (1999), 73-81
Caboara, M. and E. Riccomagno (1998): An algebraic computational approach to the identifiability of Fourier models, J. Symbolic Computation 26 (1998), 245-260
Cadet, A. (1996): Polar coordinates in Rnp, application to the computation of the Wishart and Beta laws, Sankhya: The Indian Journal of Statistics 58 (1996), 101-114
Cai, J. (2004): Statistical inference of the eigenspace components of a symmetric random deformation tensor, Dissertation, Deutsche Geodätische Kommission (DGK) Reihe C, Heft Nr. 577, 138 pages, München 2004
Cai, J., Grafarend, E. and Schaffrin, B. (2004): The A-optimal regularization parameter in uniform Tykhonov-Phillips regularization - α-weighted BLE, IAG Symposia 127 "Hotine-Marussi Symposium on Mathematical Geodesy", Matera, Italy, 17-21 June 2002, edited by F. Sanso, Springer, 309-324
Cai, J., Grafarend, E. and Schaffrin, B. (2005): Statistical inference of the eigenspace components of a two-dimensional, symmetric rank two random tensor, J. of Geodesy, Vol. 78, 426-437
Cai, J. and Grafarend, E. (2007): Statistical analysis of the eigenspace components of the two-dimensional, symmetric rank-two strain rate tensor derived from the space geodetic measurements (ITRF92-ITRF2000 data sets) in central Mediterranean and Western Europe, Geophysical Journal International, Vol. 168, 449-472
Cai, J., Koivula, H. and Poutanen, M. (2008): The statistical analysis of the eigenspace components of the strain rate tensor derived from FinnRef GPS measurements (1997-2004), IAG Symposia 132 "VI Hotine-Marussi Symposium on Theoretical and Computational Geodesy", Wuhan, China, 29 May-2 June 2006, eds. P. Xu, J. Liu and A. Dermanis, pp. 79-87, Springer, Berlin, Heidelberg
Cai, J., Hu, C., Grafarend, E. and Wang, J. (2008): The uniform Tykhonov-Phillips regularization (α-weighted S-homBLE) and its application in GPS rapid static positioning in Fennoscandia, IAG Symposia 132 "VI Hotine-Marussi Symposium on Theoretical and Computational Geodesy", Wuhan, China, 29 May-2 June 2006, eds. P. Xu, J. Liu and A. Dermanis, pp. 221-229, Springer, Berlin, Heidelberg
Cai, J., Wang, J., Wu, J., Hu, C., Grafarend, E. and Chen, J. (2008): Horizontal deformation rate analysis based on multi-epoch GPS measurements in Shanghai, Journal of Surveying Engineering, Vol. 134(4), 132-137
Cai, J., Grafarend, E. and Hu, C. (2009): The total optimal criterion in solving the mixed integer linear model with GNSS carrier phase observations, GPS Solutions, Vol. 13, 221-230, doi: 10.1007/s10291-008-0115-y
Calder, C. and Cressie, N. (2007): Some topics in convolution-based modeling, in: Proceedings of the 56th Session of the International Statistics Institute, Lisbon
Cambanis, S. and I. Fakhre-Zakeri (1996): Forward and reversed time prediction of autoregressive sequences, J. Appl. Prob. 33 (1996), 1053-1060
Campbell, H.G. (1977): An introduction to matrices, vectors and linear programming, 2nd ed., Prentice Hall, Englewood Cliffs 1977
Candy, J.V. (1988): Signal processing, McGraw-Hill, New York 1988
Canny, J.F. (1988): The complexity of robot motion planning, ACM Doctoral Dissertation Award, MIT Press 1988
Canny, J.F., Kaltofen, E. and Yagati, L. (1989): Solving systems of nonlinear polynomial equations faster, Proceedings of the International Symposium on Symbolic and Algebraic Computation ISSAC, July 17-19, Portland, Oregon 1989, pp. 121-128
Carlin, B.P. and T.A. Louis (1996): Bayes and empirical Bayes methods, Chapman and Hall, Boca Raton 1996
Carlitz, L. (1963): The inverse of the error function, Pacific J. Math. 13 (1963), 459-470
Carlson, D., Haynsworth, E. and T. Markham (1974): A generalization of the Schur complement by means of the Moore-Penrose inverse, SIAM J. Appl. Math. 26 (1974), 169-179
Carlson, D. (1986): What are Schur complements, anyway?, Linear Algebra and its Applications 74 (1986), 257-275
Carmo, M. do (1983): Differentialgeometrie von Kurven und Flächen, Vieweg-Studium, Braunschweig 1983
Carroll, J.D., Green, P.E. and A. Chaturvedi (1999): Mathematical tools for applied multivariate analysis, Academic Press, San Diego 1999
Carroll, J.D. and P.E. Green (1997): Mathematical tools for applied multivariate analysis, Academic Press, San Diego 1997
Carroll, R.J. and D. Ruppert (1982a): A comparison between maximum likelihood and generalized least squares in a heteroscedastic linear model, J. Am. Statist. Ass. 77 (1982), 878-882
Carroll, R.J. and D. Ruppert (1982b): Robust estimation in heteroscedastic linear models, Ann. Statist. 10 (1982), 429-441
Carroll, R.J., Ruppert, D. and L. Stefanski (1995): Measurement error in nonlinear models, Chapman and Hall, Boca Raton 1995
Carruthers, P. (1971): Broken scale invariance in particle physics, Phys. Lett. Rep. 1 (1971), 1-30
Caspary, W. (1987): Concepts of network and deformation analysis, Monograph 11, The University of New South Wales, School of Surveying, N.S.W., Australia 1987
Caspary, W. and Borutta, H. (1986): Geometrische Deformationsanalyse mit robusten Schätzverfahren, Allgemeine Vermessungsnachrichten 8-9/1986, S. 315-326
Caspary, W., Lohse, P. and van Mierlo, J. (1991): SSG 4.120: Non-linear adjustment, in: National Report of the Federal Republic of Germany on the Geodetic Activities in the Years 1987-1991, eds. E.W. Grafarend and K. Schnädelbach, Deutsche Geodätische Kommission, Reihe B, Nr. 294
Caspary, W. and K. Wichmann (1994): Lineare Modelle. Algebraische Grundlagen und statistische Anwendungen, Oldenbourg Verlag, München/Wien 1994
Castillo, J. (1994): The singly truncated normal distribution, a non-steep exponential family, Ann. Inst. Statist. Math. 46 (1994), 57-66
Castillo, J. and P. Puig (1997): Testing departures from gamma, Rayleigh and truncated normal distributions, Ann. Inst. Statist. Math. 49 (1997), 255-269
Cattani, E., Dickenstein, A. and Sturmfels, B. (1998): Residues and resultants, J. Math. Sci. University of Tokyo 5 (1998), 119-148
Cayley, A. (1843): On the application of quaternions to the theory of rotation, Phil. Mag. 33 (1848), pp. 196-200
Cayley, A. (1845): On Jacobi's elliptic functions, in reply to the Rev. B. Bronwin, and on quaternions, Phil. Mag. 26 (1845), pp. 208-211
Cayley, A. (1855): Sept différents mémoires d'analyse, No. 3, Remarque sur la notation des fonctions algébriques, Journal für die reine und angewandte Mathematik 50 (1855), 282-285
Cayley, A. (1857): A memoir on the theory of matrices, Phil. Trans. Roy. Soc. London 148 (1857), pp. 17-37 (collected works, vol. 2, 475-496)
Cayley, A. (1858): A memoir on the theory of matrices, Phil. Transactions, Royal Society of London 148 (1858), 17-37
Cayley, A. (1863): On Jacobi's elliptic functions, in reply to the Rev. B. Bronwin, and on quaternions (appendix only), in: The Collected Mathematical Papers, Johnson Reprint Co., p. 127, New York 1963
Cayley, A. (1885): On the quaternion equation qQ - Qq' = 0, Messenger of Mathematics 14 (1885), pp. 108-112
Cenkov, N.N. (1972): Statistical decision rules and optimal inference, Nauka, Moscow 1972
Champagne, F.H. (1978): The fine scale structure of the turbulent velocity field, J. Fluid Mech. 86 (1978), 67-108
Chaffee, J. and Abel, J. (1994): On the exact solutions of the pseudorange equations, IEEE Transactions on Aerospace and Electronic Systems 30 (1994), 1021-1030
Chan, K.-S. and H. Tong (2001): Chaos, a statistical perspective, Springer-Verlag, Heidelberg Berlin New York 2001
Chan, K.-S. and H. Tong (2002): A note on the equivalence of two approaches for specifying a Markov process, Bernoulli 8 (2002), 117-122
Chan, T.F. and P.C. Hansen (1991): Some applications of the rank revealing QR factorizations, Numer. Linear Algebra Appl. 1 (1991), 33-44
Chan, T.F. and P.C. Hansen (1992): Some applications of the rank revealing QR factorization, SIAM J. Sci. Statist. Comput. 13 (1992), 727-741
Chandrasekar, S. (1950): The theory of axisymmetric turbulence, Proc. Royal Soc. A242 (1950), 557
Chandrasekar, S. and I.C.F. Ipsen (1995): Analysis of a QR algorithm for computing singular values, SIAM J. Matrix Anal. Appl. 16 (1995), 520-535
Chandrasekar, S., Gu, M. and A.H. Sayed (1998): A stable and efficient algorithm for the indefinite linear least-squares problem, SIAM J. Matrix Anal. Appl. 20 (1998), 354-362
Chandrasekar, S., Golub, G.H., Gu, M. and A.H. Sayed (1998): Parameter estimation in the presence of bounded data uncertainties, SIAM J. Matrix Anal. Appl. 19 (1998), 235-252
Chang, F.-C. and Y.-R. Yeh (1998): Exact A-optimal designs for quadratic regression, Statistica Sinica 8 (1998), 527-533
Chang, H., Fu, A.Q., Le, N.D. and Zidek, J.V. (2005): Designing environmental monitoring networks for measuring extremes, http://www.samsi.info/TR/tr2005-04.pdf
Chang, T. (1986): Spherical regression, Annals of Statistics 14 (1986), 907-924
Chang, T. (1988): Estimating the relative rotations of two tectonic plates from boundary crossings, J. Am. Statist. Ass. 83 (1988), 1178-1183
Chang, T. (1993): Spherical regression and the statistics of tectonic plate reconstructions, International Statist. Rev. 61 (1993), 299-316
Chang, X.W., Yang, X. and Zhou, T. (2005): MLAMBDA: a modified LAMBDA method for integer ambiguity determination, Proceedings ION GNSS 2005, Long Beach 2005
Chapman, D.G. and H. Robbins (1951): Minimum variance estimation without regularity assumptions, Ann. Math. Statist. 22 (1951), 581-586
Char, B.W. et al. (1990): MAPLE Reference Manual, 5th edition, Waterloo Maple Publishing 1990
Chartres, B.A. (1963): A geometrical proof of a theorem due to Slepian, SIAM Review 5 (1963), 335-341
Chatfield, C. and A.J. Collins (1981): Introduction to multivariate analysis, Chapman and Hall, Boca Raton 1981
Chatterjee, S. and A.S. Hadi (1986): Influential observations, high leverage points, and outliers in linear regression, Stat. Sci. 1 (1986), 379-416
Chatterjee, S. and A.S. Hadi (1988): Sensitivity analysis in linear regression, J. Wiley, New York 1988
Chatterjee, S. and M. Mächler (1997): Robust regression: a weighted least-squares approach, Commun. Statist. Theor. Meth. 26 (1997), 1381-1394
Chaturvedi, A. and A.T.K. Wan (1998): Stein-rule estimation in a dynamic linear model, J. Appl. Stat. Science 7 (1998), 17-25
Chaturvedi, A. and A.T.K. Wan (1999): Estimation of regression coefficients subject to interval constraints in models with non-spherical errors, Indian Journal of Statistics 61 (1999), 433-442
Chaturvedi, A. and A.T.K. Wan (2001): Stein-rule restricted regression estimator in a linear regression model with nonspherical disturbances, Commun. Statist.-Theory Meth. 30 (2001), 55-68
Chauby, Y.P. (1980): Minimum norm quadratic estimators of variance components, Metrika 27 (1980), 255-262
Chen, Hsin-Chu (1998): Generalized reflexive matrices: special properties and applications, SIAM J. Matrix Anal. Appl. 19 (1998), 141-153
Chen, L.: Math. Phys. 167 (1995), 443-469
Chen, X. (2001): On maxima of dual function of the CDT subproblem, J. Comput. Mathematics 19 (2001), 113-124
Chen, Y.-S. (2001): Outliers detection and confidence interval modification in fuzzy regression, Fuzzy Sets Syst. 119 (2001), 259-272
Chen, Z. and J. Mi (1996): Confidence interval for the mean of the exponential distribution, based on grouped data, IEEE Transactions on Reliability 45 (1996), 671-677
Chen, V.C.P., Tsui, K.L., Barton, R.R. and Meckesheimer, M. (2006): A review on design, modeling and applications of computer experiments, IIE Trans. 38 (2006), 273-291
Chen, Y.Q., Kavouras, M. and Chrzanowski, A. (1987): A strategy for detection of outlying observations in measurements of high precision, The Canadian Surveyor 41 (1987), 529-540
Cheng, C.L. and J.W. van Ness (1999): Statistical regression with measurement error, Arnold Publ., London 1999
Cheng, C.L. (1998): Polynomial regression with errors in variables, J. Roy. Statist. Soc. B60 (1998), 189-199
Cheridito, P. (2001): Mixed fractional Brownian motion, Bernoulli 7 (2001), 913-934
Chiang, C.-Y. (1998): Invariant parameters of measurement scales, British Journal of Mathematical and Statistical Psychology 51 (1998), 89-99
Chiang, C.L. (2003): Statistical methods of analysis, University of California, Berkeley 2003
Chikuse, Y. (1999): Procrustes analysis on some special manifolds, Commun. Statist. Theory Meth. 28 (1999), 885-903
Chilès, J.P. and P. Delfiner (1999): Geostatistics - modelling spatial uncertainty, J. Wiley, New York 1999
Chiodi, M. (1986): Procedures for generating pseudo-random numbers from a normal distribution of order p, Riv. Stat. Applic. 19 (1986), 7-26
Chmielewski, M.A. (1981): Elliptically symmetric distributions: a review and bibliography, International Statistical Review 49 (1981), 67-74
Chow, Y.S. and H. Teicher (1978): Probability theory, Springer-Verlag, Heidelberg Berlin New York 1978
Chow, T.L. (2000): Mathematical methods for physicists, Cambridge University Press, Cambridge 2000
Christensen, R. (1991): Linear models for multivariate time series and spatial data, Springer, Berlin 1991
Christensen, R. (1996): Analysis of variance, design and regression, Chapman and Hall, Boca Raton 1996
Chrystal, G. (1964): Textbook of Algebra (Vol. 1), Chelsea, New York 1964
Chrzanowski, A. and Chen, Y.Q. (1998): Deformation monitoring, analysis, and prediction - status report, Proc. FIG XIX Int. Congress, Helsinki, Vol. 6, pp. 84-97
Chrzanowski, A., Chen, Y.Q. and Secord, M. (1983): On the strain analysis of tectonic movements using fault crossing geodetic surveys, Tectonophysics 97 (1983), 297-315
Crocetto, N. and Vettore, A. (2001): Best unbiased estimation of variance-covariance components: from condition adjustment to a generalized methodology, J. Inf. Optimization Sci. 22 (2001), 113-122
Chui, C.K. and G. Chen (1989): Linear systems and optimal control, Springer-Verlag, Heidelberg Berlin New York 1989
Chui, C.K. and G. Chen (1991): Kalman filtering with real time applications, Springer-Verlag, Heidelberg Berlin New York 1991
Ciarlet, P.G. (1988): Mathematical elasticity, Vol. 1: Three-dimensional elasticity, North-Holland Publ. Comp., Amsterdam 1988
Claerbout, J.F. (1964): Detection of P-waves from weak sources at great distances, Geophysics 29 (1964), 197-211
Clark, G.P.Y. (1980): Moments of the least squares estimators in a nonlinear regression model, J. Roy. Statist. Soc. B42 (1980), 227-237
Clerc-Bérod, A. and S. Morgenthaler (1997): A close look at the hat matrix, Student 2 (1997), 1-12
Cobb, L., Koppstein, P. and N.H. Chen (1983): Estimation and moment recursion relations for multimodal distributions of the exponential family, J. Am. Statist. Ass. 78 (1983), 124-130
Cobb, G.W. (1997): Introduction to design and analysis of experiments, Springer-Verlag, Heidelberg Berlin New York 1997
Cochran, W. (1972): Some effects of errors of measurement on linear regression, in: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 527-539, UCP, Berkeley 1972
Cochran, W. (1972): Stichprobenverfahren, de Gruyter, Berlin 1972
Cohen, A. (1966): All admissible linear estimates of the mean vector, Ann. Math. Statist. 37 (1966), 458-463
Cohen, C. and A. Ben-Israel (1969): On the computation of canonical correlations, Cahiers Centre d'Études Recherche Opér. 11 (1969), 121-132
Cohen, J., Kesten, M. and Newman, C. (eds.) (1985): Random matrices and their applications, Amer. Math. Soc., Providence 1985
Collett, D. (1992): Modelling binary data, Chapman and Hall, Boca Raton 1992
Collett, D. and T. Lewis (1981): Discriminating between the von Mises and wrapped normal distributions, Austr. J. Statist. 23 (1981), 73-79
Colton, D., Coyle, J. and P. Monk (2000): Recent developments in inverse acoustic scattering theory, SIAM Review 42 (2000), 369-414
Comte, F. and Rozenholc, Y. (2002): Adaptive estimation of volatility functions in (auto-)regressive models, Stochastic Processes and their Applications 97 (2002), 111-145
Cook, R.D., Tsai, C.L. and B.C. Wei (1986): Bias in nonlinear regression, Biometrika 73 (1986), 615-623
Cook, R.D. and S. Weisberg (1982): Residuals and influence in regression, Chapman and Hall, London 1982
Cook, R.D. and Goldberg, M.L. (1986): Curvatures for parameter subsets in nonlinear regression, Annals of Statistics 14 (1986)
Corbeil, R.R. and S.R. Searle (1976): Restricted maximum likelihood (REML) estimation of variance components in the mixed model, Technometrics 18 (1976), 31-38
Cottle, R.W. (1974): Manifestations of the Schur complement, Linear Algebra Appl. 8 (1974), 189-211
Coulman, C.E. and J. Vernin (1991): Significance of anisotropy and the outer scale of turbulence for optical and radio seeing, Applied Optics 30 (1991), 118-126
Cox, L.B. and J.S.Y. Wang (1993): Fractal surfaces: measurement and applications in the earth sciences, Fractals 1 (1993), 87-115
Cox, A.J. and N.J. Higham (1999): Row-wise backward stable elimination methods for the equality constrained least squares problem, SIAM J. Matrix Anal. Appl. 21 (1999), 313-326
Cox, D., Little, J. and O'Shea, D. (1997): Ideals, varieties, and algorithms: an introduction to computational algebraic geometry and commutative algebra, Springer-Verlag, New York 1997
Cox, D., Little, J. and O'Shea, D. (1998): Using algebraic geometry, Graduate Texts in Mathematics 185, Springer-Verlag, New York 1998
Cox, D.A. (1998): Introduction to Gröbner bases, Proceedings of Symposia in Applied Mathematics 53 (1998), 1-24
Cox, D.B. and Brading, J.D.W. (1999): Integration of LAMBDA ambiguity resolution with Kalman filter for relative navigation of spacecraft, in: Proceedings ION NTM 99, pp. 739-745
Cox, D.R. and D.V. Hinkley (1979): Theoretical statistics, Chapman and Hall, Boca Raton 1979
Cox, D.R. and V. Isham (1980): Point processes, Chapman and Hall, Boca Raton 1980
Cox, D.R. and E.J. Snell (1989): Analysis of binary data, Chapman and Hall, Boca Raton 1989
Cox, D.R. and N. Wermuth (1996): Multivariate dependencies, Chapman and Hall, Boca Raton 1996
Cox, D.R. and N. Reid (2000): The theory of the design of experiments, Chapman & Hall, Boca Raton 2000
Cox, D.R. and P.J. Salomon (2003): Components of variance, Chapman and Hall/CRC, Boca Raton London New York Washington DC 2003
Cox, T.F. and Cox, M.A. (1994): Multidimensional scaling, St. Edmundsbury Press, St. Edmunds, Suffolk 1994
Cox, T.F. and M.A.A. Cox (2001): Multidimensional scaling, Chapman and Hall, Boca Raton, Florida 2001; Chapman & Hall/CRC, Boca Raton London New York Washington D.C. 2003
Craig, H.V.: a) Bull. Americ. Math. Soc. 36, 558 (1930); 37, 731 (1931); 39, 919 (1933); b) Americ. J. Math. 57, 457 (1935); 58, 833 (1936); 59, 764 (1937); 61, 791 (1939)
Craig, A.T. (1943): Note on the independence of certain quadratic forms, The Annals of Mathematical Statistics 14 (1943), 195-197
Cressie, N. (1991): Statistics for spatial data, Wiley, New York 1991
Croceto, N. (1993): Point projection of topographic surface onto the reference ellipsoid of revolution in geocentric Cartesian coordinates, Survey Review 32 (1993), 233-238
Cross, P.A. and K. Thapa (1979): The optimal design of levelling networks, Surv. Rev. 25 (1979), 68-79
Cross, P.A. (1985): Numerical methods in network design, in: Grafarend, E.W. and F. Sanso (eds.), Optimization and design of geodetic networks, Springer-Verlag, Heidelberg Berlin New York 1985, 429-435
Crosilla, F. (1983a): A criterion matrix for a second order design of control networks, Bull. Geod. 57 (1983), 226-239
Crosilla, F. (1983b): Procrustean transformation as a tool for the construction of a criterion matrix for control networks, manuscripta geodaetica 8 (1983), 343-370
Crosilla, F. and A. Beinat (2005): A forward search method for robust general Procrustes analysis, Udine/Italy 2005
Crowder, M. (1987): On linear and quadratic estimating functions, Biometrika 74 (1987), 591-597
Crowder, M.J. and D.J. Hand (1990): Analysis of repeated measures, Chapman and Hall, Boca Raton 1990
Crowder, M.J., Kimber, A., Sweeting, T. and R. Smith (1993): Statistical analysis of reliability data, Chapman and Hall, Boca Raton 1993
Crowder, J.M., Sweeting, T. and R. Smith (1994): Statistical analysis of reliability data, Chapman and Hall, Boca Raton 1994
Csörgö, M. and L. Horvath (1993): Weighted approximations in probability and statistics, J. Wiley, New York 1993
Csörgö, S. and L. Viharos (1997): Asymptotic normality of least-squares estimators of tail indices, Bernoulli 3 (1997), 351-370
Csörgö, S. and J. Mielniczuk (1999): Random-design regression under long-range dependent errors, Bernoulli 5 (1999), 209-224
Cummins, D. and A.C. Webster (1995): Iteratively reweighted partial least-squares: a performance analysis by Monte Carlo simulation, J. Chemometrics 9 (1995), 489-507
Cunningham, E. (1910): The principle of relativity in electrodynamics and an extension thereof, Proc. London Math. Soc. 8 (1910), 77-98
Czuber, E. (1891): Theorie der Beobachtungsfehler, Leipzig 1891
D'Agostino, R. and M.A. Stephens (1986): Goodness-of-fit techniques, Marcel Dekker, New York 1986
Daalen, D. van (1986): The Astrometry Satellite Hipparcos, Proc. I. Hotine-Marussi Symp. Math. Geodesy, Rome 1985, Milano 1986, 259-284
Dach, R. (2000): Einfluß von Auflasteffekten auf präzise GPS-Messungen, DGK, Reihe C, Heft Nr. 519, 2000
Dai, L., Nagarajan, N., Hu, G. and Ling, K. (2005): Real-time attitude determination for micro satellites by LAMBDA method combined with Kalman filtering, in: AIAA Proceedings 22nd ICSSC, Monterey 2005
Daniel, J.W. (1967): The conjugate gradient method for linear and nonlinear operator equations, SIAM J. Numer. Anal. 4 (1967), 10-26
Dantzig, G.B. (1940): On the nonexistence of tests of Student's hypothesis having power functions independent of σ², Ann. Math. Statist. 11 (1940), 186-192
Das, I. (1996): Normal-boundary intersection: a new method for generating the Pareto surface in nonlinear multicriteria optimization problems, SIAM J. Optim. 3 (1998), 631ff
Das, R. and B.K. Sinha (1987): Robust optimum invariant unbiased tests for variance components, in: Proc. of the Second International Tampere Conference in Statistics, eds. T. Pukkila and S. Puntanen, University of Tampere/Finland 1987, 317-342
Das Gupta, S., Mitra, S.K., Rao, P.S., Ghosh, J.K., Mukhopadhyay, A.C. and Y.R. Sarma (1994): Selected papers of C.R. Rao, Vol. 1, J. Wiley, New York 1994
Das Gupta, S., Mitra, S.K., Rao, P.S., Ghosh, J.K., Mukhopadhyay, A.C. and Y.R. Sarma (1994): Selected papers of C.R. Rao, Vol. 2, J. Wiley, New York 1994
Davenport, J.H., Siret, Y. and Tournier, E. (1988): Computer algebra: systems and algorithms for algebraic computation, Academic Press, London 1988
David, F.N. (1937): A note on unbiased limits for the correlation coefficient, Biometrika 29 (1937), 157-160
David, F.N. (1938): Tables of the ordinates and probability integral of the distribution of the correlation coefficient in small samples, Biometrika, London 1938
David, F.N. (1953): A note on the evaluation of the multivariate normal integral, Biometrika 40 (1953), 458-459
David, F.N. (1954): Tables of the ordinates and probability integral of the distribution of the correlation coefficient in small samples, Cambridge University Press, London 1954
David, F.N. and N.L. Johnson (1948): The probability integral transformation when parameters are estimated from the sample, Biometrika 35 (1948), 182-190
David, H.A. (1957): Some notes on the statistical papers of Friedrich Robert Helmert (1843-1917), Bull. Stat. Soc. NSW 19 (1957), 25-28
David, H.A. (1970): Order statistics, J. Wiley, New York 1970
David, H.A., Hartley, H.O. and E.S. Pearson (1954): The distribution of the ratio, in a single normal sample, of range to standard deviation, Biometrika 41 (1954), 482-493
Davidian, M. and A.R. Gallant (1993): The nonlinear mixed effects model with a smooth random effects density, Biometrika 80 (1993), 475-488
Davidian, M. and D.M. Giltinan (1995): Nonlinear models for repeated measurement data, Chapman and Hall, Boca Raton 1995
Davis, R.A. (1997): M-estimation for linear regression with infinite variance, Probability and Mathematical Statistics 17 (1997), 1-20
Davis, R.A. and W.T.M. Dunsmuir (1997): Least absolute deviation estimation for regression with ARMA errors, J. Theor. Prob. 10 (1997), 481-497
Davis, J.H. (2002): Foundations of deterministic and stochastic control, Birkhäuser-Verlag, Basel Boston Berlin 2002
Davison, A.C. and D.V. Hinkley (1997): Bootstrap methods and their application, Cambridge University Press, Cambridge 1997
Dawe, D.J. (1984): Matrix and finite element displacement analysis of structures, Clarendon Press, Oxford 1984
Decreusefond, L. and A.S. Üstünel (1999): Stochastic analysis of the fractional Brownian motion, Potential Anal. 10 (1999), 177-214
Dedekind, R. (1901): Gauß in seiner Vorlesung über die Methode der kleinsten Quadrate, Berlin 1901
Defant, A. and K. Floret (1993): Tensor norms and operator ideals, North Holland, Amsterdam 1993
Deitmar, A. (2002): A first course in harmonic analysis, Springer-Verlag, Heidelberg Berlin New York 2002
Demidenko, E. (2000): Is this the least squares estimate?, Biometrika 87 (2000), 437-452
Denham, M.C. (1997): Prediction intervals in partial least-squares, J. Chemometrics 11 (1997), 39-52
Denham, W. and S. Pines (1966): Sequential estimation when measurement function nonlinearity is comparable to measurement error, AIAA J. 4 (1966), 1071-1076
Denis, J.-B. and A. Pazman (1999): Bias of LS estimators in nonlinear regression models with constraints. Part II: Biadditive models, Applications of Mathematics 44 (1999), 375-403
Dermanis, A. (1977): Geodetic linear estimation techniques and the norm choice problem, manuscripta geodaetica 2 (1977), 15-97
Dermanis, A. (1978): Adjustment of geodetic observations in the presence of signals, International School of Advanced Geodesy, Erice, Sicily, May-June 1978, Bollettino di Geodesia e Scienze Affini 38 (1979), 513-539
Dermanis, A. (1980): Adjustment of geodetic observations in the presence of signals, in: Proceedings of the International School of Advanced Geodesy, Bollettino di Geodesia e Scienze Affini, Vol. 38, pp. 419-445
Dermanis, A. (1998): Generalized inverses of nonlinear mappings and the nonlinear geodetic datum problem, Journal of Geodesy 72 (1998), 71-100
Dermanis, A. and E. Grafarend (1981): Estimability analysis of geodetic, astrometric and geodynamical quantities in Very Long Baseline Interferometry, Geophys. J. R. Astronom. Soc. 64 (1981), 31-56
Dermanis, A. and F. Sanso (1995): Nonlinear estimation problems for nonlinear models, manuscripta geodaetica 20 (1995), 110-122
Dermanis, A. and R. Rummel (2000): Data analysis methods in geodesy, Lecture Notes in Earth Sciences 95, Springer-Verlag, Heidelberg Berlin New York 2000
Dette, H. (1993): A note on E-optimal designs for weighted polynomial regression, Ann. Stat. 21 (1993), 767-771
Dette, H. (1997): Designing experiments with respect to 'standardized' optimality criteria, J. R. Statist. Soc. B59 (1997), 97-110
Dette, H. (1997): E-optimal designs for regression models with quantitative factors - a reasonable choice?, The Canadian Journal of Statistics 25 (1997), 531-543
Dette, H. and W.J. Studden (1993): Geometry of E-optimality, Ann. Stat. 21 (1993), 416-433
Dette, H. and W.J. Studden (1997): The theory of canonical moments with applications in statistics, probability, and analysis, J. Wiley, New York 1997
Dette, H. and T.E. O'Brien (1999): Optimality criteria for regression models based on predicted variance, Biometrika 86 (1999), 93-106
Deutsch, F. (2001): Best approximation in inner product spaces, Springer-Verlag, Heidelberg Berlin New York 2001
Dewess, G. (1973): Zur Anwendung der Schätzmethode MINQUE auf Probleme der Prozeß-Bilanzierung, Math. Operationsforschg. Statistik 4 (1973), 299-313
Dey, D., Ghosh, S. and Mallick, B. (2000): Generalized linear models: a Bayesian perspective, Marcel Dekker, New York 2000
DiCiccio, T.J. and B. Efron (1996): Bootstrap confidence intervals, Statistical Science 11 (1996), 189-228
Diebolt, J. and J. Züber (1999): Goodness-of-fit tests for nonlinear heteroscedastic regression models, Statistics & Probability Letters 42 (1999), 53-60
Dieck, T. (1987): Transformation groups, W. de Gruyter, Berlin/New York 1987
Diggle, P.J., Liang, K.Y. and S.L. Zeger (1994): Analysis of longitudinal data, Clarendon Press, Oxford 1994
Diggle, P. and Lophaven, S. (2006): Bayesian geostatistical design, Scand. J. Stat. 33 (2006), 53-64
Ding, C.G. (1999): An efficient algorithm for computing quantiles of the noncentral chi-squared distribution, Computational Statistics & Data Analysis 29 (1999), 253-259
Dixon, A.L. (1908): The elimination of three quantics in two independent variables, Proc. London Mathematical Society, series 2, 6 (1908), 468-478
Dixon, G.M. (1994): Division algebras: octonions, quaternions, complex numbers and the algebraic design of physics, Kluwer Acad. Publ., Dordrecht Boston London 1994
Dixon, W.J. (1951): Ratios involving extreme values, Ann. Math. Statistics 22 (1951), 68-78
Dobbie, M.J., Henderson, B.L. and Stevens, D.L. Jr. (2008): Sparse sampling: spatial design for monitoring stream networks, Stat. Surv. 2 (2008), 113-153
Dobson, A.J. (1990): An introduction to generalized linear models, Chapman and Hall, Boca Raton 1990
Dobson, A.J. (2002): An introduction to generalized linear models, 2nd ed., Chapman/Hall/CRC, Boca Raton 2002
Dodge, Y. (1987): Statistical data analysis based on the L1-norm and related methods, Elsevier, Amsterdam 1987
Dodge, Y. (1997): LAD regression for detecting outliers in response and explanatory variables, J. Multivar. Anal. 61 (1997), 144-158
Dodge, Y. and D. Majumdar (1979): An algorithm for finding least square generalized inverses for classification models with arbitrary patterns, J. Statist. Comput. Simul. 9 (1979), 1-17
Dodge, Y. and J. Jurečková (1997): Adaptive choice of trimming proportion in trimmed least-squares estimation, Statistics & Probability Letters 33 (1997), 167-176
Donoho, D.L. and P.J. Huber (1983): The notion of breakdown point, in: Festschrift für Erich L. Lehmann, eds. P.J. Bickel, K.A. Doksum and J.L. Hodges, Wadsworth, Belmont, Calif. 1983, 157-184
Dorea, C.C.Y. (1997): L1-convergence of a class of algorithms for global optimization, Student 2 (1997)
Dörrie, H. (1948): Kubische und biquadratische Gleichungen, Leibniz-Verlag, München 1948
Downs, T.D. and A.L. Gould (1967): Some relationships between the normal and von Mises distributions, Biometrika 54 (1967), 684-687
Draper, N.R. and R. Craig van Nostrand (1979): Ridge regression and James-Stein estimation: review and comments, Technometrics 21 (1979), 451-466
Draper, N.R. and J.A. John (1981): Influential observations and outliers in regression, Technometrics 23 (1981), 21-26
Draper, N.R. and F. Pukelsheim (1996): An overview of design of experiments, Statistical Papers 37 (1996), 1-32
Draper, N.R. and F. Pukelsheim (1998): Mixture models based on homogeneous polynomials, J. Stat. Planning and Inference 71 (1998), 303-311
Draper, N.R. and F. Pukelsheim (2002): Generalized ridge analysis under linear restrictions, with particular applications to mixture experiments problems, Technometrics 44 (2002), 250-259
Draper, N.R. and F. Pukelsheim (2003): Canonical reduction of second order fitted models subject to linear restrictions, Statistics and Probability Letters 63 (2003), 401-410
Drauschke, M., Förstner, W. and Brunn, A. (2009): Multidodging: Ein effizienter Algorithmus zur automatischen Verbesserung von digitalisierten Luftbildern, in: Seyfert, E. (ed.): Publikationen der DGPF, Band 18: Zukunft mit Tradition, Jena 2009, S. 61-68
Drauschke, M., Roscher, R., Läbe, T. and Förstner, W. (2009): Improving image segmentation using multiple view analysis, in: Object Extraction for 3D City Models, Road Databases and Traffic Monitoring - Concepts, Algorithms and Evaluation (CMRT09), 2009, S. 211-216
Driscoll, M.F. (1999): An improved result relating quadratic forms and chi-square distributions, The American Statistician 53 (1999), 273-275
Driscoll, M.F. and B. Krasnicka (1995): An accessible proof of Craig's theorem in the general case, The American Statistician 49 (1995), 59-62
Droge, B. (1998): Minimax regret analysis of orthogonal series regression estimation: selection versus shrinkage, Biometrika 85 (1998), 631-643
Dryden, I.L. (1998): General registration and shape analysis, STAT 96/03, in: Stochastic Geometry: Likelihood and Computation, eds. W.S. Kendall, O.E. Barndorff-Nielsen and M.N.M. van Lieshout, Chapman and Hall/CRC Press, Boca Raton 1998, 333-364
Drygas, H. (1970): The coordinate-free approach to Gauss-Markov estimation, Springer-Verlag, Berlin-Heidelberg-New York 1970
Drygas, H. (1972): The estimation of residual variance in regression analysis, Math. Operationsforsch. Statist. 3 (1972), 373-388
Drygas, H. (1975a): Weak and strong consistency of the least squares estimators in regression models, Z. Wahrscheinlichkeitstheor. Verw. Geb. 34 (1975), 119-127
Drygas, H. (1975b): Estimation and prediction for linear models in general spaces, Math. Operationsforsch. Statistik 6 (1975), 301-324
Drygas, H. (1977): Best quadratic unbiased estimation in variance-covariance component models, Math. Operationsforsch. Statistik 8 (1977), 211-231
Drygas, H. (1983): Sufficiency and completeness in the general Gauss-Markov model, Sankhya Ser. A45 (1983), 88-98
Duan, J.C. (1997): Augmented GARCH(p,q) process and its diffusion limit, J. Econometrics 79 (1997), 97-127
Duda, R.O., Hart, P.E. and Stork, D.G. (2000): Pattern classification, 2nd ed., Wiley, New York 2000
Duncan, W.J. (1944): Some devices for the solution of large sets of simultaneous linear equations, London, Edinburgh and Dublin Philosophical Magazine and Journal of Science (7th series) 35 (1944), 660-670
Dufour, J.M. (1986): Bias of s² in linear regression with dependent errors, The American Statistician 40 (1986), 284-285
Dunnett, C.W. and M. Sobel (1954): A bivariate generalization of Student's t-distribution, with tables for certain special cases, Biometrika 41 (1954), 153-169
Dupuis, D.J. and Hamilton, D.C. (2000): Regression residuals and test statistics: assessing naive outlier deletion, Can. J. Stat. 28 (2000), 259-275
Durand, D. and J.A. Greenwood (1957): Random unit vectors II: usefulness of Gram-Charlier and related series in approximating distributions, Ann. Math. Statist. 28 (1957), 978-986
Durbin, J. and G.S. Watson (1950): Testing for serial correlation in least squares regression, Biometrika 37 (1950), 409-428
Durbin, J. and G.S. Watson (1951): Testing for serial correlation in least squares regression II, Biometrika 38 (1951), 159-177
Dvorak, G. and Benveniste, Y. (1992): On transformation strains and uniform fields in multiphase elastic media, Proc. R. Soc. Lond. A437 (1992), 291-310
Ecker, E. (1977): Ausgleichung nach der Methode der kleinsten Quadrate, Öst. Z. Vermessungswesen 64 (1977), 41-53
Eckert, M. (1935): Eine neue flächentreue (azimutale) Erdkarte, Petermanns Mitteilungen 81 (1935), 190-192
Eckhart, C. and G. Young (1939): A principal axis transformation for non-Hermitian matrices, Bull. Amer. Math. Soc. 45 (1939), 118-121
Eckl, M.C., Snay, R.A., Soler, T., Cline, M.W. and G.L. Mader (2001): Accuracy of GPS-derived positions as a function of interstation distance and observing-session duration, Journal of Geodesy 75 (2001), 633-640
Edelman, A. (1989): Eigenvalues and condition numbers of random matrices, PhD dissertation, Massachusetts Institute of Technology 1989
Edelman, A. (1998): The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal. Appl. 20 (1998), 303-353
Edelman, A., Elmroth, E. and B. Kågström (1997): A geometric approach to perturbation theory of matrices and matrix pencils. Part I: Versal deformations, SIAM J. Matrix Anal. Appl. 18 (1997), 653-692
Edelman, A., Arias, T.A. and Smith, S.T. (1998): The geometry of algorithms with orthogonality constraints, SIAM J. Matrix Anal. Appl. 20 (1998), 303-353
Edgar, G.A. (1998): Integral, probability, and fractal measures, Springer-Verlag, Heidelberg Berlin New York 1998
Edgeworth, F.Y. (1883): The law of error, Philosophical Magazine 16 (1883), 300-309
Eeg, J. and T. Krarup (1973): Integrated geodesy, Danish Geodetic Institute, Report No. 7, Copenhagen 1973
Effros, E.G. (1997): Dimensions and C*-algebras, Regional Conference Series in Mathematics 46, Rhode Island 1997
Efron, B. and R.J. Tibshirani (1994): An introduction to the bootstrap, Chapman and Hall, Boca Raton 1994
Eichhorn, A. (2005): Ein Beitrag zur parametrischen Identifikation von dynamischen Strukturmodellen mit Methoden der adaptiven Kalman-Filterung, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Reihe C 585, München 2005
Einstein, A. (1905): Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen, Ann. Phys. 17 (1905), 549-560
Ekblom, S. and S. Henriksson (1969): Lp-criteria for the estimation of location parameters, SIAM J. Appl. Math. 17 (1969), 1130-1141
Elden, L. (1977): Algorithms for the regularization of ill-conditioned least squares problems, BIT 17 (1977), 134-145
Elhay, S., Golub, G.H. and J. Kautsky (1991): Updating and downdating of orthogonal polynomials with data fitting applications, SIAM J. Matrix Anal. Appl. 12 (1991), 327-353
Elian, S.N. (2000): Simple forms of the best linear unbiased predictor in the general linear regression model, American Statistician 54 (2000), 25-28
Ellis, R.L. and I. Gohberg (2003): Orthogonal systems and convolution operators, Birkhäuser-Verlag, Basel Boston Berlin 2003
Ellis, S.P. (1998): Instability of least squares, least absolute deviation and least median of squares linear regression, Stat. Sci. 13 (1998), 337-350
Ellis, S.P. (2000): Singularity and outliers in linear regression with application to least squares, least absolute deviation, and least median of squares linear regression, Metron 58 (2000), 121-129
Ellenberg, J.H. (1973): The joint distribution of the standardized least squares residuals from a general linear regression, J. Am. Statist. Ass. 68 (1973), 941-943
Elpelt, B. (1989): On linear statistical models of commutative quadratic type, Commun. Statist.-Theory Method 18 (1989), 3407-3450
El-Bassiouni, M.Y. and Seely, J. (1980): Optimal tests for certain functions of the parameters in a covariance matrix with linear structure, Sankhya A42 (1980), 64-77
Elfving, G. (1952): Optimum allocation in linear regression theory, Ann. Math. Stat. 23 (1952), 255-263
El-Sayed, S.M. (1996): The sampling distribution of ridge parameter estimator, Egyptian Statistical Journal, ISSR - Cairo University 40 (1996), 211-219
Embree, P.J., Burg, J.P. and Backus, M.M. (1963): Wide-band velocity filtering - the pie slice process, Geophysics 28 (1963), 948-974
Engeln-Müllges, G. and Reutter, F. (1990): Formelsammlung zur numerischen Mathematik mit C-Programmen, Bibliographisches Institut Mannheim & F.A. Brockhaus AG 1990
Engl, H.W., Hanke, M. and A. Neubauer (1996): Regularization of inverse problems, Kluwer Academic Publishers, Dordrecht 1996
Engl, H.W., Louis, A.K. and W. Rundell (1997): Inverse problems in geophysical applications, SIAM, Philadelphia 1997
Engler, K., Grafarend, E.W., Teunissen, P. and J. Zaiser (1982): Test computations of three-dimensional geodetic networks with observables in geometry and gravity space, Proceedings of the International Symposium on Geodetic Networks and Computations, Vol. VII, 119-141, Report B258/VII, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, München 1982
Eringen, A.C. (1962): Nonlinear theory of continuous media, McGraw-Hill Book Comp., New York 1962
Ernst, M.D. (1998): A multivariate generalized Laplace distribution, Computational Statistics 13 (1998), 227-232
Eshagh, M. and Sjöberg, L.E. (2008): The modified best quadratic unbiased non-negative estimator (MBQUNE) of variance components, Stud. Geophys. Geod. 52 (2008), 305-320
Ethro, U. (1991): Statistical test of significance for testing outlying observations, Survey Review 31 (1991), 62-70
Euler, N. and W.H. Steeb (1992): Continuous symmetry, Lie algebras and differential equations, B.I. Wissenschaftsverlag, Mannheim 1992
Even-Tzur, G. (2004): Variance factor estimation for two-step analysis of deformation networks, Journal of Surveying Engineering 130(3) (2004), 113-118
Even-Tzur, G. (2006): Datum definition and its influence on the reliability of geodetic networks, ZfV 131 (2006), 87-95
Even-Tzur, G. (2009): Two-steps analysis of movement of the Kfar-Hanassi network, FIG Working Week 2009, Eilat, Israel
Everitt, B.S. (1987): Introduction to optimization methods and their application in statistics, Chapman and Hall, London 1987
Fahrmeir, L. and G. Tutz (2001): Multivariate statistical modelling based on generalized linear models, Springer-Verlag, Heidelberg Berlin New York 2001
Fakeev, A.G. (1981): A class of iterative processes for solving degenerate systems of linear algebraic equations, USSR Comp. Maths. Math. Phys. 21 (1981), 15-22
Falk, M., Hüsler, J. and R.D. Reiss (1994): Laws of small numbers: extremes and rare events, Birkhäuser-Verlag, Basel Boston Berlin 1994
Fan, J. and I. Gijbels (1996): Local polynomial modelling and its applications, Chapman and Hall, Boca Raton 1996
Fang, K.-T. and Y. Wang (1993): Number-theoretic methods in statistics, Chapman and Hall, Boca Raton 1993
Fang, K.-T. and Y.-T. Zhang (1990): Generalized multivariate analysis, Science Press Beijing, Springer-Verlag, Beijing Berlin New York 1990
Fang, K.-T., Kotz, S. and K.W. Ng (1990): Symmetric multivariate and related distributions, Chapman and Hall, London 1990
Farahmand, K. (1996): Random polynomials with complex coefficients, Statistics & Probability Letters 27 (1996), 347-355
Farahmand, K. (1999): On random algebraic polynomials, Proceedings of the American Math. Soc. 127 (1999), 3339-3344
Faraway, J. (2006): Extending the linear model with R, Chapman and Hall/CRC 2006
Farebrother, R.W. (1988): Linear least squares computations, Dekker, New York 1988
Farebrother, R.W. (1999): Fitting linear relationships, Springer-Verlag, Heidelberg Berlin New York 1999
Farrel, R.H. (1964): Estimators of a location parameter in the absolutely continuous case, Ann. Math. Statist. 35 (1964), 949-998
Fassò, A. (1997): On a rank test for autoregressive conditional heteroscedasticity, Student 2 (1997), 85-94
Faulkenberry, G.D. (1973): A method of obtaining prediction intervals, J. Am. Statist. Ass. 68 (1973), 433-435
Fausett, D.W. and C.T. Fulton (1994): Large least squares problems involving Kronecker products, SIAM J. Matrix Anal. Appl. 15 (1994), 219-227
Fazekas, I., Kukush, A. and S. Zwanzig (2004): Correction of nonlinear orthogonal regression estimator, Ukr. Math. J. 56 (2004), 1101-1118
Fedorov, V.V. (1972): Theory of optimal experiments, Academic Press, New York 1972
Fedorov, V.V. and Hackl, P. (1994): Optimal experimental design: spatial sampling, Calcutta Stat. Assoc. Bull. 44 (1994), 173-194
Fedorov, V.V. and Müller, W.G. (1988): Two approaches in optimization of observing networks, in: Optimal Design and Analysis of Experiments, eds. Y. Dodge, V.V. Fedorov and H.P. Wynn, North-Holland, Amsterdam 1988, pp. 239-256
Fedorov, V.V. and Flanagan, D. (1997): Optimal monitoring on Mercer's expansion of covariance kernel, J. Comb. Inf. Syst. Sci. 23 (1997), 237-250
Fedorov, V.V. and P. Hackl (1997): Model-oriented design of experiments, Springer-Verlag, Heidelberg Berlin New York 1997
Fedorov, V.V., Montepiedra, G. and C.J. Nachtsheim (1999): Design of experiments for locally weighted regression, J. Statist. Planning and Inference 81 (1999), 363-382
Fedorov, V.V. and Müller, W.G. (2007): Optimum design of correlated fields via covariance kernel expansion, in: Lopez Fidalgo, J., Rodriguez, J. and Torsney, B. (eds.), Physica-Verlag, Heidelberg 2007
Feinstein, A.R. (1996): Multivariate analysis, Yale University Press, New Haven 1996
Feldmann, R.V. (1962): Matrix theory I: Arthur Cayley - founder of matrix theory, Mathematics Teacher 57 (1962), 482-484
Felus, Y.A. and R.C. Burtch (2005): A new Procrustes algorithm for three-dimensional datum conversion, Ferris State University 2005
Fengler, M., Freeden, W. and V. Michel (2003): The Kaiserslautern multiscale geopotential model SWITCH-03 from orbit perturbations of the satellite CHAMP and its comparison to the models EGM96, UCPH2002-02-0.5, EIGEN-1s, and EIGEN-2, Geophysical Journal International (submitted) 2003
Fenyo, S. and H.W. Stolle (1984): Theorie und Praxis der linearen Integralgleichungen 4, Berlin 1984
Feuerverger, A. and P. Hall (1998): On statistical inference based on record values, Extremes 1 (1998), 169-190
Fiebig, D.G., Bartels, R. and W. Krämer (1996): The Frisch-Waugh theorem and generalized least squares, Econometric Reviews 15 (1996), 431-443
Fierro, R.D. (1996): Perturbation analysis for two-sided (or complete) orthogonal decompositions, SIAM J. Matrix Anal. Appl. 17 (1996), 383-400
Fierro, R.D. and J.R. Bunch (1995): Bounding the subspaces from rank revealing two-sided orthogonal decompositions, SIAM J. Matrix Anal. Appl. 16 (1995), 743-759
Fierro, R.D. and P.C. Hansen (1995): Accuracy of TSVD solutions computed from rank-revealing decompositions, Numer. Math. 70 (1995), 453-471
Fierro, R.D. and P.C. Hansen (1997): Low-rank revealing UTV decompositions, Numer. Algorithms 15 (1997), 37-55
Fill, J.A. and D.E. Fishkind (1999): The Moore-Penrose generalized inverse for sums of matrices, SIAM J. Matrix Anal. Appl. 21 (1999), 629-635
Finsterwalder, S. and Scheufele, W. (1937): Das Rückwärtseinschneiden im Raum, Sebastian Finsterwalder zum 75. Geburtstage, pp. 86-100, Verlag Herbert Wichmann, Berlin 1937
Fischer, F.A. (1969): Einführung in die statistische Übertragungstheorie, Bibliographisches Institut, Mannheim/Zürich 1969
Fischler, M.A. and Bolles, R.C. (1981): Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM 24 (1981), 381-395
Fiserova, E. (2004): Estimation in universal models with restrictions, Discuss. Math., Probab. Stat. 24 (2004), 233-253
Fiserova, E. (2006): Testing hypotheses in universal models, Discuss. Math., Probab. Stat. 26 (2006), 127-149
Fiserova, E. and Kubacek, L. (2003): Sensitivity analysis in singular mixed linear models with constraints, Kybernetika 39 (2003), 317-332
Fiserova, E. and Kubacek, L. (2004): Statistical problems of measurement in triangle, Folia Fac. Sci. Nat. Univ. Masarykianae Brunensis, Math. 15 (2004), 77-94
Fiserova, E. and Kubacek, L. (2003): Insensitivity regions and outliers in mixed models with constraints, Austrian Journal of Statistics 35 (2003), 245-252
Fisher, N.I. (1993): Statistical analysis of circular data, Cambridge University Press, Cambridge 1993
Fisher, N.I. and Hall, P. (1989): Bootstrap confidence regions for directional data, J. Am. Statist. Ass. 84 (1989), 996-1002
Fisher, N.J. (1985): Spherical medians, J. Roy. Statist. Soc. B47 (1985), 342-348
Fisher, N.J. and A.J. Lee (1983): A correlation coefficient for circular data, Biometrika 70 (1983), 327-332
Fisher, N.J. and A.J. Lee (1986): Correlation coefficients for random variables on a sphere or hypersphere, Biometrika 73 (1986), 159-164
Fisher, R.A. (1915): Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population, Biometrika 10 (1915), 507-521
Fisher, R.A. (1935): The fiducial argument in statistical inference, Annals of Eugenics 6 (1935), 391-398
Fisher, R.A. (1939): The sampling distribution of some statistics obtained from nonlinear equations, Ann. Eugen. 9 (1939), 238-249
Fisher, R.A. (1953): Dispersion on a sphere, Proc. Roy. Soc. Lond. A217 (1953), 295-305
Fisher, R.A. (1956): Statistical methods and scientific inference, Oliver and Boyd, Edinburgh 1956
Fisher, R.A. and F. Yates (1942): Statistical tables for biological, agricultural and medical research, 2nd edition, Oliver and Boyd, Edinburgh 1942
Fisz, M. (1971): Wahrscheinlichkeitsrechnung und mathematische Statistik, Berlin (Ost) 1971
Fitzgerald, W.J., Smith, R.L., Walden, A.T. and P.C. Young (2001): Non-linear and nonstationary signal processing, Cambridge University Press, Cambridge 2001
Fitzgibbon, A., Pilu, M. and Fisher, R.B. (1999): Direct least squares fitting of ellipses, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (1999), 476-480
Fletcher, R. and C.M. Reeves (1964): Function minimization by conjugate gradients, Comput. J. 7 (1964), 149-154
Fletcher, R. (1987): Practical methods of optimization, 2nd edition, John Wiley & Sons, Chichester 1987
Fletling, R. (2007): Fuzzy Clusterung zur Analyse von Überwachungsmessungen, in: F.K. Brunner (ed.), Ingenieurvermessung, pp. 311-316, Wichmann Verlag, Heidelberg 2007
Flury, B. (1997): A first course in multivariate statistics, Springer-Verlag, Heidelberg Berlin New York 1997
Focke, J. and G. Dewess (1972): Über die Schätzmethode MINQUE von C.R. Rao und ihre Verallgemeinerung, Math. Operationsforschg. Statistik 3 (1972), 129-143
Förstner, W. (1976): Statistical test methods for blunder detection in planimetric block triangulation, presented paper to Comm. III, ISP Helsinki 1976
Förstner, W. (1978a): Prüfung auf grobe Bildkoordinatenfehler bei der relativen Orientierung, BuL 1978, 205-212
Förstner, W. (1978b): Die Suche nach groben Fehlern in photogrammetrischen Lageblöcken, Dissertation, Universität Stuttgart 1978
Förstner, W. (1979a): Ein Verfahren zur Schätzung von Varianz- und Kovarianzkomponenten, Allg. Vermess. Nachrichten 86 (1979), 446-453
Förstner, W. (1979b): On internal and external reliability of photogrammetric coordinates, Proceedings of the Annual Convention of the ASP-ACSM, Washington 1979
Förstner, W. (1979c): Konvergenzbeschleunigung bei der a posteriori Varianzschätzung, Zeitschrift für Vermessungswesen 104 (1979), 149-156
Förstner, W. (1979d): Ein Verfahren zur Schätzung von Varianz- und Kovarianz-Komponenten, Allgemeine Vermessungsnachrichten 86 (1979), 446-453
Förstner, W. (1980a): The theoretical reliability of photogrammetric coordinates, Archives of the ISPRS, XIV Congress of the Int. Society for Photogrammetry, Comm. III, Hamburg 1980
Förstner, W. (1980b): Zur Prüfung zusätzlicher Parameter in Ausgleichungen, ZfV 105 (1980), 510-519
Förstner, W. (1983): Reliability and discernability of extended Gauss-Markov models, in: Ackermann, F. (ed.), Deutsche Geodätische Kommission A98, 1983, 79-103
Förstner, W. (1985): Determination of the additive noise variance in observed autoregressive processes using variance component estimation technique, Statistics and Decisions, Supplement Issue No. 2, München 1985, S. 263-274
Förstner, W. (1987): Reliability analysis of parameter estimation in linear models with applications to mensuration problems in computer vision, CVGIP - Computer Vision, Graphics, and Image Processing (1987), S. 273-310
Förstner, W. (1990): Modelle intelligenter Bildsensoren und ihre Qualität, in: Großkopf, R.E. (ed.), Springer, 254 (1990), 1-21
Förstner, W. (1993a): Image matching, in: Haralick, R.M. and Shapiro, L.G. (eds.): Computer and Robot Vision, Addison-Wesley 1993, S. 289-379
Förstner, W. (1993c): Uncertain spatial relationships and their use for object location in digital images, in: Förstner, W., Haralick, R.M. and Radig, B. (eds.), Institut für Photogrammetrie, Universität Bonn 1992
Förstner, W. (1994a): Diagnostics and performance evaluation in computer vision, Performance versus Methodology in Computer Vision, NSF/ARPA Workshop, Seattle 1994, S. 11-25
Förstner, W. (1994b): Generic estimation procedures for orientation with minimum and redundant information, Technical Report, Institute for Photogrammetry, University Bonn 1994
Förstner, W. (1995a): A 2D-data model for representing buildings, Technical Report, Institut für Photogrammetrie, Universität Bonn 1995
Förstner, W. (1995b): The role of robustness in computer vision, Proc. Workshop "Milestones in Computer Vision", Vorau 1995
Förstner, W. (1997a): Evaluation of restoration filters, Technical Report, Institut für Photogrammetrie, Universität Bonn 1997
Förstner, W. (1997b): Parameter free information preserving filter and noise variance equalization, Technical Report, Institut für Photogrammetrie, Universität Bonn 1997
Förstner, W. (1998): On the theoretical accuracy of multi image matching, restoration and triangulation, Festschrift zum 65. Geburtstag von Prof. Dr.-Ing. mult. G. Konecny, Institut für Photogrammetrie, Universität Hannover 1998
Förstner, W. (1999a): On estimating rotations, in: Heipke, C. and Mayer, H. (eds.): Festschrift für Prof. Dr.-Ing. Heinrich Ebner zum 60. Geburtstag, Lehrstuhl für Photogrammetrie und Fernerkundung, TU München 1999
Förstner, W. (1999b): Uncertain neighborhood relations of point sets and fuzzy Delaunay triangulation, Proceedings of DAGM Symposium Mustererkennung, Bonn, Germany 1999
Förstner, W. (1999c): Choosing constraints for points and lines in image triplets, Technical Report, Institut für Photogrammetrie, University Bonn 1999
Förstner, W. (1999d): Interior orientation from three vanishing points, Technical Report, Institut für Photogrammetrie, Universität Bonn 1999
Förstner, W. (1999e): Tensor-based distance transform, Technical Report, Institut für Photogrammetrie, Universität Bonn 1999
Förstner, W. (2000): Optimally reconstructing the geometry of image triplets, in: Vernon, D. (ed.): Computer Vision - ECCV 2000, 2000, S. 669-684
Förstner, W. (2001a): Algebraic projective geometry and direct optimal estimation of geometric entities, Annual OeAGM 2001 Workshop
Förstner, W. (2001b): Direct optimal estimation of geometric entities using algebraic projective geometry, Festschrift anlässlich des 60. Geburtstages von Prof. Dr.-Ing. Bernhard Wrobel, Technische Universität Darmstadt 2001, S. 69-87
Förstner, W. (2002): Computer vision and photogrammetry - mutual questions: geometry, statistics and cognition, Bildteknik/Image Science, Swedish Society for Photogrammetry and Remote Sensing, 2002, S. 151-164
Förstner, W. (2003): Notions of scale in geosciences, in: Neugebauer, H.J. and Simmer, C. (eds.): Dynamics of Multi-Scale Earth Systems, 2003, S. 17-39
Förstner, W. (2005): Uncertainty and projective geometry, in: Bayro Corrochano, E. (ed.): Handbook of Geometric Computing, 2005, S. 493-535
Förstner, W. (2009): Computer vision and remote sensing - lessons learned, in: Fritsch, D. (ed.): Photogrammetric Week 2009, Heidelberg 2009, S. 241-249
Förstner, W. and Schroth, R. (1981): On the estimation of covariance matrices for photogrammetric image coordinates, VI: Adjustment procedures of geodetic networks, 43-70
Förstner, W. and Wrobel, B. (1986): Digitale Photogrammetrie, Springer 1986
Förstner, W. and Gülch, E. (1987): A fast operator for detection and precise location of distinct points, corners and centres of circular features, Proceedings of the ISPRS Intercommission Workshop 1987
Förstner, W. and Vosselmann, G. (1988): The precision of a digital camera, ISPRS 16th Congress, Kyoto 1988, S. 148-157
Förstner, W. and Braun, C. (1993): Direkte Schätzung von Richtungen und Rotationen, Institut für Photogrammetrie, Bonn 1993
Förstner, W. and Shao, J. (1994): Gabor wavelets for texture edge extraction, ISPRS Commission III Symposium on Spatial Information from Digital Photogrammetry and Computer Vision, Munich, Germany 1994
Förstner, W. and Brügelmann, R. (1994): ENOVA - estimation of signal dependent noise variance, 3rd ECCV, Stockholm 1994
Förstner, W. and Moonen, B. (1999): A metric for covariance matrices, in: Krumm, F. and Schwarze, V.S. (eds.): Quo vadis geodesia...?, Festschrift for Erik W. Grafarend on the occasion of his 60th birthday, TR Dept. of Geodesy and Geoinformatics, Stuttgart University 1999
Förstner, W. and B. Moonen (2003): A metric for covariance matrices, in: E. Grafarend, F. Krumm and V. Schwarze: Geodesy - the Challenge of the 3rd Millennium, 299-309, Springer-Verlag, Heidelberg Berlin New York 2003
Förstner, W., Dickscheid, T. and Schindler, F. (2009): Detecting interpretable and accurate scale-invariant keypoints, 12th IEEE International Conference on Computer Vision (ICCV'09), Kyoto, Japan 2009, S. 2256-2263
Forsgren, A. and W. Murray (1997): Newton methods for large-scale linear inequality-constrained minimization, SIAM J. Optim., to appear
Forsythe, A.R. (1960): Calculus of variations, New York 1960
Forsythe, A.B. (1972): Robust estimation of straight line regression coefficients by minimizing p-th power deviations, Technometrics 14 (1972), 159-166
Foster, L.V. (2003): Solving rank-deficient and ill-posed problems using UTV and QR factorizations, SIAM J. Matrix Anal. Appl. 25 (2003), 582-600
Fotiou, A. and D. Rossikopoulos (1993): Adjustment, variance component estimation and testing with the affine and similarity transformations, Zeitschrift für Vermessungswesen 118 (1993), 494-503
Fotiou, A. (1998): A pair of closed expressions to transform geocentric to geodetic coordinates, Zeitschrift für Vermessungswesen 123 (1998), 133-135
Fotopoulos, G. (2003): An analysis on the optimal combination of geoid, orthometric and ellipsoidal height data, Report No. 20185, Department of Geomatics Engineering, University of Calgary, Calgary, Canada 2003
Fotopoulos, G. (2005): Calibration of geoid error models via a combined adjustment of ellipsoidal, orthometric and gravimetric geoid height data, J. Geodesy 79 (2005), 111-123
Foucart, T. (1999): Stability of the inverse correlation matrix. Partial ridge regression, J. Statist. Planning and Inference 77 (1999), 141-154
Fox, A.J. (1972): Outliers in time series, J. Royal Stat. Soc., Series B, 34 (1972), 350-363
Fox, M. and H. Rubin (1964): Admissibility of quantile estimates of a single location parameter, Ann. Math. Statist. 35 (1964), 1019-1031
Franses, P.H. (1998): Time series models for business and economic forecasting, Cambridge University Press, Cambridge 1998
Fraser, D.A.S. (1963): On sufficiency and the exponential family, J. Roy. Statist. Soc. 25 (1963), 115-123
Fraser, D.A.S. (1968): The structure of inference, J. Wiley, New York 1968
Fraser, D.A.S. and I. Guttman (1963): Tolerance regions, Ann. Math. Statist. 27 (1957), 162-179
Freeman, R.A. and P.V. Kokotovic (1996): Robust nonlinear control design, Birkhäuser-Verlag, Basel Boston Berlin 1996
Freund, P.G.O. (1974): Local scale invariance and gravitation, Annals of Physics 84 (1974), 440-454
Frey, M. and J.C. Kern (1997): The Pitman closeness of a class of scaled estimators, The American Statistician 51 (1997), 151-154
Frisch, U. (1995): Turbulence, the legacy of A.N. Kolmogorov, Cambridge University Press, Cambridge 1995
Fristedt, B. and L. Gray (1995): A modern approach to probability theory, Birkhäuser-Verlag, Basel Boston Berlin 1997
Frobenius, F.G. (1893): Gedächtnisrede auf Leopold Kronecker (1893), in: Ferdinand Georg Frobenius, Gesammelte Abhandlungen, ed. J.-P. Serre, Band III, 705-724, Springer-Verlag, Heidelberg Berlin New York 1968
Frobenius, G. (1908): Über Matrizen aus positiven Elementen, Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften von Berlin, 471-476, Berlin 1908
Fröhlich, H. and Hansen, H.H. (1976): Zur Lotfußpunktrechnung bei rotationsellipsoidischer Bezugsfläche, Allgemeine Vermessungs-Nachrichten 83 (1976), 175-179
Fuentes, M. (2002): Interpolation of nonstationary air pollution processes: a spatial spectral approach, Statistical Modelling 2 (2002), 281-298
Fuentes, M. and Smith, R. (2001): A new class of nonstationary models, Tech. report, North Carolina State University, Institute of Statistics 2001
Fuentes, M., Chaudhuri, A. and Holland, D.M. (2007): Bayesian entropy for spatial sampling design of environmental data, J. Environ. Ecol. Stat. 14 (2007), 323-340
Fujikoshi, Y. (1980): Asymptotic expansions for the distributions of sample roots under non-normality, Biometrika 67 (1980), 45-51
Fukushima, T. (1999): Fast transform from geocentric to geodetic coordinates, Journal of Geodesy 73 (1999), 603-610
Fuller, W.A. (1987): Measurement error models, Wiley, New York 1987
Fulton, T., Rohrlich, F. and L. Witten (1962): Conformal invariance in physics, Reviews of Modern Physics 34 (1962), 442-457
Furno, M. (1997): A robust heteroskedasticity consistent covariance matrix estimator, Statistics 30 (1997), 201-219
Galil, Z. (1985): Computing d-optimum weighing designs: where statistics, combinatorics, and computation meet, in: Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II, eds. L.M. LeCam and R.A. Olshen, Wadsworth 1985
Gallant, A.R. (1987): Nonlinear statistical models, J. Wiley, New York 1987
Gallavotti, G. (1999): Statistical mechanics: a short treatise, Springer-Verlag, Heidelberg Berlin New York 1999
Gander, W. (1981): Least squares with a quadratic constraint, Numer. Math. 36 (1981), 291-307
Gander, W., Golub, G.H. and Strebel, R. (1994): Least-squares fitting of circles and ellipses, BIT 34 (1994), 558-578
Gandin, L.S. (1963): Objective analysis of meteorological fields, Gidrometeorologicheskoe Izdatel'stvo (GIMIZ), Leningrad 1963
Gao, X.S. and Chou, S.C. (1990): Implicitization of rational parametric equations, Technical Report, Department of Computer Science TR-90-34, University of Texas, Austin 1990
Gao, S. and T.M.F. Smith (1995): On the nonexistence of a global nonnegative minimum bias invariant quadratic estimator of variance components, Statistics and Probability Letters 25 (1995), 117-120
Gao, S. and T.M.F. Smith (1998): A constrained MINQU estimator of correlated response variance from unbalanced data in complex surveys, Statistica Sinica 8 (1998), 1175-1188
Smith (1998): A constrained MINQU estimator of correlated response variance from unbalanced dara in complex surveys, Statistica Sinica 8 (1998), 1175-1188
Gao, Y., Lahaye, F., Heroux, P., Liao, X., Beck, N. and M. Olynik (2001): Modeling and estimation of C1-P1 bias in GPS receivers, Journal of Geodesy 74 (2001), 621-626
García-Escudero, L.A., Gordaliza, A. and C. Matrán (1997): k-medians and trimmed k-medians, Student 2 (1997), 139-148
Garcia-Ligero, M.J., Hermoso, A. and J. Linares (1998): Least squared estimation for distributed parameter systems with uncertain observations, Part 1: linear prediction and filtering, Applied Stochastic Models and Data Analysis 14 (1998), 11-18
Garderen, K.J. van (1999): Exact geometry of autoregressive models, Journal of Time Series Analysis 20 (1999), 1-21
Gaspari, G. and Cohn, S.E. (1999): Construction of correlation functions in two and three dimensions, Q. J. R. Meteorol. Soc. 125 (1999), 723-757
Gastwirth, J.L. and H. Rubin (1975): The behaviour of robust estimators on dependent data, Ann. Stat. 3 (1975), 1070-1100
Gatti, M. (2004): An empirical method of estimation of the variance-covariance matrix in GPS network design, Survey Review 37 (2004), 531-541
Gauss, C.F. (1809): Theoria motus corporum coelestium, Lib. 2, Sec. III, Perthes u. Besser Publ., 205-224, Hamburg 1809
Gauss, C.F. (1816): Bestimmung der Genauigkeit der Beobachtungen, Z. Astronomie 1 (1816), 185-197
Gauss, C.F. (1923): Anwendung der Wahrscheinlichkeitsrechnung auf eine Aufgabe der praktischen Geometrie, Gauss Works IV, 1923
Gautschi, W. (1982): On generating orthogonal polynomials, SIAM Journal on Scientific and Statistical Computing 3 (1982), 289-317
Gautschi, W. (1985): Orthogonal polynomials - constructive theory and applications, J. Comput. Appl. Math. 12/13 (1985), 61-76
Gautschi, W. (1997): Numerical analysis - an introduction, Birkhäuser-Verlag, Basel Boston Berlin 1997
Geisser, S. (1975): The predictive sample reuse method with applications, J. Math. Statist. 28 (1975), 385-394
Gelb, A. (ed.) (1974): Applied optimal estimation, MIT Press, Cambridge, Mass. and London 1974
Gelfand, A.E. and D.K. Dey (1988): Improved estimation of the disturbance variance in a linear regression model, J. Econometrics 39 (1988), 387-395
Gelfand, I.M., Kapranov, M.M. and Zelevinsky, A.V. (1990): Generalized Euler integrals and A-hypergeometric functions, Advances in Mathematics 84 (1990), 255-271
Gelfand, I.M., Kapranov, M.M. and Zelevinsky, A.V. (1994): Discriminants, resultants and multidimensional determinants, Birkhäuser, Boston 1994
Gelderen, M. van and R. Rummel (2001): The solution of the general geodetic boundary value problem by least squares, Journal of Geodesy 75 (2001), 1-11
Gelman, A., Carlin, J.B., Stern, H.S. and D.B. Rubin (1995): Bayesian data analysis, Chapman and Hall, London 1995
Genton, M.G. (1998): Asymptotic variance of M-estimators for dependent Gaussian random variables, Statistics and Probability Letters 38 (1998), 255-261
Genton, M.G. and Y. Ma (1999): Robustness properties of dispersion estimators, Statistics & Probability Letters 44 (1999), 343-350
Genugten, B.B. van der (1997): Testing in the restricted linear model using canonical partitions, Linear Algebra Appl. 264 (1997), 349-353
Gephart, J. and Forsyth, D. (1984): An improved method for determining the regional stress tensor using earthquake focal mechanism data: application to the San Fernando earthquake sequence, J. Geophys. Res. B89 (1984), 9305-9320
Gere, G.M. and W. Weaver (1965): Matrix algebra for engineers, Van Nostrand Co., New York 1965
Ghosh, M., Mukhopadhyay, N. and P.K. Sen (1997): Sequential estimation, J. Wiley, New York 1997
Ghosh, S. (1996): Wishart distribution via induction, The American Statistician 50 (1996), 243-246
Ghosh, S. (ed.) (1999): Multivariate analysis, design of experiments, and survey sampling, Marcel Dekker, New York 1999
Ghosh, S., Beran, J. and J. Innes (1997): Nonparametric conditional quantile estimation in the presence of long memory, Student 2 (1997), 109-117
Giacolone, M. (1997): Lp-norm estimation for nonlinear regression models, Student 2 (1997), 119-130
Giering, O. (1984): Bemerkungen zum ebenen Trilaterationsproblem, Deutsche Geodätische Kommission, Reihe A, Nr. 101
Giering, O. (1986): Analytische Behandlung des räumlichen Trilaterationsproblems 4, 6, 0, o., Deutsche Geodätische Kommission, Reihe A, Nr. 104
Gil, A. and J. Segura (1998): A code to evaluate prolate and oblate spheroidal harmonics, Computer Physics Communications 108 (1998), 267-278
Gilbert, E.G. and C.P. Foo (1990): Computing the distance between general convex objects in three-dimensional space, IEEE Transactions on Robotics and Automation 6 (1990), 53-61
Gilchrist, W. (1976): Statistical forecasting, J. Wiley, London 1976
Gill, P.E., Murray, W. and Wright, M.H. (1981): Practical optimization, Academic Press, London 1981
Gill, P.E., Murray, W. and Saunders, M.A. (2002): SNOPT: an SQP algorithm for large scale constrained optimization, SIAM J. Optim. 12 (2002), 979-1006
Gillard, D., Wyss, M. and J. Nakata (1992): A seismotectonic model for Western Hawaii based on stress tensor inversion from fault plane solutions, J. Geophys. Res. B97 (1992), 6629-6641
Giri, N. (1977): Multivariate statistical inference, Academic Press, New York London 1977
Giri, N. (1993): Introduction to probability and statistics, 2nd edition, Marcel Dekker, New York 1993
Giri, N. (1996a): Multivariate statistical analysis, Marcel Dekker, New York 1996
Giri, N. (1996b): Group invariance in statistical inference, World Scientific, Singapore 1996
Girko, V.L. (1979): Distribution of eigenvalues and eigenvectors of hermitian stochastic matrices, Ukrain. Math. J. 31 (1979), 421-424
Girko, V.L. (1985): Spectral theory of random matrices, Russian Math. Surveys 40 (1985), 77-120
Girko, V.L. (1988): Spectral theory of random matrices, Nauka, Moscow 1988
Girko, V.L. (1990): Theory of random determinants, Kluwer Academic Publishers, Dordrecht 1990
Girko, V.L. and A.K. Gupta (1996): Multivariate elliptically contoured linear models and some aspects of the theory of random matrices, in: Multidimensional statistical analysis and theory of random matrices, Proceedings of the Sixth Lukacs Symposium, eds. Gupta, A.K. and V.L. Girko, 327-386, VSP, Utrecht 1996
Glatzer, E. (1999): Über Versuchsplanungsalgorithmen bei korrelierten Beobachtungen, Master's thesis, Wirtschaftsuniversität Wien 1999
Gleick, J. (1987): Chaos, Viking, New York 1987
Gleser, L.J. and I. Olkin (1972): Estimation for a regression model with an unknown covariance matrix, in: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 541-568, University of California Press, Berkeley 1972
Glimm, J. (1960): On a certain class of operator algebras, Trans. American Mathematical Society 95 (1960), 318-340
Gnedenko, B.V. and A.N. Kolmogorov (1968): Limit distributions for sums of independent random variables, Addison-Wesley Publ., Reading, Mass. 1968
Gnedin, A.V. (1993): On multivariate extremal processes, J. Multivar. Anal. 46 (1993), 207-213
Gnedin, A.V. (1994): On a best choice problem with dependent criteria, J. Applied Probability 31 (1994), 221-234
Gneiting, T. (1999): Correlation functions for atmospheric data analysis, Q. J. R. Meteorol. Soc. 125 (1999), 2449-2464
Gneiting, T. and Sasvári, Z. (1999): The characterization problem for isotropic covariance functions, Math. Geology 31 (1999), 105-111
Gnot, S. and Michalski, A. (1994): Tests based on admissible estimators in two variance components models, Statistics 25 (1994), 213-223
Gnot, S. and Trenkler, G. (1996): Nonnegative quadratic estimation of the mean squared errors of minimax estimators in the linear regression model, Acta Applicandae Mathematicae 43 (1996), 71-80
Godambe, V.P. (1991): Estimating functions, Oxford University Press 1991
Godambe, V.P. (1955): A unified theory of sampling from finite populations, J. Roy. Statist. Soc. B17 (1955), 268-278
Göbel, M. (1998): A constructive description of SAGBI bases for polynomial invariants of permutation groups, J. Symbolic Computation 26 (1998), 261-272
Goldberger, A.S. (1962): Best linear unbiased prediction in the generalized linear regression model, J. Am. Statist. Ass. 57 (1962), 369-375
Goldberger, A.S. (1964): Econometric theory, J. Wiley, New York 1964
Goldberg, M.L., Bates, D.M. and Watts, D.G. (1983): Simplified methods for assessing nonlinearity, American Statistical Association/Business and Economic Statistics Section (1983), 67-74
Goldie, C.M. and S. Resnick (1989): Records in a partially ordered set, Annals Probability 17 (1989), 678-689
Goldie, C.M. and S. Resnick (1995): Many multivariate records, Stochastic Processes Appl. 59 (1995), 185-216
Goldie, C.M. and S. Resnick (1996): Ordered independent scattering, Commun. Statist. Stochastic Models 12 (1996), 523-528
Goldman, A.J. and Zelen, M. (1964): Weak generalized inverses and minimum variance linear unbiased estimation, J. Res. Natl. Bur. Stand., Sect. B 68 (1964), 151-172
Goldstine, H. (1977): A history of numerical analysis from the 16th through the 19th century, Springer-Verlag, Heidelberg Berlin New York 1977
Golshtein, E.G. and N.V. Tretyakov (1996): Modified Lagrangians and monotone maps in optimization, J. Wiley, New York 1996
Golub, G.H. (1968): Least squares, singular values and matrix approximations, Aplikace Matematiky 13 (1968), 44-51
Golub, G.H. (1973): Some modified matrix eigenvalue problems, SIAM Review 15 (1973), 318-334
Golub, G.H. and C.F. van Loan (1983): Matrix computations, North Oxford Academic, Oxford 1983
Golub, G.H. and C.F. van Loan (1996): Matrix computations, 3rd edition, Johns Hopkins University Press, Baltimore 1996
Golub, G.H. and W. Kahan (1965): Calculating the singular values and pseudo-inverse of a matrix, SIAM J. Numer. Anal. 2 (1965), 205-224
Golub, G.H. and U. von Matt (1991): Quadratically constrained least squares and quadratic problems, Numer. Math. 59 (1991), 561-580
Golub, G.H., Hansen, P.C. and O'Leary, D.P. (1999): Tikhonov regularization and total least squares, SIAM J. Matrix Anal. Appl. 21 (1999), 185-194
Gómez, E., Gómez-Villegas, M.A. and Marín, J.M. (1998): A multivariate generalization of the power exponential family of distributions, Commun. Statist. - Theory Meth. 27 (1998), 589-600
Gonin, R. and A.H. Money (1987): A review of computational methods for solving the nonlinear L1 norm estimation problem, in: Statistical data analysis based on the L1 norm and related methods, ed. Y. Dodge, North Holland 1987
Gonin, R. and A.H. Money (1989): Nonlinear Lp-norm estimation, Marcel Dekker, New York 1989
Goodall, C. (1991): Procrustes methods in the statistical analysis of shape, J. Roy. Stat. Soc. B53 (1991), 285-339
Goodman, J.W. (1985): Statistical optics, J. Wiley, New York 1985
Gordon, A.D. (1997): L1-norm and L2-norm methodology in cluster analysis, Student 2 (1997), 181-193
Gordon, A.D. (1999): Classification, 2nd edition, Chapman and Hall, New York 1999
Gordon, L. and M. Hudson (1977): A characterization of the von Mises distribution, Ann. Statist. 5 (1977), 813-814
Gordon, R. and Stein, S. (1992): Global tectonics and space geodesy, Science 256 (1992), 333-342
Gordonova, V.I. (1973): The validation of algorithms for choosing the regularization parameter, Zh. vychisl. Mat. mat. Fiz. 13 (1973), 1328-1332
Gorman, O.T.W. (2001): Adaptive estimation using weighted least squares, Aust. N. Z. J. Stat. 43 (2001), 287-297
Gosset, W.S. (1908): The probable error of a mean, Biometrika 6 (1908), 1-25
Gotthardt, E. (1940): Zur Unbestimmtheit des räumlichen Rückwärtseinschnittes, Mitteilungen der Ges. f. Photogrammetrie e.V., Jänner 1940, Heft 5
Gotthardt, E. (1971): Grundsätzliches zur Fehlertheorie und Ausgleichung von Polygonzügen und Polygonnetzen, Wichmann Verlag, Karlsruhe 1971
Gotthardt, E. (1974): Ein neuer gefährlicher Ort zum räumlichen Rückwärtseinschneiden, Bildm. u. Luftbildw. 1974
Gotthardt, E. (1978): Einführung in die Ausgleichungsrechnung, 2nd ed., Wichmann, Karlsruhe 1978
Gould, A.L. (1969): A regression technique for angular variates, Biometrics 25 (1969), 683-700
Gower, J.C. (1974): The median center, Applied Statistics 23 (1974), 466-470
Grabisch, M. (1998): Fuzzy integral as a flexible and interpretable tool of aggregation, in: Aggregation and Fusion of Imperfect Information, ed. B. Bouchon-Meunier, 51-72, Physica Verlag, Heidelberg 1998
Grafarend, E.W.: a) DGK C 153, München 1970; b) AVN 77, 17 (1970); c) Vermessungstechnik 19, 66 (1971); d) ZfV 96, 41 (1971); e) ZAMM, im Druck
Grafarend, E.W. (1967a): Bergbaubedingte Deformation und ihr Deformationstensor, Bergbauwissenschaften 14 (1967), 125-132
Grafarend, E.W. (1967b): Allgemeiner Fehlertensor bei a priori und a posteriori-Korrelationen, Zeitschrift für Vermessungswesen 92 (1967), 157-165
Grafarend, E.W. (1969): Helmertsche Fußpunktkurve oder Mohrscher Kreis?, Allgemeine Vermessungsnachrichten 76 (1969), 239-240
Grafarend, E.W. (1970a): Prädiktion und Punktmasse, AVN 77 (1970), S. 17
Grafarend, E.W. (1970b): Verallgemeinerte Methode der kleinsten Quadrate für zyklische Variable, Zeitschrift für Vermessungswesen 4 (1970), 117-121
Grafarend, E.W. (1970c): Die Genauigkeit eines Punktes im mehrdimensionalen Euklidischen Raum, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften C 153, München 1970
Grafarend, E.W. (1970d): Fehlertheoretische Unschärferelation, Festschrift Professor Dr.-Ing. Helmut Wolf, 60. Geburtstag, Bonn 1970
Grafarend, E.W. (1971a): Mittlere Punktfehler und Vorwärtseinschneiden, Zeitschrift für Vermessungswesen 96 (1971), 41-54
Grafarend, E.W. (1971b): Isotropietests von Lotabweichungen Westdeutschlands, Z. Geophysik 37 (1971), 719-733
Grafarend, E.W. (1972a): Nichtlineare Prädiktion, Zeitschrift für Vermessungswesen 97 (1972), 245-255
Grafarend, E.W. (1972b): Isotropietests von Lotabweichungsverteilungen Westdeutschlands II, Z. Geophysik 38 (1972), 243-255
Grafarend, E.W. (1972c): Genauigkeitsmaße geodätischer Netze, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften A73, München 1972
Grafarend, E.W. (1973a): Nichtlokale Gezeitenanalyse, Mitt. Institut für Theoretische Geodäsie No. 13, Bonn 1973
Grafarend, E.W. (1973b): Optimales Design geodätischer Netze 1 (zus. P. Harland), Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften A74, München 1973
Grafarend, E.W. (1973c): Eine Lotabweichungskarte Westdeutschlands nach einem geodätisch konsistenten Kolmogorov-Wiener Modell, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Report A 82, München 1975
Grafarend, E.W. (1974): Optimization of geodetic networks, Bollettino di Geodesia e Scienze Affini 33 (1974), 351-406
Grafarend, E.W. (1975a): Three dimensional geodesy I. The holonomity problem, Zeitschrift für Vermessungswesen 100 (1975), 269-280
Grafarend, E.W. (1975b): Second order design of geodetic nets, Zeitschrift für Vermessungswesen 100 (1975), 158-168
Grafarend, E.W. (1976): Geodetic applications of stochastic processes, Physics of the Earth and Planetary Interiors 12 (1976), 151-179
Grafarend, E.W. (1977a): Geodäsie - Gaußsche oder Cartansche Flächengeometrie, Allgemeine Vermessungs-Nachrichten 84 (1977), Heft 4, 139-150
Grafarend, E.W. (1977b): Stress-strain relations in geodetic networks, Report 16, Geodetic Institute, University of Uppsala 1977
Grafarend, E.W. (1978): Operational geodesy, in: Approximation Methods in Geodesy, eds. H. Moritz and H. Sünkel, 235-284, H. Wichmann Verlag, Karlsruhe 1978
Grafarend, E.W. (1981): Die Beobachtungsgleichungen der dreidimensionalen Geodäsie im Geometrie- und Schwereraum. Ein Beitrag zur operationellen Geodäsie, Zeitschrift für Vermessungswesen 106 (1981), 411-429
Grafarend, E.W. (1982): Six lectures on geodesy and global geodynamics, in: Lecture Notes, eds. H. Moritz and H. Sünkel, Mitt. Geod. Inst. Techn. Univ. Graz 41 (1982), 531-685
Grafarend, E.W. (1983): Stochastic models for point manifolds, in: Mathematical models of geodetic/photogrammetric point determination with regard to outliers and systematic errors, ed. F.E. Ackermann, Report A98, 29-52, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, München 1983
Grafarend, E.W. (1984): Variance-covariance component estimation of Helmert type in the Gauss-Helmert model, Zeitschrift für Vermessungswesen 109 (1984), 34-44
Grafarend, E.W. (1985a): Variance-covariance component estimation, theoretical results and geodetic applications, Statistics and Decisions, Supplement Issue No. 2 (1985), 407-447
Grafarend, E.W. (1985b): Criterion matrices of heterogeneously observed three-dimensional networks, manuscripta geodaetica 10 (1985), 3-22
Grafarend, E.W. (1985c): Criterion matrices for deforming networks, in: Optimization and Design of Geodetic Networks, eds. E. Grafarend and F. Sanso, 363-428, Springer-Verlag, Heidelberg Berlin New York 1985
Grafarend, E.W. (1986a): Generating classes of equivalent linear models by nuisance parameter elimination - applications to GPS observations, manuscripta geodaetica 11 (1986), 262-271
Grafarend, E.W. (1986b): Three dimensional deformation analysis: global vector spherical harmonic and local element representation, Tectonophysics 130 (1986), 337-359
Grafarend, E.W. (1988): Azimuth transport and the problem of orientation within geodetic traverses and geodetic networks, Vermessung, Photogrammetrie, Kulturtechnik 86 (1988), 132-150
Grafarend, E.W. (1989a): Four lectures on special and general relativity, in: Theory of Satellite Geodesy and Gravity Field Determination, eds. F. Sanso and R. Rummel, Lecture Notes in Earth Sciences Nr. 25, 115-151, Springer-Verlag, Berlin New York 1989
Grafarend, E.W. (1989b): Photogrammetrische Positionierung, Festschrift Prof. Dr.-Ing. Dr. h.c. Friedrich Ackermann zum 60. Geburtstag, Institut für Photogrammetrie, Universität Stuttgart, Report 14, 45-55, Stuttgart 1989
Grafarend, E.W. (1990): Dreidimensionaler Vorwärtsschnitt, Zeitschrift für Vermessungswesen 115 (1990), 414-419
Grafarend, E.W. (1991a): Relativistic effects in geodesy, Report Special Study Group 4.119, International Association of Geodesy, in: Contributions to Geodetic Theory and Methodology, ed. F. Sanso, 163-175, Politecnico di Milano, Milano 1991
Grafarend, E.W. (1991b): The Frontiers of Statistical Scientific Theory and Industrial Applications (Volume II of the Proceedings of ICOSCO-I), American Sciences Press, 405-427, New York 1991
Grafarend, E.W. (1991c): Application of geodesy to engineering, IAG-Symposium No. 108, eds. K. Linkwitz, V. Eisele and H.J. Mönicke, Springer-Verlag, Berlin Heidelberg New York 1991
Grafarend, E.W. (1998): Helmut Wolf - das wissenschaftliche Werk / the scientific work, Heft A115, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, C.H. Beck'sche Verlagsbuchhandlung, 97 Seiten, München 1998
Grafarend, E.W. (2000): Mixed integer-real valued adjustment (IRA) problems, GPS Solutions 4 (2000), 31-45
Grafarend, E.W. and Kunz, J. (1965): Der Rückwärtseinschnitt mit dem Vermessungskreisel, Bergbauwissenschaften 12 (1965), 285-297
Grafarend, E.W. and E. Livieratos (1978): Rank defect analysis of satellite geodetic networks I: geometric and semi-dynamic mode, manuscripta geodaetica 3 (1978), 107-134
Grafarend, E.W. and K. Heinz (1978): Rank defect analysis of satellite geodetic networks II: dynamic mode, manuscripta geodaetica 3 (1978), 135-156
Grafarend, E.W. and P. Harland (1973): Optimales Design geodätischer Netze I, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Series A, C.H. Beck Verlag, Munich 1973
Grafarend, E.W. and B. Schaffrin (1974): Unbiased free net adjustment, Surv. Rev. XXII, 171 (1974), 200-218
Grafarend, E.W., Heister, H., Kelm, R., Kropff, H. and B. Schaffrin (1979): Optimierung geodätischer Messoperationen, 499 pages, H. Wichmann Verlag, Karlsruhe 1979
Grafarend, E.W. and G. Offermanns (1975): Eine Lotabweichungskarte Westdeutschlands nach einem geodätisch konsistenten Kolmogorov-Wiener Modell, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften A82, München 1975
Grafarend, E.W. and B. Schaffrin (1976): Equivalence of estimable quantities and invariants in geodetic networks, Zeitschrift für Vermessungswesen 101 (1976), 485-491
Grafarend, E.W., Schmitt, G. and B. Schaffrin (1976): Über die Optimierung lokaler geodätischer Netze (Optimal design of local geodetic networks), 7th course, High Precision Surveying Engineering (7. Int. Kurs für Ingenieurvermessung hoher Präzision), 29 Sept - 8 Oct 1976, Darmstadt 1976
Grafarend, E.W. and Richter, B. (1977): Generalized Laplace condition, Bull. Géod. 51 (1977), 287-293
Grafarend, E.W. and B. Richter (1978): Threedimensional geodesy II - the datum problem, Zeitschrift für Vermessungswesen 103 (1978), 44-59
Grafarend, E.W. and A. d'Hone (1978): Gewichtsschätzung in geodätischen Netzen, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften A88, München 1978
Grafarend, E.W. and B. Schaffrin (1979): Kriterion-Matrizen I - zweidimensionale homogene und isotrope geodätische Netze, ZfV 104 (1979), 133-149
Grafarend, E.W., I.I. Mueller, H.B. Papo and B. Richter (1979): Concepts for reference frames in geodesy and geodynamics: the reference directions, Bull. Géodésique 53 (1979), 195-213
Grafarend, E.W. and R.H. Rapp (1980): Advances in geodesy. Selected papers from Rev. Geophys. Space Phys., Richmond, Virg., Am. Geophys. Union, Washington 1980
Grafarend, E.W. and A. Kleusberg (1980): Expectation and variance component estimation of multivariate gyrotheodolite observations I, Allgemeine Vermessungs-Nachrichten 87 (1980), 129-137
Grafarend, E.W., Kleusberg, A. and B. Schaffrin (1980): An introduction to the variance-covariance component estimation of Helmert type, Zeitschrift für Vermessungswesen 105 (1980), 161-180
Grafarend, E.W. and B. Schaffrin (1982): Kriterion-Matrizen II: zweidimensionale homogene und isotrope geodätische Netze, Teil IIa: Relative cartesische Koordinaten, Zeitschrift für Vermessungswesen 107 (1982), 183-194, Teil IIb: Absolute cartesische Koordinaten, Zeitschrift für Vermessungswesen 107 (1982), 485-493
Grafarend, E.W., Knickemeyer, E.H. and B. Schaffrin (1982): Geodätische Datumtransformationen, Zeitschrift für Vermessungswesen 107 (1982), 15-25
Grafarend, E.W., Krumm, F. and B. Schaffrin (1985): Criterion matrices of heterogeneously observed threedimensional networks, manuscripta geodaetica 10 (1985), 3-22
Grafarend, E.W. and F. Sanso (1985): Optimization and design of geodetic networks, Springer-Verlag, Heidelberg Berlin New York 1985
Grafarend, E.W. and V. Müller (1985): The critical configuration of satellite networks, especially of Laser and Doppler type, for planar configurations of terrestrial points, manuscripta geodaetica 10 (1985), 131-152
Grafarend, E.W. and F. Krumm (1985): Continuous networks I, in: Optimization and Design of Geodetic Networks, eds. E. Grafarend and F. Sanso, 301-341, Springer-Verlag, Heidelberg Berlin New York 1985
Grafarend, E.W., Krumm, F. and B. Schaffrin (1986): Kriterion-Matrizen III: zweidimensionale homogene und isotrope geodätische Netze, Zeitschrift für Vermessungswesen 111 (1986), 197-207
Grafarend, E.W. and B. Schaffrin (1988): Von der statistischen zur dynamischen Auffassung geodätischer Netze, Zeitschrift für Vermessungswesen 113 (1988), 79-103
Grafarend, E.W. and B. Schaffrin (1989a): The planar trisection problem, Festschrift to Torben Krarup, eds. E. Kejlso, K. Poder and C.C. Tscherning, Geodaetisk Institut, Meddelelse No. 58, 149-172, Kobenhavn 1989
Grafarend, E.W. and B. Schaffrin (1989b): The geometry of nonlinear adjustment - the planar trisection problem, Festschrift to Torben Krarup, eds. E. Kejlso, K. Poder and C.C. Tscherning, Geodaetisk Institut, Meddelelse No. 58, 149-172, Kobenhavn 1989
Grafarend, E.W. and A. Mader (1989): A graph-theoretical algorithm for detecting configuration defects in triangular geodetic networks, Bull. Géodésique 63 (1989), 387-394
Grafarend, E.W., Lohse, P. and Schaffrin, B. (1989): Dreidimensionaler Rückwärtsschnitt, Zeitschrift für Vermessungswesen 114 (1989), 61-67, 127-137, 172-175, 225-234, 278-287
Grafarend, E.W. and B. Schaffrin (1991): The planar trisection problem and the impact of curvature on non-linear least-squares estimation, Comput. Stat. Data Anal. 12 (1991), 187-199
Grafarend, E.W. and Lohse, P. (1991): The minimal distance mapping of the topographic surface onto the (reference) ellipsoid of revolution, manuscripta geodaetica 16 (1991), 92-110
Grafarend, E.W. and B. Schaffrin (1993): Ausgleichungsrechnung in linearen Modellen, Brockhaus, Mannheim 1993
Grafarend, E.W. and Xu, P. (1994): Observability analysis of integrated INS/GPS system, Bollettino di Geodesia e Scienze Affini 103 (1994), 266-284
Grafarend, E.W., Krumm, F. and F. Okeke (1995): Curvilinear geodetic datum transformations, Zeitschrift für Vermessungswesen 120 (1995), 334-350
Grafarend, E.W. and P. Xu (1995): A multi-objective second-order optimal design for deforming networks, Geophys. Journal Int. 120 (1995), 577-589
Grafarend, E.W. and Keller, W. (1995): Setup of observational functionals in gravity space as well as in geometry space, manuscripta geodaetica 20 (1995), 301-325
Grafarend, E.W., Syffus, R. and You, R.J. (1995): Projective heights in geometry and gravity space, Allgemeine Vermessungs-Nachrichten 102 (1995), 382-402
Grafarend, E.W., Krarup, T. and R. Syffus (1996): An algorithm for the inverse of a multivariate homogeneous polynomial of degree n, Journal of Geodesy 70 (1996), 276-286
Grafarend, E.W. and G. Kampmann (1996): C10(3): the ten parameter conformal group as a datum transformation in three dimensional Euclidean space, Zeitschrift für Vermessungswesen 121 (1996), 68-77
Grafarend, E.W. and Shan, J. (1996): Closed-form solution of the nonlinear pseudo-ranging equations (GPS), Artificial Satellites, Planetary Geodesy No. 28, special issue on the 30th anniversary of the Department of Planetary Geodesy, Vol. 31, No. 3, 133-147, Warszawa 1996
Grafarend, E.W. and Shan, J. (1997a): Closed-form solution of P4P or the three-dimensional resection problem in terms of Möbius barycentric coordinates, Journal of Geodesy 71 (1997), 217-231
Grafarend, E.W. and Shan, J. (1997b): Closed form solution of the twin P4P or the combined three dimensional resection-intersection problem in terms of Möbius barycentric coordinates, Journal of Geodesy 71 (1997), 232-239
Grafarend, E.W. and Shan, J. (1997c): Estimable quantities in projective networks, Zeitschrift für Vermessungswesen, Part I, 122 (1997), 218-226, Part II, 122 (1997), 323-333
Grafarend, E.W. and Okeke, F. (1998): Transformation of conformal coordinates of type Mercator from global datum (WGS 84) to local datum (regional, national), Marine Geodesy 21 (1998), 169-180
Grafarend, E.W. and Syffus, R. (1998): Transformation of conformal coordinates of type Gauss-Krüger or UTM from local datum (regional, national, European) to global datum (WGS 84), Part 1: the transformation equations, Allgemeine Vermessungs-Nachrichten 105 (1998), 134-141
Grafarend, E.W. and Ardalan, A. (1999): World geodetic datum 2000, Journal of Geodesy 73 (1999), 611-623
Grafarend, E.W. and Awange, J.L. (2000): Determination of vertical deflections by GPS/LPS measurements, Zeitschrift für Vermessungswesen 125 (2000), 279-288
Grafarend, E.W. and J. Shan (2002): GPS Solutions: closed forms, critical and special configurations of P4P, GPS Solutions 5 (2002), 29-42
Grafarend, E.W. and J. Awange (2002): Algebraic solution of GPS pseudo-ranging equations, GPS Solutions 5 (2002), 20-32
Grafarend, E.W. and J. Awange (2002): Nonlinear adjustment of GPS observations of type pseudo-ranges, GPS Solutions 5 (2002), 80-93
Graham, A. (1981): Kronecker products and matrix calculus, J. Wiley, New York 1981
Gram, J.P. (1883): Über die Entwicklung reeller Functionen in Reihen mittelst der Methode der kleinsten Quadrate, J. Reine Angew. Math. 94 (1883), 41-73
Granger, C.W.J. and Hatanaka, M. (1964): Spectral analysis of economic time series, Princeton 1964
Granger, C.W.J. and P. Newbold (1986): Forecasting economic time series, 2nd ed., Academic Press, New York London 1986
Granger, C.W.J. and T. Teräsvirta (1993): Modelling nonlinear economic relationships, Oxford University Press, New York 1993
Graves, J.T. (1848): Transactions of the Irish Academy 21 (1848), 338
Graybill, F.A. (1954): On quadratic estimates of variance components, The Annals of Mathematical Statistics 25 (1954), 367-372
Graybill, F.A. (1961): An introduction to linear statistical models, McGraw-Hill, New York 1961
Graybill, F.A. (1983): Matrices with applications in statistics, 2nd ed., Wadsworth, Belmont 1983
Graybill, F.A. and R.A. Hultquist (1961): Theorems concerning Eisenhart's model II, The Annals of Mathematical Statistics 32 (1961), 261-269
Green, B. (1952): The orthogonal approximation of an oblique structure in factor analysis, Psychometrika 17 (1952), 429-440
Green, P.J. and B.W. Silverman (1993): Nonparametric regression and generalized linear models, Chapman and Hall, Boca Raton 1993
Green, P.E., Frosch, R.A. and Romney, C.F. (1965): Principles of an experimental large aperture seismic array, Proc. IEEE 53 (1965), 1821-1833
Green, P.E., Kelly, E.J. and Levin, M.J. (1966): A comparison of seismic array processing methods, Geophys. J. R. astr. Soc. 11 (1966), 67-84
Greenbaum, A. (1997): Iterative methods for solving linear systems, SIAM, Philadelphia 1997
Greenwood, J.A. and D. Durand (1955): The distribution of length and components of the sum of n random unit vectors, Ann. Math. Statist. 26 (1955), 233-246
Greenwood, P.E. and G. Hooghiemstra (1991): On the domain of an operator between supremum and sum, Probability Theory Related Fields 89 (1991), 201-210
Gregersen, S. (1992): Crustal stress regime in Fennoscandia from focal mechanisms, J. Geophys. Res. B97 (1992), 11821-11827
Grenander, U. (1981): Abstract inference, J. Wiley, New York 1981
Greub, W.H. (1967): Multilinear algebra, Springer-Verlag, Berlin 1967
Grewal, M.S., Weill, L.R. and Andrews, A.P. (2001): Global Positioning Systems, Inertial Navigation and Integration, John Wiley & Sons, New York 2001
Griffiths, D.F. and D.J. Higham (1997): Learning LaTeX, SIAM, Philadelphia 1997
Grigoriu, M. (2002): Stochastic calculus, Birkhäuser, Boston Basel Berlin 2002
Grimstad, A. and T. Mannseth (2000): Nonlinearity, scale and sensitivity for parameter estimation problems, SIAM J. Sci. Comput. 21 (2000), 2096-2113
Grodecki, J. (1999): Generalized maximum-likelihood estimation of variance components with inverted gamma prior, Journal of Geodesy 73 (1999), 367-374
Gröbner, W. (1968): Algebraische Geometrie I, Bibliographisches Institut, Mannheim 1968
Gröbner, W. (1970): Algebraische Geometrie II, Bibliographisches Institut, Mannheim 1970
Gröbner, W. and N. Hofreiter (eds.) (1973): Integraltafel, zweiter Teil: bestimmte Integrale, 5. verbesserte Auflage, Wien 1973
Gröchenig, K. (2001): Foundations of time-frequency analysis, Birkhäuser Verlag, Boston Basel Berlin 2001
Groenigen, J.W. van, Siderius, W. and Stein, A. (1999): Constrained optimisation of soil sampling for minimisation of the kriging variance, Geoderma 87 (1999), 239-259
Groß, J. (1996): On a class of estimators in the general Gauss-Markov model, Commun. Statist. Theory Meth. 25 (1996), 381-388
Groß, J. (1996): Estimation using the linear regression model with incomplete ellipsoidal restrictions, Acta Applicandae Mathematicae 43 (1996), 81-85
Groß, J. (1998): Statistical estimation by a linear combination of two given statistics, Statistics and Probability Letters 39 (1998), 379-384
Groß, J. (2003): Linear regression, Springer, Berlin 2003
Groß, J. and G. Trenkler (1997): When do linear transforms of ordinary least squares and Gauss-Markov estimator coincide?, Sankhya 59 (1997), 175-178
Groß, J., Trenkler, G. and E.P. Liski (1998): Necessary and sufficient conditions for superiority of misspecified restricted least squares regression estimator, J. Statist. Planning and Inference 71 (1998), 109-116
Groß, J. and Puntanen, S. (2000): Estimation under a general partitioned linear model, Linear Algebra Appl. 321 (2000), 131-144
Groß, J., Trenkler, G. and H.J. Werner (2001): The equality of linear transforms of the ordinary least squares estimator and the best linear unbiased estimator, Sankhya: The Indian Journal of Statistics 63 (2001), 118-127
Großmann, W. (1973): Grundzüge der Ausgleichungsrechnung, Springer-Verlag, Heidelberg Berlin New York 1973
Großman, N. (1980): Four lectures on geometry for geodesists, in: Proceedings of the International School of Advanced Geodesy, Erice (Italy), May 18 - June 2, 1978, eds. E.W. Grafarend and A. Marussi, published by the University FAF, Munich 1980
Groten, E. (1964): Symposium "Extension of the Gravity Anomalies to Unsurveyed Areas", Columbus (Ohio) 1964
Groten, E. (1967): Inst. f. Astr. Phys. Geod., Publ. No. 39, München 1967
Grubbs, F.E. (1973): Errors of measurement, precision, accuracy and the statistical comparison of measuring instruments, Technometrics 15 (1973), 53-66
Grubbs, F.E. and G. Beck (1972): Extension of sample sizes and percentage points for significance tests of outlying observations, Technometrics 14 (1972), 847-854
Gruen, A. and Kahmen, H. (eds.) (1989): Optical 3-D measurement techniques, Wichmann, Karlsruhe 1989
Gründig, L. (1975): Die Berechnung vorgespannter Seil- und Hängenetze unter der Berücksichtigung ihrer topologischen und physikalischen Eigenschaften und der Ausgleichungsrechnung, Sonderforschungsbereich 64, Universität Stuttgart, Mitteilungen 34/1975
Gründig, L. (1976): Ausgleichung großer geodätischer Netze: Bestimmung von Näherungskoordinaten, Ausgleichungstechniken und Fehlersuche, VII. Internationaler Kurs für Ingenieurvermessungen, Darmstadt 1976, 41-51
Grunert, J.A. (1841): Das Pothenotsche Problem in erweiterter Gestalt; nebst Bemerkungen über seine Anwendungen in der Geodäsie, Grunerts Archiv für Mathematik und Physik 1 (1841), 238-241
Guckenheimer, J., Myers, M. and Sturmfels, B. (1997): Computing Hopf bifurcations, SIAM J. Numerical Analysis 34 (1997), 1-21
Guenther, W.C. (1964): Another derivation of the non-central chi-square distribution, J. Am. Statist. Ass. 59 (1964), 957-960
Gui, Q. and J. Zhang (1998): Robust biased estimation and its applications in geodetic adjustments, Journal of Geodesy 72 (1998), 430-435
Gui, Q.M. and J.S. Liu (2000): Biased estimation in the Gauss-Markov model, Allgemeine Vermessungsnachrichten 107 (2000), 104-108
Gulliksen, H. and S.S. Wilks (1950): Regression tests for several samples, Psychometrika 15 (1950), 91-114
Gullikson, M. (1995a): The partial Procrustes problem - a first look, Report UMINF-95.11, Department of Computing Science, Umea University, Sweden 1995
Gullikson, M. (1995b): Algorithms for the partial Procrustes problem, Report 1995:27, Department of Industrial Technology, Mid Sweden University, S-891 18 Örnsköldsvik, Sweden 1995
Gullikson, M. and Söderkvist, I. (1995): Surface fitting and parameter estimation with nonlinear least squares, Optimization Methods and Software 5 (1995), 247-269
Gullikson, M., Soederkvist, I. and P.A. Wedin (1997): Algorithms for constrained and weighted nonlinear least-squares, SIAM J. Optim. 7 (1997), 208-224
Gullikson, M. and P.A. Wedin (2000): The use and properties of Tikhonov filter matrices, SIAM J. Matrix Anal. Appl. 22 (2000), 276-281
Gumbel, E.J., Greenwood, J.A. and D. Durand (1953): The circular normal distribution: theory and tables, J. Am. Statist. Ass. 48 (1953), 131-152
Gundlich, B. and Koch, K.R. (2002): Confidence regions for GPS baselines by Bayesian statistics, Journal of Geodesy 76 (2002), 55-62
Guolin, L. (2000): Nonlinear curvature measures of strength and nonlinear diagnosis, Allgemeine Vermessungsnachrichten 107 (2000), 109-111
Gupta, A.K. and D.G. Kabe (1997): Linear restrictions and two step multivariate least squares with applications, Statistics & Probability Letters 32 (1997), 413-416
Gupta, A.K. and D.K. Nagar (1998): Quadratic forms in disguised matrix T-variate, Statistics 30 (1998), 357-374
Gupta, S.S. (1963): Probability integrals of multivariate normal and multivariate t, Annals of Mathematical Statistics 34 (1963), 792-828
Gurtin, M.E. (1981): An introduction to continuum mechanics, Academic Press, New York 1981
Guttman, I. (1970): Statistical tolerance regions, Griffin, London 1970
Guttman, L. (1946): Enlargement methods for computing the inverse matrix, Ann. Math. Statist. 17 (1946), 336-343
Guttorp, P. and Sampson, P. (1994): Methods for estimating heterogeneous spatial covariance, in: Handbook of Statistics, Vol. 12, eds. Patil, G. and Rao, C., 661-689
Guu, S.M., Lur, Y.Y. and C.T. Pang (2001): On infinite products of fuzzy matrices, SIAM J. Matrix Anal. Appl. 22 (2001), 1190-1203
Haantjes, J. (1937): Conformal representations of an n-dimensional Euclidean space with a non-definite fundamental form on itself, in: Nederl. Akademie van Wetenschappen, Proc. Section of Sciences, Vol. 40, 700-705, Noord-Hollandsche Uitgeversmaatschappij, Amsterdam 1937
Haantjes, J. (1940): Die Gleichberechtigung gleichförmig beschleunigter Beobachter für die elektromagnetischen Erscheinungen, in: Nederl. Akademie van Wetenschappen, Proc. Section of Sciences, Vol. 43, 1288-1299, Noord-Hollandsche Uitgeversmaatschappij, Amsterdam 1940
Habermann, S.J. (1996): Advanced statistics, Volume I: description of populations, Springer-Verlag, Heidelberg Berlin New York 1996
Hadamard, J. (1893): Leçons sur la propagation des ondes et les équations de l'hydrodynamique, Paris 1893, reprint Chelsea Publ., New York 1949
Hadamard, J. (1899): Théorème sur les séries entières, Acta Math. 22 (1899), 1-28
Härdle, W., Liang, H. and J. Gao (2000): Partially linear models, Physica-Verlag, Heidelberg 2000
Hager, W.W. (1989): Updating the inverse of a matrix, SIAM Rev. 31 (1989), 221-239
Hager, W.W. (2000): Iterative methods for nearly singular linear systems, SIAM J. Sci. Comput. 22 (2000), 747-766
Hahn, M. and R. Bill (1984): Ein Vergleich der L1- und L2-Norm am Beispiel der Helmerttransformation, AVN 11-12 (1984), 440
Hahn, W. and P. Weibel (1996): Evolutionäre Symmetrietheorie, Wiss. Verlagsgesellschaft, Stuttgart 1996
Hajós, G. (1970): Einführung in die Geometrie, BSB B.G. Teubner, Leipzig 1970
Hald, A. (1998): A history of mathematical statistics from 1750 to 1930, J. Wiley, New York 1998
Hald, A. (2000): The early history of the cumulants and the Gram-Charlier series, International Statistical Review 68 (2000), 137-153
Halmos, P.R. (1946): The theory of unbiased estimation, Ann. Math. Statist. 17 (1946), 34-43
Hamilton, W.C. (1964): Statistics in physical science, The Ronald Press Company, New York 1964
Hammer, E. (1896): Zur graphischen Ausgleichung beim trigonometrischen Einschneiden von Punkten, Zeitschrift für Vermessungswesen 25 (1896), 611-636
Hammersley, J.M. (1950): On estimating restricted parameters, J. R. Statist. Soc. B12 (1950), 192
Hampel, F.R. (1971): A general qualitative definition of robustness, Ann. Math. Statist. 42 (1971), DOI 10.1214/aoms/1177693054
Hampel, F.R. (1973): Robust estimation: a condensed partial survey, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 27 (1973), 87-104
Hampel, F.R. (1974): The influence curve and its role in robust estimation, J. Am. Statist. Ass. 69 (1974), 383-393
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and W.A. Stahel (1986): Robust statistics, J. Wiley, New York 1986
Han, S. (1995): Ambiguity resolution techniques using integer least squares estimation for rapid static or kinematic positioning, in: Symposium Satellite Navigation Technology: 1995 and beyond, Brisbane, Australia 1995, 10 p
Han, S.C., Kwon, J.H. and Jekeli, C. (2001): Accurate absolute GPS positioning through satellite clock error estimation, Journal of Geodesy 75 (2001), 33-43
Han, Y.Y. (1992): Geometric analysis of finite strain tensors in tectonodeformation mechanics, Sci. China B35 (1992), 102-116
Hanagal, D.D. (1996): UMPU tests for testing symmetry and stress-passing in some bivariate exponential models, Statistics 28 (1996), 227-239
Hand, D.J. and M.J. Crowder (1996): Practical longitudinal data analysis, Chapman and Hall, Boca Raton 1996
Hand, D.J., Daly, F., McConway, K., Lunn, D. and E. Ostrowski (1993): Handbook of small data sets, Chapman and Hall, Boca Raton 1993
Hand, D.J. and C.C. Taylor (1987): Multivariate analysis of variance and repeated measures, Chapman and Hall, Boca Raton 1987
Handl, A. (2010): Multivariate Analysemethoden. Theorie und Praxis multivariater Verfahren unter besonderer Berücksichtigung von S-PLUS, Springer-Verlag, Heidelberg Berlin New York 2010
Hanke, M. (1991): Accelerated Landweber iterations for the solution of ill-posed equations, Numer. Math. 60 (1991), 341-375
Hanke, M. and P.C. Hansen (1993): Regularization methods for large-scale problems, Surveys Math. Indust. 3 (1993), 253-315
Hanselman, D. and Littlefield, B. (1997): The student edition of Matlab, Prentice-Hall, New Jersey 1997
Hansen, E. (1992): Global optimization using interval analysis, Marcel Dekker, New York 1992
Hansen, P.C. (1987): The truncated SVD as a method for regularization, BIT 27 (1987), 534-553
Hansen, P.C. (1990): The discrete Picard condition for discrete ill-posed problems, BIT 30 (1990), 658-672
Hansen, P.C. (1990): Truncated singular value decomposition solutions to discrete ill-posed problems with ill-determined numerical rank, SIAM J. Sci. Statist. Comput. 11 (1990), 503-518
Hansen, P.C. (1994): Regularization tools: a Matlab package for analysis and solution of discrete ill-posed problems, Numer. Algorithms 6 (1994), 1-35
Hansen, P.C. (1995): Test matrices for regularization methods, SIAM J. Sci. Comput. 16 (1995), 506-512
Hansen, P.C. (1998): Rank-deficient and discrete ill-posed problems, SIAM, Philadelphia 1998
Hanssen, R.F. (2001): Radar interferometry, Kluwer Academic Publishers, Dordrecht Boston London 2001
Hanssen, R.F., Teunissen, P.J.G. and Joosten, P. (2001): Phase ambiguity resolution for stacked radar interferometric data, in: Proc. KIS2001, International Symposium on Kinematic Systems in Geodesy, Geomatics and Navigation, Banff 2001, 317-320
Haralick, R.M., Lee, C., Ottenberg, K. and Nölle, M. (1991): Analysis and solution of the three point perspective pose estimation problem, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 592-598, 1991
Haralick, R.M., Lee, C., Ottenberg, K. and Nölle, M. (1994): Review and analysis of solutions of the three point perspective pose estimation problem, International Journal of Computer Vision 13 (1994), 331-356
Hardtwig, E. (1968): Fehler- und Ausgleichsrechnung, Bibliographisches Institut, Mannheim 1968
Harley, B.I. (1954): A note on the probability integral of the correlation coefficient, Biometrika 41 (1954), 278-280
Harley, B.I. (1956): Some properties of an angular transformation for the correlation coefficient, Biometrika 43 (1956), 219-223
Harris, B. (1967): Spectral analysis of time series, New York 1967
Harter, H.L. (1964): Criteria for best substitute interval estimators with an application to the normal distribution, J. Am. Statist. Ass. 59 (1964), 1133-1140
Harter, H.L. (1974/75): The method of least squares and some alternatives (five parts), International Statistics Review 42 (1974), 147-174, 235-264; 43 (1975), 1-44, 125-190, 269-278
Harter, H.L. (1977): The non-uniqueness of absolute values regression, Commun. Statist. Simul. Comput. 6 (1977), 829-838
Hartley, H.O. and J.N.K. Rao (1967): Maximum likelihood estimation for the mixed analysis of variance model, Biometrika 54 (1967), 93-108
Hartinger, H. and Brunner, F.K. (1999): Variances of GPS phase observations: the SIGMA model, GPS Solutions 2/4 (1999), 35-43
Hartman, P. and G.S. Watson (1974): Normal distribution functions on spheres and the modified Bessel function, Ann. Prob. 2 (1974), 593-607
Hartmann, C., Van Keer Berghen, P., Smeyersverbeke, J. and D.L. Massart (1997): Robust orthogonal regression for the outlier detection when comparing two series of measurement results, Analytica Chimica Acta 344 (1997), 17-28
Hartung, J. (1978): Zur Verwendung von Vorinformation in der Regressionsanalyse, Tech. report, Inst. für Angew. Statistik, Universität Bonn, Germany 1978
Hartung, J. (1981): Non-negative minimum biased invariant estimation in variance component models, Annals of Statistics 9 (1981), 278-292
Hartung, J. (1982): Statistik, pp. 611-613, Oldenbourg Verlag, München 1982
Hartung, J. (1999): Ordnungserhaltende positive Varianzschätzer bei gepaarten Messungen ohne Wiederholungen, Allg. Statistisches Archiv 83 (1999), 230-247
Hartung, J. and B. Voet (1986): Best invariant unbiased estimators for the mean squared error of variance component estimators, J. Am. Statist. Ass. 81 (1986), 689-691
Hartung, J. and Elpelt, B. (1989): Multivariate Statistik, Oldenbourg Verlag, München 1989
Hartung, J., Elpelt, B. and K.H. Klösener (1995): Statistik, Oldenbourg Verlag, München 1995
Hartung, J. and K.H. Jöckel (1982): Zuverlässigkeits- und Wirtschaftlichkeitsüberlegungen bei Straßenverkehrssignalanlagen, Qualität und Zuverlässigkeit 27 (1982), 65-68
Hartung, J. and D. Kalin (1980): Zur Zuverlässigkeit von Straßenverkehrssignalanlagen, Qualität und Zuverlässigkeit 25 (1980), 305-308
Hartung, J. and H.J. Werner (1980): Zur Verwendung der restringierten Moore-Penrose-Inversen beim Testen von linearen Hypothesen, Z. Angew. Math. Mechanik 60 (1980), T344-T346
Harvey, A.C. (1993): Time series models, 2nd ed., Harvester Wheatsheaf, New York 1993
Harville, D.A. (1976): Extension of the Gauss-Markov theorem to include the estimation of random effects, Annals of Statistics 4 (1976), 384-395
Harville, D.A. (1977): Maximum likelihood approaches to variance component estimation and to related problems, J. Am. Statist. Ass. 72 (1977), 320-339
Harville, D.A. (1997): Matrix algebra from a statistician's perspective, Springer-Verlag, Heidelberg Berlin New York 1997
Harville, D.A. (2001): Matrix algebra: exercises and solutions, Springer-Verlag, Heidelberg Berlin New York 2001
Haslett, J. (1989): Space time modeling in meteorology - a review, Bull. Int. Stat. Inst. 53 (1989), 229-246
Hasselman, K., Munk, W. and MacDonald, G. (1963): Bispectrum of ocean waves, in: M. Rosenblatt (ed.), Time series analysis, New York 1963
Hassibi, A. and S. Boyd (1998): Integer parameter estimation in linear models with applications to GPS, IEEE Trans. on Signal Processing 46 (1998), 2938-2952
Haslett, J. and K. Hayes (1998): Residuals for the linear model with general covariance structure, J. Roy. Statist. Soc. B60 (1998), 201-215
Hastie, T.J. and R.J. Tibshirani (1990): Generalized additive models, Chapman and Hall, Boca Raton 1990
Hauser, M.A., Pötscher, B.M. and E. Reschenhofer (1999): Measuring persistence in aggregate output: ARMA models, fractionally integrated ARMA models and nonparametric procedures, Empirical Economics 24 (1999), 243-269
Hausdorff, F. (1901): Beiträge zur Wahrscheinlichkeitsrechnung, Königlich Sächsische Gesellschaft der Wissenschaften zu Leipzig, Berichte Math.-Phys. Classe 53 (1901), 152-178
Hawkins, D.M. (1993): The accuracy of elemental set approximation for regression, J. Am. Statist. Ass. 88 (1993), 580-589
Hayakawa, T. and Puri, M.L. (1985): Asymptotic distributions of likelihood ratio criterion for testing latent roots and latent vectors of a covariance matrix under an elliptical population, Biometrika 72 (1985), 331-338
Hayes, K. and J. Haslett (1999): Simplifying general least squares, American Statistician 53 (1999), 376-381
Hazlett, O. (1917): On the theory of associative division algebras, Trans. American Math. Soc. 18 (1917), 167-176
He, K. (1995): The robustness of bootstrap estimator of variance, J. Ital. Statist. Soc. 2 (1995), 183-193
He, X. (1991): A local breakdown property of robust tests in linear regression, J. Multivar. Anal. 38 (1991), 294-305
He, X., Simpson, D.G. and Portnoy, S.L. (1990): Breakdown robustness of tests, J. Am. Statist. Ass. 85 (1990), 446-452
Healy, D.M. (1998): Spherical deconvolution, J. Multivar. Anal. 67 (1998), 1-22
Heck, B. (1981): Der Einfluß einzelner Beobachtungen auf das Ergebnis einer Ausgleichung und die Suche nach Ausreißern in den Beobachtungen, Allgemeine Vermessungsnachrichten 88 (1981), 17-34
Heck, B. (1987): Rechenverfahren und Auswertemodelle der Landesvermessung, Wichmann Verlag, Karlsruhe 1987
Heideman, M.T., Johnson, D.H. and C.S. Burrus (1984): Gauss and the history of the fast Fourier transform, IEEE ASSP Magazine 1 (1984), 14-21
Heiligers, B. (1994): E-optimal designs in weighted polynomial regression, Ann. Stat. 22 (1994), 917-929
Hein, G. (1986): Integrated geodesy, in: Suenkel, H. (ed.), Mathematical and numerical techniques in physical geodesy, Lecture Notes in Earth Sciences Vol. 7, 505-548, Springer, Heidelberg 1986
Hein, G. and Landau, H. (1983): A contribution to 3D operational geodesy, Part 3: OPERA - a multi-purpose program for operational adjustment of geodetic observations of terrestrial type, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Report B 264, München 1983
Heine, V. (1955): Models for two-dimensional stationary stochastic processes, Biometrika 42 (1955), 170-178
Heinrich, L. (1985): Nonuniform estimates, moderate and large deviations in the central limit theorem for m-dependent random variables, Math. Nachr. 121 (1985), 107-121
Heitz, S. (1967): Habilitationsschrift, Bonn 1967
Hekimoglu, S. (1998): Application of equiredundancy design to M-estimation, Journal of Surveying Engineering 124 (1998), 103-124
Hekimoglu, S. and M. Berber (2003): Effectiveness of robust methods in heterogeneous linear models, Journal of Geodesy 76 (2003), 706-713
Helmert, F.R. (1875): Über die Berechnung des wahrscheinlichen Fehlers aus einer endlichen Anzahl wahrer Beobachtungsfehler, Z. Math. u. Physik 20 (1875), 300-303
Helmert, F.R. (1876): Diskussion der Beobachtungsfehler in Koppes Vermessung für die Gotthardtunnelachse, Z. Vermessungswesen 5 (1876), 129-155
Helmert, F.R. (1876): Die Genauigkeit der Formel von Peters zur Berechnung des wahrscheinlichen Beobachtungsfehlers direkter Beobachtungen gleicher Genauigkeit, Astronomische Nachrichten 88 (1876), 113-120
Helmert, F.R. (1876a): Über die Wahrscheinlichkeit der Potenzsummen der Beobachtungsfehler, Z. Math. u. Phys. 21 (1876), 192-218
Helmert, F.R. (1907): Die Ausgleichungsrechnung nach der Methode der kleinsten Quadrate, mit Anwendungen auf die Geodäsie, die Physik und die Theorie der Messinstrumente, B.G. Teubner, Leipzig Berlin 1907
Helmert, F.R. (1924): Die Ausgleichungsrechnung nach der Methode der kleinsten Quadrate, 3. Auflage, Teubner, Leipzig 1924
Henderson, H.V. (1981): The vec-permutation matrix, the vec operator and Kronecker products: a review, Linear and Multilinear Algebra 9 (1981), 271-288
Henderson, H.V. and S.R. Searle (1981): Vec and vech operators for matrices, with some uses in Jacobians and multivariate statistics
Henderson, H.V. and S.R. Searle (1981): On deriving the inverse of a sum of matrices, SIAM Review 23 (1981), 53-60
Henderson, H.V., Pukelsheim, F. and S.R. Searle (1983): On the history of the Kronecker product, Linear and Multilinear Algebra 14 (1983), 113-120
Hendriks, H. and Z. Landsman (1998): Mean location and sample mean location on manifolds: asymptotics, tests, confidence regions, J. Multivar. Anal. 67 (1998), 227-243
Hengst, M. (1967): Einführung in die Mathematische Statistik und ihre Anwendung, Bibliographisches Institut, Mannheim 1967
Henrici, P. (1962): Bounds for iterates, inverses, spectral variation and fields of values of non-normal matrices, Numer. Math. 4 (1962), 24-40
Hensel, K. (1968): Leopold Kroneckers Werke, Chelsea Publ. Comp., New York 1968
Herzberg, A.M. and A.V. Tsukanov (1999): A note on the choice of the best selection criterion for the optimal regression model, Utilitas Mathematica 55 (1999), 243-254
Hesse, K. (2003): Domain decomposition methods in multiscale geopotential determination from SST and SGG, Berichte aus der Mathematik, Shaker Verlag, Aachen 2003
Hetherington, T.J. (1981): Analysis of directional data by exponential models, Ph.D. Thesis, University of California, Berkeley 1981
Heuel, S. and Förstner, W. (2001): Matching, reconstructing and grouping 3D lines from multiple views using uncertain projective geometry, CVPR 2001, Posters Session 4, p. 721
Heyde, C.C. (1997): Quasi-likelihood and its application. A general approach to optimal parameter estimation, Springer-Verlag, Heidelberg Berlin New York 1997
Hickernell, F.J. (1999): Goodness-of-fit statistics, discrepancies and robust designs, Statistics and Probability Letters 44 (1999), 73-78
Hida, T. and Si, S. (2004): An innovation approach to random fields: application of white noise theory, World Scientific 2004
Higham, N.J. and F. Tisseur (2000): A block algorithm for matrix 1-norm estimation, with an application to 1-norm pseudospectra, SIAM J. Matrix Anal. Appl. 21 (2000), 1185-1201
Hinze, J.O. (1975): Turbulence, McGraw-Hill, New York 1975
Heikkinen, M. (1982): Geschlossene Formeln zur Berechnung räumlicher geodätischer Koordinaten aus rechtwinkligen Koordinaten, Zeitschrift für Vermessungswesen 107 (1982), 207-211
Hein, G. (1982a): A contribution to 3D operational geodesy (principles and observation equations of terrestrial type), DGK Reihe B, Heft Nr. 258/VII, 31-64
Hein, G. (1982b): A contribution to 3D operational geodesy, Part 2, DGK Reihe B, Heft Nr. 258/VII, 65-85
Heiskanen, W.A. and Moritz, H. (1976): Physical geodesy, Freeman and Company, London 1976
Hemmleb, G. (1986): Vor hundert Jahren wurde Friedrich Robert Helmert Direktor des Preussischen Geodätischen Institutes, Vermessungstechnik 34 (1986), 178
Hide, R. (2010): A path of discovery in geophysical fluid dynamics, Royal Astron. Soc., Astronomy and Geophysics 51 (2010), 4.16-4.23
Hinde, J. (1998): Overdispersion: models and estimation, Comput. Stat. & Data Anal. 27 (1998), 151-170
Hinkley, D. (1979): Predictive likelihood, Ann. Statist. 7 (1979), 718-728
Hinkley, D., Reid, N. and E.J. Snell (1990): Statistical theory and modelling, Chapman and Hall, Boca Raton 1990
Hirvonen, R.A.: a) Publ. Inst. Geod. Photogr. Cart., Ohio State University Report No. 4, 1956; b) Ann. Acad. Scient. 1962
Hirvonen, R.A. (1971): Adjustment by least squares in geodesy and photogrammetry, Frederick Ungar Publ. Co., New York 1971
Hirvonen, R.A. and Moritz, H. (1963): Practical computation of gravity at high altitudes, Institute of Geodesy, Photogrammetry and Cartography, Ohio State University, Report No. 27, Ohio 1963
Hjorth, J.S.U. (1993): Computer intensive statistical methods, Chapman and Hall, Boca Raton 1993
Ho, L.L. (1997): Regression models for bivariate counts, Brazilian J. Probability and Statistics 11 (1997), 175-197
Hoaglin, D.C. and R.E. Welsch (1978): The hat matrix in regression and ANOVA, The American Statistician 32 (1978), 17-22
Hodge, W.V.D. (1941): Theory and applications of harmonic integrals, Cambridge University Press, Cambridge 1941
Hodge, W.V.D. and D. Pedoe (1968): Methods of algebraic geometry, I, Cambridge University Press, Cambridge 1968
Hoel, P.G. (1965): Minimax distance designs in two dimensional regression, Ann. Math. Statist. 36 (1965), 1097-1106 Hoel, P.G., S.C. Port and C.J. Stone (1972): Introduction to stochastic processes, Houghton Mifflin Publ., Boston 1972 Hoerl, A.E. and Kennard, R.W. (1970): Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12 (1970), 55-67 Hoerl, A.E. and R.W. Kennard (2000): Ridge regression: biased estimation for nonorthogonal problems, Technometrics 42 (2000), 80-86 H¨opke, W. (1980): Fehlerlehre und Ausgleichungsrechnung, De Gruyter, Berlin 1980 Hoffmann, K. (1992): Improved estimation of distribution parameters: Stein-type estimators, Teubner-Texte zur Mathematik, Stuttgart/Leipzig 1992 Hofmann, B. (1986): Regularization for applied inverse and ill-posed problems, Teubner Texte zur Mathematik 85, Leipzig 1986 Hofman-Wellenhof, B., Lichtenegger, H. and Collins, J. (1994): Global Positioning System: Theory and practice, Springer-Verlag, Wien 1994. Hofmann-Wellenhof, B., Moritz, H. (2005): Physical Geodesy, Springer Wien, New York Hogg, R.V. (1972): Adaptive robust procedures: a partial review and some suggestions for future applications and theory, J. Am. Statist. Ass. 43 (1972), 1041-1067 Hogg, R.V. and R.H. Randles (1975): Adaptive distribution free regression methods and their applications, Technometrics 17 (1975), 399-407 Hong, C.S. and H.J. Choi (1997): On L1 regression coefficients, Commun. Statist. Simul. Comp. 26 (1997), 531-537 Hora, R.B. and R.J. Buehler (1965): Fiducial theory and invariant estimation, Ann. Math. Statist. 37 (1965), 643-656 Horaud, R., Conio, B. and Leboulleux, O. (1989): An Analytical Solution for the Perspective 4-Point Problem, Computer Vision, Graphics and Image Processing 47 (1989) 33-44. Horn, R.A. (1989): The Hadamard product, in Matrix Theory and Applications, C.R. Johnson, ed., Proc. Sympos. Appl. Math. 40 (1989), 87-169 Horn, R.A. and C.R. Johnson (1990): Matrix analysis, Cambridge University Press, Cambridge 1990 Horn, R.A. and C.R. Johnson (1991): Topics on Matrix analysis, Cambridge University Press, Cambridge 1991 Horn, S.D. and R.A. Horn (1975): Comparison of estimators of heteroscedastic variances in linear models, Am. Stat. Assoc., 70, 872-879. Hornoch, A.T. (1950): u¨ ber die Zur¨uckf¨uhrung der Methode der kleinsten Quadrate auf das ¨ Prinzip des arithmetischen Mittels, Osterr. Zeitschrift f¨ur Vermessungswesen 38 (1950), 13-18 Hoschek, J. and Lasser, D. (1992): Grundlagen der geometrischen Datenverarbeitung. B.G. Teubner, Stuttgart. Hosking, J.R.M. and J.R. Wallis (1997): Regional frequency analysis. An approach based on L-moments, Cambridge University Press 1997 Hosoda, Y. (1999): Truncated least-squares least norm solutions by applying the QR decomposition twice, trans. Inform. Process. Soc. Japan 40 (1999), 1051-1055 Hotelling, H. (1931): The generalization of Student’s ratio, Ann. Math. Stat., 2, 360-378. Hotelling, H. (1933): analysis of a complex of statistical variables into principal components, J. Educ. Psych., 24, 417-441, 498-520. Hotelling, H. (1951): A generalized T test and measure of multivariate dispersion, Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley and Los Angeles, 23-41. Hotelling, H. (1953): New light on the correlation coefficient and its transform, J. Roy. Statist. Soc. B15 (1953), 225-232 Householder, A.S. (1958): Unitary Triangulation of a Nonsymmetric Matrix. J. Assoc. Compo Mach., Vol. 5, 339. 
Howind, J., Böhringer, M., Mayer, M., Kutterer, H., Lindner, K. and Heck, B. (2000): Korrelationsstudien bei GPS-Phasenbeobachtungen, in: Dietrich, R. (ed.), Deutsche Beiträge zu GPS-Kampagnen des Scientific Committee on Antarctic Research (SCAR) 1995-1998, Deutsche Geodätische Kommission B310, 201-205, Munich 2000
Hoyle, M.H. (1973): Transformations - an introduction and a bibliography, Int. Statist. Review 41 (1973), 203-223
Hsu, J.C. (1996): Multiple comparisons, Chapman and Hall, Boca Raton 1996
Hsu, P.L. (1938a): Notes on Hotelling's generalized T, Ann. Math. Stat. 9 (1938), 231-243
Hsu, P.L. (1938b): On the best unbiassed quadratic estimate of the variance, Statist. Res. Mem., Univ. London 2 (1938), 91-104
Hsu, P.L. (1940a): On generalized analysis of variance (I), Biometrika 31 (1940), 221-237
Hsu, P.L. (1940b): An algebraic derivation of the distribution of rectangular coordinates, Proc. Edinburgh Math. Soc. 6 (1940), 185-189
Hsu, R. (1999): An alternative expression for the variance factors in using Iterated Almost Unbiased Estimation, Journal of Geodesy 73 (1999), 173-179
Hsu, Y.S., Metry, M.H. and Y.L. Tong (1999): Hypotheses testing for means of dependent and heterogeneous normal random variables, J. Statist. Planning and Inference 78 (1999), 89-99
Huang, J.S. (1999): Third-order expansion of mean squared error of medians, Statistics & Probability Letters 42 (1999), 185-192
Huber, P.J. (1964): Robust estimation of a location parameter, Annals Mathematical Statistics 35 (1964), 73-101
Huber, P.J. (1972): Robust statistics: a review, Annals Mathematical Statistics 43 (1972), 1041-1067
Huber, P.J. (1981): Robust statistics, J. Wiley, New York 1981
Huber, P.J. (1984): Finite sample breakdown of M- and P-estimators, Ann. Statist. 12 (1984), 119-126
Huber, P.J. (1985): Projection pursuit, Annals of Statistics 13 (1985), 435-475
Huckemann, S., Hotz, T. and A. Munk (2007): Intrinsic shape analysis: geodesic PCA for Riemannian manifolds modulo isometric Lie group actions, Statistica Sinica 2007
Huckemann, S. (2009): On the meaning of mean shape, Digital Signal Processing 16 (2009), 468-478
Huckemann, S. (2010): On the meaning of mean shape, arXiv preprint 1002.07950 (stat.ME), 2010
Huet, S., A. Bouvier, M.A. Gruet and E. Jolivet (1996): Statistical tools for nonlinear regression, Springer-Verlag, Heidelberg Berlin New York 1996
Huffel, van S. and H. Zha (1991a): The restricted total least squares problem: formulation, algorithm, and properties, SIAM J. Matrix Anal. Appl. 12 (1991), 292-309
Huffel, van S. and H. Zha (1991b): The total least squares problem, SIAM J. Matrix Anal. Appl. 12 (1991), 377-407
Huffel, van S. and H. Zha (1993): The total least squares problem, in: C.R. Rao (ed.), Handbook of Statistics, Vol. 2, pp. 337-408, North Holland, Amsterdam-London-New York-Tokyo 1993
Hugentobler, U., Dach, R. and Fridez, P. (2004): Bernese GPS Software Version 5.0, University of Bern 2004
Humak, K.M.S. (1977): Statistische Methoden der Modellbildung, Band I: Statistische Inferenz für lineare Parameter, Akademie-Verlag, Berlin 1977
Hunter, D.B. (1995): The evaluation of Legendre functions of the second kind, Numerical Algorithms 10 (1995), 41-49
Huwang, L. and Y.H.S. Huang (2000): On errors-in-variables in polynomial regression - Berkson case, Statistica Sinica 10 (2000), 923-936
Hwang, C. (1993): Fast algorithm for the formation of normal equations in a least-squares spherical harmonic analysis by FFT, manuscripta geodaetica 18 (1993), 46-52
Ibragimov, I.A. and R.Z. Khasminskii (1981): Statistical estimation: asymptotic theory, Springer-Verlag, Heidelberg Berlin New York 1981
ICD (2000): Interface Control Document, Navstar GPS Space Segment/Navigation User Interfaces, ICD-GPS-200C
Ihorst, G. and G. Trenkler (1996): A general investigation of mean square error matrix superiority in linear regression, Statistica 56 (1996), 15-23
Imhof, L. (2000): Exact designs minimizing the integrated variance in quadratic regression, Statistics 34 (2000), 103-115
Imhof, J.P. (1961): Computing the distribution of quadratic forms in normal variables, Biometrika 48 (1961), 419-426
Imon, A.H.M.R. (2002): On deletion residuals, Calcutta Stat. Assoc. Bull. 53 (2002), 65-79
Inda, de M.A. et al. (1999): Parallel fast Legendre transform, in: Proceedings of the ECMWF Workshop "Towards TeraComputing - the Use of Parallel Processors in Meteorology", World Scientific Publishing Co. 1999
Ireland, K. and Rosen, M. (1990): A classical introduction to modern number theory, Springer-Verlag, New York 1990
Irle, A. (1990): Sequentialanalyse: Optimale sequentielle Tests, Teubner Skripten zur Mathematischen Stochastik, Stuttgart 1990
Irwin, J.O. (1937): On the frequency distribution of the means of samples from a population having any law of frequency with finite moments, with special reference to Pearson's Type II, Biometrika 19 (1937), 225-239
Isham, V. (1993): Statistical aspects of chaos, in: O.E. Barndorff-Nielsen et al. (eds.), Networks and Chaos - Statistical and Probabilistic Aspects, 124-200, Chapman and Hall, London 1993
Ishibuchi, H., Nozaki, K. and Tanaka, H. (1992): Distributed representation of fuzzy rules and its application to pattern classification, Fuzzy Sets and Systems 52 (1992), 21-32
Ishibuchi, H., Nozaki, K., Yamamoto, N. and Tanaka, H. (1995): Selecting fuzzy if-then rules for classification problems using genetic algorithms, IEEE Transactions on Fuzzy Systems 3 (1995), 260-270
Ishibuchi, H. and Murata, T. (1997): Minimizing the fuzzy rule base and maximizing its performance by a multi-objective genetic algorithm, in: Sixth FUZZ-IEEE Conference, Barcelona 1997, 259-264
Ishimaru, A. (1978): Wave propagation and scattering in random media, Vol. 2, Academic Press, New York 1978
Ivanov, A.V. and S. Zwanzig (1981): An asymptotic expansion for the distribution of the least squares estimator of a vector parameter in nonlinear regression, Soviet Math. Dokl. 23 (1981), 118-121
Ivanov, A.V. and S. Zwanzig (1983): An asymptotic expansion of the distribution of the least squares estimator in the nonlinear regression model, Statistics 14 (1983), 7-27
Ivanov, A.V. and S. Zwanzig (1996): Saddlepoint approximation in nonlinear regression, Theory of Stochastic Processes 2 (18) (1996), 156-162
Izenman, A.J. (1975): Reduced-rank regression for the multivariate linear model, J. Multivar. Anal. 5 (1975), 248-264
Jacob, N. (1996): Pseudo-differential operators and Markov processes, Akademie Verlag, Berlin 1996
Jacobi, C.G.J. (1841): De formatione et proprietatibus determinantium, Crelle's J. reine angewandte Mathematik 22 (1841)
Jacod, J. and Protter, P. (2000): Probability essentials, Springer-Verlag, Heidelberg Berlin New York 2000
Jaeckel, L.A. (1972): Estimating regression coefficients by minimizing the dispersion of the residuals, Annals Mathematical Statistics 43 (1972), 1449-1458
Jajuga, K. (1995): On the principal components of time series, Statistics in Transition 2 (1995), 201-205
James, A.T. (1954): Normal multivariate analysis and the orthogonal group, Ann. Math. Statist. 25 (1954), 40-75
Jammalamadaka, S.R. and Sengupta, D. (1999): Changes in the general linear model: a unified approach, Linear Algebra Appl. 289 (1999), 225-242
Jammalamadaka, S.R. and SenGupta, A. (2001): Topics in circular statistics, World Scientific, Singapore 2001
Janacek, G. (2001): Practical time series, Arnold, London 2001
Jara, A., Quintana, F. and E. San Martín (2008): Linear mixed models with skew-elliptical distributions: a Bayesian approach, Computational Statist. Data Analysis 52 (2008), 5033-5045
Jeffreys, H. (1961): Cartesian tensors, Cambridge University Press
Jekeli, C. (2001): Inertial navigation systems with geodetic applications, Walter de Gruyter, Berlin New York 2001, 352 pp.
Jennison, C. and B.W. Turnbull (1997): Distribution theory of group sequential t, χ2 and F tests for general linear models, Sequential Analysis 16 (1997), 295-317
Jennrich, R.I. (1969): Asymptotic properties of nonlinear least squares estimators, Ann. Math. Statist. 40 (1969), 633-643
Jensen, J.L. (1981): On the hyperboloid distribution, Scand. J. Statist. 8 (1981), 193-206
Jiang, J. (1997): A derivation of BLUP - best linear unbiased predictor, Statistics & Probability Letters 32 (1997), 321-324
Jiang, J. (1999): On unbiasedness of the empirical BLUE and BLUP, Statistics & Probability Letters 41 (1999), 19-24
Jiang, J., Jia, H. and H. Chen (2001): Maximum posterior estimation of random effects in generalized linear mixed models, Statistica Sinica 11 (2001), 97-120
Joe, H. (1997): Multivariate models and dependence concepts, Chapman and Hall, Boca Raton 1997
John, S. (1962): A tolerance region for multivariate normal distributions, Sankhya A24 (1962), 363-368
Johnson, N.L. and S. Kotz (1972): Distributions in statistics: continuous multivariate distributions, J. Wiley, New York 1972
Johnson, N.L., Kotz, S. and X. Wu (1991): Inspection errors for attributes in quality control, Chapman and Hall, Boca Raton 1991
Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994): Continuous univariate distributions, Vol. 1, 2nd ed., John Wiley & Sons, New York 1994, 784 pp.
Johnson, N.L., Kotz, S. and Balakrishnan, N. (1995): Continuous univariate distributions, Vol. 2, 2nd ed., John Wiley & Sons, New York 1995, 752 pp.
Jonge de, P.J. and Tiberius, C.C.J.M. (1996a): The LAMBDA method for integer ambiguity estimation: implementation aspects, Publications of the Delft Computing Centre, LGR-Series No. 12
Jonge de, P.J. and Tiberius, C.C.J.M. (1996b): Integer estimation with the LAMBDA method, in: Beutler, G. et al. (eds.), Proceedings IAG Symposium No. 115, GPS trends in terrestrial, airborne and spaceborne applications, Springer, Heidelberg, pp. 280-284
Jonge de, P.J., Tiberius, C.C.J.M. and Teunissen, P.J.G. (1996): Computational aspects of the LAMBDA method for GPS ambiguity resolution, in: Proceedings ION GPS-96, pp. 935-944
Jordan, W., Eggert, O. and Kneissl, M. (1956): Handbuch der Vermessungskunde, Vol. III: Höhenmessung, Metzlersche Verlagsbuchhandlung, 10th edition, Stuttgart 1956
Jörgensen, B. (1984): The delta algorithm and GLIM, Int. Statist. Review 52 (1984), 283-300
Jörgensen, B., Lundbye-Christensen, S., Song, P.X.-K. and L. Sun (1996b): State space models for multivariate longitudinal data of mixed types, Canad. J. Statist. 24 (1996), 385-402
Jörgensen, B. (1997): The theory of dispersion models, Chapman and Hall, Boca Raton 1997
Jörgensen, B. (1999): Multivariate dispersion models, J. Multivariate Analysis 74 (1999), 267-281
Jorgensen, P.C., Kubik, K., Frederiksen, P. and W. Weng (1985): Ah, robust estimation! Australian Journal of Geodesy, Photogrammetry and Surveying 42 (1985), 19-32
Joshi, V.M. (1966): Admissibility of confidence intervals, Ann. Math. Statist. 37 (1966), 629-638
Judge, G.G. and M.E. Bock (1978): The statistical implications of pre-test and Stein-rule estimators in econometrics, Amsterdam 1978
Judge, G.G., Griffiths, W.E., Hill, R.C. and Lee, T.C. (1980): The theory and practice of econometrics, John Wiley & Sons, New York 1980, 793 pp.
Judge, G.G. and T.A. Yancey (1981): Sampling properties of an inequality restricted estimator, Economics Lett. 7 (1981), 327-333
Judge, G.G. and T.A. Yancey (1986): Improved methods of inference in econometrics, Amsterdam 1986
Jukić, D. and R. Scitovski (1997): Existence of optimal solution for exponential model by least squares, J. Comput. Appl. Math. 78 (1997), 317-328
Jupp, P.E. and K.V. Mardia (1980): A general correlation coefficient for directional data and related regression problems, Biometrika 67 (1980), 163-173
Jupp, P.E. and K.V. Mardia (1989): A unified view of the theory of directional statistics, 1975-1988, International Statist. Rev. 57 (1989), 261-294
Jureckova, J. (1995): Affine- and scale-equivariant M-estimators in linear model, Probability and Mathematical Statistics 15 (1995), 397-407
Jureckova, J. and P.K. Sen (1996): Robust statistical procedures: asymptotics and interrelations, John Wiley & Sons, New York 1996
Jurisch, R. and G. Kampmann (1997): Eine Verallgemeinerung des arithmetischen Mittels für einen Freiheitsgrad bei der Ausgleichung nach vermittelnden Beobachtungen, Zeitschrift für Vermessungswesen 122 (1997), 509-521
Jurisch, R. and G. Kampmann (1998): Vermittelnde Ausgleichungsrechnung mit balancierten Beobachtungen - erste Schritte zu einem neuen Ansatz, Zeitschrift für Vermessungswesen 123 (1998), 87-92
Jurisch, R. and G. Kampmann (2001): Plücker-Koordinaten - ein neues Hilfsmittel zur Geometrie-Analyse und Ausreissersuche, Vermessung, Photogrammetrie und Kulturtechnik 3 (2001), 146-150
Jurisch, R. and G. Kampmann (2002): Teilredundanzen und ihre natürlichen Verallgemeinerungen, Zeitschrift für Vermessungswesen 127 (2002), 117-123
Jurisch, R., Kampmann, G. and B. Krause (1997): Über eine Eigenschaft der Methode der kleinsten Quadrate unter Verwendung von balancierten Beobachtungen, Zeitschrift für Vermessungswesen 122 (1997), 159-166
Jurisch, R., Kampmann, G. and J. Linke (1999a): Über die Analyse von Beobachtungen in der Ausgleichungsrechnung - Äußere und innere Restriktionen, in: "Quo vadis geodesia...?", Festschrift für Erik W. Grafarend, Schriftenreihe der Institute des Studienganges Geodäsie und Geoinformatik 1999, 207-230
Jurisch, R., Kampmann, G. and J. Linke (1999b): Über die Analyse von Beobachtungen in der Ausgleichungsrechnung - Teil I, Zeitschrift für Vermessungswesen 124 (1999), 350-357
Jurisch, R., Kampmann, G. and J. Linke (1999): Über die Analyse von Beobachtungen in der Ausgleichungsrechnung - Teil II, Zeitschrift für Vermessungswesen 124 (1999), 350-357
Kagan, A.M., Linnik, J.V. and C.R. Rao (1965): Characterization problems of the normal law based on a property of the sample average, Sankhya A27 (1965), 405-406
Kagan, A. and L.A. Shepp (1998): Why the variance?, Statist. Prob. Letters 38 (1998), 329-333
Kagan, A. and Z. Landsman (1999): Relation between the covariance and Fisher information matrices, Statistics & Probability Letters 42 (1999), 7-13
Kagan, Y.Y. (1990): Random stress and earthquake statistics: spatial dependence, Geophys. J. Int. 102 (1990), 573-583
Kagan, Y.Y. (1991): Likelihood analysis of earthquake catalogues, Geophys. J. Int. 106 (1991), 135-148
Kagan, Y.Y. (1992): Correlations of earthquake focal mechanisms, Geophys. J. Int. 110 (1992), 305-320
Kagan, Y.Y. and L. Knopoff (1985): The first-order statistical moment of the seismic moment tensor, Geophys. J. R. astr. Soc. 81 (1985), 429-444
Kahmen, H. and Faig, W. (1988): Surveying, Walter de Gruyter, Berlin 1988
Kahn, M., Mackisack, M.S., Osborne, M.R. and G.K. Smyth (1992): On the consistency of Prony's method and related algorithms, Jour. Comp. and Graph. Statist. 1 (1992), 329-349
Kahng, M.W. (1995): Testing outliers in nonlinear regression, Journal of the Korean Stat. Soc. 24 (1995), 419-437
Kakkuri, J. and Chen, R. (1992): On horizontal crustal strain in Finland, Bull. Geod. 66 (1992), 12-20
Kakwani, N.C. (1968): Note on the unbiasedness of mixed regression estimation, Econometrica 36 (1968), 610-611
Kalkbrener, M. (1990a): Primitive polynomial remainder sequences, Technical Report RISC-Linz Series no. 90-01.0, Research Institute for Symbolic Computation, Univ. of Linz 1990
Kalkbrener, M. (1990b): Solving systems of bivariate algebraic equations by using primitive polynomial remainder sequences, Technical Report RISC-Linz Series no. 90-21.0, Research Institute for Symbolic Computation, Univ. of Linz 1990
Kalkbrener, M. (1990c): Letter from M. Kalkbrener, Research Institute for Symbolic Computation, Johannes Kepler Universität Linz, to P. Lohse, Geodätisches Institut der Universität Stuttgart, 30 May 1990, unpublished
Kalkbrener, M. (1991): Three contributions to elimination theory, Dissertation, Institut für Mathematik, Technisch-Naturwissenschaftliche Fakultät, Johannes Kepler Universität Linz 1991
Kaleva, O. (1994): Interpolation of fuzzy data, Fuzzy Sets and Systems 61 (1994), 63-70
Kallenberg, O. (1986): Random measures, Academic Press, London 1986
Kallenberg, O. (1997): Foundations of modern probability, Springer-Verlag, Heidelberg Berlin New York 1997
Kallianpur, G. (1963): von Mises functionals and maximum likelihood estimation, Sankhya A25 (1963), 149-158
Kallianpur, G. and Y.-T. Kim (1996): A curious example from statistical differential geometry, Theory Probab. Appl. 43 (1996), 42-62
Kalman, R.E. (1960): A new approach to linear filtering and prediction problems, Trans. ASME Ser. D, J. Basic Engineering 82 (1960), 35-45
Kaminsky, K.S. and P.I. Nelson (1975): Best linear unbiased prediction of order statistics in location and scale families, J. Am. Statist. Ass. 70 (1975), 145-150
Kaminsky, K.S. and L.S. Rhodin (1985): Maximum likelihood prediction, Ann. Inst. Statist. Math. A37 (1985), 507-517
Kampmann, G. (1988): Zur kombinativen Norm-Schätzung mit Hilfe der L1-, der L2- und der Boskovic-Laplace-Methode mit den Mitteln der linearen Programmierung, Ph.D. Thesis, Bonn University, Bonn 1988
Kampmann, G. (1992): Zur numerischen Überführung verschiedener linearer Modelle der Ausgleichungsrechnung, Zeitschrift für Vermessungswesen 117 (1992), 278-287
Kampmann, G. (1994): Robuste Deformationsanalyse mittels balancierter Ausgleichung, Allgemeine Vermessungs-Nachrichten 1 (1994), 8-17
Kampmann, G. (1997): Eine Beschreibung der Geometrie von Beobachtungen in der Ausgleichungsrechnung, Zeitschrift für Vermessungswesen 122 (1997), 369-377
Kampmann, G. and B. Krause (1996): Balanced observations with a straight line fit, Bollettino di Geodesia e Scienze Affini 2 (1996), 134-141
Kampmann, G. and B. Krause (1997): A breakdown point analysis for the straight line fit based on balanced observations, Bollettino di Geodesia e Scienze Affini 3 (1997), 294-303
Kampmann, G. and B. Renner (1999): Über Modellüberführungen bei der linearen Ausgleichungsrechnung, Allgemeine Vermessungs-Nachrichten 2 (1999), 42-52
Kampmann, G. and B. Renner (2000): Numerische Beispiele zur Bearbeitung latenter Bedingungen und zur Interpretation von Mehrfachbeobachtungen in der Ausgleichungsrechnung, Zeitschrift für Vermessungswesen 125 (2000), 190-197
Kampmann, G. and B. Krause (2004): Zur statistischen Begründung des Regressionsmodells der balanzierten Ausgleichungsrechnung, Zeitschrift für Vermessungswesen 129 (2004), 176-183
Kannan, N. and D. Kundu (1994): On modified EVLP and ML methods for estimating superimposed exponential signals, Signal Processing 39 (1994), 223-233
Kantz, H. and Schreiber, T. (1997): Nonlinear time series analysis, Cambridge University Press, Cambridge 1997
Kanwal, R.P. (1971): Linear integral equations: theory and techniques, Academic Press, New York London 1971
Kaplan, E.D. and Hegarty, C.J. (eds.) (2006): Understanding GPS: principles and applications, 2nd ed., Artech House 2006
Karatzas, I. and S.E. Shreve (1991): Brownian motion and stochastic calculus, Springer-Verlag, Heidelberg Berlin New York 1991
Karcz, I., Forrai, J. and Steinberg, G. (1992): Geodetic network for study of crustal movements across the Jordan-Dead Sea Rift, Journal of Geodynamics 16 (1992), 123-133
Kariya, T. (1989): Equivariant estimation in a model with an ancillary statistic, Ann. Statist. 17 (1989), 920-928
Karlin, S. and W.J. Studden (1966): Tchebycheff systems, Interscience, New York 1966
Karlin, S. and W.J. Studden (1966): Optimal experimental designs, Ann. Math. Statist. 37 (1966), 783-815
Karman, T. von (1937): On the statistical theory of turbulence, Proc. Nat. Acad. Sci. 23 (1937), 98-105
Karman, T. von and L. Howarth (1938): On the statistical theory of isotropic turbulence, Proc. Royal Soc. London A 164 (1938), 192-215
Karr, A.F. (1986): Point processes and their statistical inference, Marcel Dekker, New York 1986
Kasala, S. and T. Mathew (1997): Exact confidence regions and tests in some linear functional relationships, Statistics & Probability Letters 32 (1997), 325-328
Kasietczuk, B. (2000): Geodetic network adjustment by the maximum likelihood method with application of local variance, asymmetry and excess coefficients, Bollettino di Geodesia e Scienze Affini, Anno LIX, 3 (2000), 221-235
Kass, R.E. and P.W. Vos (1997): Geometrical foundations of asymptotic inference, J. Wiley, New York 1997
Kastrup, H.A. (1962): Zur physikalischen Deutung und darstellungstheoretischen Analyse der konformen Transformationen von Raum und Zeit, Annalen der Physik 9 (1962), 388-428
Kastrup, H.A. (1966): Gauge properties of the Minkowski space, Physical Review 150 (1966), 1183-1193
Kaula, W.M.: a) J. Geophys. Res. 64, 2401 (1959); b) Rev. Geophys. 507 (1963)
Kay, S.M. (1988): Sinusoidal parameter estimation, Prentice Hall, Englewood Cliffs, N.J. 1988
Keller, J.B. (1975): Closest unitary, orthogonal and Hermitian operators to a given operator, Math. Mag. 46 (1975), 192-197
Kelly, R.J. and T. Mathew (1993): Improved estimators of variance components having smaller probability of negativity, J. Roy. Stat. Soc. B55 (1993), 897-911
Kelm, R. (1978): Ist die Varianzschätzung nach Helmert MINQUE?, Allgemeine Vermessungs-Nachrichten 85 (1978), 49-54
Kemperman, J.H.B. (1956): Generalized tolerance limits, Ann. Math. Statist. 27 (1956), 180-186
Kendall, D.G. (1974): Pole seeking Brownian motion and bird navigation, J. Roy. Statist. Soc. B36 (1974), 365-417
Kendall, D.G. (1984): Shape manifolds, Procrustean metrics, and complex projective spaces, Bulletin of the London Mathematical Society 16 (1984), 81-121
Kendall, M.G. (1960): The evergreen correlation coefficient, in: I. Olkin (ed.), Essays in Honor of Harold Hotelling, 274-277, Stanford University Press, Stanford 1960
Kendall, M.G. and Stuart, A. (1958): The advanced theory of statistics, Vol. 1: Distribution theory, Charles Griffin and Company Ltd., London 1958
Kendall, M.G. and Stuart, A. (1977): The advanced theory of statistics, Vol. 1: Distribution theory, 4th ed., Charles Griffin & Company Ltd., London-High Wycombe 1977
Kendall, M.G. and Stuart, A. (1979): The advanced theory of statistics, Vol. 2: Inference and relationship, 4th ed., Charles Griffin & Company Ltd., London-High Wycombe 1979
Kenney, C.S., Laub, A.J. and M.S. Reese (1998): Statistical condition estimation for linear systems, SIAM J. Scientific Computing 19 (1998), 566-584
Kent, J. (1976): Distributions, processes and statistics on spheres, Ph.D. Thesis, University of Cambridge 1976
Kent, J.T. (1983): Information gain and a general measure of correlation, Biometrika 70 (1983), 163-173
Kent, J.T. (1997): Consistency of Procrustes estimators, J. R. Statist. Soc. B59 (1997), 281-290
Kerr, R.M. (1985): Higher order derivative correlations and alignment of small scale structures in isotropic numerical turbulence, J. Fluid Mech. 153 (1985), 31-58
Kertz, W. (1969): Einführung in die Geophysik, BI-Taschenbuch, Mannheim 1969/71
Keshin, M. (2004): Directional statistics of satellite-satellite single-difference widelane phase biases, Artificial Satellites 39 (2004), 305-324
Khan, R.A. (1998): Fixed-width confidence sequences for the normal mean and the binomial probability, Sequential Analysis 17 (1998), 205-217
Khan, R.A. (2000): A note on Hammersley's estimator of an integer mean, J. Statist. Planning and Inference 88 (2000), 37-45
Khatri, C.G. and C.R. Rao (1968): Solutions to some functional equations and their applications to characterization of probability distributions, Sankhya A30 (1968), 167-180
Khatri, C.G. and S.K. Mitra (1976): Hermitian and nonnegative definite solutions of linear matrix equations, SIAM J. Appl. Math. 31 (1976), 579-585
Khuri, A.I., Mathew, T. and B.K. Sinha (1998): Statistical tests for mixed linear models, J. Wiley, New York 1998
Khuri, A.I. (1999): A necessary condition for a quadratic form to have a chi-squared distribution: an accessible proof, Int. J. Math. Educ. Sci. Technol. 30 (1999), 335-339
Kiamehr, R. (2006): Precise gravimetric geoid model for Iran based on GRACE and SRTM data and the least-squares modification of Stokes' formula with some geodynamic interpretations, Ph.D. Thesis, Division of Geodesy, Royal Institute of Technology, Stockholm 2006
Kiamehr, R. and Eshagh, M. (2008): Estimating variance components of ellipsoidal, orthometric and geoidal heights through the GPS/levelling network in Iran, J. Earth and Space Physics 34 (2008), 1-13
Kianifard, F. and Swallow, W.H. (1996): A review of the development and application of recursive residuals in linear models, J. Am. Stat. Assoc. 91 (1996), 391-400
Kida, S. (1989): Statistics of velocity gradients in turbulence at moderate Reynolds number, Fluid Dynamics Research 4 (1989), 347-370
Kidd, M. and N.F. Laubscher (1995): Robust confidence intervals for scale and its application to the Rayleigh distribution, South African Statist. J. 29 (1995), 199-217
Kiefer, J. (1959): Optimum experimental designs, J. Royal Stat. Soc. Ser. B 21 (1959), 272-319
Kiefer, J. (1974): General equivalence theory for optimal designs (approximate theory), Ann. Stat. 2 (1974), 849-879
Kiefer, J.C. and J. Wolfowitz (1959): Optimum designs in regression problems, Ann. Math. Statist. 30 (1959), 271-294
Killian, K. (1990): Der gefährliche Ort des überbestimmten räumlichen Rückwärtseinschneidens, Österr. Zeitschrift für Vermessungswesen und Photogrammetrie 78 (1990), 1-12
King, J.T. and D. Chillingworth (1979): Approximation of generalized inverses by iterated regularization, Numer. Funct. Anal. Optim. 1 (1979), 499-513
King, M.L. (1980): Robust tests for spherical symmetry and their application to least squares regression, Ann. Statist. 8 (1980), 1265-1271
Kirkwood, B.H., Royer, J.Y., Chang, T.C. and R.G. Gordon (1999): Statistical tools for estimating and combining finite rotations and their uncertainties, Geophys. J. Int. 137 (1999), 408-428
Kirsch, A. (1996): An introduction to the mathematical theory of inverse problems, Springer-Verlag, Heidelberg Berlin New York 1996
Kitagawa, G. and W. Gersch (1996): Smoothness priors analysis of time series, Springer-Verlag, Heidelberg Berlin New York 1996
Klebanov, L.B. (1976): A general definition of unbiasedness, Theory of Probability and Appl. 21 (1976), 571-585
Klees, R., Ditmar, P. and P. Broersen (2003): How to handle colored observation noise in large least-squares problems, Journal of Geodesy 76 (2003), 629-640
Kleffe, J. (1976): A note on MINQUE for normal models, Math. Operationsforschg. Statist. 7 (1976), 707-714
Kleffe, J. (1977): Invariant methods for estimating variance components in mixed linear models, Math. Operationsforschg. Statist., Ser. Statistics 8 (1977), 233-250
Kleffe, J. and R. Pincus (1974): Bayes and the best quadratic unbiased estimators for parameters of the covariance matrix in a normal linear model, Math. Operationsforschg. Statistik 5 (1974), 43-76
Kleffe, J. and R. Pincus (1974): Bayes and the best quadratic unbiased estimators for variance components and heteroscedastic variances in linear models, Math. Operationsforschg. Statistik 5 (1974), 147-159
Kleffe, J. and J.N.K. Rao (1986): The existence of asymptotically unbiased nonnegative quadratic estimates of variance components in ANOVA models, J. Am. Statist. Ass. 81 (1986), 692-698
Kleijnen, J.P.C. and Beers, van W.C.M. (2004): Application-driven sequential designs for simulation experiments: Kriging metamodelling, J. Oper. Res. Soc. 55 (2004), 876-893
Kleijnen, J.P.C. (2004): An overview of the design and analysis of simulation experiments for sensitivity analysis, Eur. J. Oper. Res. 164 (2004), 287-300
Klein, U. (1997): Analyse und Vergleich unterschiedlicher Modelle der dreidimensionalen Geodäsie, Deutsche Geodätische Kommission, Reihe C, Heft Nr. 479
Kleusberg, A. and E.W. Grafarend (1981): Expectation and variance component estimation of multivariate gyrotheodolite observations II, Allgemeine Vermessungsnachrichten 88 (1981), 104-108
Kleusberg, A. (1994): Die direkte Lösung des räumlichen Hyperbelschnitts, Zeitschrift für Vermessungswesen 119 (1994), 188-192
Kleusberg, A. (1999): Analytical GPS navigation solution, in: "Quo vadis geodesia...?", Festschrift for E.W. Grafarend on the occasion of his 60th birthday, eds. F. Krumm and V.S. Schwarze, Report Nr. 1999.6-1
Klingbeil, E. (1977): Variationsrechnung, B.I. Wissenschaftsverlag, Mannheim 1977
Klingenberg, W. (1973): Eine Vorlesung über Differentialgeometrie, Springer-Verlag, Berlin Heidelberg 1973
Klingenberg, W. (1982): Riemannian geometry, Walter de Gruyter, Berlin 1982
Klonecki, W. and S. Zontek (1996): Improved estimators for simultaneous estimation of variance components, Statistics & Probability Letters 29 (1996), 33-43
Kmenta, J. (1971): Elements of econometrics, Macmillan, New York 1971
Knautz, H. (1996): Linear plus quadratic (LPQ) quasiminimax estimation in the linear regression model, Acta Applicandae Mathematicae 43 (1996), 97-111
Knautz, H. (1999): Nonlinear unbiased estimation in the linear regression model with nonnormal disturbances, J. Statist. Planning and Inference 81 (1999), 293-309
Knickmeyer, E.H. (1984): Eine approximative Lösung der allgemeinen linearen geodätischen Randwertaufgabe durch Reihenentwicklungen nach Kugelfunktionen, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften, München 1984
Knuth, D.E. (1973): The art of computer programming, Vol. I: Fundamental algorithms, 2nd ed., Addison-Wesley 1973
Koch, G.G. (1968): Some further remarks on "A general approach to the estimation of variance components", Technometrics 10 (1968), 551-558
Koch, K.R. (1991): Moment tensor inversion of local earthquake data - I. Investigation of the method and its numerical stability with model calculations, Geophys. J. Int. 106 (1991), 305-319
Koch, K.R. (1969): Solution of the geodetic boundary value problem for a reference ellipsoid, Journal of Geophysical Research 74 (1969), 3796-3803
Koch, K.R. (1970a): Lunar shape and gravity field, Photogrammetric Engineering 36 (1970), 375-380
Koch, K.R. (1970b): Darstellung des Erdschwerefeldes in der Satellitengeodäsie durch das Potential einer einfachen Schicht, Zeitschrift für Vermessungswesen 95 (1970), 173-179
Koch, K.R. (1970c): Reformulation of the geodetic boundary value problem in view of the results of geometric satellite geodesy, in: W.T. Kattner (ed.), Advances in Dynamic Gravimetry, 111-114, Instrument Society of America, Pittsburgh 1970
Koch, K.R. (1970d): Über die Bestimmung des Gravitationsfeldes eines Himmelskörpers mit Hilfe der Satellitenphotogrammetrie, Bildmessung und Luftbildwesen 38 (1970), 331-338
Koch, K.R. (1970e): Surface density values for the Earth from satellite and gravity observations, Geophysical Journal of the Royal Astronomical Society 21 (1970), 1-12
Koch, K.R. (1971a): Die geodätische Randwertaufgabe bei bekannter Erdoberfläche, Zeitschrift für Vermessungswesen 96 (1971), 218-224
Koch, K.R. (1971b): Simple layer potential of a level ellipsoid, Bollettino di Geofisica teorica ed applicata 13 (1971), 264-270
Koch, K.R. (1971c): Geophysikalische Interpretation von mittleren Dichtewerten der Erdoberfläche aus Satellitenbeobachtungen und Schweremessungen, in: W. Kertz u.a. (eds.), Das Unternehmen Erdmantel, 244-246, Deutsche Forschungsgemeinschaft, Bonn 1971
Koch, K.R. (1972a): Geophysical interpretation of density anomalies of the Earth computed from satellite observations and gravity measurements, Zeitschrift für Geophysik 38 (1972), 75-84
Koch, K.R. (1972b): Method of integral equations for the geodetic boundary value problem, Mitteilungen aus dem Institut für Theoretische Geodäsie der Universität Bonn, Nr. 4, 38-49, Bonn 1972
Koch, K.R. (1973): Kontinentalverschiebung und Erdschwerefeld, Zeitschrift für Vermessungswesen 98 (1973), 8-12
Koch, K.R. (1974): Earth's gravity field and station coordinates from Doppler data, satellite triangulation, and gravity anomalies, NOAA Technical Report NOS 62, U.S. Department of Commerce, Rockville, Md. 1974
Koch, K.R. (1975): Rekursive numerische Filter, Zeitschrift für Vermessungswesen 100 (1975), 281-292
Koch, K.R. (1976): Schätzung einer Kovarianzmatrix und Test ihrer Identität mit einer gegebenen Matrix, Allgemeine Vermessungs-Nachrichten 83 (1976), 328-333
Koch, K.R. (1978a): Hypothesentests bei singulären Ausgleichungsproblemen, Zeitschrift für Vermessungswesen 103 (1978), 1-10
Koch, K.R. (1978b): Schätzung von Varianzkomponenten, Allgemeine Vermessungs-Nachrichten 85 (1978), 264-269
Koch, K.R. (1979a): Parameter estimation in mixed models, in: Correlation Analysis and Systematical Errors with Special Reference to Geodetic Problems, Report No. 6, 1-10, Geodetic Institute, Uppsala University, Uppsala 1979
Koch, K.R. (1979b): Parameter estimation in the Gauss-Helmert model, Bollettino di Geodesia e Scienze Affini 38 (1979), 553-563
Koch, K.R. (1980): Parameterschätzung und Hypothesentests in linearen Modellen, Dümmler, Bonn 1980, 308 pp.
Koch, K.R. (1981): Varianz- und Kovarianzkomponentenschätzung für Streckenmessungen auf Eichlinien, Allgemeine Vermessungs-Nachrichten 88 (1981), 135-132
Koch, K.R. (1982): S-transformations and projections for obtaining estimable parameters, in: Blotwijk, M.J. et al. (eds.), 40 Years of Thought, Anniversary volume for Prof. Baarda's 65th Birthday, Vol. 1, Technische Hogeschool Delft, Delft 1982, 136-144
Koch, K.R. (1983): Estimation of variances and covariances in the multivariate and in the incomplete multivariate model, Deutsche Geodätische Kommission, Reihe A, Nr. 98, 53-59, München 1983
Koch, K.R. (1985): Ein statistisches Auswerteverfahren für Deformationsmessungen, Allgemeine Vermessungs-Nachrichten 92 (1985), 97-108
Koch, K.R. (1986): Maximum likelihood estimate of variance components; ideas by A.J. Pope, Bulletin Geodesique 60 (1986), 329-338
Koch, K.R. (1987a): Zur Auswertung von Streckenmessungen auf Eichlinien mittels Varianzkomponentenschätzung, Allgemeine Vermessungs-Nachrichten 94 (1987), 63-71
Koch, K.R. (1987b): Bayesian inference for variance components, Manuscripta Geodaetica 12 (1987), 309-313
Koch, K.R. (1987c): Parameterschätzung und Hypothesentests in linearen Modellen, 2nd ed., Dümmler, Bonn 1987
Koch, K.R. (1988a): Parameter estimation and hypothesis testing in linear models, Springer-Verlag, Heidelberg Berlin New York 1988
Koch, K.R. (1988b): Konfidenzintervalle der Bayes-Statistik für die Varianzen von Streckenmessungen auf Eichlinien, Vermessung, Photogrammetrie, Kulturtechnik 86 (1988), 337-340
Koch, K.R. (1988c): Bayesian statistics for variance components with informative and noninformative priors, Manuscripta Geodaetica 13 (1988), 370-373
Koch, K.R. (1990): Bayesian inference with geodetic applications, Springer-Verlag, Berlin Heidelberg New York 1990, 208 pp.
Koch, K.R. (1999): Parameter estimation and hypothesis testing in linear models, 2nd ed., Springer-Verlag, Heidelberg Berlin New York 1999
Koch, K.R. (2008): Determining uncertainties of correlated measurements by Monte Carlo simulations applied to laserscanning, Journal of Applied Geodesy 2 (2008), 139-147
Koch, K.R. and J. Kusche (2002): Regularization of geopotential determination from satellite data by variance components, Journal of Geodesy 76 (2002), 259-268
Koch, K.R. and A.J. Pope (1969): Least squares adjustment with zero variances, Zeitschrift für Vermessungswesen 94 (1969), 390-393
Koch, K.R. and B.U. Witte (1971): Earth's gravity field represented by a simple-layer potential from Doppler tracking of satellites, Journal of Geophysical Research 76 (1971), 8471-8479
Koch, K.R. and A.J. Pope (1972): Punktbestimmung durch die Vereinigung von geometrischer und dynamischer Satellitengeodäsie, Zeitschrift für Vermessungswesen 97 (1972), 1-6
Koch, K.R. and A.J. Pope (1972): Uniqueness and existence for the geodetic boundary value problem using the known surface of the Earth, Bulletin Geodesique 106 (1972), 467-476
Koch, K.R. and H. Fröhlich (1974): Integrationsfehler in den Variationsgleichungen für das Modell der einfachen Schicht in der Satellitengeodäsie, Mitteilungen aus dem Institut für Theoretische Geodäsie der Universität Bonn, Nr. 25, Bonn 1974
Koch, K.R. and W. Bosch (1980): The geopotential from gravity measurements, levelling data and satellite results, Bulletin Geodesique 54 (1980), 73-79
Koch, K.R. and M. Schmidt (1994): Deterministische und stochastische Signale, Dümmler, Bonn 1994, 358 pp.
Koch, K.R. and Z. Ou (1994): Analytical expressions for Bayes estimates of variance components, Manuscripta Geodaetica 19 (1994), 284-293
König, D. and V. Schmidt (1992): Zufällige Punktprozesse, Teubner Skripten zur Mathematischen Stochastik, Stuttgart 1992
Koenker, R. and G. Bassett (1978): Regression quantiles, Econometrica 46 (1978), 33-50
Kollo, T. and H. Neudecker (1993): Asymptotics of eigenvalues and unit-length eigenvectors of sample variance and correlation matrices, J. Multivar. Anal. 47 (1993), 283-300
Kollo, T. and D. von Rosen (1996): Formal density expansions via multivariate mixtures, in: Gupta, A.K. and V.L. Girko (eds.), Multidimensional statistical analysis and theory of random matrices, Proceedings of the Sixth Lukacs Symposium, 129-138, VSP, Utrecht 1996
Kolmogorov, A.N. (1941): Interpolation und Extrapolation von stationären zufälligen Folgen, Bull. Acad. Sci. (USSR), Ser. Math. 5 (1941), 3-14
Kolmogorov, A.N. (1950): Foundations of the theory of probability, Chelsea Publ. Co., New York 1950
Kolmogorov, A.N. (1962): A refinement of previous hypotheses concerning the local structure of turbulence in a viscous incompressible fluid at high Reynolds number, J. Fluid Mech. 13 (1962), 82-85
Konishi, S. and Rao, C.R. (1992): Principal component analysis for multivariate familial data, Biometrika 79 (1992), 631-641
Koopman, R. (1982): Parameterschätzung bei a priori Information, Vandenhoeck und Ruprecht, Göttingen 1982
Koopmans, T.C. and O. Reiersol (1950): The identification of structural characteristics, Ann. Math. Statistics 21 (1950), 165-181
Korc, F. and Förstner, W. (2008): Interpreting terrestrial images of urban scenes using discriminative random fields, in: 21st Congress of the International Society for Photogrammetry and Remote Sensing (ISPRS), Beijing 2008, 291-296, Part B3a
Kosko, B. (1992): Neural networks and fuzzy systems, Prentice-Hall, Englewood Cliffs, N.J. 1992
Kosmol, P. (1989): Methoden zur numerischen Behandlung nichtlinearer Gleichungen und Optimierungsaufgaben, B.G. Teubner, Stuttgart 1989
Kotecky, R. and J. Niederle (1975): Conformally covariant field equations: first order equations with non-vanishing mass, Czech. J. Phys. B25 (1975), 123-149
Kotsakis, C. (2005): A type of biased estimators for linear models with uniformly biased data, Journal of Geodesy 79 (2005), 341-350
Kott, P.S. (1998): A model-based evaluation of several well-known variance estimators for the combined ratio estimator, Statistica Sinica 8 (1998), 1165-1173
Koukouvinos, C. and J. Seberry (1996): New weighing matrices, Sankhya B58 (1996), 221-230
Kourouklis, S. and Paige, C.C. (1981): A constrained least squares approach to the general Gauss-Markov linear model, J. Am. Stat. Assoc. 76 (1981), 620-625
Kowalewski, G. (1995): Robust estimators in regression, Statistics in Transition 2 (1995), 123-135
Kraichnan, R.H. (1974): On Kolmogorov's inertial-range theories, J. Fluid Mech. 62 (1974), 305-330
Kraemer, J. and B. Eissfeller (2009): A-GNSS, a different approach, Inside GNSS, Sep/Oct issue (2009), 52-61
Krämer, W., Bartels, R. and D.G. Fiebig (1996): Another twist on the equality of OLS and GLS, Statistical Papers 37 (1996), 277-281
Krantz, S.G. and H.R. Parks (2002): The implicit function theorem - history, theory and applications, Birkhäuser-Verlag, Basel Boston Berlin 2002
Krarup, T. (1969): A contribution to the mathematical foundation of physical geodesy, Publ. Danish Geod. Inst. 44, Copenhagen 1969
Krarup, T. (1979): S-transformations or how to live without the generalized inverse - almost, Danish Geodetic Institute, Copenhagen 1979
Krarup, T. (1980): Integrated geodesy, in: Proceedings of the international school of advanced geodesy, Bollettino di Geodesia e Scienze Affini 38 (1980), 480-496
Krarup, T. (1982): Nonlinear adjustment and curvature, in: Forty Years of Thought, Delft 1982, 145-159
Krarup, T., Juhl, J. and K. Kubik (1980): Götterdämmerung over least squares adjustment, in: Proc. 14th Congress of the International Society of Photogrammetry, Vol. B3, Hamburg 1980, 369-378
Krause, L.O. (1987): A direct solution of GPS-type navigation equations, IEEE Transactions on Aerospace and Electronic Systems 23 (1987), 225-232
Krauss, K.: a) ZfV 95, 387 (1970); b) ZfV 96, 233 (1971)
Krengel, U. (1985): Ergodic theorems, de Gruyter, Berlin New York 1985
Kres, H. (1983): Statistical tables for multivariate analysis, Springer-Verlag, Heidelberg Berlin New York 1983
Krige, D. (1951): A statistical approach to some basic mine valuation problems on the Witwatersrand, J. Chem. Metall. Mining Soc. South Afr. 52 (1951), 119-139
Krishna, S. and Manocha, D. (1995): Numerical algorithms for evaluating one-dimensional algebraic sets, in: Proceedings of the International Symposium on Symbolic and Algebraic Computation ISSAC, July 10-12, Montreal 1995, 59-67
Kronecker, L. (1903): Vorlesungen über die Theorie der Determinanten, Erster Band, bearbeitet und fortgeführt von K. Hensel, B.G. Teubner, Leipzig 1903
Krumbein, W.C. (1939): Preferred orientation of pebbles in sedimentary deposits, J. Geol. 47 (1939), 673-706
Krumm, F. (1982): Criterion matrices for estimable quantities, in: Borre, K. and Welsch, W.M. (eds.), Proceedings Survey Control Networks, Vol. 7, 245-257, Schriftenreihe Wiss. Studiengang Vermessungswesen, Hochschule der Bundeswehr München 1982
Krumm, F., Grafarend, E. and B. Schaffrin (1986): Continuous networks, Fourier analysis and criterion matrices, manuscripta geodaetica 11 (1986), 57-78
Krumm, F. (1987): Geodätische Netze im Kontinuum: Inversionsfreie Ausgleichung und Konstruktion von Kriterionmatrizen, Ph.D. Thesis, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Report C334, München 1987
Krumm, F. and F. Okeke (1998): Graph, graph spectra, and partitioning algorithms in a geodetic network structural analysis and adjustment, Bollettino di Geodesia e Scienze Affini 57 (1998), 1-24
Kruskal, W. (1946): Helmert's distribution, American Math. Monthly 53 (1946), 435-438
Kruskal, W. (1968): When are Gauss-Markov and least squares estimators identical? A coordinate-free approach, Ann. Math. Statistics 39 (1968), 70-75
Kryanev, A.V. (1974): An iterative method for solving incorrectly posed problems, USSR Comp. Math. Math. Phys. 14 (1974), 24-33
Krzanowski, W.J. and F.H.C. Marriott (1994): Multivariate analysis, Part I: Distributions, ordination and inference, Arnold Publ., London 1994
Krzanowski, W.J. and F.H.C. Marriott (1995): Multivariate analysis, Part II: Classification, covariance structures and repeated measurements, Arnold Publ., London 1995
Kuang, S.L. (1991): Optimization and design of deformation monitoring schemes, Ph.D. dissertation, Department of Surveying Engineering, University of New Brunswick, Tech. Rep. 91, Fredericton 1991
Kuang, S. (1996): Geodetic network analysis and optimal design, Ann Arbor Press, Chelsea, Michigan 1996
Kubacek, L. (1988): Foundations of estimation theory, Elsevier, Amsterdam Oxford New York Tokyo 1988
Kubacek, L. (1995): Linear statistical models with constraints revisited, Math. Slovaca 45 (1995), 287-307
Kubacek, L. (1995): On a linearization of regression models, Appl. Math., Praha 40 (1995), 61-78
Kubacek, L. (1996a): Linear model with inaccurate variance components, Applications of Mathematics 41 (1996), 433-445
Kubacek, L. (1996b): Nonlinear error propagation law, Applications of Mathematics 41 (1996), 329-345
Kubacek, L. (1996c): Models with a low nonlinearity, Tatra Mt. Math. Publ. 7 (1996), 149-155
Kubacek, L. (1997): Notice on the Chipman generalization of the matrix inverse, Acta Univ. Palacki. Olomuc., Fac. Rerum Nat., Math. 36 (1997), 95-98
Kubacek, L. (2002): On an accuracy of change points, Math. Slovaca 52 (2002), 469-484
Kubacek, L. (2005): Underparametrization in a regression model with constraints II, Math. Slovaca 66 (2005), 579-596
Kubacek, L. and Kubackova, L. (1978): The present approach to the study of the least-squares method, Studia geoph. et geod. 22 (1978), 140-147
Kubacek, L., Kubackova, L. and J. Kukuca (1987): Probability and statistics in geodesy and geophysics, Elsevier, Amsterdam 1987
Kubacek, L., Kubackova, L. and Volaufova, J. (1995): Statistical models with linear structures, Veda, Slovak Academy of Sciences, Bratislava 1995
Kubacek, L. and Kubackova, L. (1997): One of the calibration problems, Acta Univ. Palacki. Olomuc., Fac. Rerum Nat., Math. 36 (1997), 117-130
Kubacek, L. and Kubackova, L. (1998): Testing statistical hypotheses in deformation measurement; one generalization of the Scheffé theorem, Acta Univ. Palacki. Olomuc., Fac. Rerum Nat., Math. 37 (1998), 81-88
Kubacek, L. and Kubackova, L. (2000): Nonsensitiveness regions in universal models, Math. Slovaca 50 (2000), 219-240
Kubacek, L., Kubackova, L. and Sevcik, J. (2002): Linear conform transformation: errors in both coordinate systems, Appl. Math., Praha 47 (2002), 361-380
Kubacek, L. and Fiserova, E. (2003): Problems of sensitiveness and linearization in a determination of isobestic points, Math. Slovaca 53 (2003), 407-426
Kubackova, L. (1996): Joint confidence and threshold ellipsoids in regression models, Tatra Mt. Math. Publ. 7 (1996), 157-160
Kubik, K.K. (1967): Iterative Methoden zur Lösung des nichtlinearen Ausgleichungsproblems, Zeitschrift für Vermessungswesen 91 (1967), 145-159
Kubik, K. (1970): The estimation of the weights of measured quantities within the method of least squares, Bulletin Geodesique 44 (1970), 21-40
Kubik, K. (1982): Kleinste Quadrate und andere Ausgleichsverfahren, Vermessung Photogrammetrie Kulturtechnik 80 (1982), 369-371
Kubik, K. and Y. Wang (1991): Comparison of different principles for outlier detection, Australian Journal of Geodesy, Photogrammetry and Surveying 54 (1991), 67-80
Kuechler, U. and M. Soerensen (1997): Exponential families of stochastic processes, Springer-Verlag, Heidelberg Berlin New York 1997
Kukush, A. and S. Zwanzig (2002): On consistent estimators in nonlinear functional errors-in-variables models, in: Van Huffel, S. and P. Lemmerling (eds.), Total Least Squares and Errors-in-Variables Modeling, Kluwer, Dordrecht 2002, 145-154
Kullback, S. (1934): An application of characteristic functions to the distribution problem of statistics, Annals Math. Statistics 4 (1934), 263-305
Kullback, S. (1935): On samples from a multivariate normal population, Ann. Math. Stat. 6 (1935), 202-213
Kullback, S. (1952): An application of information theory to multivariate analysis, Ann. Math. Stat. 23 (1952), 88-102
Kullback, S. (1956): An application of information theory to multivariate analysis II, Ann. Math. Stat. 27 (1956), 122-146
Kumaresan, R. (1982): Estimating the parameters of exponentially damped or undamped sinusoidal signals in noise, Ph.D. thesis, The University of Rhode Island, Rhode Island 1982
Kunderova, P. (2000): Linear models with nuisance parameters and deformation measurement, Acta Univ. Palacki. Olomuc., Fac. Rerum Nat., Math. 39 (2000), 95-105
Kunderova, P. (2001): Locally best and uniformly best estimators in linear models with nuisance parameters, Tatra Mt. Math. Publ. 22 (2001), 27-36
Kunderova, P. (2001): Regular linear model with the nuisance parameters with constraints of the type I, Acta Univ. Palacki. Olomuc., Fac. Rerum Nat., Math. 40 (2001), 151-159
Kunderova, P. (2002): Regular linear model with nuisance parameters with constraints of the type II, Folia Fac. Sci. Nat. Univ. Masarykianae Brunensis, Math. 11 (2002), 151-162
Kunderova, P. (2003): Eliminating transformations for nuisance parameters in linear model, Acta Univ. Palacki. Olomuc., Fac. Rerum Nat., Math. 42 (2003), 59-68
Kunderova, P. (2005): One singular multivariate linear model with nuisance parameters, Acta Univ. Palacki. Olomuc., Fac. Rerum Nat., Math. 44 (2005), 57-69
Kundu, D. (1993a): Estimating the parameters of undamped exponential signals, Technometrics 35 (1993), 215-218
Kundu, D. (1993b): Asymptotic theory of least squares estimator of a particular nonlinear regression model, Statistics and Probability Letters 18 (1993), 13-17
Kundu, D. (1994a): Estimating the parameters of complex valued exponential signals, Computational Statistics and Data Analysis 18 (1994), 525-534
Kundu, D. (1994b): A modified Prony algorithm for sum of damped or undamped exponential signals, Sankhya A56 (1994), 525-544
Kundu, D. (1997): Asymptotic theory of the least squares estimators of sinusoidal signal, Statistics 30 (1997), 221-238
Kundu, D. and A. Mitra (1998): Fitting a sum of exponentials to equispaced data, Sankhya 60 (1998), 448-463
Kundu, D. and R.D. Gupta (1998): Asymptotic properties of the least squares estimators of a two dimensional model, Metrika 48 (1998), 83-97
Kunst, R.M. (1997): Fourth order moments of augmented ARCH processes, Commun. Statist. Theor. Meth. 26 (1997), 1425-1441
Kuo, W., Prasad, V.R., Tillman, F.A. and C.-L. Hwang (2001): Optimal reliability design, Cambridge University Press, Cambridge 2001
Kuo, Y. (1976): An extended Kronecker product of matrices, J. Math. Anal. Appl. 56 (1976), 346-350
Kupper, L.L. (1972): Fourier series and spherical harmonic regression, J. Roy. Statist. Soc. C21 (1972), 121-130
Kupper, L.L. (1973): Minimax designs of Fourier series and spherical harmonic regressions: a characterization of rotatable arrangements, J. Roy. Statist. Soc. B35 (1973), 493-500
Kurata, H. (1998): A generalization of Rao's covariance structure with applications to several linear models, J. Multivar. Anal. 67 (1998), 297-305
Kurz, L. and M.H. Benteftifa (1997): Analysis of variance in statistical image processing, Cambridge University Press, Cambridge 1997
Kurz, S. (1996): Positionierung mittels Rückwärtsschnitt in drei Dimensionen, Studienarbeit, Geodätisches Institut, University of Stuttgart, Stuttgart 1996
Kusche, J. (2001): Implementation of multigrid solvers for satellite gravity anomaly recovery, Journal of Geodesy 74 (2001), 773-782
Kusche, J. (2002): Inverse Probleme bei der Gravitationsfeldbestimmung mittels SST- und SGG-Satellitenmissionen, Deutsche Geodätische Kommission, Report C548, München 2002
Kusche, J. (2003): A Monte-Carlo technique for weight estimation in satellite geodesy, Journal of Geodesy 76 (2003), 641-652
Kushner, H. (1967): Dynamical equations for optimal nonlinear filtering, J. Diff. Eq. 3 (1967), 179-190
Kushner, H. (1967): Approximations to optimal nonlinear filters, IEEE Trans. Auto. Contr. AC-12 (1967), 546-556
Kutoyants, Y.A. (1984): Parameter estimation for stochastic processes, Heldermann, Berlin 1984
Kutterer, H. (1994): Intervallmathematische Behandlung endlicher Unschärfen linearer Ausgleichsmodelle, Ph.D. Thesis, Deutsche Geodätische Kommission, Report C423, München 1994
Kutterer, H. (1999): On the sensitivity of the results of least-squares adjustments concerning the stochastic model, Journal of Geodesy 73 (1999), 350-361
Kutterer, H. (2002): Zum Umgang mit Ungewissheit in der Geodäsie - Bausteine für eine neue Fehlertheorie, Deutsche Geodätische Kommission, Report C553, München 2002
Kutterer, H. and S. Schön (1999): Statistische Analyse quadratischer Formen - der Determinantenansatz, Allgemeine Vermessungsnachrichten 10 (1999), 322-330
Lagrange, J.L. (1877): Leçons élémentaires sur les mathématiques données à l'École Normale en 1795, in: Oeuvres de Lagrange, ed. M.J.A. Serret, Tome 7, Section IV, Paris 1877
Laha, R.G. (1956): On the stochastic independence of two second-degree polynomial statistics in normally distributed variates, The Annals of Mathematical Statistics 27 (1956), 790-796
Laha, R.G. and E. Lukacs (1960): On a problem connected with quadratic regression, Biometrika 47 (1960), 335-343
Lai, T.L. and C.Z. Wei (1982): Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems, Annals of Statistics 10 (1982), 154-166
Lai, T.L. and C.P. Lee (1997): Information and prediction criteria for model selection in stochastic regression and ARMA models, Statistica Sinica 7 (1997), 285-309
Laird, N.M. and J.H. Ware (1982): Random-effects models for longitudinal data, Biometrics 38 (1982), 963-974
Lamé, M.G. (1818): Examen des différentes méthodes employées pour résoudre les problèmes de géométrie, Paris 1818
LaMotte, L.R. (1973a): Quadratic estimation of variance components, Biometrics 29 (1973), 311-330
LaMotte, L.R. (1973b): On non-negative quadratic unbiased estimation of variance components, J. Am. Statist. Ass. 68 (1973), 728-730
LaMotte, L.R. (1976): Invariant quadratic estimators in the random, one-way ANOVA model, Biometrics 32 (1976), 793-804
LaMotte, L.R. (1996): Some procedures for testing linear hypotheses and computing RLS estimates in multiple regression, Tatra Mt. Math. Publ. 7 (1996), 113-126
LaMotte, L.R., McWhorter, A. and R.A. Prasad (1988): Confidence intervals and tests on the variance ratio in random models with two variance components, Com. Stat. - Theory Meth. 17 (1988), 1135-1164
Lamperti, J. (1966): Probability: a survey of the mathematical theory, W.A. Benjamin, New York 1966
Lancaster, H.O. (1965): The Helmert matrices, American Math. Monthly 72 (1965), 4-11
Lancaster, H.O. (1966): Forerunners of the Pearson χ2, Australian Journal of Statistics 8 (1966), 117-126
Langevin, P. (1905): Magnétisme et théorie des électrons, Ann. de Chim. et de Phys. 5 (1905), 70-127
Lanzinger, H. and U. Stadtmüller (2000): Weighted sums for i.i.d. random variables with relatively thin tails, Bernoulli 6 (2000), 45-61
Lapaine, M. (1990): A new direct solution of the transformation problem of Cartesian into ellipsoidal coordinates, in: Rapp, R.H. and Sanso, F. (eds.), Determination of the geoid: present and future, 395-404, Springer-Verlag, New York 1990
Laplace, P.S. (1793): Sur quelques points du Système du monde, Mémoires de l'Académie royale des Sciences de Paris, in: Oeuvres Complètes 11 (1793), 477-558
Laplace, P.S. (1811): Mémoire sur les intégrales définies et leur application aux probabilités, et spécialement à la recherche du milieu qu'il faut choisir entre les résultats des observations, Oeuvres complètes XII, Paris 1811, 357-412
Lardy, L.J. (1975): A series representation for the generalized inverse of a closed linear operator, Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 58 (1975), 152-157
Lauer, S. (1971): Dissertation, Bonn 1971
Lauritzen, S. (1973): The probabilistic background of some statistical methods in physical geodesy, Danish Geodetic Institute, Meddelelse 48, Copenhagen 1973
Lauritzen, S. (1974): Sufficiency, prediction and extreme models, Scand. J. Statist. 1 (1974), 128-134
Läuter, H. (1970): Optimale Vorhersage und Schätzung in regulären und singulären Regressionsmodellen, Math. Operationsforschg. Statistik 1 (1970), 229-243
Läuter, H. (1971): Vorhersage bei stochastischen Prozessen mit linearem Regressionsanteil, Math. Operationsforschg. Statistik 2 (1971), 69-85, 147-166
Lawless, J.F. (1982): Statistical models and methods for lifetime data, Wiley, New York 1982
Lawley, D.N. (1938): A generalization of Fisher's z-test, Biometrika 30 (1938), 180-187
Lawson, C.L. and R.J. Hanson (1974): Solving least squares problems, Prentice-Hall, Englewood Cliffs, N.J. 1974
Lawson, C.L. and R.J. Hanson (1995): Solving least squares problems, SIAM, Philadelphia 1995 (reprint with corrections and a new appendix of the 1974 Prentice-Hall text)
Laycock, P.J. (1975): Optimal design: regression models for directions, Biometrika 62 (1975), 305-311
LeCam, L. (1960): Locally asymptotically normal families of distributions, University of California Publication 3, Los Angeles 1960, 37-98
LeCam, L. (1970): On the assumptions used to prove asymptotic normality of maximum likelihood estimators, Ann. Math. Statistics 41 (1970), 802-828
LeCam, L. (1986): Proceedings of the Berkeley conference in honor of Jerzy Neyman and Jack Kiefer, Chapman and Hall, Boca Raton 1986
Lee, Y.W. (1960): Statistical theory of communication, New York 1960
Lee, Y.W. (1964): in: Selected Papers of N. Wiener, MIT Press 1964
Lee, J.C. and S. Geisser (1996): On the prediction of growth curves, in: Lee, J.C., Johnson, W.O. and A. Zellner (eds.), Modelling and Prediction Honoring Seymour Geisser, Springer-Verlag, Heidelberg Berlin New York 1996, 77-103
Lee, J.C., Johnson, W.O. and A. Zellner (1996): Modelling and prediction, Springer-Verlag, Heidelberg Berlin New York 1996
Lee, M. (1996): Methods of moments and semiparametric econometrics for limited dependent variable models, Springer-Verlag, Heidelberg Berlin New York 1996
Lee, P. (1992): Bayesian statistics, J. Wiley, New York 1992
Lee, S.L. (1995): A practical upper bound for departure from normality, SIAM J. Matrix Anal. Appl. 16 (1995), 462-468
Lee, S.L. (1996): Best available bounds for departure from normality, SIAM J. Matrix Anal. Appl. 17 (1996), 984-991
Lee, S.L. and J.-Q. Shi (1998): Analysis of covariance structures with independent and non-identically distributed observations, Statistica Sinica 8 (1998), 543-557
Lee, Y. and J.A. Nelder (1996): Hierarchical generalized linear models, J. Roy. Statist. Soc. B58 (1996), 619-678
Lee, Y., Nelder, J. and Pawitan, Y. (2006): Generalized linear models with random effects, Chapman & Hall/CRC, New York 2006
Lehmann, E.L. (1959): Testing statistical hypotheses, J. Wiley, New York 1959
Lehmann, E.L. and H. Scheffé (1950): Completeness, similar regions and unbiased estimation, Part I, Sankhya 10 (1950), 305-340
Lehmann, E.L. and G. Casella (1998): Theory of point estimation, Springer-Verlag, Heidelberg Berlin New York 1998
Leichtweiss, K. (1962): Über eine Art von Krümmungsinvarianten beliebiger Untermannigfaltigkeiten des n-dimensionalen euklidischen Raums, Hbg. Math. Abh. XXVI (1962), 156-190
Leick, A. (1995): GPS satellite surveying, 2nd ed., John Wiley & Sons, New York 1995
Leick, A. (2004): GPS satellite surveying, 3rd ed., John Wiley & Sons, New Jersey 2004, 435 pp.
Leis, R. (1967): Vorlesungen über partielle Differentialgleichungen zweiter Ordnung, Mannheim 1967
Lemmerling, P. and Van Huffel, S. (2002): Structured total least squares: analysis, algorithms and applications, in: Van Huffel, S. et al. (eds.), Total least squares and errors-in-variables modelling: analysis, algorithms and applications, Kluwer Academic Publishers, Dordrecht 2002, 79-91
Lenstra, A.K., Lenstra, H.W. and L. Lovász (1982): Factoring polynomials with rational coefficients, Math. Ann. 261 (1982), 515-534
Lenstra, H.W. (1983): Integer programming with a fixed number of variables, Math. Operations Res. 8 (1983), 538-548
Lenth, R.V. (1981): Robust measures of location for directional data, Technometrics 23 (1981), 77-81
Lentner, M.N. (1969): Generalized least-squares estimation of a subvector of parameters in randomized fractional factorial experiments, Ann. Math. Statist. 40 (1969), 1344-1352
Lenzmann, L. (2004): Strenge Auswertung des nichtlinearen Gauß-Helmert-Modells, Allgemeine Vermessungs-Nachrichten 2 (2004), 68-73
Lesaffre, E. and G. Verbeke (1998): Local influence in linear mixed models, Biometrics 54 (1998), 570-582
Lesanska, E. (2001): Insensitivity regions for testing hypotheses in mixed models with constraints, Tatra Mt. Math. Publ. 22 (2001), 209-222
Lesanska, E. (2002): Effect of inaccurate variance components in mixed models with constraints, Folia Fac. Sci. Nat. Univ. Masarykianae Brunensis, Math. 11 (2002), 163-172
Lesanska, E. (2002): Insensitivity regions and their properties, J. Electr. Eng. 53 (2002), 68-71
Lesanska, E. (2002): Nonsensitiveness regions for threshold ellipsoids, Appl. Math., Praha 47 (2002), 411-426
Lesanska, E. (2002): Optimization of the size of nonsensitive regions, Appl. Math., Praha 47, 1 (2002), 9-23
Lesieur, M. (1990): Turbulence in fluids, 2nd edition, Kluwer Academic Publishers, Dordrecht 1990
Letac, G. and M. Mora (1990): Natural real exponential families with cubic variance functions, Ann. Statist. 18 (1990), 1-37
Lether, F.G. and P.R. Wenston (1995): Minimax approximations to the zeros of Pn(x) and Gauss-Legendre quadrature, J. Comput. Appl. Math. 59 (1995), 245-252
Levenberg, K. (1944): A method for the solution of certain non-linear problems in least-squares, Quarterly Appl. Math. 2 (1944), 164-168
Levin, J. (1976): A parametric algorithm for drawing pictures of solid objects composed of quadric surfaces, Communications of the ACM 19 (1976), 555-563
Lewis, R.M. and V. Torczon (2000): Pattern search methods for linearly constrained minimization, SIAM J. Optim. 10 (2000), 917-941
Lewis, T.O. and T.G. Newman (1968): Pseudoinverses of positive semidefinite matrices, SIAM J. Appl. Math. 16 (1968), 701-703
Li, B.L. and C. Loehle (1995): Wavelet analysis of multiscale permeabilities in the subsurface, Geophys. Res. Lett. 22 (1995), 3123-3126
Li, C.K. and R. Mathias (2000): Extremal characterizations of the Schur complement and resulting inequalities, SIAM Review 42 (2000), 233-246
Liang, K. and K. Ryu (1996): Selecting the form of combining regressions based on recursive prediction criteria, in: Lee, C., Johnson, W.O. and A. Zellner (eds.): Modelling and prediction honoring Seymour Geisser, Springer-Verlag, Heidelberg Berlin New York 1996, 122-135
Liang, K.Y. and S.L. Zeger (1986): Longitudinal data analysis using generalized linear models, Biometrika 73 (1986), 13-22
Liang, K.Y. and S.L. Zeger (1995): Inference based on estimating functions in the presence of nuisance parameters, Statist. Sci. 10 (1995), 158-199
Lichtenegger, H. (1995): Eine direkte Lösung des räumlichen Bogenschnitts, Österreichische Zeitschrift für Vermessung und Geoinformation 83 (1995), 224-226
Lilliefors, H.W. (1967): On the Kolmogorov-Smirnov test for normality with mean and variance unknown, J. Am. Statist. Ass. 62 (1967), 399-402
Lim, Y.B., Sacks, J. and Studden, W.J. (2002): Design and analysis of computer experiments when the output is highly correlated over the input space, Canad. J. Stat. 30 (2002), 109-126
Lin, K.C. and Wang, J. (1995): Transformation from geocentric to geodetic coordinates using Newton's iteration, Bulletin Geodesique 69 (1995), 300-303
Lin, X. and N.E. Breslow (1996): Bias correction in generalized linear mixed models with multiple components of dispersion, J. Am. Statist. Ass. 91 (1996), 1007-1016
Lin, X.H. (1997): Variance component testing in generalised linear models with random effects, Biometrika 84 (1997), 309-326
Lin, T. and Lee, J. (2007): Estimation and prediction in linear mixed models with skew-normal random effects for longitudinal data, Statistics in Medicine, published online
Lindley, D.V. and A.F.M. Smith (1972): Bayes estimates for the linear model, J. Roy. Stat. Soc. B34 (1972), 1-41
Lindsey, J.K. (1997): Applying generalized linear models, Springer-Verlag, Heidelberg Berlin New York 1997
Lindsey, J.K. (1999): Models for repeated measurements, 2nd edition, Oxford University Press, Oxford 1999
Linke, J., Jurisch, R., Kampmann, G. and H. Runne (2000): Numerisches Beispiel zur Strukturanalyse von geodätischen und mechanischen Netzen mit latenten Restriktionen, Allgemeine Vermessungsnachrichten 107 (2000), 364-368
Linkwitz, K. (1961): Fehlertheorie und Ausgleichung von Streckennetzen nach der Theorie elastischer Systeme, Deutsche Geodätische Kommission, Reihe C, Nr. 46
Linkwitz, K. (1989): Bemerkungen zu Linearisierungen in der Ausgleichungsrechnung, in: Prof. Dr.-Ing. Dr. h.c. Friedrich Ackermann zum 60. Geburtstag, Schriftenreihe des Instituts für Photogrammetrie, Heft 14
Linkwitz, K. and Schek, H.J. (1971): Einige Bemerkungen zur Berechnung von vorgespannten Seilnetz-Konstruktionen, Ingenieur-Archiv 40 (1971)
Linnainmaa, S., Harwood, D. and Davis, L.S. (1988): Pose determination of a three-dimensional object using triangle pairs, IEEE Transactions on Pattern Analysis and Machine Intelligence 10(5) (1988), 634-647
Linnik, J.V. and I.V. Ostrovskii (1977): Decomposition of random variables and vectors, Transl. Math. Monographs Vol. 48, American Mathematical Society, Providence 1977
Lippitsch, A. (2007): A deformation analysis method for the metrological ATLAS cavern network at CERN, Ph.D. thesis, Graz University of Technology, Graz, Austria, 2007
Liptser, R.S. and A.N. Shiryayev (1977): Statistics of random processes, Vol. 1, Springer-Verlag, Heidelberg Berlin New York 1977
Liski, E.P. and S. Puntanen (1989): A further note on a theorem on the difference of the generalized inverses of two nonnegative definite matrices, Commun. Statist.-Theory Meth. 18 (1989), 1747-1751
Liski, E.P. and T. Nummi (1996): Prediction in repeated-measures models with engineering applications, Technometrics 38 (1996), 25-36
Liski, E.P., Luoma, A., Mandal, N.K. and B.K. Sinha (1998): Pitman nearness, distance criterion and optimal regression designs, Calcutta Statistical Ass. Bull. 48 (1998), 191-192
Liski, E.P., Mandal, N.K., Shah, K.R. and B.K. Sinha (2002): Topics in optimal design, Springer-Verlag, Heidelberg Berlin New York 2002
Lisle, R. (1992): New method of estimating regional stress orientations: application to focal mechanism data of recent British earthquakes, Geophys. J. Int. 110 (1992), 276-282
Liu, J. (2000): MSEM dominance of estimators in two seemingly unrelated regressions, J. Statist. Planning and Inference 88 (2000), 255-266
Liu, S. (2000): Efficiency comparisons between the OLSE and the BLUE in a singular linear model, J. Statist. Planning and Inference 84 (2000), 191-200
Liu, X.-W. and Y.-X. Yuan (2000): A robust algorithm for optimization with general equality and inequality constraints, SIAM J. Sci. Comput. 22 (2000), 517-534
Ljung, L. (1979): Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems, IEEE Trans. Auto. Contr. AC-24 (1979), 36-50
Lloyd, E.H. (1952): Least squares estimation of location and scale parameters using order statistics, Biometrika 39 (1952), 88-95
Lohse, P. (1990): Dreidimensionaler Rückwärtsschnitt. Ein Algorithmus zur Streckenberechnung ohne Hauptachsentransformation, Zeitschrift für Vermessungswesen 115 (1990), 162-167
Lohse, P. (1994): Ausgleichungsrechnung in nichtlinearen Modellen, Deutsche Geod. Kommission C 429, München 1994
Lohse, P., Grafarend, E.W. and Schaffrin, B. (1989a): Dreidimensionaler Rückwärtsschnitt, Teil IV, V, Zeitschrift für Vermessungswesen, 5, 225-234; 6, 278-287
Lohse, P., Grafarend, E.W. and Schaffrin, B. (1989b): Three-dimensional point determination by means of combined resection and intersection, paper presented at the Conference on 3-D Measurement Techniques, Vienna, Austria, September 18-20, 1989
Longford, N.T. (1993): Random coefficient models, Clarendon Press, Oxford 1993
Longford, N. (1995): Random coefficient models, Oxford University Press, 1995
Longley, J.W. and R.D. Longley (1997): Accuracy of Gram-Schmidt orthogonalization and Householder transformation for the solution of linear least squares problems, Numerical Linear Algebra with Applications 4 (1997), 295-303
Lord, R.D. (1948): A problem with random vectors, Phil. Mag. 39 (1948), 66-71
Lord, R.D. (1954): The use of the Hankel transform in statistics, I. General theory and examples, Biometrika 41 (1954), 44-55
Lorentz, G.G. (1966): Metric entropy and approximation, Bull. Amer. Math. Soc. 72 (1966), 903-937
Loskowski, P. (1991): Is Newton's iteration faster than simple iteration for transformation between geocentric and geodetic coordinates? Bulletin Geodesique 65 (1991), 14-17
Lu, S., Molz, F.J. and H.H. Liu (2003): An efficient three-dimensional anisotropic, fractional Brownian motion and truncated fractional Lévy motion simulation algorithm based on successive random additions, Computers and Geoscience 29 (2003), 15-25
Ludlow, J. and W. Enders (2000): Estimating non-linear ARMA models using Fourier coefficients, International Journal of Forecasting 16 (2000), 333-347
Lumley, J.L. (1970): Stochastic tools in turbulence, Academic Press, New York 1970
Lumley, J.L. (1978): Computational modelling of turbulence, Advances in Applied Mechanics 18 (1978), 123-176
Lumley, J.L. (1983): Turbulence modelling, J. App. Mech. Trans. ASME 50 (1983), 1097-1103
Lund, U. (1999): Least circular distance regression for directional data, Journal of Applied Statistics 26 (1999), 723-733
Lütkepohl, H. (1996): Handbook of matrices, J. Wiley, New York 1996
Lyubeznik, G. (1995): Minimal resultant system, Journal of Algebra 177 (1995), 612-616
Ma, X.Q. and Kusznir, N.J. (1992): 3-D subsurface displacement and strain fields for faults and fault arrays in a layered elastic half-space, Geophys. J. Int. 111 (1992), 542-558
MacDuffee, C.C. (1946): The theory of matrices (1933), reprint Chelsea Publ., New York 1946
Macaulay, F. (1902): On some formulae in elimination, Proceedings of the London Mathematical Society (1902), 3-27
Macaulay, F. (1916): The algebraic theory of modular systems, Cambridge Tracts in Mathematics 19, Cambridge University Press, Cambridge 1916
Macaulay, F. (1921): Note on the resultant of a number of polynomials of the same degree, Proceedings of the London Mathematical Society 21 (1921), 14-21
MacInnes, C.S. (1999): The solution to a structured matrix approximation problem using Grassmann coordinates, SIAM J. Matrix Analysis Appl. 21 (1999), 446-453
Madansky, A. (1959): The fitting of straight lines when both variables are subject to error, J. Am. Statist. Ass. 54 (1959), 173-205
Madansky, A. (1962): More on length of confidence intervals, J. Am. Statist. Ass. 57 (1962), 586-589
Mader, G.L. (2003): GPS antenna calibration at the National Geodetic Survey, http://www.ngs.noaa.gov/ANTCAL/images/summary.pdf
Maejima, M. (1978): Some Lp versions for the central limit theorem, Ann. Probability 6 (1978), 341-344
Mäkinen, J. (2002): A bound for the Euclidean norm of the difference between the best linear unbiased estimator and a linear unbiased estimator, Journal of Geodesy 76 (2002), 317-322
Maess, G. (1988): Vorlesungen über numerische Mathematik II. Analysis, Birkhäuser-Verlag, Basel Boston Berlin 1988
Magder, L.S. and S.L. Zeger (1996): A smooth nonparametric estimate of a mixing distribution using mixtures of Gaussians, J. Am. Statist. Ass. 91 (1996), 1141-1151
Magee, L. (1998): Improving survey-weighted least squares regression, J. Roy. Statist. Soc. B60 (1998), 115-126
Magness, T.A. and J.B. McGuire (1962): Comparison of least-squares and minimum variance estimates of regression parameters, Ann. Math. Statist. 33 (1962), 462-470
Magnus, J.R. and H. Neudecker (1988): Matrix differential calculus with applications in statistics and econometrics, J. Wiley, New York 1988
Mahalanobis, P.C. (1930): On tests and measures of group divergence, J. and Proc. Asiat. Soc. Beng. 26 (1930), 541-588
Mahalanobis, P.C. (1936): On the generalized distance in statistics, Proc. Nat. Inst. Sci. India 2 (1936), 49-55
Mahalanobis, A. and M. Farooq (1971): A second-order method for state estimation of nonlinear dynamical systems, Int. J. Contr. 14 (1971), 631-639
Mahalanobis, P.C., Bose, R.C. and S.N. Roy (1937): Normalisation of statistical variates and the use of rectangular coordinates in the theory of sampling distributions, Sankhya 3 (1937), 1-40
Mallet, A. (1986): A maximum likelihood estimation method for random coefficient regression models, Biometrika 73 (1986), 645-656
Malliavin, P. (1997): Stochastic analysis, Springer-Verlag, Heidelberg Berlin New York 1997
Mallick, B., Denison, D. and Smith, A. (2000): Semiparametric generalized linear models, in: Dey, D., Ghosh, S. and Mallick, B.K. (eds.): Generalized linear models: A Bayesian perspective, Marcel Dekker, New York 2000
Mallows, C.L. (1959): Latent roots and vectors of random matrices, Tech. Report 35, Statistical Tech. Research Group, Section of Math. Stat., Dept. of Math., Princeton University, Princeton 1959
Mallows, C.L. (1961): Latent vectors of random symmetric matrices, Biometrika 48 (1961), 133-149
Malyutov, M.B. and R.S. Protassov (1999): Functional approach to the asymptotic normality of the non-linear least squares estimator, Statistics & Probability Letters 44 (1999), 409-416
Mamontov, Y. and M. Willander (2001): High-dimensional nonlinear diffusion stochastic processes, World Scientific, Singapore 2001
Mandel, J. (1994): The analysis of two-way layouts, Chapman and Hall, Boca Raton 1994
Mangoubi, R.S. (1998): Robust estimation and failure detection, Springer-Verlag, Heidelberg Berlin New York 1998
Manly, B.F. (1976): Exponential data transformation, The Statistician 25 (1976), 37-42
Manocha, D. (1992): Algebraic and numeric techniques for modeling and robotics, Ph.D. thesis, Computer Science Division, Department of Electrical Engineering and Computer Science, University of California, Berkeley 1992
Manocha, D. (1993): Efficient algorithms for multipolynomial resultant, The Computer Journal 36 (1993), 485-496
Manocha, D. (1994a): Algorithms for computing selected solutions of polynomial equations, extended abstract appearing in the proceedings of ACM ISSAC 94
Manocha, D. (1994b): Computing selected solutions of polynomial equations, Proceedings of the International Symposium on Symbolic and Algebraic Computations ISSAC, July 20-22, pp. 1-8, Oxford 1994
Manocha, D. (1994c): Solving systems of polynomial equations, IEEE Computer Graphics and Applications, pp. 46-55, March 1994
Manocha, D. (1998): Numerical methods for solving polynomial equations, Proceedings of Symposia in Applied Mathematics 53 (1998), 41-66
Manocha, D. and Canny, J. (1991): Efficient techniques for multipolynomial resultant algorithms, Proceedings of the International Symposium on Symbolic Computations, July 15-17, 1991, pp. 86-95, Bonn 1991
Manocha, D. and Canny, J. (1992): Multipolynomial resultant and linear algebra, Proceedings of the International Symposium on Symbolic and Algebraic Computations ISSAC, July 27-29, 1992, pp. 158-167, Berkeley 1992
Manocha, D. and Canny, J. (1993): Multipolynomial resultant algorithms, Journal of Symbolic Computations 15 (1993), 99-122
Mardia, K.V. (1962): Multivariate Pareto distributions, Ann. Math. Statistics 33 (1962), 1008-1015
Mardia, K.V. (1970a): Families of bivariate distributions, Griffin, London 1970
Mardia, K.V. (1970b): Measures of multivariate skewness and kurtosis with applications, Biometrika 57 (1970), 519-530
Mardia, K.V. (1972): Statistics of directional data, Academic Press, New York London 1972
Mardia, K.V. (1975): Characterization of directional distributions, Statistical Distributions, Scientific Work 3 (1975), G.P. Patil et al. (eds.), 365-385
Mardia, K.V. (1975): Statistics of directional data, J. Roy. Statist. Soc. B37 (1975), 349-393
Mardia, K.V. (1976): Linear-circular correlation coefficients and rhythmometry, Biometrika 63 (1976), 403-405
Mardia, K.V. (1988): Directional data analysis: an overview, J. Applied Statistics 2 (1988), 115-122
Mardia, K.V. and M.L. Puri (1978): A robust spherical correlation coefficient against scale, Biometrika 65 (1978), 391-396
Mardia, K.V., Kent, J. and J.M. Bibby (1979): Multivariate analysis, Academic Press, New York London 1979
Mardia, K.V., Southworth, H.R. and C.C. Taylor (1999): On bias in maximum likelihood estimators, J. Statist. Planning and Inference 76 (1999), 31-39
Mardia, K.V. and P.E. Jupp (1999): Directional statistics, J. Wiley, New York 1999
Marinkovic, P., Grafarend, E. and T. Reubelt (2003): Space gravity spectroscopy: the benefits of Taylor-Karman structured criterion matrices, Advances in Geosciences 1 (2003), 113-120
Maritz, J.S. (1953): Estimation of the correlation coefficient in the case of a bivariate normal population when one of the variables is dichotomized, Psychometrika 18 (1953), 97-110
Mariwalla, K.H. (1971): Coordinate transformations that form groups in the large, in: De Sitter and Conformal Groups and their Applications, A.O. Barut and W.E. Brittin (eds.), Vol. 13, 177-191, Colorado Associated University Press, Boulder 1971
Markatou, M. and X. He (1994): Bounded influence and high breakdown point testing procedures in linear models, J. Am. Statist. Ass. 89 (1994), 543-549
Markiewicz, A. (1996): Characterization of general ridge estimators, Statistics & Probability Letters 27 (1996), 145-148
Markov, A.A. (1912): Wahrscheinlichkeitsrechnung, 2nd edition, Teubner, Leipzig 1912
Marošević, T. and D. Jukić (1997): Least orthogonal absolute deviations problem for exponential function, Student 2 (1997), 131-138
Marquardt, D.W. (1963): An algorithm for least-squares estimation of nonlinear parameters, J. Soc. Indust. Appl. Math. 11 (1963), 431-441
Marquardt, D.W. (1970): Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation, Technometrics 12 (1970), 591-612
Marriott, J. and P. Newbold (1998): Bayesian comparison of ARIMA and stationary ARMA models, International Statistical Review 66 (1998), 323-336
Marsaglia, G. and G.P.H. Styan (1972): When does rank(A + B) = rank(A) + rank(B)?, Canad. Math. Bull. 15 (1972), 451-452
Marsaglia, G. and G.P.H. Styan (1974a): Equalities and inequalities for ranks of matrices, Linear Multilinear Algebra 2 (1974), 269-292
Marsaglia, G. and G.P.H. Styan (1974b): Rank conditions for generalized inverses of partitioned matrices, Sankhya, Ser. A 36 (1974), 437-442
Marshall, J. (2002): L1-norm pre-analysis measures for geodetic networks, Journal of Geodesy 76 (2002), 334-344
Martinec, Z. (2002): Lecture notes 2002. Scalar surface spherical harmonics, GeoForschungsZentrum Potsdam 2002
Maruyama, Y. (1998): Minimax estimators of a normal variance, Metrika 48 (1998), 209-214
Masry, E. (1997): Additive nonlinear ARX time series and projection estimates, Econometric Theory 13 (1997), 214-252
Mastronardi, N., Lemmerling, P. and S. van Huffel (2000): Fast structured total least squares algorithm for solving the basic deconvolution problem, SIAM J. Matrix Anal. Appl. 22 (2000), 533-553
Masuyama, M. (1939a): Correlation between tensor quantities, Proc. Phys.-Math. Soc. Japan (3) 21 (1939), 638-647
Masuyama, M. (1939b): Tensor characteristic of vector set, Proc. Phys.-Math. Soc. Japan (3) 21 (1939), 648
Mathai, A.M. (1997): Jacobians of matrix transformations and functions of matrix arguments, World Scientific, Singapore 1997
Mathai, A.M. and Provost, S.B. (1992): Quadratic forms in random variables, Marcel Dekker, New York 1992, 367p
Mathar, R. (1997): Multidimensionale Skalierung, B.G. Teubner Verlag, Stuttgart 1997
Matheron, G. (1970): The theory of regionalized variables and its applications, Fascicule 5, Les Cahiers du Centre de Morphologie Mathematique, Ecole des Mines de Paris, Fontainebleau, 211 p
Matheron, G. (1973): The intrinsic random functions and their applications, Adv. App. Prob. 5 (1973), 439-468
Mathes, A. (1998): GPS und GLONASS als Teil eines hybriden Meßsystems in der Geodäsie am Beispiel des Systems HIGGINS, Dissertationen, DGK, Reihe C, Nr. 500
Mathew, T. (1989): Optimum invariant tests in mixed linear models with two variance components, in: Statistical Data Analysis and Inference, Y. Dodge (ed.), North-Holland 1989, 381-388
Mathew, T. (1997): Wishart and chi-square distributions associated with matrix quadratic forms, J. Multivar. Anal. 61 (1997), 129-143
Mathew, T. and B.K. Sinha (1988): Optimum tests in unbalanced two-way models without interaction, Ann. Statist. 16 (1988), 1727-1740
Mathew, T. and S. Kasala (1994): An exact confidence region in multivariate calibration, The Annals of Statistics 22 (1994), 94-105
Mathew, T. and W. Zha (1996): Conservative confidence regions in multivariate calibration, The Annals of Statistics 24 (1996), 707-725
Mathew, T. and K. Nordström (1997): An inequality for a measure of deviation in linear models, The American Statistician 51 (1997), 344-349
Mathew, T. and W. Zha (1997): Multiple use confidence regions in multivariate calibration, J. Am. Statist. Ass. 92 (1997), 1141-1150
Mathew, T. and W. Zha (1998): Some single use confidence regions in a multivariate calibration problem, Applied Statist. Science III (1998), 351-363
Mathew, T., Sharma, M.K. and K. Nordström (1998): Tolerance regions and multiple-use confidence regions in multivariate calibration, The Annals of Statistics 26 (1998), 1989-2013
Mathias, R. and G.W. Stewart (1993): A block QR algorithm and the singular value decomposition, Linear Algebra Appl. 182 (1993), 91-100
Matz, G., Hlawatsch, F. and Kozek, W. (1997): Generalized evolutionary spectral analysis and the Weyl spectrum of nonstationary random processes, IEEE Trans. Signal Proc. 45 (1997) 6, 1520-1534
Mautz, R. (2001): Zur Lösung nichtlinearer Ausgleichungsprobleme bei der Bestimmung von Frequenzen in Zeitreihen, DGK, Reihe C, Nr. 532
Mautz, R. (2002): Solving nonlinear adjustment problems by global optimization, Boll. di Geodesia e Scienze Affini 61 (2002), 123-134
Maxwell, S.E. (2003): Designing experiments and analyzing data. A model comparison perspective, Lawrence Erlbaum Associates, London New Jersey 2003
Maybeck, P. (1979): Stochastic models, estimation, and control, Vol. 1, Academic Press, New York London 1979
Maybeck, P. (1982): Stochastic models, estimation and control, Vols. 1 and 2, Academic Press, Washington 1982
Mayer, D.H. (1975): Vector and tensor fields on conformal space, J. Math. Physics 16 (1975), 884-893
McComb, W.D. (1990): The physics of fluid turbulence, Clarendon Press, 572 pages, Oxford 1990
McCullagh, P. (1983): Quasi-likelihood functions, The Annals of Statistics 11 (1983), 59-67
McCullagh, P. (1987): Tensor methods in statistics, Chapman and Hall, London 1987
McCullagh, P. and Nelder, J.A. (1989): Generalized linear models, Chapman and Hall, London 1989
McCullagh, P. and Nelder, J.A. (2006): Generalized linear models, Chapman and Hall/CRC, New York 2006
McCulloch, C.E. (1997): Maximum likelihood algorithms for generalized linear mixed models, J. Am. Statist. Ass. 92 (1997), 162-170
McCulloch, C.E. and S.R. Searle (2001): Generalized, linear, and mixed models, J. Wiley, New York 2001
McElroy, F.W. (1967): A necessary and sufficient condition that ordinary least-squares estimators be best linear unbiased, J. Am. Statist. Ass. 62 (1967), 1302-1304
McGaughey, D.R. and G.J.M. Aitken (2000): Statistical analysis of successive random additions for generating fractional Brownian motion, Physics A 277 (2000), 25-34
McGilchrist, C.A. (1994): Estimation in generalized mixed models, J. Roy. Statist. Soc. B56 (1994), 61-69
McGilchrist, C.A. and C.W. Aisbett (1991): Restricted BLUP for mixed linear models, Biometrical Journal 33 (1991), 131-141
McGilchrist, C.A. and K.K.W. Yau (1995): The derivation of BLUP, ML, REML estimation methods for generalised linear mixed models, Commun. Statist.-Theor. Meth. 24 (1995), 2963-2980
McMorris, F.R. (1997): The median function on structured metric spaces, Student 2 (1997), 195-201
Meditch, J.S. (1969): Stochastic optimal linear estimation and control, McGraw-Hill, New York 1969
Meetz, K. and W.L. Engl (1980): Electromagnetic fields, Springer-Verlag 1980
Mehta, M.L. (1967): Random matrices and the statistical theory of energy levels, Academic Press, New York 1967
Mehta, M.L. (1991): Random matrices, Academic Press, New York London 1991
Meidow, J., Beder, C. and Förstner, W. (2009): Reasoning with uncertain points, straight lines, and straight line segments in 2D, ISPRS Journal of Photogrammetry and Remote Sensing 64 (2009) 2, 125-139
Meidow, J., Förstner, W. and C. Beder (2009): Optimal parameter estimation with homogeneous entities and arbitrary constraints, in: Pattern Recognition, Proc. 31st DAGM Symposium, Jena, Germany, pp. 292-301, Springer-Verlag, Berlin Heidelberg 2009
Meier, S. (1967): Die terrestrische Refraktion im Kongsfjord (West-Spitzbergen), Geod. Geophys. Veröff., Reihe 3, pp. 32-51, Berlin 1967
Meier, S. (1970): Beiträge zur Refraktion in hohen Breiten, Geod. Geophys. Veröff., Reihe 3, Berlin 1970
Meier, S. (1975): Zur Genauigkeit der trigonometrischen Höhenmessung über Eis bei stabiler Temperatur- und Refraktionsschichtung, Vermessungstechnik 23 (1975), 382-385
Meier, S. (1976): Über die Abhängigkeit der Refraktionsschwankung von der Zielweite, Vermessungstechnik 24 (1976), 378-382
Meier, S. (1977): Zur Refraktionsschwankung in der turbulenten Unterschicht der Atmosphäre, Vermessungstechnik 25 (1977), 332-335
Meier, S. (1978): Stochastische Refraktionsmodelle, Geod. Geophys. Veröff., Reihe 3, 40 (1978), 63-73
Meier, S. (1978): Refraktionsschwankung und Turbulenz, Geod. Geophys. Veröff., Reihe 3, 41 (1978), 26-27
Meier, S. (1981): Planar geodetic covariance functions, Rev. of Geophysics and Space Physics 19 (1981), 673-686
Meier, S. (1982): Stochastisches Refraktionsmodell für Zielstrahlen im dreidimensionalen euklidischen Raum, Vermessungstechnik 30 (1982), 55-57
Meier, S. (1982): Stochastisches Refraktionsmodell für Zielstrahlen in einer Vertikalebene der freien Atmosphäre, Vermessungstechnik 30 (1982), 55-57
Meier, S. (1982): Varianz und Erhaltungsneigung der lokalen Lichtstrahlkrümmung in Bodennähe, Vermessungstechnik 30 (1982), 420-423
Meier, S. and W. Keller (1990): Geostatistik. Einführung in die Theorie der Zufallsprozesse, Springer-Verlag, Wien New York 1990
Meinhold, P. and E. Wagner (1979): Partielle Differentialgleichungen, Frankfurt/Main 1979
Meissl, P. (1965): Über die innere Genauigkeit dreidimensionaler Punkthaufen, Z. Vermessungswesen 90 (1965), 198-118
Meissl, P. (1969): Zusammenfassung und Ausbau der inneren Fehlertheorie eines Punkthaufens, Deutsche Geod. Kommission A61, München 1969, 8-21
Meissl, P. (1970): Über die Fehlerfortpflanzung in gewissen regelmäßigen flächig ausgebreiteten Nivellementnetzen, Zeitschrift für Vermessungswesen 95 (1970), 103-109
Meissl, P. (1976): Hilbert spaces and their application to geodetic least-squares problems, Bolletino di Geodesia e Scienze Affini 35 (1976), 49-80
Meissl, P. (1982): Least squares adjustment. A modern approach, Mitteilungen des geodätischen Instituts der Technischen Universität Graz, Folge 43
Melas, V.B. (2006): Functional approach to optimal experimental design, Springer, Berlin 2006
Melbourne, W. (1985): The case for ranging in GPS-based geodetic systems, Proc. 1st Int. Symp. on Precise Positioning with GPS, Rockville, Maryland 1985, 373-386
Menz, J. (2000): Forschungsergebnisse zur Geomodellierung und deren Bedeutung, Manuskript 2000
Menz, J. and N. Kolesnikov (2000): Bestimmung der Parameter der Kovarianzfunktionen aus den Differenzen zweier Vorhersageverfahren, Manuskript 2000
Merriman, M. (1877): On the history of the method of least squares, The Analyst 4 (1877)
Merriman, M. (1884): A textbook on the method of least squares, J. Wiley, New York 1884
Merritt, E.L. (1949): Explicit three-point resection in space, Phot. Eng. 15 (1949), 649-665
Meyer, C.D. (1973): Generalized inverses and ranks of block matrices, SIAM J. Appl. Math. 25 (1973), 597-602
Mhaskar, H.N., Narcowich, F.J. and J.D. Ward (2001): Representing and analyzing scattered data on spheres, in: Multivariate Approximations and Applications, Cambridge University Press, Cambridge 2001, 44-72
Mierlo van, J. (1980): Free network adjustment and S-transformations, Deutsche Geod. Kommission B252, München 1980, 41-54
Mierlo van, J. (1982): Difficulties in defining the quality of geodetic networks, in: Proceedings Survey Control Networks, eds. Borre, K. and Welsch, W.M., Vol. 7, pp. 259-274, Schriftenreihe Wiss. Studiengang Vermessungswesen, Hochschule der Bundeswehr, München 1982
Migon, H.S. and D. Gammermann (1999): Statistical inference, Arnold, London 1999
Milbert, D. (1979): Optimization of horizontal control networks by nonlinear programming, NOAA Technical Report NOS 79 NGS 12, Rockville 1979
Millar, A.M. and Hamilton, D.C. (1999): Modern outlier detection methods and their effect on subsequent inference, J. Stat. Comput. Simulation 64 (1999) 2, 125-150
Miller, R.G. (1966): Simultaneous statistical inference, McGraw-Hill, New York 1966
Mills, T.C. (1991): Time series techniques for economists, Cambridge University Press, Cambridge 1991
Minzhong, J. and C. Xiru (1999): Strong consistency of least squares estimate in multiple regression when the error variance is infinite, Statistica Sinica 9 (1999), 289-296
Minkler, G. and J. Minkler (1993): Theory and application of Kalman filtering, Magellan Book Company 1993
Minoux, M. (1986): Mathematical programming, John Wiley & Sons, Chichester 1986
Mirsky, L. (1960): Symmetric gauge functions and unitarily invariant norms, Quarterly Journal of Mathematics 11 (1960), 50-59
Mises, R. von (1918): Über die Ganzzahligkeit der Atomgewichte und verwandte Fragen, Phys. Z. 19 (1918), 490-500
Mises, R. von (1945): On the classification of observation data into distinct groups, Annals of Mathematical Statistics 16 (1945), 68-73
Misra, P. and Enge, P. (2001): Global Positioning System: signals, measurements, and performance, Ganga-Jamuna Press, Lincoln, Massachusetts 2001
Misra, R.K. (1996): A multivariate procedure for comparing mean vectors for populations with unequal regression coefficient and residual covariance matrices, Biometrical Journal 38 (1996), 415-424
Mitchell, T.J. and Morris, M.D. (1992): Bayesian design and analysis of computer experiments: two examples, Stat. Sinica 2 (1992), 359-379
Mitra, S.K. (1971): Another look at Rao's MINQUE of variance components, Bull. Inst. Internat. Stat. Assoc. 44 (1971), 279-283
Mitra, S.K. (1982): Simultaneous diagonalization of rectangular matrices, Linear Algebra Appl. 47 (1982), 139-150
Mitra, S.K. and Rao, C.R. (1968): Some results in estimation and tests of linear hypotheses under the Gauss-Markov model, Sankhya, Ser. A 30 (1968), 281-290
Mittermayer, E. (1972): A generalization of least squares adjustment of free networks, Bull. Geod. No. 104 (1972), 139-155
Moenikes, R., Wendel, J. and Trommer, G.F. (2005): A modified LAMBDA method for ambiguity resolution in the presence of position domain constraints, in: Proceedings ION GNSS 2005, Long Beach 2005
Mohan, S.R. and S.K. Neogy (1996): Algorithms for the generalized linear complementarity problem with a vertical block Z-matrix, SIAM J. Optimization 6 (1996), 994-1006
Moire, C. and J.A. Dawson (1992): Distribution, Chapman and Hall, Boca Raton 1996
Molenaar, M. (1981): A further inquiry into the theory of S-transformations and criterion matrices, Neth. Geod. Comm., Publ. on Geodesy, New Series, Vol. 6, Nr. 1, Delft 1981
Möller, H.M. (1998): Gröbner bases and numerical analysis, in: Gröbner bases and applications, eds. B. Buchberger and F. Winkler, London Mathematical Society Lecture Note Series 251, Cambridge University Press, pp. 159-177, 1998
Molz, F.J., Liu, H.H. and Szulga, J. (1997): Fractional Brownian motion and fractional Gaussian noise in subsurface hydrology: a review, presentation of fundamental properties and extensions, Water Resour. Res. 33(10) (1997), 2273-2286
Money, A.H. et al. (1982): The linear regression model: Lp-norm estimation and the choice of p, Commun. Statist. Simul. Comput. 11 (1982), 89-109
Monin, A.S. and A.M. Yaglom (1971): Statistical fluid mechanics, Vol. I, 769 pages, MIT Press, Cambridge/Mass. 1971
Monin, A.S. and A.M. Yaglom (1975): Statistical fluid mechanics, Vol. II, 874 pages, MIT Press, Cambridge/Mass. 1975
Monin, A.S. and A.M. Yaglom (1981): Statistical fluid mechanics: mechanics of turbulence, Vol. 2, The MIT Press, Cambridge 1981
Montfort van, K. (1989): Estimating in structural models with non-normally distributed variables: some alternative approaches, Reprodienst, Subfaculteit Psychologie, Leiden 1989
Montgomery, D.C. (1996): Introduction to statistical quality control, 3rd edition, J. Wiley, New York 1996
Montgomery, D.C., Peck, E.A. and Vining, G.G. (2001): Introduction to linear regression analysis, 3rd edition, John Wiley & Sons, Chichester 2001
Mood, A.M., Graybill, F.A. and D.C. Boes (1974): Introduction to the theory of statistics, 3rd edition, McGraw-Hill, New York 1974
Moon, M.S. and R.F. Gunst (1994): Estimation of the polynomial errors-in-variables model with decreasing error variances, J. Korean Statist. Soc. 23 (1994), 115-134
Moon, M.S. and R.F. Gunst (1995): Polynomial measurement error modeling, Comput. Statist. Data Anal. 19 (1995), 1-21
Moore, E.H. (1900): A fundamental remark concerning determinantal notations with the evaluation of an important determinant of special form, Ann. Math. 2 (1900), 177-188
Moore, E.H. (1920): On the reciprocal of the general algebraic matrix, Bull. Amer. Math. Soc. 26 (1920), 394-395
Moore, E.H. (1935): General analysis, Memoirs American Philosophical Society, pp. 147-209, New York 1935
Moore, R.H. and M.Z. Nashed (1974): A general approximation theory of generalized inverses of linear operators in Banach spaces, SIAM J. App. Math. 27 (1974)
Moors, J.J.A. and van Houwelingen, V.C. (1987): Estimation of linear models with inequality restrictions, Tilburg University Research Report FEW 291, The Netherlands 1987
Morgan, B.J.T. (1992): Analysis of quantal response data, Chapman and Hall, Boca Raton 1992
Morgan, A.P. (1992): Polynomial continuation and its relationship to the symbolic reduction of polynomial systems, in: Symbolic and Numerical Computations for Artificial Intelligence, pp. 23-45, 1992
Morgenthaler, S. (1992): Least-absolute-deviations fits for generalized linear models, Biometrika 79 (1992), 747-754
Morgera, S. (1992): The role of abstract algebra in structured estimation theory, IEEE Trans. Inform. Theory 38 (1992), 1053-1065
Moritz, H. (1972): Advanced least-squares estimation, The Ohio State University, Department of Geodetic Science, Report No. 175, Columbus 1972
Moritz, H. (1973): Least-squares collocation, DGK, A 59, München 1973
Moritz, H. (1976): Covariance functions in least-squares collocation, Rep. Ohio State Uni. 240
Moritz, H. (1978): The operational approach to physical geodesy, The Ohio State University, Department of Geodetic Science, Report 277, Columbus 1978
Moritz, H. (1980): Advanced physical geodesy, Herbert Wichmann Verlag, Karlsruhe 1980
Moritz, H. and Suenkel, H. (eds) (1978): Approximation methods in geodesy, Sammlung Wichmann Neue Folge, Band 10, Herbert Wichmann Verlag, Karlsruhe 1978
Morris, C.N. (1982): Natural exponential families with quadratic variance functions, Ann. Statist. 10 (1982), 65-80
Morris, M.D., Mitchell, T.J. and Ylvisaker, D. (1993): Bayesian design and analysis of computer experiments: use of derivatives in surface prediction, Technometrics 35 (1993), 243-255
Morrison, D.F. (1967): Multivariate statistical methods, McGraw-Hill, New York 1967
Morrison, T.P. (1997): The art of computerized measurement, Oxford University Press, Oxford 1997
Morsing, T. and C. Ekman (1998): Comments on construction of confidence intervals in connection with partial least-squares, J. Chemometrics 12 (1998), 295-299
Moser, B.K. and J.K. Sawyer (1998): Algorithms for sums of squares and covariance matrices using Kronecker products, The American Statistician 52 (1998), 54-57
Mount, V.S. and Suppe, J. (1992): Present-day stress orientations adjacent to active strike-slip faults: California and Sumatra, J. Geophys. Res. B97 (1992), 11995-12013
Moutard, T. (1894): Notes sur la propagation des ondes et les équations de l'hydrodynamique, Paris 1893, reprint Chelsea Publ., New York 1949
Mudholkar, G.S. (1997): On the efficiencies of some common quick estimators, Commun. Statist.-Theory Meth. 26 (1997), 1623-1647
Mukhopadhyay, P. and R. Schwabe (1998): On the performance of the ordinary least squares method under an error component model, Metrika 47 (1998), 215-226
Mueller, C.H. (1997): Robust planning and analysis of experiments, Springer-Verlag, Heidelberg Berlin New York 1997
Mueller, C.H. (1998): Breakdown points of estimators for aspects of linear models, in: MODA 5 - Advances in model-oriented data analysis and experimental design, Proceedings of the 5th International Workshop in Marseilles, eds. Atkinson, A.C., Pronzato, L. and H.P. Wynn, Physica-Verlag, Heidelberg 1998
Mueller, H. (1985): Second-order design of combined linear-angular geodetic networks, Bull. Géodésique 59 (1985), 316-331
Müller, B., Zoback, M.L., Fuchs, K., Mastin, L., Gregersen, S., Pavoni, N., Stephansson, O. and Ljunggren, C. (1992): Regional patterns of tectonic stress in Europe, J. Geophys. Res. B97 (1992), 11783-11803
Muller, D.E. (1956): A method for solving algebraic equations using an automatic computer, Mathematical Tables and Other Aids to Computation X (1956) 56, 208-215
Müller, F.J. (1925): Direkte (exakte) Lösungen des einfachen Rückwärtseinschneidens im Raum, 1. Teil, Zeitschrift für Vermessungswesen 37 (1925), 249-255, 265-272, 349-353, 365-370, 569-580
Müller, H. and M. Illner (1984): Gewichtsoptimierung geodätischer Netze. Zur Anpassung von Kriteriummatrizen bei der Gewichtsoptimierung, Allgemeine Vermessungsnachrichten (1984), 253-269
Müller, H. and G. Schmitt (1985): SODES2 - Ein Programm-System zur Gewichtsoptimierung zweidimensionaler geodätischer Netze, Deutsche Geodätische Kommission, München, Reihe B, 276 (1985)
Müller, W.G. (2005): A comparison of spatial design methods for correlated observations, Environmetrics 16 (2005), 495-505
Müller, W.G. and Pazman, A. (1998): Design measures and approximate information matrices for experiments without replications, J. Stat. Plan. Inf. 71 (1998), 349-362
Müller, W.G. and Pazman, A. (1999): An algorithm for the computation of optimum designs under a given covariance structure, J. Comput. Stat. 14 (1999), 197-211
Müller, W.G. and Pazman, A. (2003): Measures for designs in experiments with correlated errors, Biometrika 90 (2003), 423-434
Müller, P., Sanso, B. and De Iorio, M. (2004): Optimal Bayesian design by inhomogeneous Markov chain simulation, J. Am. Stat. Assoc. 99 (2004), 788-798
Mueller, J. (1987): Sufficiency and completeness in the linear model, J. Multivar. Anal. 21 (1987), 312-323
Mueller, J., Rao, C.R. and B.K. Sinha (1984): Inference on parameters in a linear model: a review of recent results, in: Experimental design, statistical models and genetic statistics, K. Hinkelmann (ed.), Chap. 16, Marcel Dekker, New York 1984
Mueller, W. (1995): An example of optimal design for correlated observations, Österreichische Zeitschrift für Statistik 24 (1995), 9-15
Mueller, W. (1998): Spatial data collection, Contributions to Statistics, Physica-Verlag, Heidelberg 1998
Mueller, W. (1998): Collecting spatial data - optimum design of experiments for random fields, Physica-Verlag, Heidelberg 1998
Mueller, W. (2001): Collecting spatial data - optimum design of experiments for random fields, 2nd edition, Physica-Verlag, Heidelberg 2001
Mueller, W. and A. Pázman (1998): Design measures and extended information matrices for experiments without replications, J. Statist. Planning and Inference 71 (1998), 349-362
Mueller, W. and A. Pázman (1999): An algorithm for the computation of optimum designs under a given covariance structure, Computational Statistics 14 (1999), 197-211
Mueller-Gronbach, T. (1996): Optimal designs for approximating the path of a stochastic process, J. Statist. Planning and Inference 49 (1996), 371-385
Muir, T. (1911): The theory of determinants in the historical order of development, Volumes 1-4, Dover, New York 1911, reprinted 1960
Muirhead, R.J. (1982): Aspects of multivariate statistical theory, J. Wiley, New York 1982
Mukherjee, K. (1996): Robust estimation in nonlinear regression via minimum distance method, Mathematical Methods of Statistics 5 (1996), 99-112
Mukhopadhyay, N. (2000): Probability and statistical inference, Dekker, New York 2000
Müller, C. (1966): Spherical harmonics, Lecture Notes in Mathematics 17, Springer-Verlag, Heidelberg Berlin New York 1966, 45 pp
Muller, D. and Wei, W.W.S. (1997): Iterative least squares estimation and identification of the transfer function model, J. Time Series Analysis 18 (1997), 579-592
Mundt, W. (1969): Statistische Analyse geophysikalischer Potentialfelder hinsichtlich Aufbau und Struktur der tieferen Erdkruste, Deutsche Akademie der Wissenschaften, Berlin 1969
Munoz-Pichardo, J.M., Munoz-García, J., Fernández-Ponce, J.M. and F. López-Blázquez (1998): Local influence on the general linear model, Sankhya: The Indian Journal of Statistics 60 (1998), 269-292
Musyoka, S.M. (1999): Ein Modell für ein vierdimensionales integriertes regionales geodätisches Netz, Ph.D. thesis, Department of Geodesy, University of Karlsruhe (Internet publication), 1999
Murray, F.J. and J. von Neumann (1936): On rings of operators I, Ann. Math. 37 (1936), 116-229
Murray, M.K. and J.W. Rice (1993): Differential geometry and statistics, Chapman and Hall, Boca Raton 1993
Myers, J.L. (1979): Fundamentals of experimental design, Allyn and Bacon, Boston 1979
Naes, T. and H. Martens (1985): Comparison of prediction methods for multicollinear data, Commun. Statist. Simulat. Computa. 14 (1985), 545-576
Nagaev, S.V. (1979): Large deviations of sums of independent random variables, Ann. Probability 7 (1979), 745-789
Nagar, A.L. and N.C. Kakwani (1964): The bias and moment matrix of a mixed regression estimator, Econometrica 32 (1964), 174-182
Nagar, A.L. and N.C. Kakwani (1969): Note on the use of prior information in statistical estimation of econometric relations, Sankhya Series A 27 (1969), 105-112
Nagar, D.K. and A.K. Gupta (1996): On a test statistic useful in MANOVA with structured covariance matrices, J. Appl. Stat. Science 4 (1996), 185-202
Nagaraja, H.N. (1982): Record values and extreme value distributions, J. Appl. Prob. 19 (1982), 233-239
Nagarsenker, B.N. (1977): On the exact non-null distributions of the LR criterion in a general MANOVA model, Sankhya A39 (1977), 251-263
Nakamura, N. and S. Konishi (1998): Estimation of a number of components for multivariate normal mixture models based on information criteria, Japanese J. Appl. Statist. 27 (1998), 165-180
Nakamura, T. (1990): Corrected score function for errors-in-variables models: methodology and application to generalized linear models, Biometrika 77 (1990), 127-137
Namba, A. (2001): MSE performance of the 2SHI estimator in a regression model with multivariate t error terms, Statistical Papers 42 (2001), 82-96
Nashed, M.Z. (1970): Steepest descent for singular linear operator equations, SIAM J. Numer. Anal. 7 (1970), 358-362
Nashed, M.Z. (1971): Generalized inverses, normal solvability, and iteration for singular operator equations, pp. 311-359 in: Nonlinear Functional Analysis and Applications (L.B. Rall, ed.), Academic Press, New York 1971
Nashed, M.Z. (1974): Generalized inverses and applications, Academic Press, New York 1974
Nashed, M.Z. (1976): Generalized inverses and applications, Academic Press, New York London 1976
Näther, W. (1985): Exact designs for regression models with correlated errors, Statistics 16 (1985), 479-484
Näther, W. and Wälder, K. (2003): Experimental design and statistical inference for cluster point processes with applications to the fruit dispersion of anemochorous forest trees, Biometrical J. 45 (2003), 1006-1022
Näther, W. and Wälder, K. (2007): Applying fuzzy measures for considering interaction effects in root dispersal models, Fuzzy Sets and Systems 158 (2007), 572-582
Nelder, J.A. and R.W.M. Wedderburn (1972): Generalized linear models, J. Roy. Stat. Soc., Series A 135 (1972), 370-384
Nelson, C.R. (1973): Applied time series analysis for managerial forecasting, Holden Day, San Francisco 1973
Nelson, R. (1995): Probability, stochastic processes, and queueing theory, Springer-Verlag, Heidelberg Berlin New York 1995
Nesterov, Y.E. and A.S. Nemirovskii (1992): Interior point polynomial methods in convex programming, Springer-Verlag, Heidelberg Berlin New York 1992
Nesterov, Y.E. and A.S. Nemirovskii (1994): Interior point polynomial algorithms in convex programming, SIAM Vol. 13, Pennsylvania 1994
Neudecker, H. (1968): The Kronecker matrix product and some of its applications in econometrics, Statistica Neerlandica 22 (1968), 69-82
Neudecker, H. (1969): Some theorems on matrix differentiation with special reference to Kronecker matrix products, J. Am. Statist. Ass. 64 (1969), 953-963
Neudecker, H. (1977): Bounds for the bias of the least squares estimator of σ² in the case of a first-order autoregressive process (positive autocorrelation), Econometrica 45 (1977), 1257-1262
Neudecker, H. (1978): Bounds for the bias of the LS estimator of σ² in the case of a first-order (positive) autoregressive process when the regression contains a constant term, Econometrica 46 (1978), 1223-1226
Neumaier, A. (1998): Solving ill-conditioned and singular systems: a tutorial on regularization, SIAM Rev. 40 (1998), 636-666
Neumann, J. (2009): Zur Modellierung eines erweiterten Unsicherheitshaushaltes in Parameterschätzung und Hypothesentests, Bayer. Akad. Wissenschaften, Deutsche Geodätische Kommission, Reihe C 634, München 2009
Neumann, J. and H. Kutterer (2007): Adaptives Kalman-Filtering mit impräzisen Daten, in: F.K. Brunner (ed.), Ingenieurvermessung, pages 420-424, Wichmann Verlag, Heidelberg 2007
Neuts, M.F. (1995): Algorithmic probability, Chapman and Hall, Boca Raton 1995
Neutsch, W. (1995): Koordinaten: Theorie und Anwendungen, Spektrum Akademischer Verlag, Heidelberg 1995
Newey, W.K. and J.L. Powell (1987): Asymmetric least squares estimation and testing, Econometrica 55 (1987), 819-847
Newman, D. (1939): The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard deviation, Biometrika 31 (1939), 20-30
Neykov, N.M. and C.H. Mueller (2003): Breakdown point and computation of trimmed likelihood estimators in generalized linear models, in: R. Dutter, P. Filzmoser, U. Gather, P.J. Rousseeuw (eds.), Developments in Robust Statistics, 277-286, Physica-Verlag, Heidelberg 2003
Neyman, J. (1934): On the two different aspects of the representative method, J. Roy. Statist. Soc. 97 (1934), 558-606
Neyman, J. (1937): Outline of the theory of statistical estimation based on the classical theory of probability, Phil. Trans. Roy. Soc. London 236 (1937), 333-380
Neytchev, P.N. (1995): SWEEP operator for least-squares subject to linear constraints, Comput. Stat. Data Anal. 20 (1995) 6, 599-609
Nicholson, W.K. (1999): Introduction to abstract algebra, 2nd edition, J. Wiley, New York 1999
Niemeier, W. (1985): Deformationsanalyse, in: Geodätische Netze in Landes- und Ingenieurvermessung II, Kontaktstudium, ed. H. Pelzer, Wittwer, Stuttgart 1985
Nkuite, G. (1998): Adjustment with singular variance-covariance matrix for models of geometric deformation analysis, Ph.D. thesis, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften, Reihe C: Dissertationen, 501, Verlag der Bayerischen Akademie der Wissenschaften, München 1998 (in German)
Nobre, S. and M. Teixeiras (2000): Der Geodät Wilhelm Jordan und C.F. Gauss, Gauss-Gesellschaft e.V. Göttingen, Mitt. 38, 49-54, Göttingen 2000
Nordström, K. (1985): On a decomposition of the singular Gauss-Markov model, in: Linear Statistical Inference, Proc. Int. Conf., Poznan, Poland, 1984, Lect. Notes Stat. (Caliński, T. and Klonecki, W., eds.), Springer-Verlag, Berlin 1985, pp. 231-245
Nordström, K. and Fellman, J. (1990): Characterizations and dispersion matrix robustness of efficiently estimable parametric functionals in linear models with nuisance parameters, Linear Algebra Appl. 127 (1990), 341-361
Nurhonen, M. and Puntanen, S. (1992): A property of partitioned generalized regression, Commun. Stat., Theory Methods 21 (1992) 6, 1579-1583
Nussbaum, M. and S. Zwanzig (1988): A minimax result in a model with infinitely many nuisance parameters, Transactions of the Tenth Prague Conference on Information Theory, Statistical Decisions, Random Processes, Prague 1988, 215-222
Nyquist, H. (1988): Least orthogonal absolute deviations, Comput. Statist. Data Anal. 6 (1988), 361-367
Nyquist, H., Rice, S.O. and J. Riordan (1954): The distribution of random determinants, Quart. Appl. Math. 12 (1954), 97-104
O'Neill, M., Sinclair, L.G. and F.J. Smith (1969): Polynomial curve fitting when abscissa and ordinate are both subject to error, Comput. J. 12 (1969), 52-56
O'Neill, M.E. and K. Mathews (2000): A weighted least squares approach to Levene's test of homogeneity of variance, Australia & New Zealand J. Statist. 42 (2000), 81-100
Oberlack, M. (1997): Non-isotropic dissipation in non-homogeneous turbulence, J. Fluid Mech. 350 (1997), 351-374
Obuchow, A.M. (1958): Statistische Beschreibung stetiger Felder, in: H. Goering (ed.), Sammelband zur statistischen Theorie der Turbulenz, pp. 1-42, Akademie-Verlag 1958
Obuchow, A.M. (1962): Some specific features of atmospheric turbulence, Fluid Mechanics 13 (1962), 77-81
Obuchow, A.M. (1962): Some specific features of atmospheric turbulence, J. Geophys. Res. 67 (1962), 3011-3014
Odijk, D. (2002): Fast precise GPS positioning in the presence of ionospheric delays, Publications on Geodesy, Vol. 52, Netherlands Geodetic Commission, Delft 2002
Offlinger, R. (1998): Least-squares and minimum distance estimation in the three-parameter Weibull and Fréchet models with applications to river drain data, in: Kahle et al. (eds.), Advances in Stochastic Models for Reliability, Quality and Safety, 81-97, Birkhäuser-Verlag, Basel Boston Berlin 1998
Ogawa, J. (1950): On the independence of quadratic forms in a non-central normal system, Osaka Mathematical Journal 2 (1950), 151-159
Ohtani, K. (1988): Optimal levels of significance of a pre-test in estimating the disturbance variance after the pre-test for a linear hypothesis on coefficients in a linear regression, Econom. Lett. 28 (1988), 151-156
Ohtani, K. (1996): Further improving the Stein-rule estimator using the Stein variance estimator in a misspecified linear regression model, Statist. Probab. Lett. 29 (1996), 191-199
Ohtani, K. (1998): On the sampling performance of an improved Stein inequality restricted estimator, Austral. & New Zealand J. Statist. 40 (1998), 181-187
Ohtani, K. (1998): The exact risk of a weighted average estimator of the OLS and Stein-rule estimators in regression under balanced loss, Statistics & Decisions 16 (1998), 35-45
Okamoto, M. (1973): Distinctness of the eigenvalues of a quadratic form in a multivariate sample, Ann. Stat. 1 (1973), 763-765
Okeke, F.I. (1998): The curvilinear datum transformation model, DGK, Reihe C, Heft Nr. 481
Okeke, F. and F. Krumm (1998): Graph, graph spectra and partitioning algorithms in a geodetic network structural analysis and adjustment, Bolletino di Geodesia e Scienze Affini 57 (1998), 1-24
Oktaba, W. (1984): Tests of hypotheses for the general Gauss-Markov model, Biom. J. 26 (1984), 415-424
Olkin, I. (1998): The density of the inverse and pseudo-inverse of a random matrix, Statistics and Probability Letters 38 (1998), 131-135
Olkin, I. (2000): The 70th anniversary of the distribution of random matrices: a survey, Technical Report No. 2000-06, Department of Statistics, Stanford University, Stanford 2000
Olkin, I. and S.N. Roy (1954): On multivariate distribution theory, Ann. Math. Statist. 25 (1954), 329-339
Olkin, I. and A.R. Sampson (1972): Jacobians of matrix transformations and induced functional equations, Linear Algebra Appl. 5 (1972), 257-276
Olkin, I. and J.W. Pratt (1958): Unbiased estimation of certain correlation coefficients, Annals of Mathematical Statistics 29 (1958), 201-211
Olsen, A., Seely, J. and D. Birkes (1976): Invariant quadratic unbiased estimators for two variance components, Annals of Statistics 4 (1976), 878-890
Omre, H. (1987): Bayesian kriging - merging observations and qualified guesses in kriging, Math. Geol. 19 (1987), 25-39
Omre, H. and Halvorsen, K. (1989): The Bayesian bridge between simple and universal kriging, Math. Geol. 21 (1989), 767-786
Ord, J.K. and S. Arnold (1997): Kendall's advanced theory of statistics, Volume IIA, Classical inference, 6th edition, Arnold Publ., London 1997
Ortega, J. and Rheinboldt, W. (1970): Iterative solution of nonlinear equations in several variables, Academic Press, New York 1970
Osborne, M.R. (1972): Some aspects of nonlinear least squares calculations, in: Numerical Methods for Nonlinear Optimization, ed. F.A. Lootsma, Academic Press, New York London 1972
Osborne, M.R. (1976): Nonlinear least squares - the Levenberg algorithm revisited, J. Aust. Math. Soc. B19 (1976), 343-357
Osborne, M.R. and G.K. Smyth (1986): An algorithm for exponential fitting revisited, J. App. Prob. (1986), 418-430
Osborne, M.R. and G.K. Smyth (1995): A modified Prony algorithm for fitting sums of exponential functions, SIAM J. Sci. Stat. Comput. 16 (1995), 119-138
Osiewalski, J. and M.F.J. Steel (1993): Robust Bayesian inference in elliptical regression models, J. Econometrics 57 (1993), 345-363
Ouellette, D.V. (1981): Schur complements and statistics, Linear Algebra Appl. 36 (1981), 187-295
Owens, W.H. (1973): Strain modification of angular density distributions, Tectonophysics 16 (1973), 249-261
Ozone, M.I. (1985): Non-iterative solution of the equations, Surveying and Mapping 45 (1985), 169-171
Paciorek, C. (2007): Bayesian smoothing with Gaussian processes using Fourier basis functions in the spectralGP package, J. Statist. Software 19 (2007), 1-38
Padmawar, V.R. (1998): On estimating nonnegative definite quadratic forms, Metrika 48 (1998), 231-244
Pagano, M. (1978): On periodic and multiple autoregressions, Annals of Statistics 6 (1978), 1310-1317
Paige, C.C. and M.A. Saunders (1975): Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal. 12 (1975), 617-629
Paige, C. and C. van Loan (1981): A Schur decomposition for Hamiltonian matrices, Linear Algebra and its Applications 41 (1981), 11-32
Pakes, A.G. (1999): On the convergence of moments of geometric and harmonic means, Statistica Neerlandica 53 (1999), 96-110
Pal, N. and W.K. Lim (2000): Estimation of a correlation coefficient: some second order decision-theoretic results, Statistics and Decisions 18 (2000), 185-203
Pal, S.K. and P.P. Wang (1996): Genetic algorithms for pattern recognition, CRC Press, Boca Raton 1996
Palancz, B. and J.L. Awange (2012): Application of Pareto optimality to linear models with errors-in-all-variables, Journal of Geodesy (2012), doi: 10.1007/s00190-011-0536-1
Panda, R. et al. (1989): Turbulence in a randomly stirred fluid, Phys. Fluids A1 (1989), 1045-1053
Papazachos, C. and Kiratzi, A. (1992): A formulation for reliable estimation of active crustal deformation and its application to central Greece, Geophys. J. Int. 111 (1992), 424-432
Papo, H. (1985): Extended free net adjustment constraints, Bulletin Geodesique 59 (1985), 378-390
Papo, H. (1986): Extended free net adjustment constraints, NOAA Technical Report NOS 119 NGS 37, September 1986, 16 p
Papoulis, A. (1991): Probability, random variables and stochastic processes, McGraw-Hill, New York 1991
Park, H. (1991): A parallel algorithm for the unbalanced orthogonal Procrustes problem, Parallel Computing 17 (1991), 913-923
Park, S.H., Kim, Y.H. and H. Toutenburg (1992): Regression diagnostics for removing an observation with animating graphics, Statistical Papers 33 (1992), 227-240
Parker, W.V. (1945): The characteristic roots of matrices, Duke Math. J. 12 (1945), 519-526
Parkinson, B.W. and Spilker, J.J. (eds) (1996): Global Positioning System: theory and applications, Vol. 1, American Institute of Aeronautics and Astronautics, Washington, DC, 793p
Parthasarathy, K.R. (1967): Probability measures on metric spaces, Academic Press, New York London 1967
Parzen, E. (1962): On estimation of a probability density function and mode, Ann. Math. Statistics 33 (1962), 1065-1073
Passer, W. (1933): Über die Anwendung statischer Methoden auf den Ausgleich von Liniennetzen, ÖZfV 31 (1933), 66-71
Patel, J.K. and C.B. Read (1982): Handbook of the normal distribution, Marcel Dekker, New York Basel 1982
Patil, V.H. (1965): Approximation to the Behrens-Fisher distribution, Biometrika 52 (1965), 267-271
Patterson, H.D. and Thompson, R. (1971): Recovery of the inter-block information when block sizes are unequal, Biometrika 58 (1971), 545-554
Paul, M.K. (1973): A note on computation of geodetic coordinates from geocentric (Cartesian) coordinates, Bull. Geod. No. 108 (1973), 135-139
Pazman, A. (1982): Geometry of Gaussian nonlinear regression - parallel curves and confidence intervals, Kybernetika 18 (1982), 376-396
Pazman, A. (1984): Nonlinear least squares - uniqueness versus ambiguity, Math. Operationsforsch. u. Statist., Ser. Statist. 15 (1984), 323-336
Pazman, A. and Müller, W.G. (2001): Optimal design of experiments subject to correlated errors, Stat. Probab. Lett. 52 (2001), 29-34
Pázman, A. (1986): Foundations of optimum experimental design, Mathematics and its Applications, D. Reidel, Dordrecht 1986
Pázman, A. and J.-B. Denis (1999): Bias of LS estimators in nonlinear regression models with constraints. Part I: General case, Applications of Mathematics 44 (1999), 359-374
Pázman, A. and W. Müller (1998): A new interpretation of design measures, in: MODA 5 - Advances in model-oriented data analysis and experimental design, Proceedings of the 5th International Workshop in Marseilles, eds. Atkinson, A.C., Pronzato, L. and H.P. Wynn, Physica-Verlag, Heidelberg 1998
Pearson, E.S. (1970): William Sealy Gosset, 1876-1937: Student as a statistician, in: Studies in the History of Statistics and Probability (E.S. Pearson and M.G. Kendall, eds.), Hafner Publ., 360-403, New York 1970
Pearson, E.S. and S.S. Wilks (1933): Methods of statistical analysis appropriate for k samples of two variables, Biometrika 25 (1933), 353-378
Pearson, E.S. and H.O. Hartley (1958): Biometrika tables for statisticians, Vol. 1, Cambridge University Press, Cambridge 1958
Pearson, K. (1905): The problem of the random walk, Nature 72 (1905), 294
Pearson, K. (1906): A mathematical theory of random migration, Mathematical Contributions to the Theory of Evolution, XV Draper's Company Research Memoirs, Biometric Series III, London 1906
Pearson, K. (1931): Historical note on the distribution of the standard deviations of samples of any size from any indefinitely large normal parent population, Biometrika 23 (1931), 416-418
Peddada, S.D. and T. Smith (1997): Consistency of a class of variance estimators in linear models under heteroscedasticity, Sankhya 59 (1997), 1-10
Peitgen, H.-O. and Saupe, D. (1988): The science of fractal images, Springer-Verlag, New York 1988
Peitgen, H.-O., Jürgens, H. and Saupe, D. (1992): Chaos and fractals, Springer-Verlag, New York 1992
Pelzer, H. (1971): Zur Analyse geodätischer Deformationsmessungen, Deutsche Geodätische Kommission, Akademie der Wissenschaften, Reihe C (164), München 1971
Pelzer, H. (1974): Zur Behandlung singulärer Ausgleichungsaufgaben, Z. Vermessungswesen 99 (1974), 181-194, 479-488
Pelzer, H. (1985): Geodätische Netze in Landes- und Ingenieurvermessung II, Wittwer, Stuttgart 1985, 847p (in German)
Pena, D., Tiao, G.C. and R.S. Tsay (2001): A course in time series analysis, J. Wiley, New York 2001
Penev, P. (1978): The transformation of rectangular coordinates into geographical by closed formulas, Geo. Map. Photo 20 (1978), 175-177
Peng, H.M., Chang, F.R. and Wang, L.S. (1999): Attitude determination using GPS carrier phase and compass data, in: Proceedings of ION NTM 99, pp. 727-732
Penrose, R. (1955): A generalised inverse for matrices, Proc. Cambridge Phil. Soc. 51 (1955), 406-413
Penrose, R. (1956): On best approximate solution of linear matrix equations, Proc. Cambridge Philos. Soc. 52 (1956), 17-19
Penny, K.I. (1996): Appropriate critical values when testing for a single multivariate outlier by using the Mahalanobis distance, in: Applied Statistics, ed. S.M. Lewis and D.A. Preece, J. Roy. Statist. Soc. 45 (1996), 73-81
Percival, D.B. and A.T. Walden (1993): Spectral analysis for physical applications, Cambridge University Press, Cambridge 1993
Percival, D.B. and A.T. Walden (1999): Wavelet methods for time series analysis, Cambridge University Press, Cambridge 1999
Perelmuter, A. (1979): Adjustment of free networks, Bull. Geod. 53 (1979), 291-295
Perlman, M.D. (1972): Reduced mean square error estimation for several parameters, Sankhya Series B 34 (1972), 89-92
Perrin, J. (1916): Atoms, Van Nostrand, New York 1916
Perron, F. and N. Giri (1992): Best equivariant estimation in curved covariance models, J. Multivar. Anal. 40 (1992), 46-55
Perron, F. and N. Giri (1990): On the best equivariant estimator of mean of a multivariate normal population, J. Multivar. Anal. 32 (1990), 1-16
Persson, C.G. (1980): MINQUE and related estimators for variance components in linear models, PhD thesis, Royal Institute of Technology, Stockholm 1980
Petrov, V.V. (1975): Sums of independent random variables, Springer Verlag, Berlin-Heidelberg-New York 1975
Pfeufer, A. (1990): Beitrag zur Identifikation und Modellierung dynamischer Deformationsprozesse, Vermessungstechnik 38 (1990), 19-22
Pfeufer, A. (1993): Analyse und Interpretation von Überwachungsmessungen - Terminologie und Klassifikation, Zeitschrift für Vermessungswesen 118 (1993), 19-22
Pick, M. (1985): Closed formulae for transformation of Cartesian coordinates into a system of geodetic coordinates, Studia geoph. et geod. 29 (1985), 653-666
Pilz, J. (1983): Bayesian estimation and experimental design in linear regression models, Teubner-Texte zur Mathematik 55, Teubner, Leipzig 1983
Pilz, J. and Spöck, G. (2008): Bayesian spatial sampling design, in: Proc. 18th Intl. Geostatistics World Congress (J.M. Ortiz and X. Emery, eds.), Gecamin Ltd., Santiago de Chile 2008, 21-30
Pilz, J. (1991): Bayesian estimation and experimental design in linear regression models, Wiley, New York 1991
Pilz, J., Schimek, M.G. and Spöck, G. (1997): Taking account of uncertainty in spatial covariance estimation, in: Baafi, E. and Schofield, N. (eds.), Proceedings of the fifth international geostatistics congress, Kluwer, Dordrecht 1997, pp. 302-313
Pilz, J. and Spöck, G. (2006): Spatial sampling design for prediction taking account of uncertain covariance structure, in: Caetano, M. and Painho, M. (eds.), Proceedings of Accuracy 2006, Instituto Geografico Portugues 2006, pp. 109-118
Pilz, J. and Spöck, G. (2008): Why do we need and how should we implement Bayesian kriging methods, Stoch. Environ. Res. Risk Assess. 22/5 (2008), 621-632
Pincus, R. (1974): Estimability of parameters of the covariance matrix and variance components, Math. Operationsforschg. Statistik 5 (1974), 245-248
Pinheiro, J.C. and D.M. Bates (2000): Mixed-effects models in S and S-Plus, Statistics and Computing, Springer-Verlag, Heidelberg Berlin New York 2000
Piquet, J. (1999): Turbulent flows, models and physics, Springer Verlag, 761 pages, Berlin 1999
Pistone, G. and Wynn, H.P. (1996): Generalized confounding with Gröbner bases, Biometrika 83 (1996), 112-119
Pitman, E.J.G. (1979): Some basic theory for statistical inference, Chapman and Hall, Boca Raton 1979
Pitman, E.J.G. (1939): A note on normal correlation, Biometrika 31 (1939), 9-12
Pitman, J. and M. Yor (1981): Bessel processes and infinitely divisible laws, unpublished report, University of California, Berkeley 1981
Plachky, D. (1993): An estimation-theoretical characterization of the Poisson distribution, Statistics and Decisions, Supplement Issue 3 (1993), 175-178
Plackett, R.L. (1949): A historical note on the method of least squares, Biometrika 36 (1949), 458-460
Plackett, R.L. (1972): The discovery of the method of least squares, Biometrika 59 (1972), 239-251
Plato, R. (1990): Optimal algorithms for linear ill-posed problems yield regularization methods, Numer. Funct. Anal. Optim. 11 (1990), 111-118
Pohst, M. (1987): A modification of the LLL reduction algorithm, J. Symbolic Computation 4 (1987), 123-127
Poirier, D.J. (1995): Intermediate statistics and econometrics, The MIT Press, Cambridge 1995
Poisson, S.D. (1827): Connaissance des temps de l'année 1827
Polasek, W. and S. Liu (1997): On generalized inverses and Bayesian analysis in simple ANOVA models, Student 2 (1997), 159-168
Pollock, D.S.G. (1979): The algebra of econometrics, Wiley, Chichester 1979
Pollock, D.S.G. (1999): A handbook of time series analysis, signal processing and dynamics, Academic Press, San Diego 1999
Polya, G. (1919): Zur Statistik der sphärischen Verteilung der Fixsterne, Astr. Nachr. 208 (1919), 175-180
Polya, G. (1930): Sur quelques points de la théorie des probabilités, Ann. Inst. H. Poincaré 1 (1930), 117-161
Polzehl, J. and S. Zwanzig (2004): On a symmetrized simulation extrapolation estimator in linear errors-in-variables models, Journal of Computational Statistics and Data Analysis 47 (2004), 675-688
Pope, A.J. (1972): Some pitfalls to be avoided in the iterative adjustment of nonlinear problems, Proceedings of the 38th Annual Meeting, American Society of Photogrammetry, Washington D.C., March 1972
Pope, A.J. (1974): Two approaches to nonlinear least squares adjustments, The Canadian Surveyor 28 (1974), 663-669
Pope, A.J. (1976): The statistics of residuals and the detection of outliers, NOAA Technical Report NOS 65 NGS 1, U.S. Dept. of Commerce, Rockville, Md. 1976
Pope, A. (1982): Two approaches to non-linear least squares adjustments, The Canadian Surveyor 28 (1982), 663-669
Popinski, W. (1999): Least-squares trigonometric regression estimation, Applicationes Mathematicae 26 (1999), 121-131
Pordzik, P.R. and Trenkler, G. (1996): MSE comparisons of restricted least squares estimators in linear regression model - revisited, Sankhya Ser. B 58 (1996), 352-359
Portnoy, S.L. (1977): Robust estimation in dependent situations, Annals of Statistics 5 (1977), 22-43
Portnoy, S. and Koenker, R. (1997): The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators, Statistical Science 12 (1997), 279-300
Powell, J.L. (1984): Least absolute deviations estimating for the censored regression model, J. of Econometrics 25 (1984), 303-325
Pratt, J.W. (1961): Length of confidence intervals, J. Am. Statist. Ass. 56 (1961), 549-567
Pratt, J.W. (1963): Shorter confidence intervals for the mean of a normal distribution with known variance, Ann. Math. Statist. 34 (1963), 574-586
Prentice, R.L. and L.P. Zhao (1991): Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses, Biometrics 47 (1991), 825-839
Prenter, P.M. (1975): Splines and variational methods, John Wiley & Sons, New York 1975
Presnell, B., Morrison, S.P. and R.C. Littell (1998): Projected multivariate linear models for directional data, J. Am. Statist. Ass. 93 (1998), 1068-1077
Press, S.J. (1989): Bayesian statistics: principles, models and applications, J. Wiley, New York 1989
Press, W.H., Teukolsky, S.A., Vetterling, W.T. and B.P. Flannery (1992): Numerical Recipes in FORTRAN (2nd edition), Cambridge University Press, Cambridge 1992
Priestley, M.B. (1981): Spectral analysis and time series, Vols. 1 and 2, Academic Press, New York London 1981
Priestley, M.B. (1988): Nonlinear and nonstationary time series analysis, Academic Press, New York London 1988
Prony, R. (1795): Essai expérimental et analytique, Journal de l'École Polytechnique (Paris) 1 (1795), 24-76
Pruscha, H. (1996): Angewandte Methoden der Mathematischen Statistik, Teubner Skripten zur Mathematischen Stochastik, Stuttgart 1996
Pugachev, V.S. (1965): Theory of random functions, Oxford 1965
Pugachev, V.S. and I.N. Sinitsyn (2002): Stochastic systems: theory and applications, Russian Academy of Sciences 2002
Puntanen, S. (1986): Comments on "On necessary and sufficient conditions for ordinary least squares estimators to be best linear unbiased estimators", The American Statistician 40 (1986), 178
Puntanen, S. (1996): Some matrix results related to a partitioned singular linear model, Commun. Stat., Theory Methods 25 (1996), 269-279
Puntanen, S. (1997): Some further results related to reduced singular linear models, Commun. Stat., Theory Methods 26 (1997), 375-385
Puntanen, S. and Styan, G.P.H. (1989): The equality of the ordinary least squares estimator and the best linear unbiased estimator, Amer. Statist. 43 (1989), 153-161
Puntanen, S. and Scott, A.J. (1996): Some further remarks on the singular linear model, Linear Algebra Appl. 237/238 (1996), 313-327
Puntanen, S., Styan, G.P.H. and H.J. Werner (2000): Two matrix-based proofs that the linear estimator Gy is the best linear unbiased estimator, J. Statist. Planning and Inference 88 (2000), 173-179
Pukelsheim, F. (1976): Estimating variance components in linear models, J. Multivariate Anal. 6 (1976), 626-629
Pukelsheim, F. (1977): On Hsu's model in regression analysis, Math. Operationsforsch. Statist., Ser. Statistics 8 (1977), 323-331
Pukelsheim, F. (1979): Classes of linear models, in: van Vleck, L.D. and S.R. Searle (eds.), pp. 69-83, Cornell University, Ithaca 1979
Pukelsheim, F. (1980): Multilinear estimation of skewness and kurtosis in linear models, Metrika 27 (1980), 103-113
Pukelsheim, F. (1981a): Linear models and convex geometry: aspects of non-negative variance estimation, Math. Operationsforsch. u. Stat. 12 (1981), 271-286
Pukelsheim, F. (1981b): On the existence of unbiased nonnegative estimates of variance covariance components, Ann. Statist. 9 (1981), 293-299
Pukelsheim, F. (1990): Optimal design of experiments, J. Wiley, New York 1990
Pukelsheim, F. (1993): Optimal design of experiments, J. Wiley, New York 1993
Pukelsheim, F. (1994): The three sigma rule, American Statistician 48 (1994), 88-91
Pukelsheim, F. and G.P. Styan (1979): Nonnegative definiteness of the estimated dispersion matrix in a multivariate linear model, Bull. Acad. Polonaise des Sci., Ser. Sci. Math. 27 (1979), 327-330
Pukelsheim, F. and B. Torsney (1991): Optimal weights for experimental designs on linearly independent support points, The Annals of Statistics 19 (1991), 1614-1625
Pukelsheim, F. and W.J. Studden (1993): E-optimal designs for polynomial regression, Ann. Stat. 21 (1993), 402-415
Pynn, R. and Skjeltorp, A. (eds.) (1985): Scaling phenomena in disordered systems, Plenum Press, New York 1985
Qingming, G. and L. Jinshan (2000): Biased estimation in Gauss-Markov model, Allgemeine Vermessungsnachrichten 107 (2000), 104-108
Qingming, G., Yuanxi, Y. and G. Jianfeng (2001): Biased estimation in the Gauss-Markov model with constraints, Allgemeine Vermessungsnachrichten 108 (2001), 28-30
Qingming, G., Lifen, S., Yuanxi, Y. and G. Jianfeng (2001): Biased estimation in the Gauss-Markov model not of full rank, Allgemeine Vermessungsnachrichten 108 (2001), 390-393
Quintana-Orti, G., Sun, X. and C.H. Bischof (1998): A BLAS-3 version of the QR factorization with column pivoting, SIAM J. Sci. Comput. 19 (1998), 1486-1494
Raj, D. (1968): Sampling theory, McGraw-Hill Book Comp., Bombay 1968
Ralston, A. and Wilf, H.W. (1979): Mathematische Methoden für Digitalrechner 2, R. Oldenbourg Verlag, München 1979
Ramsey, J.O. and B.W. Silverman (1997): Functional data analysis, Springer-Verlag, Heidelberg Berlin New York 1997
Rao, B.L.S.P. (1997a): Weighted least squares and nonlinear regression, J. Ind. Soc. Ag. Statistics 50 (1997), 182-191
Rao, B.L.S.P. (1997b): Variance components, Chapman and Hall, Boca Raton 1997
Rao, B.L.S.P. and B.R. Bhat (1996): Stochastic processes and statistical inference, New Age International, New Delhi 1996
Rao, C.R. (1945): Generalisation of Markoff's theorem and tests of linear hypotheses, Sankhya 7 (1945), 9-16
Rao, C.R. (1948a): Tests of significance in multivariate analysis, Biometrika 35 (1948), 58-79
Rao, C.R. (1949): On the transformation useful in multivariate computations, Sankhya 9 (1949), 251-253
Rao, C.R. (1951): An asymptotic expansion of the distribution of Wilks' criterion, Bull. Inst. Internat. Stat. 33, Part II (1951), 177-180
Rao, C.R. (1952a): Some theorems on minimum variance estimation, Sankhya 12 (1952), 27-42
Rao, C.R. (1952b): Advanced statistical methods in biometric research, J. Wiley, New York 1952
Rao, C.R. (1965a): Linear statistical inference and its applications, J. Wiley, New York 1965
Rao, C.R. (1965b): The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves, Biometrika 52 (1965), 447-458
Rao, C.R. (1965c): Linear statistical inference and its applications, John Wiley & Sons, New York 1965
Rao, C.R. (1967): Least squares theory using an estimated dispersion matrix and its application to measurement of signals, in: Proceedings of the Fifth Berkeley Symposium, Berkeley 1967
Rao, C.R. (1970): Estimation of heteroscedastic variances in linear models, J. Am. Stat. Ass. 65 (1970), 161-172
Rao, C.R. (1971a): Estimation of variance and covariance components - MINQUE theory, J. Multivar. Anal. 1 (1971), 257-275
Rao, C.R. (1971b): Estimation of variance components and applications, North-Holland Series in Statistics and Probability, Vol. 3
Rao, C.R. (1971c): Unified theory of linear estimation, Sankhya A33 (1971), 371-394
Rao, C.R. (1971d): Minimum variance quadratic unbiased estimation of variance components, J. Multivar. Anal. 1 (1971), 445-456
Rao, C.R. (1972): Estimation of variance and covariance components in linear models, J. Am. Stat. Ass. 67 (1972), 112-115
Rao, C.R. (1973a): Linear statistical inference and its applications, 2nd ed., J. Wiley, New York 1973
Rao, C.R. (1973b): Representation of best linear unbiased estimators in the Gauss-Markoff model with a singular dispersion matrix, J. Multivar. Anal. 3 (1973), 276-292
Rao, C.R. (1973c): Unified theory of least squares, Comm. Statist. 1 (1973), 1-8
Rao, C.R. (1976): Estimation of parameters in a linear model, Ann. Statist. 4 (1976), 1023-1037
Rao, C.R. (1977): Prediction of future observations with special reference to linear models, in: Multivariate Analysis, Vol. 4 (1977), 193-208
Rao, C.R. (1978): Choice of the best linear estimators in the Gauss-Markov model with singular dispersion matrix, Comm. Stat. Theory Meth. A7 (13) (1978), 1199-1208
Rao, C.R. (1979): Separation theorems for singular values of matrices and their applications in multivariate analysis, Journal of Multivariate Analysis 9 (1979), 362-377
Rao, C.R. (1984): Prediction of future observations in polynomial growth curve models, in: Proceedings of the Indian Statistical Institute Golden Jubilee Intl. Conf. on Statistics: applications and future directions, Indian Statistical Institute, Calcutta 1984, pp. 512-520
Rao, C.R. (1985): The inefficiency of least squares: extensions of the Kantorovich inequality, Linear Algebra and its Applications 70 (1985), 249-255
Rao, C.R. (1988a): Methodology based on the L1-norm in statistical inference, Sankhya 50 (1988), 289-313
Rao, C.R. (1988b): Prediction of future observations in growth curve models, Statistical Science 2 (1988), 434-471
Rao, C.R. (1989): A lemma on optimization of a matrix function and a review of the unified theory of linear estimation, in: Y. Dodge (ed.), Statistical Data Analysis and Inference, pp. 337-418, Elsevier, Amsterdam 1989
Rao, C.R. and S.K. Mitra (1971): Generalized inverse of matrices and its applications, J. Wiley, New York 1971
Rao, C.R. and S.K. Mitra (1972): Generalized inverse of a matrix and its applications, in: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 601-620, University of California Press, Berkeley 1972
Rao, C.R. and J. Kleffe (1979): Variance and covariance components estimation and applications, Technical Report No. 181, Ohio State University, Dept. of Statistics, Columbus, Ohio 1979
Rao, C.R. and J. Kleffe (1988): Estimation of variance components and applications, North Holland, Amsterdam 1988
Rao, C.R. and L.C. Zhao (1993): Asymptotic normality of LAD estimator in censored regression models, Math. Methods in Statistics 2 (1993), 228-239
Rao, C.R. and H. Toutenburg (1995): Linear models: least-squares and alternatives, Springer-Verlag, Heidelberg Berlin New York 1995
Rao, C.R. and R. Mukerjee (1997): Comparison of LR, score and Wald tests in a non-iid setting, J. Multivar. Anal. 60 (1997), 99-110
Rao, C.R. and M.B. Rao (1998): Matrix algebra and its application to statistics and econometrics, World Scientific, Singapore 1998
Rao, C.R. and H. Toutenburg (1999): Linear models: least squares and alternatives, 2nd ed., Springer-Verlag, Heidelberg Berlin New York 1999
Rao, C.R. and G.J. Szekely (2000): Statistics for the 21st century - methodologies for applications of the future, Marcel Dekker, Basel 2000
Rao, P.S.R.S. and Y.P. Chaubey (1978): Three modifications of the principle of the MINQUE, Commun. Statist. Theor. Methods A7 (1978), 767-778
Ratkowsky, D.A. (1983): Nonlinear regression modelling, Marcel Dekker, New York 1983
Ravi, V. and H.-J. Zimmermann (2000): Fuzzy rule based classification with Feature Selector and modified threshold accepting, European Journal of Operational Research 123 (2000), 16-28
Ravi, V., Reddy, P.J. and H.-J. Zimmermann (2000): Pattern classification with principal component analysis and fuzzy rule bases, European Journal of Operational Research 126 (2000), 526-533
Ravishanker, N. and D.K. Dey (2002): A first course in linear model theory, CRC Press, Boca Raton 2002
Rayleigh, L. (1880): On the resultant of a large number of vibrations of the same pitch and of arbitrary phase, Phil. Mag. 5 (1880), 73-78
Rayleigh, L. (1905): The problem of the random walk, Nature 72 (1905), 318
Rayleigh, L. (1919): On the problem of random vibrations, and of random flights in one, two or three dimensions, Phil. Mag. 37 (1919), 321-347
Rebai, S., Philip, H. and Taboada, A. (1992): Modern tectonic stress field in the Mediterranean region: evidence for variation in stress directions at different scales, Geophys. J. Int. 110 (1992), 106-140
Reeves, J. (1998): A bivariate regression model with serial correlation, The Statistician 47 (1998), 607-615
Reich, K. (2000): Gauss' Schüler. Studierten bei Gauss und machten Karriere. Gauss' Erfolg als Hochschullehrer (Gauss's students: studied with him and were successful. Gauss's success as a university professor), Gauss Gesellschaft e.V. Göttingen, Mitteilungen Nr. 37, 33-62, Göttingen 2000
Reilly, W., Fredrich, G., Hein, G., Landau, H., Almazan, J. and Caturla, J. (1992): Geodetic determination of crustal deformation across the Strait of Gibraltar, Geophys. J. Int. 111 (1992), 391-398
Relles, D.A. (1968): Robust regression by modified least squares, Ph.D. Thesis, Yale University 1968
Remmer, O. (1973): A stability investigation of least squares adjustment by elements, Geodaetisk Institut, Copenhagen 1973
Remondi, B.W. (1984): Using the Global Positioning System (GPS) phase observable for relative geodesy: modelling, processing and results, Ph.D. Thesis, Center for Space Research, The University of Texas, Austin 1984
Ren, H. (1996): On the error analysis and implementation of some eigenvalue decomposition and singular value decomposition algorithms, UT-CS-96-336, LAPACK Working Note 115 (1996)
Rencher, A.C. (2000): Linear models in statistics, J. Wiley, New York 2000
Renfer, J.D. (1997): Contour lines of L1-norm regression, Student 2 (1997), 27-36
Resnikoff, G.J. and G.J. Lieberman (1957): Tables of the noncentral t-distribution, Stanford University Press, Stanford 1957
Reubelt, T., Austen, G. and Grafarend, E.W. (2003): Harmonic analysis of the Earth's gravitational field by means of semi-continuous ephemeris of a low Earth orbiting (LEO) GPS-tracked satellite - case study CHAMP, J. Geodesy 77 (2003), 257-278
Rham, G. de (1984): Differentiable manifolds, Springer-Verlag, Berlin Heidelberg 1984
Riccomagno, E., Schwabe, R. and H.P. Wynn (1997): Lattice-based optimum design for Fourier regression, Ann. Statist. 25 (1997), 2313-2327
Rice, J.R. (1969): The approximation of functions, Vol. II - Nonlinear and multivariate theory, Addison-Wesley, Reading 1969
Richards, F.S.G. (1961): A method of maximum likelihood estimation, J. Roy. Stat. Soc. B23 (1961), 469-475
Richardson, R. (1992): Ridge forces, absolute plate motions, and the intraplate stress field, J. Geophys. Res. B97 (1992), 11739-11748
Richter, H. and V. Mammitzsch (1973): Methode der kleinsten Quadrate, Stuttgart 1973
Richter, B. (1986): Entwurf eines nichtrelativistischen geodätisch-astronomischen Bezugssystems, DGK, Reihe C, Heft Nr. 322, München 1986
Riedel, K.S. (1992): A Sherman-Morrison-Woodbury identity for rank augmenting matrices with application to centering, SIAM J. Matrix Anal. Appl. 13 (1992), 659-662
Riedwyl, H. (1997): Lineare Regression, Birkhäuser-Verlag, Basel Boston Berlin 1997
Richter, W.D. (1985): Laplace-Gauß integrals, Gaussian measure asymptotic behaviour and probabilities of moderate deviations, Z. f. Analysis und ihre Anwendungen 4 (1985), 257-267
Rinner, K. (1962): Über die Genauigkeit des räumlichen Bogenschnittes, Zeitschrift für Vermessungswesen 87 (1962), 361-374
Rivest, L.P. (1982): Some statistical methods for bivariate circular data, J. Roy. Statist. Soc. B44 (1982), 81-90
Ritt, J.F. (1950): Differential algebra, AMS Colloquium Publications 1950
Rivest, L.P. (1988): A distribution for dependent unit vectors, Comm. Statistics A: Theory Methods 17 (1988), 461-483
Rivest, L. (1989): Spherical regression for concentrated Fisher-von Mises distributions, Annals of Statistics 17 (1989), 307-317
Roach, G.F. (1982): Green's functions, Cambridge University Press, Cambridge 1982
Roberts, P.H. and H.D. Ursell (1960): Random walk on the sphere and on a Riemannian manifold, Phil. Trans. Roy. Soc. A252 (1960), 317-356
Robertson, H.P. (1940): The invariant theory of isotropic turbulence, Proc. Cambridge Phil. Soc. 36 (1940), 209-223
Robinson, E.A. (1967): Multichannel time series analysis with digital programs, 298 pages, Holden Day, San Francisco 1967
Robinson, E.A. and Treitel, S. (1964): Principles of digital filtering, Geophysics 29 (1964), 395-404
Robinson, G.K. (1982): Behrens-Fisher problem, in: Encyclopedia of the Statistical Sciences, Vol. 1, J. Wiley, New York 1982
Rodgers, J.L. and W.A. Nicewander (1988): Thirteen ways to look at the correlation coefficient, The American Statistician 42 (1988), 59-66
Rohde, C.A. (1965): Generalized inverses of partitioned matrices, J. Soc. Ind. Appl. Math. 13 (1965), 1033-1035
Rohde, C.A. (1966): Some results on generalized inverses, SIAM Rev. 8 (1966), 201-205
Romano, J.P. and A.F. Siegel (1986): Counterexamples in probability and statistics, Chapman and Hall, Boca Raton 1986
Romanowski, M. (1979): Random errors in observations and the influence of modulation on their distribution, K. Wittwer Verlag, Stuttgart 1979
Rosen, F. (1831): The algebra of Mohammed Ben Musa, Oriental Translation Fund, London 1831
Rosen, D. von (1988): Moments for matrix normal variables, Statistics 19 (1988), 575-583
Rosen, J.B., Park, H. and J. Glick (1996): Total least norm formulation and solution for structured problems, SIAM J. Matrix Anal. Appl. 17 (1996), 110-126
Rosén, K.D.P. (1948): Gauss's mean error and other formulae for the precision of direct observations of equal weight, Tätigkeitsberichte Balt. Geod. Komm. 1944-1947, 38-62, Helsinki 1948
Rosenblatt, M. and Van Ness, J.W. (1965): Estimation of the bispectrum, Ann. Math. Statist. 36 (1965), 1120-1136
Rosenblatt, M. (1966): Remarks on higher order spectra, in: Multivariate Analysis, P.R. Krishnaiah (ed.), Academic Press, New York 1966
Rosenblatt, M. (1971): Curve estimates, Ann. Math. Statistics 42 (1971), 1815-1842
Rosenblatt, M. (1997): Some simple remarks on an autoregressive scheme and an implied problem, J. Theor. Prob. 10 (1997), 295-305
Ross, G.J.S. (1982): Non-linear models, Math. Operationsforschung Statistik 13 (1982), 445-453
Rosenbrock, H.H. (1960): An automatic method for finding the greatest or least value of a function, Computer Journal 3 (1960), 175-184
Ross, S.M. (1983): Stochastic processes, J. Wiley, New York 1983
Rotta, J.C. (1972): Turbulente Strömungen, Teubner Verlag, 267 pages, Stuttgart 1972
Rousseeuw, P.J. (1984): Least median of squares regression, Journal of the American Statistical Association 79 (1984), 871-880
Rousseeuw, P.J. and A.M. Leroy (1987): Robust regression and outlier detection, J. Wiley, New York 1987
Roy, T. (1995): Robust non-linear regression analysis, J. Chemometrics 9 (1995), 451-457
Rozanski, I.P. and R. Velez (1998): On the estimation of the mean and covariance parameter for Gaussian random fields, Statistics 31 (1998), 1-20
Rubin, D.B. (1976): Inference and missing data, Biometrika 63 (1976), 581-590
Rueda, C., Salvador, B. and M.A. Fernández (1997): Simultaneous estimation in a restricted linear model, J. Multivar. Anal. 61 (1997), 61-66
Rueschendorf, L. (1988): Asymptotische Statistik, Teubner, Stuttgart 1988
Rummel, R. (1975): Zur Behandlung von Zufallsfunktionen und -folgen in der physikalischen Geodäsie, Deutsche Geodätische Kommission bei der Bayerischen Akademie der Wissenschaften, Report No. C 208, München 1975
Rummel, R. (1976): A model comparison in least-squares collocation, Bull. Geod. 50 (1976), 181-192
Rummel, R. and K.P. Schwarz (1977): On the nonhomogeneity of the global covariance function, Bull. Géodésique 51 (1977), 93-103
Rummel, R. and Teunissen, P. (1982): A connection between geometric and gravimetric geodesy - some remarks on the role of the gravity field, in: Feestbundel ter gelegenheid van de 65ste verjaardag van Professor Baarda, Deel II, pp. 603-623, Department of Geodetic Science, Delft University 1982
Runge, C. (1900): Graphische Ausgleichung beim Rückwärtseinschneiden, Zeitschrift für Vermessungswesen 29 (1900), 581-588
Ruppert, D. and R.J. Carroll (1980): Trimmed least squares estimation in the linear model, J. Am. Statist. Ass. 75 (1980), 828-838
Rutherford, D.E. (1933): On the condition that two Zehfuss matrices be equal, Bull. Amer. Math. Soc. 39 (1933), 801-808
Rutherford, A. (2001): Introducing ANOVA and ANCOVA - a GLM approach, Sage, London 2001
Rysavy, J. (1947): Higher geodesy, Česká matice technická, Praha 1947 (in Czech)
Saalfeld, A. (1999): Generating basis sets of double differences, Journal of Geodesy 73 (1999), 291-297
Saastamoinen, J. (1973a): Contributions to the theory of atmospheric refraction, J. of Geodesy 46 (1973), 279-298
Saastamoinen, J. (1973b): Contribution to the theory of atmospheric refraction, Part II: Refraction corrections in satellite geodesy, Bulletin Géodésique 107 (1973), 13-34
Sacks, J. and D. Ylvisaker (1966): Design for regression problems with correlated errors, Annals of Mathematical Statistics 37 (1966), 66-89
Sahai, H. (2000): The analysis of variance: fixed, random and mixed models, 778 pages, Birkhäuser-Verlag, Basel Boston Berlin 2000
Sahin, M., Cross, P.A. and P.C. Sellers (1992): Variance components estimation applied to satellite laser ranging, Bulletin Géodésique 66 (1992), 284-295
Saichev, A.I. and W.A. Woyczynski (1996): Distributions in the physical and engineering sciences, Vol. 1, Birkhäuser Verlag, Basel 1996
Saito, T. (1973): The non-linear least squares of condition equations, Bull. Geod. 110 (1973), 367-395
Salmon, G. (1876): Lessons introductory to modern higher algebra, Hodges, Foster and Co., Dublin 1876
Samorodnitsky, G. and M.S. Taqqu (1994): Stable non-Gaussian random processes, Chapman and Hall, Boca Raton 1994
Sampson, P.D. and P. Guttorp (1992): Nonparametric estimation of nonstationary spatial covariance structure, J. Am. Statist. Ass. 87 (1992), 108-119
Sander, B. (1930): Gefügekunde der Gesteine, J. Springer, Vienna 1930
Sansò, F. (1990): On the aliasing problem in the spherical harmonic analysis, Bull. Géodésique 64 (1990), 313-330
Sansò, F. and G. Sona (1995): The theory of optimal linear estimation for continuous fields of measurements, Manuscripta Geodaetica 20 (1995), 204-230
Sansò, F. (1973): An exact solution of the roto-translation problem, Photogrammetria 29 (1973), 203-216
Sansò, F. (1986): Statistical methods in physical geodesy, in: Sünkel, H. (ed.), Mathematical and numerical techniques in physical geodesy, Lecture Notes in Earth Sciences, Vol. 7, Springer, pp. 49-156
Sansò, F. and Tscherning, C.C. (2003): Fast spherical collocation: theory and examples, J. Geod. 77 (2003), 101-112
Sastry, K. and Krishna, V. (1948): On a Bessel function of the second kind and Wilks' Z-distribution, Proc. Indian Acad. Sci. Series A 28 (1948), 532-536
Sastry, S. (1999): Nonlinear systems: analysis, stability and control, Springer-Verlag, Heidelberg Berlin New York 1999
Sathe, S.T. and H.D. Vinod (1974): Bound on the variance of regression coefficients due to heteroscedastic or autoregressive errors, Econometrica 42 (1974), 333-340
Saw, J.G. (1978): A family of distributions on the m-sphere and some hypothesis tests, Biometrika 65 (1978), 69-73
Saw, J.G. (1981): On solving the likelihood equations which derive from the von Mises distribution, Technical Report, University of Florida 1981
Sayed, A.H., Hassibi, B. and T. Kailath (1996): Fundamental inertia conditions for the minimization of quadratic forms in indefinite metric spaces, Oper. Theory: Adv. Appl., Birkhäuser-Verlag, Basel Boston Berlin 1996
Schach, S. and T. Schäfer (1978): Regressions- und Varianzanalyse, Springer-Verlag, Heidelberg Berlin New York 1978
Schafer, J.L. (1997): Analysis of incomplete multivariate data, Chapman and Hall, London 1997
Schaffrin, B. (1979): Einige ergänzende Bemerkungen zum empirischen mittleren Fehler bei kleinen Freiheitsgraden, Z. Vermessungswesen 104 (1979), 236-247
Schaffrin, B. (1981a): Some proposals concerning the diagonal second order design of geodetic networks, Manuscripta Geodaetica 6 (1981), 303-326
Schaffrin, B. (1981b): Varianz-Kovarianz-Komponenten-Schätzung bei der Ausgleichung heterogener Wiederholungsmessungen, Ph.D. Thesis, University of Bonn, Bonn 1981
Schaffrin, B. (1983a): Varianz-Kovarianz-Komponentenschätzung bei der Ausgleichung heterogener Wiederholungsmessungen, Deutsche Geodätische Kommission, Report C 282, München 1983
Schaffrin, B. (1983b): Model choice and adjustment techniques in the presence of prior information, Ohio State University, Department of Geodetic Science and Surveying, Report 351, Columbus 1983
Schaffrin, B. (1983c): A note on linear prediction within a Gauss-Markov model linearized with respect to a random approximation, 1st Int. Tampere Seminar on Linear Models and their Applications, Tampere, Finland 1983
Schaffrin, B. (1984): Das geodätische Datum mit stochastischer Vorinformation, Habilitationsschrift, Stuttgart 1984
Schaffrin, B. (1985a): The geodetic datum with stochastic prior information, Publ. C313, German Geodetic Commission, München 1985
Schaffrin, B. (1985b): On design problems in geodesy using models with prior information, Statistics & Decisions, Supplement Issue 2 (1985), 443-453
Schaffrin, B. (1985c): A note on linear prediction within a Gauss-Markov model linearized with respect to a random approximation, in: Proc. First Tampere Sem. Linear Models (1983), eds. T. Pukkila and S. Puntanen, Dept. of Math. Science/Statistics, Univ. of Tampere, Finland, Report No. A-138, 285-300
Schaffrin, B. (1986): New estimation/prediction techniques for the determination of crustal deformations in the presence of geophysical prior information, Tectonophysics 130 (1986), 361-367
Schaffrin, B. (1991): Generalized robustified Kalman filters for the integration of GPS and INS, Tech. Rep. 15, Geodetic Institute, Stuttgart University 1991
Schaffrin, B. (1997): Reliability measures for correlated observations, Journal of Surveying Engineering 123 (1997), 126-137
Schaffrin, B. (1999): Softly unbiased estimation, Part 1: The Gauss-Markov model, Linear Algebra and its Applications 289 (1999), 285-296
Schaffrin, B. (2001): Equivalent systems for various forms of kriging, including least-squares collocation, Zeitschrift für Vermessungswesen 126 (2001), 87-94
Schaffrin, B. (2005): On total least squares adjustment with constraints, in: A window on the future of geodesy, F. Sanso (ed.), pp. 417-421, Springer Verlag, Berlin-Heidelberg-New York 2005
Schaffrin, B. (2007): Connecting the dots: the straight line case revisited, zfv - Zeitschrift für Geodäsie, Geoinformation und Landmanagement 132 (2007), 385-394
Schaffrin, B. (2008): Minimum mean squared error (MSE) adjustment and the optimal Tykhonov-Phillips regularization parameter via reproducing best invariant quadratic uniformly unbiased estimates (repro-BIQUUE), J. Geodesy 82 (2008), 113-121
Schaffrin, B., Grafarend, E. and G. Schmitt (1977): Kanonisches Design geodätischer Netze I, Manuscripta Geodaetica 2 (1977), 263-306
Schaffrin, B., Grafarend, E. and G. Schmitt (1978): Kanonisches Design geodätischer Netze II, Manuscripta Geodaetica 3 (1978), 1-22
Schaffrin, B., Krumm, F. and Fritsch, D. (1980): Positiv-diagonale Genauigkeitsoptimierung von Realnetzen über den Komplementaritäts-Algorithmus, in: Ingenieurvermessung 80, eds. R. Conzett, M. Schmitt and H. Schmitt, Beiträge zum VIII. Internationalen Kurs für Ingenieurvermessung, Zürich 1980
Schaffrin, B. and E.W. Grafarend (1982a): Kriterion-Matrizen II: Zweidimensionale homogene und isotrope geodätische Netze, Teil IIa: Relative cartesische Koordinaten, Zeitschrift für Vermessungswesen 107 (1982), 183-194
Schaffrin, B. and E.W. Grafarend (1982b): Kriterion-Matrizen II: Zweidimensionale homogene und isotrope geodätische Netze, Teil IIb: Absolute cartesische Koordinaten, Zeitschrift für Vermessungswesen 107 (1982), 485-493
Schaffrin, B. and E. Grafarend (1986): Generating classes of equivalent linear models by nuisance parameter elimination, Manuscripta Geodaetica 11 (1986), 262-271
Schaffrin, B. and E.W. Grafarend (1991): A unified computational scheme for traditional and robust prediction of random effects with some applications in geodesy, The Frontiers of Statistical Scientific Theory & Industrial Applications 2 (1991), 405-427
Schaffrin, B. and J.H. Kwon (2002): A Bayes filter in Friedland form for INS/GPS vector gravimetry, Geophys. J. Int. 149 (2002), 64-75
Schaffrin, B. and A. Wieser (2008): On weighted total least-squares adjustment for linear regression, J. Geodesy 82 (2008), 415-421
Schall, R. and Dunne, T.T. (1988): A unified approach to outliers in the general linear model, Sankhya Ser. B 50 (1988), 157-167
Scheffé, H. (1959): The analysis of variance, Wiley, New York 1959
Scheidegger, A.E. (1965): On the statistics of the orientation of bedding planes, grain axes and similar sedimentological data, U.S. Geol. Survey Prof. Paper 525-C (1965), 164-167
Schleider, D. (1982): Complex crustal strain approximation, Ph.D. dissertation, Reports of Department of Surveying Engineering of UNB No. 91, Canada 1982
Scheffé, H. (1943): On solutions of the Behrens-Fisher problem based on the t-distribution, Ann. Math. Stat. 14 (1943), 35-44
Scheffé, H. (1999): The analysis of variance, John Wiley & Sons, New York 1999 (repr. of the 1959 orig.)
Schek, H.J. (1974): The force densities method for form finding and computation of general networks, Computer Methods in Applied Mechanics and Engineering 3 (1974), 115-134
Schek, H.J. (1975): Least-Squares-Lösungen und optimale Dämpfung bei nichtlinearen Gleichungssystemen im Zusammenhang mit der bedingten Ausgleichung, Zeitschrift für Vermessungswesen 2 (1975), 67-77
Schek, H.J. and Maier, P. (1976): Nichtlineare Normalgleichungen zur Bestimmung der Unbekannten und deren Kovarianzmatrix, Zeitschrift für Vermessungswesen 101 (1976), 140-159
Schetzen, M. (1980): The Volterra and Wiener theories of nonlinear systems, J. Wiley, New York 1980
Schick, A. (1999): Improving weighted least-squares estimates in heteroscedastic linear regression when the variance is a function of the mean response, J. Statist. Planning and Inference 76 (1999), 127-144
Schiebler, R. (1988): Giorgio de Chirico and the theory of relativity, Lecture given at Stanford University, Wuppertal 1988
Schmetterer, L. (1956): Einführung in die mathematische Statistik, Wien 1956
Schmidt, E. (1907): Entwicklung willkürlicher Funktionen, Math. Annalen 63 (1907), 433-476
Schmidt, K. (1996): A comparison of minimax and least squares estimators in linear regression with polyhedral prior information, Acta Applicandae Mathematicae 43 (1996), 127-138
Schmidt, K.D. (1996): Lectures on risk theory, Teubner Skripten zur Mathematischen Stochastik, Stuttgart 1996
Schmidt, P. (1976): Econometrics, Marcel Dekker, New York 1976
Schmidt-Koenig, K. (1972): New experiments on the effect of clock shifts on homing pigeons, in: Animal orientation and navigation, eds. S.R. Galler, K. Schmidt-Koenig, G.J. Jacobs and R.E. Belleville, NASA SP-262, Washington D.C. 1972
Schmidt, V. (1999): Begründung einer Meßanweisung für die Überwachungsmessungen am Maschinenhaus des Pumpspeicherwerkes Hohenwarthe, Diplomarbeit, TU Bergakademie Freiberg, Institut für Markscheidewesen und Geodäsie, 1999
Schmidt, W.H. and S. Zwanzig (1984): Testing hypotheses in nonlinear regression for nonnormal distributions, in: Proceedings of the Conference on Robustness of Statistical Methods and Nonparametric Statistics, VEB Deutscher Verlag der Wissenschaften, Berlin 1984, 134-138
Schmidt, W.H. and S. Zwanzig (1986): Second order asymptotics in nonlinear regression, J. Multiv. Anal. 18 (1986), 187-215
Schmidt, W.H. and S. Zwanzig (1986): Testing hypotheses in nonlinear regression for nonnormal distributions, Statistics 17 (1986), 483-503
Schmitt, G. (1975): Optimaler Schnittwinkel der Bestimmungsstrecken beim einfachen Bogenschnitt, Allg. Vermessungsnachrichten 6 (1975), 226-230
Schmitt, G. (1977): Experiences with the second-order design problem in theoretical and practical geodetic networks, in: Proceedings International Symposium on Optimization of Design and Computation of Control Networks, Sopron 1977
Schmitt, G. (1977): Experiences with the second-order design problem in theoretical and practical geodetic networks, in: Optimization of design and computation of control networks, eds. F. Halmos and J. Somogyi, Akadémiai Kiadó, Budapest 1979, 179-206
Schmitt, G. (1978): Gewichtsoptimierung bei Mehrpunkteinschaltung mit Streckenmessung, Allg. Vermessungsnachrichten 85 (1978), 1-15
Schmitt, G. (1979): Zur Numerik der Gewichtsoptimierung in geodätischen Netzen, Deutsche Geodätische Kommission, Bayerische Akademie der Wissenschaften, Report 256, München 1979
Schmitt, G. (1980): Second order design of a free distance network considering different types of criterion matrices, Bull. Géodésique 54 (1980), 531-543
Schmitt, G. (1985): Second order design, third order design, in: Optimization and design of geodetic networks, Springer-Verlag, Heidelberg Berlin New York 1985, 74-121
Schmitt, G., Grafarend, E.W. and B. Schaffrin (1977): Kanonisches Design geodätischer Netze I, Manuscripta Geodaetica 2 (1977), 263-306
Schmitt, G., Grafarend, E.W. and B. Schaffrin (1978): Kanonisches Design geodätischer Netze II, Manuscripta Geodaetica 3 (1978), 1-22
Schmutzer, E. (1989): Grundlagen der Theoretischen Physik, Teil I, BI Wissenschaftsverlag, Mannheim 1989
Schneeweiß, H. and H.J. Mittag (1986): Lineare Modelle mit fehlerbehafteten Daten, Physica-Verlag, Heidelberg 1986
Schock, E. (1987): Implicit iterative methods for the approximate solution of ill-posed problems, Bollettino U.M.I., Series 1-B, 7 (1987), 1171-1184
Schoenberg, I.J. (1938): Metric spaces and completely monotone functions, Ann. Math. 39 (1938), 811-841
Schön, S. (2003): Analyse und Optimierung geodätischer Messanordnungen unter besonderer Berücksichtigung des Intervallansatzes, DGK, Reihe C, Nr. 567, München 2003
Schön, S. and Kutterer, H. (2006): Uncertainty in GPS networks due to remaining systematic errors: the interval approach, Journal of Geodesy 80 (2006), 150-162
Schönemann, P.H. (1966): Generalized solution of the orthogonal Procrustes problem, Psychometrika 31 (1966), 1-10
Schönemann, S.D. (1969): Methoden der Ökonometrie, Bd. 1, Vahlen, Berlin 1969
Schönfeld, P. and Werner, H.J. (1987): A note on C.R. Rao's wider definition BLUE in the general Gauss-Markov model, Sankhya Ser. B 49 (1987), 1-8
Schott, J.R. (1997): Matrix analysis for statistics, J. Wiley, New York 1997
Schott, J.R. (1998): Estimating correlation matrices that have common eigenvectors, Comput. Stat. & Data Anal. 27 (1998), 445-459
Schouten, J.A. and J. Haantjes (1936): Über die konforminvariante Gestalt der relativistischen Bewegungsgleichungen, in: Koninkl. Ned. Akademie van Wetenschappen, Proc. Section of Sciences, Vol. 39, Noord-Hollandsche Uitgeversmaatschappij, Amsterdam 1936
Schouten, J.A. and Struik, D.J. (1938): Einführung in die neueren Methoden der Differentialgeometrie I/II, Noordhoff, Groningen-Batavia 1938
Schreiber, O. (1882): Anordnung der Winkelbeobachtungen im Göttinger Basisnetz, Zeitschrift für Vermessungswesen 11 (1882), 129-161
Schroeder, M. (1991): Fractals, chaos, power laws, Freeman, New York 1991
Schultz, C. and G. Malay (1998): Orthogonal projections and the geometry of estimating functions, J. Statist. Planning and Inference 67 (1998), 227-245
Schultze, J. and J. Steinebach (1996): On least squares estimates of an exponential tail coefficient, Statistics & Decisions 14 (1996), 353-372
Schupler, B.R., Allshouse, R.L. and Clark, T.A. (1994): Signal characteristics of GPS user antennas, Navigation 41 (1994), 277-295
Schuster, H.-F. and Förstner, W. (2003): Segmentierung, Rekonstruktion und Datenfusion bei der Objekterfassung mit Entfernungsdaten - ein Überblick, in: Proceedings 2. Oldenburger 3D-Tage, Oldenburg 2003
Schur, J. (1911): Bemerkungen zur Theorie der beschränkten Bilinearformen mit unendlich vielen Veränderlichen, J. Reine und Angew. Math. 140 (1911), 1-28
Schur, J. (1917): Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind, J. Reine Angew. Math. 147 (1917), 205-232
Schwarz, G. (1978): Estimating the dimension of a model, The Annals of Statistics 6 (1978), 461-464
Schwarz, H. (1960): Stichprobentheorie, Oldenbourg, München 1960
Schwarz, K.P. (1976): Least-squares collocation for large systems, Boll. Geodesia e Scienze Affini 35 (1976), 309-324
Schwarz, C.R. and Kok, J.J. (1993): Blunder detection and data snooping in LS and robust adjustments, J. Surv. Eng. 119 (1993), 127-136
Schwarze, V.S. (1995): Satellitengeodätische Positionierung in der relativistischen Raum-Zeit, DGK, Reihe C, Heft Nr. 449
Schwieger, W. (1996): An approach to determine correlations between GPS monitored deformation epochs, in: Proc. of the 8th International Symposium on Deformation Measurements, Hong Kong 1996, 17-26
Scitovski, R. and D. Jukić (1996): Total least squares problem for exponential function, Inverse Problems 12 (1996), 341-349
Seal, H.L. (1967): The historical development of the Gauss linear model, Biometrika 54 (1967), 1-24
Searle, S.R. (1971a): Linear models, J. Wiley, New York 1971
Searle, S.R. (1971b): Topics in variance components estimation, Biometrics 27 (1971), 1-76
Searle, S.R. (1974): Prediction, mixed models, and variance components, in: Reliability and Biometry (1974), 229-266
Searle, S.R. (1982): Matrix algebra useful for statistics, Wiley, New York 1982
Searle, S.R. (1994): Extending some results and proofs for the singular model, Linear Algebra Appl. 210 (1994), 139-151
Searle, S.R. and C.R. Henderson (1961): Computing procedures for estimating components of variance in the two-way classification, mixed model, Biometrics 17 (1961), 607-616
Searle, S.R., Casella, G. and C.E. McCulloch (1992): Variance components, J. Wiley, New York 1992
Seber, G.A.F. (1977): Linear regression analysis, J. Wiley, New York 1977
Seber, G.A.F. and Wild, C.J. (1989): Nonlinear regression, John Wiley & Sons, New York 1989
Seeber, G. (1989): Satellitengeodäsie: Grundlagen, Methoden und Anwendungen, Walter de Gruyter, Berlin - New York 1989, 489p
Seely, J. (1970): Linear spaces and unbiased estimation, Ann. Math. Statist. 41 (1970), 1725-1734
Seely, J. (1970): Linear spaces and unbiased estimation - application to the mixed linear model, Ann. Math. Statist. 41 (1970), 1735-1748
Seely, J. (1971): Quadratic subspaces and completeness, Ann. Math. Statist. 42 (1971), 710-721
Seely, J. (1975): An example of an inadmissible analysis of variance estimator for a variance component, Biometrika 62 (1975), 689
Seely, J. (1977): Minimal sufficient statistics and completeness, Sankhya Series A 39 (1977), Part 2, 170-185
Seely, J. (1980): Some remarks on exact confidence intervals for positive linear combinations of variance components, J. Am. Statist. Ass. 75 (1980), 372-374
Seely, J. and R.V. Hogg (1982): Unbiased estimation in linear models, Communications in Statistics 11 (1982), 721-729
Seely, J. and Y. El-Bassiouni (1983): Applying Wald's variance components test, Ann. Statist. 11 (1983), 197-201
Seely, J. and E.-H. Rady (1988): When can random effects be treated as fixed effects for computing a test statistic for a linear hypothesis?, Communications in Statistics 17 (1988), 1089-1109
Seely, J. and Y. Lee (1994): A note on the Satterthwaite confidence interval for a variance, Communications in Statistics 23 (1994), 859-869
Seely, J., Birkes, D. and Y. Lee (1997): Characterizing sums of squares by their distribution, American Statistician 51 (1997), 55-58
Seemkooei, A.A. (2001): Comparison of reliability and geometrical strength criteria in geodetic networks, Journal of Geodesy 75 (2001), 227-233
Segura, J. and A. Gil (1999): Evaluation of associated Legendre functions off the cut and parabolic cylinder functions, Electronic Transactions on Numerical Analysis 9 (1999), 137-146
Seidelmann, P.K. (ed.) (1992): Explanatory supplement to the Astronomical Almanac, University Science Books, Mill Valley 1992, 752p
Selby, B. (1964): Girdle distributions on the sphere, Biometrika 51 (1964), 381-392
Sengupta, D. (1995): Optimal choice of a new observation in a linear model, Sankhya Ser. A 57 (1995), 137-153
Sengupta, A. and R. Maitra (1998): On best equivariance and admissibility of simultaneous MLE for mean direction vectors of several Langevin distributions, Ann. Inst. Statist. Math. 50 (1998), 715-727
Sengupta, D. and S.R. Jammalamadaka (2003): Linear models: an integrated approach, Series on Multivariate Analysis, Vol. 6, World Scientific, Singapore 2003
Serfling, R.J. (1980): Approximation theorems of mathematical statistics, J. Wiley, New York 1980
Shaban, A.M.M. (1994): ANOVA, MINQUE, PSD-MIQMBE, CANOVA and CMINQUE in estimating variance components, Statistica 54 (1994), 481-489
Shah, B.V. (1959): On a generalisation of the Kronecker product designs, Ann. Math. Statistics 30 (1959), 48-54
Shalabh (1998): Improved estimation in measurement error models through Stein rule procedure, J. Multivar. Anal. 67 (1998), 35-48
Shamir, G. and Zoback, M.D. (1992): Stress orientation profile to 3.5 km depth near the San Andreas fault at Cajon Pass, California, J. Geophys. Res. B97 (1992), 5059-5080
Shao, Q.-M. (1996): p-variation of Gaussian processes with stationary increments, Studia Scientiarum Mathematicarum Hungarica 31 (1996), 237-247
Shapiro, S.S. and M.B. Wilk (1965): An analysis of variance test for normality (complete samples), Biometrika 52 (1965), 591-611
Shapiro, S.S., Wilk, M.B. and M.J. Chen (1968): A comparative study of various tests for normality, J. Am. Statist. Ass. 63 (1968), 1343-1372
Shapiro, L.S. and Brady, M. (1995): Rejecting outliers and estimating errors in an orthogonal regression framework, Philos. Trans. R. Soc. Lond. Ser. A 350 (1995), 407-439
Sheppard, W.F. (1912): Reduction of errors by means of negligible differences, Proc. 5th Int. Congress of Mathematicians (Cambridge) 2 (1912), 348-384
Sheynin, O.B. (1966): Origin of the theory of errors, Nature 211 (1966), 1003-1004
Sheynin, O.B. (1979): Gauß and the theory of errors, Archive for History of Exact Sciences 20 (1979)
Sheynin, O. (1995): Helmert's work in the theory of errors, Arch. Hist. Exact Sci. 49 (1995), 73-104
Shiryayev, A.N. (1973): Statistical sequential analysis, Transl. Mathematical Monographs 8, American Mathematical Society, Providence, R.I. 1973
Shkarofsky, I.P. (1968): Generalized turbulence space-correlation and wave-number spectrum-function pairs, Canadian Journal of Physics 46 (1968), 2133-2153
Shorack, G.R. (1969): Testing and estimating ratios of scale parameters, J. Am. Statist. Ass. 64 (1969), 999-1013
Shrivastava, M.P. (1941): Bivariate correlation surfaces, Sci. Cult. 6 (1941), 615-616
Shumway, R.H. and D.S. Stoffer (2000): Time series analysis and its applications, Springer-Verlag, Heidelberg Berlin New York 2000
Shut, G.H. (1958/59): Construction of orthogonal matrices and their application in analytical photogrammetry, Photogrammetria XV (1958/59), 149-162
Shwartz, A. and Weiss, A. (1995): Large deviations for performance analysis, Chapman and Hall, Boca Raton 1995
Sibuya, M. (1960): Bivariate extreme statistics, Ann. Inst. Statist. Math. 11 (1960), 195-210
Sibuya, M. (1962): A method of generating uniformly distributed points on n-dimensional spheres, Ann. Inst. Statist. Math. 14 (1962), 81-85
Siegel, A.F. (1982): Robust regression using repeated medians, Biometrika 69 (1982), 242-244
Siegel, C.L. (1937): Über die analytische Theorie der quadratischen Formen, Ann. of Math. 38 (1937), 212-291
Sillard, P., Altamimi, Z. and C. Boucher (1998): The ITRF96 realization and its associated velocity field, Geophysical Research Letters 25 (1998), 3223-3226
Silvey, S.D. (1969): Multicollinearity and imprecise estimation, J. Roy. Stat. Soc. Series B 35 (1969), 67-75
Silvey, S.D. (1975): Statistical inference, Chapman and Hall, Boca Raton 1975
Silvey, S.D. (1980): Optimal design, Chapman and Hall, London 1980
Sima, V. (1996): Algorithms for linear-quadratic optimization, Dekker, New York 1996
Simmonet, M. (1996): Measures and probabilities, Springer-Verlag, Heidelberg Berlin New York 1996
Simoncini, V. and F. Perotti (2002): On the numerical solution of (λ²A + λB + C)x = b and application to structural dynamics, SIAM J. Sci. Comput. 23 (2002), 1875-1897
Simonoff, J.S. (1996): Smoothing methods in statistics, Springer, Heidelberg Berlin New York 1996
Sinai, Y.G. (1963): On properties of spectra of ergodic dynamical systems, Dokl. Akad. Nauk SSSR 150 (1963)
Sinai, Y.G. (1963): On higher order spectral measures of ergodic stationary processes, Theory Probability Appl. (USSR) (1963), 429-436
Singer, P., Ströbel, D., Hördt, R., Bahndorf, J. and Linkwitz, K. (1993): Direkte Lösung des räumlichen Bogenschnitts, Zeitschrift für Vermessungswesen 118 (1993), 20-24
Singh, R. (1963): Existence of bounded length confidence intervals, Ann. Math. Statist. 34 (1963), 1474-1485
Sion, M. (1958): On general minimax theorems, Pac. J. Math. 8 (1958), 171-176
Sirkova, L. and V. Witkovsky (2001): On testing variance components in unbalanced mixed linear model, Applications of Mathematics 46 (2001), 191-213
Sjöberg, L.E. (1983a): Unbiased estimation of variance-covariance components in condition adjustment, Univ. of Uppsala, Institute of Geophysics, Dept. of Geodesy, Report No. 19, 223-237
Sjöberg, L.E. (1983b): Unbiased estimation of variance components in condition adjustment with unknowns - a MINQUE approach, Zeitschrift für Vermessungswesen 108 (1983), 382-387
Sjöberg, L.E. (1984a): Non-negative variance component estimation in the Gauss-Helmert adjustment model, Manuscripta Geodaetica 9 (1984), 247-280
Sjöberg, L.E. (1984b): Least-squares modification of Stokes' and Vening-Meinesz' formulas by accounting for truncation and potential coefficient errors, Manuscripta Geodaetica 9 (1984), 209-229
Sjöberg, L.E. (1985): Adjustment and variance component estimation with a singular covariance matrix, Zeitschrift für Vermessungswesen 110 (1985), 145-151
Sjöberg, L.E. (1993): General matrix calculus, adjustment, variance-covariance components estimation, Lecture notes, Royal Institute of Technology, Stockholm 1993
Sjöberg, L.E. (1994): The best quadratic minimum bias non-negative definite estimator for an additive two variance components model, Royal Institute of Technology, Stockholm 1994
Sjöberg, L.E. (1995): The best quadratic minimum bias non-negative estimator for an additive two variance component model, Manuscripta Geodaetica 20 (1995), 139-144
Sjöberg, L.E. (1999): An efficient iterative solution to transform rectangular geocentric coordinates to geodetic coordinates, Zeitschrift für Vermessungswesen 124 (1999), 295-297
Sjöberg, L.E. (2008): A computational scheme to model the geoid by the modified Stokes formula without gravity reductions, J. Geod. 74, 255-268
Sjöberg, L.E. (2011): On the best quadratic minimum bias non-negative estimator of a two-variance component model, J. Geodetic Science 1 (2011), 280-285
Sokolnikoff, I.S. (1956): Mathematical theory of elasticity, McGraw-Hill, New York 1956
Slakter, M.J. (1965): A comparison of the Pearson chi-square and Kolmogorov goodness-of-fit tests with respect to validity, J. Am. Statist. Ass. 60 (1965), 854-858
Small, C.G. (1996): The statistical theory of shape, Springer-Verlag, Heidelberg Berlin New York 1996
Smith, A.F.M. (1973): A general Bayesian linear model, J. Roy. Statist. Soc. B35 (1973), 67-75
Smith, P.J. (1995): A recursive formulation of the old problem of obtaining moments from cumulants and vice versa, The American Statistician 49 (1995), 217-218
Smith, T. and S.D. Peddada (1998): Analysis of fixed effects linear models under heteroscedastic errors, Statistics & Probability Letters 37 (1998), 399-408
Smyth, G.K. (1989): Generalized linear models with varying dispersion, J. Roy. Statist. Soc. B51 (1989), 47-60
Snedecor, G.W. and W.G. Cochran (1967): Statistical methods, 6th ed., Iowa State University Press, Ames 1967
Sneeuw, N. and R. Bun (1996): Global spherical harmonic computation by two-dimensional Fourier methods, Journal of Geodesy 70 (1996), 224-232
Solari, H.G., Natiello, M.A. and G.B. Mindlin (1996): Nonlinear dynamics, IOP, Bristol 1996
Soler, T. and Hothem, L.D. (1989): Important parameters used in geodetic transformations, Journal of Surveying Engineering 115 (1989), 414-417
Soler, T. and van Gelder, B. (1991): On covariances of eigenvalues and eigenvectors of second-rank symmetric tensors, Geophys. J. Int. 105 (1991), 537-546
Sohn, B.Y. and Kim, G.B. (1997): Detection of outliers in weighted least squares regression, Korean J. Comput. Appl. Math. 4 (1997), 441-452
Solomon, P.J. (1985): Transformations for components of variance and covariance, Biometrika 72 (1985), 233-239
Somogyi, J. (1998): The robust estimation of the 2D-projective transformation, Acta Geod. Geoph. Hung. 33 (1998), 279-288
Song, S.H. (1999): A note on S2 in a linear regression model based on two-stage sampling data, Statistics & Probability Letters 43 (1999), 131-135
Soper, H.E. (1916): On the distributions of the correlation coefficient in small samples, Biometrika 11 (1916), 328-413
Soper, H.E., Young, A.W., Cave, B.M., Lee, A. and Pearson, K. (1917): On the distribution of the correlation coefficient in small samples, Biometrika 11 (1917), 328-413
Spanos, A. (1999): Probability theory and statistical inference, Cambridge University Press, Cambridge 1999
Späth, H. and G.A. Watson (1987): On orthogonal linear L1 approximation, Numerische Mathematik 51 (1987), 531-543
Spilker, J.J. (1996): Tropospheric effects on GPS, in: Parkinson, B.W. and Spilker, J.J. (eds.), Global Positioning System: Theory and Applications, Volume 1, American Institute of Aeronautics and Astronautics, Washington DC, pp. 517-546
Spivak, M. (1979): Differential Geometry I-V, Publish or Perish Inc., Berkeley 1979
Spöck, G. (1997): Die geostatistische Berücksichtigung von a-priori Kenntnissen über die Trendfunktion und die Kovarianzfunktion aus Bayesscher, Minimax- und spektraler Sicht, Master thesis, University of Klagenfurt 1997
Spöck, G. (2008): Non-stationary spatial modeling using harmonic analysis, in: Ortiz, J.M. and Emery, X. (eds.), Proceedings of the eighth international geostatistics congress, Gecamin, Chile 2008, pp. 389-398
Spöck, G. and J. Pilz (2008): Non-stationary spatial modeling using harmonic analysis, VIII Int. Geostatistics Congress, pp. 1-10, Santiago 2008
Spöck, G. and J. Pilz (2009): Spatial modelling and design: covariance-robust minimax prediction based on convex design ideas, Stoch. Environ. Res. Risk Assess. (2009), 1-21
Sposito, V.A. (1982): On unbiased Lp regression, J. Am. Statist. Ass. 77 (1982), 652-653
Sprent, P. and N.C. Smeeton (1989): Applied nonparametric statistical methods, Chapman and Hall, Boca Raton 1989
Sprinsky, W.H. (1974): The design of special purpose horizontal geodetic control networks, Ph.D. thesis, The Ohio State University, Columbus 1974
Sprott, D.A. (1978): Gauss's contributions to statistics, Historia Mathematica 5 (1978), 183-203
Srivastava, A.K., Dube, M. and V. Singh (1996): Ordinary least squares and Stein-rule predictions in regression models under inclusion of some superfluous variables, Statistical Papers 37 (1996), 253-265
Srivastava, A.K. and Shalabh (1996): Efficiency properties of least squares and Stein-rule predictions in linear regression models, J. Appl. Stat. Science 4 (1996), 141-145
Srivastava, A.K. and Shalabh (1997): A new property of Stein procedure in measurement error model, Statistics & Probability Letters 32 (1997), 231-234
Srivastava, M.S. and D. von Rosen (1998): Outliers in multivariate regression models, J. Multivar. Anal. 65 (1998), 195-208
Srivastava, M.S. and von Rosen, D. (2002): Regression models with unknown singular covariance matrix, Linear Algebra Appl. 354 (2002), 255-273
Srivastava, V.K. and Upadhyaha, S. (1975): Small-disturbance and large sample approximations in mixed regression estimation, Eastern Economic Journal 2 (1975), 261-265
Srivastava, V.K. and B. Raj (1979): The existence of the mean of the estimator in seemingly unrelated regressions, Communications in Statistics - Theory and Methods A8(7) (1979), 713-717
Srivastava, V.K. and Maekawa, K. (1995): Efficiency properties of feasible generalized least squares estimators in SURE models under non-normal disturbances, J. Econometrics 66 (1995), 99-121
Stahlecker, P. and K. Schmidt (1996): Biased estimation and hypothesis testing in linear regression, Acta Applicandae Mathematicae 43 (1996), 145-151
Stahlecker, P., Knautz, H. and G. Trenkler (1996): Minimax adjustment technique in a parameter restricted linear model, Acta Applicandae Mathematicae 43 (1996), 139-144
Stam, A.J. (1982): Limit theorems for uniform distributions on spheres in high dimensional Euclidean spaces, J. Appl. Prob. 19 (1982), 221-229
Stark, E. and Mikhail, E. (1973): Least squares and non-linear functions, Photogrammetric Engineering 39 (1973), 405-412
Stark, H. (ed.) (1987): Image recovery: theory and application, Academic Press, New York 1987
Staudte, R.G. and Sheather, S.J. (1990): Robust estimation and testing, John Wiley & Sons, New York 1990
Steeb, W.H. (1991): Kronecker product of matrices and applications, B.I. Wissenschaftsverlag, Mannheim 1991
Steele, B.M.: A modified EM algorithm for estimation in generalized mixed models
Witting, H. and G. Nölle (1970): Angewandte Mathematische Statistik, Teubner Verlag, Stuttgart 1970
Wold, S., Wold, H.O., Dunn, W.J. and Ruhe, A. (1984): The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses, SIAM Journal on Scientific and Statistical Computing 5: 735-743
Wolf, H. (1961): Das Fehlerfortpflanzungsgesetz mit Gliedern II. Ordnung, ZfV
Wolf, H. (1968): Ausgleichungsrechnung nach der Methode der kleinsten Quadrate, Ferdinand Dümmlers Verlag, Bonn 1968
Wolf, H. (1973): Die Helmert-Inverse bei freien geodätischen Netzen, Z. Vermessungswesen 98 (1973), 396-398
Wolf, H. (1975): Ausgleichungsrechnung I, Formeln zur praktischen Anwendung, Dümmler, Bonn 1975
Wolf, H. (1976): The Helmert block method, its origin and development, Proc. 2nd Int. Symp. on problems related to the redefinition of North American geodetic networks, 319-326, Arlington 1976
Wolf, H. (1977): Eindeutige und mehrdeutige geodätische Netze, Abh. Braunschw. Wiss. Ges., Band 28, Göttingen 1977
Wolf, H. (1980a): Ausgleichungsrechnung II, Aufgaben und Beispiele zur praktischen Anwendung, Dümmler, Bonn 1980
Wolf, H. (1980b): Hypothesentests im Gauß-Helmert-Modell, Allg. Vermessungsnachrichten 87 (1980), 277-284
Wolf, H. (1997): Ausgleichungsrechnung I und II, 3. Auflage, Ferdinand Dümmler Verlag, Bonn 1997
Wolkowicz, H. and G.P.H. Styan (1980): More bounds for eigenvalues using traces, Linear Algebra Appl. 31 (1980), 1-17
Wolter, K.H. and Fuller, W.A. (1982): Estimation of the quadratic errors-in-variables model, Biometrika 69 (1982), 175-182
Wong, C.S. (1993): Linear models in a general parametric form, Sankhya 55 (1993), 130-149
Wong, W.K. (1992): A unified approach to the construction of minimax designs, Biometrika 79 (1992), 611-620
Wood, A. (1982): A bimodal distribution for the sphere, Applied Statistics 31 (1982), 52-58
Woolson, R.F. and W.R. Clarke (1984): Analysis of categorical incomplete data, J. Roy. Statist. Soc. A147 (1984), 87-99
Worbs, E. (1955): Carl Friedrich Gauß, ein Lebensbild, Leipzig 1955
Wu, C.F. (1980): On some ordering properties of the generalized inverses of nonnegative definite matrices, Linear Algebra Appl. 32 (1980), 49-60
Wu, C.F. (1986): Jackknife, bootstrap and other resampling methods in regression analysis, Annals of Statistics 14: 1261-1350
Wu, C.F.J. (1981): Asymptotic theory of nonlinear least squares estimation, Ann. Stat. 9 (1981), 501-513
Wu, J.T., Wu, S.C., Hajj, G.A., Bertiger, W.I. and Lichten, S.M. (1993): Effects of antenna orientation on GPS carrier phase, Manuscripta Geodaetica 18: 91-98
Wu, Q. and Z. Jiang (1997): The existence of the uniformly minimum risk equivariant estimator in Sure model, Commun. Statist. - Theory Meth. 26 (1997), 113-128
Wu, W. (1984): On the decision problem and the mechanization of theorem proving in elementary geometry, Scientia Sinica 21 (1984), 150-172
Wu, Y. (1988): Strong consistency and exponential rate of the “minimum L1-norm” estimates in linear regression models, Computational Statistics and Data Analysis 6: 285-295
Wunsch, G. (1986): Handbuch der Systemtheorie, Oldenbourg Verlag, München 1986
Wyss, M., Liang, B.Y., Tanigawa, W. and Wu, X. (1992): Comparison of orientations of stress and strain tensors based on fault plane solutions in Kaoiki, Hawaii, J. Geophys. Res. 97, 4769-4790
Xi, Z. (1993): Iterated Tikhonov regularization for linear ill-posed problems, Ph.D. Thesis, Universität Kaiserslautern, Kaiserslautern 1993
Xu, C., Ding, K., Cai, J. and Grafarend, E. (2009): Method for determining weight scale factor parameter in joint inverse problem of geodesy, Journal of Geodynamics 47, 39-46, doi:10.1016/j.jog.2008.06.005
Xu, P. and Rummel, R. (1994): Generalized ridge regression with applications in determination of potential fields, Manuscripta Geodaetica 20, 8-20
Xu, P.L. (1986): Variance-covariance propagation for a nonlinear function, J. Wuhan Techn. Univ. Surv. Mapping 11, No. 2, 92-99
Xu, P.L. (1989a): On the combined adjustment of gravity and levelling data, Ph.D. dissertation, Wuhan Tech. Univ. Surv. Mapping
Xu, P.L. (1989b): On robust estimation with correlated observations, Bulletin Géodésique 63 (1989), 237-252
Xu, P.L. (1989c): Statistical criteria for robust methods, ITC Journal 1 (1989), 37-40
Xu, P.L. (1991): Least squares collocation with incorrect prior information, Z. Vermessungswesen 116 (1991), 266-273
Xu, P.L. (1992): The value of minimum norm estimation of geopotential fields, Geophys. J. Int. 111 (1992), 170-178
Xu, P.L. (1993): Consequences of constant parameters and confidence intervals of robust estimation, Boll. Geod. Sci. Affini 52, 231-249
Xu, P.L. (1995): Testing the hypotheses of non-estimable functions in free net adjustment models, Manuscripta Geodaetica 20 (1995), 73-81
Xu, P.L. (1996): Statistical aspects of 2-D symmetric random tensors, Proc. FIG Int. Symp.
Xu, P.L. (1999a): Biases and accuracy of, and an alternative to, discrete nonlinear filters, Journal of Geodesy 73 (1999), 35-46
Xu, P.L. (1999b): Spectral theory of constrained second-rank symmetric random tensors, Geophys. J. Int. 138 (1999), 1-24
Xu, P.L. (2001): Random simulation and GPS decorrelation, Journal of Geodesy 75 (2001), 408-423
Xu, P.L. (2002a): Isotropic probabilistic models for directions, planes and referential systems, Proc. Royal Soc. London A458 (2002), 2017-2038
Xu, P.L. (2002b): A hybrid global optimization method: the one-dimensional case, J. Comput. Appl. Math. 147, 301-314
Xu, P.L. (2003): A hybrid global optimization method: the multi-dimensional case, J. Comput. Appl. Math. 155, 423-446
Xu, P.L. (2005): Sign-constrained robust least squares, subjective breakdown point and the effect of weights of observations, J. of Geodesy 79 (2005), 146-159
Xu, P.L. (2006): Voronoi cells, probabilistic bounds and hypothesis testing in mixed integer linear models, IEEE Transactions on Information Theory 52 (2006), 3122-3138
Xu, P.L. and E.W. Grafarend (1996a): Statistics and geometry of the eigenspectra of three-dimensional second-rank symmetric random tensors, Geophysical Journal International 127 (1996), 744-756
Xu, P.L. and E.W. Grafarend (1996b): Probability distribution of eigenspectra and eigendirections of a two-dimensional, symmetric rank two random tensor, Journal of Geodesy 70 (1996), 419-430
Xu, P.L., Shen, Y., Fukuda, Y. and Liu, Y. (2006): Variance component estimation in linear inverse ellipsoid models, J. Geodesy 80 (2006), 69-81
Xu, P.L., Liu, Y., Shen, Y. and Fukuda, Y. (2007): Estimability analysis of variance and covariance components, J. Geodesy 81 (2007), 593-602
Xu, Z.H., Wang, S.Y., Huang, Y.R. and Gao, A. (1992): Tectonic stress field of China inferred from a large number of small earthquakes, J. Geophys. Res. B97, 11867-11877
Yadrenko, M.I. (1987): Correlation theory of stationary and related random functions, Vol. I, Basic results, Springer Verlag, New York 1987
Yaglom, A.M. (1961): Second-order homogeneous random fields, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 593-622, University of California Press, Berkeley 1961
Yaglom, A.M. (1962): An introduction to the theory of stationary random functions, Dover, New York 1962
Yaglom, A.M. (1987): Correlation theory of stationary and related random functions, Springer, New York 1987
Yakhot, V. et al. (1989): Space-time correlations in turbulence: kinematical versus dynamical effects, Phys. Fluids A1 (1989), 184-186
Yancey, T.A., Judge, G.G. and Bock, M.E. (1974): A mean square error test when stochastic restrictions are used in regression, Communications in Statistics, Part A - Theory and Methods 3: 755-768
Yang, H. (1996): Efficiency matrix and the partial ordering of estimate, Commun. Statist. - Theory Meth. 25(2) (1996), 457-468
Yang, Y. (1994): Robust estimation for dependent observations, Manuscripta Geodaetica 19 (1994), 10-17
Yang, Y. (1999): Robust estimation of geodetic datum transformation, Journal of Geodesy 73 (1999), 268-274
Yang, Y. (1999): Robust estimation of systematic errors of satellite laser range, Journal of Geodesy 73 (1999), 345-349
Yang, Y., Song, L. and Xu, T. (2001): Robust estimator for the adjustment of correlated GPS networks, presented at the First International Symposium on Robust Statistics and Fuzzy Techniques in Geodesy and GIS, Zurich, Switzerland, 2001
Yang, Y., Song, L. and Xu, T. (2002): Robust parameter estimation for geodetic correlated observations, Acta Geodaetica et Cartographica Sinica 31(2), 95-99
Yazji, S. (1998): The effect of the characteristic distance of the correlation function on the optimal design of geodetic networks, Acta Geod. Geoph. Hung. 33 (1998), 215-234
Ye, Y. (1997): Interior point algorithms: theory and analysis, J. Wiley, New York 1997
Yeh, A.B. (1998): A bootstrap procedure in linear regression with nonstationary errors, The Canadian J. Stat. 26 (1998), 149-160
Yeung, M.-C. and T.F. Chan (1997): Probabilistic analysis of Gaussian elimination without pivoting, SIAM J. Matrix Anal. Appl. 18 (1997), 499-517
Ylvisaker, D. (1977): Test resistance, J. Am. Statist. Ass. 72 (1977), 551-557
Yohai, V.J. (1987): High breakdown point and high efficiency robust estimates for regression, Ann. Stat. 15 (1987), 642-656
Yohai, V.J. and R.H. Zamar (1997): Optimal locally robust M-estimates of regression, J. Statist. Planning and Inference 64 (1997), 309-323
Yor, M. (1992): Some aspects of Brownian motion, Part I: Some special functionals, Birkhäuser Verlag, Basel Boston Berlin 1992
Yor, M. (1997): Some aspects of Brownian motion, Part II: Some recent martingale problems, Birkhäuser Verlag, Basel Boston Berlin 1997
You, R.J. (2000): Transformation of Cartesian to geodetic coordinates without iterations, Journal of Surveying Engineering 126 (2000), 1-7
Youssef, A.H.A. (1998): Coefficient of determination for random regression model, Egypt. Statist. J. 42 (1998), 188-196
Yu, Z.C. (1992): A generalization theory of estimation of variance-covariance components, Manuscripta Geodaetica 17, 295-301
Yu, Z.C. (1996): A universal formula of maximum likelihood estimation of variance-covariance components, Journal of Geodesy 70 (1996), 233-240
Yu, Z. and M. Li (1996): Simultaneous location and evaluation of multi-dimensional gross errors, Wuhan Tech. University of Surveying and Mapping, Science Reports 1 (1996), 94-103
Yuan, K.H. and P.M. Bentler (1997): Mean and covariance structure analysis - theoretical and practical improvements, J. Am. Statist. Ass. 92 (1997), 767-774
Yuan, K.H. and P.M. Bentler (1998): Robust mean and covariance structure analysis, British Journal of Mathematical and Statistical Psychology (1998), 63-88
Yuan, Y. (1999): On the truncated conjugate gradient method, Springer-Verlag, Heidelberg Berlin New York 1999
Yuan, Y. (2000): On the truncated conjugate gradient method, Math. Prog. 87 (2000), 561-571
Yunck, T.P. (1993): Coping with the atmosphere and ionosphere in precise satellite and ground positioning, in: Environmental Effects on Spacecraft Positioning and Trajectories, Geophysical Monograph 73, IUGG Vol. 13, 1-16
Yusuf, S., Peto, R., Lewis, J., Collins, R. and P. Sleight (1985): Beta blockade during and after myocardial infarction: an overview of the randomized trials, Progress in Cardiovascular Diseases 27 (1985), 335-371
Zabell, S. (1992): R.A. Fisher and the fiducial argument, Statistical Science 7 (1992), 369-387
Zackin, R., de Gruttola, V. and N. Laird (1996): Nonparametric mixed-effects models for repeated binary data arising in serial dilution assays: application to estimating viral burden in AIDS, J. Am. Statist. Ass. 91 (1996), 52-61
Zacks, S. (1971): The theory of statistical inference, J. Wiley, New York 1971
Zadeh, L. (1965): Fuzzy sets, Information and Control 8 (1965), 338-353
Zaiser, J. (1986): Begründung, Beobachtungsgleichungen und Ergebnisse für ein dreidimensionales geometrisch-physikalisches Modell der Geodäsie, Zeitschrift für Vermessungswesen 111 (1986), 190-196
Závoti, J. (1999): Modified versions of estimates based on least squares and minimum norm, Acta Geod. Geoph. Hung. 34 (1999), 79-86
Závoti, J. and T. Jancsó (2006): The solution of the 7-parameter datum transformation problem with and without the Gröbner basis, Acta Geod. Geoph. Hung. 41 (2006), 87-100
Zehfuss, G. (1858): Über eine gewisse Determinante, Zeitschrift für Mathematik und Physik 3 (1858), 298-301
Zellner, A. (1962): An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias, Journal of the American Statistical Association 57: 348-368
Zellner, A. (1971): An introduction to Bayesian inference in econometrics, J. Wiley, New York 1971
Zha, H. (1995): Comments on large least squares problems involving Kronecker products, SIAM J. Matrix Anal. Appl. 16 (1995), 1172
Zhan, X. (2000): Singular values of differences of positive semidefinite matrices, SIAM J. Matrix Anal. Appl. 22 (2000), 819-823
Zhang, B.X., Liu, B.S. and Lu, C.Y. (2004): A study of the equivalence of the BLUEs between a partitioned singular linear model and its reduced singular linear models, Acta Math. Sin., Engl. Ser. 20 (2004), 557-568
Zhang, Y. (1985): The exact distribution of the Moore-Penrose inverse of X with a density, in: Multivariate Analysis VI, Krishnaiah, P.R. (ed), 633-635, Elsevier, New York 1985
Zhang, S. (1994): Anwendung der Drehmatrix in Hamilton normierten Quaternionen bei der Bündelblockausgleichung, Zeitschrift für Vermessungswesen 119 (1994), 203-211
Zhang, J., Zhang, K. and Grenfell, R. (2004a): On the relativistic Doppler effects and high accuracy velocity estimation using GPS, in: Proc. The 2004 International Symposium on GNSS/GPS, December 6-8, Sydney, Australia
Zhang, J., Zhang, K., Grenfell, R., Li, Y. and Deakin, R. (2004b): Real-time Doppler/Doppler rate derivation for dynamic applications, in: Proc. The 2004 International Symposium on GNSS/GPS, December 6-8, Sydney, Australia
Zhang, J., Zhang, K., Grenfell, R. and Deakin, R. (2006a): GPS satellite velocity and acceleration determination using the broadcast ephemeris, The Journal of Navigation 59: 293-305
Zhang, J., Zhang, K., Grenfell, R. and Deakin, R. (2006b): Short note: On the relativistic Doppler effect for precise velocity determination using GPS, Journal of Geodesy 80: 104-110
Zhang, J.Z., Chen, L.H. and N.Y. Deng (2000): A family of scaled factorized Broyden-like methods for nonlinear least squares problems, SIAM J. Optim. 10 (2000), 1163-1179
Zhang, Z. and Y. Huang (2003): A projection method for least squares problems with a quadratic equality constraint, SIAM J. Matrix Anal. Appl. 25 (2003), 188-212
Zhao, L.P. and Prentice, R.L. (1990): Correlated binary regression using a generalized quadratic model, Biometrika 77: 642-648
Zhao, L.P., Prentice, R.L. and Self, S.G. (1992): Multivariate mean parameter estimation by using a partly exponential model, Journal of the Royal Statistical Society, Series B 54: 805-811
Zhao, Y. and S. Konishi (1997): Limit distributions of multivariate kurtosis and moments under Watson rotational symmetric distributions, Statistics & Probability Letters 32 (1997), 291-299
Zhdanov, M.S. (2002): Geophysical inverse theory and regularization problems, Methods in Geochemistry and Geophysics 36, Elsevier, Amsterdam Boston London 2002
Zhen-Su, S., Jackson, E. and Orszag, S. (1990): Intermittency of turbulence, in: The Legacy of John von Neumann (J. Glimm, J. Impagliazzo, I. Singer eds.), Proc. Symp. Pure Mathematics, Vol. 50, 197-211, American Mathematical Society, Providence, Rhode Island 1990
Zhou, J. (2001): Two robust design approaches for linear models with correlated errors, Statistica Sinica 11 (2001), 261-272
Zhou, K.Q. and S.L. Portnoy (1998): Statistical inference on heteroscedastic models based on regression quantiles, J. Nonparametric Statistics 9 (1998), 239-260
Zhou, L.P. and Mathew, T. (1993): Combining independent tests in linear models, J. Am. Stat. Assoc. 88 (1993), 650-655
Zhu, J. (1996): Robustness and the robust estimate, Journal of Geodesy 70 (1996), 586-590
Zhu, Z. and Stein, M.L. (2006): Spatial sampling design for prediction with estimated parameters, J. Agricult. Biol. Environ. Stat. 11: 24-49
Ziegler, A., Kastner, C. and M. Blettner (1998): The generalised estimating equations: an annotated bibliography, Biometrical Journal 40 (1998), 115-139
Zimmermann, H.-J. (1991): Fuzzy set theory and its applications, 2nd ed., Kluwer Academic Publishers, Dordrecht 1991
Zippel, R. (1993): Effective polynomial computation, Kluwer Academic Publishers, Boston 1993
Zmyslony, R. (1980): A characterization of best linear unbiased estimators in the general linear model, in: Mathematical Statistics and Probability Theory (Proc. 6th Int. Conf., Wisla, Poland, 1978), Lecture Notes in Statistics 2, 1980, 365-373
Zolotarev, V.M. (1997): Modern theory of summation of random variables, VSP, Utrecht 1997
Zoback, M.D. (1992a): Introduction to special section on the Cajon Pass scientific drilling project, J. Geophys. Res. B97, 4991-4994
Zoback, M.D. (1992b): First- and second-order patterns of stress in the lithosphere: the World Stress Map Project, J. Geophys. Res. B97, 11703-11728
Zoback, M.D. (1992c): Stress field constraints on intraplate seismicity in eastern North America, J. Geophys. Res. B97, 11761-11782
Zoback, M.D. and Healy, J. (1992): In-situ stress measurements to 3.5 km depth in the Cajon Pass scientific research borehole: implications for the mechanics of crustal faulting, J. Geophys. Res. B97, 5039-5057
Zumberge, J.F. and Bertiger, W.I. (1996): Ephemeris and clock navigation message accuracy, in: Parkinson, B.W. and Spilker, J.J. (eds), Global Positioning System: Theory and Applications, Volume 1, American Institute of Aeronautics and Astronautics, Washington DC, 585-599
Zurmühl, R. and S. Falk (1984): Matrizen und ihre Anwendungen, Teil 1: Grundlagen, 5th ed., Springer-Verlag, Heidelberg Berlin New York 1984
Zurmühl, R. and S. Falk (1986): Matrizen und ihre Anwendungen, Teil 2: Numerische Methoden, 5th ed., Springer-Verlag, Heidelberg Berlin New York 1986
Zvara, K. (1989): Regression analysis, Academia, Praha 1989 (in Czech)
Zwanzig, S. (1980): The choice of approximative models in nonlinear regression, Statistics 11, No. 1, 23-47
Zwanzig, S. (1983): A note to the paper of Ivanov and Zwanzig on the asymptotic expansion of the least squares estimator, Statistics 14, No. 1, 29-32
Zwanzig, S. (1985): A third order comparison of least squares, jackknifing and cross validation for error variance estimation in nonlinear regression, Statistics 16, No. 1, 47-54
Zwanzig, S. (1987): On the consistency of the least squares estimator in nonlinear errors-in-variables models, Folia Oeconomica, Acta Universitatis Lodziensis 8
Zwanzig, S. (1989): On an asymptotic minimax result in nonlinear errors-in-variables models, Proceedings of the Fourth Prague Symposium on Asymptotic Statistics, Prague 1989, 549-558
Zwanzig, S. (1990): Discussion to A. Pázman: Small-sample distributional properties of nonlinear regression estimator (a geometric approach), Statistics 21, No. 3, 361-363
Zwanzig, S. (1991): Least squares estimation in nonlinear functional relations, Proceedings Probastat '91, Bratislava, 171-177
Zwanzig, S. (1994): On adaptive estimation in nonlinear regression, Kybernetika 30, No. 3, 359-367
Zwanzig, S. (1997): On L1-norm estimators in nonlinear regression and in nonlinear error-in-variables models, in: L1-Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series, Vol. 31, 1997, 101-118
Zwanzig, S. (1998a): Application of Hipparcos data: a new statistical method for star reduction, Proceedings of Journées 1997, Academy of Sciences of the Czech Republic, Astronomical Institute, 142-144
Zwanzig, S. (1998b): Experimental design for nonlinear error-in-variables models, Proceedings of the Third St. Petersburg Workshop on Simulation, 225-230
Zwanzig, S. (1999): How to adjust nonparametric regression estimators to models with errors in the variables, Theory of Stochastic Processes 5 (21), 272-277
Zwanzig, S. (2000): Experimental design for nonlinear error-in-variables models, in: Advances in Stochastic Simulation Methods, Balakrishnan, Melas and Ermakov (eds)
Zwanzig, S. (2007): Local linear estimation in nonparametric errors-in-variables models, Theory of Stochastic Processes 13, 316-327
Zwet, W.R. van and J. Oosterhoff (1967): On the combination of independent test statistics, Ann. Math. Statist. 38 (1967), 659-680
Zwinck, E. (ed) (1988): Christian Doppler - Leben und Werk; Der Dopplereffekt, Schriftenreihe des Landespressebüros, Presse- und Informationszentrum des Bundeslandes Salzburg, Sonderpublikationen 76, 140 p. (in German)
Zyskind, G. (1967): On canonical forms, non-negative covariance matrices and best and simple least squares linear estimators in linear models, Ann. Math. Stat. 38 (1967), 1092-1109
Zyskind, G. (1969): On best linear estimation and a general Gauss-Markov theorem in linear models with arbitrary nonnegative covariance structure, SIAM J. Appl. Math. 17 (1969), 1190-1202
Zyskind, G. (1975): Error structures, projections and conditional inverses in linear model theory, in: A Survey of Statistical Design and Linear Models, J.N. Srivastava (ed.), 647-663, North Holland, Amsterdam 1975
Index
A. Bjerhammar’s left inverse, 121 Additive rank partitioning, 268, 289 Adjoint form, 151 Algebraic partitioning, 69 Algorithms Buchberger, 534, 536 Euclidean, 534 ANOVA, 240 Antisymmetry operator, 152 Arithmetic mean, 146, 187, 192, 379 Arnoldi method, 135 Associativity, 595
Balancing variance, 335 Basis, 168 Bessel function, 371 Best Linear Uniformly Unbiased Estimation, 207 Bias matrix, 308, 332, 386 Bias norms, 309 Bias vector, 308, 332, 340, 386, 387, 392, 394, 395 Bias weight matrix, 340, 342, 394, 395 BIQUUE, 426 Bounded parameters, 3 Break points, 142 Buchberger algorithm, 592
Canonical base, 136 Canonical form, 673, 696 Canonical LESS, 134 Canonical MINOLESS, 297, 299, 300 Canonical observation vector, 132 Canonical unknown vector, 136 Cartesian coordinates, 117, 642, 675
Cauchy–Green tensor, 128 Cayley calculus, 236 Cayley inverse, 342, 395 Centering matrix, 465, 468 Central point, 362 Central statistical moment of second order, 189 Centralized Lagrangean, 465 Choleski decomposition, 617 Choleski factorization, 136 Circular normal distribution, 363, 374, 375 Clifford algebra, 593 Column space, 141 Commutativity, 595 Complex algebra, 593 Composition algebra, 592, 622 Condition equation, 194 Condition equations, 122 Conformal, 380 Constrained Lagrangean, 108, 196, 237, 238, 284, 311 Constraint Lagrangean, 208 Coordinate mapping, 265 Coordinates, 152 Cartesian, 557 Correlated observations, 185 Covariance, 754 Covariance components, 231 Covariance factor, 227 Covariance factorization, 215 Covariances, 215 Criterion matrix, 118 Cross product, 150, 153, 155, 158 Cumulative pdf, 679, 738 Cumulative probability, 723–725 Curved surfaces, 102 Cyclic ordering, 152
D. Hilbert, 538 Datum defect, 265 Datum problem, 495 Dedekind, 538 Degrees of freedom, 94, 228, 267, 682, 729 Diagonal matrix, 467 Differentiable manifold, 16, 102, 104 Differential Geometry, 102 Dilatation, 473, 674, 683 Dilatation factor, 650 Direct equations, 374 Direct map, 641 Dispersion matrix, 205, 207, 309, 394, 424, 754 Dispersion vector, 217 Distances, 557 Division algebra, 593, 602, 622 Division algorithm, 539 E. Noether, 538 EIGEN, 728 Eigencolumn, 300 Eigenspace, 302 Eigenspace analysis, 111, 130 Eigenspace synthesis, 130, 142 Eigensystems, 135, 301 Eigenvalue, 300, 302 Eigenvalue Analysis, 728 Eigenvalue decomposition, 130, 132, 136 Eigenvalues, 558 positive eigenvalues, 113 zero eigenvalues, 113 Eigenvectors, 302 Elliptic curve, 375 Equations polynomial, 538 Error propagation, 470 Error propagation law, 210, 402, 403, 407, 758 Error vector, 210 Error-in-variables, 384 Euclidean, 113 Euclidean length, 95 Euclidean metric forms, 375 Euclidean space, 112, 118 Euclidean observation space, 642 Euler circles, 104 Euler–Lagrange strain, 128 Euler–Lagrange tensor, 128 Expectation operator, 757 Fibering, 95, 104, 263, 268 Finite matrices, 602 First central moment, 754
First derivatives, 96, 106, 191, 284, 303, 366, 413, 450 First moment, 194, 754 First moments, 422 First order moments, 205 1st Grassmann relation, 150 Fisher normal distributed, 378 Fisher sphere, 361 Fisher-von Mises, 362 Fixed effects, 187, 307, 317, 326, 329, 333, 334, 339, 384 Fourier, 381 Fourier–Legendre linear model, 59 Fourier–Legendre space, 51 Fourier-Legendre series, 381 Fourth central moment, 755 Fourth order moment, 196 Free parameters, 3 Frobenius, 311 Frobenius error matrix, 462 Frobenius matrix, 468 Frobenius norm, 116, 311, 332, 387 Frobenius norms, 334, 387 Frobenius theorem, 593 Fuzzy sets and systems, 128
G-inverse, 103 Gain matrix, 520 Gamma function, 645 Gauss elimination, 110, 209, 414, 451, 534, 535, 545 Gauss normal distribution, 362 Gauss–Jacobi Combinatorial Algorithm, 172, 176, 178 Gauss–Markov model, 208 Gauss-Laplace normal distribution, 688 Gauss-Laplace pdf, 677 Gauß trapezoidal, 402 General bases, 130 General Gauss–Markov model with mixed effects, 426 General inverse Helmert transformation, 673 General reciprocal, 602 General Relativity, 113 Generalized inverse, 103, 105, 129, 194, 315, 317 Geodesic, 362 Geodesy, 529 Grassmann column vector, 166 Grassmann coordinates, 141, 150, 152, 154, 158, 159, 166, 169 direct, 170 inverse, 170, 171
Grassmann manifold, 102 Grassmann space, 141, 152, 154, 158, 164, 168, 169 Grassmann vector, 159 Grassmann–Plücker coordinates, 122 Greatest common divisors (gcd), 534 Groebner basis, 527, 539, 592 definition, 540 Maple computation, 891 Mathematica computation, 889
Hankel matrix, 606, 610 Hat matrix, 114, 122, 146, 149, 150 Helmert equation, 231 Helmert matrix, 231, 609 augmented, 678 extended, 678 quadratic, 678 Helmert random variable, 694 Helmert traces, 231 Helmert transformation, 679 quadratic, 677 rectangular, 677 Helmert’s ansatz, 229, 230 Helmert’s distribution, 681 Helmert’s polar coordinates, 733 Helmert’s random variable, 708 Hesse matrix, 114, 204, 403, 474, 759 High leverage points, 172 Hilbert basis invariants, 128 Hilbert Basis Theorem, 538, 539 Hilbert space, 108 Hodge dualizer, 151 Hodge star operator, 141, 150, 587, 640 Hodge star symbol, 158 Holonomity condition, 120, 122 Homogeneous equations, 223 Homogeneous inconsistent equations, 411 Homogeneous linear prediction, 438 Horizontal rank partitioning, 122, 132 Hurwitz theorem, 593 Hypothesis testing, 339, 394
Ideal, 536, 537 Ideal membership, 540 Idempotent, 113, 213 Implicit Function Theorem, 263 Inconsistent linear equation, 506 Independent random variables, 640 Inhomogeneous inconsistent equations, 411 Inhomogeneous linear equations, 100 Injective, 265, 267
Injective mapping, 638 Injectivity, 94 Injectivity defect, 302 Inner product, 111, 363 Interval calculus, 122 Invariant quadratic estimation, 187, 221, 237 Inverse equations, 374 Inverse Jacobian determinant, 372 Inverse map, 102, 641 Inverse partitioned matrices, 109 Inverse spectrum, 149 Inverse transformations, 132 IQUUE, 190
Jacobi determinant, 666, 673 Jacobi map, 102 Jacobi matrix, 69, 102, 114, 403, 641, 700, 759 Jacobian, 641 Jacobian determinant, 551, 647, 684 Join, 150 Join and meet, 141
Kalman filter, 493, 517, 519, 520 Kernel, 94 Khatri–Rao product, 116 Killing analysis, 598 Kolmogorov–Wiener prediction, 440 Krige’s prediction concept, 441 Kronecker, 538 Kronecker matrix, 116 Kronecker–Zehfuss product, 116, 338, 391, 495 Kummer, 538 KW prediction, 441
Lagrange function, 204 Lagrange multiplier, 105, 108, 110, 191, 238, 414 Lagrange multipliers, 284, 310, 413, 415, 422, 423 Lagrange parameter, 449 Lagrange polynomial, 124 Lagrangean, 72, 336, 366, 390, 556, 557 multiplier, 536 Lanczos method, 136 Langevin distribution, 362 Langevin sphere, 361 Latent condition equations, 142, 168 Latent conditions, 164 Latent restrictions, 141, 168 Laurent polynomials, 599
Leading Monomial (LM), 539 Least Common Multiple (LCM), 541 Least squares, 95, 180, 185, 240, 563, 602 Historical background, 180 Least squares solution, 98, 103, 108, 268, 272, 411, 414 Left complementary index, 267 Left eigenspace, 133 Left Gauss metric, 128 Left inverse, 129 LESS, 98 Leverage point, 157, 162 Lexicographic ordering, 536, 545, 887 graded reverse, 888 Lie algebra, 592, 598 Likelihood normal equations, 204 Linear estimation, 332 Linear Gauss–Markov model, 421 Linear model, 108, 121, 150 Linear operator, 95, 265, 268, 757 Linear space, 103, 265 Loss function, 361, 366 LUUE, 190
Maple, 536 Mathematica, 536 Matlab, 536 “roots” command, 536 Matrix algebra, 265, 593, 602 Maximal rank, 103 Maximum, 536 Maximum Likelihood Estimation, 187, 202 Maximum norm solution, 113 Mean, 198 Mean matrix, 754 Mean Square Estimation Error, 307, 339, 345, 386, 392–395, 399 Measurement errors, 517 Median, 187, 198 Median absolute deviation, 187 Meet, 150, 153 Minimum, 536 Minimum bias, 310 Minimum distance mapping, 101, 102 Minimum geodesic distance, 362, 366 Minimum norm, 268, 272, 618, 619 Minimum variance, 207, 422 MINOLESS, 266 Mixed models, 240 Model manifold, 265 Modified Bessel function, 363 Monomial ordering, 887 Moore–Penrose inverse, 274, 289
Moore-Penrose inverse, 602 Multideg, 540 Multilinear algebra, 150 Multiplicative rank partitioning, 266, 272 Multipolynomial resultant, 547, 592 Multipolynomial resultants, 529, 549 Multivariate Gauss–Markov model, 496 Multivariate LESS, 495 Multivariate polynomials, 539
Necessary condition, 557 Necessary conditions, 96, 106, 109, 191, 204, 208, 238, 270, 284, 304, 312, 336, 337, 367, 390, 413, 448, 451 Non-symmetric matrices, 135 Non-vanishing eigenvalues, 111 Nondegenerate bilinear form, 111 Nonlinear error propagation, 758 Nonlinear model, 102 Nonlinear models, 104, 525 Nonlinear prediction, 438 Nonlinear problem, 461 Normal equation, 367 Normal equations, 110, 191, 310, 337, 390, 391, 412, 414–416, 511 Normal models, 196 Normal space, 101 Normalized column vectors, 166 Normalized Grassmann vector, 159 Null space, 94, 265–267
Objective functions, 113, 114 Oblique normal distribution, 374, 375 Observation equation, 518 Observation point, 102 Observation space, 94, 102, 111, 113, 130, 187, 214, 265, 266, 268, 364 Observational equations, 511 Observed Fisher information, 204 Octonian algebra, 593 One variance component, 223, 226, 230, 236 Optimality conditions, 206 Optimization problem, 202 Orthogonal, 51 Orthogonal column space, 141 Orthogonal complement, 101 Orthogonal matrix, 606 Orthogonal projection, 104 Orthogonality condition, 157, 229 Orthogonality conditions, 164 Orthonormal, 51, 679 Orthonormal base vectors, 130, 150
Orthonormal frame of reference, 154 Orthonormal matrix, 467, 474, 608 Orthonormality, 673 Outlier, 188, 198 Outliers, 172, 332 Overdetermined, 95, 186 Overdetermined system of linear equations, 136
Parameter space, 94, 102, 175, 187, 265, 266, 268, 364, 450 Partial differential equations, 540 Partial inverse, 602 Partial redundancies, 164 PCA, 728 Permutation matrix, 198 Permutation operator, 151 Plücker coordinate, 153 Plücker coordinates, 141, 150, 154, 158, 159, 166, 169 Polar coordinates, 363, 642, 675 Polar decompositions, 131 Polynomial, 528 Polynomial coefficients, 94 Polynomial equations, 540 Polynomial of degree one, 94 Polynomials equivalence, 540 ordering lexicographic, 535 roots, 536 univariate, 536, 559 Positive definite, 106, 109, 113, 116, 130, 207, 208, 296, 308, 312, 314, 332, 338, 391, 416, 427, 558, 617 Positive semidefinite, 106, 108, 112, 296, 617 Prediction equations, 520 Prediction error, 520 Prediction stage, 519 Principal Component Analysis, 728 Probability density function, 363, 638 Probability distribution, 194 Procrustes Algorithm, 461, 464, 472 Projection lines, 556 Projection matrices, 213 Pseudo-observations, 176 Pseudoinverse, 602 Pullback operation, 175
Quadratic bias, 342, 396 Quadratic estimation, 193 Quadratic estimator, 193
Quadratic form, 189 Quadratic Lagrangean, 337, 390 Quaternion algebra, 593 Quotient set, 104
Random, 428 Random effect, 329, 339, 384, 386, 392, 399, 426, 438 Random regression model, 474 Random variable, 519 Range, 267 Range space, 94, 103, 266, 267 Rank factorization, 289, 617 Rank identity, 284 Rank partitioning, 12, 95, 132, 157, 164, 215, 263, 266, 298 Reduced Groebner basis, 537, 890 Reduced Lagrangean, 465 Redundancy, 147 Redundancy matrix, 146, 147, 157, 164 Regularization parameter, 307 Relative index, 111 Residual vector, 192, 316 Ricci calculus, 236 Ridge estimator, 342, 395 Riemann manifold, 380 Right complementary index, 267 Right eigenspace, 133 Right eigenspace analysis, 146 Right Gauss metric, 128 Right strain component, 128 Risk function, 449, 464 Robust, 332 Robust approximation, 122 Root mean square error, 187, 198, 679 Rotation, 131, 461, 674 Rotation matrix, 684 Rotational invariances, 439
Sample mean, 665, 667, 676 Sample median, 188, 198 Sample variance, 667, 676, 683 Scale change, 674 Scale factor, 461 Schur complement, 615 Second central moment, 339, 393 Second central moments, 385 Second derivative, 106, 109, 271, 304, 367, 413, 450 Second moment, 194, 754 Second order design, 114, 115 Second order moment, 196
Second order moments, 205 Selfdual, 155 Semidefinite, 111 Sequential optimization problem, 312 Series inversion, 659 Set-theoretical partitioning, 104 Similarity transformation, 521, 522 Simple linear model, 200 Singular value decomposition, 467, 728 Skew product symbol, 158 Slices, 103 Slicing, 95, 263 Spatial redundancies, 157 Special Gauss–Markov model, 189, 205, 206, 208, 220, 223, 225, 239, 315, 318, 322, 326 Special linear Gauss–Markov model, 392 with datum defect, 308 Special Relativity, 113 Spherical coordinates, 361 Standard deviation, 188, 681 Standard multivariate model, 497 State differential equation, 522 State equations, 519 Statistical moment of first order, 189 Stochastic, 205 Stochastic observations, 187, 517 Student’s random variable, 699 Subdeterminants, 157, 162, 169 Sufficiency condition, 96, 106, 109, 192, 284, 413 Sufficiency conditions, 204, 208, 304, 312, 337, 390, 391 Sufficient condition, 558 Sufficient conditions, 451 Surjective, 94, 265, 267 Surjectivity defect, 94, 228, 302 SVD, 728 Sylvester resultant, 548 Symmetric matrix, 129, 189, 224 System equations, 518
Tangent space, 102 Taylor series, 399, 653, 655 Taylor–Karman matrix, 404 Taylor–Karman structure, 118, 402 Theory of Turbulence, 118 Third central moment, 754 Three fundamental partitionings, 95
Total degree, 888 Total least squares, 108, 448 Trace, 390 Transformation matrix, 521 Translation, 461, 674 Translation vector, 473 Translational invariant, 190, 193, 195, 228 Trend components, 441 Tykhonov–Phillips regularization, 302, 307 Tykhonov–Phillips regulator, 342, 395
Unbiased estimation, 677 Unconstrained Lagrangean, 96, 106, 271, 464 Uniformly unbiased, 191, 193 Unity, 595 Univariate polynomials, 539 Unknown parameters, 115 Updating equations, 520
Vandermonde matrix, 606, 611 Variance, 215, 231, 754 Variance factor, 227 Variance factorization, 215 Variance-covariance, 215, 240, 403, 754 Vector derivative, 106 Vector of kurtosis, 755 Venn diagram, 104 Vertical rank partitioning, 95, 98 Von Mises, 362 Von Mises circle, 361 Von Mises distribution, 755
Weak isotropy, 439 Weight factor, 342, 395 Weight matrix, 115, 199, 472 Weighted arithmetic mean, 175 Weighted arithmetic means, 178 Weighted Frobenius matrix norm, 207 Weighted left inverse, 129 Weighted Moore–Penrose inverse, 293 Witt algebra, 592, 599
Zehfuss product, 116 Zero correlation, 385