George A. Anastassiou Intelligent Systems: Approximation by Artificial Neural Networks
Intelligent Systems Reference Library, Volume 19 Editors-in-Chief Prof. Janusz Kacprzyk Systems Research Institute Polish Academy of Sciences ul. Newelska 6 01-447 Warsaw Poland E-mail:
[email protected]
Prof. Lakhmi C. Jain University of South Australia Adelaide Mawson Lakes Campus South Australia 5095 Australia E-mail:
[email protected]
George A. Anastassiou
Intelligent Systems: Approximation by Artificial Neural Networks
123
Prof. George A. Anastassiou University of Memphis Department of Mathematical Sciences Memphis, TN 38152 USA E-mail:
[email protected]
ISBN 978-3-642-21430-1
e-ISBN 978-3-642-21431-8
DOI 10.1007/978-3-642-21431-8 Intelligent Systems Reference Library
ISSN 1868-4394
Library of Congress Control Number: 2011930524

© 2011 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.
Printed on acid-free paper.
springer.com
Preface
This brief monograph is the first one to deal exclusively with the quantitative approximation by artificial neural networks to the unit (identity) operator. Here we study with rates the approximation properties of the "right" sigmoidal and hyperbolic tangent artificial neural network positive linear operators. In particular we study the degree of approximation of these operators to the unit operator in the univariate and multivariate cases over bounded or unbounded domains. This is given via inequalities and with the use of the modulus of continuity of the involved function or its higher order derivative. We examine the real and complex cases. For the convenience of the reader, the chapters of this book are written in a self-contained style. This treatise relies on the author's last two years of related research work. Advanced courses and seminars can be taught out of this brief book. All necessary background and motivations are given per chapter. A related list of references is also given per chapter. My book's results appeared for the first time in my published articles, which are mentioned throughout the references. They are expected to find applications in many areas of computer science and applied mathematics, such as neural networks, intelligent systems, complexity theory, learning theory, vision and approximation theory, etc. As such this monograph is suitable for researchers, graduate students, and seminars on the above subjects, as well as for all science libraries. The preparation of this booklet took place during 2010-2011 in Memphis, Tennessee, USA. I would like to thank my family for their dedication and love to me, which was the strongest support during the writing of this book. March 1, 2011
George A. Anastassiou Department of Mathematical Sciences The University of Memphis, TN, U.S.A.
Contents
1 Univariate Sigmoidal Neural Network Quantitative Approximation ......... 1
   1.1 Introduction ......... 1
   1.2 Background and Auxiliary Results ......... 3
   1.3 Real Neural Network Quantitative Approximations ......... 8
   1.4 Complex Neural Network Quantitative Approximations ......... 28
   References ......... 31

2 Univariate Hyperbolic Tangent Neural Network Quantitative Approximation ......... 33
   2.1 Introduction ......... 33
   2.2 Basic Ideas ......... 34
   2.3 Real Neural Network Quantitative Approximations ......... 42
   2.4 Complex Neural Network Quantitative Approximations ......... 62
   References ......... 65

3 Multivariate Sigmoidal Neural Network Quantitative Approximation ......... 67
   3.1 Introduction ......... 67
   3.2 Background and Auxiliary Results ......... 69
   3.3 Real Multivariate Neural Network Quantitative Approximations ......... 75
   3.4 Complex Multivariate Neural Network Quantitative Approximations ......... 82
   References ......... 87

4 Multivariate Hyperbolic Tangent Neural Network Quantitative Approximation ......... 89
   4.1 Introduction ......... 89
   4.2 Basic Ideas ......... 90
   4.3 Real Multivariate Neural Network Quantitative Approximations ......... 96
   4.4 Complex Multivariate Neural Network Quantitative Approximations ......... 104
   References ......... 107
Chapter 1
Univariate Sigmoidal Neural Network Quantitative Approximation
Here we give the univariate quantitative approximation of real and complex valued continuous functions on a compact interval or on the whole real line by quasi-interpolation sigmoidal neural network operators. This approximation is obtained by establishing Jackson type inequalities involving the modulus of continuity of the engaged function or its high order derivative. The operators are defined by using a density function induced by the logarithmic sigmoidal function. Our approximations are pointwise and with respect to the uniform norm. The related feed-forward neural network has one hidden layer. This chapter relies on [4].
1.1 Introduction
Feed-forward neural networks (FNNs) with one hidden layer, the only type of networks we deal with in this chapter, are mathematically expressed as
\[ N_n(x) = \sum_{j=0}^{n} c_j\, \sigma(a_j \cdot x + b_j), \qquad x \in \mathbb{R}^s, \ s \in \mathbb{N}, \]
where for 0 ≤ j ≤ n, b_j ∈ ℝ are the thresholds, a_j ∈ ℝ^s are the connection weights, c_j ∈ ℝ are the coefficients, a_j · x is the inner product of a_j and x, and σ is the activation function of the network. In many fundamental network models, the activation function is the sigmoidal function of logistic type. It is well known that FNNs are universal approximators. Theoretically, any continuous function defined on a compact set can be approximated to any desired degree of accuracy by increasing the number of hidden neurons.
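For readers who prefer a computational view, the following short Python sketch (added here, not part of the original text) evaluates a network of the above form with the logistic activation. The particular weights a_j, thresholds b_j, and coefficients c_j are arbitrary illustrative values, not prescribed by the chapter.

```python
import numpy as np

def logistic(t):
    # sigmoidal activation of logistic type: s(t) = 1 / (1 + e^{-t})
    return 1.0 / (1.0 + np.exp(-t))

def fnn(x, a, b, c, sigma=logistic):
    # One-hidden-layer feed-forward network:
    #   N_n(x) = sum_{j=0}^{n} c_j * sigma(a_j . x + b_j),  x in R^s
    x = np.atleast_1d(x)
    return sum(c_j * sigma(np.dot(a_j, x) + b_j) for a_j, b_j, c_j in zip(a, b, c))

# Illustrative (arbitrary) parameters for s = 1 and n = 2:
a = [np.array([1.0]), np.array([-2.0]), np.array([0.5])]
b = [0.0, 1.0, -1.0]
c = [0.3, -0.7, 1.1]
print(fnn(0.25, a, b, c))
```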
It was shown by Cybenko [11] and Funahashi [13] that any continuous function can be approximated on a compact set, in the uniform topology, by a network of the form N_n(x), using any continuous sigmoidal activation function. Hornik et al. in [15] have proved that any measurable function can be approximated by such a network. Furthermore, these authors established in [16] that any function of the Sobolev spaces can be approximated together with all its derivatives. A variety of density results on FNN approximations to multivariate functions were later established by many authors using different methods, for more or less general situations: [18] by Leshno et al., [22] by Mhaskar and Micchelli, [10] by Chui and Li, [8] by Chen and Chen, [14] by Hahm and Hong, etc. Usually these results only give theorems about the existence of an approximation. A related and important problem is that of complexity: determining the number of neurons required to guarantee that all functions belonging to a space can be approximated to a prescribed degree of accuracy ε. Barron [5] shows that if the function is assumed to satisfy certain conditions expressed in terms of its Fourier transform, and if each of the neurons evaluates a sigmoidal activation function, then at most O(ε^{-2}) neurons are needed to achieve the order of approximation ε. Some other authors have published similar results on the complexity of FNN approximations: Mhaskar and Micchelli [23], Suzuki [24], Maiorov and Meir [20], Makovoz [21], Ferrari and Stengel [12], Xu and Cao [26], Cao et al. [7], etc.

The author in [1] and [2], see chapters 2-5, was the first to establish neural network approximations to continuous functions with rates, by very specifically defined neural network operators of Cardaliaguet-Euvrard and "Squashing" types, by employing the modulus of continuity of the engaged function or its high order derivative, and producing very tight Jackson type inequalities. He treats there both the univariate and multivariate cases. The "bell-shaped" and "squashing" functions defining these operators are assumed to be of compact support. Also in [2] he gives the Nth order asymptotic expansion for the error of weak approximation of these two operators to a special natural class of smooth functions, see chapters 4-5 there.

For this chapter the author is greatly motivated by the important article [9] by Z. Chen and F. Cao. He presents work related to it, and much more beyond; however, [9] remains the starting point. So the author here performs univariate sigmoidal neural network approximations with rates to continuous functions over compact intervals of the real line or over the whole ℝ, and then extends his results to complex valued functions. All convergences here are with rates expressed via the modulus of continuity of the involved function or its high order derivative, and given by very tight Jackson type inequalities.
The author presents here the correct and precisely defined quasi-interpolation neural network operator related to compact intervals, and, among other things, improves results from [9]. The compact intervals are not necessarily symmetric to the origin. Some of the upper bounds to the error quantity are very flexible and general. In preparation to establish our results we prove further properties of the basic density function defining our operators.
1.2 Background and Auxiliary Results
We consider here the sigmoidal function of logarithmic type
\[ s(x) = \frac{1}{1+e^{-x}}, \qquad x \in \mathbb{R}. \]
It has the properties \(\lim_{x\to+\infty} s(x) = 1\) and \(\lim_{x\to-\infty} s(x) = 0\).
This function plays the role of an activation function in the hidden layer of neural networks, and also has applications in biology, demography, etc. ([6, 17]). As in [9], we consider
\[ \Phi(x) := \frac{1}{2}\big(s(x+1) - s(x-1)\big), \qquad x \in \mathbb{R}. \]
It has the following properties:
i) Φ(x) > 0, ∀ x ∈ ℝ,
ii) \(\sum_{k=-\infty}^{\infty} \Phi(x-k) = 1\), ∀ x ∈ ℝ,
iii) \(\sum_{k=-\infty}^{\infty} \Phi(nx-k) = 1\), ∀ x ∈ ℝ; n ∈ ℕ,
iv) \(\int_{-\infty}^{\infty} \Phi(x)\,dx = 1\),
v) Φ is a density function,
vi) Φ is even: Φ(−x) = Φ(x), x ≥ 0.
We observe that ([9])
\[ \Phi(x) = \frac{(e^2-1)\,e^{-x}}{2e\,(1+e^{-x-1})(1+e^{-x+1})} = \frac{e^2-1}{2e^2}\cdot\frac{1}{(1+e^{x-1})(1+e^{-x-1})}, \]
and
\[ \Phi'(x) = -\frac{e^2-1}{2e^2}\cdot\frac{e^{x}-e^{-x}}{e\,(1+e^{x-1})^{2}(1+e^{-x-1})^{2}} \le 0, \qquad x \ge 0. \]
Hence
vii) Φ is decreasing on ℝ₊, and increasing on ℝ₋.
Let 0 < α < 1, n ∈ ℕ. We see the following:
\[ \sum_{\substack{k=-\infty\\ |nx-k|>n^{1-\alpha}}}^{\infty} \Phi(nx-k) = \sum_{\substack{k=-\infty\\ |nx-k|>n^{1-\alpha}}}^{\infty} \Phi(|nx-k|) \le \int_{n^{1-\alpha}-1}^{\infty} \frac{e^2-1}{2e^2}\cdot\frac{1}{(1+e^{x-1})(1+e^{-x-1})}\,dx \]
\[ \le \frac{e^2-1}{2e}\int_{n^{1-\alpha}-1}^{\infty} e^{-x}\,dx = \frac{e^2-1}{2e}\,e^{-(n^{1-\alpha}-1)} = \frac{e^2-1}{2}\,e^{-n^{(1-\alpha)}} = 3.1992\,e^{-n^{(1-\alpha)}}. \]
We have found that:
viii) for n ∈ ℕ, 0 < α < 1, we get
\[ \sum_{\substack{k=-\infty\\ |nx-k|>n^{1-\alpha}}}^{\infty} \Phi(nx-k) < \frac{e^2-1}{2}\,e^{-n^{(1-\alpha)}} = 3.1992\,e^{-n^{(1-\alpha)}}. \]
Denote by ⌈·⌉ the ceiling of a number, and by ⌊·⌋ the integral part of a number. Consider x ∈ [a, b] ⊂ ℝ and n ∈ ℕ such that ⌈na⌉ ≤ ⌊nb⌋. We observe that
\[ 1 = \sum_{k=-\infty}^{\infty} \Phi(nx-k) > \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} \Phi(nx-k) = \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} \Phi(|nx-k|) > \Phi(|nx-k_0|), \]
for any k₀ ∈ [⌈na⌉, ⌊nb⌋] ∩ ℤ. Here we can choose k₀ ∈ [⌈na⌉, ⌊nb⌋] ∩ ℤ such that |nx − k₀| < 1.
Therefore Φ(|nx − k₀|) > Φ(1) = 0.19046485. Consequently,
\[ \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} \Phi(nx-k) > \Phi(1) = 0.19046485. \]
Therefore we get
ix) \[ \frac{1}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)} < \frac{1}{\Phi(1)} = 5.250312578, \qquad \forall\, x \in [a,b]. \]
We also notice that
\[ 1 - \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nb-k) = \sum_{k=-\infty}^{\lceil na\rceil-1}\Phi(nb-k) + \sum_{k=\lfloor nb\rfloor+1}^{\infty}\Phi(nb-k) > \Phi(nb - \lfloor nb\rfloor - 1) \]
(call ε := nb − ⌊nb⌋, 0 ≤ ε < 1)
\[ = \Phi(\varepsilon - 1) = \Phi(1-\varepsilon) \ge \Phi(1) > 0. \]
Therefore \(\lim_{n\to\infty}\big(1 - \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nb-k)\big) > 0\).
Similarly,
\[ 1 - \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(na-k) = \sum_{k=-\infty}^{\lceil na\rceil-1}\Phi(na-k) + \sum_{k=\lfloor nb\rfloor+1}^{\infty}\Phi(na-k) > \Phi(na - \lceil na\rceil + 1) \]
(call η := ⌈na⌉ − na, 0 ≤ η < 1)
\[ = \Phi(1-\eta) \ge \Phi(1) > 0. \]
Therefore again \(\lim_{n\to\infty}\big(1 - \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(na-k)\big) > 0\).
Therefore we obtain that
x) \(\lim_{n\to\infty} \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k) \ne 1\), for at least some x ∈ [a, b].
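The following small Python check (an illustration added here, not part of the original text) evaluates Φ and verifies numerically the partition-of-unity and unit-integral properties (ii) and (iv), together with evenness and monotonicity on ℝ₊; the truncation ranges are chosen only for demonstration.

```python
import numpy as np

def s(x):
    return 1.0 / (1.0 + np.exp(-x))

def phi(x):
    # Phi(x) = (1/2) * (s(x+1) - s(x-1))
    return 0.5 * (s(x + 1.0) - s(x - 1.0))

# Property (ii): sum_k Phi(x - k) = 1 (truncated sum; tails decay exponentially)
x = 0.37
ks = np.arange(-60, 61)
print(np.sum(phi(x - ks)))              # ~ 1.0

# Property (iv): integral of Phi over R equals 1 (Riemann sum on a wide window)
t = np.linspace(-60.0, 60.0, 200001)
print(np.trapz(phi(t), t))              # ~ 1.0

# Properties (vi)-(vii): Phi is even, and decreasing on [0, infinity)
print(phi(1.3), phi(-1.3))              # equal values
print(phi(0.0) > phi(1.0) > phi(2.0))   # True
```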
Let f ∈ C([a, b]) and n ∈ ℕ such that ⌈na⌉ ≤ ⌊nb⌋. We introduce and define the positive linear neural network operator
\[ G_n(f,x) := \frac{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} f\!\left(\frac{k}{n}\right)\Phi(nx-k)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)}, \qquad x \in [a,b]. \tag{1.1} \]
For large enough n we always have ⌈na⌉ ≤ ⌊nb⌋. Also a ≤ k/n ≤ b, iff ⌈na⌉ ≤ k ≤ ⌊nb⌋. We study here the pointwise convergence of G_n(f, x) to f(x) with rates.
For convenience we call
\[ G_n^*(f,x) := \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} f\!\left(\frac{k}{n}\right)\Phi(nx-k), \tag{1.2} \]
that is
\[ G_n(f,x) := \frac{G_n^*(f,x)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)}. \tag{1.3} \]
Thus,
\[ G_n(f,x) - f(x) = \frac{G_n^*(f,x)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)} - f(x) = \frac{G_n^*(f,x) - f(x)\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)}. \tag{1.4} \]
Consequently we derive
\[ |G_n(f,x) - f(x)| \le \frac{1}{\Phi(1)}\left| G_n^*(f,x) - f(x)\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k) \right|. \tag{1.5} \]
That is
\[ |G_n(f,x) - f(x)| \le (5.250312578)\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\left| f\!\left(\frac{k}{n}\right) - f(x)\right|\Phi(nx-k). \tag{1.6} \]
We will estimate the right hand side of (1.6). For that we need, for f ∈ C([a,b]), the first modulus of continuity
\[ \omega_1(f,h) := \sup_{\substack{x,y\in[a,b]\\ |x-y|\le h}} |f(x)-f(y)|, \qquad h > 0. \tag{1.7} \]
Similarly it is defined for f ∈ C_B(ℝ) (continuous and bounded functions on ℝ). We have that \(\lim_{h\to 0}\omega_1(f,h) = 0\).
When f ∈ C_B(ℝ) we define (see also [9])
\[ \overline{G}_n(f,x) := \sum_{k=-\infty}^{\infty} f\!\left(\frac{k}{n}\right)\Phi(nx-k), \qquad n \in \mathbb{N},\ x \in \mathbb{R}, \tag{1.8} \]
the quasi-interpolation neural network operator.
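As a hedged computational sketch (added here, not from the original), the operators (1.1) and (1.8) can be implemented directly. The test function f(x) = sin(3x), the interval [a, b] = [−1, 1], the truncation parameter K for the real-line operator, and the values of n are arbitrary choices made only to illustrate that the error decreases as n grows.

```python
import numpy as np
from math import ceil, floor

def s(x):
    return 1.0 / (1.0 + np.exp(-x))

def phi(x):
    return 0.5 * (s(x + 1.0) - s(x - 1.0))

def G_n(f, x, n, a, b):
    # Quasi-interpolation operator on [a, b], formula (1.1):
    #   G_n(f, x) = sum_{k=ceil(na)}^{floor(nb)} f(k/n) Phi(nx - k) / sum_k Phi(nx - k)
    ks = np.arange(ceil(n * a), floor(n * b) + 1)
    w = phi(n * x - ks)
    return np.dot(f(ks / n), w) / np.sum(w)

def G_bar_n(f, x, n, K=100):
    # Operator (1.8) on the whole real line, truncated to |k - nx| <= K for computation
    center = int(np.floor(n * x))
    ks = np.arange(center - K, center + K + 1)
    return np.dot(f(ks / n), phi(n * x - ks))

f = lambda t: np.sin(3.0 * t)      # illustrative continuous function
a, b, x = -1.0, 1.0, 0.2
for n in (10, 100, 1000):
    print(n, abs(G_n(f, x, n, a, b) - f(x)), abs(G_bar_n(f, x, n) - f(x)))
```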
By [3] we derive the following three theorems on the extended Taylor formula.

Theorem 1.1. Let N ∈ ℕ, 0 < ε < π/2 small, and f ∈ C^N([−π/2 + ε, π/2 − ε]); x, y ∈ [−π/2 + ε, π/2 − ε]. Then
\[ f(x) = f(y) + \sum_{k=1}^{N}\frac{\big(f\circ\sin^{-1}\big)^{(k)}(\sin y)}{k!}\,(\sin x - \sin y)^k + K_N(y,x), \tag{1.9} \]
where
\[ K_N(y,x) = \frac{1}{(N-1)!}\int_y^x (\sin x - \sin s)^{N-1}\Big[\big(f\circ\sin^{-1}\big)^{(N)}(\sin s) - \big(f\circ\sin^{-1}\big)^{(N)}(\sin y)\Big]\cos s\, ds. \tag{1.10} \]

Theorem 1.2. Let f ∈ C^N([ε, π − ε]), N ∈ ℕ, ε > 0 small; x, y ∈ [ε, π − ε]. Then
\[ f(x) = f(y) + \sum_{k=1}^{N}\frac{\big(f\circ\cos^{-1}\big)^{(k)}(\cos y)}{k!}\,(\cos x - \cos y)^k + K_N^*(y,x), \tag{1.11} \]
where
\[ K_N^*(y,x) = -\frac{1}{(N-1)!}\int_y^x (\cos x - \cos s)^{N-1}\Big[\big(f\circ\cos^{-1}\big)^{(N)}(\cos s) - \big(f\circ\cos^{-1}\big)^{(N)}(\cos y)\Big]\sin s\, ds. \tag{1.12} \]

Theorem 1.3. Let f ∈ C^N([a,b]) (or f ∈ C^N(ℝ)), N ∈ ℕ; x, y ∈ [a,b] (or x, y ∈ ℝ). Then
\[ f(x) = f(y) + \sum_{k=1}^{N}\frac{\big(f\circ\ln_{1/e}\big)^{(k)}(e^{-y})}{k!}\,\big(e^{-x} - e^{-y}\big)^k + \overline{K}_N(y,x), \tag{1.13} \]
where
\[ \overline{K}_N(y,x) = -\frac{1}{(N-1)!}\int_y^x \big(e^{-x} - e^{-s}\big)^{N-1}\Big[\big(f\circ\ln_{1/e}\big)^{(N)}(e^{-s}) - \big(f\circ\ln_{1/e}\big)^{(N)}(e^{-y})\Big]e^{-s}\, ds. \tag{1.14} \]
Remark 1.4. Using the mean value theorem we get
\[ |\sin x - \sin y| \le |x-y|, \quad |\cos x - \cos y| \le |x-y|, \qquad \forall\, x,y \in \mathbb{R}, \tag{1.15} \]
furthermore we have
\[ |\sin x - \sin y| \le 2, \quad |\cos x - \cos y| \le 2, \qquad \forall\, x,y \in \mathbb{R}. \tag{1.16} \]
Similarly we have
\[ \big|e^{-x} - e^{-y}\big| \le e^{-a}|x-y|, \qquad \forall\, x,y \in [a,b], \tag{1.17} \]
and
\[ \big|e^{-x} - e^{-y}\big| \le e^{-a} - e^{-b}, \qquad \forall\, x,y \in [a,b]. \tag{1.18} \]
Let g(x) = ln_{1/e} x, sin^{-1} x, cos^{-1} x, and assume f^{(j)}(x₀) = 0, j = 1, ..., N. Then, by [3], we get (f ∘ g^{-1})^{(j)}(g(x₀)) = 0, j = 1, ..., N.

Remark 1.5. It is well known that e^x > x^m, m ∈ ℕ, for large x > 0. Let α, β > 0 be fixed; then ⌈α/β⌉ ∈ ℕ, and for large x > 0 we have
\[ e^x > x^{\lceil \alpha/\beta\rceil} \ge x^{\alpha/\beta}. \]
So for suitable very large x > 0 we obtain
\[ e^{x^{\beta}} > \big(x^{\beta}\big)^{\alpha/\beta} = x^{\alpha}. \]
We proved for large x > 0 and α, β > 0 that
\[ e^{x^{\beta}} > x^{\alpha}. \tag{1.19} \]
Therefore for large n ∈ ℕ and fixed α, β > 0, we have
\[ e^{n^{\beta}} > n^{\alpha}. \tag{1.20} \]
That is
\[ e^{-n^{\beta}} < n^{-\alpha}, \quad \text{for large } n \in \mathbb{N}. \tag{1.21} \]
So for 0 < α < 1 we get
\[ e^{-n^{(1-\alpha)}} < n^{-\alpha}. \tag{1.22} \]
Thus, given fixed A, B > 0, for the linear combination \(A n^{-\alpha} + B e^{-n^{(1-\alpha)}}\) the (dominant) rate of convergence to zero is n^{-α}. The closer α is to 1, the faster and better the rate of convergence to zero.
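A quick numerical illustration (added here) of the comparison in (1.22): for 0 < α < 1 the term e^{−n^{(1−α)}} eventually falls below n^{−α}, so n^{−α} dominates the rate. The value α = 0.5 and the sample values of n are arbitrary; for α closer to 1 the domination sets in only for larger n.

```python
import numpy as np

alpha = 0.5
for n in (10, 100, 1000, 10000):
    # polynomial term n^{-alpha} versus exponential term e^{-n^{1-alpha}}
    print(n, n ** (-alpha), np.exp(-n ** (1.0 - alpha)))
```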
1.3 Real Neural Network Quantitative Approximations

Here we present a series of neural network approximations with rates to a given function. We first give:
Theorem 1.6. Let f ∈ C([a,b]), 0 < α < 1, n ∈ ℕ, x ∈ [a,b]. Then
i)
\[ |G_n(f,x)-f(x)| \le (5.250312578)\left[\omega_1\!\left(f,\tfrac{1}{n^{\alpha}}\right)+6.3984\,\|f\|_{\infty}\,e^{-n^{(1-\alpha)}}\right]=:\lambda, \tag{1.23} \]
and
ii)
\[ \|G_n(f)-f\|_{\infty}\le\lambda, \tag{1.24} \]
where ‖·‖_∞ is the supremum norm.

Proof. We observe that
\[ \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Big|f\big(\tfrac{k}{n}\big)-f(x)\Big|\,\Phi(nx-k)
=\sum_{\substack{k=\lceil na\rceil\\ |\frac{k}{n}-x|\le\frac{1}{n^{\alpha}}}}^{\lfloor nb\rfloor}\Big|f\big(\tfrac{k}{n}\big)-f(x)\Big|\,\Phi(nx-k)
+\sum_{\substack{k=\lceil na\rceil\\ |\frac{k}{n}-x|>\frac{1}{n^{\alpha}}}}^{\lfloor nb\rfloor}\Big|f\big(\tfrac{k}{n}\big)-f(x)\Big|\,\Phi(nx-k) \]
\[ \le\sum_{\substack{k=\lceil na\rceil\\ |\frac{k}{n}-x|\le\frac{1}{n^{\alpha}}}}^{\lfloor nb\rfloor}\omega_1\!\Big(f,\big|\tfrac{k}{n}-x\big|\Big)\Phi(nx-k)
+2\|f\|_{\infty}\sum_{\substack{k=\lceil na\rceil\\ |k-nx|>n^{1-\alpha}}}^{\lfloor nb\rfloor}\Phi(nx-k) \]
\[ \le\omega_1\!\left(f,\tfrac{1}{n^{\alpha}}\right)\sum_{\substack{k=-\infty\\ |\frac{k}{n}-x|\le\frac{1}{n^{\alpha}}}}^{\infty}\Phi(nx-k)
+2\|f\|_{\infty}\sum_{\substack{k=-\infty\\ |k-nx|>n^{1-\alpha}}}^{\infty}\Phi(nx-k)
\overset{\text{(by (viii))}}{\le}\omega_1\!\left(f,\tfrac{1}{n^{\alpha}}\right)+2\|f\|_{\infty}(3.1992)\,e^{-n^{(1-\alpha)}}. \]
That is
\[ \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Big|f\big(\tfrac{k}{n}\big)-f(x)\Big|\,\Phi(nx-k)\le\omega_1\!\left(f,\tfrac{1}{n^{\alpha}}\right)+6.3984\,\|f\|_{\infty}\,e^{-n^{(1-\alpha)}}. \]
Using (1.6) we prove the claim.

Theorem 1.6 improves a lot the model of neural network approximation of [9], see Theorem 3 there.

Next we give

Theorem 1.7. Let f ∈ C_B(ℝ), 0 < α < 1, n ∈ ℕ, x ∈ ℝ. Then
i)
\[ |\overline{G}_n(f,x)-f(x)|\le\omega_1\!\left(f,\tfrac{1}{n^{\alpha}}\right)+6.3984\,\|f\|_{\infty}\,e^{-n^{(1-\alpha)}}=:\mu, \tag{1.25} \]
and
ii)
\[ \|\overline{G}_n(f)-f\|_{\infty}\le\mu. \tag{1.26} \]
Proof. We see that
\[ |\overline{G}_n(f,x)-f(x)|=\Bigg|\sum_{k=-\infty}^{\infty}f\big(\tfrac{k}{n}\big)\Phi(nx-k)-f(x)\sum_{k=-\infty}^{\infty}\Phi(nx-k)\Bigg|
\le\sum_{k=-\infty}^{\infty}\Big|f\big(\tfrac{k}{n}\big)-f(x)\Big|\,\Phi(nx-k) \]
\[ =\sum_{\substack{k=-\infty\\ |\frac{k}{n}-x|\le\frac{1}{n^{\alpha}}}}^{\infty}\Big|f\big(\tfrac{k}{n}\big)-f(x)\Big|\,\Phi(nx-k)
+\sum_{\substack{k=-\infty\\ |\frac{k}{n}-x|>\frac{1}{n^{\alpha}}}}^{\infty}\Big|f\big(\tfrac{k}{n}\big)-f(x)\Big|\,\Phi(nx-k) \]
\[ \le\omega_1\!\left(f,\tfrac{1}{n^{\alpha}}\right)\sum_{\substack{k=-\infty\\ |\frac{k}{n}-x|\le\frac{1}{n^{\alpha}}}}^{\infty}\Phi(nx-k)
+2\|f\|_{\infty}\sum_{\substack{k=-\infty\\ |k-nx|>n^{1-\alpha}}}^{\infty}\Phi(nx-k)
\overset{\text{(by (viii))}}{\le}\omega_1\!\left(f,\tfrac{1}{n^{\alpha}}\right)+6.3984\,\|f\|_{\infty}\,e^{-n^{(1-\alpha)}}, \]
proving the claim.

Theorem 1.7 improves Theorem 4 of [9].

In the next we discuss high order of approximation by using the smoothness of f.

Theorem 1.8. Let f ∈ C^N([a,b]), n, N ∈ ℕ, 0 < α < 1, x ∈ [a,b]. Then
i)
\[ |G_n(f,x)-f(x)| \le (5.250312578)\cdot\Bigg\{ \sum_{j=1}^{N}\frac{|f^{(j)}(x)|}{j!}\left[\frac{1}{n^{\alpha j}} + (3.1992)(b-a)^j e^{-n^{(1-\alpha)}}\right] \]
\[ + \left[\frac{1}{n^{\alpha N}N!}\,\omega_1\!\left(f^{(N)},\frac{1}{n^{\alpha}}\right) + \frac{(6.3984)\,\|f^{(N)}\|_{\infty}(b-a)^N}{N!}\,e^{-n^{(1-\alpha)}}\right]\Bigg\}, \tag{1.27} \]
ii) assume further f^{(j)}(x₀) = 0, j = 1, ..., N, for some x₀ ∈ [a,b]; it holds
\[ |G_n(f,x_0)-f(x_0)| \le (5.250312578)\cdot\left[\frac{1}{n^{\alpha N}N!}\,\omega_1\!\left(f^{(N)},\frac{1}{n^{\alpha}}\right) + \frac{(6.3984)\,\|f^{(N)}\|_{\infty}(b-a)^N}{N!}\,e^{-n^{(1-\alpha)}}\right], \tag{1.28} \]
notice here the extremely high rate of convergence at n^{-(N+1)α},
iii)
\[ \|G_n(f)-f\|_{\infty} \le (5.250312578)\cdot\Bigg\{ \sum_{j=1}^{N}\frac{\|f^{(j)}\|_{\infty}}{j!}\left[\frac{1}{n^{\alpha j}} + (3.1992)(b-a)^j e^{-n^{(1-\alpha)}}\right] \]
\[ + \left[\frac{1}{n^{\alpha N}N!}\,\omega_1\!\left(f^{(N)},\frac{1}{n^{\alpha}}\right) + \frac{(6.3984)\,\|f^{(N)}\|_{\infty}(b-a)^N}{N!}\,e^{-n^{(1-\alpha)}}\right]\Bigg\}. \tag{1.29} \]
Proof. Next we apply Taylor’s formula with integral remainder. We have (here nk , x ∈ [a, b]) j k N k − tN −1 n k f (j) (x) k (N ) (N ) f = −x + f (t) − f (x) n dt. n j! n (N − 1)! x j=0 Then f
j N k f (j) (x) k Φ (nx − k) = Φ (nx − k) −x + n j! n j=0
k n
Φ (nx − k) x
Hence
nb
k=na
k − tN −1 (N ) (N ) f (t) − f (x) n dt. (N − 1)!
nb k f Φ (nx − k) − f (x) Φ (nx − k) = n k=na
nb N f (j) (x)
j!
j=1
k=na
nb
Φ (nx − k)
Φ (nx − k)
k n
f
(N )
(t) − f
(N )
x
k=na
k −x n
j +
k − tN −1 (x) n dt. (N − 1)!
Thus ⎛ G∗n (f, x) − f (x) ⎝
nb
⎞ Φ (nx − k)⎠ =
N f (j) (x) j=1
k=na
j!
G∗n (· − x)j + Λn (x) ,
where
nb
Λn (x) :=
k=na
Φ (nx − k) x
k n
k − tN−1 (N ) (N ) f (t) − f (x) n dt. (N − 1)!
We suppose that b− a >
1 nα , which 1 −α
that is when n > (b − a) . k k 1 Thus n − x ≤ nα or n − x > As in [2], pp. 72-73 for
k n
γ := in case of nk − x ≤
f
(N )
is always the case for large enough n ∈ N, 1 nα .
(t) − f
(N )
x
k − tN −1 (x) n dt, (N − 1)!
1 , nα
we find that 1 (N ) 1 |γ| ≤ ω1 f , α n nαN N !
(for x ≤ nk or x ≥ nk ). Notice also for x ≤
k n
that
k n k − tN−1 (N ) (N ) n f (t) − f (x) dt ≤ x (N − 1)! k − tN −1 (N ) (N ) (t) − f (x) n dt ≤ f (N − 1)! x k N −1 k − xN (b − a)N −t (N ) n n dt = 2 f ≤ 2 f (N) . (N − 1)! N! N! ∞ ∞
2 f (N )
∞
k n
x
≤ x, then
k n
Next assume
k n
k n k − tN−1 (N ) (N ) n f (t) − f (x) dt = x (N − 1)! x k − tN −1 f (N) (t) − f (N ) (x) n dt ≤ k (N − 1)! n
t − k N −1 (N) (N ) n (t) − f (x) dt ≤ f k (N − 1)! n N −1 x − k N (b − a)N t − nk (N ) n dt = 2 f ≤ 2 f (N ) . (N − 1)! N! N! ∞ ∞
2 f (N )
∞
x k n
Thus
x
|γ| ≤ 2 f (N )
∞
in all two cases.
N
(b − a) , N!
Therefore nb
nb
Λn (x) =
Φ (nx − k) γ +
k=na | nk −x|≤ n1α
Φ (nx − k) γ.
k=na | nk −x|> n1α
Hence nb
|Λn (x)| ≤
k=na
1 1 Φ (nx − k) ω1 f (N ) , α + n N !nN α
| nk −x|≤ n1α ⎛ ⎜ ⎜ ⎜ ⎝
⎞ nb
k=na
⎟ (b − a)N ⎟ Φ (nx − k)⎟ 2 f (N ) ≤ ⎠ N! ∞
| nk −x|> n1α N (1−α) 1 1 (N ) (b − a) ω1 f (N ) , α + 2 (3.1992) e−n . f n N !nN α N! ∞ Consequently we have (N ) f (b − a)N (1−α) 1 (N) 1 ∞ |Λn (x)| ≤ ω1 f , α + (6.3984) e−n . αN n n N! N! We further see that j nb k j G∗n (· − x) = Φ (nx − k) −x . n k=na
Therefore
j nb k ∗ j Φ (nx − k) − x = Gn (· − x) ≤ n k=na
nb
⎧ ⎨
k = na
⎩ k − x ≤ 1 α n
1 nαj
j k Φ (nx − k) − x + n ⎧ ⎨ ⎩
n
nb
kk = na
− x > 1α n
n
nb
nb
⎧ ⎨ ⎩
Φ (nx − k) + (b − a)j
k k = na
− x ≤ 1α n
j k Φ (nx − k) − x ≤ n
n
⎧ ⎨ ⎩
k = na
|k − nx| > n1−α
Φ (nx − k)
≤
(1−α) 1 j + (b − a) (3.1992) e−n . αj n
Hence
(1−α) 1 ∗ j j , Gn (· − x) ≤ αj + (b − a) (3.1992) e−n n for j = 1, ..., N. Putting things together we have established |G∗n (f, x)
N (j) f (x) 1 j −n(1−α) − f (x)| ≤ + (3.1992) (b − a) e + j! nαj j=1
N (6.3984) f (N) ∞ (b − a) −n(1−α) 1 (N) 1 ω1 f , α + e , n nαN N ! N!
thus establishing the theorem.

We make

Remark 1.9. We notice that
\[ G_n(f,x) - \sum_{j=1}^{N}\frac{f^{(j)}(x)}{j!}\,G_n\big((\cdot - x)^j, x\big) - f(x) \]
\[ = \frac{G_n^*(f,x)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)} - \left(\sum_{j=1}^{N}\frac{f^{(j)}(x)}{j!}\,G_n^*\big((\cdot - x)^j, x\big)\right)\frac{1}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)} - f(x) \]
\[ = \frac{1}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)}\left[G_n^*(f,x) - \left(\sum_{j=1}^{N}\frac{f^{(j)}(x)}{j!}\,G_n^*\big((\cdot - x)^j, x\big)\right) - \left(\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)\right) f(x)\right]. \]
Therefore we get
\[ \left|G_n(f,x) - \sum_{j=1}^{N}\frac{f^{(j)}(x)}{j!}\,G_n\big((\cdot - x)^j, x\big) - f(x)\right| \le (5.250312578)\cdot \]
\[ \left|G_n^*(f,x) - \left(\sum_{j=1}^{N}\frac{f^{(j)}(x)}{j!}\,G_n^*\big((\cdot - x)^j, x\big)\right) - \left(\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)\right) f(x)\right|, \tag{1.30} \]
∀ x ∈ [a, b].

In the next three Theorems 1.10–1.12 we present more general and flexible upper bounds to our error quantities.
We give

Theorem 1.10. Let f ∈ C^N([a,b]), n, N ∈ ℕ, 0 < α < 1, x ∈ [a,b]. Then
1)
\[ \left| G_n(f,x) - \sum_{j=1}^{N}\frac{\big(f\circ\ln_{1/e}\big)^{(j)}(e^{-x})}{j!}\,G_n\big((e^{-\cdot}-e^{-x})^j,x\big) - f(x)\right| \le (5.250312578)\cdot \]
\[ \left[\frac{e^{-aN}}{N!\,n^{\alpha N}}\,\omega_1\!\left(\big(f\circ\ln_{1/e}\big)^{(N)},\frac{e^{-a}}{n^{\alpha}}\right) + \frac{(6.3984)\,(e^{-a}-e^{-b})^N \big\|\big(f\circ\ln_{1/e}\big)^{(N)}\big\|_{\infty}}{N!}\,e^{-n^{(1-\alpha)}}\right], \tag{1.31} \]
2)
\[ |G_n(f,x)-f(x)| \le (5.250312578)\cdot\Bigg\{ \sum_{j=1}^{N}\frac{\big|\big(f\circ\ln_{1/e}\big)^{(j)}(e^{-x})\big|}{j!}\,e^{-aj}\left[\frac{1}{n^{\alpha j}} + (3.1992)(b-a)^j e^{-n^{(1-\alpha)}}\right] \]
\[ + \left[\frac{e^{-aN}}{N!\,n^{\alpha N}}\,\omega_1\!\left(\big(f\circ\ln_{1/e}\big)^{(N)},\frac{e^{-a}}{n^{\alpha}}\right) + \frac{(6.3984)\,(e^{-a}-e^{-b})^N \big\|\big(f\circ\ln_{1/e}\big)^{(N)}\big\|_{\infty}}{N!}\,e^{-n^{(1-\alpha)}}\right]\Bigg\}, \tag{1.32} \]
3) If f^{(j)}(x₀) = 0, j = 1, ..., N, it holds
\[ |G_n(f,x_0)-f(x_0)| \le (5.250312578)\cdot\left[\frac{e^{-aN}}{N!\,n^{\alpha N}}\,\omega_1\!\left(\big(f\circ\ln_{1/e}\big)^{(N)},\frac{e^{-a}}{n^{\alpha}}\right) + \frac{(6.3984)\,(e^{-a}-e^{-b})^N \big\|\big(f\circ\ln_{1/e}\big)^{(N)}\big\|_{\infty}}{N!}\,e^{-n^{(1-\alpha)}}\right]. \]
Observe here the speed of convergence is extremely high, at 1/n^{(N+1)α}.
Proof. Call F := f ◦ ln 1e . Let x, nk ∈ [a, b]. Then f
N j k F (j) (e−x ) − k k − f (x) = e n − e−x + K N x, , n j! n j=1
where KN
k x, n
:= −
1 · (N − 1)!
(1.33)
k n
k
e− n − e−s
N −1 F (N) e−s − F (N ) e−x e−s ds.
x
Thus
nb
k=na
nb k Φ (nx − k) f − f (x) Φ (nx − k) = n k=na
nb k j Φ (nx − k) e− n − e−x + Φ (nx − k)
nb N F (j) (e−x )
j!
j=1
k=na
k n
k=na
1 · (N − 1)!
k N −1 e− n − e−s F (N) e−s − F (N ) e−x de−s .
x
⎛
Therefore
G∗n (f, x) − f (x) ⎝
nb
⎞ Φ (nx − k)⎠ =
k=na N F (j) (e−x )
j!
j=1
G∗n
j e−· − e−x , x + Un (x) ,
where
nb
Un (x) :=
Φ (nx − k) μ,
k=na
with e− nk k N −1 1 μ := e− n − w F (N) (w) − F (N ) e−x dw. (N − 1)! e−x Case of nk − x ≤ n1α . k i) Subcase of x ≥ nk . I.e. e− n ≥ e−x .
1 |μ| ≤ (N − 1)! 1 (N − 1)!
k
e− n
e−x
−k n
e
e−x
k
e− n − w
k
e− n − w
N −1 (N ) (w) − F (N) e−x dw ≤ F
N−1
ω1 F (N) , w − e−x dw ≤
k e− n k N −1 1 (N) − n −x ω1 F , e −e e− n − w dw ≤ (N − 1)! e−x k N e− n − e−x k (N ) −a ω1 F , e x − ≤ N! n k
x − k N k n ω1 F (N ) , e−a x − ≤ N! n e−aN e−a ω1 F (N) , α . αN N !n n
−aN
e
Hence when x ≥
k n
we have −a e−aN (N ) e |μ| ≤ ω1 F , α . N !nαN n
ii) Subcase of |μ| ≤
k
≥ x. Then e− n ≤ e−x and
k n
1 (N − 1)!
1 (N − 1)!
e−x
−k e n
e−x k
e− n
k
w − e− n
N −1 (N ) (w) − F (N ) e−x dw ≤ F
N −1 k w − e− n ω1 F (N ) , w − e−x dw ≤
1 k ω1 F (N ) , e−x − e− n (N − 1)!
e−x k
e− n
k
w − e− n
N −1
dw ≤
e−x − e− nk N 1 k ω1 F (N ) , e−a x − ≤ (N − 1)! n N N −a 1 k (N) e −aN ω1 F , α e x − n ≤ N! n −aN −a 1 e (N ) e ω1 F , α , N! n nαN
i.e.
−a e−aN (N ) e ω F , , 1 N !nαN nα ≥ x. So in general when nk − x ≤ n1α we proved that |μ| ≤
when
k n
|μ| ≤
−a e−aN (N ) e ω F , . 1 N !nαN nα
Also we observe: i)’ When nk ≤ x, we get ⎛ ⎞ e− nk N −1 1 k ⎝ |μ| ≤ e− n − w dw⎠ 2 F (N ) (N − 1)! ∞ −x e
1.3 Real Neural Network Quantitative Approximations
19
k N (N) −n −x e − e N 2 F 2 F (N) ∞ −a ∞ = ≤ e − e−b . (N − 1)! N N! ≥ x, we obtain ' −x ( e N −1 k 1 (N) −n |μ| ≤ w − e dw 2 F −k (N − 1)! ∞ e n
ii)’ When
k n
N N 2 F (N ) ∞ −x 2 F (N ) ∞ −a k −n = e −e ≤ e − e−b . N! N! We proved always true that N 2 e−a − e−b F (N ) ∞ |μ| ≤ . N! Consequently we find nb
|Un (x)| ≤
nb
⎧ ⎨ ⎩
Φ (nx − k) |μ| +
kk = na
− x ≤ 1α n
⎧ ⎨ ⎩
n
−a e−aN (N ) e ω F , 1 N !nαN nα ⎛
'
Φ (nx − k) |μ| ≤
kk = na
− x > 1α n
∞
n
( Φ (nx − k) +
k=−∞
' (⎜ ⎜ −a −b N (N ) ⎜ 2 e −e F ∞ ⎜ ⎜⎧ N! ⎜⎨ ⎝
⎞ ⎟ ⎟ ⎟ Φ (nx − k)⎟ ⎟≤ ⎟ ⎠
∞
k = −∞ |k − nx| > n1−α ' ( N 2 e−a − e−b F (N ) ⎩
−a e−aN (N ) e ω F , + 1 N !nαN nα
∞
N!
(3.1992) e−n
(1−α)
.
So we have proved that N −a (6.3984) e−a − e−b F (N) ∞ −n(1−α) e−aN (N ) e |Un (x)| ≤ ω1 F , α + e . N !nαN n N! We also notice that j j ∗ Gn e−· − e−x , x ≤ G∗n e−· − e−x , x ≤
e−aj G∗n
⎛ ⎞ j nb k j |· − x| , x = e−aj ⎝ Φ (nx − k) − x ⎠ = e−aj · n k=na
⎡
⎤
⎢ ⎥ ⎢ j j ⎥ nb nb ⎢ k k ⎥ ⎢ Φ(nx−k) −x + Φ(nx−k) −x ⎥ ⎢⎧ ⎥≤ n n ⎧ ⎢⎨ ⎥ ⎨ ⎣ k = na
⎦ k = na
⎩ x− k ≤ 1 ⎩ x− k > 1 α α n
n
n
⎡
n
⎤
⎢ ⎥ ⎢ ⎥ nb ⎢ ⎥ 1 j −aj ⎢ e Φ (nx − k)⎥ ⎢ nαj + (b − a) ⎧ ⎥≤ ⎢ ⎥ ⎨ ⎣ ⎦ k =k na
⎩ x − > 1 α n n (1−α) 1 j e−aj αj + (3.1992) (b − a) e−n . n Thus we have established 1 ∗ j −n(1−α) −· −x j −aj ,x ≤ e + (3.1992) (b − a) e , Gn e − e nαj for j = 1, ..., N , and the theorem. We continue with Theorem 1.11. Let f ∈ C N − π2 + ε, π2 − ε , n, N ∈ N, 0 < ε < π2 , ε small, x ∈ − π2 + ε, π2 − ε , 0 < α < 1. Then 1) N −1 (j) f ◦ sin (sin x) j Gn (f, x) − Gn (sin · − sin x) , x − f (x) ≤ j! j=1 ⎡ (5.250312578) ⎣
ω1
(N ) 1 f ◦ sin−1 , nα nαN N !
(1.34) +
⎤ (N ) ⎞ (3.1992) 2N +1 f ◦ sin−1 (1−α) ∞ ⎠ −n ⎝ ⎦, e N! ⎛
2) |Gn (f, x) − f (x)| ≤ (5.250312578) ·
⎧ (j) N f ◦ sin−1 ⎨ (sin x) 1 j −n(1−α) + (3.1992) (π − 2ε) e + ⎩ j! nαj j=1 ⎡ ⎣
ω1
(N ) 1 f ◦ sin−1 , nα N !nαN
+
N+1
(3.1992) 2 N!
⎤⎫ (1−α) ⎬ (N ) ⎦ , e−n f ◦ sin−1 ⎭ ∞
(1.35) 3) assume further f (j) (x0 ) = 0, j = 1, ..., N for some x0 ∈ − π2 + ε, π2 − ε , it holds |Gn (f, x0 ) − f (x0 )| ≤ (5.250312578) · (1.36) ⎡ ⎛ ⎞ ⎤ (N ) (N) 1 (3.1992) 2N+1 f ◦ sin−1 ω1 f ◦ sin−1 , nα ∞ ⎠ −n(1−α) ⎦ ⎣ +⎝ e . nαN N ! N! Notice in the last the high speed of convergence of order n−α(N +1) . Proof. Call F := f ◦ sin−1 and let nk , x ∈ − π2 + ε, π2 − ε . Then j N k F (j) (sin x) k f − f (x) = sin − sin x + n j! n j=1 1 (N − 1)! Hence
N −1 k sin − sin s F (N ) (sin s) − F (N ) (sin x) d sin s. n
k n
x nb
k=na
f
nb k Φ (nx − k) − f (x) Φ (nx − k) = n k=na
nb N F (j) (sin x)
j!
j=1
k n
x
j nb k 1 Φ(nx−k) sin −sin x + Φ (nx−k) · n (N −1)!
k=na
k=na
N −1 k sin − sin s F (N ) (sin s) − F (N ) (sin x) d sin s. n
Set here a = − π2 + ε, b =
π 2
− ε. Thus nb
G∗n
(f, x) − f (x)
Φ (nx − k) =
k=na N F (j) (sin x) j=1
j!
j G∗n (sin · − sin x) , x + Mn (x) ,
22
1 Univariate Sigmoidal Neural Network Quantitative Approximation
where
nb
Mn (x) :=
Φ (nx − k) ρ,
k=na
with N −1 nk 1 k ρ := sin − sin s F (N ) (sin s) − F (N ) (sin x) d sin s. (N − 1)! x n Case of nk − x ≤ n1α . i) Subcase of nk ≥ x. The function sin is increasing on [a, b] , i.e. sin nk ≥ sin x. Then N−1 nk 1 k (N ) |ρ| ≤ sin − sin s (sin s) − F (N) (sin x) d sin s = F (N − 1)! x n N −1 k sin − w F (N ) (w) − F (N ) (sin x) dw ≤ n sin x N −1 sin nk 1 k sin − w ω1 F (N ) , |w − sin x| dw ≤ (N − 1)! sin x n N sin nk − sin x k (N ) ω1 F , sin − sin x ≤ n N! N k 1 (N ) k (N ) 1 n −x ω1 F , − x ≤ ω1 F , α . n N! n nαN N !
1 (N − 1)!
So if
k n
sin
k n
≥ x, then
ii) Subcase of 1 |ρ| ≤ (N − 1)! 1 (N − 1)!
1 (N ) 1 |ρ| ≤ ω1 F , α . n N !nαN ≤ x, then sin nk ≤ sin x. Hence
k n
x
k sin s − sin n
k n
1 (N − 1)!
sin x sin
w − sin
k n
sin x
sin
k n
k n
w − sin
N −1 (N ) (sin s) − F (N ) (sin x) d sin s = F N −1 (N ) (w) − F (N ) (sin x) dw ≤ F k n
N−1
ω1 F (N) , |w − sin x| dw ≤
sin x N −1 1 k k (N ) ω1 F , sin x − sin w − sin dw ≤ k (N − 1)! n n sin n
N 1 k sin x − sin kk (N ) ω1 F , x − ≤ (N − 1)! n N N 1 k 1 (N ) 1 (N ) 1 ω1 F , α x − ≤ αN ω1 F , α . N! n n n N! n We got for
k n
≤ x that 1 (N ) 1 |ρ| ≤ ω1 F , α . n nαN N !
So in both cases we proved that 1 (N ) 1 |ρ| ≤ ω1 F , α , n nαN N ! when nk − x ≤ n1α . Also in general ( nk ≥ x case) ' k ( N −1 n 1 k |ρ| ≤ sin − sin s d sin s 2 F (N ) = (N − 1)! n ∞ x ' N −1 ( k sin n 1 k sin − w dw 2 F (N ) = (N − 1)! n ∞ sin x N 1 k 2N +1 (N ) sin − sin x 2 F (N ) ≤ F . N! n N! ∞ ∞ ≤ x) we get ' ( N −1 x 1 k |ρ| ≤ sin s − sin d sin s 2 F (N ) = k (N − 1)! n ∞ n ' ( N −1 sin x 1 k w − sin dw 2 F (N) = k (N − 1)! n ∞ sin n N sin x − sin nk 1 2N +1 (N ) 2 F (N ) ≤ F . (N − 1)! N N! ∞ ∞ So we obtained in general that Also (case of
k n
|ρ| ≤ Therefore we derive
2N +1 (N ) F . N! ∞
nb
|Mn (x)| ≤
k=na (k:|x− nk |≤ n1α )
nb
Φ (nx − k) |ρ| +
k=na (k:|x− nk |> n1α )
Φ (nx − k) |ρ| ≤
'
( ω1 F (N ) , n1α (3.1992) 2N +1 (N ) −n(1−α) + . F e N !nαN N! ∞
So that |Mn (x)| ≤
ω1 F (N) , n1α (3.1992) 2N+1 (N ) −n(1−α) + . F e αN N !n N! ∞
Next we estimate ∗ j j Gn (sin · − sin x) , x ≤ G∗n |sin · − sin x| , x ≤ j nb k j G∗n |· − x| , x = Φ (nx − k) − x ≤ n k=na
(work as before) (1−α) 1 j + (3.1992) (π − 2ε) e−n . αj n
Therefore (1−α) 1 ∗ j j , Gn (sin · − sin x) , x ≤ αj + (3.1992) (π − 2ε) e−n n j = 1, ..., N. The theorem is proved. We finally give Theorem 1.12. Let f ∈ C N ([ε, π−ε]), n, N ∈ N, ε > 0 small, x ∈ [ε, π − ε], 0 < α < 1. Then 1) N −1 (j) f ◦ cos (cos x) j Gn (f, x) − ≤ G (cos · − cos x) , x − f (x) n j! j=1 ⎡ (5.250312578) ⎣
(N ) 1 ω1 f ◦ cos−1 , nα nαN N !
(1.37) +
⎤ (N ) ⎞ (3.1992) 2N +1 f ◦ cos−1 (1−α) ∞ ⎠ −n ⎝ ⎦, e N! ⎛
2) |Gn (f, x) − f (x)| ≤ (5.250312578) ·
⎧ (j) N f ◦ cos−1 ⎨ (cos x) 1 j −n(1−α) + (3.1992) (π − 2ε) e + ⎩ j! nαj j=1 ⎡ ⎣
ω1
f ◦ cos−1
(N )
, n1α
N !nαN
⎤⎫ ⎬ N+1 (1−α) (3.1992) 2 (N ) ⎦ , + e−n f ◦ cos−1 ⎭ N! ∞
(1.38) 3) assume further f (j) (x0 ) = 0, j = 1, ..., N for some x0 ∈ [ε, π − ε], it holds |Gn (f, x0 ) − f (x0 )| ≤ (5.250312578) · (1.39) ⎡ ⎛ ⎞ ⎤ (N ) (N ) 1 (3.1992) 2N+1 f ◦ cos−1 ω1 f ◦ cos−1 , nα ∞ ⎠ −n(1−α) ⎦ ⎣ +⎝ e . nαN N ! N! Notice in the last the high speed of convergence of order n−α(N +1) . Proof. Call F := f ◦ cos−1 and let
k ,x n
∈ [ε, π − ε]. Then
j N k F (j) (cos x) k f − f (x) = cos − cos x + n j! n j=1 1 (N − 1)! Hence
k n
cos
x
f
k=na
j!
k n
k=na
cos
x
F (N ) (cos s) − F (N ) (cos x) d cos s.
k=na
nb N F (j) (cos x) j=1
N −1
nb k Φ (nx − k) − f (x) Φ (nx − k) = n
nb
k − cos s n
j nb k 1 Φ (nx−k) cos −cos x + Φ (nx−k) · n (N −1)! k=na
k − cos s n
N −1
F (N ) (cos s) − F (N ) (cos x) d cos s.
Set here a = ε, b = π − ε. Thus nb
G∗n (f, x) − f (x)
Φ (nx − k) =
k=na N F (j) (cos x) j=1
j!
j G∗n (cos · − cos x) , x + Θn (x) ,
26
1 Univariate Sigmoidal Neural Network Quantitative Approximation
where
nb
Θn (x) :=
Φ (nx − k) λ,
k=na
with 1 λ := (N −1)!
N −1 k cos −cos s F (N) (cos s) − F (N ) (cos x) d cos s = n
k n
x
N−1 cos nk 1 k cos − w F (N ) (w) − F (N ) (cos x) dw. (N − 1)! cos x n k 1 Case of n − x ≤ nα . i) Subcase of nk ≥ x. The function cos ine is decreasing on [a, b] , i.e. cos nk ≤ cos x. Then N−1 cos x 1 k |λ| ≤ w − cos | F (N) (w) − F (N ) (cos x) |dw ≤ (N − 1)! cos nk n N −1 k w − cos ω1 F (N ) , |w − cos x| dw ≤ k n cos n N cos x − cos nk k (N ) ω1 F , cos x − cos ≤ n N! N k x − nk 1 1 (N ) ω1 F , x − ≤ ω1 F (N ) , α . n N! n nαN N !
1 (N − 1)!
So if
k n
cos x
≥ x, then
ii) Subcase of
1 1 |λ| ≤ ω1 F (N) , α . n nαN N !
k n
1 |λ| ≤ (N − 1)!
≤ x, then cos nk ≥ cos x. Hence cos
k n
cos x
N −1 k (N ) cos − w (w) − F (N ) (cos x) dw ≤ F n
N −1 k cos − w ω1 F (N ) , w − cos x dw ≤ n cos x cos k N−1 n 1 k k ω1 F (N ) , cos − cos x cos − w dw ≤ (N − 1)! n n cos x N cos nk − cos x 1 (N ) k ω1 F , − x ≤ (N − 1)! n N 1 (N − 1)!
cos
k n
N k 1 1 1 (N ) k (N ) 1 ω1 F , − x − x ≤ ω1 F , α . αN N! n n N! n n k n
We proved for
≤ x that 1 (N) 1 |λ| ≤ ω1 F , α . n N !nαN
So in both cases we got that 1 (N) 1 |λ| ≤ ω1 F , α , n nαN N !
when nk − x ≤ n1α . Also in general ( nk ≥ x case) ' N −1 ( cos x 1 k |λ| ≤ w − cos dw 2 F (N ) ≤ k (N − 1)! n ∞ cos n 1 N!
N k 2N +1 (N ) cos x − cos 2 F (N) ≤ F . n N! ∞ ∞
≤ x) we obtain ' N −1 ( k cos n 1 k |λ| ≤ cos − w dw 2 F (N) = (N − 1)! n ∞ cos x
Also (case of
k n
1 N!
N k 2N +1 (N ) cos − cos x 2 F (N) ≤ F . n N! ∞ ∞
So we proved in general that |λ| ≤
2N +1 (N ) F . N! ∞
Therefore we derive nb
|Θn (x)| ≤
k=na (k:|x− nk |≤ n1α )
nb
Φ (nx − k) |λ| +
Φ (nx − k) |λ| ≤
k=na (k:|x− nk |> n1α )
1 2N +1 (N ) −n(1−α) (N) 1 ω1 F , α + (3.1992) . F e n nαN N ! N! ∞
So that ω1 F (N ) , n1α 2N +1 (N ) −n(1−α) |Θn (x)| ≤ + (3.1992) . F e nαN N ! N! ∞
Next we estimate ∗ j j Gn (cos · − cos x) , x ≤ G∗n |cos · − cos x| , x ≤ G∗n
j nb k |· − x| , x = Φ (nx − k) − x ≤ n j
k=na
(work as before) (1−α) 1 j + (3.1992) (π − 2ε) e−n . nαj
Therefore (1−α) 1 ∗ j j , Gn (cos · − cos x) , x ≤ αj + (3.1992) (π − 2ε) e−n n j = 1, ..., N. The theorem is proved.
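Before passing to the complex case, here is a small numerical experiment (added for illustration; the test function, interval, and parameters are arbitrary) showing the flavour of Theorem 1.6: the uniform error of G_n over a grid decreases as n grows, even for a merely continuous (non-smooth) f.

```python
import numpy as np
from math import ceil, floor

def s(x):
    return 1.0 / (1.0 + np.exp(-x))

def phi(x):
    return 0.5 * (s(x + 1.0) - s(x - 1.0))

def G_n(f, x, n, a, b):
    # operator (1.1)
    ks = np.arange(ceil(n * a), floor(n * b) + 1)
    w = phi(n * x - ks)
    return np.dot(f(ks / n), w) / np.sum(w)

a, b = 0.0, 1.0
f = lambda t: np.abs(t - 0.5)          # continuous, not differentiable at 0.5
grid = np.linspace(a, b, 201)
for n in (10, 50, 250, 1250):
    err = max(abs(G_n(f, x, n, a, b) - f(x)) for x in grid)
    print(n, err)                       # error decreases with n
```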
1.4 Complex Neural Network Quantitative Approximations
We make

Remark 1.13. Let X := [a,b] ⊂ ℝ and f : X → ℂ with real and imaginary parts f₁, f₂: f = f₁ + i f₂, i = √−1. Clearly f is continuous iff f₁ and f₂ are continuous. Also it holds
\[ f^{(j)}(x) = f_1^{(j)}(x) + i\, f_2^{(j)}(x), \tag{1.40} \]
for all j = 1, ..., N, given that f₁, f₂ ∈ C^N(X), N ∈ ℕ.
We denote by C_B(ℝ, ℂ) the space of continuous and bounded functions f : ℝ → ℂ. Clearly f is bounded iff both f₁, f₂ are bounded from ℝ into ℝ, where f = f₁ + i f₂.
Here we define
\[ G_n(f,x) := G_n(f_1,x) + i\,G_n(f_2,x), \tag{1.41} \]
and
\[ \overline{G}_n(f,x) := \overline{G}_n(f_1,x) + i\,\overline{G}_n(f_2,x). \tag{1.42} \]
We observe here that
\[ |G_n(f,x) - f(x)| \le |G_n(f_1,x) - f_1(x)| + |G_n(f_2,x) - f_2(x)|, \tag{1.43} \]
and
\[ \|G_n(f) - f\|_{\infty} \le \|G_n(f_1) - f_1\|_{\infty} + \|G_n(f_2) - f_2\|_{\infty}. \tag{1.44} \]
Similarly we get
\[ |\overline{G}_n(f,x) - f(x)| \le |\overline{G}_n(f_1,x) - f_1(x)| + |\overline{G}_n(f_2,x) - f_2(x)|, \tag{1.45} \]
and
\[ \|\overline{G}_n(f) - f\|_{\infty} \le \|\overline{G}_n(f_1) - f_1\|_{\infty} + \|\overline{G}_n(f_2) - f_2\|_{\infty}. \tag{1.46} \]

We give

Theorem 1.14. Let f ∈ C([a,b], ℂ), f = f₁ + i f₂, 0 < α < 1, n ∈ ℕ, x ∈ [a,b]. Then
i)
\[ |G_n(f,x) - f(x)| \le (5.250312578)\left[\omega_1\!\left(f_1,\tfrac{1}{n^{\alpha}}\right) + \omega_1\!\left(f_2,\tfrac{1}{n^{\alpha}}\right) + (6.3984)\big(\|f_1\|_{\infty} + \|f_2\|_{\infty}\big)e^{-n^{(1-\alpha)}}\right] =: \psi_1, \tag{1.47} \]
and
ii)
\[ \|G_n(f) - f\|_{\infty} \le \psi_1. \tag{1.48} \]
Proof. Based on Remark 1.13 and Theorem 1.6.

We give

Theorem 1.15. Let f ∈ C_B(ℝ, ℂ), f = f₁ + i f₂, 0 < α < 1, n ∈ ℕ, x ∈ ℝ. Then
i)
\[ |\overline{G}_n(f,x) - f(x)| \le \omega_1\!\left(f_1,\tfrac{1}{n^{\alpha}}\right) + \omega_1\!\left(f_2,\tfrac{1}{n^{\alpha}}\right) + (6.3984)\big(\|f_1\|_{\infty} + \|f_2\|_{\infty}\big)e^{-n^{(1-\alpha)}} =: \psi_2, \tag{1.49} \]
ii)
\[ \|\overline{G}_n(f) - f\|_{\infty} \le \psi_2. \tag{1.50} \]
Proof. Based on Remark 1.13 and Theorem 1.7.

Next we present a result of high order complex neural network approximation.

Theorem 1.16. Let f : [a,b] → ℂ, [a,b] ⊂ ℝ, such that f = f₁ + i f₂. Assume f₁, f₂ ∈ C^N([a,b]), n, N ∈ ℕ, 0 < α < 1, x ∈ [a,b]. Then
i)
\[ |G_n(f,x) - f(x)| \le (5.250312578)\cdot\Bigg\{\sum_{j=1}^{N}\frac{\big(|f_1^{(j)}(x)| + |f_2^{(j)}(x)|\big)}{j!}\left[\frac{1}{n^{\alpha j}} + (3.1992)(b-a)^j e^{-n^{(1-\alpha)}}\right] \]
\[ + \left[\frac{\big(\omega_1\big(f_1^{(N)},\tfrac{1}{n^{\alpha}}\big) + \omega_1\big(f_2^{(N)},\tfrac{1}{n^{\alpha}}\big)\big)}{n^{\alpha N}N!} + \frac{(6.3984)\big(\|f_1^{(N)}\|_{\infty} + \|f_2^{(N)}\|_{\infty}\big)(b-a)^N}{N!}\,e^{-n^{(1-\alpha)}}\right]\Bigg\}, \tag{1.51} \]
ii) assume further f_1^{(j)}(x₀) = f_2^{(j)}(x₀) = 0, j = 1, ..., N, for some x₀ ∈ [a,b]; it holds
\[ |G_n(f,x_0) - f(x_0)| \le (5.250312578)\cdot\left[\frac{\big(\omega_1\big(f_1^{(N)},\tfrac{1}{n^{\alpha}}\big) + \omega_1\big(f_2^{(N)},\tfrac{1}{n^{\alpha}}\big)\big)}{n^{\alpha N}N!} + \frac{(6.3984)\big(\|f_1^{(N)}\|_{\infty} + \|f_2^{(N)}\|_{\infty}\big)(b-a)^N}{N!}\,e^{-n^{(1-\alpha)}}\right], \tag{1.52} \]
notice here the extremely high rate of convergence at n^{-(N+1)α},
iii)
\[ \|G_n(f) - f\|_{\infty} \le (5.250312578)\cdot\Bigg\{\sum_{j=1}^{N}\frac{\big(\|f_1^{(j)}\|_{\infty} + \|f_2^{(j)}\|_{\infty}\big)}{j!}\left[\frac{1}{n^{\alpha j}} + (3.1992)(b-a)^j e^{-n^{(1-\alpha)}}\right] \]
\[ + \left[\frac{\big(\omega_1\big(f_1^{(N)},\tfrac{1}{n^{\alpha}}\big) + \omega_1\big(f_2^{(N)},\tfrac{1}{n^{\alpha}}\big)\big)}{n^{\alpha N}N!} + \frac{(6.3984)\big(\|f_1^{(N)}\|_{\infty} + \|f_2^{(N)}\|_{\infty}\big)(b-a)^N}{N!}\,e^{-n^{(1-\alpha)}}\right]\Bigg\}. \tag{1.53} \]
Proof. Based on Remark 1.13 and Theorem 1.8.
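A hedged computational sketch (added here, not from the original) of definition (1.41): the complex-valued case reduces to applying the real operator to the real and imaginary parts. The test function and parameters below are arbitrary examples.

```python
import numpy as np
from math import ceil, floor

def s(x):
    return 1.0 / (1.0 + np.exp(-x))

def phi(x):
    return 0.5 * (s(x + 1.0) - s(x - 1.0))

def G_n_real(f, x, n, a, b):
    ks = np.arange(ceil(n * a), floor(n * b) + 1)
    w = phi(n * x - ks)
    return np.dot(f(ks / n), w) / np.sum(w)

def G_n_complex(f, x, n, a, b):
    # Formula (1.41): G_n(f, x) = G_n(f1, x) + i G_n(f2, x), with f = f1 + i f2
    f1 = lambda t: np.real(f(t))
    f2 = lambda t: np.imag(f(t))
    return G_n_real(f1, x, n, a, b) + 1j * G_n_real(f2, x, n, a, b)

f = lambda t: np.exp(1j * 2.0 * t)     # f1 = cos(2t), f2 = sin(2t)
a, b, x = 0.0, 1.0, 0.3
for n in (10, 100, 1000):
    print(n, abs(G_n_complex(f, x, n, a, b) - f(x)))
```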
References
[1] Anastassiou, G.A.: Rate of convergence of some neural network operators to the unit-univariate case. J. Math. Anal. Appl. 212, 237–262 (1997)
[2] Anastassiou, G.A.: Quantitative Approximations. Chapman & Hall/CRC, Boca Raton (2001)
[3] Anastassiou, G.A.: Basic Inequalities, Revisited. Mathematica Balkanica, New Series 24, Fasc. 1-2, 59–84 (2010)
[4] Anastassiou, G.A.: Univariate sigmoidal neural network approximation. Journal of Comp. Anal. and Appl. (accepted 2011)
[5] Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 39, 930–945 (1993)
[6] Brauer, F., Castillo-Chavez, C.: Mathematical Models in Population Biology and Epidemiology, pp. 8–9. Springer, New York (2001)
[7] Cao, F.L., Xie, T.F., Xu, Z.B.: The estimate for approximation error of neural networks: a constructive approach. Neurocomputing 71, 626–630 (2008)
[8] Chen, T.P., Chen, H.: Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its applications to a dynamic system. IEEE Trans. Neural Networks 6, 911–917 (1995)
[9] Chen, Z., Cao, F.: The approximation operators with sigmoidal functions. Computers and Mathematics with Applications 58, 758–765 (2009)
[10] Chui, C.K., Li, X.: Approximation by ridge functions and neural networks with one hidden layer. J. Approx. Theory 70, 131–141 (1992)
[11] Cybenko, G.: Approximation by superpositions of sigmoidal function. Math. of Control, Signals and Systems 2, 303–314 (1989)
[12] Ferrari, S., Stengel, R.F.: Smooth function approximation using neural networks. IEEE Trans. Neural Networks 16, 24–38 (2005)
[13] Funahashi, K.I.: On the approximate realization of continuous mappings by neural networks. Neural Networks 2, 183–192 (1989)
[14] Hahm, N., Hong, B.I.: An approximation by neural networks with a fixed weight. Computers & Math. with Appli. 47, 1897–1903 (2004)
[15] Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366 (1989)
[16] Hornik, K., Stinchcombe, M., White, H.: Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks 3, 551–560 (1990)
[17] Hritonenko, N., Yatsenko, Y.: Mathematical Modeling in Economics, Ecology and the Environment, pp. 92–93. Science Press, Beijing (2006) (reprint)
[18] Leshno, M., Lin, V.Y., Pinkus, A., Schocken, S.: Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks 6, 861–867 (1993)
[19] Lorentz, G.G.: Approximation of Functions. Holt, Rinehart and Winston, New York (1966)
[20] Maiorov, V., Meir, R.S.: Approximation bounds for smooth functions in C(R^d) by neural and mixture networks. IEEE Trans. Neural Networks 9, 969–978 (1998)
[21] Makovoz, Y.: Uniform approximation by neural networks. J. Approx. Theory 95, 215–228 (1998)
[22] Mhaskar, H.N., Micchelli, C.A.: Approximation by superposition of a sigmoidal function. Adv. Applied Math. 13, 350–373 (1992)
[23] Mhaskar, H.N., Micchelli, C.A.: Degree of approximation by neural networks with a single hidden layer. Adv. Applied Math. 16, 151–183 (1995)
[24] Suzuki, S.: Constructive function approximation by three-layer artificial neural networks. Neural Networks 11, 1049–1058 (1998)
[25] Xie, T.F., Zhou, S.P.: Approximation Theory of Real Functions. Hangzhou University Press, Hangzhou (1998)
[26] Xu, Z.B., Cao, F.L.: The essential order of approximation for neural networks. Science in China (Ser. F) 47, 97–112 (2004)
Chapter 2
Univariate Hyperbolic Tangent Neural Network Quantitative Approximation
Here we give the univariate quantitative approximation of real and complex valued continuous functions on a compact interval or on the whole real line by quasi-interpolation hyperbolic tangent neural network operators. This approximation is obtained by establishing Jackson type inequalities involving the modulus of continuity of the engaged function or its high order derivative. The operators are defined by using a density function induced by the hyperbolic tangent function. Our approximations are pointwise and with respect to the uniform norm. The related feed-forward neural network has one hidden layer. This chapter relies on [4].
2.1 Introduction
The author in [1] and [2], see chapters 2-5, was the first to present neural network approximations to continuous functions with rates, by very specifically defined neural network operators of Cardaliaguet-Euvrard and "Squashing" types, by employing the modulus of continuity of the engaged function or its high order derivative, and producing very tight Jackson type inequalities. He treats there both the univariate and multivariate cases. The "bell-shaped" and "squashing" functions defining these operators are assumed to be of compact support. Also in [2] he gives the Nth order asymptotic expansion for the error of weak approximation of these two operators to a special natural class of smooth functions, see chapters 4-5 there. For this chapter the author is inspired by the article [5] by Z. Chen and F. Cao. He presents work related to it, and much more beyond. So the author here gives univariate hyperbolic tangent neural network approximations to continuous functions over compact intervals of the real line or over the whole ℝ, then
he extends his results to complex valued functions. All convergences here are with rates expressed via the modulus of continuity of the involved function or its high order derivative, and given by very tight Jackson type inequalities.
The author here comes up with the "right" precisely defined quasi-interpolation neural network operator, associated with the hyperbolic tangent function and related to a compact interval or the real line. The compact intervals are not necessarily symmetric to the origin. Some of the upper bounds to the error quantity are very flexible and general. In preparation to establish our results we give important properties of the basic density function defining our operators.
Feed-forward neural networks (FNNs) with one hidden layer, the only type of networks we deal with in this chapter, are mathematically expressed as
\[ N_n(x) = \sum_{j=0}^{n} c_j\,\sigma(a_j\cdot x + b_j), \qquad x \in \mathbb{R}^s,\ s \in \mathbb{N}, \]
where for 0 ≤ j ≤ n, b_j ∈ ℝ are the thresholds, a_j ∈ ℝ^s are the connection weights, c_j ∈ ℝ are the coefficients, a_j · x is the inner product of a_j and x, and σ is the activation function of the network. In many fundamental network models, the activation function is the hyperbolic tangent. About neural networks see [6], [7], [8].
2.2 Basic Ideas
We consider here the hyperbolic tangent function tanh x, x ∈ ℝ:
\[ \tanh x := \frac{e^x - e^{-x}}{e^x + e^{-x}} = \frac{e^{2x}-1}{e^{2x}+1}. \]
It has the properties tanh 0 = 0, −1 < tanh x < 1, ∀ x ∈ ℝ, and tanh(−x) = −tanh x. Furthermore tanh x → 1 as x → ∞, and tanh x → −1 as x → −∞, and it is strictly increasing on ℝ. Furthermore it holds \(\frac{d}{dx}\tanh x = \frac{1}{\cosh^2 x} > 0\). This function plays the role of an activation function in the hidden layer of neural networks.
We further consider
\[ \Psi(x) := \frac{1}{4}\big(\tanh(x+1) - \tanh(x-1)\big) > 0, \qquad \forall\, x \in \mathbb{R}. \]
We easily see that Ψ(−x) = Ψ(x), that is, Ψ is even on ℝ. Obviously Ψ is differentiable, thus continuous.
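The following small Python check (an added illustration, not part of the original) evaluates Ψ and verifies numerically its evenness, bell shape, and the partition-of-unity and unit-integral properties established below in Theorems 2.2 and 2.3; the truncation ranges are for demonstration only.

```python
import numpy as np

def psi(x):
    # Psi(x) = (1/4) * (tanh(x+1) - tanh(x-1))
    return 0.25 * (np.tanh(x + 1.0) - np.tanh(x - 1.0))

x = 0.41
ks = np.arange(-40, 41)
print(np.sum(psi(x - ks)))                # ~ 1.0  (cf. Theorem 2.2)

t = np.linspace(-40.0, 40.0, 400001)
print(np.trapz(psi(t), t))                # ~ 1.0  (cf. Theorem 2.3)

print(psi(0.7), psi(-0.7))                # evenness: equal values
print(psi(0.0) > psi(1.0) > psi(2.0))     # bell shape: maximum at 0, decreasing for x >= 0
```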
Proposition 2.1. Ψ(x), for x ≥ 0, is strictly decreasing.

Proof. Clearly
\[ \frac{d\Psi(x)}{dx} = \frac{1}{4}\left(\frac{1}{\cosh^2(x+1)} - \frac{1}{\cosh^2(x-1)}\right) = \frac{\cosh^2(x-1)-\cosh^2(x+1)}{4\cosh^2(x+1)\cosh^2(x-1)} \]
\[ = \frac{\big(\cosh(x-1)+\cosh(x+1)\big)\big(\cosh(x-1)-\cosh(x+1)\big)}{4\cosh^2(x+1)\cosh^2(x-1)}. \]
Suppose x − 1 ≥ 0. Clearly x − 1 < x + 1, so that 0 < cosh(x−1) < cosh(x+1) and (cosh(x−1) − cosh(x+1)) < 0, hence Ψ′(x) < 0. Let now x − 1 < 0; then 1 − x > 0, and for x > 0 we have 0 < cosh(x−1) = cosh(1−x) < cosh(1+x) = cosh(x+1). Hence (cosh(x−1) − cosh(x+1)) < 0, proving again Ψ′(x) < 0. Also Ψ′ is continuous everywhere and in particular at zero. The claim is proved.

Clearly Ψ(x) is strictly increasing for x ≤ 0. Also it holds \(\lim_{x\to-\infty}\Psi(x) = 0 = \lim_{x\to\infty}\Psi(x)\). In fact Ψ has the bell shape with horizontal asymptote the x-axis. So the maximum of Ψ is attained at zero, Ψ(0) = 0.3809297.

Theorem 2.2. We have that \(\sum_{i=-\infty}^{\infty}\Psi(x-i) = 1\), ∀ x ∈ ℝ.

Proof. We observe that
(tanh (x − i) − tanh (x − 1 − i)) =
i=−∞ ∞
(tanh (x − i) − tanh (x − 1 − i)) +
i=0
−1
(tanh (x − i) − tanh (x − 1 − i)) .
i=−∞
Furthermore (λ ∈ Z+ ) ∞
(tanh (x − i) − tanh (x − 1 − i)) =
i=0
lim λ→∞
λ i=0
(tanh (x − i) − tanh (x − 1 − i)) =
(telescoping sum) lim (tanh x − tanh (x − λ)) = 1 + tanh x.
λ→∞
Similarly, −1
(tanh (x − i) − tanh (x − 1 − i)) =
i=−∞ −1
lim
λ→∞
(tanh (x − i) − tanh (x − 1 − i)) =
i=−λ
lim (tanh (x + λ) − tanh x) = 1 − tanh x.
λ→∞
So adding the last two limits we obtain ∞
(tanh (x − i) − tanh (x − 1 − i)) = 2,
∀ x ∈ R.
i=−∞
Similarly we obtain ∞
(tanh (x + 1 − i) − tanh (x − i)) = 2,
∀ x ∈ R.
i=−∞
Consequently ∞
(tanh (x + 1 − i) − tanh (x − 1 − i)) = 4,
∀ x ∈ R,
i=−∞
proving the claim. Thus
∞
Ψ (nx − i) = 1,
∀ n ∈ N, ∀ x ∈ R.
i=−∞
Furthermore we give: BecauseΨ is even it holds ∞ ∈ R. i=−∞ Ψ (i − x) = 1, ∀x ∞ ∞ Hence Ψ (i + x) = 1, ∀ x ∈ R, and i=−∞ i=−∞ Ψ (x + i) = 1, ∀ x ∈ R. ∞ Theorem 2.3. It holds −∞ Ψ (x) dx = 1. Proof. We observe that ∞ ∞ Ψ (x) dx = −∞
j=−∞
j
j+1
Ψ (x) dx =
∞ j=−∞
1
Ψ (x + j) dx = 0
⎛ 1
⎝
0
∞
⎞ Ψ (x + j)⎠ dx =
1
1dx = 1. 0
j=−∞
So Ψ (x) is a density function on R. Theorem 2.4. Let 0 < α < 1 and n ∈ N. It holds ∞ ⎧ ⎨ ⎩
(1−α)
Ψ (nx − k) ≤ e4 · e−2n
.
k = −∞ : |nx − k| ≥ n1−α
Proof. Let x ≥ 1. That is 0 ≤ x − 1 < x + 1. Applying the mean value theorem we obtain Ψ (x) =
1 1 1 1 ·2· = , 4 2 cosh2 ξ cosh2 ξ
for some x − 1 < ξ < x + 1. We get cosh (x − 1) < cosh ξ < cosh (x + 1) and cosh2 (x − 1) < cosh2 ξ < cosh2 (x + 1) . Therefore 1 1 0< < , 2 2 cosh ξ cosh (x − 1) and Ψ (x) <
1 1 1 4 2 = 2 2 = 2 =: (∗) . x−1 1−x x−1 2 cosh (x − 1) 2 (e +e ) (e + e1−x )
2 From ex−1 + e1−x > ex−1 we obtain ex−1 + e1−x > e2(x−1) , and 1
<
1 e2(x−1)
.
+
(∗) <
2 = 2e2 · e−2x . e2(x−1)
So that
That is we proved
e1−x )2
(ex−1
Ψ (x) < 2e2 · e−2x ,
x ≥ 1.
Thus ∞ ⎧ ⎨ ⎩
k = −∞ : |nx − k| ≥ n1−α
Ψ (nx − k) =
∞ ⎧ ⎨ ⎩
k = −∞ : |nx − k| ≥ n1−α
Ψ (|nx − k|) <
2e
∞
2
e
⎧ ⎨
−2|nx−k|
≤ 2e
∞
2
e−2x dx
(n1−α −1)
k = −∞ : |nx − k| ≥ n1−α ∞ (1−α) (1−α) −1) = e2 e−y dy = e2 e−2(n = e4 · e−2n , ⎩
2(n1−α −1)
proving the claim. Denote by · the integral part of the number and by · the ceiling of the number. We further give Theorem 2.5. Let x ∈ [a, b] ⊂ R and n ∈ N so that na ≤ nb . It holds 1
nb k=na
Ψ (nx − k)
< 4.1488766 =
1 . Ψ (1)
Proof. We observe that 1=
nb
∞
Ψ (nx − k) >
k=−∞
Ψ (nx − k) =
k=na
nb
Ψ (|nx − k|) > Ψ (|nx − k0 |) ,
k=na
∀ k0 ∈ [ na , nb ] ∩ Z. We can choose k0 ∈ [ na , nb ] ∩ Z such that |nx − k0 | < 1. Therefore 1 (tanh (2) − tanh (0)) = 4 1 1 e2 − e−2 1 e4 − 1 tanh 2 = = = 0.2410291. 4 4 e2 + e−2 4 e4 + 1 Ψ (|nx − k0 |) > Ψ (1) =
So Ψ (1) = 0.2410291. Consequently we obtain nb
Ψ (|nx − k|) > 0.2410291 = Ψ (1) ,
k=na
and nb
1
k=na Ψ
proving the claim.
(|nx − k|)
< 4.1488766 =
1 , Ψ (1)
Remark 2.6. We also notice that nb
1−
na −1
Ψ (nb − k) =
k=−∞
k=na
∞
Ψ (nb − k) +
Ψ (nb − k)
k=nb+1
> Ψ (nb − nb − 1) (call ε := nb − nb , 0 ≤ ε < 1) = Ψ (ε − 1) = Ψ (1 − ε) ≥ Ψ (1) > 0. nb Therefore lim 1 − k=na Ψ (nb − k) > 0. n→∞ Similarly, nb
1−
na −1
Ψ (na − k) =
k=−∞
k=na
∞
Ψ (na − k) +
Ψ (na − k)
k=nb+1
> Ψ (na − na + 1) (call η := na − na, 0 ≤ η < 1) = Ψ (1 − η) ≥ Ψ (1) > 0. nb Therefore again lim 1 − k=na Ψ (na − k) > 0. n→∞ Therefore we find that nb
lim
n→∞
Ψ (nx − k) = 1,
k=na
for at least some x ∈ [a, b]. Definition 2.7. Let f ∈ C ([a, b]) and n ∈ N such that na ≤ nb . We introduce and define the positive linear neural network operator nb Fn (f, x) :=
k k=na f n nb k=na Ψ
Ψ (nx − k)
(nx − k)
, x ∈ [a.b] .
For large enough n we always obtain na ≤ nb . Also a ≤ na ≤ k ≤ nb .
(2.1) k n
≤ b, iff
We study here the pointwise and uniform convergence of Fn (f, x) to f (x) with rates. For convenience we call nb k ∗ Fn (f, x) := f Ψ (nx − k) , (2.2) n k=na
that is
F ∗ (f, x) Fn (f, x) := nb n . k=na Ψ (nx − k)
So that,
(2.3)
F ∗ (f, x) Fn (f, x) − f (x) = nb n − f (x) Ψ (nx − k) k=na Fn∗ (f, x) − f (x) = nb
nb
k=na Ψ
k=na
Ψ (nx − k)
(nx − k)
.
(2.4)
Consequently we derive nb 1 ∗ . |Fn (f, x) − f (x)| ≤ F (f, x) − f (x) Ψ (nx − k) n Ψ (1) k=na
(2.5)
That is
nb k |Fn (f, x) − f (x)| ≤ (4.1488766) f − f (x) Ψ (nx − k) . n k=na (2.6) We will estimate the right hand side of (2.6). For that we need, for f ∈ C ([a, b]) the first modulus of continuity ω1 (f, h) :=
sup |f (x) − f (y)| , h > 0. x, y ∈ [a, b] |x − y| ≤ h
(2.7)
Similarly it is defined for f ∈ CB (R) (continuous and bounded on R). We have that lim ω1 (f, h) = 0. h→0
Definition 2.8. When f ∈ CB (R) we define, F n (f, x) :=
∞ k=−∞
k f Ψ (nx − k) , n ∈ N, x ∈ R, n
(2.8)
the quasi-interpolation neural network operator. By [3] we derive the following three theorems on extended Taylor formula. π π π N Theorem 2.9. Let N ∈ N, 0 < ε < small, and f ∈ C − + ε, − ε ; 2 2 2 x, y ∈ − π2 + ε, π2 − ε . Then (k) N f ◦ sin−1 (sin y) k f (x) = f (y) + (sin x − sin y) + KN (y, x) , (2.9) k! k=1
where KN (y, x) =
x
(sin x − sin s)
1 · (N − 1)!
(2.10)
(N ) (N ) f ◦ sin−1 (sin s)− f ◦ sin−1 (sin y) cos s ds.
N−1
y
Theorem 2.10. Let f ∈ C N ([ε, π − ε]), N ∈ N, ε > 0 small; x, y ∈ [ε, π − ε]. Then (k) N f ◦ cos−1 (cos y) k ∗ f (x) = f (y) + (cos x − cos y) + KN (y, x) , (2.11) k! k=1
where ∗ KN (y, x) = −
x
(cos x−cos s)N −1
f ◦ cos−1
(N )
1 · (N − 1)!
(2.12)
(N ) (cos s)− f ◦ cos−1 (cos y) sin s ds.
y
Theorem 2.11. Let f ∈ C N ([a, b]) (or f ∈ C N (R)), N ∈ N; x, y ∈ [a, b] (or x, y ∈ R). Then
f (x) = f (y) +
(k) N f ◦ ln 1e (e−y ) k=1
k!
e−x − e−y
k
+ K N (y, x) ,
(2.13)
where
1 · (2.14) (N − 1)! x (N ) (N ) −x −s −s N−1 −s −y e −e f ◦ ln 1e e − f ◦ ln 1e e e ds. K N (y, x) = −
y
Remark 2.12. Using the mean value theorem we obtain
\[ |\sin x - \sin y| \le |x-y|, \quad |\cos x - \cos y| \le |x-y|, \qquad \forall\, x,y \in \mathbb{R}, \tag{2.15} \]
furthermore we have
\[ |\sin x - \sin y| \le 2, \quad |\cos x - \cos y| \le 2, \qquad \forall\, x,y \in \mathbb{R}. \tag{2.16} \]
Similarly we get
\[ \big|e^{-x} - e^{-y}\big| \le e^{-a}|x-y|, \qquad \forall\, x,y \in [a,b], \tag{2.17} \]
and
\[ \big|e^{-x} - e^{-y}\big| \le e^{-a} - e^{-b}, \qquad \forall\, x,y \in [a,b]. \tag{2.18} \]
Let g(x) = ln_{1/e} x, sin^{-1} x, cos^{-1} x, and assume f^{(j)}(x₀) = 0, j = 1, ..., N. Then, by [3], we get (f ∘ g^{-1})^{(j)}(g(x₀)) = 0, j = 1, ..., N.

Remark 2.13. It is well known that e^x > x^m, m ∈ ℕ, for large x > 0. Let α, β > 0 be fixed; then ⌈α/β⌉ ∈ ℕ, and for large x > 0 we have
\[ e^x > x^{\lceil \alpha/\beta\rceil} \ge x^{\alpha/\beta}. \]
So for suitable very large x > 0 we find
\[ e^{x^{\beta}} > \big(x^{\beta}\big)^{\alpha/\beta} = x^{\alpha}. \]
We proved for large x > 0 and α, β > 0 that
\[ e^{x^{\beta}} > x^{\alpha}. \tag{2.19} \]
Therefore for large n ∈ ℕ and fixed α, β > 0, we have
\[ e^{2n^{\beta}} > n^{\alpha}. \tag{2.20} \]
That is
\[ e^{-2n^{\beta}} < n^{-\alpha}, \quad \text{for large } n \in \mathbb{N}. \tag{2.21} \]
So for 0 < α < 1 we get
\[ e^{-2n^{(1-\alpha)}} < n^{-\alpha}. \tag{2.22} \]
Thus, given fixed A, B > 0, for the linear combination \(A n^{-\alpha} + B e^{-2n^{(1-\alpha)}}\) the (dominant) rate of convergence to zero is n^{-α}. The closer α is to 1, the faster and better the rate of convergence to zero.
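As in Chapter 1, the operator F_n of Definition 2.7 admits a direct computational sketch (added here for illustration; the test function and parameters are arbitrary choices):

```python
import numpy as np
from math import ceil, floor

def psi(x):
    return 0.25 * (np.tanh(x + 1.0) - np.tanh(x - 1.0))

def F_n(f, x, n, a, b):
    # Formula (2.1): F_n(f, x) = sum_k f(k/n) Psi(nx - k) / sum_k Psi(nx - k),
    # with k ranging over ceil(na), ..., floor(nb)
    ks = np.arange(ceil(n * a), floor(n * b) + 1)
    w = psi(n * x - ks)
    return np.dot(f(ks / n), w) / np.sum(w)

f = lambda t: np.cos(2.0 * t)       # illustrative continuous function
a, b, x = -1.0, 1.0, 0.35
for n in (10, 100, 1000):
    print(n, abs(F_n(f, x, n, a, b) - f(x)))   # error decreases with n
```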
2.3 Real Neural Network Quantitative Approximations

Here we give a series of neural network approximations with rates to a given function. We first present

Theorem 2.14. Let f ∈ C([a,b]), 0 < α < 1, n ∈ ℕ, x ∈ [a,b]. Then
i)
\[ |F_n(f,x) - f(x)| \le (4.1488766)\left[\omega_1\!\left(f,\tfrac{1}{n^{\alpha}}\right) + 2e^4\,\|f\|_{\infty}\,e^{-2n^{(1-\alpha)}}\right] =: \lambda^*, \tag{2.23} \]
and
ii)
\[ \|F_n(f) - f\|_{\infty} \le \lambda^*, \tag{2.24} \]
where ‖·‖_∞ is the supremum norm.
Proof. We see that nb k f − f (x) Ψ (nx − k) ≤ n k=na k f − f (x) Ψ (nx − k) = n
nb
k=na nb
⎧ ⎨ ⎩
k k = na
− x ≤ 1α n n nb
⎧ ⎨ ⎩
kk = na
− x > 1α n n nb
⎧ ⎨ ⎩
kk = na
− x ≤ 1α n n
f k − f (x) Ψ (nx − k) + n
f k − f (x) Ψ (nx − k) ≤ n
k ω1 f, − x Ψ (nx − k) + n
nb
2 f ∞
⎧ ⎨ ⎩
1 ω1 f, α n
k = na
|k − nx| > n1−α ∞ ⎧ ⎨ ⎩
2 f ∞
∞ ⎧ ⎨
Ψ (nx − k) ≤
Ψ (nx − k) +
k k = −∞ 1 − x ≤ α n
n
Ψ (nx − k)
≤
(by Theorem 2.4)
k = −∞ |k − nx| > n1−α (1−α) 1 ω1 f, α + 2 f ∞ e4 e−2n n (1−α) 1 = ω1 f, α + 2e4 f ∞ e−2n . n
⎩
That is nb k ≤ ω1 f, 1 + 2e4 f e−2n(1−α) . f − f (x) Ψ (nx − k) ∞ n nα k=na Using (2.6) we prove the claim. Next we give Theorem 2.15. Let f ∈ CB (R), 0 < α < 1, n ∈ N, x ∈ R. Then i) F n (f, x) − f (x) ≤ ω1 f, 1 + 2e4 f e−2n(1−α) =: μ, ∞ nα and ii)
F n (f ) − f ≤ μ. ∞
(2.25)
(2.26)
Proof. We see that
$$\left|\overline{F}_n(f,x)-f(x)\right| = \left|\sum_{k=-\infty}^{\infty}f\!\left(\tfrac{k}{n}\right)\Psi(nx-k) - f(x)\sum_{k=-\infty}^{\infty}\Psi(nx-k)\right| = \left|\sum_{k=-\infty}^{\infty}\left(f\!\left(\tfrac{k}{n}\right)-f(x)\right)\Psi(nx-k)\right|$$
$$\le \sum_{\substack{k=-\infty\\ \left|\frac{k}{n}-x\right|\le\frac{1}{n^{\alpha}}}}^{\infty}\left|f\!\left(\tfrac{k}{n}\right)-f(x)\right|\Psi(nx-k) + \sum_{\substack{k=-\infty\\ \left|\frac{k}{n}-x\right|>\frac{1}{n^{\alpha}}}}^{\infty}\left|f\!\left(\tfrac{k}{n}\right)-f(x)\right|\Psi(nx-k)$$
$$\le \omega_1\!\left(f,\frac{1}{n^{\alpha}}\right)\sum_{\substack{k=-\infty\\ \left|\frac{k}{n}-x\right|\le\frac{1}{n^{\alpha}}}}^{\infty}\Psi(nx-k) + 2\|f\|_{\infty}\sum_{\substack{k=-\infty\\ |k-nx|>n^{1-\alpha}}}^{\infty}\Psi(nx-k) \le \omega_1\!\left(f,\frac{1}{n^{\alpha}}\right) + 2e^{4}\|f\|_{\infty}e^{-2n^{(1-\alpha)}},$$
proving the claim. $\square$

In the next we discuss high order of approximation by using the smoothness of $f$.

Theorem 2.16. Let $f\in C^N([a,b])$, $n,N\in\mathbb{N}$, $0<\alpha<1$, $x\in[a,b]$. Then
i)
$$|F_n(f,x)-f(x)| \le (4.1488766)\cdot\left\{\sum_{j=1}^{N}\frac{\left|f^{(j)}(x)\right|}{j!}\left[\frac{1}{n^{\alpha j}} + (b-a)^j e^{4}e^{-2n^{(1-\alpha)}}\right] + \omega_1\!\left(f^{(N)},\frac{1}{n^{\alpha}}\right)\frac{1}{n^{\alpha N}N!} + \frac{2e^{4}\left\|f^{(N)}\right\|_{\infty}(b-a)^N}{N!}\,e^{-2n^{(1-\alpha)}}\right\}, \quad (2.27)$$
ii) suppose further $f^{(j)}(x_0)=0$, $j=1,\dots,N$, for some $x_0\in[a,b]$; it holds
$$|F_n(f,x_0)-f(x_0)| \le (4.1488766)\cdot\left[\omega_1\!\left(f^{(N)},\frac{1}{n^{\alpha}}\right)\frac{1}{n^{\alpha N}N!} + \frac{2e^{4}\left\|f^{(N)}\right\|_{\infty}(b-a)^N}{N!}\,e^{-2n^{(1-\alpha)}}\right], \quad (2.28)$$
notice here the extremely high rate of convergence at $n^{-(N+1)\alpha}$,
iii)
$$\|F_n(f)-f\|_{\infty} \le (4.1488766)\cdot\left\{\sum_{j=1}^{N}\frac{\left\|f^{(j)}\right\|_{\infty}}{j!}\left[\frac{1}{n^{\alpha j}} + (b-a)^j e^{4}e^{-2n^{(1-\alpha)}}\right] + \omega_1\!\left(f^{(N)},\frac{1}{n^{\alpha}}\right)\frac{1}{n^{\alpha N}N!} + \frac{2e^{4}\left\|f^{(N)}\right\|_{\infty}(b-a)^N}{N!}\,e^{-2n^{(1-\alpha)}}\right\}. \quad (2.29)$$
Proof. Next we apply Taylor’s formula with integral remainder. We have (here nk , x ∈ [a, b]) f
j k N k − tN −1 n k f (j) (x) k = −x + f (N ) (t) − f (N ) (x) n dt. n j! n (N − 1)! x j=0
Then f
j N k f (j) (x) k Ψ (nx − k) = Ψ (nx − k) −x + n j! n j=0
k n
Ψ (nx − k) x
Hence
nb
k=na
k − tN −1 (N) (N ) f (t) − f (x) n dt. (N − 1)!
nb k f Ψ (nx − k) − f (x) Ψ (nx − k) = n k=na
nb N f (j) (x)
j!
j=1
k=na
nb
Ψ (nx − k) x
k=na
Ψ (nx − k)
k n
j k −x + n
k − tN −1 (N) (N ) f (t) − f (x) n dt. (N − 1)!
Therefore ⎛
⎞
nb
Fn∗ (f, x) − f (x) ⎝
Ψ (nx − k)⎠ =
N f (j) (x) j=1
k=na
j!
j Fn∗ (· − x) + Λn (x) ,
where
nb
Λn (x) :=
Ψ (nx − k)
k n
x
k=na
We suppose that b− a >
1 nα , which 1 −α
that is when n > (b − a) . k Thus n − x ≤ n1α or nk − x > As in [2], pp. 72-73 for γ := x
k n
k − tN −1 (N ) (N) f (t) − f (x) n dt. (N − 1)! is always the case for large enough n ∈ N, 1 nα .
f (N ) (t) − f (N ) (x)
k − tN −1 n dt, (N − 1)!
in case of $\left|\frac{k}{n}-x\right|\le\frac{1}{n^{\alpha}}$,
we find that 1 (N ) 1 |γ| ≤ ω1 f , α αN n n N!
(for x ≤ nk or x ≥ nk ). Notice also for x ≤
k n
that
k n k − tN−1 (N ) (N ) n f (t) − f (x) dt ≤ x (N − 1)! k − tN −1 (N ) (N ) (t) − f (x) n dt ≤ f (N − 1)! x k N−1 k −xN (b − a)N (N ) n −t n dt = 2 f ≤ 2 f (N ) . (N − 1)! N! N! ∞ ∞
2 f (N )
∞
k n
x k n
Next suppose
k n
≤ x, then k n k − tN−1 (N ) (N ) n f (t) − f (x) dt = x (N − 1)! x k − tN −1 (N) (N ) n f (t) − f (x) dt ≤ k (N − 1)! n
t − k N −1 (N) (N ) n (t) − f (x) dt ≤ f k (N − 1)! n k N −1 x − k N (b−a)N t− n (N ) n dt = 2 f ≤ 2 f (N ) . (N − 1)! N! N! ∞ ∞
2 f (N )
∞
x k n
Thus
x
|γ| ≤ 2 f (N )
∞
N
(b − a) , N!
in all two cases. Therefore nb
Λn (x) =
k=na | nk −x|≤ n1α
Hence nb
|Λn (x)| ≤
k=na
| nk −x|≤ n1α
nb
Ψ (nx − k) γ +
Ψ (nx − k) γ.
k=na | nk −x|> n1α
1 1 Ψ (nx − k) ω1 f (N ) , α + n N !nN α
⎛ ⎜ ⎜ ⎜ ⎝
⎞ nb
k=na
⎟ (b − a)N ⎟ Ψ (nx − k)⎟ 2 f (N ) ≤ ⎠ N! ∞
| nk −x|> n1α (b − a)N (1−α) 1 1 ω1 f (N ) , α + 2 f (N) e4 e−2n . N α n N !n N! ∞ Consequently we have (N ) f (b − a)N (1−α) 1 (N ) 1 4 ∞ |Λn (x)| ≤ ω1 f , α + 2e e−2n . αN n n N! N! We further observe that j nb k j Fn∗ (· − x) = Ψ (nx − k) −x . n k=na
Therefore
j nb k ∗ j Ψ (nx − k) − x = Fn (· − x) ≤ n k=na
nb
⎧ ⎨ ⎩
kk = na
− x ≤ 1α n
j k Ψ (nx − k) − x + n ⎧ ⎨ ⎩
n
1 nαj
nb
kk = na
− x > 1α n
n
nb
nb
⎧ ⎨ ⎩
Ψ (nx − k) + (b − a)
kk = na
− x ≤ 1α n
j ⎧ ⎨ ⎩
n
≤ Hence
j k Ψ (nx − k) − x ≤ n
Ψ (nx − k)
k = na
|k − nx| > n1−α
(1−α) 1 j + (b − a) e4 e−2n . nαj
(1−α) 1 ∗ j j , Fn (· − x) ≤ αj + (b − a) e4 e−2n n
for j = 1, ..., N. Putting things together we have proved |Fn∗
N (j) f (x) 1 j −2n(1−α) 4 (f, x) − f (x)| ≤ + e (b − a) e + j! nαj j=1
N 2e4 f (N ) ∞ (b − a) −2n(1−α) 1 (N ) 1 ω1 f , α + e , n nαN N ! N!
that is establishing theorem. We make Remark 2.17. We notice that Fn (f, x) −
N f (j) (x) j=1
Fn∗ (f, x) nb k=na Ψ (nx
j!
Fn (· − x)j − f (x) =
⎛ ⎞ n (j) 1 f (x) ∗ − ⎝ Fn (· − x)j ⎠ nb j! − k) j=1 k=na Ψ (nx − k) −f (x) = nb
1
· Ψ (nx − k) k=na ⎡ ⎛ ⎞ ⎛ ⎞ ⎤ nb n (j) f (x) j ⎣Fn∗ (f, x) − ⎝ Fn∗ (· − x) ⎠ − ⎝ Ψ (nx − k)⎠ f (x)⎦ . j! j=1 k=na
Therefore we find N (j) f (x) j Fn (f, x) − ≤ (4.1488766) · F (· − x) − f (x) n j! j=1 ⎛ ⎞ ⎛ ⎞ nb n (j) ∗ f (x) ∗ j ⎠ Fn (f, x) − ⎝ , ⎝ ⎠ F (· − x) − Ψ (nx − k) f (x) n j! j=1 k=na (2.30) ∀ x ∈ [a, b] . In the next three Theorems 2.18-2.20 we present more general and flexible upper bounds to our error quantities. We give Theorem 2.18. Let f ∈ C N ([a, b]), n, N ∈ N, 0 < α < 1, x ∈ [a, b]. Then 1) (j) −x N 1 f ◦ ln (e ) j e Fn e−· − e−x , x − f (x) ≤ Fn (f, x) − j! j=1
(N) e−α e−αN (4.1488766) ω1 f ◦ ln 1e , α + N !nN α n
N 2e4 e−a − e−b N! 2)
(N ) f ◦ ln 1 e
e
−2n(1−α)
(2.31)
,
∞
|Fn (f, x) − f (x)| ≤ (4.1488766) · (2.32) ⎧ (j) ⎪ ⎪ (e−x ) N f ◦ ln 1 ⎨ e 1 j −2n(1−α) −aj 4 e + e (b − a) e + ⎪ j! nαj ⎪ ⎩ j=1
(N ) e−a e−aN 1 ω f ◦ ln , α + 1 e N !nαN n
N (N ) −2n(1−α) 2e4 e−a − e−b f ◦ ln 1 e , e N! ∞ 3) If f (j) (x0 ) = 0, j = 1, ..., N , it holds |Fn (f, x0 ) − f (x0 )| ≤ (4.1488766) · −aN (N ) e−a e ω1 f ◦ ln 1e , α + N !nN α n
N (N ) 2e4 e−a − e−b (1−α) f ◦ ln 1 e−2n . e N! ∞ Observe here the speed of convergence is extremely high at
(2.33)
1 . n(N +1)α
Proof. Call F := f ◦ ln 1e . Let x, nk ∈ [a, b]. Then N j k F (j) (e−x ) − k k −x n f − f (x) = e −e + K N x, , n j! n j=1
where KN
k n
k
e− n − e−s
k x, n
:= −
1 · (N − 1)!
N −1 F (N) e−s − F (N ) e−x e−s ds.
x
Thus
nb
k=na
nb k Ψ (nx − k) f − f (x) Ψ (nx − k) = n
nb N F (j) (e−x ) j=1
j!
k=na
k=na
nb k j Ψ (nx − k) e− n − e−x + Ψ (nx − k) k=na
1 · (N − 1)!
k n
k N −1 e− n − e−s F (N) e−s − F (N ) e−x de−s .
x
⎛
Therefore
Fn∗ (f, x) − f (x) ⎝
nb
⎞ Ψ (nx − k)⎠ =
k=na N F (j) (e−x )
j!
j=1
Fn∗
where
e−· − e−x
j
, x + Un (x) ,
nb
Un (x) :=
Ψ (nx − k) μ,
k=na
with e− nk k N −1 1 μ := e− n − w F (N) (w) − F (N ) e−x dw. (N − 1)! e−x Case of nk − x ≤ n1α . k i) Subcase of x ≥ nk . I.e. e− n ≥ e−x .
1 |μ| ≤ (N − 1)! 1 (N − 1)!
k
e− n
e−x
−k n
e
e−x
k
e− n − w
k
e− n − w
N −1 (N ) (w) − F (N) e−x dw ≤ F
N−1
ω1 F (N) , w − e−x dw ≤
k e− nk k N −1 1 (N) − n −x ω1 F , e −e e− n − w dw ≤ (N − 1)! e−x k N e− n − e−x k (N ) −a ω1 F , e x − ≤ N! n N k k −aN x − n (N ) −a e ω1 F , e x − ≤ N! n −a e−aN (N) e ω F , . 1 N !nαN nα Hence when x ≥
k n
we derive |μ| ≤
−a e−aN (N ) e ω F , . 1 N !nαN nα
ii) Subcase of |μ| ≤
k
≥ x. Then e− n ≤ e−x and
k n
1 (N − 1)!
1 (N − 1)!
e−x −k n
k
w − e− n
N −1 (N ) (w) − F (N ) e−x dw ≤ F
e
e−x
k e− n
N −1 k w − e− n ω1 F (N ) , w − e−x dw ≤
1 k ω1 F (N ) , e−x − e− n (N − 1)!
e−x k
e− n
k
w − e− n
N −1
dw ≤
e−x − e− nk N 1 k ω1 F (N ) , e−a x − ≤ (N − 1)! n N N 1 e−a k ω1 F (N) , α e−aN x − ≤ N! n n 1 e−a e−aN ω1 F (N ) , α , N! n nαN
i.e.
−a e−aN (N ) e ω F , , 1 N !nαN nα ≥ x. So in general when nk − x ≤ n1α we proved that |μ| ≤
when
k n
−a e−aN (N ) e |μ| ≤ ω1 F , α . N !nαN n Also we observe: i)’ When nk ≤ x, we obtain ⎛ ⎞ e− nk N −1 1 k ⎝ |μ| ≤ e− n − w dw⎠ 2 F (N ) (N − 1)! ∞ −x e k N − −x N 2 F (N) ∞ e n − e 2 F (N) ∞ −a = ≤ e − e−b . (N − 1)! N N! ≥ x, we get ' −x ( e N −1 1 k (N) −n |μ| ≤ w − e dw 2 F k (N − 1)! ∞ e− n
ii)’ When
k n
N N 2 F (N ) ∞ −x 2 F (N ) ∞ −a k −n = e −e ≤ e − e−b . N! N!
We proved always true that N 2 e−a − e−b F (N ) ∞ |μ| ≤ . N! Consequently we derive nb
|Un (x)| ≤
nb
⎧ ⎨ ⎩
Ψ (nx − k) |μ| +
kk = na
− x ≤ 1α n
⎧ ⎨ ⎩
n
−a e−aN (N ) e ω F , 1 N !nαN nα ⎛
'
' (⎜ ⎜ −a −b N (N) ⎜ 2 e −e F ∞ ⎜ ⎜⎧ N! ⎜⎨ ⎝
Ψ (nx − k) |μ| ≤
kk = na
− x > 1α n
∞
n
( Ψ (nx − k) +
k=−∞
∞
⎞ ⎟ ⎟ ⎟ Ψ (nx − k)⎟ ⎟≤ ⎟ ⎠
k = −∞ |k − nx| > n1−α ' ( N 2 e−a − e−b F (N ) ⎩
−a e−aN (N ) e ω F , + 1 N !nαN nα
∞
N!
e4 e−2n
(1−α)
.
So we have proved that N −a 2e4 e−a − e−b F (N ) ∞ −2n(1−α) e−aN (N ) e |Un (x)| ≤ ω1 F , α + e . N !nαN n N! We also notice that j j ∗ Fn e−· − e−x , x ≤ Fn∗ e−· − e−x , x ≤ e−aj Fn∗ ⎡
⎛ ⎞ j nb k |· − x| , x = e−aj ⎝ Ψ (nx − k) − x ⎠ = e−aj · n j
k=na
⎤
⎢ ⎥ ⎢ j j ⎥ nb nb ⎢ k k ⎥ ⎢ Ψ (nx−k) −x + Ψ (nx−k) −x ⎥ ⎢⎧ ⎥≤ n n ⎧ ⎢⎨ ⎥ ⎨ ⎣ k = na
k = na
⎦ ⎩ x − k ≤ 1 ⎩ x − k > 1 α α n n n n
⎡
⎤
⎢ ⎥ ⎢ ⎥ nb ⎢ ⎥ 1 j ⎥≤ e−aj ⎢ + (b − a) Ψ (nx − k) ⎢ nαj ⎥ ⎧ ⎢ ⎥ ⎨ ⎣ ⎦ k = na
k 1 ⎩ x − > n nα 1 j −2n(1−α) −aj 4 e + e (b − a) e . nαj Thus we have established j 1 ∗ j −2n(1−α) 4 + e (b − a) e , Fn e−· − e−x , x ≤ e−aj nαj for j = 1, ..., N , and the theorem. We continue with Theorem 2.19. Let f ∈ C N − π2 + ε, π2 − ε , n, N ∈ N, 0 < ε < π2 , ε small, x ∈ − π2 + ε, π2 − ε , 0 < α < 1. Then 1) N −1 (j) f ◦ sin (sin x) j Fn (f, x) − ≤ F (sin · − sin x) , x − f (x) n j! j=1 ⎡ (4.1488766) ⎣
(N ) 1 ω1 f ◦ sin−1 , nα nαN N !
(2.34) +
⎤ (N) ⎞ e4 2N +1 f ◦ sin−1 (1−α) ∞ ⎠ −2n ⎝ ⎦, e N! ⎛
2) |Fn (f, x) − f (x)| ≤ (4.1488766) · (2.35) ⎧ (j) N f ◦ sin−1 ⎨ (sin x) 1 j −2n(1−α) 4 + e (π − 2ε) e + ⎩ j! nαj j=1
⎡ ⎣
ω1
f ◦ sin−1
(N )
N !nαN
, n1α
+
e4 2N +1 N!
⎤⎫ ⎬ (1−α) (N) ⎦ , e−2n f ◦ sin−1 ⎭ ∞
3) suppose further f (j) (x0 ) = 0, j = 1, ..., N for some x0 ∈ − π2 + ε, π2 − ε , it holds |Fn (f, x0 ) − f (x0 )| ≤ (4.1488766) · (2.36)
⎡ ⎣
ω1
f ◦ sin−1
(N)
, n1α
⎤ (N ) ⎞ e4 2N +1 f ◦ sin−1 (1−α) ∞ ⎠ −2n ⎦. +⎝ e N!
⎛
nαN N !
Notice in the last the high speed of convergence of order n−α(N +1) . Proof. Call F := f ◦ sin−1 and let nk , x ∈ − π2 + ε, π2 − ε . Then f
1 (N − 1)! Hence
j N k F (j) (sin x) k − f (x) = sin − sin x + n j! n j=1
k n
x
N −1 k sin − sin s F (N ) (sin s) − F (N ) (sin x) d sin s. n
nb
k=na
nb k f Ψ (nx − k) − f (x) Ψ (nx − k) = n k=na
nb N F (j) (sin x) j=1
j!
k=na k n
x
j nb k 1 Ψ (nx−k) sin −sin x + Ψ (nx−k) · n (N −1)! k=na
N −1 k sin − sin s F (N ) (sin s) − F (N ) (sin x) d sin s. n
Set here a = − π2 + ε, b =
π 2
− ε. Thus nb
Fn∗
(f, x) − f (x)
Ψ (nx − k) =
k=na N F (j) (sin x) j=1
j!
j Fn∗ (sin · − sin x) , x + Mn (x) ,
where
nb
Mn (x) :=
Ψ (nx − k) ρ,
k=na
with N −1 nk 1 k sin − sin s F (N ) (sin s) − F (N ) (sin x) d sin s. (N − 1)! x n k Case of n − x ≤ n1α . i) Subcase of nk ≥ x. The function sin is increasing on [a, b] , i.e. sin nk ≥ sin x. ρ :=
Then |ρ| ≤
1 (N − 1)!
N−1 k (N ) sin − sin s (sin s) − F (N) (sin x) d sin s = F n
k n
x
N −1 k sin − w F (N ) (w) − F (N ) (sin x) dw ≤ n sin x N −1 sin nk 1 k sin − w ω1 F (N ) , |w − sin x| dw ≤ (N − 1)! sin x n N sin nk − sin x k (N ) ω1 F , sin − sin x ≤ n N! N k k −x 1 1 n ω1 F (N ) , − x ≤ ω1 F (N ) , α . n N! n nαN N !
1 (N − 1)!
So if
k n
sin
k n
≥ x, then
ii) Subcase of 1 |ρ| ≤ (N − 1)! 1 (N − 1)!
1 (N ) 1 |ρ| ≤ ω1 F , α . n N !nαN ≤ x, then sin nk ≤ sin x. Hence
k n
x
k sin s − sin n
k n
1 (N − 1)!
sin x sin
w − sin
k n
sin x
sin
k n
k n
w − sin
N −1 (N ) (sin s) − F (N ) (sin x) d sin s = F N −1 (N ) (w) − F (N ) (sin x) dw ≤ F k n
N−1
ω1 F (N) , |w − sin x| dw ≤
sin x N −1 1 k k (N ) ω1 F , sin x − sin w − sin dw ≤ k (N − 1)! n n sin n N 1 k sin x − sin kk (N ) ω1 F , x − ≤ (N − 1)! n N N 1 k 1 (N ) 1 (N ) 1 ω1 F , α x − ≤ αN ω1 F , α . N! n n n N! n We have shown for
k n
≤ x that 1 1 |ρ| ≤ ω1 F (N ) , α . αN n n N!
So in both cases we got that
1 (N ) 1 |ρ| ≤ ω1 F , α , n nαN N !
when nk − x ≤ n1α . Also in general ( nk ≥ x case) ' k ( N −1 n 1 k |ρ| ≤ sin − sin s d sin s 2 F (N ) = (N − 1)! n ∞ x ' ( N −1 k sin n 1 k sin − w dw 2 F (N ) = (N − 1)! n ∞ sin x 1 N!
N k 2N +1 (N ) sin − sin x 2 F (N ) ≤ F . n N! ∞ ∞
≤ x) we derive ' ( N −1 x 1 k |ρ| ≤ sin s − sin d sin s 2 F (N ) = k (N − 1)! n ∞ n
Also (case of
k n
1 (N − 1)!
'
sin x sin
k n
k w − sin n
N −1
(
dw 2 F (N)
∞
=
N sin x − sin nk 1 2N +1 (N ) 2 F (N ) ≤ F . (N − 1)! N N! ∞ ∞ So we proved in general that |ρ| ≤
2N +1 (N ) F . N! ∞
Therefore we have proved nb
|Mn (x)| ≤
k=na (k:|x− nk |≤ n1α )
'
nb
Ψ (nx − k) |ρ| +
Ψ (nx − k) |ρ| ≤
k=na (k:|x− nk |> n1α )
( ω1 F (N ) , n1α e4 2N +1 (N) −2n(1−α) + . F e N !nαN N! ∞
So that ω1 F (N) , n1α e4 2N +1 (N ) −2n(1−α) |Mn (x)| ≤ + . F e N !nαN N! ∞
Next we estimate ∗ j j Fn (sin · − sin x) , x ≤ Fn∗ |sin · − sin x| , x ≤ Fn∗
j nb k |· − x| , x = Ψ (nx − k) − x ≤ n j
k=na
(work as before) (1−α) 1 j + e4 (π − 2ε) e−2n . nαj
Therefore
(1−α) 1 ∗ j j , Fn (sin · − sin x) , x ≤ αj + e4 (π − 2ε) e−2n n
j = 1, ..., N. The theorem is established. We finally present Theorem 2.20. Let f ∈ C N ([ε, π − ε]), n, N ∈ N, ε > 0 small, x ∈ [ε, π − ε], 0 < α < 1. Then 1) (j) N f ◦ cos−1 (cos x) j Fn (f, x) − ≤ F (cos · − cos x) , x − f (x) n j! j=1 ⎡ (4.1488766) ⎣
(N) 1 ω1 f ◦ cos−1 , nα nαN N !
(2.37) +
⎤ (N) ⎞ e4 2N +1 f ◦ cos−1 (1−α) ∞ ⎠ −2n ⎝ ⎦, e N! ⎛
2)
⎡ ⎣
ω1
|Fn (f, x) − f (x)| ≤ (4.1488766) · (2.38) ⎧ (j) N f ◦ cos−1 ⎨ (cos x) 1 j −2n(1−α) 4 + e (π − 2ε) e + ⎩ j! nαj j=1
f ◦ cos−1
(N )
N !nαN
, n1α
+
4 N +1
e 2 N!
⎤⎫ ⎬ (1−α) (N ) ⎦ , e−2n f ◦ cos−1 ⎭ ∞
3) assume further f (j) (x0 ) = 0, j = 1, ..., N for some x0 ∈ [ε, π − ε], it holds |Fn (f, x0 ) − f (x0 )| ≤ (4.1488766) · (2.39)
⎡ ⎣
ω1
f ◦ cos−1
(N)
, n1α
nαN N !
⎤ (N ) ⎞ e4 2N +1 f ◦ cos−1 (1−α) ∞ ⎠ −2n ⎦. +⎝ e N! ⎛
Notice in the last the high speed of convergence of order n−α(N +1) . Proof. Call F := f ◦ cos−1 and let f
k n, x
∈ [ε, π − ε]. Then
j N k F (j) (cos x) k − f (x) = cos − cos x + n j! n j=1
1 (N − 1)!
k n
x
Hence
nb
k=na
k cos − cos s n
j!
k n
k=na
k=na
cos
x
F (N ) (cos s) − F (N ) (cos x) d cos s.
nb k f Ψ (nx − k) − f (x) Ψ (nx − k) = n
nb N F (j) (cos x) j=1
N −1
j nb k 1 Ψ (nx−k) cos −cos x + Ψ (nx−k) · n (N −1)! k=na
k − cos s n
N −1
F (N ) (cos s) − F (N ) (cos x) d cos s.
Set here a = ε, b = π − ε. Thus nb
Fn∗ (f, x) − f (x)
Ψ (nx − k) =
k=na N F (j) (cos x) j=1
j!
j Fn∗ (cos · − cos x) , x + Θn (x) ,
where
nb
Θn (x) :=
Ψ (nx − k) λ,
k=na
with λ :=
1 (N −1)!
x
k n
N −1 k cos −cos s F (N) (cos s) − F (N ) (cos x) d cos s = n
N−1 cos nk 1 k cos − w F (N ) (w) − F (N ) (cos x) dw. (N − 1)! cos x n k Case of n − x ≤ n1α .
i) Subcase of nk ≥ x. The function cos ine is decreasing on [a, b] , i.e. cos nk ≤ cos x. Then N −1 cos x 1 k (N ) λ ≤ w − cos (w) − F (N ) (cos x) dw ≤ F (N − 1)! cos nk n N −1 k w − cos ω1 F (N ) , |w − cos x| dw ≤ k n cos n N cos x − cos nk k (N ) ω1 F , cos x − cos ≤ n N! N k x − nk 1 1 (N ) ω1 F , x − ≤ ω1 F (N ) , α . n N! n nαN N !
1 (N − 1)!
So if
k n
cos x
≥ x, then
ii) Subcase of λ ≤
1 λ ≤ ω1 F (N) , 1 . nα nαN N !
k n
1 (N − 1)!
≤ x, then cos nk ≥ cos x. Hence
cos
k n
cos x
N −1 k (N ) cos − w (w) − F (N ) (cos x) dw ≤ F n
N −1 k cos − w ω1 F (N ) , w − cos x dw ≤ n cos x cos k N−1 n 1 k k ω1 F (N ) , cos − cos x cos − w dw ≤ (N − 1)! n n cos x N k cos − cos x 1 (N ) k n ω1 F , − x ≤ (N − 1)! n N N k k 1 1 1 1 ω1 F (N ) , − x − x ≤ ω1 F (N ) , α . N! n n N! n nαN 1 (N − 1)!
We proved for
k n
cos
k n
≤ x, that
1 λ ≤ ω1 F (N) , 1 . nα N !nαN
So in both cases we got that 1 λ ≤ ω1 F (N) , 1 , nα nαN N ! when nk − x ≤
1 . nα
Also in general ( nk ≥ x case) ' N −1 ( cos x 1 k λ ≤ w − cos dw 2 F (N) ≤ k (N − 1)! n ∞ cos n N 1 k 2N +1 (N ) cos x − cos 2 F (N) ≤ F . N! n N! ∞ ∞ ≤ x) we obtain ' N −1 ( k cos n 1 k λ ≤ cos − w dw 2 F (N) = (N − 1)! n ∞ cos x N 1 k 2N +1 (N ) cos − cos x 2 F (N) ≤ F . N! n N! ∞ ∞ So we have shown in general that 2N +1 (N ) λ ≤ F . N! ∞ Also (case of
k n
Therefore we derive nb
|Θn (x)| ≤
Ψ (nx − k) λ +
k=na
nb
Ψ (nx − k) λ ≤
k=na
(k:|x− nk |≤ n1α ) (k:|x− nk |> n1α ) N +1 1 1 (N) −2n(1−α) 42 ω1 F (N) , α + e . F e n nαN N ! N! ∞ So that
N +1 ω1 F (N ) , n1α (N) −2n(1−α) 42 |Θn (x)| ≤ + e F . e nαN N ! N! ∞
Next we estimate ∗ j j Fn (cos · − cos x) , x ≤ Fn∗ |cos · − cos x| , x ≤ Fn∗
j nb k |· − x| , x = Ψ (nx − k) − x ≤ n j
k=na
(work as before)
(1−α) 1 j + e4 (π − 2ε) e−2n . αj n
Consequently, (1−α) 1 ∗ j j , Fn (cos · − cos x) , x ≤ αj + e4 (π − 2ε) e−2n n j = 1, ..., N. The theorem is proved.
2.4 Complex Neural Network Quantitative Approximations
We make Remark 2.21. Let X := [a, b], √ R and f : X → C with real and imaginary parts f1 , f2 : f = f1 + if2 , i = −1. Clearly f is continuous iff f1 and f2 are continuous. Also it holds (j) (j) f (j) (x) = f1 (x) + if2 (x) , (2.40) for all j = 1, ..., N , given that f1 , f2 ∈ C N (X), N ∈ N. We denote by CB (R, C) the space of continuous and bounded functions f : R → C. Clearly f is bounded, iff both f1 , f2 are bounded from R into R, where f = f1 + if2 . Here we define Fn (f, x) := Fn (f1 , x) + iFn (f2 , x) ,
(2.41)
F n (f, x) := F n (f1 , x) + iF n (f2 , x) .
(2.42)
and We see here that |Fn (f, x) − f (x)| ≤ |Fn (f1 , x) − f1 (x)| + |Fn (f2 , x) − f2 (x)| ,
(2.43)
Fn (f ) − f ∞ ≤ Fn (f1 ) − f1 ∞ + Fn (f2 ) − f2 ∞ .
(2.44)
and
Similarly we obtain F n (f, x) − f (x) ≤ F n (f1 , x) − f1 (x) + F n (f2 , x) − f2 (x) ,
(2.45)
and F n (f ) − f ≤ F n (f1 ) − f1 + F n (f2 ) − f2 . ∞ ∞ ∞
(2.46)
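Following (2.41), the complex case reduces to two real approximations. The Python sketch below is only an illustration: the compact helper `Fn_real` re-implements the real operator on $[a,b]$ for the purpose of the example and is not taken from the text.

```python
import numpy as np

def Psi(x):
    return 0.25 * (np.tanh(x + 1.0) - np.tanh(x - 1.0))

def Fn_real(f, x, n, a, b):
    k = np.arange(np.ceil(n * a), np.floor(n * b) + 1)
    w = Psi(n * x - k)
    return np.dot(f(k / n), w) / np.sum(w)

def Fn_complex(f, x, n, a, b):
    """F_n(f,x) := F_n(f1,x) + i F_n(f2,x), with f = f1 + i f2 as in (2.41)."""
    f1 = lambda t: np.real(f(t))
    f2 = lambda t: np.imag(f(t))
    return Fn_real(f1, x, n, a, b) + 1j * Fn_real(f2, x, n, a, b)

# example: f(t) = e^{i t} = cos t + i sin t on [0, 1]
f = lambda t: np.exp(1j * t)
for n in (10, 100, 1000):
    x = 0.3
    print(n, abs(Fn_complex(f, x, n, 0.0, 1.0) - f(x)))
```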
We give Theorem 2.22. Let f ∈ C ([a, b] , C) , f = f1 + if2 , 0 < α < 1, n ∈ N, x ∈ [a, b]. Then i) |Fn (f, x) − f (x)| ≤ (4.1488766) · (2.47) (1−α) 1 1 ω 1 f 1 , α + ω1 f 2 , α + 2e4 (f1 ∞ + f2 ∞ ) e−2n =: Φ1 , n n and ii) Fn (f ) − f ∞ ≤ Φ1 .
(2.48)
Proof. Based on Remark 2.21 and Theorem 2.14. We give Theorem 2.23. Let f ∈ CB (R, C), f = f1 + if2 , 0 < α < 1, n ∈ N, x ∈ R. Then i) F n (f, x) − f (x) ≤ ω1 f1 , 1 + ω1 f2 , 1 + (2.49) nα nα 2e4 (f1 ∞ + f2 ∞ ) e−2n ii)
F n (f ) − f
∞
(1−α)
=: Φ2 ,
≤ Φ2 .
(2.50)
Proof. Based on Remark 2.21 and Theorem 2.15. Next we present a result of high order complex neural network approximation. Theorem 2.24. Let f : [a, b] → C, [a, b] ⊂ R, such that f = f1 +if2. Suppose f1 , f2 ∈ C N ([a, b]) , n, N ∈ N, 0 < α < 1, x ∈ [a, b]. Then i) |Fn (f, x) − f (x)| ≤ (4.1488766) · (2.51) ⎧ N (f (j) (x) + f (j) (x)) ⎨ 1 2 1 j −2n(1−α) 4 + e (b − a) e + ⎩ j! nαj j=1
⎡ ⎣ ⎛ ⎝
(N ) (N ) (ω1 f1 , n1α + ω1 f2 , n1α ) nαN N !
+
(N ) 2e4 f1
⎞ ⎤⎫ (N) ⎬ + f2 (b − a)N (1−α) ∞ ∞ ⎠ e−2n ⎦ , ⎭ N!
(j)
(j)
ii) assume further f1 (x0 ) = f2 (x0 ) = 0, j = 1, ..., N , for some x0 ∈ [a, b], it holds |Fn (f, x0 ) − f (x0 )| ≤ (4.1488766) · (2.52) ⎡ (N ) (N ) (ω1 f1 , n1α + ω1 f2 , n1α ) ⎣ + nαN N ! ⎛ ⎝
(N ) 2e4 f1
⎞ ⎤ (N ) N + f2 (b − a) (1−α) ∞ ∞ ⎠ e−2n ⎦, N!
notice here the extremely high rate of convergence at n−(N+1)α ,
iii) Fn (f ) − f ∞ ≤ (4.1488766) · (j) + f2 1 j −2n(1−α) 4 ∞ ∞ + e (b − a) e + j! nαj
⎧ (j) N ⎨ f1 ⎩
j=1
⎡ (N ) (N ) ω1 f1 , n1α + ω1 f2 , n1α ⎣ + nαN N ! (N ) 2e4 f1
∞
(N ) + f2 (b − a)N ∞
N!
⎤⎫ ⎬ (1−α) ⎦ . e−2n ⎭
Proof. Based on Remark 2.21 and Theorem 2.16.
(2.53)
References
[1] Anastassiou, G.A.: Rate of convergence of some neural network operators to the unit-univariate case. J. Math. Anal. Appli. 212, 237–262 (1997)
[2] Anastassiou, G.A.: Quantitative Approximations. Chapman & Hall/CRC, Boca Raton (2001)
[3] Anastassiou, G.A.: Basic Inequalities, Revisited. Mathematica Balkanica, New Series 24, Fasc. 1-2, 59–84 (2010)
[4] Anastassiou, G.A.: Univariate hyperbolic tangent neural network approximation. Mathematics and Computer Modelling 53, 1111–1132 (2011)
[5] Chen, Z., Cao, F.: The approximation operators with sigmoidal functions. Computers and Mathematics with Applications 58, 758–765 (2009)
[6] Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall, New York (1998)
[7] Mitchell, T.M.: Machine Learning. WCB-McGraw-Hill, New York (1997)
[8] McCulloch, W., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 7, 115–133 (1943)
Chapter 3
Multivariate Sigmoidal Neural Network Quantitative Approximation
Here we present the multivariate quantitative constructive approximation of real and complex valued continuous multivariate functions on a box or RN , N ∈ N, by the multivariate quasi-interpolation sigmoidal neural network operators. The ”right” operators for the goal are fully and precisely described. This approximation is obtained by establishing multidimensional Jackson type inequalities involving the multivariate modulus of continuity of the engaged function or its high order partial derivatives. The multivariate operators are defined by using a multidimensional density function induced by the logarithmic sigmoidal function. Our approximations are pointwise and uniform. The related feed-forward neural network is with one hidden layer. This chapter is based on [5].
3.1 Introduction
Feed-forward neural networks (FNNs) with one hidden layer, the type of networks we deal with in this chapter, are mathematically expressed in a simplified form as
$$N_n(x) = \sum_{j=0}^{n} c_j\,\sigma(a_j\cdot x + b_j), \quad x\in\mathbb{R}^s,\ s\in\mathbb{N},$$
where for $0\le j\le n$, $b_j\in\mathbb{R}$ are the thresholds, $a_j\in\mathbb{R}^s$ are the connection weights, $c_j\in\mathbb{R}$ are the coefficients, $a_j\cdot x$ is the inner product of $a_j$ and $x$, and $\sigma$ is the activation function of the network. In many fundamental network models, the activation function is the sigmoidal function of logistic type. To achieve our goals the operators here are more elaborate and complex; please see (3.2) and (3.3) for exact definitions.
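For orientation, the expression $N_n(x)$ above is straightforward to realize directly. The following Python sketch (an illustration with randomly chosen weights, not a construction from this chapter) evaluates a one-hidden-layer network with the logistic activation; all array names and sizes are assumptions of the example.

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def one_hidden_layer_fnn(x, a, b, c, sigma=logistic):
    """N_n(x) = sum_j c_j * sigma(a_j . x + b_j); a has shape (n+1, s)."""
    return np.dot(c, sigma(a @ x + b))

rng = np.random.default_rng(0)
s, n = 3, 20                                  # input dimension and number of neurons
a = rng.normal(size=(n + 1, s))               # connection weights a_j
b = rng.normal(size=n + 1)                    # thresholds b_j
c = rng.normal(size=n + 1)                    # coefficients c_j
x = np.array([0.1, -0.2, 0.4])
print(one_hidden_layer_fnn(x, a, b, c))
```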
It is well known that FNNs are universal approximators. Theoretically, any continuous function defined on a compact set can be approximated to any desired degree of accuracy by increasing the number of hidden neurons. It was proved by Cybenko [12] and Funahashi [14] that any continuous function can be approximated on a compact set with uniform topology by a network of the form $N_n(x)$, using any continuous, sigmoidal activation function. Hornik et al. in [16] have shown that any measurable function can be approached with such a network. Furthermore, these authors proved in [17] that any function of the Sobolev spaces can be approached with all derivatives. A variety of density results on FNN approximations to multivariate functions were later established by many authors using different methods, for more or less general situations: [19] by Leshno et al., [23] by Mhaskar and Micchelli, [11] by Chui and Li, [9] by Chen and Chen, [15] by Hahm and Hong, etc. Usually these results only give theorems about the existence of an approximation. A related and important problem is that of complexity: determining the number of neurons required to guarantee that all functions belonging to a space can be approximated to the prescribed degree of accuracy $\varepsilon$. Barron [6] proves that if the function is supposed to satisfy certain conditions expressed in terms of its Fourier transform, and if each of the neurons evaluates a sigmoidal activation function, then at most $O(\varepsilon^{-2})$ neurons are needed to achieve the order of approximation $\varepsilon$. Some other authors have published similar results on the complexity of FNN approximations: Mhaskar and Micchelli [24], Suzuki [25], Maiorov and Meir [21], Makovoz [22], Ferrari and Stengel [13], Xu and Cao [27], Cao et al. [8], etc. The author in [1], [2] and [3], see chapters 2-5, was the first to obtain neural network approximations to continuous functions with rates by very specifically defined neural network operators of Cardaliagnet-Euvrard and "Squashing" types, by employing the modulus of continuity of the engaged function or its high order derivative, and producing very tight Jackson type inequalities. He treats there both the univariate and multivariate cases. The "bell-shaped" and "squashing" functions defining these operators are assumed to be of compact support. Also in [3] he gives the $N$th order asymptotic expansion for the error of weak approximation of these two operators to a special natural class of smooth functions, see chapters 4-5 there. For this chapter the author is greatly motivated by the important article [10] by Z. Chen and F. Cao, also by [4]. The author here performs multivariate sigmoidal neural network approximations to continuous functions over boxes or over the whole $\mathbb{R}^N$, $N\in\mathbb{N}$, then he extends his results to complex valued multivariate functions. All convergences here are with rates expressed via the multivariate modulus of continuity of the involved function or its high order partial derivatives, and given by very tight multidimensional Jackson type inequalities.
The author here comes up with the right and precisely defined multivariate quasi-interpolation neural network operator related to boxes. The boxes are not necessarily symmetric to the origin. In preparation to prove the main results we prove important properties of the basic multivariate density function defining our operators. For the approximation theory background we use [20] and [26].
3.2 Background and Auxiliary Results
We consider here the sigmoidal function of logarithmic type
$$s_i(x_i) = \frac{1}{1+e^{-x_i}}, \quad x_i\in\mathbb{R},\ i=1,\dots,N;\quad x:=(x_1,\dots,x_N)\in\mathbb{R}^N.$$
Each has the properties $\lim_{x_i\to+\infty}s_i(x_i)=1$ and $\lim_{x_i\to-\infty}s_i(x_i)=0$, $i=1,\dots,N$.
These functions play the role of activation functions in the hidden layer of neural networks; they also have applications in biology, demography, etc. ([7, 18]). As in [10], we consider
$$\Phi_i(x_i) := \frac{1}{2}\left(s_i(x_i+1) - s_i(x_i-1)\right), \quad x_i\in\mathbb{R},\ i=1,\dots,N.$$
We have the following properties:
i) $\Phi_i(x_i)>0$, $\forall\, x_i\in\mathbb{R}$,
ii) $\sum_{k_i=-\infty}^{\infty}\Phi_i(x_i-k_i)=1$, $\forall\, x_i\in\mathbb{R}$,
iii) $\sum_{k_i=-\infty}^{\infty}\Phi_i(nx_i-k_i)=1$, $\forall\, x_i\in\mathbb{R}$; $n\in\mathbb{N}$,
iv) $\int_{-\infty}^{\infty}\Phi_i(x_i)\,dx_i=1$,
v) $\Phi_i$ is a density function,
vi) $\Phi_i$ is even: $\Phi_i(-x_i)=\Phi_i(x_i)$, $x_i\ge 0$, for $i=1,\dots,N$.
We observe that ([10])
$$\Phi_i(x_i) = \frac{e^2-1}{2e^2}\cdot\frac{1}{\left(1+e^{x_i-1}\right)\left(1+e^{-x_i-1}\right)}, \quad i=1,\dots,N.$$
vii) $\Phi_i$ is decreasing on $\mathbb{R}_+$, and increasing on $\mathbb{R}_-$, $i=1,\dots,N$.
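Properties ii)-iv) are easy to confirm numerically. The Python sketch below is illustrative only; the truncation range and the sample points are assumptions of the example.

```python
import numpy as np

def s(x):                                   # logistic sigmoidal s_i
    return 1.0 / (1.0 + np.exp(-x))

def Phi(x):                                 # Phi_i(x) = (s(x+1) - s(x-1)) / 2
    return 0.5 * (s(x + 1.0) - s(x - 1.0))

x, n = 0.37, 7
k = np.arange(-200, 201)                    # truncation: the omitted terms are negligible
print(np.sum(Phi(x - k)))                   # property ii): ~ 1
print(np.sum(Phi(n * x - k)))               # property iii): ~ 1
xs = np.linspace(-40.0, 40.0, 400001)
print(np.trapz(Phi(xs), xs))                # property iv): integral ~ 1
```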
Let $0<\beta<1$, $n\in\mathbb{N}$. Then as in [4] we get
viii)
$$\sum_{\substack{k_i=-\infty\\ |nx_i-k_i|>n^{1-\beta}}}^{\infty}\Phi_i(nx_i-k_i) = \sum_{\substack{k_i=-\infty\\ |nx_i-k_i|>n^{1-\beta}}}^{\infty}\Phi_i(|nx_i-k_i|) \le 3.1992\,e^{-n^{(1-\beta)}}, \quad i=1,\dots,N.$$
Denote by · the ceilingof a number, and by · the integral part of a .N N number. Consider here x ∈ i=1 [ai , bi ] ⊂ R , N ∈ N such that nai ≤ nbi , i = 1, ..., N ; a := (a1 , ..., aN ), b := (b1 , ..., bN ) . As in [4] we obtain ix) 1
0 < nb i
ki =nai Φi
(nxi − ki )
<
1 = 5.250312578, Φi (1)
∀ xi ∈ [ai , bi ] , i = 1, ..., N. x) As in [4], we see that nbi
lim
n→∞
Φi (nxi − ki ) = 1,
ki =nai
for at least some xi ∈ [ai , bi ], i = 1, ..., N. We will use here Φ (x1 , ..., xN ) := Φ (x) :=
N /
Φi (xi ) ,
x ∈ RN .
(3.1)
i=1
It has the properties: (i)’ Φ (x) > 0, ∀ x ∈ RN , We see that ∞
∞
...
k1 =−∞ k2 =−∞ ∞
∞
k1 =−∞ k2 =−∞
...
∞
∞
Φ (x1 − k1 , x2 − k2 , ..., xN − kN ) =
kN =−∞ N /
kN =−∞ i=1
Φi (xi − ki ) =
N / i=1
'
∞ ki =−∞
( Φi (xi − ki )
= 1.
That is (ii)’ ∞ k=−∞
(iii)’
∞
Φ (x − k) :=
∞
...
k1 =−∞ k2 =−∞
Φ (x1 − k1 , ..., xN − kN ) = 1,
kN =−∞
k := (k1 , ..., kn ), ∀ x ∈ RN . ∞
∞
∞
...
k1 =−∞ k2 =−∞ N
(iv)’
∞
∀ x ∈ R ; n ∈ N.
Φ (nx − k) :=
k=−∞ ∞
Φ (nx1 − k1 , ..., nxN − kN ) = 1,
kN =−∞
Φ (x) dx = 1, RN
that is Φ is a multivariate density function. Here x∞ := max {|x1 | , ..., |xN |}, x ∈ RN , also set ∞ := (∞, ..., ∞), −∞ := (−∞, ..., −∞) upon the multivariate context, and na : = ( na1 , ..., naN ) , nb : = (nb1 , ..., nbN ) . For 0 < β < 1 and n ∈ N, fixed x ∈ RN , have that nb
Φ (nx − k) =
k=na nb
nb
Φ (nx − k) +
⎧ ⎨
k = na
⎩ k − x ≤ n ∞
1 nβ
Φ (nx − k) .
⎧ ⎨
k = na
⎩ k − x > n ∞
1 nβ
In the lasttwo sums the counting is over disjoint vector of k’s, because the condition nk − x∞ > n1β implies that there exists at least one knr − xr > 1 , r ∈ {1, ..., N } . nβ We treat ⎛ ⎞ nb
⎧ ⎨ ⎩
k k = na
− x > n ∞
1 nβ
⎜ nbi N ⎜ / ⎜ ⎜ Φ (nx − k) = ⎜⎧ i=1 ⎜⎨ ⎝ ki = nai
⎩ k − x > n
∞
⎟ ⎟ ⎟ Φi (nxi − ki )⎟ ⎟ ⎟ ⎠ 1 nβ
⎛
⎞
⎛ ⎞ ⎜ ⎟ ' ∞ ( ⎜ ⎟ nbr N / ⎜ ⎟ ⎜ ⎟ ⎟ ≤⎝ Φi (nxi − ki ) ⎠ · ⎜ Φ (nx − k ) r r r ⎟ ⎜⎧ ⎜⎨ ⎟ i=1 ki =−∞ ⎝ kr = na ⎠ i =r r 1 k ⎩ r −x > r n nβ ⎛ ⎞ ⎜ ⎟ ⎜ ⎟ nbr ⎜ ⎟ ⎜ =⎜ Φr (nxr − kr )⎟ ⎟ ⎜⎧ ⎟ ⎝⎨ kr = na
⎠ r ⎩ kr − x > 1 r β n
≤
∞ ⎧ ⎨ ⎩
n
Φr (nxr − kr )
(by (viii))
≤
(1−β)
3.1992e−n
.
k kr = −∞ r − xr > 1β n n
We have established that (v)’ nb
Φ (nx − k) ≤ 3.1992e−n
⎧ ⎨
(1−β)
,
k k = na 1 − x > β n n ∞ . N 0 < β < 1, n ∈ N, x ∈ i=1 [ai , bi ] . ⎩
By (ix) clearly we obtain 0 < nb
1
k=na Φ (nx
− k)
< .N
= . nbi N
1
i=1
1
ki =nai
Φi (nxi − ki )
N
i=1 Φi (1)
= (5.250312578) .
That is, (vi)’ it holds 0 < nb
1
k=na Φ (nx − k) . N ∀x∈ [a , b ] , n ∈ N. i i i=1
N
< (5.250312578) ,
It is also obvious that (vii)’
∞ ⎧ ⎨ ⎩
Φ (nx − k) ≤ 3.1992e−n
k k =−∞ − x > n ∞
(1−β)
,
1 nβ
0 < β < 1, n ∈ N, x ∈ RN . By (x) we obviously observe that (viii)’ nb
lim
n→∞
for at least some x ∈
Φ (nx − k) = 1
k=na
.
N i=1
[ai , bi ] .
Let $f\in C\!\left(\prod_{i=1}^N[a_i,b_i]\right)$ and $n\in\mathbb{N}$ such that $\lceil na_i\rceil \le \lfloor nb_i\rfloor$, $i=1,\dots,N$. We introduce and define the multivariate positive linear neural network operator ($x:=(x_1,\dots,x_N)\in\prod_{i=1}^N[a_i,b_i]$)
$$G_n(f,x_1,\dots,x_N) := G_n(f,x) := \frac{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} f\!\left(\frac{k}{n}\right)\Phi(nx-k)}{\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Phi(nx-k)}$$
$$:= \frac{\sum_{k_1=\lceil na_1\rceil}^{\lfloor nb_1\rfloor}\sum_{k_2=\lceil na_2\rceil}^{\lfloor nb_2\rfloor}\cdots\sum_{k_N=\lceil na_N\rceil}^{\lfloor nb_N\rfloor} f\!\left(\frac{k_1}{n},\dots,\frac{k_N}{n}\right)\left(\prod_{i=1}^N\Phi_i(nx_i-k_i)\right)}{\prod_{i=1}^N\left(\sum_{k_i=\lceil na_i\rceil}^{\lfloor nb_i\rfloor}\Phi_i(nx_i-k_i)\right)}. \quad (3.2)$$
For large enough $n$ we always obtain $\lceil na_i\rceil \le \lfloor nb_i\rfloor$, $i=1,\dots,N$. Also $a_i\le\frac{k_i}{n}\le b_i$ iff $\lceil na_i\rceil\le k_i\le\lfloor nb_i\rfloor$, $i=1,\dots,N$.
We study here the pointwise and uniform convergence of $G_n(f)$ to $f$ with rates. For convenience we call
$$G_n^{*}(f,x) := \sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor} f\!\left(\frac{k}{n}\right)\Phi(nx-k)$$
$$:= \sum_{k_1=\lceil na_1\rceil}^{\lfloor nb_1\rfloor}\sum_{k_2=\lceil na_2\rceil}^{\lfloor nb_2\rfloor}\cdots\sum_{k_N=\lceil na_N\rceil}^{\lfloor nb_N\rfloor} f\!\left(\frac{k_1}{n},\dots,\frac{k_N}{n}\right)\left(\prod_{i=1}^N\Phi_i(nx_i-k_i)\right), \quad (3.3)$$
$\forall\, x\in\prod_{i=1}^N[a_i,b_i]$.
That is G∗n (f, x)
Gn (f, x) := nb
Φ (nx − k) k=na
, ∀x ∈
'N /
( [ai , bi ] , n ∈ N.
(3.4)
i=1
Therefore G∗n (f, x) − f (x) Gn (f, x) − f (x) = nb
nb
k=na
k=na
Φ (nx − k)
Φ (nx − k)
(3.5)
.
Consequently we derive nb N ∗ |Gn (f, x) − f (x)| ≤ (5.250312578) Gn (f, x) − f (x) Φ (nx − k) , k=na (3.6) . N ∀x∈ [a , b ] . i=1 i i We will estimate the right hand . side of (3.6). N For that we need, for f ∈ C i=1 [ai , bi ] the first multivariate modulus of continuity ω1 (f, h) :=
sup . N
x, y ∈ i=1 [ai , bi ] x − y∞ ≤ h
|f (x) − f (y)| , h > 0.
(3.7)
Similarly it is defined for f ∈ CB RN (continuous and bounded functions on RN ). We have that lim ω1 (f, h) = 0. h→0 When f ∈ CB RN we define Gn (f, x) := Gn (f, x1 , ..., xN ) :=
∞ k=−∞
:=
∞
∞
...
k1 =−∞ k2 =−∞
∞ kN =−∞
f
k1 k2 kN , , ..., n n n
f
k Φ (nx − k) n
'/ N
(3.8) (
Φi (nxi − ki ) ,
i=1
n ∈ N, ∀ x ∈ RN , N ≥ 1, the multivariate quasi-interpolation neural network operator. Notice here that for large enough n ∈ N we get that (1−β)
e−n
< n−βj , j = 1, ..., m ∈ N, 0 < β < 1.
(3.9)
Thus be given fixed A, B > 0, for the linear combination An−βj + (1−β) Be−n the (dominant) rate of convergence to zero is n−βj . The closer β is to 1 we get faster . and better rate of convergence to zero. N Let f ∈ C m [a , b ] i=1 i i , m, N ∈ N. Here fα denotes a partial derivative N of f , α := (α1 , ..., αN ), αi ∈ Z+ , i = 1, ..., N , and |α| := i=1 αi = l, where α l = 0, 1, ..., m. We write also fα := ∂∂xαf and we say it is of order l. We denote max ω1,m (fα , h) := max ω1 (fα , h) . (3.10) α:|α|=m
Call also fα max ∞,m := max {fα ∞ } , |α|=m
(3.11)
·∞ is the supremum norm.
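As a computational aside (not part of the original development), the operator $G_n$ of (3.2) can be evaluated directly from its definition. The sketch below does so in Python for $N=2$ on a box; the function names, the exponential test function and the sample points are assumptions of the example.

```python
import numpy as np

def s(x):
    return 1.0 / (1.0 + np.exp(-x))

def Phi(x):                                   # univariate logistic-type density
    return 0.5 * (s(x + 1.0) - s(x - 1.0))

def Gn(f, x, n, a, b):
    """G_n(f, x) on the box prod [a_i, b_i]: ratio of the finite sums in (3.2)."""
    grids = [np.arange(np.ceil(n * ai), np.floor(n * bi) + 1) for ai, bi in zip(a, b)]
    K = np.stack(np.meshgrid(*grids, indexing="ij"), axis=-1)      # all index vectors k
    W = np.prod(Phi(n * np.asarray(x) - K), axis=-1)               # Phi(nx - k)
    return np.sum(f(K / n) * W) / np.sum(W)

f = lambda p: np.exp(p[..., 0] + p[..., 1])    # f(x, y) = e^{x+y}
a, b = (-1.0, -1.0), (1.0, 1.0)
x = (0.3, -0.4)
for n in (5, 20, 80):
    print(n, abs(Gn(f, x, n, a, b) - np.exp(x[0] + x[1])))
```

The printed errors shrink as $n$ grows, which is the qualitative content of the quantitative estimates proved next.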
3.3 Real Multivariate Neural Network Quantitative Approximations
Here we present a series of multivariate neural network approximations to a function given with rates. We first present . . N N Theorem 3.1. Let f ∈ C [a , b ] , 0 < β < 1, x ∈ [a , b ] , i i i i i=1 i=1 n, N ∈ N. Then i) N |Gn (f, x) − f (x)| ≤ (5.250312578) · 0
1 (1−β) 1 ω1 f, β + (6.3984) f ∞ e−n =: λ1 , n
(3.12)
ii) Gn (f ) − f ∞ ≤ λ1 . Proof. We observe that nb
Δ (x) :=
G∗n (f, x)
− f (x)
Φ (nx − k) =
k=na nb
k=na
f
nb k Φ (nx − k) − f (x) Φ (nx − k) = n k=na
k f − f (x) Φ (nx − k) . n
nb
k=na
(3.13)
So that
k |Δ (x)| ≤ f n − f (x) Φ (nx − k) = k=na nb f k − f (x) Φ (nx − k) + n ⎧ ⎨ k = na
⎩ k − x ≤ n1β n ∞ nb f k − f (x) Φ (nx − k) ≤ n ⎧ ⎨ k = na
⎩ k − x > n1β n ∞ nb 1 ω1 f, β + 2 f ∞ Φ (nx − k) ≤ n ⎧ ⎨ k = na 1 ⎩ k − x > nβ n ∞ (1−β) 1 ω1 f, β + (6.3984) f ∞ e−n . n nb
(1−β) 1 |Δ| ≤ ω1 f, β + (6.3984) f ∞ e−n . n Now using (3.6) we prove claim. So that
Next we give
Theorem 3.2. Let f ∈ CB RN , 0 < β < 1, x ∈ RN , n, N ∈ N. Then i) Gn (f, x) − f (x) ≤ ω1 f, 1 + (6.3984) f e−n(1−β) =: λ2 , (3.14) ∞ nβ Gn (f ) − f ≤ λ2 . ∞
ii)
(3.15)
Proof. We have ∞
Gn (f, x) :=
k=−∞
k f Φ (nx − k) . n
Hence En (x) := Gn (f, x)−f (x) =
∞ k=−∞
∞ k f Φ (nx−k)−f (x) Φ (nx−k) = n
∞ k f − f (x) Φ (nx − k) . n
k=−∞
k=−∞
Thus
∞ k |En (x)| ≤ f n − f (x) Φ (nx − k) = k=−∞
∞ ⎧ ⎨ ⎩
k k =−∞ − x ≤ n ∞ ∞
⎧ ⎨
1 nβ
f k − f (x) Φ (nx − k) + n f k − f (x) Φ (nx − k) ≤ n
k k =−∞ 1 − x > β n n ∞ 1 ω1 f, β + 2 f ∞ n ⎧ ⎩
⎨ ⎩
∞
k k =−∞ − x > n ∞
Φ (nx − k) ≤ 1 nβ
(1−β) 1 ω1 f, β + (6.3984) f ∞ e−n . n Consequently, (1−β) 1 |En (x)| ≤ ω1 f, β + (6.3984) f ∞ e−n , n proving the claim. In the next we discuss high order of approximation by using the smoothness of f . We present . N Theorem 3.3. Let f ∈ C m i=1 [ai , bi ] , 0 < β < 1, n, m, N ∈ N, x ∈ . N i=1 [ai , bi ] . Then i) ⎛ ' ( 'N (⎞ m / fα (x) α Gn (f, x) − f (x) − ⎝ Gn (· − xi ) i , x ⎠ ≤ .N i=1 αi ! j=1 i=1 |α|=j (3.16) 0 N m max 1 N (5.250312578) · ω fα , β + m!nmβ 1,m n ' ( m max m (6.3984) b − a∞ fα ∞,m N −n(1−β) e , m!
ii)
N
|Gn (f, x) − f (x)| ≤ (5.250312578) · (3.17) ⎧ ⎛ ⎞ ' ( ' (
m N ⎨ / (1−β) |f (x)| 1 α α ⎝ ⎠+ + (bi − ai ) i · (3.1992) e−n .N ⎩ nβj αi ! j=1
|α|=j
i=1
Nm 1 max ω fα , β + m!nmβ 1,m n
'
i=1
m
max
(6.3984) b − a∞ fα ∞,m N m m!
iii)
(
e
−n(1−β)
,
N
Gn (f ) − f ∞ ≤ (5.250312578) · (3.18) ⎧ ⎛ ⎞ ' ( ' (
N N ⎨ / (1−β) f 1 α α ∞ ⎝ ⎠+ + (bi − ai ) i (3.1992) e−n .N βj ⎩ n α ! i i=1 j=1 i=1 |α|=j
( ' m max m (6.3984) b − a f N (1−β) α Nm 1 ∞ ∞,m ω max fα , β + e−n , m!nmβ 1,m n m! . N iv) Suppose fα (x0 ) = 0, for all α : |α| = 1, ..., m; x0 ∈ i=1 [ai , bi ] . Then N |Gn (f, x0 ) − f (x0 )| ≤ (5.250312578) · (3.19) 2 ' ( m max m (6.3984) b − a∞ fα ∞,m N N m max 1 −n(1−β) ω fα , β + e , m!nmβ 1 n m! notice in the last the extremely high rate of convergence at n−β(m+1) . .N Proof. Consider gz (t) := f (x0 + t (z − x0 )), t ≥ 0; x0 , z ∈ i=1 [ai , bi ] . Then ⎡' (j ⎤ N ∂ gz(j) (t) = ⎣ (zi −x0i ) f ⎦ (x01 +t (z1 −x01 ) , ..., x0N +t (zN −x0N )) , ∂xi i=1
for all j = 0, 1, ..., m. We have the multivariate Taylor’s formula f (z1 , ..., zN ) = gz (1) = m (j) gz (0) j=0
j!
1 + (m − 1)!
1
(1 − θ) 0
m−1
gz(m) (θ) − gz(m) (0) dθ.
Notice gz (0) = f (x0 ). Also for j = 0, 1, ..., m, we have ' (' N ( / j! αi (j) gz (0) = (zi − x0i ) fα (x0 ) . .N i=1 αi ! i=1 α:=(α ,...,α ), α ∈Z+ , 1
N
i=1,...,N, |α|:=
Ni
i=1
αi =j
Furthermore α:=(α1 ,...,αN ), αi ∈Z+ , i=1,...,N, |α|:= N i=1 αi =m
gz(m) (θ) = ' (' N ( / m! αi (zi − x0i ) fα (x0 + θ (z − x0 )) , .N i=1 αi ! i=1
0 ≤ θ ≤ 1. . N So we treat f ∈ C m i=1 [ai , bi ] . . N Thus, we have for nk , x ∈ [a , b ] that i=1 i i k1 kN f , ..., − f (x) = n n '
m
j=1
α:=(α1 ,...,αN ), αi ∈Z+ , i=1,...,N, |α|:= N i=1 αi =j
where
R := m
1
i=1
αi !
i=1
(1−θ)
n
'
m−1
0
.N
(' N / ki
1
.N
+
('
1
i=1
α:=(α1 ,...,αN ), αi ∈Z , i=1,...,N, |α|:= N i=1 αi =m
− xi
αi (
αi !
fα (x) + R,
N / ki i=1
n
−xi
αi (
k · fα x + θ −x − fα (x) dθ. n We observe that |R| ≤ m
1
0
m−1
(1 − θ)
|α|=m
' .N
1
i=1
αi !
(' N αi ( / ki − xi · n i=1
1 m−1 fα x + θ k − x − f (x) dθ ≤ m (1 − θ) · α n 0 ' (' N αi ( / ki k 1 ω1 fα , θ − x dθ ≤ (∗) . .N n − xi n ∞ i=1 αi ! i=1 |α|=m Notice here that k − x ≤ 1 ⇔ ki − xi ≤ 1 , i = 1, ..., N. n nβ β n n ∞ We further see that max (∗) ≤ m·ω1,m
' (' N ( 1 / 1 αi 1 1 m−1 fα , β (1−θ) dθ = .N n nβ 0 i=1 αi ! i=1 |α|=m
'
⎞ ' ( ⎛ ( max max ω1,m fα , n1β ω1,m fα , n1β m! ⎝ ⎠ = N m. .N mβ (m!) nmβ (m!) n α ! i i=1 |α|=m
Conclusion: When nk − x∞ ≤
|R| ≤
1 , nβ
Nm m!nmβ
we proved that
max ω1,m
In general we notice that ⎛ ' 1 1 m−1 ⎝ |R| ≤ m (1−θ) .N 0
2
|α|=m
'
m
max
.N
'
1
i=1
2 b − a∞ fα ∞,m
(⎛
m!
⎝
(' N /
i=1 αi !
|α|=m
αi !
|α|=m
1 fα , β n
N /
.
⎞
(
2 fα ∞ ⎠ dθ =
αi
(bi − ai )
i=1
( αi
(bi − ai )
i=1
⎞
m!
.N
i=1
αi !
⎠=
fα ∞ ≤ m
max
2 b − a∞ fα ∞,m N m m!
.
We proved in general that m
|R| ≤
max
2 b − a∞ fα ∞,m N m m!
:= λ3 .
Next we observe that nb
Un :=
Φ (nx − k) R =
k=na nb
nb
k=na k : n −x ≤ ∞
Consequently ⎛ ⎜ ⎜ |Un | ≤ ⎜ ⎜ ⎝
Φ (nx − k) R +
k=na k : n −x >
1 nβ
∞
Φ (nx − k) R. 1 nβ
⎞ ⎟ ⎟ N m max (1−β) 1 ⎟ Φ (nx−k)⎟ ω fα , β +(3.1992) λ3 e−n mβ 1,m m!n n ⎠
nb
k=na k : n −x ≤ ∞
≤
1 nβ
(1−β) N m max 1 ω f , + (3.1992) λ3 e−n . α 1,m mβ β m!n n
We have established that ( ' m max (6.3984) b − a∞ fα ∞,m N m (1−β) N m max 1 |Un | ≤ ω fα , β + e−n . m!nmβ 1,m n m! We see that nb
f
k=na m j=1
⎛
'
⎝
|α|=j
nb k Φ (nx − k) − f (x) Φ (nx − k) = n k=na
fα (x) .N i=1 αi !
(⎛
nb
⎝
Φ (nx − k)
'N / ki i=1
k=na
n
− xi
αi (
nb
+
Φ (nx − k) R.
k=na
The last says that ⎛ G∗n (f, x) − f (x) ⎝
⎞
nb
Φ (nx − k)⎠ −
k=na m j=1
⎛ ⎝
'
|α|=j
fα (x) .N i=1 αi !
( G∗n
'N /
(⎞ α (· − xi ) i , x ⎠ = Un .
i=1
G∗n
Clearly is a positive linear operator. N Thus (here αi ∈ Z+ : |α| = i=1 αi = j) ' ( 'N ( N / ∗ / αi α (· − xi ) , x ≤ G∗n |· − xi | i , x = Gn i=1
i=1
'N αi ( nb / ki − xi Φ (nx − k) = n
k=na
i=1
'N αi ( / ki − xi Φ (nx − k) + n
nb
k=na k : n −x ≤ ∞
i=1
1 nβ
nb
k=na k : n −x > ∞
'N αi ( / ki − xi Φ (nx − k) ≤ n i=1
1 nβ
⎞⎞ ⎠⎠
⎛
⎞
⎜ N / 1 αi ⎜ + (bi − ai ) ⎜ ⎜ nβj i=1 ⎝
1 + nβj
'N /
k=na k : n −x >
(
(bi − ai )
⎟ ⎟ Φ (nx − k)⎟ ⎟≤ ⎠
nb
αi
∞
1 nβ
(3.1992) e−n
(1−β)
.
i=1
So we have proved that ' ( 'N ( N / (1−β) 1 ∗ / αi αi (· − xi ) , x ≤ βj + (bi − ai ) (3.1992) e−n , Gn n i=1 i=1 for all j = 1, ..., m. At last we observe that ⎛ ' ( 'N (⎞ m / f (x) α α Gn (f, x) − f (x) − ⎝ Gn (· − xi ) i , x ⎠ ≤ .N i=1 αi ! j=1 i=1 |α|=j nb N ∗ (5.250312578) · Gn (f, x) − f (x) Φ (nx − k) − k=na ⎛ ' ( 'N (⎞ m / fα (x) α ⎝ G∗n (· − xi ) i , x ⎠ . .N i=1 αi ! j=1 i=1 |α|=j Putting all of the above together we prove theorem.
3.4 Complex Multivariate Neural Network Quantitative Approximations
We make
.n Remark 3.4. Let X = i=1 [ai , bi ] or R√N , and f : X → C with real and imaginary parts f1 , f2 : f = f1 + if2 , i = −1. Clearly f is continuous iff f1 and f2 are continuous. Given that f1 , f2 ∈ C m (X), m ∈ N, it holds fα (x) = f1,α (x) + if2,α (x) ,
(3.20)
where α denotes a partial derivative of any order and arrangement. We denote by CB RN , C the space of continuous and bounded functions f : RN → C. Clearly f is bounded, iff both f1 , f2 are bounded from RN into R, where f = f1 + if2 .
Here we define ' x∈
Gn (f, x) := Gn (f1 , x) + iGn (f2 , x) ,
n /
( [ai , bi ] ,
(3.21)
i=1
and Gn (f, x) := Gn (f1 , x) + iGn (f2 , x) , x ∈ RN .
(3.22)
We see here that |Gn (f, x) − f (x)| ≤ |Gn (f1 , x) − f1 (x)| + |Gn (f2 , x) − f2 (x)| ,
(3.23)
and Gn (f ) − f ∞ ≤ Gn (f1 ) − f1 ∞ + Gn (f2 ) − f2 ∞ .
(3.24)
Similarly we obtain Gn (f, x) − f (x) ≤ Gn (f1 , x) − f1 (x) + Gn (f2 , x) − f2 (x) , x ∈ RN , (3.25) and Gn (f ) − f ≤ Gn (f1 ) − f1 + Gn (f2 ) − f2 . (3.26) ∞ ∞ ∞ We give
.n Theorem .n3.5. Let f ∈ C ( i=1 [ai , bi ] , C) , f = f1 + if2 , 0 < β < 1, n, N ∈ N, x ∈ ( i=1 [ai , bi ]). Then i) N |Gn (f, x) − f (x)| ≤ (5.250312578) · (3.27) 0 1 (1−β) 1 1 ω1 f1 , β + ω1 f2 , β + (6.3984) (f1 ∞ + f2 ∞ ) e−n =: ψ1 , n n ii) Gn (f ) − f ∞ ≤ ψ1 .
(3.28)
Proof. Use of Theorem 3.1 and Remark 3.4. We present
Theorem 3.6. Let f ∈ CB RN , C , f = f1 + if2 , 0 < β < 1, n, N ∈ N, x ∈ RN . Then i) Gn (f, x) − f (x) ≤ ω1 f1 , 1 + ω1 f2 , 1 + (3.29) nβ nβ (6.3984) (f1 ∞ + f2 ∞ ) e−n ii)
Gn (f ) − f
∞
(1−β)
≤ ψ2 .
=: ψ2 , (3.30)
Proof. By Theorem 3.2 and Remark 3.4. In the next we discuss high order of complex approximation by using the smoothness of f . We present . Theorem 3.7. Let f : ni=1 [ai , bi ] → C, such that f =.f1 + if2. Assume . n n f1 , f2 ∈ C m ( i=1 [ai , bi ]) , 0 < β < 1, n, m, N ∈ N, x ∈ ( i=1 [ai , bi ]). Then i) ⎛ ' ( 'N (⎞ m / fα (x) α Gn (f, x) − f (x) − ⎝ Gn (· − xi ) i , x ⎠ ≤ .N i=1 αi ! j=1 i=1 |α|=j (3.31) 0 Nm 1 1 N max max (5.250312578) · ω1,m f1,α , β + ω1,m f2,α , β + m!nmβ n n ⎫ ⎛ ⎞ m max max ⎬ (6.3984) b − a∞ f1,α ∞,m + f2,α ∞,m N m (1−β) ⎝ ⎠ e−n , ⎭ m! ii) N
|Gn (f, x) − f (x)| ≤ (5.250312578) · ⎧ ⎛ ' ( m ⎨ |f1,α (x)| + |f2,α (x)| 1 ⎝ + .N ⎩ nβj i=1 αi ! j=1 |α|=j 'N /
(
(bi − ai )
(
· (3.1992) e−n
αi
(3.32)
(1−β)
+
i=1
1 1 max max ω1,m f1,α , β + ω1,m f2,α , β + n n ⎫ ⎛ ⎞ m max max ⎬ (6.3984) b − a∞ f1,α ∞,m + f2,α ∞,m N m (1−β) ⎝ ⎠ e−n , ⎭ m! Nm m!nmβ
iii) N
Gn (f ) − f ∞ ≤ (5.250312578) · ⎧ ⎛ ' ( m ⎨ f1,α (x) + f2,α (x) 1 ∞ ∞ ⎝ + .N βj ⎩ n α ! i=1 i j=1 |α|=j 'N /
i=1
(
(bi − ai )
αi
(
· (3.1992) e
−n(1−β)
+
(3.33)
1 1 max max ω1,m f1,α , β + ω1,m f2,α , β + n n ⎫ ⎛ ⎞ m max max ⎬ (6.3984) b − a∞ f1,α ∞,m + f2,α ∞,m N m (1−β) ⎠ e−n +⎝ , ⎭ m! Nm m!nmβ
iv) Suppose $f_\alpha(x_0)=0$ for all $\alpha: |\alpha|=1,\dots,m$; $x_0\in\prod_{i=1}^N[a_i,b_i]$. Then
$$|G_n(f,x_0)-f(x_0)| \le (5.250312578)^N\cdot\left\{\frac{N^m}{m!\,n^{m\beta}}\left[\omega_{1,m}^{\max}\!\left(f_{1,\alpha},\frac{1}{n^\beta}\right)+\omega_{1,m}^{\max}\!\left(f_{2,\alpha},\frac{1}{n^\beta}\right)\right] + \left(\frac{(6.3984)\,\|b-a\|_\infty^m\left(\|f_{1,\alpha}\|_{\infty,m}^{\max}+\|f_{2,\alpha}\|_{\infty,m}^{\max}\right)N^m}{m!}\right)e^{-n^{(1-\beta)}}\right\}, \quad (3.34)$$
notice in the last the extremely high rate of convergence at $n^{-\beta(m+1)}$.

Proof. By Theorem 3.3 and Remark 3.4. $\square$

Example 3.8. Consider $f(x,y)=e^{x+y}$, $(x,y)\in[-1,1]^2$. Let $x=(x_1,y_1)$, $y=(x_2,y_2)$; we see that
$$|f(x)-f(y)| = \left|e^{x_1+y_1}-e^{x_2+y_2}\right| = \left|e^{x_1}e^{y_1}-e^{x_1}e^{y_2}+e^{x_1}e^{y_2}-e^{x_2}e^{y_2}\right|$$
$$= \left|e^{x_1}(e^{y_1}-e^{y_2}) + e^{y_2}(e^{x_1}-e^{x_2})\right| \le e\left(|e^{y_1}-e^{y_2}|+|e^{x_1}-e^{x_2}|\right) \le e^2\left[|y_1-y_2|+|x_1-x_2|\right] \le 2e^2\|x-y\|_\infty.$$
That is
$$|f(x)-f(y)| \le 2e^2\|x-y\|_\infty. \quad (3.35)$$
Consequently by (3.7) we get that
$$\omega_1(f,h) \le 2e^2 h, \quad h>0. \quad (3.36)$$
Therefore by (3.13) we derive
$$\left\|G_n\!\left(e^{x+y}\right)(x,y) - e^{x+y}\right\|_\infty \le (27.5657821)\,e^2\left[\frac{2}{n^\beta} + (6.3984)\,e^{-n^{(1-\beta)}}\right], \quad (3.37)$$
where $0<\beta<1$ and $n\in\mathbb{N}$.
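For concreteness, the right-hand side of (3.37) is easy to tabulate. The short Python sketch below is illustrative only; the chosen values of $n$ and $\beta$ are arbitrary.

```python
import math

C = 27.5657821            # (5.250312578)^2, the constant for N = 2
for beta in (0.5, 0.75):
    for n in (10, 100, 1000):
        bound = C * math.e**2 * (2.0 / n**beta + 6.3984 * math.exp(-n**(1 - beta)))
        print(f"beta={beta}  n={n:5d}  bound={bound:.4e}")
```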
Example 3.9. Let $f(x_1,\dots,x_N) = \sum_{i=1}^N \sin x_i$, $(x_1,\dots,x_N)\in\mathbb{R}^N$, $N\in\mathbb{N}$. Denote $x=(x_1,\dots,x_N)$, $y=(y_1,\dots,y_N)$ and see that
$$\left|\sum_{i=1}^N\sin x_i - \sum_{i=1}^N \sin y_i\right| \le \sum_{i=1}^N|\sin x_i - \sin y_i| \le \sum_{i=1}^N |x_i - y_i| \le N\|x-y\|_\infty.$$
That is
$$|f(x)-f(y)| \le N\|x-y\|_\infty. \quad (3.38)$$
Consequently by (3.7) we obtain that
$$\omega_1(f,h)\le Nh,\quad h>0. \quad (3.39)$$
Therefore by (3.15) we derive
$$\left\|\overline{G}_n\!\left(\sum_{i=1}^N \sin x_i\right)(x_1,\dots,x_N) - \sum_{i=1}^N\sin x_i\right\|_\infty \le N\left[\frac{1}{n^\beta} + (6.3984)\,e^{-n^{(1-\beta)}}\right], \quad (3.40)$$
where $0<\beta<1$ and $n\in\mathbb{N}$. One can easily construct many other interesting examples.
References
[1] Anastassiou, G.A.: Rate of convergence of some neural network operators to the unit-univariate case. J. Math. Anal. Appli. 212, 237–262 (1997) [2] Anastassiou, G.A.: Rate of convergence of some multivariate neural network operators to the unit. J. Comp. Math. Appl. 40, 1–19 (2000) [3] Anastassiou, G.A.: Quantitative Approximations. Chapman&Hall/CRC, Boca Raton (2001) [4] Anastassiou, G.A.: Univariate sigmoidal neural network approximation. Journal of Comp. Anal. and Appl., (accepted 2011) [5] Anastassiou, G.A.: Multivariate sigmoidal neural network approximation. Neural Networks 24, 378–386 (2011) [6] Barron, A.R.: Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 39, 930–945 (1993) [7] Brauer, F., Castillo-Chavez, C.: Mathematical models in population biology and epidemiology, pp. 8–9. Springer, New York (2001) [8] Cao, F.L., Xie, T.F., Xu, Z.B.: The estimate for approximation error of neural networks: a constructive approach. Neurocomputing 71, 626–630 (2008) [9] Chen, T.P., Chen, H.: Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its applications to a dynamic system. IEEE Trans. Neural Networks 6, 911–917 (1995) [10] Chen, Z., Cao, F.: The approximation operators with sigmoidal functions. Computers and Mathematics with Applications 58, 758–765 (2009) [11] Chui, C.K., Li, X.: Approximation by ridge functions and neural networks with one hidden layer. J. Approx. Theory 70, 131–141 (1992) [12] Cybenko, G.: Approximation by superpositions of sigmoidal function. Math. of Control Signals and System 2, 303–314 (1989) [13] Ferrari, S., Stengel, R.F.: Smooth function approximation using neural networks. IEEE Trans. Neural Networks 16, 24–38 (2005) [14] Funahashi, K.I.: On the approximate realization of continuous mappings by neural networks. Neural Networks 2, 183–192 (1989) [15] Hahm, N., Hong, B.I.: An approximation by neural networks with a fixed weight. Computers & Math. with Appli. 47, 1897–1903 (2004) [16] Hornik, K., Stinchombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366 (1989)
[17] Hornik, K., Stinchombe, M., White, H.: Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks 3, 551–560 (1990) [18] Hritonenko, N., Yatsenko, Y.: Mathematical modeling in economics, ecology and the environment, pp. 92–93. Science Press, Beijing (2006) (reprint) [19] Leshno, M., Lin, V.Y., Pinks, A., Schocken, S.: Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks 6, 861–867 (1993) [20] Lorentz, G.G.: Approximation of Functions. Rinehart and Winston, New York (1966) [21] Maiorov, V., Meir, R.S.: Approximation bounds for smooth functions in C(Rd ) by neural and mixture networks. IEEE Trans. Neural Networks 9, 969–978 (1998) [22] Makovoz, Y.: Uniform approximation by neural networks. J. Approx. Theory 95, 215–228 (1998) [23] Mhaskar, H.N., Micchelli, C.A.: Approximation by superposition of a sigmoidal function. Adv. Applied Math. 13, 350–373 (1992) [24] Mhaskar, H.N., Micchelli, C.A.: Degree of approximation by neural networks with a single hidden layer. Adv. Applied Math. 16, 151–183 (1995) [25] Suzuki, S.: Constructive function approximation by three-layer artificial neural networks. Neural Networks 11, 1049–1058 (1998) [26] Xie, T.F., Zhou, S.P.: Approximation Theory of Real Functions. Hangzhou University Press, Hangzhou (1998) [27] Xu, Z.B., Cao, F.L.: The essential order of approximation for neural networks. Science in China (Ser. F) 47, 97–112 (2004)
Chapter 4
Multivariate Hyperbolic Tangent Neural Network Quantitative Approximation
Here we give the multivariate quantitative approximation of real and complex valued continuous multivariate functions on a box or RN , N ∈ N, by the multivariate quasi-interpolation hyperbolic tangent neural network operators. This approximation is obtained by establishing multidimensional Jackson type inequalities involving the multivariate modulus of continuity of the engaged function or its high order partial derivatives. The multivariate operators are defined by using a multidimensional density function induced by the hyperbolic tangent function. Our approximations are pointwise and uniform. The related feed-forward neural network is with one hidden layer. This chapter is based on [6].
4.1 Introduction
The author in [1], [2], and [3], see chapters 2-5, was the first to establish neural network approximations to continuous functions with rates by very specifically defined neural network operators of Cardaliagnet-Euvrard and "Squashing" types, by employing the modulus of continuity of the engaged function or its high order derivative, and producing very tight Jackson type inequalities. He treats there both the univariate and multivariate cases. The "bell-shaped" and "squashing" functions defining these operators are assumed to be of compact support. Also in [3] he gives the $N$th order asymptotic expansion for the error of weak approximation of these two operators to a special natural class of smooth functions, see chapters 4-5 there. For this chapter the author is inspired by the article [7] by Z. Chen and F. Cao, also by [4], [5].
The author here performs multivariate hyperbolic tangent neural network approximations to continuous functions over boxes or over the whole RN , N ∈ N, then he extends his results to complex valued multivariate functions. All convergences here are with rates expressed via the multivariate modulus of continuity of the involved function or its high order partial derivative, and given by very tight multidimensional Jackson type inequalities. The author here comes up with the ”right” precisely defined multivariate quasi-interpolation neural network operators related to boxes or RN . The boxes are not necessarily symmetric to the origin. In preparation to prove our results we prove important properties of the basic multivariate density function induced by hyperbolic tangent function and defining our operators. Feed-forward neural networks (FNNs) with one hidden layer, the only type of networks we deal with in this chapter, are mathematically expressed as n Nn (x) = cj σ (aj · x + bj ) , x ∈ Rs , s ∈ N, j=0
where for 0 ≤ j ≤ n, bj ∈ R are the thresholds, aj ∈ Rs are the connection weights, cj ∈ R are the coefficients, aj · x is the inner product of aj and x, and σ is the activation function of the network. In many fundamental network models, the activation function is the hyperbolic tangent. About neural networks see [8], [9], [10].
4.2 Basic Ideas
We consider here the hyperbolic tangent function $\tanh x$, $x\in\mathbb{R}$:
$$\tanh x := \frac{e^x-e^{-x}}{e^x+e^{-x}}.$$
It has the properties $\tanh 0 = 0$, $-1<\tanh x<1$, $\forall\, x\in\mathbb{R}$, and $\tanh(-x)=-\tanh x$. Furthermore $\tanh x\to 1$ as $x\to\infty$, and $\tanh x\to -1$ as $x\to-\infty$, and it is strictly increasing on $\mathbb{R}$. This function plays the role of an activation function in the hidden layer of neural networks.
We further consider
$$\Psi(x) := \frac{1}{4}\left(\tanh(x+1) - \tanh(x-1)\right) > 0, \quad \forall\, x\in\mathbb{R}.$$
We easily see that $\Psi(-x)=\Psi(x)$, that is $\Psi$ is even on $\mathbb{R}$. Obviously $\Psi$ is differentiable, thus continuous.
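The key analytic facts about $\Psi$ collected below (Theorems 4.2-4.5, quoted from [5]) can be checked numerically. The Python sketch that follows is an illustration only; the truncation range, the sample point and the parameters $n$, $\alpha$ are assumptions of the example, and the tail sum is compared with the bound $e^4 e^{-2n^{(1-\alpha)}}$ of Theorem 4.4.

```python
import numpy as np

def Psi(x):
    return 0.25 * (np.tanh(x + 1.0) - np.tanh(x - 1.0))

x, n, alpha = 0.37, 50, 0.5
k = np.arange(-2000, 2001)                 # truncation; the omitted terms decay exponentially

# partition of unity (Theorem 4.2)
print(np.sum(Psi(n * x - k)))              # ~ 1

# tail sum over |nx - k| >= n^{1-alpha} vs. the bound e^4 e^{-2 n^{1-alpha}} (Theorem 4.4)
mask = np.abs(n * x - k) >= n ** (1 - alpha)
print(np.sum(Psi(n * x - k)[mask]), np.e**4 * np.exp(-2 * n ** (1 - alpha)))
```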
Proposition 4.1. ([5]) Ψ (x) for x ≥ 0 is strictly decreasing. Obviously Ψ (x) is strictly increasing for x ≤ 0. Also it holds
lim Ψ (x) =
x→−∞
0 = lim Ψ (x) . x→∞ Infact Ψ has the bell shape with horizontal asymptote the x-axis. So the maximum of Ψ is zero, Ψ (0) = 0.3809297. Theorem 4.2. ([5]) We have that ∞ i=−∞ Ψ (x − i) = 1, ∀ x ∈ R. Therefore
∞
Ψ (nx − i) = 1,
∀ n ∈ N, ∀ x ∈ R.
i=−∞
Also it holds
∞
Ψ (x + i) = 1,
i=−∞
Theorem 4.3. ([5]) It holds
∞
−∞
∀x ∈ R.
Ψ (x) dx = 1.
So Ψ (x) is a density function on R. Theorem 4.4. ([5]) Let 0 < α < 1 and n ∈ N. It holds ∞ ⎧ ⎨ ⎩
(1−α)
Ψ (nx − k) ≤ e4 · e−2n
.
k = −∞ : |nx − k| ≥ n1−α
Denote by · the integral part of the number and by · the ceiling of the number. Theorem 4.5. ([5]) Let x ∈ [a, b] ⊂ R and n ∈ N so that na ≤ nb . It holds 1 1 < = 4.1488766. nb Ψ (1) Ψ (nx − k) k=na Also by [5] we get that nb
lim
n→∞
Ψ (nx − k) = 1,
k=na
for at least some x ∈ [a, b]. In this chapter we employ Θ (x1 , ..., xN ) := Θ (x) :=
N / i=1
Ψ (xi ) ,
x = (x1 , ..., xN ) ∈ RN , N ∈ N. (4.1)
It has the properties: (i) Θ (x) > 0, ∀ x ∈ RN , (ii) ∞
∞
Θ (x − k) :=
k=−∞
∞
∞
...
k1 =−∞ k2 =−∞
Θ (x1 − k1 , ..., xN − kN ) = 1,
kN =−∞
where k := (k1 , ..., kn ), ∀ x ∈ RN . (iii)
∞
Θ (nx − k) :=
k=−∞ ∞
∞
...
k1 =−∞ k2 =−∞
∞
Θ (nx1 − k1 , ..., nxN − kN ) = 1,
kN =−∞
∀ x ∈ RN ; n ∈ N. (iv)
Θ (x) dx = 1, RN
that is Θ is a multivariate density function. Here x∞ := max {|x1 | , ..., |xN |}, x ∈ RN , also set ∞ := (∞, ..., ∞), −∞ := (−∞, ..., −∞) upon the multivariate context, and na : = ( na1 , ..., naN ) , nb : = (nb1 , ..., nbN ) , where a := (a1 , ..., aN ), b := (b1 , ..., bN ) . We clearly see that nb
nb
Θ (nx − k) =
k1 =na1
nbN
...
Ψ (nxi − xi ) =
k=na i=1
k=na nb1
N /
N /
Ψ (nxi − ki ) =
kN =naN i=1
N /
⎛ ⎝
i=1
nbi
ki =nai
For 0 < β < 1 and n ∈ N, fixed x ∈ RN , we have that nb
k=na
Θ (nx − k) =
⎞ Ψ (nxi − ki )⎠ .
nb
nb
Θ (nx − k) +
⎧ ⎨
k = na
⎩ k − x ≤ n ∞
Θ (nx − k) .
⎧ ⎨
k = na
⎩ k − x > n ∞
1 nβ
1 nβ
In the last two sums the counting is over disjoint vector sets of k’s, because the condition nk − x∞ > n1β implies that there exists at least one knr − xr > 1 , r ∈ {1, ..., N } . nβ We treat ⎛ ⎞ nb
⎧ ⎨ ⎩
k k = na
− x > n ∞
1 nβ
⎜ nbi N ⎜ / ⎜ ⎜ Θ (nx − k) = ⎜⎧ i=1 ⎜⎨ ⎝ ki = nai
⎩ k − x > ∞
n
1 nβ
⎟ ⎟ ⎟ Ψ (nxi − ki )⎟ ⎟ ⎟ ⎠
⎛ ⎛ N ⎜/ ≤⎝ i=1 i =r
⎞
⎞ ⎜ ⎟ ⎜ ⎟ nbr ⎟ ⎟ ⎜ ⎜ Ψ (nxi − ki ) ⎠ · ⎜ Ψ (nxr − kr )⎟ ⎟ ⎧ ⎜⎨ ⎟ ki =−∞ ⎝ kr = na
⎠ r ⎩ kr − x > 1 r β n n ⎛ ⎞
'
(
∞
⎜ ⎟ ⎜ ⎟ nbr ⎜ ⎟ ⎟ =⎜ Ψ (nx − k ) r r ⎟ ⎜⎧ ⎜⎨ ⎟ ⎝ kr = na ⎠ r 1 k ⎩ r −x > r β n
≤
∞ ⎧ ⎨
n
Ψ (nxr − kr )
(by Theorem 4.4)
≤
(1−β)
e4 · e−2n
kr = −∞ − xr > n1β
⎩ kr n
We have established that (v) nb
⎧ ⎨
k k = na 1 − x > β n n ∞ . N 0 < β < 1, n ∈ N, x ∈ [a , b ] . i i i=1 ⎩
(1−β)
Θ (nx − k) ≤ e4 · e−2n
,
.
By Theorem 4.5 clearly we obtain 1
0 < nb k=na
Θ (nx − k) <
=. nbi N
ki =nai
i=1
1
1 Ψ (nxi − ki )
N
(Ψ (1))
N
= (4.1488766) .
That is, (vi) it holds 1
0 < nb k=na
∀x∈
Θ (nx − k)
<
1 (Ψ (1))
N
N
= (4.1488766) ,
. N
[a , b ] , n ∈ N. i i i=1
It is also obvious that (vii)
∞ ⎧ ⎨ ⎩
(1−β)
Θ (nx − k) ≤ e4 · e−2n
k k =−∞ − x > n ∞
,
1 nβ
0 < β < 1, n ∈ N, x ∈ RN . Also we find
nb
lim
n→∞
Θ (nx − k) = 1,
k=na
. N for at least some x ∈ i=1 [ai , bi ] . . N Let f ∈ C [a , b ] and n ∈ N such that nai ≤ nbi , i = 1, ..., N. i i i=1 We introduce and define the . multivariate positive linear neural network N operator (x := (x1 , ..., xN ) ∈ [a , b ] i=1 i i ) nb Fn (f, x1 , ..., xN ) := Fn (f, x) := nb1 :=
k1 =na1
nb2
k2 =na2
...
nbN
kN =naN
.N nbi i=1
k k=na f n Θ (nx − nb Θ (nx − k) k=na
f
k
, ..., knN n 1
.N
ki =nai Ψ (nxi − ki )
i=1
k)
(4.2)
Ψ (nxi − ki )
.
For large enough n we always obtain nai ≤ nbi , i = 1, ..., N . Also ai ≤ ki n ≤ bi , iff nai ≤ ki ≤ nbi , i = 1, ..., N .
We study here the pointwise and uniform convergence of Fn (f ) to f with rates. For convenience we call nb
Fn∗
(f, x) :=
k=na nb1
:=
nb2
...
k1 =na1 k2 =na2
. N ∀x∈ i=1 [ai , bi ] . That is
nbN
k f Θ (nx − k) n
f
kN =naN
k1 kN , ..., n n
'/ N
(4.3) ( Ψ (nxi − ki ) ,
i=1
F ∗ (f, x) Fn (f, x) := nb n , k=na Θ (nx − k)
(4.4)
. N ∀x∈ [a , b ] i=1 i i , n ∈ N. So that
Fn∗ (f, x) − f (x) Fn (f, x) − f (x) = nb
nb k=na
Θ (nx − k)
k=na Θ (nx − k)
.
(4.5)
Consequently we obtain nb N ∗ |Fn (f, x) − f (x)| ≤ (4.1488766) Fn (f, x) − f (x) Θ (nx − k) , k=na (4.6) . N ∀x∈ i=1 [ai , bi ] . We will estimate the right hand . side of (4.6). N For that we need, for f ∈ C [a , b ] the first multivariate modulus i i i=1 of continuity ω1 (f, h) :=
sup |f (x) − f (y)| , h > 0. . x, y ∈ N [a , b ] i=1 i i x − y∞ ≤ h
(4.7)
Similarly it is defined for f ∈ CB RN (continuous and bounded functions on RN ). We have that lim ω1 (f, h) = 0. h→0 When f ∈ CB RN we define, F n (f, x) := F n (f, x1 , ..., xN ) :=
∞ k=−∞
f
k Θ (nx − k) := n
(4.8)
∞
∞
∞
...
k1 =−∞ k2 =−∞
kN =−∞
f
k1 k2 kN , , ..., n n n
'/ N
( Ψ (nxi − ki ) ,
i=1
n ∈ N, ∀ x ∈ RN , N ≥ 1, the multivariate quasi-interpolation neural network operator. Notice here that for large enough n ∈ N we get that e−2n
Notice here that for large enough $n\in\mathbb{N}$ we get that
$$e^{-2n^{(1-\beta)}}<n^{-\beta j},\quad j=1,\ldots,m\in\mathbb{N},\ 0<\beta<1.\tag{4.9}$$

Thus, given fixed $A,B>0$, for the linear combination $An^{-\beta j}+Be^{-2n^{(1-\beta)}}$ the (dominant) rate of convergence to zero is $n^{-\beta j}$. The closer $\beta$ is to $1$, the faster and better the rate of convergence to zero.

Let $f\in C^{m}\left(\prod_{i=1}^{N}\left[a_{i},b_{i}\right]\right)$, $m,N\in\mathbb{N}$. Here $f_{\alpha}$ denotes a partial derivative of $f$, $\alpha:=\left(\alpha_{1},\ldots,\alpha_{N}\right)$, $\alpha_{i}\in\mathbb{Z}^{+}$, $i=1,\ldots,N$, and $\left|\alpha\right|:=\sum_{i=1}^{N}\alpha_{i}=l$, where $l=0,1,\ldots,m$. We also write $f_{\alpha}:=\frac{\partial^{\alpha}f}{\partial x^{\alpha}}$ and we say it is of order $l$.

We denote
$$\omega_{1,m}^{\max}\left(f_{\alpha},h\right):=\max_{\alpha:\left|\alpha\right|=m}\omega_{1}\left(f_{\alpha},h\right).\tag{4.10}$$

Call also
$$\left\|f_{\alpha}\right\|_{\infty,m}^{\max}:=\max_{\left|\alpha\right|=m}\left\{\left\|f_{\alpha}\right\|_{\infty}\right\},\tag{4.11}$$
where $\left\|\cdot\right\|_{\infty}$ is the supremum norm.
4.3 Real Multivariate Neural Network Quantitative Approximations
Here we show a series of multivariate neural network approximations to a function given with rates.

We first present

Theorem 4.6. Let $f\in C\left(\prod_{i=1}^{N}\left[a_{i},b_{i}\right]\right)$, $0<\beta<1$, $x\in\prod_{i=1}^{N}\left[a_{i},b_{i}\right]$, $n,N\in\mathbb{N}$. Then

i)
$$\left|F_{n}\left(f,x\right)-f\left(x\right)\right|\leq\left(4.1488766\right)^{N}\cdot\left[\omega_{1}\left(f,\frac{1}{n^{\beta}}\right)+2e^{4}\left\|f\right\|_{\infty}e^{-2n^{(1-\beta)}}\right]=:\lambda_{1},\tag{4.12}$$

ii)
$$\left\|F_{n}\left(f\right)-f\right\|_{\infty}\leq\lambda_{1}.\tag{4.13}$$
Proof. We see that
$$\Delta\left(x\right):=F_{n}^{*}\left(f,x\right)-f\left(x\right)\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)=\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}f\left(\frac{k}{n}\right)\Theta\left(nx-k\right)-f\left(x\right)\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)$$
$$=\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\left(f\left(\frac{k}{n}\right)-f\left(x\right)\right)\Theta\left(nx-k\right).$$

Therefore
$$\left|\Delta\left(x\right)\right|\leq\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\left|f\left(\frac{k}{n}\right)-f\left(x\right)\right|\Theta\left(nx-k\right)$$
$$=\sum_{\substack{k=\lceil na\rceil\\ \left\|\frac{k}{n}-x\right\|_{\infty}\leq\frac{1}{n^{\beta}}}}^{\lfloor nb\rfloor}\left|f\left(\frac{k}{n}\right)-f\left(x\right)\right|\Theta\left(nx-k\right)+\sum_{\substack{k=\lceil na\rceil\\ \left\|\frac{k}{n}-x\right\|_{\infty}>\frac{1}{n^{\beta}}}}^{\lfloor nb\rfloor}\left|f\left(\frac{k}{n}\right)-f\left(x\right)\right|\Theta\left(nx-k\right)$$
$$\leq\omega_{1}\left(f,\frac{1}{n^{\beta}}\right)+2\left\|f\right\|_{\infty}\sum_{\substack{k=\lceil na\rceil\\ \left\|\frac{k}{n}-x\right\|_{\infty}>\frac{1}{n^{\beta}}}}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)\leq\omega_{1}\left(f,\frac{1}{n^{\beta}}\right)+2e^{4}\left\|f\right\|_{\infty}e^{-2n^{(1-\beta)}}.$$

So that
$$\left|\Delta\left(x\right)\right|\leq\omega_{1}\left(f,\frac{1}{n^{\beta}}\right)+2e^{4}\left\|f\right\|_{\infty}e^{-2n^{(1-\beta)}}.$$

Now using (4.6) we prove the claim.

Next we give

Theorem 4.7. Let $f\in C_{B}\left(\mathbb{R}^{N}\right)$, $0<\beta<1$, $x\in\mathbb{R}^{N}$, $n,N\in\mathbb{N}$. Then

i)
$$\left|\overline{F}_{n}\left(f,x\right)-f\left(x\right)\right|\leq\omega_{1}\left(f,\frac{1}{n^{\beta}}\right)+2e^{4}\left\|f\right\|_{\infty}e^{-2n^{(1-\beta)}}=:\lambda_{2},\tag{4.14}$$
ii)
$$\left\|\overline{F}_{n}\left(f\right)-f\right\|_{\infty}\leq\lambda_{2}.\tag{4.15}$$

Proof. We have
$$\overline{F}_{n}\left(f,x\right):=\sum_{k=-\infty}^{\infty}f\left(\frac{k}{n}\right)\Theta\left(nx-k\right).$$

Therefore
$$E_{n}\left(x\right):=\overline{F}_{n}\left(f,x\right)-f\left(x\right)=\sum_{k=-\infty}^{\infty}f\left(\frac{k}{n}\right)\Theta\left(nx-k\right)-f\left(x\right)\sum_{k=-\infty}^{\infty}\Theta\left(nx-k\right)$$
$$=\sum_{k=-\infty}^{\infty}\left(f\left(\frac{k}{n}\right)-f\left(x\right)\right)\Theta\left(nx-k\right).$$

Hence
$$\left|E_{n}\left(x\right)\right|\leq\sum_{k=-\infty}^{\infty}\left|f\left(\frac{k}{n}\right)-f\left(x\right)\right|\Theta\left(nx-k\right)$$
$$=\sum_{\substack{k=-\infty\\ \left\|\frac{k}{n}-x\right\|_{\infty}\leq\frac{1}{n^{\beta}}}}^{\infty}\left|f\left(\frac{k}{n}\right)-f\left(x\right)\right|\Theta\left(nx-k\right)+\sum_{\substack{k=-\infty\\ \left\|\frac{k}{n}-x\right\|_{\infty}>\frac{1}{n^{\beta}}}}^{\infty}\left|f\left(\frac{k}{n}\right)-f\left(x\right)\right|\Theta\left(nx-k\right)$$
$$\leq\omega_{1}\left(f,\frac{1}{n^{\beta}}\right)+2\left\|f\right\|_{\infty}\sum_{\substack{k=-\infty\\ \left\|\frac{k}{n}-x\right\|_{\infty}>\frac{1}{n^{\beta}}}}^{\infty}\Theta\left(nx-k\right)\leq\omega_{1}\left(f,\frac{1}{n^{\beta}}\right)+2e^{4}\left\|f\right\|_{\infty}e^{-2n^{(1-\beta)}}.$$

Thus
$$\left|E_{n}\left(x\right)\right|\leq\omega_{1}\left(f,\frac{1}{n^{\beta}}\right)+2e^{4}\left\|f\right\|_{\infty}e^{-2n^{(1-\beta)}},$$
proving the claim.
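A small numerical sanity check of the bound $\lambda_{1}$ of Theorem 4.6, purely illustrative: the test function, the sampled points, the Lipschitz estimate $\omega_{1}(f,h)\leq 2\pi h$ (valid for this particular $f$), and the choice $\beta=0.5$ are all illustrative, and sampling a few points only probes, not certifies, the uniform bound (4.13).

```python
import itertools
import math

def Psi(x):
    # assumed hyperbolic tangent density from earlier in the chapter
    return 0.25 * (math.tanh(x + 1.0) - math.tanh(x - 1.0))

def F_n(f, x, n, a, b):
    # brute-force evaluation of the normalized operator (4.2)
    N = len(x)
    ranges = [range(math.ceil(n * a[i]), math.floor(n * b[i]) + 1) for i in range(N)]
    num = den = 0.0
    for k in itertools.product(*ranges):
        w = 1.0
        for i in range(N):
            w *= Psi(n * x[i] - k[i])
        num += f(tuple(k[i] / n for i in range(N))) * w
        den += w
    return num / den

if __name__ == "__main__":
    # Illustrative example: f(t) = sin(pi t1) cos(pi t2) on [0,1]^2,
    # for which |f(x)-f(y)| <= 2*pi*||x-y||_inf and ||f||_inf <= 1.
    f = lambda t: math.sin(math.pi * t[0]) * math.cos(math.pi * t[1])
    a, b, beta, N = [0.0, 0.0], [1.0, 1.0], 0.5, 2
    for n in (10, 50, 200):
        err = max(abs(F_n(f, (x1, x2), n, a, b) - f((x1, x2)))
                  for x1 in (0.1, 0.35, 0.6, 0.9) for x2 in (0.2, 0.5, 0.8))
        lam1 = (4.1488766 ** N) * (2 * math.pi / n ** beta
                                   + 2 * math.exp(4.0) * math.exp(-2 * n ** (1 - beta)))
        print(f"n={n:4d}  max sampled error={err:.4e}  lambda_1 bound={lam1:.4e}")
```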
In the next we discuss high order of approximation by using the smoothness of $f$.

We present

Theorem 4.8. Let $f\in C^{m}\left(\prod_{i=1}^{N}\left[a_{i},b_{i}\right]\right)$, $0<\beta<1$, $n,m,N\in\mathbb{N}$, $x\in\prod_{i=1}^{N}\left[a_{i},b_{i}\right]$. Then

i)
$$\left|F_{n}\left(f,x\right)-f\left(x\right)-\sum_{j=1}^{m}\sum_{\left|\alpha\right|=j}\left(\frac{f_{\alpha}\left(x\right)}{\prod_{i=1}^{N}\alpha_{i}!}\right)F_{n}\left(\prod_{i=1}^{N}\left(\cdot-x_{i}\right)^{\alpha_{i}},x\right)\right|\leq\left(4.1488766\right)^{N}\cdot\tag{4.16}$$
$$\left\{\frac{N^{m}}{m!\,n^{m\beta}}\,\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right)+\left(\frac{2e^{4}\left\|b-a\right\|_{\infty}^{m}\left\|f_{\alpha}\right\|_{\infty,m}^{\max}N^{m}}{m!}\right)e^{-2n^{(1-\beta)}}\right\},$$

ii)
$$\left|F_{n}\left(f,x\right)-f\left(x\right)\right|\leq\left(4.1488766\right)^{N}\cdot\tag{4.17}$$
$$\left\{\sum_{j=1}^{m}\sum_{\left|\alpha\right|=j}\left(\frac{\left|f_{\alpha}\left(x\right)\right|}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left[\frac{1}{n^{\beta j}}+\left(\prod_{i=1}^{N}\left(b_{i}-a_{i}\right)^{\alpha_{i}}\right)e^{4}e^{-2n^{(1-\beta)}}\right]\right.$$
$$\left.+\frac{N^{m}}{m!\,n^{m\beta}}\,\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right)+\left(\frac{2e^{4}\left\|b-a\right\|_{\infty}^{m}\left\|f_{\alpha}\right\|_{\infty,m}^{\max}N^{m}}{m!}\right)e^{-2n^{(1-\beta)}}\right\},$$

iii)
$$\left\|F_{n}\left(f\right)-f\right\|_{\infty}\leq\left(4.1488766\right)^{N}\cdot\tag{4.18}$$
$$\left\{\sum_{j=1}^{m}\sum_{\left|\alpha\right|=j}\left(\frac{\left\|f_{\alpha}\right\|_{\infty}}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left[\frac{1}{n^{\beta j}}+\left(\prod_{i=1}^{N}\left(b_{i}-a_{i}\right)^{\alpha_{i}}\right)e^{4}e^{-2n^{(1-\beta)}}\right]\right.$$
$$\left.+\frac{N^{m}}{m!\,n^{m\beta}}\,\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right)+\left(\frac{2e^{4}\left\|b-a\right\|_{\infty}^{m}\left\|f_{\alpha}\right\|_{\infty,m}^{\max}N^{m}}{m!}\right)e^{-2n^{(1-\beta)}}\right\},$$

iv) Suppose $f_{\alpha}\left(x_{0}\right)=0$, for all $\alpha:\left|\alpha\right|=1,\ldots,m$; $x_{0}\in\prod_{i=1}^{N}\left[a_{i},b_{i}\right]$. Then
$$\left|F_{n}\left(f,x_{0}\right)-f\left(x_{0}\right)\right|\leq\left(4.1488766\right)^{N}\cdot\tag{4.19}$$
$$\left\{\frac{N^{m}}{m!\,n^{m\beta}}\,\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right)+\left(\frac{2e^{4}\left\|b-a\right\|_{\infty}^{m}\left\|f_{\alpha}\right\|_{\infty,m}^{\max}N^{m}}{m!}\right)e^{-2n^{(1-\beta)}}\right\};$$
notice in the last the extremely high rate of convergence at $n^{-\beta\left(m+1\right)}$ (indeed, when each $f_{\alpha}$ with $\left|\alpha\right|=m$ is Lipschitz, $\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right)=O\left(n^{-\beta}\right)$, so the dominant term in (4.19) is of order $n^{-\beta\left(m+1\right)}$).
Proof. Consider $g_{z}\left(t\right):=f\left(x_{0}+t\left(z-x_{0}\right)\right)$, $t\geq 0$; $x_{0},z\in\prod_{i=1}^{N}\left[a_{i},b_{i}\right]$. Then
$$g_{z}^{(j)}\left(t\right)=\left[\left(\sum_{i=1}^{N}\left(z_{i}-x_{0i}\right)\frac{\partial}{\partial x_{i}}\right)^{j}f\right]\left(x_{01}+t\left(z_{1}-x_{01}\right),\ldots,x_{0N}+t\left(z_{N}-x_{0N}\right)\right),$$
for all $j=0,1,\ldots,m$.

We have the multivariate Taylor's formula
$$f\left(z_{1},\ldots,z_{N}\right)=g_{z}\left(1\right)=\sum_{j=0}^{m}\frac{g_{z}^{(j)}\left(0\right)}{j!}+\frac{1}{\left(m-1\right)!}\int_{0}^{1}\left(1-\theta\right)^{m-1}\left(g_{z}^{(m)}\left(\theta\right)-g_{z}^{(m)}\left(0\right)\right)d\theta.$$

Notice $g_{z}\left(0\right)=f\left(x_{0}\right)$. Also for $j=0,1,\ldots,m$, we have
$$g_{z}^{(j)}\left(0\right)=\sum_{\substack{\alpha:=\left(\alpha_{1},\ldots,\alpha_{N}\right),\ \alpha_{i}\in\mathbb{Z}^{+},\\ i=1,\ldots,N,\ \left|\alpha\right|:=\sum_{i=1}^{N}\alpha_{i}=j}}\left(\frac{j!}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left(\prod_{i=1}^{N}\left(z_{i}-x_{0i}\right)^{\alpha_{i}}\right)f_{\alpha}\left(x_{0}\right).$$

Furthermore
$$g_{z}^{(m)}\left(\theta\right)=\sum_{\substack{\alpha:=\left(\alpha_{1},\ldots,\alpha_{N}\right),\ \alpha_{i}\in\mathbb{Z}^{+},\\ i=1,\ldots,N,\ \left|\alpha\right|=m}}\left(\frac{m!}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left(\prod_{i=1}^{N}\left(z_{i}-x_{0i}\right)^{\alpha_{i}}\right)f_{\alpha}\left(x_{0}+\theta\left(z-x_{0}\right)\right),$$
$0\leq\theta\leq 1$.

So we treat $f\in C^{m}\left(\prod_{i=1}^{N}\left[a_{i},b_{i}\right]\right)$.

Thus, we have for $\frac{k}{n},x\in\prod_{i=1}^{N}\left[a_{i},b_{i}\right]$ that
$$f\left(\frac{k_{1}}{n},\ldots,\frac{k_{N}}{n}\right)-f\left(x\right)=\sum_{j=1}^{m}\sum_{\substack{\alpha:=\left(\alpha_{1},\ldots,\alpha_{N}\right),\ \alpha_{i}\in\mathbb{Z}^{+},\\ \left|\alpha\right|=j}}\left(\frac{1}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left(\prod_{i=1}^{N}\left(\frac{k_{i}}{n}-x_{i}\right)^{\alpha_{i}}\right)f_{\alpha}\left(x\right)+R,$$
where
$$R:=m\int_{0}^{1}\left(1-\theta\right)^{m-1}\sum_{\substack{\alpha:=\left(\alpha_{1},\ldots,\alpha_{N}\right),\ \alpha_{i}\in\mathbb{Z}^{+},\\ \left|\alpha\right|=m}}\left(\frac{1}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left(\prod_{i=1}^{N}\left(\frac{k_{i}}{n}-x_{i}\right)^{\alpha_{i}}\right)\cdot\left(f_{\alpha}\left(x+\theta\left(\frac{k}{n}-x\right)\right)-f_{\alpha}\left(x\right)\right)d\theta.$$

We see that
$$\left|R\right|\leq m\int_{0}^{1}\left(1-\theta\right)^{m-1}\sum_{\left|\alpha\right|=m}\left(\frac{1}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left(\prod_{i=1}^{N}\left|\frac{k_{i}}{n}-x_{i}\right|^{\alpha_{i}}\right)\left|f_{\alpha}\left(x+\theta\left(\frac{k}{n}-x\right)\right)-f_{\alpha}\left(x\right)\right|d\theta$$
$$\leq m\int_{0}^{1}\left(1-\theta\right)^{m-1}\sum_{\left|\alpha\right|=m}\left(\frac{1}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left(\prod_{i=1}^{N}\left|\frac{k_{i}}{n}-x_{i}\right|^{\alpha_{i}}\right)\omega_{1}\left(f_{\alpha},\theta\left\|\frac{k}{n}-x\right\|_{\infty}\right)d\theta=:\left(*\right).$$

Notice here that
$$\left\|\frac{k}{n}-x\right\|_{\infty}\leq\frac{1}{n^{\beta}}\iff\left|\frac{k_{i}}{n}-x_{i}\right|\leq\frac{1}{n^{\beta}},\ i=1,\ldots,N.$$

We further observe, when $\left\|\frac{k}{n}-x\right\|_{\infty}\leq\frac{1}{n^{\beta}}$, that
$$\left(*\right)\leq m\,\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right)\left(\int_{0}^{1}\left(1-\theta\right)^{m-1}d\theta\right)\sum_{\left|\alpha\right|=m}\left(\frac{1}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left(\prod_{i=1}^{N}\left(\frac{1}{n^{\beta}}\right)^{\alpha_{i}}\right)$$
$$=\frac{\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right)}{\left(m!\right)n^{m\beta}}\left(\sum_{\left|\alpha\right|=m}\frac{m!}{\prod_{i=1}^{N}\alpha_{i}!}\right)=\frac{\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right)}{\left(m!\right)n^{m\beta}}N^{m}.$$

Conclusion: when $\left\|\frac{k}{n}-x\right\|_{\infty}\leq\frac{1}{n^{\beta}}$, we proved
$$\left|R\right|\leq\frac{N^{m}}{m!\,n^{m\beta}}\,\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right).$$

In general we notice that
$$\left|R\right|\leq m\int_{0}^{1}\left(1-\theta\right)^{m-1}\left(\sum_{\left|\alpha\right|=m}\left(\frac{1}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left(\prod_{i=1}^{N}\left(b_{i}-a_{i}\right)^{\alpha_{i}}\right)2\left\|f_{\alpha}\right\|_{\infty}\right)d\theta$$
$$=\sum_{\left|\alpha\right|=m}\left(\frac{1}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left(\prod_{i=1}^{N}\left(b_{i}-a_{i}\right)^{\alpha_{i}}\right)2\left\|f_{\alpha}\right\|_{\infty}\leq\frac{2\left\|b-a\right\|_{\infty}^{m}\left\|f_{\alpha}\right\|_{\infty,m}^{\max}}{m!}\left(\sum_{\left|\alpha\right|=m}\frac{m!}{\prod_{i=1}^{N}\alpha_{i}!}\right)=\frac{2\left\|b-a\right\|_{\infty}^{m}\left\|f_{\alpha}\right\|_{\infty,m}^{\max}N^{m}}{m!}.$$
We proved in general that
$$\left|R\right|\leq\frac{2\left\|b-a\right\|_{\infty}^{m}\left\|f_{\alpha}\right\|_{\infty,m}^{\max}N^{m}}{m!}=:\lambda_{3}.$$

Next we see that
$$U_{n}:=\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)R=\sum_{\substack{k=\lceil na\rceil\\ \left\|\frac{k}{n}-x\right\|_{\infty}\leq\frac{1}{n^{\beta}}}}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)R+\sum_{\substack{k=\lceil na\rceil\\ \left\|\frac{k}{n}-x\right\|_{\infty}>\frac{1}{n^{\beta}}}}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)R.$$

Consequently
$$\left|U_{n}\right|\leq\left(\sum_{\substack{k=\lceil na\rceil\\ \left\|\frac{k}{n}-x\right\|_{\infty}\leq\frac{1}{n^{\beta}}}}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)\right)\frac{N^{m}}{m!\,n^{m\beta}}\,\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right)+e^{4}\lambda_{3}e^{-2n^{(1-\beta)}}$$
$$\leq\frac{N^{m}}{m!\,n^{m\beta}}\,\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right)+e^{4}\lambda_{3}e^{-2n^{(1-\beta)}}.$$

We have established that
$$\left|U_{n}\right|\leq\frac{N^{m}}{m!\,n^{m\beta}}\,\omega_{1,m}^{\max}\left(f_{\alpha},\frac{1}{n^{\beta}}\right)+\left(\frac{2e^{4}\left\|b-a\right\|_{\infty}^{m}\left\|f_{\alpha}\right\|_{\infty,m}^{\max}N^{m}}{m!}\right)e^{-2n^{(1-\beta)}}.$$

We observe
$$\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}f\left(\frac{k}{n}\right)\Theta\left(nx-k\right)-f\left(x\right)\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)$$
$$=\sum_{j=1}^{m}\sum_{\left|\alpha\right|=j}\left(\frac{f_{\alpha}\left(x\right)}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left(\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)\left(\prod_{i=1}^{N}\left(\frac{k_{i}}{n}-x_{i}\right)^{\alpha_{i}}\right)\right)+\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)R.$$

The last says that
$$F_{n}^{*}\left(f,x\right)-f\left(x\right)\left(\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)\right)-\sum_{j=1}^{m}\sum_{\left|\alpha\right|=j}\left(\frac{f_{\alpha}\left(x\right)}{\prod_{i=1}^{N}\alpha_{i}!}\right)F_{n}^{*}\left(\prod_{i=1}^{N}\left(\cdot-x_{i}\right)^{\alpha_{i}},x\right)=U_{n}.$$

Clearly $F_{n}^{*}$ is a positive linear operator. Therefore (here $\alpha_{i}\in\mathbb{Z}^{+}$: $\left|\alpha\right|=\sum_{i=1}^{N}\alpha_{i}=j$)
$$\left|F_{n}^{*}\left(\prod_{i=1}^{N}\left(\cdot-x_{i}\right)^{\alpha_{i}},x\right)\right|\leq F_{n}^{*}\left(\prod_{i=1}^{N}\left|\cdot-x_{i}\right|^{\alpha_{i}},x\right)=\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\left(\prod_{i=1}^{N}\left|\frac{k_{i}}{n}-x_{i}\right|^{\alpha_{i}}\right)\Theta\left(nx-k\right)$$
$$=\sum_{\substack{k=\lceil na\rceil\\ \left\|\frac{k}{n}-x\right\|_{\infty}\leq\frac{1}{n^{\beta}}}}^{\lfloor nb\rfloor}\left(\prod_{i=1}^{N}\left|\frac{k_{i}}{n}-x_{i}\right|^{\alpha_{i}}\right)\Theta\left(nx-k\right)+\sum_{\substack{k=\lceil na\rceil\\ \left\|\frac{k}{n}-x\right\|_{\infty}>\frac{1}{n^{\beta}}}}^{\lfloor nb\rfloor}\left(\prod_{i=1}^{N}\left|\frac{k_{i}}{n}-x_{i}\right|^{\alpha_{i}}\right)\Theta\left(nx-k\right)$$
$$\leq\frac{1}{n^{\beta j}}+\left(\prod_{i=1}^{N}\left(b_{i}-a_{i}\right)^{\alpha_{i}}\right)\left(\sum_{\substack{k=\lceil na\rceil\\ \left\|\frac{k}{n}-x\right\|_{\infty}>\frac{1}{n^{\beta}}}}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)\right)\leq\frac{1}{n^{\beta j}}+\left(\prod_{i=1}^{N}\left(b_{i}-a_{i}\right)^{\alpha_{i}}\right)e^{4}e^{-2n^{(1-\beta)}}.$$

So we have proved that
$$\left|F_{n}^{*}\left(\prod_{i=1}^{N}\left(\cdot-x_{i}\right)^{\alpha_{i}},x\right)\right|\leq\frac{1}{n^{\beta j}}+\left(\prod_{i=1}^{N}\left(b_{i}-a_{i}\right)^{\alpha_{i}}\right)e^{4}e^{-2n^{(1-\beta)}},$$
for all $j=1,\ldots,m$.

At last we observe
$$\left|F_{n}\left(f,x\right)-f\left(x\right)-\sum_{j=1}^{m}\sum_{\left|\alpha\right|=j}\left(\frac{f_{\alpha}\left(x\right)}{\prod_{i=1}^{N}\alpha_{i}!}\right)F_{n}\left(\prod_{i=1}^{N}\left(\cdot-x_{i}\right)^{\alpha_{i}},x\right)\right|\leq$$
$$\left(4.1488766\right)^{N}\cdot\left|F_{n}^{*}\left(f,x\right)-f\left(x\right)\left(\sum_{k=\lceil na\rceil}^{\lfloor nb\rfloor}\Theta\left(nx-k\right)\right)-\sum_{j=1}^{m}\sum_{\left|\alpha\right|=j}\left(\frac{f_{\alpha}\left(x\right)}{\prod_{i=1}^{N}\alpha_{i}!}\right)F_{n}^{*}\left(\prod_{i=1}^{N}\left(\cdot-x_{i}\right)^{\alpha_{i}},x\right)\right|.$$

Putting all of the above together we prove the theorem.
4.4 Complex Multivariate Neural Network Quantitative Approximations
We make

Remark 4.9. Let $X=\prod_{i=1}^{N}\left[a_{i},b_{i}\right]$ or $\mathbb{R}^{N}$, and let $f:X\rightarrow\mathbb{C}$ have real and imaginary parts $f_{1},f_{2}$: $f=f_{1}+if_{2}$, $i=\sqrt{-1}$. Clearly $f$ is continuous iff $f_{1}$ and $f_{2}$ are continuous.

Given that $f_{1},f_{2}\in C^{m}\left(X\right)$, $m\in\mathbb{N}$, it holds
$$f_{\alpha}\left(x\right)=f_{1,\alpha}\left(x\right)+if_{2,\alpha}\left(x\right),\tag{4.20}$$
where $\alpha$ denotes a partial derivative of any order and arrangement.

We denote by $C_{B}\left(\mathbb{R}^{N},\mathbb{C}\right)$ the space of continuous and bounded functions $f:\mathbb{R}^{N}\rightarrow\mathbb{C}$. Clearly $f$ is bounded iff both $f_{1},f_{2}$ are bounded from $\mathbb{R}^{N}$ into $\mathbb{R}$, where $f=f_{1}+if_{2}$.

Here we define
$$F_{n}\left(f,x\right):=F_{n}\left(f_{1},x\right)+iF_{n}\left(f_{2},x\right),\quad x\in\prod_{i=1}^{N}\left[a_{i},b_{i}\right],\tag{4.21}$$
and
$$\overline{F}_{n}\left(f,x\right):=\overline{F}_{n}\left(f_{1},x\right)+i\overline{F}_{n}\left(f_{2},x\right),\quad x\in\mathbb{R}^{N}.\tag{4.22}$$

We see here that
$$\left|F_{n}\left(f,x\right)-f\left(x\right)\right|\leq\left|F_{n}\left(f_{1},x\right)-f_{1}\left(x\right)\right|+\left|F_{n}\left(f_{2},x\right)-f_{2}\left(x\right)\right|,\tag{4.23}$$
and
$$\left\|F_{n}\left(f\right)-f\right\|_{\infty}\leq\left\|F_{n}\left(f_{1}\right)-f_{1}\right\|_{\infty}+\left\|F_{n}\left(f_{2}\right)-f_{2}\right\|_{\infty}.\tag{4.24}$$

Similarly we get
$$\left|\overline{F}_{n}\left(f,x\right)-f\left(x\right)\right|\leq\left|\overline{F}_{n}\left(f_{1},x\right)-f_{1}\left(x\right)\right|+\left|\overline{F}_{n}\left(f_{2},x\right)-f_{2}\left(x\right)\right|,\quad x\in\mathbb{R}^{N},\tag{4.25}$$
and
$$\left\|\overline{F}_{n}\left(f\right)-f\right\|_{\infty}\leq\left\|\overline{F}_{n}\left(f_{1}\right)-f_{1}\right\|_{\infty}+\left\|\overline{F}_{n}\left(f_{2}\right)-f_{2}\right\|_{\infty}.\tag{4.26}$$
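Computationally, (4.21) and (4.22) amount to applying the real operators to $f_{1}$ and $f_{2}$ separately; by linearity the same value is obtained by evaluating the defining sums with complex arithmetic. A minimal sketch, with an illustrative complex-valued example and the same assumption on $\Psi$ as in the earlier sketches:

```python
import itertools
import math

def Psi(x):
    # assumed hyperbolic tangent density from earlier in the chapter
    return 0.25 * (math.tanh(x + 1.0) - math.tanh(x - 1.0))

def F_n_complex(f, x, n, a, b):
    """Evaluate F_n for complex-valued f.  By linearity this equals
    F_n(f1, x) + i*F_n(f2, x) with f = f1 + i*f2, i.e. definition (4.21)."""
    N = len(x)
    ranges = [range(math.ceil(n * a[i]), math.floor(n * b[i]) + 1) for i in range(N)]
    num = 0.0 + 0.0j
    den = 0.0
    for k in itertools.product(*ranges):
        w = 1.0
        for i in range(N):
            w *= Psi(n * x[i] - k[i])
        num += f(tuple(k[i] / n for i in range(N))) * w
        den += w
    return num / den

if __name__ == "__main__":
    # Illustrative complex-valued example on [0, 1]^2: f = f1 + i f2.
    f = lambda t: math.cos(math.pi * t[0]) + 1j * math.sin(math.pi * t[1])
    a, b, x = [0.0, 0.0], [1.0, 1.0], (0.25, 0.6)
    for n in (10, 50, 200):
        print(n, F_n_complex(f, x, n, a, b), "target:", f(x))
```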
We give

Theorem 4.10. Let $f\in C\left(\prod_{i=1}^{N}\left[a_{i},b_{i}\right],\mathbb{C}\right)$, $f=f_{1}+if_{2}$, $0<\beta<1$, $n,N\in\mathbb{N}$, $x\in\prod_{i=1}^{N}\left[a_{i},b_{i}\right]$. Then

i)
$$\left|F_{n}\left(f,x\right)-f\left(x\right)\right|\leq\left(4.1488766\right)^{N}\cdot\tag{4.27}$$
$$\left[\omega_{1}\left(f_{1},\frac{1}{n^{\beta}}\right)+\omega_{1}\left(f_{2},\frac{1}{n^{\beta}}\right)+2e^{4}\left(\left\|f_{1}\right\|_{\infty}+\left\|f_{2}\right\|_{\infty}\right)e^{-2n^{(1-\beta)}}\right]=:\Phi_{1},$$

ii)
$$\left\|F_{n}\left(f\right)-f\right\|_{\infty}\leq\Phi_{1}.\tag{4.28}$$

Proof. Use of Theorem 4.6 and Remark 4.9.

We present

Theorem 4.11. Let $f\in C_{B}\left(\mathbb{R}^{N},\mathbb{C}\right)$, $f=f_{1}+if_{2}$, $0<\beta<1$, $n,N\in\mathbb{N}$, $x\in\mathbb{R}^{N}$. Then

i)
$$\left|\overline{F}_{n}\left(f,x\right)-f\left(x\right)\right|\leq\omega_{1}\left(f_{1},\frac{1}{n^{\beta}}\right)+\omega_{1}\left(f_{2},\frac{1}{n^{\beta}}\right)+2e^{4}\left(\left\|f_{1}\right\|_{\infty}+\left\|f_{2}\right\|_{\infty}\right)e^{-2n^{(1-\beta)}}=:\Phi_{2},\tag{4.29}$$

ii)
$$\left\|\overline{F}_{n}\left(f\right)-f\right\|_{\infty}\leq\Phi_{2}.\tag{4.30}$$

Proof. By Theorem 4.7 and Remark 4.9.

In the next we discuss high order of complex approximation by using the smoothness of $f$.

We present

Theorem 4.12. Let $f:\prod_{i=1}^{N}\left[a_{i},b_{i}\right]\rightarrow\mathbb{C}$ such that $f=f_{1}+if_{2}$. Assume $f_{1},f_{2}\in C^{m}\left(\prod_{i=1}^{N}\left[a_{i},b_{i}\right]\right)$, $0<\beta<1$, $n,m,N\in\mathbb{N}$, $x\in\prod_{i=1}^{N}\left[a_{i},b_{i}\right]$. Then

i)
$$\left|F_{n}\left(f,x\right)-f\left(x\right)-\sum_{j=1}^{m}\sum_{\left|\alpha\right|=j}\left(\frac{f_{\alpha}\left(x\right)}{\prod_{i=1}^{N}\alpha_{i}!}\right)F_{n}\left(\prod_{i=1}^{N}\left(\cdot-x_{i}\right)^{\alpha_{i}},x\right)\right|\leq\left(4.1488766\right)^{N}\cdot\tag{4.31}$$
$$\left\{\frac{N^{m}}{m!\,n^{m\beta}}\left[\omega_{1,m}^{\max}\left(f_{1,\alpha},\frac{1}{n^{\beta}}\right)+\omega_{1,m}^{\max}\left(f_{2,\alpha},\frac{1}{n^{\beta}}\right)\right]+\left(\frac{2e^{4}\left\|b-a\right\|_{\infty}^{m}\left(\left\|f_{1,\alpha}\right\|_{\infty,m}^{\max}+\left\|f_{2,\alpha}\right\|_{\infty,m}^{\max}\right)N^{m}}{m!}\right)e^{-2n^{(1-\beta)}}\right\},$$

ii)
$$\left|F_{n}\left(f,x\right)-f\left(x\right)\right|\leq\left(4.1488766\right)^{N}\cdot\tag{4.32}$$
$$\left\{\sum_{j=1}^{m}\sum_{\left|\alpha\right|=j}\left(\frac{\left|f_{1,\alpha}\left(x\right)\right|+\left|f_{2,\alpha}\left(x\right)\right|}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left[\frac{1}{n^{\beta j}}+\left(\prod_{i=1}^{N}\left(b_{i}-a_{i}\right)^{\alpha_{i}}\right)e^{4}e^{-2n^{(1-\beta)}}\right]\right.$$
$$\left.+\frac{N^{m}}{m!\,n^{m\beta}}\left[\omega_{1,m}^{\max}\left(f_{1,\alpha},\frac{1}{n^{\beta}}\right)+\omega_{1,m}^{\max}\left(f_{2,\alpha},\frac{1}{n^{\beta}}\right)\right]+\left(\frac{2e^{4}\left\|b-a\right\|_{\infty}^{m}\left(\left\|f_{1,\alpha}\right\|_{\infty,m}^{\max}+\left\|f_{2,\alpha}\right\|_{\infty,m}^{\max}\right)N^{m}}{m!}\right)e^{-2n^{(1-\beta)}}\right\},$$

iii)
$$\left\|F_{n}\left(f\right)-f\right\|_{\infty}\leq\left(4.1488766\right)^{N}\cdot\tag{4.33}$$
$$\left\{\sum_{j=1}^{m}\sum_{\left|\alpha\right|=j}\left(\frac{\left\|f_{1,\alpha}\right\|_{\infty}+\left\|f_{2,\alpha}\right\|_{\infty}}{\prod_{i=1}^{N}\alpha_{i}!}\right)\left[\frac{1}{n^{\beta j}}+\left(\prod_{i=1}^{N}\left(b_{i}-a_{i}\right)^{\alpha_{i}}\right)e^{4}e^{-2n^{(1-\beta)}}\right]\right.$$
$$\left.+\frac{N^{m}}{m!\,n^{m\beta}}\left[\omega_{1,m}^{\max}\left(f_{1,\alpha},\frac{1}{n^{\beta}}\right)+\omega_{1,m}^{\max}\left(f_{2,\alpha},\frac{1}{n^{\beta}}\right)\right]+\left(\frac{2e^{4}\left\|b-a\right\|_{\infty}^{m}\left(\left\|f_{1,\alpha}\right\|_{\infty,m}^{\max}+\left\|f_{2,\alpha}\right\|_{\infty,m}^{\max}\right)N^{m}}{m!}\right)e^{-2n^{(1-\beta)}}\right\},$$

iv) Suppose $f_{\alpha}\left(x_{0}\right)=0$, for all $\alpha:\left|\alpha\right|=1,\ldots,m$; $x_{0}\in\prod_{i=1}^{N}\left[a_{i},b_{i}\right]$. Then
$$\left|F_{n}\left(f,x_{0}\right)-f\left(x_{0}\right)\right|\leq\left(4.1488766\right)^{N}\cdot\tag{4.34}$$
$$\left\{\frac{N^{m}}{m!\,n^{m\beta}}\left[\omega_{1,m}^{\max}\left(f_{1,\alpha},\frac{1}{n^{\beta}}\right)+\omega_{1,m}^{\max}\left(f_{2,\alpha},\frac{1}{n^{\beta}}\right)\right]+\left(\frac{2e^{4}\left\|b-a\right\|_{\infty}^{m}\left(\left\|f_{1,\alpha}\right\|_{\infty,m}^{\max}+\left\|f_{2,\alpha}\right\|_{\infty,m}^{\max}\right)N^{m}}{m!}\right)e^{-2n^{(1-\beta)}}\right\};$$
notice in the last the extremely high rate of convergence at $n^{-\beta\left(m+1\right)}$.

Proof. By Theorem 4.8 and Remark 4.9.