Algorithms For Approximation IV. Proceedings of the 2001 International Symposium

2002

J. Levesley, I. J. Anderson, J. C. Mason (Eds)
University of Huddersfield
Proceedings Published 2002

REPORT DOCUMENTATION PAGE (Standard Form 298, Rev. 8/98)

1. Report Date: 14-10-2002
2. Report Type: Conference Proceedings
3. Dates Covered: 16 July 2001 - 20 July 2001
4. Title and Subtitle: Algorithms for Approximation IV (A4A4)
5a. Contract Number: F61775-00-WF078
6. Author(s): Conference Committee (Organizer, Professor John C. Mason)
7. Performing Organization: University of Huddersfield, Queensgate, Huddersfield HD1 3DH, United Kingdom
9. Sponsoring/Monitoring Agency: EOARD, PSC 802 Box 14, FPO 09499-0014
11. Sponsor/Monitor's Report Number: CSP 00-5078
12. Distribution/Availability Statement: Approved for public release; distribution is unlimited.
14. Abstract: The Final Proceedings for Algorithms for Approximation IV (A4A4), 16 July 2001 - 20 July 2001, a multidisciplinary conference addressing many areas of interest to the Air Force. Of primary interest are the potential applications to Modeling and Simulation. Specifically, the topics covered fall into four major areas: Algorithms, Efficiency, Software, and Applications. Each major topic is divided into subtopics as follows: Algorithms - Approximation of Functions, Data Fitting, Geometric and Surface Modelling, Splines, Wavelets, Radial Basis Functions, Support Vector Machines, Norms and Metrics, Errors in Data, Uncertainty Estimation; Efficiency - Numerical Analysis, Parallel Processing; Software - Standards, Libraries, New Routines, World Wide Web; Applications - Metrology (Science of Measurement), Data Fusion, Neural Networks and Intelligent Systems, Spherical Data and Geodetics, Medical Data.
15. Subject Terms: EOARD, Software, Mathematics, Intelligent Systems, Computational Mathematics
16. Security Classification: UNCLAS (report, abstract, this page); Limitation of Abstract: UL
18. Number of Pages: 492 (plus front matter)
19. Responsible Person: Christopher Reuter, Ph.D., +44 (0)20 7514 4474
Algorithms for Approximation IV

The proceedings of the Fourth International Symposium on Algorithms for Approximation, held at the University of Huddersfield, July 2001.

Edited by

Jeremy Levesley, Department of Mathematics and Computer Science, University of Leicester, Leicester LE1 7RH, UK.
Iain Anderson, Analyticon Ltd, Elopak House, Rutherford Close, Meadway Technology Park, Stevenage, SG1 2EF, UK.
John C. Mason, School of Computing and Mathematics, The University of Huddersfield, Queensgate, Huddersfield, HD1 3DH, UK.

Published by The University of Huddersfield.

First published in 2002 by the University of Huddersfield, Queensgate, Huddersfield HD1 3DH. Printed in Great Britain by The Charlesworth Group, 254 Deighton Road, Huddersfield HD2 1JJ, UK.

ISBN 1 86218 040 7

British Library Cataloguing in Publication Data: a catalogue record for this book is available from the British Library.

Contents

Contributors
Preface

Chapter 1: Computer Aided Geometric Design ... 1
An automatic control point choice in algebraic numerical grid generation (C. Conti, R. Morandi, and D. Scaramelli) ... 2
Shape-measure method for introducing the nearly optimal domain (A. Fakharzadeh and J. E. Rubio) ... 10
Convex combination maps (M. Floater) ... 18
Shape preserving interpolation by curves (T. N. T. Goodman) ... 24
CAGD techniques for differentiable manifolds (A. Lin and M. Walker) ... 36
Parametric shape-preserving spatial interpolation and ν-splines (C. Manni) ... 44
On the q-Bernstein polynomials (H. Oruç and N. Tuncer) ... 52
Uniform Powell-Sabin splines for the polygonal hole problem (J. Windmolders and P. Dierckx) ... 60

Chapter 2: Differential Equations ... 69
Iterative refinement schemes for an ill-conditioned transfer equation in astrophysics (M. Ahues, F. d'Almeida, A. Largillier, O. Titaud, and P. Vasconcelos) ... 70
Geometric symmetry in the symmetric Galerkin BEM (A. Aimi and M. Diligenti) ... 78
The numerical simulation of the qualitative behaviour of Volterra integro-differential equations (J. T. Edwards, N. J. Ford, and J. A. Roberts) ... 86
Systems of delay equations with small solutions: a numerical approach (N. J. Ford and P. M. Lumb) ... 94
On an adaptive mesh algorithm with minimal distance control (K. Shanazari and K. Chen) ... 102
An alternative approach for solving Maxwell equations (W. Sproessig and E. Venturino) ... 110

Chapter 3: Metrology ... 121
Orthogonal distance fitting of parametric curves and surfaces (S. J. Ahn, E. Westkämper, and W. Rauh) ... 122
Template matching in the l1 norm (I. J. Anderson and C. Ross) ... 130
A bootstrap method for mixture models and interval data in inter-comparisons (P. Ciarlini, G. Regoliosi, and F. Pavese) ... 138
Efficient algorithms for structured self-calibration problems (A. Forbes) ... 146
On measurement uncertainties derived from "metrological statistics" (M. Grabe) ... 154
l1 and l-infinity fitting of geometric elements (H.-P. Helfrich and D. S. Zwick) ... 162
Evaluation of measurements by the method of least squares (L. Nielsen) ... 170
An overview of the relationship between approximation theory and filtration (P. J. Scott, X. Q. Jiang, and L. A. Blunt) ... 188

Chapter 4: Radial Basis Functions ... 197
Applications of radial basis functions: Sobolev-orthogonal functions, radial basis functions and spectral methods (M. D. Buhmann, A. Iserles, and S. P. Nørsett) ... 198
Approximation with the radial basis functions of Lewitt (J. J. Green) ... 212
Computing with radial basis functions the Beatson-Light way! (W. A. Light) ... 220
Application of orthogonalisation procedures for Gaussian radial basis functions and Chebyshev polynomials (J. C. Mason and A. Crampton) ... 236
Geometric knot selection for radial basis scattered data approximation (R. Morandi and A. Sestini) ... 244
On the boundary over distance preconditioner for radial basis function interpolation (C. T. Mouat and R. K. Beatson) ... 252
What are 'good' points for local interpolation by radial basis functions? (R. P. Tong, A. Crampton, and A. E. Trefethen) ... 260

Chapter 5: Regression ... 269
Generalised Gauss-Markov regression (A. Forbes, P. M. Harris, and I. M. Smith) ... 270
Nonparametric regression subject to a given number of local extreme values (A. Majidi and L. Davies) ... 278
Model fitting using the least volume criterion (C. Tofallis) ... 286
Some problems in orthogonal and non-orthogonal distance regression (G. A. Watson) ... 294

Chapter 6: Splines and Wavelets ... 305
Nonlinear multiscale transformations: from synchronisation to error control (F. Arandiga and R. Donat) ... 306
Splines: a new contribution to wavelet analysis (A. Z. Averbuch and V. A. Zheludev) ... 314
Knot removal for tensor product splines (T. Brenna) ... 322
Fixed- and free-knot univariate least-squares data approximation by polynomial splines (M. Cox, P. Harris, and P. Kenward) ... 330
On the approximation power of local least squares polynomials (O. Davydov) ... 346
A wavelet-based preconditioning method for dense matrices with block structure (J. M. Ford and K. Chen) ... 354
Some properties of the perturbed Haar wavelets (A. L. Gonzalez and R. A. Zalik) ... 362
An example concerning the Lp-stability of piecewise linear B-wavelets (P. Oja and E. Quak) ... 370
How many holes can locally linearly independent refinable vector functions have? (G. Plonka) ... 378
The correlation between the convergence of subdivision processes and solvability of refinement equations (V. Protasov) ... 394
Accurate approximation of functions with discontinuities using low order Fourier coefficients (R. K. Wright) ... 402

Chapter 7: General Approximation ... 411
Remarks on delay approximations based on feedback (A. Beghi, A. Lepschy, W. Krajewski, and U. Viaro) ... 412
Point shifts in rational interpolation with optimized denominator (J.-P. Berrut and H. D. Mittelmann) ... 420
An application of a mathematical blood flow model (M. Breuss, A. Meister, and B. Fischer) ... 428
Zeros of the hypergeometric polynomial F(-n, b; c; z) (K. Driver and K. Jordaan) ... 436
Approximation error maps (A. Gomide and J. Stolfi) ... 446
Approximation by perceptron networks (V. Kurkova) ... 454
Eye-ball rebuilding using splines with a view to refractive surgery simulation (M. Lamard, B. Cochener, and A. Le Mehaute) ... 462
A robust algorithm for least absolute deviation curve fitting (D. Lei, I. J. Anderson, and M. G. Cox) ... 470
Tomographic reconstruction using Cesaro-means and Newman-Shapiro operators (U. Maier) ... 478
A unified approach to fast algorithms of discrete trigonometric transforms (M. Tasche and H. Zeuner) ... 486

Contributors

Invited Speakers

Martin Buhmann: Lehrstuhl Numerische Mathematik, Mathematisches Institut, Justus-Liebig-University, 35392 Giessen, Germany.
Maurice Cox: National Physical Laboratory, Teddington, Middlesex, TW11 0LW, UK.
Kathy Driver: School of Mathematics, University of the Witwatersrand, Private Bag 3, WITS, 2050, South Africa.
Michael Floater: SINTEF, Applied Mathematics, P.O. Box 124, Blindern, 0314 Oslo, Norway.
Tim Goodman: Department of Mathematics, The University of Dundee, Dundee DD1 4HN, Scotland.
Will Light: Department of Mathematics, The University of Leicester, Leicester LE1 7RH, UK.
Lars Nielsen: Danish Institute of Fundamental Metrology, DK-2800 Lyngby, Denmark.
Gerlind Plonka: Gerhard-Mercator-Universitat Duisburg, Institut fur Mathematik, D-47048 Duisburg, Germany.
Tomaso Poggio: Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, 77 Massachusetts Avenue, E25-406, Cambridge, MA 02139-4307, USA.
Larry Schumaker: Vanderbilt University, Department of Mathematics, 1326 Stevenson Center, Nashville TN 37240-0001, USA.
Alistair Watson: Department of Mathematics, The University of Dundee, Dundee DD1 4HN, Scotland.

Contributing Speakers

S. J. Ahn: Fraunhofer Institute for Manufacturing Engineering and Automation (IPA), 70569 Stuttgart, Germany.
A. Aimi: Department of Mathematics, University of Parma, Italy.
F. D. d'Almeida: University of Porto, Faculty of Engineering, 4200-468 Porto, Portugal.
I. J. Anderson: Analyticon Ltd, Elopak House, Meadway Technology Park, Stevenage, SG1 2EF, UK.
F. Arandiga: Dept. Matematica Aplicada, University of Valencia, Spain.
A. A. Badr: Alexandria University, Dept. of Mathematics, Faculty of Science, Alexandria, Egypt.
R. K. Beatson: Dept. of Mathematics and Statistics, Univ. of Canterbury, Christchurch, New Zealand.
E. Belinsky: University of the West Indies, Dept. of Computer Science, Maths and Physics, P O Box 64, Bridgetown, Barbados.
J.-P. Berrut: Dept. de Mathematiques, Universite de Fribourg, Switzerland.
K. Bittner: University of Missouri - St Louis, Dept. of Maths and Computing Science, St Louis, MO 63121, USA.
T. Brenna: Dept. of Informatics, University of Oslo, Oslo, Norway.
M. Breuss: Dept. of Mathematics, University of Hamburg, Hamburg, Germany.
C. Brezinski: University of Lille, Lille, France.
A. Chunovkina: VNIIM, St Petersburg, Russia.
P. Cross: University College London, Dept. of Geomatic Engineering, London WC1E 6BT, UK.
M. P. Dainton: National Physical Laboratory, Teddington, Middlesex, TW11 0LW, UK.
O. Davydov: Universitat Giessen, Mathematisches Institut, D-35392 Giessen, Germany.
A. Fakharzadeh: Dept. of Mathematics, Shahid Chamran University of Ahvaz, Ahvaz, Iran.
A. B. Forbes: National Physical Laboratory, Middlesex TW11 0LW, UK.
J. M. Ford: Dept. of Mathematical Sciences, University of Liverpool, Liverpool L69 7ZL, UK.
N. Ford: Chester College, Parkgate Road, Chester, CH1 4BJ, UK.
D. J. Gavaghan: Oxford University, Computing Laboratory, Oxford, OX1 3QD, UK.
A. J. P. Gomes: University Beira Interior, Dept. Informatica, 6201-001 Covilha, Portugal.
A. Gomide: Institute of Computing, University of Campinas, Brazil.
M. Grabe: PTB, Am Hasselteich 5, 38104 Braunschweig, Germany.
P. R. Graves-Morris: University of Bradford, Dept. of Maths and Computing Science, Bradford, BD7 1DP, UK.
J. J. Green: University of Sheffield, Dept. of Applied Mathematics, Sheffield, UK.
H.-P. Helfrich: Mathematisches Seminar der Landwirtschaftlichen Fakultat der Universitat Bonn, Bonn, Germany.
H. O. Kim: KAIST, Division of Applied Mathematics, Taejon, Korea.
W. Krajewski: Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland.
V. Kurkova: Academy of Sciences of the Czech Republic, Institute of Computer Science, PO Box 5, 182 07 Prague 8, Czech Republic.
M. Lamard: LATIM - INSERM ERM 0102, 29609 Brest Cedex, France.
D. Lei: School of Computing and Mathematics, University of Huddersfield, Huddersfield, UK.
S. Li: Southeastern Louisiana University, USA.
J. Lippus: Tallinn Tech. University, Institute of Cybernetics, 12618 Tallinn, Estonia.
P. M. Lumb: Chester College, Parkgate Road, Chester, CH1 4BJ, UK.
T. Lyche: University of Oslo, Institute for Informatics, P O Box 1080, Blindern, 0316 Oslo, Norway.
U. Maier: Justus-Liebig Universitat, Mathematisches Institut, D-35392 Giessen, Germany.
A. Majidi: Dept. of Mathematics and Computer Science, University of Essen, Germany.
C. Manni: Dept. of Mathematics, University of Torino, Italy.
J. C. Mason: School of Computing and Mathematics, University of Huddersfield, Huddersfield, UK.
G. W. Morgan: Numerical Algorithms Group, Oxford, UK.
A. Palomares: Universidad de Granada, Facultad de Ciencias, 18071 Granada, Spain.
F. Pavese: Istituto di Metrologia "G. Colonnetti", Torino, Italy.
M. J. D. Powell: University of Cambridge, DAMTP, Cambridge, CB3 9EW, UK.
A. Prymak: National Taras Shevchenko University of Kyiv, Mech-Math Faculty, Kyiv 01033, Ukraine.
E. Quak: SINTEF Applied Mathematics, P.O. Box 124 Blindern, 0314 Oslo, Norway.
M. Rogina: University of Zagreb, Dept. of Mathematics, 10002 Zagreb, Croatia.
C. Ross: School of Computing and Mathematics, University of Huddersfield, Huddersfield, UK.
D. Scaramelli: Dipartimento di Energetica, 50134 Firenze, Italy.
C. Schneider: Johannes Gutenberg Universitat, FB 17, D-55099 Mainz, Germany.
P. J. Scott: Taylor Hobson Ltd, New Star Road, Leicester LE4 9JQ, UK.
S. Serra Capizzano: University Insubria Como, 22100 Como, Italy.
A. Sestini: Dipartimento di Energetica, Universita di Firenze, Italy.
K. Shanazari: Dept. of Mathematical Sciences, The University of Liverpool, Liverpool L69 7ZL, UK.
I. M. Smith: National Physical Laboratory, Middlesex, TW11 0LW, UK.
A. Sommariva: Universita di Padova, Dipartimento di Matematica Pura e Applicata, Padova, Italy.
W. Sproessig: Freiberg University of Mining and Technology, 09596 Freiberg, Germany.
K. Strom: SimSurgery, Sognsveien 75B, N-0855 Oslo, Norway.
M. Tasche: University of Rostock, Dept. of Mathematics, D-18051 Rostock, Germany.
C. Tofallis: University of Hertfordshire Business School, Hertford, SG13 8QF, UK.
R. P. Tong: The Numerical Algorithms Group Ltd, Oxford, OX2 8DR, UK.
L. Traversoni: Universidad Autonoma Metropolitana, D.F. Mexico, CP 09340.
N. Tuncer: Dept. of Mathematics, Dokuz Eylul University, 35160 Buca, Izmir, Turkey.
M. Walker: York University, Toronto M3J 1P3, Canada.
J. Windmolders: Dept. of Computer Sciences, Kath. University Leuven, Belgium.
R. K. Wright: Dept. of Mathematics and Statistics, UVM, Burlington, VT, 05445, USA.
R. A. Zalik: Dept. of Mathematics, Auburn University, AL 36849-5310, USA.
V. A. Zheludev: School of Computer Science, Tel Aviv University, Israel.
D. S. Zwick: Wilcox Associates, Inc., Phoenix, AZ 85310, USA.

Chairs

CAGD: T. N. T. Goodman
Data Approximation: H.-P. Helfrich
Metrology: A. B. Forbes
Neural Networks: V. Kurkova
Orthogonal Polynomials and Pade Approximation: P. R. Graves-Morris
Radial Basis Functions: J. C. Mason
Radial Basis Functions and Wavelets: M. J. D. Powell
Shape Preserving Methods: C. Manni
Spline Functions: C. Brezinski
Spline Functions: T. Morton

Preface

This book contains the proceedings of the Fourth International Symposium on Algorithms for Approximation (A4A4), held at the University of Huddersfield from July 15th to 20th, 2001, and attended by 106 people from no fewer than 32 countries. The accommodation base was the attractive University Park at Storthes Hall, where social events were centred. There was a very friendly atmosphere, helped by the presence of a significant number of younger people to balance the stalwarts. Food was excellent and the weather was generally good.
This was the fourth in the series of "Algorithms for Approximation" meetings, after a pause of nine years; the earlier meetings were held in Oxfordshire in 1985, 1988 and 1992. Once again it was run under the sponsorship of the US Air Force (European Office of Aerospace Research and Development), this time with grants from the London Mathematical Society and the National Physical Laboratory (NPL) (Software Support for Metrology). The Organising Committee consisted of Iain Anderson, John Mason and David Turner (Huddersfield), Maurice Cox and Alistair Forbes (NPL), and Jeremy Levesley and Will Light (Leicester). In addition to them, the Programme Committee included Claude Brezinski (Lille), Martin Buhmann (Giessen), Tim Goodman (Dundee), Tom Lyche (Oslo), Alistair Watson (Dundee) and Larry Schumaker (Vanderbilt). In support of the committee, the Symposium Secretary, Ros Hawkins, was extremely efficient, and was helped by Karen Mitchell.

Moving to the academic programme, there were 11 invited speakers. From the UK were Maurice Cox, Tim Goodman, Alistair Watson and Will Light; from other parts of Europe were Martin Buhmann (Giessen), Michael Floater (SINTEF Oslo), Lars Nielsen (Danish Institute of Fundamental Metrology) and Gerlind Plonka (Duisburg); from the USA were Tomaso Poggio (MIT) and Larry Schumaker (Vanderbilt); and from South Africa, Kathy Driver (Witwatersrand). In addition there were 74 submitted papers given at the meeting, of which a good proportion were offered in Special Sessions in Metrology-Maths (run by David Turner), Metrology-Stats (Alistair Forbes), Orthogonal Polynomials and Pade Approximation (Claude Brezinski and Peter Graves-Morris (Bradford)), Spline Functions (Tom Lyche), Mathematical Modelling in Medicine (Ewald Quak), Integrals and Integral Equations (Ezio Venturino (Torino)) and Wavelets (Richard Zalik (Auburn)).

The current volume contains a substantial portion of the papers from the conference, as provided by the speakers, so that it is a solid and broad contribution to the area. The book has been organised by topic to suit the final selection of papers. All submitted papers were refereed, and significant modifications were made to a number of papers. In general, there was a high standard of submissions.

We cannot conclude this preface without mentioning the celebration at the meeting of three 60th birthdays of 2001, namely those of Claude Brezinski, Maurice Cox and John Mason. All played major parts in the Symposium. We must finish by offering thanks to all the staff at the University of Huddersfield, NPL, USAF-EOARD, the London Mathematical Society, and the publishers, who contributed to this most successful and memorable symposium. Thanks also go to Jeremy Levesley and Iain Anderson and the publishers, who worked so hard on the proceedings, and to all authors, without whom the volume would not exist.

John Mason
Huddersfield

Chapter 1: Computer Aided Geometric Design

An automatic control point choice in algebraic numerical grid generation

C. Conti, R. Morandi, and D. Scaramelli
Dipartimento di Energetica, via C. Lombroso 6/17, 50134 Firenze, Italy
costanza@sirio.de.unifi.it, morandi@de.unifi.it, scaramel@math.unipd.it

Abstract

A strategy to construct a grid conforming to the boundaries of a prescribed domain by using transfinite interpolation methods is discussed. A transfinite interpolation procedure is combined with a B-spline tensor product scheme defined by using suitable control points.
Their choice is performed by taking into account a quality measure parameter based on the condition number of matrices linked to the covariant metric tensors.

1 Introduction

The algebraic grid generation approach relies on the construction of a coordinate transformation from the computational domain into the physical domain. In particular, this can be obtained through transfinite interpolating operators allowing the generation of grids with boundary conformity. Furthermore, using a Hermite-type transfinite interpolating scheme we can obtain orthogonal grid lines emanating from the boundary. This can be very important for practical reasons, since the grid point distribution in the immediate neighborhood of the boundaries has a strong influence on the accuracy of the numerical solution of partial differential equations [5]. Furthermore, in case a domain decomposition is necessary, the orthogonality guarantees smoother grids. In order to obtain a grid with other specified properties, e.g. control of the shape and position of the coordinate curves, transfinite interpolating methods can be combined with tensor product schemes using suitably chosen control points (see for instance [1, 2, 6, 7, 8]). Even though this type of algebraic method is computationally efficient, a significant amount of user interaction is required to define workable meshes, namely for the selection of the control points involved in the tensor product. To overcome this drawback, an automatic strategy for choosing the control points turns out to be desirable.

Here, following the approach first discussed in [1], we present an algebraic Hermite-type transfinite method to construct a grid interpolating the boundary and its normal derivatives. In fact, given a "quadrilateral" domain $\Omega \subset \mathbb{R}^2$, a transformation $G : R = [0,1] \times [0,1] \to \Omega$ is defined as

$G(s,t) := T_P(s,t) + (P_1 \oplus P_2)\big([\phi,\psi] - T_P\big)(s,t), \qquad (1.1)$

where $T_P$ is a tensor product surface, i.e. $T_P(s,t) := \sum_{i=1}^{m}\sum_{j=1}^{n} Q_{ij} B_{i,3}(s) B_{j,3}(t)$ with $B_{i,3}$ denoting the usual cubic B-spline, $\phi$ and $\psi$ are boundary curves, and $(P_1 \oplus P_2)$ is the Boolean sum of Hermite-type blending function linear operators. The set $Q = \{Q_{ij},\ i = 1,\ldots,m,\ j = 1,\ldots,n\}$ is the set of control points. As already noted, the choice of the control points is a crucial matter. In this paper we take into account a grid quality measure parameter for their selection. In particular, the proposed automatic procedure relies on the fact that some grid properties can be described in terms of the condition number of matrices linked to the covariant metric tensors [4]. Therefore, the control points are chosen by minimizing their condition number.

The outline of this paper is as follows. In Section 2, the transformation (1.1) is given in detail and its properties are investigated. In Section 3, a way of choosing the control points is proposed, relying on a particular quality measure parameter. Finally, in Section 4 some numerical results are presented to illustrate the features of the proposed strategy.

2 The transformation

In this section the transformation (1.1) is characterized. Let us consider a "quadrilateral" domain $\Omega \subset \mathbb{R}^2$ such that $\partial\Omega = \cup_{i=1}^{4}\partial\Omega_i$, with $\partial\Omega_1, \partial\Omega_2, \partial\Omega_3, \partial\Omega_4$ being the supports of four regular curves $\gamma_i : [0,1] \to \partial\Omega_i$, $i = 1,\ldots,4$, taken counterclockwise. Furthermore, let us suppose that $\partial\Omega_1 \cap \partial\Omega_3 = \emptyset$ and $\partial\Omega_2 \cap \partial\Omega_4 = \emptyset$, with any other intersection occurring only at the end points of the boundary curves.
In particular, the following compatibility conditions are assumed:

$\gamma_1(0) = \gamma_4(1), \quad \gamma_1(1) = \gamma_2(0), \quad \gamma_2(1) = \gamma_3(0), \quad \gamma_4(0) = \gamma_3(1).$

For later convenience we set $\phi_1(s) := \gamma_1(s)$, $\phi_2(s) := \gamma_3(1-s)$, denoting by $s$ the curve parameter running on $[0,1]$, and we set $\psi_1(t) := \gamma_4(1-t)$, $\psi_2(t) := \gamma_2(t)$, denoting by $t$ the curve parameter running on $[0,1]$. In addition, the components of the $\phi$-curves and $\psi$-curves are denoted by $\phi^x, \phi^y$ and $\psi^x, \psi^y$ respectively. Next, we define four additional curves by computing the derivatives of the $\phi$- and $\psi$-curves, i.e.

$\phi_{i+2}(s) := C\,\frac{\big(-(\phi_i^y)'(s),\ (\phi_i^x)'(s)\big)}{\|\phi_i'(s)\|_2}, \qquad \psi_{i+2}(t) := C\,\frac{\big(-(\psi_i^y)'(t),\ (\psi_i^x)'(t)\big)}{\|\psi_i'(t)\|_2}, \qquad i = 1,2, \qquad (2.1)$

with $C$ a constant value also depending on the curve orientations and with $\|\cdot\|_2$ the Euclidean norm. Then we introduce the linear operators

$P_1[\phi](s,t) := \sum_{i=1}^{4} \alpha_i(t)\,\phi_i(s), \qquad P_2[\psi](s,t) := \sum_{i=1}^{4} \alpha_i(s)\,\psi_i(t),$

$P_1P_2[\phi,\psi](s,t) := \sum_{i=1}^{2}\Big(\alpha_i(t)\,P_2[\psi](s,u_i) + \alpha_{i+2}(t)\,\frac{\partial P_2[\psi]}{\partial t}(s,u_i)\Big), \qquad (2.2)$

where $u_1 = 0$, $u_2 = 1$. The functions $\alpha_i$, $i = 1,\ldots,4$, are the dilated versions of the classical Hermite bases, with support on $[0,u]$ and on $[1-u,1]$, $0 < u \le 1$, i.e.

$\alpha_1(s) := (1+2s)(1-s)^2, \qquad \alpha_3(s) := s(1-s)^2, \qquad s \in [0,u], \qquad (2.3)$

with $\alpha_2$ and $\alpha_4$ the corresponding reflected bases supported on $[1-u,1]$. The Boolean sum operator $(P_1 \oplus P_2) = P_1 + P_2 - P_1P_2$ provides the blending function surface

$B(s,t) := (P_1 \oplus P_2)[\phi,\psi](s,t) = P_1[\phi](s,t) + P_2[\psi](s,t) - P_1P_2[\phi,\psi](s,t). \qquad (2.4)$

It is known that $B$ satisfies

$B(u_i,t) = \psi_i(t),\ i = 1,2; \quad B(s,w_j) = \phi_j(s),\ j = 1,2; \quad \frac{\partial B}{\partial s}(u_j,t) = \psi_j(t),\ j = 3,4; \quad \frac{\partial B}{\partial t}(s,w_j) = \phi_j(s),\ j = 3,4, \qquad (2.5)$

where $u_1 = u_3 = 0$, $u_2 = u_4 = 1$ and $w_1 = w_3 = 0$, $w_2 = w_4 = 1$. It is worthwhile to remark that, as we are dealing with orthogonal grid lines emanating from the boundary of the domain, the intersecting boundary curves must also be orthogonal. Thus the following additional conditions are assumed, for $i = 1,2$:

$\phi_{i+2}(0) = \psi_1'(w_i), \quad \phi_{i+2}(1) = \psi_2'(w_i), \quad \psi_{i+2}(0) = \phi_1'(u_i), \quad \psi_{i+2}(1) = \phi_2'(u_i), \quad \phi_{i+2}'(0) = \psi_3'(w_i), \quad \phi_{i+2}'(1) = \psi_4'(w_i). \qquad (2.6)$

Now, in order to define a suitable grid, following the approach given in [1], we use the linear transformation $G$:

$G(s,t) := T_P(s,t) + (P_1 \oplus P_2)\big([\phi,\psi] - T_P\big)(s,t), \qquad (2.7)$

where $T_P(s,t) := \sum_{i=1}^{m}\sum_{j=1}^{n} Q_{ij} B_{i,3}(s) B_{j,3}(t)$, with $B_{i,3}$ denoting the usual cubic B-splines with uniform knots. The set $Q = \{Q_{ij},\ i = 1,\ldots,m,\ j = 1,\ldots,n\}$ is a suitable set of control points whose definition is discussed in Section 3. It should be noted that in (2.7) the Boolean sum operator is also acting on the surface $T_P(s,t)$. In this case (2.2) is used taking the eight boundary curves $T_P(0,t)$, $T_P(1,t)$, $T_P(s,0)$, $T_P(s,1)$, $\frac{\partial T_P}{\partial s}(0,t)$, $\frac{\partial T_P}{\partial s}(1,t)$, $\frac{\partial T_P}{\partial t}(s,0)$, $\frac{\partial T_P}{\partial t}(s,1)$. It is easy to show that $G$ still satisfies $G(u_i,t) = \psi_i(t)$, $i = 1,2$; $\frac{\partial G}{\partial s}(u_i,t) = \psi_i(t)$, $i = 3,4$; $G(s,w_j) = \phi_j(s)$, $j = 1,2$; and $\frac{\partial G}{\partial t}(s,w_j) = \phi_j(s)$, $j = 3,4$. Furthermore, because of the locality of the blending functions $\alpha_i$, $i = 1,\ldots,4$, the control of the coordinate lines obtained by means of the evaluation of $G$ over a parameter set in the interior of the domain is mainly based on the contribution of $T_P$. This fact and the use of B-splines ensure the convex-hull property in the interior of the domain. This property is of importance in numerical grid generation to locate the grid with respect to the position of the control points.
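To make the Boolean-sum construction concrete, here is a minimal Python sketch of transfinite interpolation. It is not the paper's method verbatim: for brevity it uses first-order (bilinear) Lagrange blending of the four boundary curves instead of the Hermite blending with derivative curves of (2.2)-(2.4), so it reproduces the boundary-interpolation property of B but not the orthogonality of the emanating grid lines. The curve names phi1, phi2, psi1, psi2 follow the notation of Section 2; the example boundary is an assumption for illustration.

```python
import numpy as np

def coons_patch(phi1, phi2, psi1, psi2):
    """First-order transfinite map B = P1 + P2 - P1P2 for four boundary
    curves phi1(s) = B(s,0), phi2(s) = B(s,1), psi1(t) = B(0,t),
    psi2(t) = B(1,t), each mapping [0,1] into R^2 as a NumPy array."""
    # corner points, needed for the tensor-product correction term P1P2
    p00, p10, p01, p11 = phi1(0.0), phi1(1.0), phi2(0.0), phi2(1.0)

    def B(s, t):
        loft_t = (1 - t) * phi1(s) + t * phi2(s)          # P1: blend phi-curves
        loft_s = (1 - s) * psi1(t) + s * psi2(t)          # P2: blend psi-curves
        bilin = ((1 - s) * (1 - t) * p00 + s * (1 - t) * p10
                 + (1 - s) * t * p01 + s * t * p11)        # P1P2: corner term
        return loft_t + loft_s - bilin

    return B

# Example: unit square with the left side bulged inward along a sine arc.
phi1 = lambda s: np.array([s, 0.0])                        # bottom, t = 0
phi2 = lambda s: np.array([s, 1.0])                        # top, t = 1
psi1 = lambda t: np.array([-0.3 * np.sin(np.pi * t), t])   # curved left side
psi2 = lambda t: np.array([1.0, t])                        # right side
B = coons_patch(phi1, phi2, psi1, psi2)
grid = np.array([[B(s, t) for s in np.linspace(0, 1, 11)]
                 for t in np.linspace(0, 1, 11)])          # an 11 x 11 grid
```

Evaluating B on a uniform parameter set, as in the last two lines, produces a boundary-conforming grid; in the paper's scheme the same evaluation is applied to (2.7), where the tensor-product term carries the control points.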
3 Grid quality measure

It is well known that grid generation techniques sensitive to grid quality features are particularly attractive. Thus, in this section, we discuss a strategy for choosing the set $Q$ of control points based on a suitable grid quality measure parameter.

Given a set of grid points $\mathcal{G} := \{G_{i,j}\}_{i,j=1}^{M,N}$ defining the quadrilateral cells $\{C_{i,j}\}_{i,j=1}^{M-1,N-1}$, quality measures commonly include: grid "skewness", measuring the departure of $C_{i,j}$ from a rectangle; grid "aspect ratio", measuring the departure of $C_{i,j}$ from a rhombus; or grid "conformality", measuring the departure of $C_{i,j}$ from a square (see for instance [5]). Here, as done in [4] for the case of unstructured grids, we define a grid quality measure taking into account the condition number of particular matrices derived from the grid. As explained below, this quality parameter in some sense measures the departure of $C_{i,j}$ from a square.

The strategy starts with a set $Q^i$ of control points obtained by evaluating, on a coarse parameter set $S_c = \{(s_i,t_j)\}_{i,j=1}^{m,n}$, a Lagrange blending function surface (for details related to Lagrange blending function methods we refer, for instance, to [3]) constructed from only the four boundary curves of the given domain. Then, using $Q^i$, a first grid is obtained by evaluating the surface $G$ in (2.7) on a fine parameter set $S_f = \{(s_i,t_j)\}_{i,j=1}^{M,N}$, giving the grid points

$\mathcal{G} := \{\,G_{i,j} = (G^x_{i,j},\,G^y_{i,j}) = G(s_i,t_j),\ i = 1,\ldots,M,\ j = 1,\ldots,N\,\}.$

The set $\mathcal{G}$ is then used to define $(M-1)\times(N-1)$ two-dimensional matrices associated with the $(M-1)\times(N-1)$ quadrilateral cells $C_{i,j}$. These matrices are defined as

$A_{i,j} := \begin{pmatrix} G^x_{i+1,j}-G^x_{i,j} & G^x_{i,j+1}-G^x_{i,j} \\ G^y_{i+1,j}-G^y_{i,j} & G^y_{i,j+1}-G^y_{i,j} \end{pmatrix}, \qquad i = 1,\ldots,M-1,\ j = 1,\ldots,N-1, \qquad (3.1)$

and their condition number $\kappa(A_{i,j})$ is related to the stretch of the cells. In fact, it is easy to prove that $\kappa(A_{i,j}) := \|A_{i,j}\|_2 \cdot \|A_{i,j}^{-1}\|_2 = 1$ if and only if we are dealing with a cell $C_{i,j}$ where the three points $G_{i,j+1}, G_{i,j}, G_{i+1,j}$ generate half a square [9]. On the other hand, in order to involve all the grid points in the quality measure, it is also convenient to define boundary matrices along the last row and column, built from differences of the outermost grid points, e.g.

$A_{M,j} := \begin{pmatrix} G^x_{M,j}-G^x_{M-1,j} & G^x_{M,j+1}-G^x_{M,j} \\ G^y_{M,j}-G^y_{M-1,j} & G^y_{M,j+1}-G^y_{M,j} \end{pmatrix}, \qquad A_{M,N} := \begin{pmatrix} G^x_{M,N}-G^x_{M-1,N} & G^x_{M,N}-G^x_{M,N-1} \\ G^y_{M,N}-G^y_{M-1,N} & G^y_{M,N}-G^y_{M,N-1} \end{pmatrix}, \qquad (3.2)$

so that the boundary points are also taken into account.

Next, we modify the initial set $Q^i$ of control points by minimizing an objective function $f_{ob}$ (3.3) built from the condition numbers $\kappa(A_{i,j})$ of all the cell matrices. The minimization is done with respect to the control points, under suitable constraints on their coordinates depending on the geometry of the domain $\Omega$. This is the only user interaction required. Obviously, since ideal inner cells are characterized by an associated matrix $A_{i,j}$ having condition number close to one, the optimal distribution of the control points should guarantee $\min_Q f_{ob} \approx 1$. On the other hand, $\min_Q f_{ob}$ strongly depends on the geometry of the domain (for example, in the case of a square domain the optimal value is $\min_Q f_{ob} = 1$, while in general this value is not attained).

Summary of the Method
(1) Compute the initial set of control points $Q^i$ by means of a Lagrange blending function method using the four given boundary curves.
(2) Compute the initial grid $\mathcal{G}^i = \{G(s_i,t_j),\ i = 1,\ldots,M,\ j = 1,\ldots,N\}$, with $G$ given in (2.7), using the set of control points $Q^i$.
(3) Minimize the objective function (3.3), so defining a new set of control points $Q^f$.
(4) Compute the final grid $\mathcal{G}^f = \{G(s_i,t_j),\ i = 1,\ldots,\hat M,\ j = 1,\ldots,\hat N\}$, with $G$ given in (2.7), using the set of control points $Q^f$, with $\hat M \gg M$, $\hat N \gg N$.

Remark 3.1 We note that, in order to reduce the computational cost of the minimization procedure, the integers $M$ and $N$ are chosen less than $\hat M$ and $\hat N$.
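A short Python sketch of the quality measure follows. The per-cell matrices are those of (3.1); since the exact expression of the objective (3.3) is not legible in the scan, the stand-in f_ob below simply averages the condition numbers over all inner cells, an assumption consistent with the stated facts that the optimum is 1 and that ideal cells have condition number one.

```python
import numpy as np

def cell_condition_numbers(G):
    """G: grid points of shape (M, N, 2).  Returns kappa(A_ij) for every
    quadrilateral cell, where A_ij is the 2x2 matrix of (3.1) whose columns
    are the cell edges G[i+1,j] - G[i,j] and G[i,j+1] - G[i,j]."""
    e1 = G[1:, :-1] - G[:-1, :-1]          # edges in the i-direction
    e2 = G[:-1, 1:] - G[:-1, :-1]          # edges in the j-direction
    A = np.stack([e1, e2], axis=-1)        # shape (M-1, N-1, 2, 2)
    return np.linalg.cond(A)               # 2-norm condition number per cell

def f_ob(G):
    # Stand-in for (3.3): average kappa over the cells.  It equals 1 exactly
    # when every cell matrix is perfectly conditioned, i.e. when the three
    # points defining each cell generate half a square, as noted in the text.
    return float(cell_condition_numbers(G).mean())

# Example: a uniform grid on the unit square is optimal (f_ob == 1).
s, t = np.meshgrid(np.linspace(0, 1, 9), np.linspace(0, 1, 9), indexing="ij")
print(f_ob(np.stack([s, t], axis=-1)))     # -> 1.0
```

In the paper this objective is fed to a constrained minimizer over the control-point coordinates (the Matlab routine constr is used in Section 4); any general-purpose constrained optimizer could play the same role here.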
4 Numerical Results We conclude the paper giving some numerical results testing the properties of the transformation G and showing the performance of the proposed approach. Three domains are considered. For each of them we present the initial grid obtained by the transformation G using the initial set of control points Q' and the final grid obtained using the set of control points Q^ resulting from the minimization procedure. In all the figures the control points are denoted by the symbol '*'. The minimization problem is solved by using a sequential quadratic programming method i.e. by using the routine constr of the Optimization toolbox of the Matlab package. In the minimization procedure, the constraints on the control points Q^ are chosen so that some geometric properties of the domain, such as symmetry and convexity, are preserved. Furthermore, in all the examples M and A'' are equal to M and to N. The values of the objective function before the minimization (/^j) and after the minimization (/j^) are also given in the figure captions. The first and the second test display a "waterway" grid and a W-shaped grid with their control points before and after the minimization procedure. The effectiveness of the method is evident. An automatic control point choice 0 0.1 0.2 0.3 ' 0.5 0.6 0.7 0.8 0.9 1 0 01 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0,9 1. Initial grid (left) and final grid (right), /^^ = 3.74, //^ = 1.65. FIG. 1 0,4 '—6FFPBTO inniiniiniiitiiniiniiifli'iiini-ii ,—1 M^ 1 :; 10 8 2 m i \ t , 1 ^ #piW-U IMH^Wm JWffff^^ -2 FIG, 0 0 -2 2 4 6 8 10 12 1 1 14 16 -6 -2 0 2 4 G 8 10 12 14 16 2. Initial grid (left) and final grid (right), /^^ = 1-45, //{, = 1.22, .L FIG, 3. Initial grid (left) and final grid (right), /^ = 1,89/6,36, //^ = 1.59/2.20, C. Conti, R. Morandi and D. Scaramelli Figure 3 shows a grid composed of six sub-grids, obtained via a domain decomposition approach. In this case, the Hermite-type interpolation method guarantees a C^ connection among the patches. Here, the two values of /^j, and //^ in the figure captions refer to the "horizontal" and "slanted" grids, respectively. Bibliography 1. P. R. Eiseman, High Level Continuity for Coordinate Generation with Precise Controls, Journal of Computational Physics 47 (1982), 352-374. 2. P. R. Eiseman, Control Point Grid Generation, Computers Math. Applic. 5 (1992), 57-67. 3. W. J. Gordon and L. C. Thiel, Transfinite Mappings and their Application to Grid Generation, Appl. Math, and Comp. Vol. 10-11 on Numerical Grid Generation, J.F. Thompson ed., 171-192, 1982. 4. P. M. Knupp, Matrix Norms & the Condition Number, Proceedings, 8th International Meshing Roundtable, South Lake Tahoe, CA, U.S.A., 13-22, 1999. 5. V. D. Liseikin, Grid Generation Methods, Springer, 1999. 6. C. W. Mastin, Three-dimensional Bezier interpolation in solid modeling and grid generation, Comp. Aid. Geom. Des. 14 (1997), 797-805. 7. R. Morandi and A. Sestini, Precise Controls in Numerical Grid Generation, Advanced Topics in Multivariate Approximation, edited by F. Fontanella, K.Jetter, and P. J. Laurent, 243-258, 1996. 8. B. V. Saunders and P. W. Smith, Grid generation and optimization using tensor product B-Splines, Approx. Theory & its Appl., 3 (1987), 120 452. 9. D. ScarameUi, Ph.D. Thesis, in preparation. Shape-measure method for introducing the nearly optimal domain A. Fakharzadeh DepaHment of Mathematics, Shahtd Chamran University of Ahvaz, Ahvaz, Iran. a-f aLkharzadehOhotmail. com J. E. 
Rubio
Department of Applied Mathematical Studies, University of Leeds, Leeds, LS2 9JT, UK.

Abstract

We introduce a new algorithm for solving optimal shape problems that are defined with respect to a pair of geometrical elements. The problem is to find, approximately, the optimal domain for a given functional that involves the solution of a linear or nonlinear elliptic equation with a boundary condition over a domain. The Shape-Measure method, in Cartesian coordinates, is used to find the nearly optimal solution in two steps. By transferring the problem into a measure-theoretical form, we first find the solution of the elliptic problem for a given domain by using the embedding method. Then the Shape-Measure method is applied to find the best domain approximately. An example is given.

1 Introduction and Problem

Consider optimal shape (optimal shape design) problems that are defined with respect to a pair of geometrical elements; this pair consists of a measurable set (in $\mathbb{R}^2$), which can be regarded as a domain, and a simple closed curve containing a given point, which is the boundary of the set. Because the desired curves are required to be simple, the problem depends on the geometry which is used. In polar coordinates, we solved the similar problem in [1]. But in Cartesian coordinates, it is difficult to introduce a linear condition which expresses the property of a closed curve being simple. Thus here we impose some limitations on the shape in order to make sure that it is simple. The problem will be solved in two stages. First, by use of measures, the value of the objective function is calculated for any given domain. Then the optimal domain is obtained by use of optimization techniques.

Let $D \subset \mathbb{R}^2$ be a bounded domain with a piecewise-smooth, closed and simple boundary $\partial D$. We assume that some part of $\partial D$ is fixed and the rest, $\Gamma$, with given initial and final points $A$ and $B$ respectively, is not fixed. Here we suppose that the fixed part of $\partial D$ is made up of three segments, parts of the lines $y = 0$, $x = 0$ and $y = 1$ between the points $A(1,0)$, $(0,0)$, $(0,1)$, $B(1,1)$ (see Figure 1). Thus $\Gamma$ is chosen as an appropriate variable curve joining $A$ and $B$ so that $D$ is well-defined.

FIG. 1. An admissible domain D under the assumptions of the numerical work.

Let $u(X)$ ($X = (x,y) \in \mathbb{R}^2$) be a bounded solution of the following elliptic equation:

$\Delta u(X) + f(X,u) = v(X), \qquad u|_{\partial D} = 0, \qquad (1.1)$

where $v : X \in D \to v(X) \in \mathbb{R}$ is a bounded real function ($v$ can also be considered as a fixed control function); the function $f$ is assumed to be a bounded and continuous real-valued function in $L^2(D \times \mathbb{R})$. Moreover, the above domain $D$ is called admissible if the equation (1.1) has a bounded solution on $D$; we denote by $\mathcal{D}$ the set of all such admissible domains. We are going to solve the problem of minimizing the functional

$I(D) = \int_D f_0(X,u)\,dX$

on the set $\mathcal{D}$, where $f_0$ is a given continuous, nonnegative, real-valued function on $D \times \mathbb{R}$. To calculate the value of $I(D)$ for a given domain $D$, it is necessary first to identify the solution of (1.1).

2 Weak solution and metamorphosis

In general, it is difficult to identify a classical solution for a problem like (1.1), and usually one tries to find a weak (generalized) solution.
Hence the variational form of (1.1) is introduced in the following; we remind the reader that $H_0^1(D) = \{\psi \in H^1(D) : \psi|_{\partial D} = 0\}$, where $H^1(D)$ is the Sobolev space of order 1.

Proposition 2.1 Let $u$ be the classical solution of (1.1); then we have the following equality:

$\int_D (u\Delta\psi + \psi f)\,dX = \int_D \psi v\,dX, \qquad \forall\psi \in H_0^1(D). \qquad (2.1)$

Proof: Multiplying (1.1) by the function $\psi \in H_0^1(D)$ and then integrating over $D$, with use of Green's formula (see [3]), gives

$\int_D (u\Delta\psi + \psi f - \psi v)\,dX = \int_{\partial D}\Big(u\,\frac{\partial\psi}{\partial n} - \psi\,\frac{\partial u}{\partial n}\Big)\,d\Gamma,$

where $n$ is the unit vector normal to the boundary $\partial D$ and directed outward with respect to $D$. Because $\psi|_{\partial D} = 0$ and $u|_{\partial D} = 0$, (2.1) is satisfied. $\Box$

Definition 2.2 A function $u \in H^1(D)$ is called a bounded weak solution of the problem (1.1) when it is bounded and satisfies the equality (2.1) for all $\psi \in H_0^1(D)$ (the conditions for existence of the weak solution of the problem (1.1), and also its boundedness, have been considered in many references, like [3] and [2]).

Now we apply our new method, which is called the Shape-Measure method. Let $\Omega = U \times D$, where $U \subset \mathbb{R}$ is the smallest bounded set in which the bounded weak solution $u(\cdot)$ takes values. Then, by applying the Riesz Representation Theorem ([6]), a bounded weak solution can be represented by a positive Radon measure; the proof of the following Proposition is similar to that of Proposition 3.1 in [1].

Proposition 2.3 Let $u(X)$ be a bounded generalized solution of (1.1); there exists a unique positive Radon measure, say $\mu_u$, in $\mathcal{M}^+(\Omega)$ such that:

$\mu_u(F) = \int_\Omega F\,d\mu_u = \int_D F(X,u(X))\,dX, \qquad \forall F \in C(\Omega). \qquad (2.2)$

Thus the equality (2.1) can be changed to $\mu_u(F_\psi) = \gamma_\psi$, $\forall\psi \in H_0^1(D)$, where $F_\psi = u\Delta\psi + f\psi$ and $\gamma_\psi = \int_D \psi v\,dX$. Also, $I(D) = \mu_u(f_0)$. Because the measure $\mu_u$ projects on the $(x,y)$-space as the respective Lebesgue measure, we should have $\mu_u(\xi) = a_\xi$, where $\xi : \Omega \to \mathbb{R}$ depends only on the variable $X$ (i.e. $\xi \in C_1(\Omega)$), and $a_\xi$ is the Lebesgue integral of $\xi$ over $D$. Therefore the original problem can be described as follows: find a measure $\mu_u \in \mathcal{M}^+(\Omega)$ which satisfies the following constraints:

$\mu_u(F_\psi) = \gamma_\psi, \quad \forall\psi \in H_0^1(D); \qquad \mu_u(\xi) = a_\xi, \quad \forall\xi \in C_1(\Omega). \qquad (2.3)$

As Rubio did in [5], to be sure that we do not miss any solution, we extend the underlying space; instead of finding a measure $\mu_u \in \mathcal{M}^+(\Omega)$, introduced by Proposition 2.3 and the equalities (2.3), we seek a measure $\mu \in \mathcal{M}^+(\Omega)$ which satisfies just the conditions

$\mu(F_\psi) = \gamma_\psi, \quad \forall\psi \in H_0^1(D); \qquad \mu(\xi) = a_\xi, \quad \forall\xi \in C_1(\Omega). \qquad (2.4)$

3 Approximation

The system (2.4) is linear, because all the functions on the right-hand side of the equations are linear in their argument $\mu$. But the number of equations and the underlying space are not finite. We shall develop this system by requiring that only a finite number of the constraints are satisfied. This will be achieved by choosing countable sets of functions whose linear combinations are dense in the appropriate spaces. But first we should approximate the unknown part of the boundary by a finite number of its points. This idea comes from the approximation of a curve by broken lines. For the given $D$, and hence the given $\Gamma$, let $A_m = (x_m, y_m)$, $m = 0,1,2,\ldots,M$, be a finite number of points on $\Gamma$ (where $A_0 = A$). We link together each pair of consecutive points $A_m$ and $A_{m+1}$ for $m = 0,1,\ldots,M-1$ and close this curve by joining the points $A_M$ and $B$ together.
Now the resulting shape, which is denoted by $\partial D_M$, is an approximation of $\partial D$; we also call the domain introduced by its boundary $\partial D_M$ as $D_M$ (see Figure 1). It is plausible that by increasing $M$, the curve $\partial D_M$ will come closer and closer (in the Euclidean metric) to the curve $\partial D$, and hence one may conclude that the minimizer of $I$ over $\mathcal{D}_M$, if it exists, tends to the minimizer of $I$ over $\mathcal{D}$, if it exists. But some difficulties could arise (too oscillatory a curve may cause problems). Thus, we will fix the number of points. For a given $M$, let the values of the components $y_1, y_2, \ldots, y_M$ be fixed. Because $x_m$ is a free term, the point $A_m$ could be anywhere on the line $y = y_m$, $x > 0$, for every $m$ (see Figure 1). Therefore points $A_m$ and $A_{m+1}$ can be chosen so that they belong to $\Gamma$, and hence the part of $\Gamma$ between the lines $y = Y_m$ and $y = Y_{m+1}$ can be approximated by the segment $A_mA_{m+1}$. Hence we do not lose generality. Thus, we fix the components $y_1, y_2, \ldots, y_M$ with the values $Y_1, Y_2, \ldots, Y_M$, respectively.

Now we introduce the set $\{\psi_l \in H_0^1(D) : l = 1,2,\ldots\}$ so that the linear combinations of the functions $\{\psi_l\}$ are uniformly dense (that is, dense in the topology of uniform convergence) in $H_0^1(D)$. We know that the vector space of polynomials in the variables $x$ and $y$, $P(x,y)$, is dense in $C^\infty(D)$; therefore the set $P_0(x,y) = \{p(x,y) \in P(x,y) : p(x,y) = 0,\ \forall(x,y) \in \partial D\}$ is dense (uniformly) in $\{h \in C^\infty(D) : h|_{\partial D} = 0\} = C_0^\infty(D)$. Since the set $Q(x,y) = \{1, x, y, x^2, xy, y^2, x^3, x^2y, xy^2, y^3, \ldots\}$ is a countable basis for the vector space $P(x,y)$, each element of $P(x,y)$, and also of $P_0(x,y)$, is a linear combination of the elements of $Q(x,y)$. By Theorem 3 of Mikhailov [3], page 131, the space $C^\infty(D)$ is dense in $H^1(D)$; thus the space $C_0^\infty(D)$ is dense in $H_0^1(D)$. Consequently, the space $P_0(x,y)$ is uniformly dense in $H_0^1(D)$. We define

$\psi_i(x,y) = xy(y-1)\prod_{l=1}^{M}(x - x_l + y - Y_l)\,q_i(x,y), \qquad (3.1)$

where $q_i \in Q(x,y)$. Therefore $\psi_i|_\Gamma = 0$, and the set $\{\psi_i(x,y) : i = 1,2,\ldots\}$ is total (dense in the topology of uniform convergence) in $H_0^1(D)$.

For the second set of functions, let $L$ be a given positive integer and divide $D$ into $L$ (not necessarily equal) parts $D_1, D_2, \ldots, D_L$, so that by increasing $L$ the area of $D_s$, $s = 1,2,\ldots,L$, decreases. Then, for each $s$, we define

$\xi_s(x,y,u) = \begin{cases} 1 & \text{if } (x,y) \in D_s, \\ 0 & \text{otherwise.} \end{cases}$

These functions are not continuous, but each of them is the limit of an increasing sequence of positive continuous functions $\{\xi_{sk}\}$; then if $\mu$ is any positive Radon measure on $\Omega$, $\mu(\xi_s) = \lim_{k\to\infty}\mu(\xi_{sk})$. The linear combinations of the functions $\{\xi_j : j = 1,2,\ldots,L\}$, for all positive integers $L$, can approximate a function in $C_1(\Omega)$ arbitrarily well (see [5], Chapter 5).

By selecting just a finite number of functions in the mentioned spaces, the problem (2.4) can be replaced by another one in which we are looking for a measure $\mu_{M_1,M_2} \in \mathcal{M}^+(\Omega)$ satisfying the following constraints:
Proposition 3.1 : If Mi,M2 —> 00 then Q{Mi,M2) —> Q; hence for the large enough numbers Mi and M2 the set Q can be identified by Q{Mi,M2)But even if the number of equations in (3.2) is finite, the underlying space Q(Mi, M2) is still infinite-dimensional. By Theorem ^.5 in the Appendix of [5], /iA/i.A/j in (3.2) can be characterized as HMI,M2 — Z^n=i ^ ci„i5(Z„), with triples Z„ S fi and the coefficients Q:„ > 0 for n = 1,2,...,Mi + M2, where ^(2) e M'^{Vl) is suppoised to be a unitary atomic measure with support the singleton set {z}. Thus the measure problem is equivalent to a nonlinear one in which the unknowns are the coefficients a„ and supports {Z„}. Proposition ///.3 of [5] Chapter 3, states that the measure IJ,MI,M2 has the following form N MMi,Af2 = X/""^^"^")' n=l ^^'^"^ where Z„, n = 1,2,..., iV, belongs to a dense subset of Q. Now let us put a discretization on fi, with the nodes Z„ = {x„,yn,u„), in a dense subset of fl; then we can set up the following linear system in which the unknowns are the coefficients a„: a„ > 0, n = 1,2, ...,iV; N ^a„F,:(Z„) =7,:, i = l,2,...,Mi; n=l N Y,an^j{Zn) = aj, i = l,2,...,M2. (3.4) n-l The solution of (3.4) is not necessary unique (even if the problem (1.1) satisfies the necessary conditions for having a unique bounded weak solution), because of the approximation scheme. 4 The optimal solution The main aim of the present section is to find an optimal domain D* € V^ so that the value of I{D*) will be the minimum on the set T>M- By applying the result of the previous section, a solution of (1.1) can be found. Indeed, it is approximated by a solution of the linear system (3.4) according to the variables, a;^, m = 1,2,..., M. As mentioned. Shape-measure method 15 this solution is not necessary unique. Let us specify one by solving the following Unear programming problem N Minimize: y^Q„/o(Zn) n=l Subject to: ^n > 0, n= 1,2,. ..,N; N ^a„Fi(Z„)=7i, i = l,2,...,Mi; 71=1 N ^Q„^,(Z„) = Oj, j = l,2,...,M2. (4.1) n=l Thus, for each D, the value I{D) = J^/o(X, w) dX = /i(/o) ^ MMI,M2(/O), is defined uniquely in terms of the variables Xm, m = 1,2,..., M. So, we set up a function, J, on VM defined by D G DM —^ I(-D) = MMLMSC/O) where MMI,M2(/O) = En=i Q:n/o(^n)Clearly J can be regarded as a vector function: J : {xi,X2,...,XM)e]R^ —* MMi,M2(/o)€iR. (4.2) Since J is a real-valued function which is bounded below, and is defined on a compact set (since constraints are to be put in the variables), it is possible to find a sequence of points so that the value of the function along the sequence tends to the (finite) infimum of the function. The coordinate values corresponding to the points in the sequence are of course finite. Now, suppose that (a;^, X2,..., x*j^) is the minimizer of the vector function J; it can be identified by using one of the related minimization methods. The introduced domain by the minimizer is denoted by D*. We assume in the following theoretical result that the minimization algorithm which is used, is perfect; that is, it comes out with the global minimum of J in its (compact) domain. Theorem 4.1 : Let M, Mi and M2 be the given positive integers which were defined in section 3, and D* be the minimizer of (4-2) as mentioned above. Then D* is the minimizer domain of the functional I over VM and the value ofI{D*) can be approximated by 3{D*); moreover J{D*) —> I(D*) as Mi and M2 tend to infinity. Proof: Suppose D* is not the minimizer of I; hence there exists a domain, call D', in VM SO that I(D') < !(£>*). 
Proposition 2.3 shows that there is a unique measure, call n', in A^+(fl) so that I(-D') = fJ-'ifo), and also Proposition 3.1 states that for sufficiently large numbers Mi and M2, M'(/O) can be approximated by fj,'j^^ M2(f°) ^^ Q{Mi,M2). Thus, I(D') = n'f^^ MSf°) = JPO- In the same way, one can show that J(D*) approximates I{D*); so i{D*) ^ ult^^M^o) = ^{D*). Hence J(£>') < 3{D*). which is contrary with the fact that D* is the minimizer of J. Moreover, from Proposition 3.1 it follows that J(D*) tends to I(D*) as Mi,M2 —> 00. □ 16 A. Fakharzadeh and J. E. Rubio 5 Numerical example We consider the elliptic equations (1.1) with ^f^y) = l 1 if(x,y)eDnC, ^ , "^ [ 0 where C is the square [|, |] x [\, |] (see Figure 1). We also take M = 8, Mi = 10, M2 = 8, N = 740 (the 36 number of nodes are chosen so that W|go =0) and suppose Yi,Y2,... ,Yg are 0.15,0.25,..., 0.85, respectively. By extra constraints, a:;,„ > | , m = 2,3,..., 7, the value of 7i for any D 6 PM is defined as 7,: = // // Vi(a^i v) dxdy , i = 1,2,..., 10. We ■ ■ otherwise, 4 4 also assume that the function u takes values in U =[-1,1], and consider the polynomials qi{x,y) as l,x,y,x^,xy,y'^,x^,x'^y,xy'^,y^. The function/o is chosen as/o = (u-0.1)^. This function can be considered as a distribution of heat in the surface for the system governed by an elHptic equations. In minimization, we apply the Downhill Simplex Method in Multidimension by using the Subroutine AMOEBA (see [4]) and also consider an upper bound for variables (suppose they are not higher than 2). These conditions are applied by means of the penalty method (see [7]). Hence, for the nonlinear case of the partial differential equations (1.1), we have taken f{x,y,u) = 0.25M^, and used the initial values as Xm. = 1.0,m = 1,2,... ,8, and the stopping tolerance for the program (variable ftol in the Subroutine AMOEBA) as 10"'''. We remind the reader that the functions Fj and the values of 7,, i = 1,2,..., 10, have been calculated by the package ''Maple V.3". The results are: the optimal value of I = 0.45467920356379, the number of iterations = 502, the value of the variables in the final step are Xi = 1.05019, X2 = 1.08521, X3 = 0.750001, X4 = 0.768701, Xr, = 1.12986, Xg = 1.13775 ,^7 = 0.97783, Xg = 1.61566, which represent the optimal domain, shown in the Figure 2. Bibliography 1. A. Fakharzadeh J. and Rubio, J. E. Shapes and Measxires. IMA Journal of Mathematical Control and Information, vol.16, p.207-220, 1999. 2. Ladyzhenskaya, 0. A. and Urahtseva, N. N. Linear and Quasilinear Elliptic Equations. vol.46, ACADEMIC PRESS, Mathematics in Science and Engineering, 1968. 3. Mikhailov, V. P. Partial Differential Equation. MIR Publisher, Moscow, 1978. 4. Press W. H., Flannery B. P., Teukolsky S. A. and Vetterling. W. T. Numerical : Recipes: The Art of Scientific Computing. Cambridge University Press, 1986. 5. Rubio, J. E. Control and Optimization: The Linear Treatment of Nonlinear Problems. Maxich.esier\}mver:sity'Pxess,M&x\c\iesieT,l%^&. 6. Rudin, W. Real and Complex Analysis. Tata McGraw-Hill Publishing Co.Ltd, New Delhi, second edition, 1983. 7. Walsh, G. R. Method of Optimization. John Wiley and Sons Ltd., 1975. 17 Shape-measure method Initial Domain c\i in ■^ ^P ^^ in d q 0.0 0.5 1.0 1.5 2.0 X Optimal Domain cvi in >,^ *^^ ^i::^- ^5^ c^ 7 in d q 0.0 0.5 1.0 1.5 2.0 X FIG. 2. The initial and the optimal domain for nonhnear case of elliptic equations. Convex combination maps Michael S. Floater SINTEF, Postbox 124 Blindem, 0314 Oslo, NORWAY. 
Bibliography

1. A. Fakharzadeh J. and J. E. Rubio, Shapes and Measures, IMA Journal of Mathematical Control and Information, vol. 16, 207-220, 1999.
2. O. A. Ladyzhenskaya and N. N. Uraltseva, Linear and Quasilinear Elliptic Equations, vol. 46, Academic Press, Mathematics in Science and Engineering, 1968.
3. V. P. Mikhailov, Partial Differential Equations, MIR Publishers, Moscow, 1978.
4. W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, 1986.
5. J. E. Rubio, Control and Optimization: The Linear Treatment of Nonlinear Problems, Manchester University Press, Manchester, 1986.
6. W. Rudin, Real and Complex Analysis, Tata McGraw-Hill Publishing Co. Ltd, New Delhi, second edition, 1983.
7. G. R. Walsh, Methods of Optimization, John Wiley and Sons Ltd., 1975.

FIG. 2. The initial and the optimal domain for the nonlinear case of the elliptic equation.

Convex combination maps

Michael S. Floater
SINTEF, Postbox 124 Blindern, 0314 Oslo, Norway.
mif@math.sintef.no

Abstract

Piecewise linear maps over triangulations are used extensively in geometric modelling and computer graphics. This short note surveys recent progress on the important question of when such maps are one-to-one, central to which are convex combination maps.

1 Introduction

Piecewise linear maps over triangulations have several applications in geometric modelling and computer graphics. For example, Figure 1a shows a surface triangulation $T$ of a set of points $(x_i,y_i,z_i)$ sampled from some unknown surface in $\mathbb{R}^3$. A standard approach to fitting a smooth parametric surface $s(u,v)$ to these points is to first parameterize them, i.e., compute planar points $(u_i,v_i)$ corresponding to the data points $(x_i,y_i,z_i)$. Then, using some scattered data method, we find a parametric surface $s : \Omega \to \mathbb{R}^3$, defined over some suitable domain $\Omega$ containing the points $(u_i,v_i)$, such that $s(u_i,v_i) \approx (x_i,y_i,z_i)$. A choice of parameterization is shown in Figure 1b and a least squares surface approximation using bicubic B-splines is shown in Figure 1c. Notice that the choice of parameter points $(u_i,v_i)$ uniquely determines a piecewise linear map $\phi : D_T \to \mathbb{R}^2$, where $D_T$ is the union of the triangles in $T$. In practice, a necessary requirement on $\phi$ to ensure adequate quality of the subsequent surface approximation $s(u,v)$ is that $\phi$ should be injective. In Figure 1b the mapping $\phi$ was taken to be a so-called convex combination map, which, as we will see later, is guaranteed to be one-to-one since the boundary of $T$ is mapped to a rectangle. Put another way, none of the triangles in Figure 1b are 'folded over'. In fact, further properties of the map are important, such as linear precision, and this was achieved in Figure 1b by using the so-called shape-preserving weights (the coefficients in the convex combinations). For a discussion of that, see [3].

FIG. 1. Spatial triangulation (1a), Convex combination parameterization (1b), Bicubic spline approximation (1c).

Another application of piecewise linear maps is to image morphing. Image morphing can be carried out by continuously transforming one planar triangulation $T^0$ (whose vertices represent feature points in the image) to another, $T^1$. Here we assume that there is a one-to-one correspondence between the vertices, edges, and triangles of $T^0$ and $T^1$. We can view each intermediate triangulation $T(t)$, $0 \le t \le 1$ (where $T(0) = T^0$ and $T(1) = T^1$), as the image of a piecewise linear map $\phi(t) : D_{T^0} \to D_{T(t)}$. As with parameterizations, it is again important that $\phi(t)$ is one-to-one. Figure 2 shows a so-called convex combination morph of [4] of two given planar triangulations: $T^0$ appears on the left and $T^1$ on the right. The two triangulations in the middle are $T(1/3)$ and $T(2/3)$. This morph ensures that $\phi(t)$ is one-to-one for all $t$ in $[0,1]$, and therefore $T(t)$ has no 'folded' triangles at any time instant $t$.

FIG. 2. Convex combination morph.

Piecewise linear maps also arise in: texture mapping; numerical grid generation; and in setting up multiresolution frameworks (nested spaces of piecewise linear functions) for manifold surface triangulations in computer graphics.

2 Convex combination maps

For the sake of simplicity we will only discuss convex combination maps defined over planar triangulations, even though all the results hold equally well when the domain of the map is a spatial triangulation such as that in Figure 1a. Thus let $T = \{T_1,\ldots,T_M\}$ be a simply-connected planar triangulation, with closed triangles $T_i$, and let $D_T = \cup_{T_i \in T} T_i$, as in Figure 3. We will call a mapping $\phi : D_T \to \mathbb{R}^2$ a convex combination map if it is piecewise linear over $T$ and, for every interior vertex $v$ of $T$, there exist weights $\lambda_{vw} > 0$, for $w \in N_v$, such that

$\phi(v) = \sum_{w \in N_v} \lambda_{vw}\,\phi(w) \quad\text{and}\quad \sum_{w \in N_v} \lambda_{vw} = 1, \qquad (1)$

where $N_v$ is the set of neighbours of $v$; see Figure 3.

FIG. 3. Convex combination map.

In applications, the mapped boundary vertices $\phi(v)$ are chosen first. Then the weights $\lambda_{vw}$ are all specified according to some chosen strategy. Finally, the mapped interior vertices are found by treating the equations in (1) as a linear system.

Example 2.1 If an interior vertex $v$ of $T$ has five neighbours $v_1,\ldots,v_5$, then we might set the weights $\lambda_{v,v_1},\ldots,\lambda_{v,v_5}$ to be any positive values summing to one.
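As a sketch of how the system (1) is solved in practice, the following Python code assembles and solves the sparse linear system for the interior vertices. It uses the uniform weights lambda_vw = 1/d_v of the barycentric mapping discussed next; any other choice of positive weights summing to one at each vertex fits the same template. The data layout (neighbour lists and a boundary-position dictionary) is an assumption for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def convex_combination_map(n_vertices, neighbours, boundary_pos):
    """Solve (1) for the interior vertices.  neighbours[v] lists the
    neighbours N_v of vertex v; boundary_pos maps each boundary vertex to
    its prescribed image point on the convex polygon."""
    interior = [v for v in range(n_vertices) if v not in boundary_pos]
    idx = {v: k for k, v in enumerate(interior)}
    A = sp.lil_matrix((len(interior), len(interior)))
    b = np.zeros((len(interior), 2))
    for v in interior:
        A[idx[v], idx[v]] = 1.0
        lam = 1.0 / len(neighbours[v])        # uniform weight lambda_vw
        for w in neighbours[v]:
            if w in boundary_pos:             # known image: right-hand side
                b[idx[v]] += lam * np.asarray(boundary_pos[w], dtype=float)
            else:                             # unknown image: matrix entry
                A[idx[v], idx[w]] -= lam
    A = A.tocsr()
    phi = np.column_stack([spsolve(A, b[:, k]) for k in (0, 1)])
    return phi                                # images of the interior vertices
```

The matrix is strictly diagonally dominant for every vertex adjacent to the boundary and weakly so elsewhere, which is what makes the system uniquely solvable for any positive weights summing to one.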
Thus let T = {Ti,..., TM} be a simply-connected planar triangulation, with closed triangles Tj, and let Dr = Urer^' as in Figure 3. We will call a mapping (j) : Dq—> R^ a convex combination map if it is piecewise linear over T and, for every interior vertex v of T, there exist weights ^vw > 0, for w G Ny, such that Michael S. Floater 20 and (1) ■w€N„ where Ny is the set of neighbours of v; see Figure 3. FIG. 3. Convex combination map. In applications, the mapped boundary vertices (p{v) are chosen first. Then the weights Xvw are all specified according to some chosen strategy. Then finally the mapped interior vertices are found by treating the equations in (1) as a finear system. Example 2.1 If an interior vertex u of T has five neighbours vi,...,Vs, then we might set Until recently, the only theory behind convex combination maps was that of Tutte [8]. Working from a purely graph-theoretic point of view, Tutte proposed a so-called barycentric mapping for constructing straight line drawings of 3-connected graphs (which include triangulations). A barycentric mapping in our context is simply a convex combination map in which all the weights at each vertex are equal, i.e., A^,,, = 1/dy, where dy is the degree or valency of the vertex v. Thus for v in Example 1 we must have 111 1 1 (f)(^v) = -(f>{Vi) + -(l){V2) + -(l>iV3) + -(t>{Vi) + -(l){V5). Tutte showed that a valid straight line drawing, i.e. one with no edge crossings, results from a barycentric mapping if the 'boundary' of the graph, a so-called 'cycle', is mapped to a convex polygon. However, as argued in [3], convex combination maps share all those properties of barycentric maps necessary for Tutte's proof. Thus when interpreted in the right way and suitably generalized, Tutte's theorem can be expressed in the following way. , Theorem 2.2 Suppose (p : Dr —> R^ is a convex combination mapping which maps the n boundary vertices of T cyclically into the n vertices of some n-sided convex polygon in the plane. Then 4> is one-to-one. Despite this generalization, however, there are stih two aspects of it which need to be improved from the point of view of applications and future research. The first is that we would like to extend the theorem so that we can allow some, and indeed many, of the mapped boundary vertices to be collinear. Indeed in the application Convex combination maps to parameterization for surface fitting, it might be convenient to map all the boundary vertices of the given triangulation into the four sides of a rectangle, as in Figure lb. This is because tensor-product splines surfaces are defined over rectangular domains. Collinearity will also often be desirable in morphing, as in Figure 2, and in most other applications. Thus a drawback of Theorem 2.2 is that it does not allow coUinear vertices in the image boundary. The second aspect is that we would like to simplify the proof in order to have some hope of establishing the injectivity of piecewise linear maps in even more general situations, such as when mapping to non-convex regions, or when some of the mapped vertices are constrained, for example. The fact that Tutte's proof relies on the non-existence of the Kuratowski subgraphs K^ and if3,3 in a planar graph illustrates its complexity. It is these two improvements that are the focus of [5]. The main idea of [5] is the observation that Theorem 2.2 is very similar to a theorem on harmonic maps, referred to by Duren and Hengartner [2] as the Rado-Kneser-Choquet thereom, which was established in [7, 6, 1]. 
Recall that a mapping <p : D -^ R^, with D CB? and <j) = {u, v), is harmonic if both its components u{x, y) and v{x, y) satisfy the Laplace equation in D, i.e., T Uyy "yy — U, + Vyy = 0; see Figure 4. Rado-Kneser-Choquet Theorem. Suppose (p D ^ ]R is a harmonic mapping which maps the boundary dD homeomorphically into the boundary dfl of some convex region O C IR^. Then (f> is one-to-one. FIG. 4. Harmonic map. This suggested that a proof of Theorem 2.2 might be based on a proof of the RadoKneser-Choquet theorem, in particular the short proof of Kneser [6]. Kneser's proof begins by showing that (j) is locally one-to-one in the sense that the Jacobian of (f), Uy never vanishes. Kneser establishes this by supposing that the Jacobian is zero at some point {xo, J/o)- In that case there must be a straight line ax + by-{-c = 0 passing through the point <p{xo,yo) such that both partial derivatives of the function h{x, y) = au{x, y) + bv{x,y) + c are zero at {xo,yo)- At the same time, the function /i : D —> R is zero at ixo,yo) and has just two zeros along the boundary of D. Noting that h{x, y) is a harmonic 21 22 i Michael S. Floater function, Kneser then uses the Nodal Lines theorem of Courant to argue that there are at least four zero contours of h emanating from {XQ, ya) and due to the maximum principle for h, these four curves can never self-intersect nor intersect one another. Therefore all four curves must reach the boundary of D which is a contradiction. These ideas were used in [5] to establish a much simpler proof of Theorem 2.2 than that of Tutte. No graph theory is needed at all. Instead, the discrete maximum principle for convex combination functions plays the role of the maximum principle for harmonic functions. Similar to Kneser's proof we show first that 0 is locally one-to-one, except that we understand this to mean that the restriction of 0 to any quadrilateral in T is one-to-one, a quadrilateral being the union of two triangles sharing a common edge. V2 FIG. 5. Dividing edges. Moreover, Theorem 2.2 is generalized in [5] to allow collinear mapped boundary vertices. We call an edge [v, w] of T a dividing edge if both endpoints v and w are boundary vertices yet the edge [v, w] itself is not contained in the boundary. For example in Figure 5, the only dividing edge in the triangulation is [t^i,i'2]- Dividing edges play a critical role because they partition the triangulation into subtriangulations %, in each of which every convex combination function satisfies a discrete maximum principle in its strong form. The main result of [5] was the following. Theorem 2.3 Suppose T is any triangulation and that 4> '■ DT ~> ^ is a convex combination mapping which maps ODT homeomorphically into the boundary dfl of some convex region fi C R^. Then 4> is one-to-one if and only if no dividing edge [v,w] ofT is mapped by (p into dfl. 3 Future research Here is a list of topics for future research. • A triangulation is a special (maximal) kind of planar graph. Can one extend Theorem 2.3 to other planar graphs, for example, rectangular grids? This is likely because Tutte's theory already holds for all 3-connected graphs. • In what way can the theorem be extended from bivariate maps to trivariate ones? • Can similar one-to-one maps be guaranteed when mapping closed surfaces of various topology? For example, we would like to map a closed manifold triangulation. Convex combination maps homeomorphic to a sphere, into a unit sphere injectively. 
Here each triangle in the triangulation would be mapped to a spherical triangle on the surface of the sphere. • Can one find sufficient conditions for the injectivity of constrained maps, i.e., piecewise linear maps in which the image of certain interior points is specified in advance? • Can one remove the requirement of having to map the boundary to a convex polygon and still ensure a one-to-one mapping under some weaker condition? • Can the Rado-Kneser-Choquet thereom and Theorem 2.3 be combined as part of a single more general theorem? Bibliography 1. G. Choquet, Sur un type de transformation analytique generalisant la representation conforme et define au moyen de fonctions harmoniques, Bull. Sci. Math. 69 (1945), 156-165. 2. P. Duren and W. Hengartner, Harmonic mappings of multiply connected domains, Pac. J. Math. 180 (1997), 201-220. 3. M. S. Floater, Parametrization and smooth approximation of surface triangulations, Comp. Aided Geom. Design 14 (1997), 231-250. 4. M. S. Floater and C. Gotsman, How to morph tilings injectively, J. Comp. Appl. Math. 101 (1999), 117-129. 5. M. S. Floater, One-to-one piecewise linear mappings over triangulations, to appear in Math. Comp. 6. H. Kneser, Losung der Aufgabe 41, Jahresber. Deutsch. Math.-Verien. 35, (1926), 123-124. 7. T. Rado, Aufgabe 41, Jahresber. Deutsch. Math.-Verien. 35, (1926), 49. 8. W. T. Tutte, How to draw a graph, Proc. London Math. Soc. 13 (1963), 743-768. 23 Shape preserving interpolation by curves T. N. T. Goodman Department of Mathematics, University of Dundee, Dundee DDl 4HN. tgoodmanOmaths.dundee.ac.uk Abstract A survey is given of algorithms for passing a curve through data points so as to preserve the shape of the data. 1 Introduction We consider the problem of passing a curve through a finite sequence of points. We want the curve to preserve in some sense the shape of the data, i.e. the shape of the curve gained by joining the data by straight line segments (which we call the 'piecewise linear interpolant'). We do not consider the important problems of approximating the data by a curve, or of shape-preserving interpolation by a surface. The short length of the paper forces it to be selective. So we concentrate on actual algorithms for solving the problem rather than related theory. Also we consider only algorithms where the curve is defined explicitly, not implicitly either as the zero set of a function or as the limit of a subdivision process (though there are, to our knowledge, extremely few such implicit shape-preserving schemes). In Section 2, we consider planar curves given by a function y = f{x), often rather misleadingly referred to as 'functional interpolation'. There are numerous such schemes, dating from 1966, with most of them prior to 1990. Our treatment is therefore very selective. Section 3 deals with parametrically defined planar curves, for which the schemes are fewer and more recent. Finally, in Section 4, we consider curves in three dimensions, often called 'space curves'. Here the work is much more limited, dating only from 1997. We note that in shape-preserving interpolation, the map from the data to the function describing the curve must be non-linear. In what we call 'tension methods' the curve can be constructed by a linear scheme for any choice of certain 'tension parameters'. These parameters are then varied so as to 'pull' the curve towards the piecewise linear interpolant until the shape criteria are satisfied. 
Though there are a few variations on this theme, there is generally a clear distinction between tension methods and other schemes, which we shall term 'direct methods'. 2 Functional interpolation Given data (.'Ei,^/,;)Gi^^ i = 0,...,N, 24 xo<xi<---<XN, (2.1) Shape preserving interpolation by curves 25 we consider a function/: [a;o,a;iv]—>-R satisfying f{xi) = yi, i = 0,...,N. (2.2) For some reasons, perhaps the physical situation which / is intended to model, we may wish the graph of / to inherit certain shape properties of the data. We now describe these and other properties which it may be desirable for / to possess. 2.1 Desirable properties Monotonicity. Here we require / to be increasing (respectively decreasing) if (j/j) is increasing (respectively decreasing). More generally we may require the scheme to be 'co-monotone', i.e. for i = 0,..., A'' — 1, / is increasing (decreasing) on [a;i,a;j+i] if Vi < Vi+i iVi > Vi+i)- Co-monotonicity has the consequence that the local extrema of / occur exactly at the local extrema of (j/j). Moreover if yi = j/j+i, then / is constant on [a;j,a;i+i]. These properties may be too restrictive and a weaker alternative is what we call 'local monotonicity': for i = 1,... ,N — 2, f is increasing on [xi,Xi+t] if Vi-i < J/i < Vi+i < yi+2 (and similarly for decreasing). Although this is not generally stated, it is also desirable that for i = 0,..., A'' — 1, / has at most one local extremum on{xi,Xi+i). Convexity. Here we require / to be convex (concave) if the piecewise linear interpolant is convex (concave). More generally we call the scheme 'co-convex' if for i = 1,..., A^ - 2, / is convex (concave) on [a;j,a;i+i] if the piecewise linear interpolant is convex (concave) on [xi-i,Xi+2]- It is also desirable in a co-convex scheme for / to have at most one inflection in {xi,Xi+i), 0 <i < N — 1. Smoothness. By definition, the piecewise linear interpolant is shape-preserving, and so the problem is trivial unless we require / to have greater smoothness than continuity, i.e. C'^ for fc > 1. Since all the schemes use piecewise analytic functions, the C^ condition needs to be checked only at a finite number of 'knots', which generally include the data points. We remark that smoothness and shape-preservation may not be compatible; e.g. if for i = 0,.. .,A, Xi = i — 2, yi = \xi\, and / is convex on [a;o,a;4], then f{x) = \x\, —2 < a; < 2, and so is not C^ at 0. Approximation order. It is generally supposed that the data arise as values of some unknown 'smooth' function g, i.e. y, = g{Xi), i = 0,... ,N. Then we can consider how fast the interpolant / converges to g as we increase the density of data values Xi in the fixed interval [a,b]. A scheme has a,pproximation order 0{h™-) if ||/-3|| = 0{K^), where h = max{xi+i — Xi : i = 0,... ,7V - 1} and the usual norm is ||F|| = sup{|F(a;)| : a < x<b). Locality. In a 'global' scheme, the value f{x), for any x, generally depends on all the data. In contrast, for a 'local' scheme, f{x) depends on the data values {xi,yi) only for Xi 'near' x. There may be advantages in local schemes, e.g. when data are modified or inserted. Fairness. It is often desirable that the curve is 'fair', i.e. pleasing to the eye, see Section 3. 26 T. N. T. Goodman Other desirable properties are invariance under scaling or reflection in x or y, and stability, i.e. small changes in the data produce small changes in /. There may also be other constraints on /, e.g. / > 0 when y^ > 0, z = 0,..., A'^. 
2.2 : Tension methods Many tension methods are a modification of cubic spline interpolation, which we now describe. Given data (2.1), there is a unique function / satisfying (2.2), where / is C^, is a cubic polynomial on [a;j,3;i+i], i = 0,...,N - 1, and satisfies suitable boundary conditions at XQ and XN- The function / minimises J^^{g )^ over a suitable class of functions and this energy minimisation property is generally considered to give a fair curve. Determining / requires solving a global, strictly diagonally dominant tridiagonal system of linear equations. Since cubic spline interpolation is not shape-preserving, in 1966 Schweikert [67] mod: ified the scheme by replacing cubic polynomials on each interval [.'E,:,XJ+I] by solutions : of /W-Ai/"=0, where Aj > 0. When Aj = 0, / will reduce to a cubic, while as Aj —> cx), / approaches a linear polynomial. Thus Aj acts as a tension parameter and by making appropriate choices of A^ large enough the function will preserve monotonicity and/or convexity globally or locally. Many papers have been written on Schweikert's tension splines giving, for example, ways of choosing the values of the tension parameters, e.g. [68,57,46,60]. However the fact that the method uses exponential functions can be seen as a drawback. An alternative was introduced by Nielson in 1974 [55] by adjusting the minimisation property of cubic splines to a minimisation problem involving also the first derivative. The resulting function, called a i^-spHne, is also cubic on each interval [xijXj+i] but only C^. However the form of the C^ continuity gives extra 'smoothness' for parametrically defined curves and so we discuss z/-splines further in Section 3. By generalising the minimisation problem still further one can gain a C^ piecewise cubic interpolant with further parameters for gaining shape properties [22]. The idea of using rational functions in tension methods was introduced by Spath [69], also in 1974, and put in a general setting of tension methods in [57]. Prom 1982-1988, Gregory and/or Delbourgo produced a series of algorithms using rational functions, e.g. [19,36,20,21,18]. We illustrate the ideas with an algorithm from [37]. Here / is C^ and on each interval [a;;,2:^+1] it has the form, for some a, b, c, d, ,,_ a + bt + ct"^+ dt^ 1 + \it{l - t) ' _ X - Xi Xi+i - Xi ' For Aj > —l,i = 0,... ,N—1, f can be determined as the solution of a strictly diagonally dominant tridiagonal hnear system (and hence the scheme is global). When all A; = 0, / reduces to the usual cubic spline interpolant, while as A; -+ 00, / converges uniformly to the linear interpolant on [xi,Xi+i]. In general the approximation order is 0{h'^) for Shape preserving interpolation by curves 27 data from a C^ function. In the special case of monotone data, choosing Ai = ft + if'ixi) + f'{xi+i)f'+^ " ""', ft>-3, i = 0,...,N-l, Vi+i - Vi ensures that / is correspondingly monotone, and for the choice /Xj = —2, / reduces to a rational quadratic which gives optimal approximation order 0(/i^). Similarly for convex data, / is also convex provided that each A, satisfies an inequality involving / [xi), f {xijf.i), and choosing Aj appropriately (which requires solving a non-linear equation) further ensures approximation order 0{h^). There are some more recent methods involving rationals, e.g. [58]. The idea of using variable degree to preserve shape was introduced by McAllister, Passow and Roulier in 1977 [47,56]. 
They produce monotone, convex schemes of arbitrarily high smoothness by constructing a shape-preserving piecewise linear interpolant I with one knot between any two data points (and no knots at the data points) and then defining the final interpolant on each interval [xi, cCj+i] as the Bernstein polynomial of / of some degree mj. The idea was extended from 1986 by Costantini [8-10]. For fc > 1, rui > 2k -\-1, i = Q,..., N — 1, he constructs a shape-preserving piecewise linear interpolant I with knots at Xi-\-k{xi+i -Xi)/mi and a;j+i — A;(a;j+i —Xi)lmi, i = 0,... ,N—1. The final interpolant / coincides on each interval [a;j,a;i+i] with the Bernstein polynomial of I of degree rrii and is hence C'^ (with f^^\xi) = 0, j = 2,..., fe). In [10] there is a co-monotone, co-convex scheme in which the degrees mj can either be chosen a priori or computed automatically according to the data. The above schemes using variable degree are not strictly tension schemes in our sense but in 1990, Kaklis and Pandelis [40] introduced a tension method by using the above form for fc = 1, i.e. on each interval [a;j,a;i+i] it has the form: f{t) = f{xi){i-t)+f{xi+^)t+cit{i-tr^+dit^^{i-t), t= ''"^' . Here mj > 2 is an integer and for each choice of mo,... ,mjv-i, the numbers Cj, di are chosen so that / is C^, which requires the solution of a strictly diagonally domina,nt tridiagonal linear system. When all m, = 2, this reduces to the usual cubic spline interpolant, while as m^ —> oo, / converges uniformly to the linear interpolant on [a;i,a;i+i] with order 0{m~^) (or 0{m~'^) if mj-i, rrij+i remain bounded). For further discussion of variable degree shape-preserving functional interpolation, see [11]. Our final type of tension method was introduced by Manni [50] in 1996. The general idea is to define / on [a;i,Xj+i] as f{x)=pi{ql'^{x)), where pi, qi are cubic polynomials on [a;,, a;j+i] and g, is strictly increasing from [aij, aij+i] onto itself, so that the inverse q~^ is well-defined on [a;j,a;j4.i]. For / [xi) = di, i = 0,... ,N,we require Pi(^j) = Ajrfj, qi{xi) = Xi, Pi{xi+i) ^ ^lidi+i, qi{xi+i) = Hi, for parameters Aj > 0, ft > 0. For Ai = ft = 1, we have qi{x) = a; and / reduces to a cubic on [xi,Xi+i], while for Xi = Hi = 0, f becomes linear on [a;i,a:j+i]. 28 T. N. T. Goodman In [50], the values do,., .^d^ sxe assumed known (or estimated from the data vahies) and the scheme is local C^, gives necessary and sufficient conditions for the values of the parameters A,:, /ij forco-monotonicity, and has approximation order 0(/i^) when g is C^ and generally 0(/i^) when g is C^. Manni and co-workers have written a series of papers using the same idea, [51,53,54]. For example in [45], the values di are not assumed given but are chosen to ensure that the function is C^, thus providing a locally monotone, co-convex global scheme which generalises usual cubic spline interpolation; while in [52] two further knots are inserted in each interval [xi,XiJ^\] to produce a C^, locally monotone, co-convex local scheme which interpolates values of f^^\xi), j = 1,2, i = 0,... ,N. 2.3 Direct methods In 1967, Young [71] considered shape-preserving interpolation by polynomials and a number of papers have appeared since on this topic, e.g. [59] gives a constructive proof of the existence of a co-monotone interpolant with an upper bound on the degree required. However for a practical algorithm, using a piecewise polynomial offers much more flexibility than a single polynomial. 
Numerous papers have been written using such polynomial splines and we mention briefly only a few. By inserting extra knots between data points, a convexity preserving scheme with C^ cubics was given by de Boor [4, p.303], and co-monotone, co-convex schemes with C^ quadratics in [48,49,66]. C^ cubic sphnes with knots at the data points are used for co-monotonicity in [25,5,24,70], (the last of these using a variational approach), and for both co-monotonicity and co-convexity in [16,17]. We also recall the methods using spline functions of variable degree with knots between the data points to obtain interpolants with arbitrarily high smoothness which were discussed under tension methods. Finally we note that following the paper [62] which was as early as 1973, Schaback [63] gives a C^ co-monotone, co-convex scheme which uses a cubic polynomial on any interval [xi,a;j+i] where an inflection is needed, and on other intervals employs a rational function of form quadratic/linear. 3 Planar curves Given data heR'^, i = 0,...,N, we consider a curve r : [a, b] —> R? satisfying r{ti) = Ii, i = 0,...,N, (3.1) for values a = to < h < ■ ■ ■ < t^ = b. For a closed curve the situation is extended periodically so that li+N = lo, ti+N = ti, ieZ, r{t + b-a) = r{t), t € R. 3.1 Desirable properties Shape. For this case it is not usually relevant to consider preservation of monotonicity. We say a scheme is 'co-convex' if the curve r has the minimum number of inflections Shape preserving interpolation by curves 29 consistent with the data. In practice, schemes satisfy the somewhat stronger condition that for any 0 < i < j — 2 < N - 2, r is positively (negatively) locally convex on [ti+i,tj-i] if the polygonal arc joining Ii,...,Ij is positively (negatively) locally convex. For more details on this and other desirable properties, see [29]. Smoothness. We shall call the interpolating curve C'^ for fc > 0 if the function r is C^. kC^ curve r we shall call G^ if the unit tangent vector is continuous, and G^ if, in addition, the curvature is continuous. A C'^ curve r is G^, fc = 1,2, provided that the parameterisation is regular, i.e. r (i) ^ (0,0), which is generally desirable. It is usually sufficient to have G^, rather than C'^, continuity if only the appearance of the curve is important and the choice of parameter t is not significant. Fairness. Planar curves often arise in computer-aided design where it may be particularly important that the curve is pleasing to the eye. Though this is subjective, various criteria have been suggested to be relevant, such as magnitude, rate of change or monotonicity of the curvature. Some schemes include 'shape parameters' which can be manipulated by the designer to modify the shape of the curve. Approximation order is not important in the context of design when the data are not considered to be taken from some unknown curve. Approximation order is related to reproduction of polynomial curves, and a related property for planar curves is reproduction of arcs of circles (or more generally conies); this cannot be done exactly by polynomials but it can be achieved by using rationals. Locality and other desirable properties are similar to the functional case as described in Section 2.1, though it is generally more appropriate that the invariance is under a rotation and the same scaling in both x and y.' 3.2 Tension methods In Section 2.2 we mentioned Nielsen's i/-splines [55]. 
Applying this scheme for both components of r gives a function r which is cubic on each interval [:ti,tj+i], is C^ and satisfies r'\tt)=r"{t-) + Vir'{ti), J - 1,... ,iV - 1, where Vi > 0. This condition is sufficient for G^ continuity of r (assuming regular parameterisation). When all vi = 0, r will reduce to the usual G^ cubic spline interpolant. As Vi —> 00, the curve is 'pulled tight' at Jj and as Vi, Vi+i —> oo, it approaches the linear interpolant on [ii,ij4.i]. The scheme in [37] by Gregory which was mentioned in Section 2.2 was adapted to the planar case in [38]. Other schemes using rationals were proposed by Clements in [6,7], where r is a G^ curve which on each interval [ij,fj+i] has the form, for some a, 6, c,d&B?, , , a(l - s)^ WiS + 1 - ,, , where Wj > 0 are the tension parameters. ds^ ^^(l - S) -t- 1 t-ti ti+i-ti 30 T. N. T. Goodman The variable degree tension method of [40], also mentioned in Section 2.2, was adapted to the planar case in [41], and extended in [27] to allow the designer to obtain a 'fair' curve by minimising the number of changes in the monotonicity of the curvature. 3.3 Direct methods The papers [34,35,28,23] give local, G^ co-convex schemes, e.g. in [28], a rational cubic/cubic is used on each interval [ij,ij+i] and the tangent vectors and curvatures are stipulated by the algorithm to ensure that the convexity conditions are satisfied and circular arcs are reproduced, with the possibility of modifying the tangent vectors and curvatures further as shape parameters. Following an earlier scheme in [64], Schaback in [65] gives a global G^ co-convex scheme which uses a cubic polynomial on any interval [fi,<,.|_i] where an inflection is needed, and on other intervals employs quadratic polynomials. Sapidis and Kaklis [61] give a G^ co-convex scheme by interpolating by a piecewise quintic curve tangent directions and curvatures gained by their tension method [41]. In [1] a local, co-convex G^ scheme is given which uses polynomials of degree six and which attempts to obtain a fair curve by imposing conditions on the curvature to minimise measures of fairness. Finally we note that in [12] Costantini gives an abstract theory and general purpose code. 4 Space curves Given data h&E?, i = 0,...,N, we consider a curve r : [a,b] —> R^ satisfying condition (3.1) as before. 4.1 Desirable properties What is meant by 'shape-preserving' is not so clear for space curves as for the planar case. Criteria were introduced by Kaklis and Karavelas [39] and extended by Ong and the author in [31]. We shall sketch these below. They are discussed in further detail in [30], where some further extensions are suggested. We write, for appropriate indices i: Li — Ii+i-Ii, A, = det[Lj_i,L,:,Lj+i], N, = Li^i x Li. Torsion. We ensure that the curve is 'twisting' in the same manner as the piecewise linear interpolant by requiring that if A,; =^ 0, then the torsion of r has the same sign as AiOn{ti,ti+i). Convexity. Let K{t) = r'{t) xr"{t), a<t<b. We require that for 1 < i < iV - 1, K{ti).Ni > 0, which means that the projection of the curve r onto the plane of /j_i, /j, /j+i, has the same sign of local convexity at /,; as the polygonal arc Ii-\lili+i. Moreover if Ni.Ni+i > 0, we require K{t).Nj>0, j = i,i + l, ti<t<ti+i, Shape preserving interpolation by curves 31 which impUes that the curve r has the same sign of local convexity on [ti,ti+i] when projected in any direction XNi + (1 - X)Ni+i for 0 < A < 1. 
Finally we require that if Ni.Ni+i < 0, then for j = i, i + 1, K{t).Nj has exactly one sign change in [ti,ti+i], which imples that each of the above projections of r have just one inflection. Smoothness. This is as for planar curves, except that we call the curve G^ if it is G^ and, in addition, the torsion is continuous. Other desirable properties are similar to the planar case. 4.2 Tension methods Although interpolation by space curves with a special shape is considered in [44], the first specific shape-preserving interpolation scheme by space curves was due to Kaklis and Karavelas [39], who adapted the variable degree tension method of [40] to give a C^ method which was also G^, but at the expense of zero torsion at the data points. In [42] the same authors adapted Nielson's i^-splines to the three dimensional case to give a curve which is C-^ and G^. The paper [14] also uses variable degree for tension parameters but gives a C^ scheme in which the limiting curve as the tension goes to infinity is not the piecewise linear interpolant but the shape-preserving interpolant given by either of the above two schemes. In [15] a C^ scheme is also given but here the components of r on each interval [ij,ij+i] lie in the linear span of the functions ■^ t-tH+l — ti When rrii = rrii^i = 5, this reduces to a quintic polynomial. As mt, rrii+i —> oo, it tends to a linear polynomial and then the curve r approaches the piecewise linear interpolant on[ii,ti+i]. . The paper [26] also uses variable degree splines with degree on each interval at least five, and the curve r also converges to the piecewise linear interpolant as the degrees go to infinity. However here the curve is C^, which the authors feel may give extra fairness to the curve due, for example, to lowering the maximum absolute value of the curvature. Variable degree polynomial splines are also used in [13]. 4.3 Direct methods Following an earlier scheme in [31], Ong and the author gave a local G^ scheme in [32] which employed a rational cubic/cubic between data points, extending the ideas of the planar scheme in [28]. This was further extended to a local G^ scheme using a rational quartic/quartic in [43]. In [33], the degrees of freedom- inherent in the scheme in [32] were used to optimise a fairness measure. Finally we mention the papers [2,3] which give local G^ schemes using a piecewise polynomial of degree six, also allowing optimisation of a fairness measure. It will be noted that many of the above papers are extremely recent and it is hoped that the unavoidable lack of detail here will serve to tantahse readers to discover for themselves more of this rapidly developing field. ,32 T. N. T. Goodman Bibliography 1. S. Asaturyan, P. Costantini and C. Manni, G^ shape preserving parametric planar curve interpolation, in Creating Fair and Shape-Preserving Curves and Surfaces, H. Nowacki, P. D. Kaklis (eds.), B. G. Teubner, Stuttgart (1998), 89-98. 2. S. Asaturyan, P. Costantini and C. Manni, Shape-preserving interpolating curves in R^: a, local approach, in Creating Fair and Shape-Preserving Curves and Surfaces, H. Nowacki, P. D. Kakhs (eds.), B. G. Teubner, Stuttgart (1998), 99^108. 3. S. Asaturyan, P. Costantini and C. Manni, Local shape-preserving interpolation by space curves, IMA J. Numer. Anal. 21 (2001), 301-325. 4. C. de Boor, A Practical Guide to Splines, Springer, New York (1978). 5. J. Butland, A method of interpolating reasonable-shaped curves through any data, Proc. Computer Graphics 80, Onhne Publ. 
Ltd., Northwood Hills, Middlesex, U.K. (1980), 409-422. 6. J. C. Clements, Convexity-preserving piecewise rational cubic interpolation, SIAM J. Numer. Anal. 27 (1990), 1016^1023. 7. J. C. Clements, A convexity-preserving C^ parametric rational cubic interpolant, Numer. Math. 63 (1992), 165-171. 8. P. Costantini, On monotone and convex spline interpolation, Math. Comp. 46 (1986), 203-214. 9. P. Costantini, Co-monotone interpolating splines of arbitrary degree - a local approach, SIAM J. Sci. Stat. Comput. 8 (1987), 1026-1034. 10. P. Costantini, An algorithm for computing shape-preserving interpolating splines of arbitrary degree, J. Comput. Appl. Math. 22 (1988), 89-136. 11. P. Costantini, Abstract schemes for functional shape-preserving interpolation, in Advanced Course on Fairshape, J. Hoschek, P. Kaklis (eds.), B. G. Teubner, Stuttgart (1996), 185-199. 12. P. Costantini, Boundary-valued shape-preserving interpolating splines, ACM Trans, on Math. Software 23 (1997), 229-251. 13. P. Costantini, Curve and surface construction using variable degree polynomial splines. Computer Aided Geometric Design 17 (2000), 419-446. 14. P. Costantini, T. N. T. Goodman and C. Manni, Constructing C^ shape preserving interpolating space curves, Advances Comp. Math. 14 (2001), 103-127. 15. P. Costantini and C. Manni, Shape-preserving C^ interpolation: the curve case, to appear. 16. P. Costantini and R. Morandi, Monotone and convex cubic spline interpolation, Calcolo 21 (1984), 281-294. 17. P. Costantini and R. Morandi, An algorithm for computing shape-preserving cubic spline interpolation to data, Calcolo 21 (1984), 295-305. 18. R. Delbourgo, Shape preserving interpolation to convex data by rational functions with quadratic numerator and linear denominator, IMA J. Numer. Anal. 9 (1989), 123-136. 19. R. Delbourgo and J. A. Gregory, C^ rational quadratic spline interpolation to mono- Shape preserving interpolation by curves tonic data, IMA J. Numer. Anal. 3 (1983), 141-152. 20. R. Delbourgo and J. A. Gregory, The determination of derivative parameters for a monotonic rational quadratic interpolant, IMA J. Numer. Anal. 5 (1985), 397-406. 21. R. Delbourgo and J. A. Gregory, Shape preserving piecewise rational interpolation, SIAM J. Sci. Stat. Comput. 6 (1985), 967-976. 22. T. A. Foley, A shape preserving interpolant with tension controls. Computer Aided Geometric Design 5 (1988). , 23. T. A. Foley, T. N. T. Goodman and K. Unsworth, An algorithm for shape-preserving parametric interpolating curves with G^ continuity, in Mathematical Methods in CAGD, T. Lyche, L. L. Schumaker (eds.). Academic Press, Boston (1989), 249-259. 24. F. N. Fritsch and J. Butland, A method for constructing local monotone piecewise cubic interpolants, SIAM J. Sci. Stat. Comput. 5 (1984), 300-304. 25. F. N. Fritsch and R. E. Carlson, Monotone piecewise cubic interpolation, SIAM J. Numer. Anal. 17 (1980), 238-246. 26. N. C. Gabrielides and P. D. Kaklis, C^ interpolatory shape-preserving polynomial splines of variable degree. Computing 65 (2001), to appear. 27. A. Ginnis, P. Kaklis and N. S. Sapidis, Polynomial splines of non-uniform degree: controlling convexity and fairness, in Designing Fair Curves and Surfaces, N. S. Sapidis (ed.), SIAM Series on Geometric Design, Philadelphia (1994), Part 3, Chapter 10. ■ 28. T. N. T. Goodman, Shape preserving interpolation by parametric rational cubic splines, in Numerical Mathematics Singapore 1988, R. P. Agarwal, Y. M. Chow, S. J. Wilson (eds.), International Series of Numerical Mathematics Vol. 
86, Birkhauser Verlag, Basel (1988), 149-158. 29. T. N. T. Goodman, Shape preserving interpolation by planar curves, in Advanced Course on Fairshape, J. Hoschek, P. Kaklis (eds.), B. G. Teubner, Stuttgart (1996), 29-38. 30. T. N. T. Goodman and B. H. Ong, Shape preserving interpolation by curves in three dimensions, in Advanced Course on Fairshape, J. Hoschek, P. Kaklis (eds.), B. G. Teubner, Stuttgart (1996), 39-48. 31. T. N. T. Goodman and B. H. Ong, Shape preserving interpolation by space curves. Computer Aided Geometric Design 15 (1997), 1-17. 32. T. N. T. Goodman and B. H. Ong, Shape preserving interpolation by G^ curves in three dimensions, in Curves and Surfaces with Applications in CAGD, A. LeMehaute, C. Rabut, L. L. Schumaker (eds.), Vanderbilt Univ. Press, Nashville (1997), 151-158. 33. T. N. T. Goodman, B. H. Ong and M. L. Sampoli, Automatic interpolation by fair, shape preserving, G^ space curves. Computer-aided Design 30 (1998), 813-822. 34. T. N. T. Goodman and K. Unsworth, Shape preserving interpolation by parametrically defined curves, SIAM J. Numer. Anal. 25 (1988), 1453-1465. 35. T. N. T. Goodman and K. Unsworth, Shape preserving interpolation by curvature continuous parametric curves, Computer Aided Geometric Design 5 (1988), 323- 33 34 T. N. T. Goodman 340. 36. J. A. Gregory, Shape preserving rational spline interpolation, in Rational Approximation and Interpolation, Graves-Morris, Saff and Varga (eds.), Springer-Verlag (1984), 431-441. 37. J. A. Gregory, Shape preserving spline interpolation. Computer-aided Design 18 (1986), 53-58. 38. J. A. Gregory and M. Sarfraz, A rational cubic spline with tension. Computer Aided Geometric Design 7 (1990), 1-13. 39. P. D. Kaklis and M. I. Karavelas, Shape preserving interpolation in R^, IMA J. Numer. Anal. 17 (1997), 373-419. 40. P. D. Kaklis and D. G. Pandelis, Convexity-preserving polynomial splines of nonuniform degree, IMA J. Numer. Anal. 10 (1990), 223-234. 41. P. D. Kaklis and N. S. Sapidis, Convexity-preserving interpolating parametric sphnes of non-uniform polynomial degree. Computer Aided Geometric Design 12 (1995), 1-26. 42. M. I. Karavelas and P. D. Kakhs, Spatial shape-preserving interpolation using i/splines. Numerical Algorithms 23 (2000), 217-250. ■ 43. V. P. Kong and B. H. Ong, Shape preserving interpolation using Prenet frame continuous curves of order 3, to appear. 44. C. Labenski and B. Piper, Coils, Computer Aided Geometric Design 20 (1996), 1-29. 45. P. Lamberti and C. Manni, Shape preserving C^ functional interpolation via parametric cubics, Numerical Algorithms, to appear. 46. R. W. Lynch, A method for choosing a tension factor for spline under tension interpolation, M.Sc. Thesis, Univ. of Texas at Austin (1982). 47. D. F. McAllister, E. Passow and J. A. Roulier, Algorithms for computing shape preserving spline interpolation to data. Math. Comp. 31 (1977), 717-725. 48. D. F. McAllister and J. A. Roulier, An algorithm for computing a shape preserving osculating quadratic spline, ACM Trans. Math. Software 7 (1981), 331-347. 49. D. F. McAllister and J. A. Roulier, Algorithm 574. Shape preserving osculating quadratic spHnes, ACM Trans. Math. Software 7 (1981), 384-386. 50. C. Manni, C^ comonotone Hermite interpolation via parametric cubics, J. Comp. Appl. Math. 69 (1996), 143-157. 51. C. Manni, Parametric shape-preserving Hermite interpolation by piecewise quadratics, in Advanced Topics in Multivariate Approximation, F. Fontanella, K. Jetter, P. J. Laurent (eds.). World Scientific (1996), 211-228. 52. C. 
Manni, On shape preserving C^ Hermite interpolation, BIT 14 (2001), 127-148. 53. C. Manni and P. Sablonniere, Monotone interpolation of order 3 by C^ cubic splines, IMA J. Numer. Anal. 17 (1997), 305-320. 54. C. Manni and M. L. Sampoli, Comonotone parametric Hermite interpolation, in Mathematical Methods for Curves and Surfaces II, M. Daehlen, T. Lyche, L. L. Schumaker (eds.), Vanderbilt Univ. Press, Nashville (1998), 343-350. Shape preserving interpolation by curves 55. G. M. Nielson, Some piecewise polynomial alternatives to splines under tension, in Computer Aided Geometric Design, R. E. Barnhill, R. F. Riesenfeld (eds.). Academic Press (1974), 209-235. 56. E. Passow and J. A. Roulier, Monotone and convex interpolation, SIAM J. Numer. Anal. 14 (1977), 904-909. 57. S. Pruess, Properties of splines in tension, J. Approx. Theory 17 (1976), 86-96. 58. R. Qu and M. Sarfraz, Efficient method for curve interpolation with monotonicity preservation and shape control. Neural, Parallel and Scientific Computations 5 (1997), 275-288. 59. L. Raymon, Piecewise monotone interpolation in polynomial type, SIAM J. Math. Anal. 12 (1981), 110-114. 60. N. S. Sapidis, P. D. Kaklis and T. A. Loukakis, A method for computing the tension parameters in convexity preserving sphne-in-tension interpolation, Numer. Math. 54 (1988), 179-192. 61. N. S. Sapidis and P. D. Kaklis, A hybrid method for shape-preserving interpolation with curvature-continuous quintic splines. Computing Suppl. 10 (1995), 285-301. 62. R. Schaback, Spezielle rationale Splinefunktionen, J. Approx. Theory 7 (1973), 281292. 63. R. Schaback, Adaptive rational splines, NAM-Bericht Nr. 60, Universitat Gottingen (1988). 64. R. Schaback, Interpolation in i?^ by piecewise quadratic visually C^ Bezier polynomials. Computer Aided Geometric Design 6 (1989), 219-233. 65. R. Schaback, On global GC'^ convexity preserving interpolation of planar curves by piecewise Bezier polynomials, in Mathematical Methods in CAGD, T. Lyche, L. L. Schumaker (eds.). Academic Press, Boston (1989), 539-548. 66. L. L. Schumaker, On shape preserving quadratic spline interpolation, SIAM J. Numer. Anal. 20 (1983), 854-864. 67. D. G. Schweikert, An interpolation curve using a spUne in tension, J. Math. Phys. 45 (1966), 312-317. 68. H. Spath, Exponential spline interpolation, Computing 4 (1969), 225-233. 69. H. Spath, Spline algorithms for curves and surfaces, Utilitas Mathematica Pub. Inc., Winnipeg (1974). 70. F. I. Utreras and V. Cells, Piecewise cubic monotone interpolation: a variational approach, Departamento de Matematicas, Universidad de Chile, Tech. Report MA83-B-281 (1983). 71. S. W. Young, Piecewise monotone polynomial interpolation. Bull. Amer. Math. Soc. 73 (1967), 642-643. 35 GAGD techniques for differentiable manifolds Achan Lin and Marshall Walker York University, Toronto M3J 1P3, Canada. linOyorku.ca, walkerOyorku.ca Abstract The paper outlines procedures for extending the de Casteljau, de Boor and Aitken algorithms in such a way as to allow the construction on a Riemannian manifold of curves analogous to Bezier, B-spline, and Lagrange curves. These curves lie in the manifold and respect intrinsic geometry. 1 Introduction Given a sequence of points in a Riemannian manifold M we describe methods for extending the de Casteljau, de Boor, and Aitken algorithms. These methods allow construction of corresponding interpolating or approximating curves that lie in the manifold and respect intrinsic geometry. 
In the case that the manifold is a sphere, opportunity for applications exist in the domain of geological and geographical mapping, for instance the creation of topographical contour lines or isotherms, and in the field of video production, where it is desirable to have smooth camera trajectories interpolating fixed camera positions. For higher dimensional manifolds there are applications in the field of data analysis. For the case of a sphere, there is an extensive literature dealing with the general problem of data fitting, and a superb review can be found in Fasshauer and Schumaker [2]. Shoemake [7] uses properties of quaternion arithmetic to describe curves on the unit quaternion sphere, and Levesley and Ragozin [4], using techniques different from those presented in this paper, describe methods for Lagrange interpolation in differentiable manifolds. The techniques described in this paper come from the simple observation that in the de Casteljau, de Boor, and Aitken algorithms one may formally substitute appropriately parametrized geodesic arcs for straight line segments. These ideas are introduced in detail in the next section in the context of the blossoming paradigm, [6] and [3]. Unfortunately many of the useful properties of blossoms depend on the affine structure of Euclidean space which in general has no counter part in a Riemannian manifold. In particular, geodesic blossoms may be neither symmetric or multi-affine, and in general they do not possess uniqueness characteristics common to the Euclidean blossom. For an arbitrary Riemannian manifold [1] or indeed an arbitrary differentiable 2manifold enibedded in E^, it may not be possible to construct unique shortest geodesic arcs between two points. However, if the manifold is compact or in the case that the two points lie in a sufficiently small neighborhood, such arcs are known to exist. But even 36 CAGD techniques for differentiable manifolds then, there appears to be no general method that allows explicit construction. So, the task of constructing geodesic blossoms becomes a study of special cases in which specific methods can be set forth. For the general case, a discrete variational method can be used to obtain good approximations. In Section 3 a few specific examples are discussed. The case in which the manifold is a sphere is given special attention. There we introduce a variation which allows the discussion of Archimedian curves which are constructed by substituting Archimedian spirals for geodesies. This variation allows the natural construction of curves that lie off the sphere. Although the spherical geodesic blossoms are neither symmetric or multiaffine, a simple reparametrization of geodesic arcs results in spherical blossoms that have all desirable characteristics. Section 3 also contains a brief discussion of the problem of finding geodesies in developable surfaces and in surfaces of revolution. 2 Preliminaries Let M be a C°° Riemannian manifold. There is the following theorem that guarantees the existence locally of geodesies. Theorem 2.1 If M is a Riemannian manifold, xo € M. Then there exists a neighborhood V of Xo and e > 0 so that if x € V and v is a non-zero tangent vector at x and \\vx\\ < e, then there is a unique CP° geodesic Q : (—2,2) —> M defined on the open interval (—2,2) such that a{0) = x and I "3~ ) = ^x\ "* / t=o For compact Riemannian manifolds there is the Hopf-Rinow theorem that tells us that points can be connected by geodesic arcs. 
Theorem 2.2 (Hopf and Rinow) If a connected Riemannian manifold M is compact, then any pair of points x and y may be joined by a geodesic whose length corresponds to the distance in the manifold from x to y. We also need the notion of geodesic convexity and the result of J. H. C. Whitehead that geodesically convex neighborhoods exist for all a; € M. Definition 2.3 Given a subset X of M and a point XQ £ X, X is star shaped with respect to the point 'XQ, if for every x £ X there is a unique shortest geodesic connecting Xo with x which lies in X. Definition 2.4 A subset X of M is geodesically convex if it is star shaped with respect to each of its points. Definition 2.5 Given a subset A of a geodesically convex set X the geodesic convex hull of A is the smallest convex set which contains A. Theorem 2.6 (J. H. C. Whitehead) Let V be an open subset of a Riemannian manifold M and let x G M , then there is a geodesically convex open neighborhood U of x such that U CV. Let M be a Riemannian manifold and let X be a geodesically convex subset of M. Given points Pi in M we describe extensions of the de Castlejau, de Boor, and Aitken algorithms. 37 38 A. Lin and M. Walker 2.1 Riemannian Lagrange curves Let M be a Riemannian manifold, and let A = {PQ, Pi, ■ ■ ■, Pn} be a subset of a geodesically convex subset X. Given parameter points, to < ^i < • • • < <n, assume that A is contained in a sufficiently small neighborhood in which specified geodesies exist. For 0 < i < n — 1, define 7/ : [to, t„] ^ X to be the unique geodesic parametrized so that Jiiti) = Pi and 7,^(^1+1) = Pj+i- For 1 < r < n and 0 < i < n - r define 7[ : [to, in]'' —» X so that 7[(ui, U2, ■ • ■, Wr-i, •) is the unique geodesic parametrized so that 7[(MI,«2,- • ■,Ur-i, ti) = il'''^{ui,U2,- ■ ■,u,-i) and 7[(wi,«2,- • ■,Ur-l, U+r) = 'yi+iiui,U2,- ■ ■,Ur-i). The function 70 : [to, tn]" —> X is called the geodesic Aitken blossom associated with the points P, E X, 0 < i <n and the parameter points, to < ti < • ■ ■ < tn ■ If A : [to, tn] -* [to,t„]" is the diagonal map defined by A(u) = {u,u, ■ ■ ■, w), the geodesic Lagrange curve associated with X and the points Pj is the ' V ' n function FJf = 70 o A. Theorem 2.7 If FQ : [to, tn] -^ M is the geodesic Lagrange curve associated with the points Pi e M, 0 < i <n, as defined above, then Fo(ii) = PiProof: Observe that for 1 < r < n and 0 < i < n — r, 7^ depends for its definition only on the points, Pj, where i < j < i + r. If n = 1, and we are given points, Po and Pi, the result follows from the definition of 70 . Inductively assume it is true for k < n. For k = n, if i = 0, by definition rS(io) = l^{to,to,--;to) = 7S'-\to,to, ■■■,to) =■•■ = 7o(^o) = Po n —1 n and likewise if i = n, FS(i„) = 7j(i„,t„, • •-,*„) = 7o"Hin,in, ■ • ^in) = ■•• = 7o(*n) = ^ ^ ' ■ ' ' n n—1 P„. 
For 2 7^ 0 and i ^ n, observe that the geodesies used in the construction of 7o~'' and 7"""^ may be restricted respectively to the intervals [fo,in-i] and [fi,in] so that 7o~^becomes the geodesic Aitken blossom associated with the points Po,Pi,' • •, P„_iand the parameter points to < ti < ■■■< i„_i, and 7""^ becomes geodesic Aitken blossom associated with the points Pi,P2,- • ■, P„ and the parameter points ii < ^2 < • ■ • < tn- By the deductive assumption, 7o~^(ii^--j^) = Pj = 7"~^fti,ij,- • ■,ti), ^ V n—1 ^ *" -v- n—1 and consequently 7o (ij, ii, • • •, ti, •) is the geodesic connecting 7o ~H*i, ij, • • •, U) with n—1 n—1 'yi''^(ti,ti,- ■ -jti), and is thus the constant function, ^Q{ti,ti,- ■ ■,ti,u) = Pi n—1 uE[to,tn]- for all n Thus in particular, 75'(ii,fi, ••-,«») =Fg(ii) =Pi. O ^ 2.2 CAGD techniques for differentiable manifolds Riemannian Bezier curves Following the previous format we introduce a Riemannian version of the de Casteljau algorithm. Accordingly, let X be a geodesically convex subset of a Riemannian manifold M. Let A = {Po, Pi, • • •, P„} be a subset of X. Define 7? : [0, 1] -> X hy j^{u) = Pi. For 1 < r < n and 0 < i < n - r define 7[ : [0, 1]'' ^ X to be the unique geodesic with the property that 7[(ui,«2,- ■ ■,Ur-i, 0) = 7["^(wi,U2,- • -jUr-i) and 7[(ui,U2,---,Wr-i, 1) = 7[+i'^(wi,U2,---,Wr-i)- The fuuctiou 7o : [0, 1]" -^ X is called the geodesic de Casteljau blossom associated with the set A. If A : [0, 1] -^ [0, 1]" is the diagonal map, the geodesic Bezier curve associated with X and the set A is the function T^ = j^ o A. 2.3 Riemannian B-Spline curves Given A = {PQ, PI, ■ • •, Pn} contained in a geodesically convex subset X of a Riemannian manifold M, and given knots ti < t2 < ■ ■ ■ < t2n, define 7? : [ti,t2n] ^ X by jf{t) = Pi,ioiO<i<n. For 1 < r < n and r < i < n, define 7[ : [i,, ti+n+i-rV -^ X to be the unique geodesic with the property that 7[ (ui, U2, • • ■, Wr-1, U) = l^ll (wi, U2, • • ■,Ur-i) and 7[(«i,U2,- • -yUr-i, ti+n+i-r) = 7[~^(ui,U2,- • ■,Ur-i). The fuuction 7n : [tn, in+i]" ^ X is Called the geodesic de Boor blossom associated the set A. If A : [i„, tn+i] -^ [tn, tn+i]" is the diagonal map, the geodesic B-Spline curve associated with X and the points Pi is the function FJJ = 7JJ o A. We have the following results, which follow from the fact that both the geodesic de Casteljau and the geodesic de Boor blossoms are constructed from successive geodesic combinations beginning with the set A = {PQ,PI,- ■ ■, Pn}Theorem 2.8 Given A = {Po,Pi, • • •, P„} contained in a geodesically convex subset of a Riemannian manifold, if 7^ : [0, 1]" —> X is the geodesic de Casteljau blossom of A, then 7o([0, 1]") is contained in the geodesic convex hull of the set A. Theorem 2.9 Given A = {Po,Pi,---, Pn} contained in a geodesically convex subset of a Riemannian manifold, i/70 : [*n, tn+iY"-* X is the geodesic de Boor blossom of A relative to a knot sequence ti <t2 < ■ ■ ■ < t2n, then 7o ([*«, ^n+i]") is contained in the geodesic convex hull of the set A. . Since each of the three blossoms are constructed successively from C°° geodesies, it follows that the blossoms and their restrictions to the diagonal are also of class C°°. Theorem 2.10 The geodesic Lagrange, Bezier, B-spline curves are of class C°° as are each of their corresponding blossoms. 3 Examples The impediments to implementation of these ideas depend on the manifold in question. 
In all cases it is necessary that the points Pj should he in a region in which it is possible to construct geodesic arcs between points. The problem then reduces to that of finding methods for such constructions. Even in cases for which this is possible, there is the additional problem that many of the desirable properties associated with B-spline or Bezier curves in R^ may have no direct analogs. Many properties such as the ability 39 40 A. Lin and M. Walker to subdivide a curve depend on the blossom being symmetric or multi-afRne, and for the generalizations presented here, this is seldom true. For the case of an orientable 2-manifold embedded in R^, there are in many cases good solutions to the problem of finding geodesies, but different classes of surfaces lead to different solution. In this section we mention a few. In the case that the manifold M is the 2-sphere S^ a preliminary version of our results is reported in [5]. 3.1 The sphere In the case that M = S^, a small alteration to methods presented so far allows the consideration of curves that lie off the sphere. Given points P and Q that lie off the sphere consider radial projections to points P and Q and let 7 : [a,b] —> 5^ be a geodesic with the property that 7(a) = P and 7(6) = Q. The curve 7 : [a, b] -> MP defined by 7(0 = (f5i-11^11 +IE!-IIQIl)-7(0 is called the Archimedian spiral connecting the points P and Q. To explicitly describe the curve 7 , set P — vi,Q = V2 and for simplicity consider the parameter interval [a, b] to be the unit interval [0,1]. For < -, ■ > the standard inner product on R^'' set ^3 = (< Wl,'!'2 > fl - U2)/(||< Vl,'y2 > fl - ■i'2||) SO tha:t V3 is orthogonal to Vi and in the plane containing vi and V2. Letting 6 =< fi, V2 > denote the angle between vi and V2, the geodesic 7 connecting Vy with V2 is defined by y{t) . : = = cos{t6)vi + sm(t,d)v3 ( .„. sm{te) < vi, V2> \ COs(te) + 71 ^^^ II V \\<VuV2>Vi - V2\\J sm{t9) ^—^ |TU2. ||< Vi,?;2 > Vl - V2II Vi - -r. The corresponding Archimedian Lagrange, Bezier and B-spline curves may now be constructed with the general algorithms of Section 2. One of the difficulties that arise with Archimedian curves is that geodesic blossoms are not necessarily symmetric or multi-affine. It is even not clear what these concepts might mean in a geodesic context. Consequently, certain results that hold for normal Bezier or B-spline curves that depend on these properties are no longer valid. In particular analogs of the subdivision algorithms that allow one to determine control points of a portion of a given Bezier or B-spline are not valid. However, it can be shown that a simple non-linear change in the parametrization of the geodesic arcs, makes it possible to recapture most of what is needed. Definition 3.1 Given two points A and B on the sphere. Let C be the smaller arc of the spherical geodesic joining A with B. The barycentric param.etrizo,tion of C on the parameter interval [a, b] is the function a : [a, 6] —> C defined by a{t) = q{x{t)), where x{t) = ^rz^-A + ^j^B and q:R^ -^ S^ is the radial projection q{x) = In the following we prove a spherical version of the Menelaus theorem. T—TT . CAGD techniques for differentiable manifolds 41 Theorem 3.2 Given 3points PQ, PI, P2 on S^ let 7 : [0,1] x [0,1] -> R^ be the geodesic de Casteljau blossom in which all geodesic arcs are given the barycentric parametrization. Then'y{s,t) =^{t,s). Proof: Observe that an elementary geometric argument tells us that: 7(s,t)=7o'(s,*) ' . 
= = ?((l-i)7o(s)+t7i(s)) g((l-i)[(l-s)Po + sPi]+f[(l-s)Pi+sP2]) 7(i,s)=7o(*>s). = = q{{l-sH{t) + s^\{t)) 5((l-s)[(l-i)Po + tPi] + s[(l-i)Pi+tP2]). and And the result follows from the afhne properties of M^. D As an immediate consequence we have Theorem 3.3 Given points PQ, PI, • • •, Pn on S"^, the associated de Casteljau blossom, in which geodesic arcs are given barycentric parametrization, is symmetric. The conventional blossoming description of subdivision can now be employed. Prom the blossom construction we can conclude that 7o (0,0, • • •, 0, 1,1, ■ • •, 1) = P,. In pari ticular, it follows that, for 0 < u < 1, the points Qi = 7o(0,0, • • -,0, u,u,---,u) i describe a geodesic de Casteljau blossom which is parametrized to the interval [0,u] and which, because of the uniqueness of geodesic arcs, equals the restriction of 70 to [0,u]". Likewise, for the interval [u, 1] the points Pj = 'JQ^U^U,••-,«, 1,1,• • •, 1) dei termine a geodesic de Casteljau blossom which is parametrized to the interval [u,l] and which equals the restriction of 70 to [u, 1]" Therefore, if g : [0,1] -^ S^ is the geodesic Bezier curve determined by PQ, Pi, • • •, P„ and if 5 = 7Q o A, it follows that, 9\[o,u] ■ t^lo{t,t,---,t, u,u,---,u) and g\[i^u] : t ^'y^{u,u,-■ ■,u, t,t,-■ ■,t), for i i 0<u<l. More generally and along the lines of the proof above, we have the following theorem which allows all familiar properties of both Bezier and B-spline curves which have descriptions in terms of their corresponding blossoms to carry over to the spherical case. Theorem 3.4 Let f : [0,1]" —>■ M^ be the Euclidean blossom generated by the de Casteljau algorithm using points Pi £ S'^,0 <i <n. Then 7o = 9° /■ 3.2 Other surfaces We briefly discuss two examples in which explicit descriptions of geodesies between points are possible. A developable surface S [4], described as the image of a function / : (7 —> E^ for U an open subset of M^, possess the characteristic, among others, that distances are 42 ; A. Lin and M. Walker preserved by the function /. Therefore, a geodesic in the surface f{U) may be considered as the image of a straight Hne in the plane. If Po,Pi,- • -/Pn are points in S, let Qi = f~^{Pi), 0 < i < n. If C C f/ is the Lagrange, Bezier, or B-spline curve obtained from the standard Euclidean versions of the algorithms, then it follows that /(C) is the corresponding geodesic curve in S that would have been obtairied using geodesic versions of the algorithms that we have described. For surfaces of revolution the description of geodesies between two points is rather more involved. Let C be a curve in the yz-p\ane described implicitly by ■ / m=^ \ x=0 ' for (y, z) belonging to some open set U contained in the upper half of the yz-plane. The surface S obtained by rotating C about the z-axis may be expressed as g~^{0) where 5 : R X t/ —> R is defined by g{x,y,z) = f{\/x^ + y^) - z = 0. In polar coordinates letting u = -y/^M-^, we express S in the form X = u cos 6 y = usin9 . z = f{u) Let P = («iCOS^i,uisin^i,/(ui)) and Q = {u2Cos62,U2sm92,f{u2)) be two points on S. Then it may be shown that the geodesic connecting P with Q is the function a : [ui,U2\ -^ S such that a{u) = {ucos6{u),sin9{u),f{u)), where for fixed WQ, ^'"'=£/i^-''' and constants c and c' satisfy the following equations: n ji + ifju) 2 72-6'i=/ A/-T—; ;w-du Jui V ^"^ ~ ■"' 1+ (/'("]!!,„. Ju„ V ^'^' For complete details see [6]. 
4 Conclusion and future research We have outlined a procedure by which conventional computer aided design constructions may be extended to arbitrary Riemannian manifolds. In practice, there are difficulties. In a given manifold points to be interpolated or approximated must lie in a region in which it is possible to construct necessary geodesic arcs. Supposing this the case, one then needs to find explicit descriptions of the geodesies. And then there is the question of the additional characteristics which the curves might possess. The paper raises more questions than it answers. In the case of a sphere, good results are obtained, and it CAGD techniques for differentiable manifolds is also possible to add variation that allows consideration of curves off the sphere but which project radially to geodesic Lagrange, Bezier, or B-spline curves. It is also shown, in the spherical case, that a change parametrization of geodesies results in blossoms that retain the desirable characteristics associated with Euclidean blossoms. For surfaces of revolution and developable surfaces, we know that geodesies can be found between points so the geodesic blossom constructions will always exist. It is however unlikely that these blossoms will be either symmetric or multi-afhne; these characteristics depend on the affine structure of M^. Thus, in the case of a general Riemannian manifold, although the constructions may be valid, it is not clear that we will be able to employ fundamental operations such as subdivision which depend on the symmetry of the blossom. We have outlined three different methods of blossom construction, one for each of the algorithms considered. In the Euclidean case, we know that there is a unique symmetric, multiafhne polynomial that restricts to a given polynomial on the diagonal. This may not be true in our more general setting. Bibliography 1. Conlon, L., Differential Manifolds, a First Course, Birkhauser, Boston, 1993. 2. Fasshauer, G. E. and Schumaker, L. L., Data Fitting on the Sphere, in Mathematical Methods for Curves and Surfaces II, Daehlen, M., Lyche, T., and Schumaker, L. L. (eds), Vanderbilt University Press, Nashville, 1998, 117-166. 3. Gallier, J., Curves and Surfaces in Geometric Modeling, Theory and Applications, Morgan Kaufman, San Francisco, 2000. 4. Opera, J., Differential Geometry and its Applications, Prentice Hall, Upper Saddle River, NJ, 1997. 5. Levesley, J., and Ragozin, D. L., Local Approximation on Manifolds Using Radial Basis Functions and Polynomials, in Curve and Surface Fitting, Cohen, A., Rabut, C.R., Schumaker, L.L. (eds), Vanderbilt University Press, Nashville, 2000, 291-301. 6. Lin, A., Geodesies between points on surfaces of revolution. Tech. Report, Dept. Mathematics, York University, Toronto, May 2001. 7. Ramshaw L., Blossoming: A Connect the Dots Approach to Splines, Digital Systems Research Center, Report 19, Palo Alto, CA, 1987. 8. Shoemake, K., Animating Rotation with Quaternion Curve, ACM Proceedings, San Francisco, July 22-26, 9, 1985, 245-254. 9. Walker, M., Curves over a Sphere, preprint, 2000. 43 Parametric shape-preserving spatial interpolation and z/—splines Carla Manni Department of Mathematics, University of Torino, Italy maimi@dm.unito.it Abstract In this paper we present a class of C^ spatial interpolating curves depending on a set of tension parameters and we illustrate their ability to reproduce the shape of the data. 
The curves are constructed using cubic splines and basically reduce to classical F-splines for particular values of the tension parameters. 1 Introduction Shape-preserving interpolation via functional as well as parametric splines is a well studied topic for the planar case. On the other hand, shape-preserving interpolation for spaces curves is considerably more complex than for planar ones and the related literature is apparently limited. On this concern, a considerable part of the available schemes only ensures geometric continuity of the obtained curve (see [1, 8] and references quoted therein). Recently, C^ and C^ shape-preserving interpolating space curves have been obtained using polynomial splines of variable degree, [2, 3, 6]. However, working with low(fixed)-degree polynomial splines seems to be a standard choice in the CAD/CAM community. This motivates the careful investigation of shape preserving properties of cubic zv-splines recently carried out in [7] and the present paper. In this paper we present a method for constructing C^ spatial interpolating curves reproducing the shape of the polygonal line which interpolates the given data. The curve is constructed via the so called "parametric approach", [10], using classical cubic splines. The shape of the curve is controlled by the amplitude of the tangent vectors at the data sites which play the role of tension parameters. It turns out that, for particular values of the tension parameters, the proposed scheme provides a new, geometrically evident, description of classical C^ - G^ cubic i/-splines, [11]. Moreover, the method produces a suitable reparameterization for the above mentioned curves ensuring C^ continuity. The reparameterization is a cubic polynomial involving the tension parameters (see (3.3)). Thus, the evaluation of the curve for a fixed value of the new parameter requires the solution of a cubic equation. The geometric meaning of the tension parameters coupled with the powerful "shapepreserving" properties of the Bernstein-Bezier representation can be efficiently used to construct an iterative algorithm for C^ shape-preserving interpolation. The algorithm 44 Shape-preserving spatial interpolation 45 converges in a finite number of iterations and requires at each iteration the solution of a diagonally dominant linear system. The paper is organized as follows. In Section 2 we state the problem. In Section 3 we describe the construction of the required interpolant and we illustrate its dependence on the tension parameters. The asymptotic behavior and the shape-preserving properties of the obtained curve are briefly discussed in Section 4. We conclude in Section 5 with a graphical example. 2 The problem In this section we introduce the problem of shape-preserving interpolation by curves in M^. The adopted notion of shape-preserving follows the definitions of [2] and [6]. Let Ij e]R^ i = 0,...,N, be the interpolation points with I^ ^ li+i- Define, for all admissible indices, Lj := Ij+i — li, N,:=(feS' if||L.-ixL,||>0, 10, , 0, elsewhere. elsewhere, where |a b c| denotes the determinant of the matrix with columns a, b, c. The vectors Nj and the scalars A, are, respectively, the discrete binormals and the discrete torsions of the data. Let the parameter values ai, i = 0,... ,N, with ai < Gi^i be given, and let /ij := CTj+i - CTj, i = 0,1,..., A^ - 1 be the corresponding spacings. We wish to construct a curve Q(s), s G [(JQ, ITjv], which interpolates the data, Q((TJ) = Ij, i = 0,...,N, such that Q € C'^[ao,aN]. 
In addition, we also require that Q(s) is shape-preserving, that is it reproduces the convexity and torsion of the polygonal line connecting the interpolation points. More specifically, denoting with dashes derivatives with respect to the parameter s, we define (2.1) as the curvature vector and the torsion of the curve respectively. Q(s) is shape-preserving if it satisfies the following criteria ([2, 6, 7]). (i) Convexity criteria: (i.l) if Ni • Ni+i > 0, then K(s) • Nj > 0, j = i,?-I-1, s e [cri,cTi+i], (i.2) if Nj • Nj+i < 0, then K(s) • Nj, j = i,i-\-l, has one change in sign in (i.3) if Ni ■ Nj 7^ 0 then (K(cTi) • Nj)(Ni ■ Nj) > 0, j = i-l,i,i + l. (ii) Torsion criteria: ii Ai ^ 0 then T{s)Ai > 0, s G [(7^,0"^^;^]. [(TJ,ITJ+I], ,46 Carla Manni For the sake of brevity we refer to [7] for the more technical collinearity and coplanarity criteria. 3 Constructing the interpolating curve In order to construct the curve Q we consider, as a first step, a cubic curve C interpolating the data. We put C(0|K,.,,,,:=C,:(t;Af\Af^ (3.1) Ci{t; Af\ Af)) := lM'\u) + Ii+rH["\u) + Af ftiT,i/f^'(«) + X^^hiTi+rHl'\u), t€[cri,ai+i], u:={t-ai)/hi, (3.2) where 0 < A^- ,A,- < 1 are shape parameters, T,;, Tj+i are vectors to be determined and H^{u) denote the elements of the cardinal basis for cubic Hermite interpolation, that is H\ {u) are the polynomials of third degree such that —jl^^ = 5ij5ri, r,l = QX One can immediately verify that the curve (3.2) interpolates the points I,;, I,;+i at the extremes of the interval [CTj,cri+i] and has tangent vectors \\ 'T,, A^ TJ+I at the same extremes. The parameters x\ \\i determine the amplitude of the tangent vectors of the curve at the two end points of the interval and they control the shape of the curve. To be more specific, since HQ{U)-\-HI{U) = 1, we have that Ci(i; 0,0) reduces to the fine through Ij, Ij+j. Thus, the parameters A,- ,X\ act as tension parameters stretching the curve from the classical Hermite cubic interpolating I,, Ij+i with tangents T,;, Tj+i {\f\\f^ = 1) to the fine segment {xf\x^P = 0). The curve (3.1) turns out to be of class G^. Let us consider now the new global parameter s{t\a,,^,^,y.= Si{t-Af\\^P):=<TiH^o\^) + ^i+^Hf\u)+ (3.3) It is not difficult to see that, if 0<AP\Ar'<l (3.4) then ^sC/'A^^) A^^h , ~ > U, t e [cri,cr,;+ij. Thus (3.3) implicitly defines a function t = t{s), which provides a reparameterization for (3.1). In the following we assume that conditions (3.4) hold and we define Q(5) := C(i(.s)). (3.5) Since Q'(cri) = T,;, i = 0,... ,7V, Q is of class C^. For each sequence of the tension Shape-preserving spatial interpolation 47 parameters Xl',\l' we will determine the tangent vectors Tj,Tj_)_i so that Q is also of class C^. Let us denote by dots derivatives with respect to the local parameter u. Imposing continuity of Q"(s) ai ai, i = 1,... ,N — 1, from (3.3), (3.5) and from the chain rule for derivatives, we obtain Q-i(l-)/i,-iAli\ - 5,-i(l-)fe,_iAli\T, _ C,(0+)fe,Af - s,(0+)ft,Af T, (3.6) Thus, after some manipulations, from (3.2) we have UiTi^i+Ti+ViTi+i^Zi, Ui = i = l,...,N-l, (3.7) , Wi Wi Wi = /ii_i(3 - xf\){hiXf^f + hi{3 - AP)(/^i_lAli\)^ (3.8) z, = -Li{hi.,X^],r + -L,_i(/iiAf )^ Wj ' '-' Wi In order to uniquely determine the vectors Tj we need two additional equations that will be obtained by imposing boundary conditions. 
Classical boundary conditions are periodic conditions: uoT;v-i + To + t;oTi =zo, WjvTAr-i + Tjv + I^ATTI = zjv (with UO,VO,UN,VN,ZO,Z,N defined according to (3.8) setting /i_i = A_i - A}y_p L_i = Ljv-i, tangent conditions: /ijv = ho, Xy = A^°', X)^' = To = Do, TTV = /IAT-I, X^Q^\ LN X_l = A^_j, = Lo) and end DAT, (where Do, DN are given in input). In the following we will denote by I the set of indices {1,..., A'' — 1} ({0,..., N}) when end tangent (periodic) conditions are considered. It is not difficult to see that (3.7) for any choice of the above mentioned boundary conditions provide a diagonally dominant system , AT = z. (3.9) Thus we can state the following Theorem 3.1 For any sequence A^ , A^^\ i = 0,..., iV-1, satisfying (34), there exists a unique Q G C^[(To,(TAr] defined via (S.l)-(3.3), (3.5) which interpolates the given data and satisfies periodic or end tangent conditions. We notice that for A^.' = X^^' — 1, system (3.9) reduces to the system for the computation of classical C^ cubic splines. Moreover, if Aj._j = X^°' = Afc, k G I, the 48 , Carla Manni curve G is of class C^ and equation (3.6) reads ^C,(a, ) - ^Q_,(a, ) — -Q(a, ). Then (3.6) is equivalent to impose that the cubic curve (3.1) is a C^-G"^ cubic v-spline. [5, 7, 11] where, from (3.3), for i e I /z-^Si(0+)-fe-_Vi-i(l-) ^ (6 - 4A,; - 2\i+{)hf+ (6 - 2A,-i - i\i)h-\ Vi := (3.10) 4 Asymptotic behavior and shape-preservation In this section we briefly discuss the asymptotic behavior and the resulting shapepreserving properties of the curve Q, defined by (3.1)^(3.3), (3.5) and (3.9), as the tension parameters A,-°\ Af' approach zero. The following lemma (see also [7]) concerns the asymptotic behavior of the tangents T,;. We omit the details of the proof which are completely analogous to those of Theorem 3 in [9]. Lemma 4.1 The vectors Tj, i — Q,...,N, obtained from (3.9) are hounded independently of Xf\xf\ j =0,... ,N - 1. Moreover, A<^),, A<^>-.o V /^,(Af)2 + /l,_l(Aii\)2/^.-l =: (l-ai)r^ + «.^, i€X. ftj_i tii hi^r{X^\y + hi{xfr h, (4.1) Since the tangents are bounded independently on the tension parameters, from the previous section we have that Q approaches the piecewise linear function interpolating the data as the tension parameters tend to zero. Moreover, each tangent Tj determined by (3.9) tends to a strictly convex combination of Lj_i//ii_i and Li/hi as the tension parameters A^^°\, A^^ tend to zero while A[i\/Af ^ remains bounded and strictly positive. Due to these two main facts, we are able to easy control the shape of the curve Q and to ensure that it reproduces the shape of the data as the tension parameters approach zero as we will discuss briefly in the foflowing. Since C and Q only differ for a reparameterization they have the same image. Thus, as far as the shape-preserving properties are concerned, we can consider the expression of C. As noticed in Section 3, if x\^\ = xf\ i € I, the curve C with Tj obtained by (3.9), is a C^-G^ cubic z/-spline. In such a case, using (3.10), the careful shape analysis carried out in [7] and the resulting algorithm can be considered. 
However, the simple geometric meaning of the tension parameters A,^°\ Af' coupled with the "shape-preserving" properties of the Bezier-Bernstein representation, allow us to more easily establish the shape-preserving results also for completely general configurations of x\^\, xf\ Thus, we express the Shape-preserving spatial interpolation 49 curve segment Ci{t;X\ ,AJ ') in Bezier-Bernstein form: 3 Q(t;Af),A«) = 5;Q,P)t'(l-e 1=0 Cj^o '■— III Cjji :— Ij + 3/iiAj Tj, Cj^2 :— li+i — a'^'j-^i Tj+i, 0,^ :— Ij+i Let us consider at the beginning the convexity criteria. A (1) A Lemma 4.2 //Nj • Nj ^ 0 ond -^ -> c> 0, t/ien lim N{0) (K(ai)-Nj)(Ni-N,)>0. X(I)_^O Proof: From the properties of Bezier curves (see [5]) and from (2.1) and (3.5) sgn(K(CTi) • N,-) = sgn((Q,i - Ci,o) X (Ci,2 - Ci,i)) • N,sgn Af/^.„ TixlLi- -^-^Ti - )S'h, ■N,- Li+l where sgn(y) denotes the sign of y. Moreover, from (4.1) lim (Ti X Li) ■ N,- = fai^ X Li + (1 - a^)^ x • LA N,- = ^\~"'^Ni • N^. t —1 ' t Hence, we obtain the assertion if Nj • Nj ^ 0. □ The previous lemma ensures that, if AJ_\, A^ ' are small enough the third convexity criterion, (i.3), stated in Section 2 is satisfied. In addition, the sign of K(0-fe)-Nj, k = i,i+ 1 can be checked considering the Bezier coefficients C,,;, 1 = 0,1,2,3, of Cj. Furthermore, thanks to the shape-preserving properties of totally positive bases, for small values of the tension parameters, (see [4]) the number of changes in sign of K(s) • Nj, s e [(Ti,CTi+i] is bounded by the number of changes of sign in the pair K(0-fe) • Nj, fc = i, i + 1. Thus, also the first and the second convexity criteria (i.l) and (i.2) are satisfied if the tension parameters are small enough. As far as the torsion is concerned, we recall that the sign of the torsion of a cubic curve coincides with the sign of the discrete torsion of its Bezier control polygon (see for example [5]) thus it is not difficult to obtain the following Lemmia 4.3 7/ Aj 7^ 0 and -^ -^ c> 0, j = i,i + l, then ,(0) ,(1) lim ,(0) .(1) ,(0) ,(1) .„ T{s)Ai>0, s e[a^,crZ-,]. ' ' . ' ^^'■' ■ With similar arguments it is not difficult to prove that also the coUinearity and the coplanarity criteria stated in [7] are fulfilled as the tension parameters approach zero. We omit the details for the sake of brevity. Summarizing, from the previous discussion it follows that if the tension parameters are small enough then the Bezier control polygon of C reproduces the shape of the data and 50 Carla Manni the curve C does the same thanks to the properties of Bezier-Bernstein representation. Thus, to obtain an automatic algorithm to compute the C^ interpolant Q defined by (3.5), satisfying convexity and torsion criteria, basically we have to perform the following steps: (a) for a given sequence of the tension parameters solve the system (3.9) and compute the Bezier coefficients of the resulting curve C; (b) check if the control polygon of each segment C, satisfies the convexity and torsion criteria; (c) if this is not the case reduce the values of the related tension parameters according to a given rule and go to step (a). 5 A graphical example To illustrate the performance of the presented scheme we consider the data proposed in [7], Example 2, consisting of 20 points with uniform parameterization in [0,1]. End tangent boundary conditions have been used (see Table 2 in [7]). Figures 1-3 show the behavior of the obtained C^ curve Q compared with the classical C^ cubic spline. 
The shape-preserving curve Q is defined by the following sequence of tension parameters A(I) . .6 .9 .6 1 .6 .6 FIG. .9 1 .9 111111111 .9 .911111111 .75 1 1 1 1 1 1 .75 1 1. 1. C^ cubic spline (left) and Q (right). Bibliography 1. S. Asaturyan, P. Costantini and C. Manni, Local shape-preserving interpolation by space curves, IMA J. Numer. Anal. 21 (2001), 301-325. 2. P. Costantini, T. N. T. Goodman and C. Manni, Constructing C^ shape-preserving interpolating space curves, Adv. Comput. Math. 14 (2001), 103-127. 3. P. Costantini and C. Manni, Shape-preserving C^ interpolation: the curve case, Adv. Comput. Math. (2002) to appear. 4. T. N. T. Goodman, Total positivity and the shape of curves, in Total Positivity and its Applications, M. Gasca and C. A. MiccheUi (eds), Kluwer, 1996, 157-186. Shape-preserving spatial interpolation 2. Left: ||K(s)|| for the C^ cubic spline (dotted line) and for Q. Right: convexity ratio ^^^ in [ao,ai] (with NQ := ||^g^^°||) for the C^ cubic spline (dotted line) and for Q. FIG. 3. Left: torsion of the C^ cubic spline (dotted line) and of Q (the horizontal lines depict the sign of the discrete torsion). Right: first component of (fC/dt^ (dotted line) andofd^Q/rfs^. FIG. 5. J. Hoschek and D. Lasser, Fundamentals of Computer Aided Geometric Design, A. K. Peters Ltd., 1993. 6. P. D. Kaklis and M. L Karavelas, Shape preserving interpolation in TZ^, IMA J. Numer. Anal. 17 (1997), 373-419. 7. M. L Karavelas and P. D. Kaklis, Spatial shape-preserving interpolation using v-^ splines, Numer. Algorithms 23 (2000), 217-250. 8. V. P. Kong and B. H. Ong, Shape Preserving Interpolation using Frenet Frame Continuous Curve of Order 3, (2001) preprint. 9. P. Lamberti and C. Manni, Shape-preserving C^ functiona,! interpolation via parametric cubics, Af«mer. ^tyorii/ims 28 (2001), 229-254. 10. C. Manni, On Shape Preserving C^ Hermite Interpolation, BITAl (2001), 127-148. 11. G. Nielson, Some piecewise polynomial alternative to spline under tension, in Computer Aided Geometric Design, R. E. Barnhill and R. F. Riesenfeld (eds) Academic Press, 1974, 209-235. 51 On the g-Bernstein polynomials Halil OruQ and Necibe Tuncer Department of Mathematics, Dokuz Eyliil University, Tinaztepe Kampiisii 35160 Buca Izmir, Turkey halil.orucQdeu.edu.tr, necibe.tuncerOdeu.edu.tr Abstract We discuss here recent developments on the convergence of the g-Bernstein polynomials Bnf which replaces the classical Bernstein polynomial with a one parameter family of polynomials. In addition, the convergence of iterates and iterated Boolean sum of qBernstein polynomial will be considered. Moreover a g—difference operator T>qf defined by Vqf — f[x, qx] is applied to g-Bernstein polynomials. This gives us some results which complement those concerning derivatives of Berrlstein polynomials. It is shown that, with the parameter 0 < g < 1, if A*/r > 0 then VgBnf > 0. If / is monotonic so is VgBnf. If / is convex then V^Bnf > 0. 1 Introduction First we begin by introducing some notations to be used. For any fixed real number q> 0, the g-integer [k] is defined as (1 - g^Va - g), g^l, ^ ' ^ k, q = l, for all positive integer k. The term Gaussian coefficient is also used, since they were first studied by Gauss (see Andrews [1]). Let p{N, M, n) denote the number of partitions of a positive integer n into at most M parts, each less than or equal to N. Then the Gaussian polynomial, G{N, M, n), appears as the generating function 'N + M' M G{N,M,n) = Y,p{N,M,n)q\ n>0 Note that [^] defined by = 1 FFfel. 
n>k>0, I 0, otherwise, where [n]\ = [n]{n — 1] • • ■ [1] with [0]! = 1, is called Gaussian polynomial (or g-binomial coefficient) since it is a polynomial in q with the degree (n — k)k. The g-binomial coeffi52 53 On the q-Bemstein polynomials cients satisfy the recurrence relations, 'n+l and = g»-fe+i [n+ll fc — n k - 1. 71 + n ^ 77, + g'^ .fc k-'i-. (1.1) (1.2) The following Euler identity can be verified using the recurrence relation (1.1) by induction that (1 + x){l + qx)-- ■ (1 + q^-^x) = ^ 5K'--i)/2 (1.3) r=0 Phillips [8] introduced a generalization of Bernstein polynomials (g-Bernstein polynomials) in terms of q'-integers n—T—l Bn{f;x) = Y,fr ^ x\^{l-q^x), r=0 L J (1.4) g_o where /r = / (j^) and an empty product denotes 1. When q=l the (1.4) reduces the classical Bernstein polynomials. The Bn{f;x) generalizes many properties of classical Bernstein polynomials. Firstly, generalized Bernstein polynomials satisfy the end point interpolation 5n(/;0) = /(0), 5„(/;l) = /(l). Phillips [8] also states the generalization of well known forward difference form (see Davis [3]) of the classical Bernstein polynomials by the following theorem. Theorem 1.1 The generalized Bernstein polynomial, defined by (1-4), inay be expressed in the q-difference form Bn{f;x) = Yl AVoa;'- (1.5) r=0 /\where A^i = A'-i/i+i -q^'^A^-^fi forr>l and A^/i = fi It is easily verified by induction that q'-differences satisfy AVi = ^^(-l)^^^^^-!)/^ fr+i—k- (1.6) k=0 Using the g-difference form of the q-Bernstein polynomials (1.5), one may show that q-Bernstein polynomials reproduce linear functions, since B„(l;a;) = 1; Bn{x;x) = x. 2 Convergence In the discussion of the uniform convergence of the g-Bernstein operator, the BohmanKorovkin Theorem (see Cheney [2]) is used as in the classical case. The BohmanKorovkin Theorem states that for a linear monotone operator £„, the convergence of 54 Halil Drug and Necibe Tuncer ^nf —* / for f{x) = l,a;,a;^ is sufficient for the sequence of operators £„ to have the uniform convergence property £„/ —» /, V/ G C[0,1]. Observe that the g-Bernstein operator is a monotone linear operator for 0 < g < 1. For a fixed value of q with 0 < g < 1 fnl —> OS n —> oo. Notice that, since Bn{x^]x) = x^ + [n\ ^ B„{x'^\x) does not converge to x^. PhilHps [8] studies the uniform convergence of qi-Bernstein polynomial. Theorem 2.1 Let q = Qn satisfy 0 < g„ < 1 and let g„ —> 1 as n -+ oo. Then, Bn{f;x)^f{x), V/(x)eC[0,l]. The degree of g-Bernstein approximation to a bounded function on [0,1] may be described in terms of the modulus of continuity with the following theorem. Theorem 2.2 If f is bounded on [0,1] and B„f denotes the generalized Bernstein operator associated with f defined by (1.4)> ^'^^n ; ||/-B„/|U<^a;(l/HV2). An error estimate for the convergence of qi-Bernstein polynomials is given in Phillips [8] by the Voronvskaya type theorem. : Theorem 2.3 Let f be bounded on [0,1] and let Xo be a point of [0,1] at which /"(XQ) exists. Further, let q = Qn satisfy 0 < Q'„ < 1 and let q'n —> 1 as n —> oo. Then the rate of convergence of the sequence of generalized Bernstein polynomials is governed by lira [n]{B„{f;Xo) - f{xo)) =-xo{l-xo)f"ixo). n—^oo / It is well known that the classical Bernstein polynotnials B„/ provide simultaneous approximation of the function and its derivatives. That is if / 6 CP[0, 1], then : ■ lim B(P)(/;x) = /(P)(x) n~*oo uniformly on [0,1]. It is worthwhile to examine if this property hold for g-Bernstein polynomials. 
Phillips [7] proved that the p*'' derivative of g-Bernstein polynomials converges uniformly on [0,1] to the p*'' derivative of / under some restrictions of the parameter q. This property results from the generalization of the following theorem. Theorem 2.4 Let f € C^ [0,1] and let the sequence (Q„) be chosen so that the sequence (e„) converges to zero from above faster than (1/3"), where Then the sequence of derivatives of the generalized Bernstein polynomials, B'^f, converges uniformly on [0,1] to f'{x). Up to now the convergence of g-Bernstein polynomials is examined by taking a sequence q = Qn such that ^n ^ 1 as n -^ oo. In the recent developments, the convergence On the q-Bernstein polynomials 55 of Q'-Bernstein polynomials is examined for fixed real q, 0 < q < 1 and for g > 1. It is proved in Orug and Tuncer [6] that for a fi^xed q, 0 < q < 1, the uniform convergence holds if and only if / is linear on the interval [0,1]. Moreover, if g > 1, 5„/ —> / as n —> 00 if / is a polynomial. Theorem 2.5 Let q>l be a fixed real number. Then, for any polynomial p, lim Bn{p;x) =p{x). n—*oo For any fixed integer i, the g-Bernstein polynomials of monomials (see Goodman et.al. [4]) can be written explicitly as i Bn{x';x) = Y,Xj[nY-'Sg{i,j)x^, (2.1) where an empty product denotes 1, and ^.(M-) = ^jr7(^:i)7,E(-l) r^r(r-l)/2 [j-rY, 0<i<j, (2.2) is the Stirling polynomial of second kind. Thus for any polynomial p of degree m, one may write Bn{p;x) = a'^Ax, (2.3) where a is the vector whose elements are the coefficients of p, A is an (m+1) x (m +1) lower triangular matrix with the elements a =1 ^^• [ny-'Sg{i,j), 0<j<i, ^2.4) and X is the vector whose elements form the standard basis for the space of polynomials Pm of degree m. Lemma 2.1 Let 0 < q <1 be a fixed real number. Then lim Bn{p;x) = p{x) n—►oo if and only if p{x) is linear. This lemma can be generalized for any function / € C[0,1]. Theorem 2.6 Let 0 < g < 1 be a fixed real number and f 6 C[0,1]. Then lim Bn{f;x) = f{x) n—*oo if and only if f{x) is linear. 56 Halil Orug and Necibe Tuncer 3 The iterates The iterates of classical Bernstein polynomials were first studied by Kelisky and Rivlin [5]. The authors proved that iterates of Bernstein polynomials converge to linear end point interpolants on [0,1]. Several generalization of the result due to Kelisky and Rivlin has been considered by many authors; see Sevy [9] and Wenz [10]. The recent result is the convergence of iterates of generalized Bernstein polynomials. It is proved in Orug and Tuncer [6] that the g-Bernstein polynomials do preserve the convergence property of iterates of classical Bernstein polynomial. The iterates of generalized Bernstein polynomial are defined by B^+\f;x) = B„{B^{f;xy,x), M-1,2,..., (3.1) where B^(/;a;) = B„(/;a;). ■ Theorem 3.1 Let q>0 be a fixed real number. Then Jim Bf(/;a:)=7(0) + (/(l)-/(0))x. (3.2) Let A and B be operators then the Boolean sum of A and B is defined to be A®B = A + B-AoB. ; We will be concerned with iterated Boolean sums of the generalized Bernstein polynomials in the form 5„ 9 B„ © • • ■ © B„ and will denote such an M-fold Boolean sum of the generalized Bernstein operators by ®^B„. Sevy [9] and Wenz [10] proved that the hmit of iterated Boolean sums of Bernstein polynomials is the interpolation polynomial with respect to the nodes (^,/(^)) i = 0,... ,n as M —> oo. The second theorem of this section will give a result for the convergence of iterates of Boolean sums of generalized Bernstein polynomials. 
It is proved in Orug and Tuncer [6] that the iterates of Boolean sums of q'-Bernstein polynomials converge to the interpolating polynomial at the nodes (i.^(fl))Theorem 3.2 The iterated Boolean sum of the q-Bemstein operator ®^Bn{f;x) associated with the function f{x) 6 C[0,1] converges to the interpolating polynomial Lnf of degree n of f{x) at the points Xi = [i]/{n], i = 0, l,...,n. 4 A difference operator Vq on generalized Bernstein polynomials Given any function f{x) and q £ Rwe define the operator Vg pj(,) = felzM. z' qx -X (4.1) Thus T>qf{x) is simply a divided difference, Vqf{x) = f[x,qx]. Note that, for a function / and non-negative integer fc f[x,qx,...,q''x] = -~V'jix). 57 On the q-Bemstein polynomials Theorem 4.1 For any integer 0 < k <n, n—k n—k r r=0 n—r—1 n (i-9^^)- s=fe Proof: Recall the g-difference form of generalized Bernstein polynomials (1.5) and apply the operator Vq to Bn{f; x) repeatedly k times to get, n—k VlBn{f;x) = Y, T [n]\ i i,r ,A^^'h^'■ (4.2) It will be useful to express A^^^'" in terms of A*'. One may prove by induction on m that, for 0 < m < n — A; we may write ^m+fcj. ^ y^(_l)tgt(t+2t-l)/2 A fm+i-t- t=0 Now applying the latter identity to (4.2) gives '^ n—k r r! r=0 t=0 AVr-tX^ (4.3) Writing m = r — t N! 771 +i [n-fc-m-i]![m + t]! t [n]\ n—k—m [n — k — m]\[m\\ [ t (4.4) and putting (4.4) in (4.3) we obtain n—k P,^B„(/;x)=E m=0 n! [n — fc — m]![m]! n — fc — m t a;*. Now, it can be easily derived from generalized binomial expansion (1.3), on replacing x by q''x, that n—m—l n—k—vn n {l-q'x)= E (-l)*9*(*+2'=-i)/2 f=fc This completes the proof. t=o n—k—m x\ t D From Theorem 4.1 we see that, with 0 < q' < 1, if t^fr >OforO<r<n — fc then V^Bnif; x) > 0. If / is convex on 0 < a; < 1 then V^B„{f; x) >0 ioi 0 < q <l.Ii f IS increasing then T>qBn{f; x) > 0, ioi 0 < q < 1. Acknowledgment: The second author is supported from the Institute of Natural and Applied Sciences of D.E.U. and this research is partially supported by the grant AFS 0922.20.01.02. 58 Halil Orug and Necibe Tuncer Bibliography 1. G. E. Andrews, The Theory of Partitions, Cambridge University Press, Cambridge, 1998. 2. E. W. Cheney, Introduction to Approximation Theory, AMS Chelsea, Providence, 1981. 3. P. J. Davis, Interpolation and Approximation, Dover PubHcations, New York, 1975. 4. T. N. T. Goodman, H. Orug, and G. M. Philhps, Convexity and generalized Bernstein polynomials, Proc. Edin. Math. Soc. 42 (1999) 179^190. 5. R. P Kelisky and T. J. Rivlin, Iterates of Bernstein polynomials. Pacific J. Math. 21 (1967), 511-520. 6. H. Orug and N. Tuncer, On the convergence and iterates of g-Bernstein polynomials, J. Approx. Theory, to appear. 7. G. M. Phillips On generalized Bernstein polynomials. Numerical Analysis, D. Griffiths and G. Watson eds. (1996), 263-269. 8. G. M. Phillips, Bernstein polynomials based on the g-integers. The heritage of P. L. Chebyshev: a Festschrift in honor of the 70th birthday of T. J. Rivlin. Ann. Numer. Afaf/i. 4 (1997), 511-518. 9. J. C. Sevy, Lagrange and least-square polynomials as limits of linear combinations of iterates of Bernstein and Durrmeyer polynomials, J. Approx. Theory 80 (1995), 267-271. 10. H. J. Wenz, On the limits of (Linear combinations of) iterates of linear operators, J. Approx. Theory 89 (1997), 219-2S7. Uniform Powell-Sabin splines for the polygonal hole problem Joris Windmolders and Paul Dierckx Department of Computer Sciences, Kath. University Leuven, Belgium. 
Joris.WindmoldersQcs.kuleuven.ac.be, Paul.DierckxQcs.kuleuven.ac.be Abstract An algorithm is described for smoothly filling in a polygonal hole in a surface, with a parametric uniform Powell-Sabin spline surface patch. It uses interpolation and subdivision techniques for iteratively determining an approximating solution. No assumptions are made about the surrounding surface. The user has to provide routines for calculating the curve points and the unit surface normal along the edge, as well as the unit tangent vector of the edge curves, parametrized on the unit interval. 1 Introduction A classical problem in CAGD is to fill in a hole, bounded by a set of surfaces. This problem has already been addressed in the literature (e.g. [1, 2, 4]). In most cases, assumptions are made on the bounding surfaces. In this paper, we present an algorithm for filling in a 3, 4, 5 or 6-sided hole that makes no assumptions on the surrounding surfaces, and therefore it is generally applicable. On the other hand, the filling patch will meet the given boundary curves approximately. The input of our algorithm (see Figure 1) consists of the boundary curves p which join at their endpoints. Furthermore, the user should provide the unit tangent vector 7 to the boundary curves at any point, and the unit normal vector n to the surrounding surface at any curve point except the endpoints, where the tangent vectors of the joining curves are needed only (see Figure 1 again). For other (interior) curve points, our algorithm will calculate a unit vector 5 = n X j, which will be called the (unit) cross-boundary tangent vector. It shall be referred to as if it were provided by the user. We will calculate a filhng surface patch that interpolates the user suppUed boundary curves and has the same siurface normal in a number of points. This will leave us some degrees of freedom, which we will use to fit the curve and the cross-boundary tangent vector in between each pair of interpolation points. In section 2 we briefly recall the basic properties of uniform Powell-Sabin splines. Section 3 explains how we can benefit from these properties to use UPS-splines for the polygonal hole problem. Section 4 explains our algorithm in detail. Finally we remark that on the pictures, we will denote 2D and 3D entities interchangebly; therefore most pictures reflect the situation only schematically. 60 Uniform Powell-Sabin splines FIG. 2 61 1. User supplied data. Uniform Powell-Sabin splines This section recalls the main properties of Uniform Powell-Sabin splines. For details, we refer to the original papers [3, 5]. I By 52(A*) we denote the linear space of uniform Powell-Sabin splines (in the sequel called UPS-splines), i.e., piecewise quadratic polynomials on a uniform triangulation A (which means that all triangles are equilateral and have the same size) of a polygon fi, where A* is a PS-refinement of A. The boundary of ft will be called 5Q, whereas the boundary of the tria,ngulation will be referred to as 6A. The vertices of A are denoted Vi,i = 1,...,n, and its triangles are pi,i = l,...,m. These splines have global C^continuity on A*. Any s(u,v) has a unique B-spline representation n 3 s(u,v) = ^^Ci,jBi(u,v), (u,v)en, , (2.1) i=ij=i where the locally supported basis functions form a convex partition of unity and Cjj S R^ are the control points. It follows that s(u, v) belongs to the convex hull of {cjj}^ •. Furthermore, one can prove that the control triangles, being defined as Ti(cj_i, Ci,2) ^5,3), i = 1,... 
,n, are tangent to the surface at s(Vi). Due to the local support of Bf, a change to Cij will only affect s(u, v)|Mii i-e., the restriction of s(u,v) to the molecule of Vi, being the set of triangles pj that have V^ as a vertex. This indicates that we have a useful representation for C-^-continuous surfaces, without being restricted to a rectangular domain, and still enjoying the interesting features of the classical B-sjpline representation for tensor product splines. 2.1 Subdivision In [5] we present a subdivision scheme for UPS-splines. Let A^ be a uniform refinement of A, obtained by midedge subdivision. For a given s(u, v) on A, the representation (2.1) on Ar can be calculated using convex barycentric combinations of the control points only. First, a new control triangle along each edge ViVj is calculated as illustrated in J. Windmolders and P. Dierckx 62 =i.3 Ci,3 <=j,3 •^3,2 <^j,2 FIG. 2. Subdivision and Bezier points. Figure 2, left, for the bottom edge of a triangle pi{Vi, Vj, Vk) G A: Cl C2 C3 = = = 5(Ci,2 + Ci,3) icj,l + |(ci,2 + Cj,2) icj,l + |(Ci,3 + Cj,3). (2.2) Next, the control triangles at the original vertices are rescaled: for example. "i,2 = = |ci,l + g(Ci,2 + Ci,3) |ci,2 + g(Ci,3+Ci,i) "1,3 = |ci,3 3^>.3 + ' g(Cl,l+Ci,2). 6V (2.3) They are still tangent to the surface at their barycenter, but their area is only a quarter that of the former control triangles. Therefore they connect tighter to the surface. 2.2 The piecewise Bezier representation Another important property of the B-spline representation for UPS-splines, is that the piecewise Bezier representation can be calculated from (2.1) using simple convex barycentric combinations of the control points. In particular, focus an edge ViVj of A (see Figure 2, right). The Bezier points of the edge curve can be found from: s(Vl) = Pl = -(Ci,i + Ci,2 + Ci,3), 1/ X Ul = -(Ci,2 + Ci,3), s(Vj) Pj 2 1, Uj = -Cj,i + -(Cj,2 + Cj,3), :(^J.l+Cj,2+Cj,3), (2.4) -(Ui + Uj). (2.5) Fj j This is a piecewise quadratic Bezier curve, which means that pi, ry and Pj are surface points, and that ui - pi and pj - Uj are tangent to the surace at pi, resp. pj. Assuming a (counterclockwise) ordering of the boundary vertices F, € ^A, the edge curve from s(Vi) to the next adjacent point s(Vj) will be denoted ei(u, v). 3 Application to the polygonal hole problem Recall that our goal is to calculate a UPS-spline filling a hole in a surface, given by a set of bounding curves (denoted p), their derivatives j and the cross-boundary tangent vectors S. The UPS-patch will fit these curves approximately along its boundary. In the first place, interpolation of the given data at the vertices Vt e 5A is achieved. This leaves 63 Uniform Powell-Sabin splines Ci,V FIG. 3. Tangent and cross-boundary tangent vectors. some degrees of freedom allowing to fit the given curves. In the sequel we shall denote the user supplied data, evaluated at Vi, by (pi,7i,5i). 3.1 Interpolating UPS—splines and degrees of freedom In order to obtain interpolation we determine a control triangle Tj in the tangent plane spanned by pi + eji + uSi, e,^ E R, such that s(Vi) = pi. Curve point interpolation is simply expressed by (2.4). Furthermore, we let the tangent to ei at Vi be parallel to jf. Ui - Pi = ^(Ci,2 + Ci,3) - -Cl,i O O ■ a-iji, (3.1) where a^ is a scahng factor. Next, we need the cross-boundary tangent vector of s(u,v) at Vi to be parallel to 5,. 
Mapping the cross-boundary vector d in the domain plane (see Figure 2, right) onto the control triangle yields a vector parallel with Ci,2 — Cj^a: \ Ci,2 - Ci,3 = 2A4 (3.2) where /3j is again a scaling factor. Solving (2.4), (3.1) and (3.2) to cij in terms of the unknown a, and /3j (further called the a- and /3-factors) yields Pi - aai Pi + fTi + A^i Pi + t7i-A^i- (3.3) These equations ensure that s(u, v) interpolates the given data at Vi G 5A, and leaves us two degrees of freedom per vertex (oj and 0i). These scaling factors are related to the size of the control triangle. For example, subdivision by (2.3) divides a^ and /Sj by a factor of 2. 3.2 The fitting equations We will now use these degrees of freedom to fit the user supplied data, in between each pair of adjacent interpolating vertices Vi, Vj £ SA. First, the a-factors at Vi axid Vj are determined by trying to interpolate the curve p at the edge midpoint Vij = |(Vi + Vj). From Section 2.2, the interpolation condition reads rij = 2("»"'' '^j) ~ P>J' where pij is the given curve point. Taking (2.5) and (3.3) into account, we have aili - djlj = 4pi,j - 2(pi + pj) = qij. (3.4) 64 J. Windmolders and P. Dierckx FIG. 4. Consecutive iteration steps. This is a system of 3 equations with (at most) 2 unknowns. It can be solved in the least squares sense. Next, the /3-factors at Vi and Vj are obtained by fitting the cross-boundary tangent vector at Vij. First, we derive a subdivision rule for the /3-factors at the vertices of A from (2.2) and (3.2): 0'iAj = \i^^^i + I^M (3-5) where Slj is the cross-boundary tangent vector to s(u, v) at Vij. This /S^'^—factor belongs to a finer subdivision level then /3j and Pj, so we have to scale it up by a factor of 2. The interpolation condition then is Note that 5ij has been used instead of S'^j. This is again an overdetermined system which can be solved in the least squares sense. 4 The algorithm We will restrict the figures illustrating the algorithm to the case of a triangular hole, although the algorithm is immediately applicable to cases with 4, 5 and 6 boundary curves as well (see Section 4.4). The idea is to calculate, during a pre-iteration step, an initial solution which is smooth, but in general not close enough, and to refine this approximation iteratively to obtain a better fit to the given curves until a certain stopping criterion is satisfied. Finally, during a post-iteration step, the interior control triangles are calculated, actually filling the hole. Figure 4 illustrates this: imagine a pre-iteration step, two refinement steps and a post-iteration step. The control triangles added during a particular step have been shaded. 4.1 An initial solution The initial solution (Figure 4, leftmost) is easily obtained by solving (3.4) in the least squares sense for each edge ViVj. If we assume that 7, =^ 7^, then : . Oii = p((7i-qi,j)-(7j-qij)(7i-7j)), (4-1) ( Uniform Powell-Sabin splines "j = ;^(-(7rqi,j) + (7i-qi,j)(7i-7j)), 65 (4.2) where D = 1 — (7i •■fj)^. This yields two a-factors per vertex: one for each boundary edge being incident to that vertex. Therefore, T, is completely determined. The /3-factors can be calculated by writing (3.3) for both edges incident with the vertex and ehminating C2, respectively Ci, e.g., for Figure 3, right, Pi= 0:2(72 • ^1), P2 = -ctiili • ^2)- (4.3) There exist pathological cases where 72 -L Si or 71 ± 62. Our algorithm then sets /3i = ai, resp. ^2 = 0:2- For the case 7, = 7,, (3.4) has no solution in the least-squares sense. 
Assuming that si is a straight line from s(Vi) to s(Vj), the a-factors can then be determined from the projection onto the domain plane, where the size of the so-called PS-triangles (the projections of the control triangles) is fixed. The reader can verify that this yields ai = aj = ||ViV^'|. 4.2 The iteration step First the control triangles from the previous steps are rescaled by subdivision. This is simply done by scaling down the a- and /3-factors: a, <— ^ and /3i <— ^, for each Vi € SA. Next, a new control triangle is created in between any two adjacent vertices at the coarser level. This situation is illustrated in Figure 5, left, where the darker triangles are known. We are looking for the a-and /3-factors for the middle control polygon, which is tangent to the surface at s(Vk), Vk = 2 (^ + ^)- Consider the a-factor first. In order to obtain a better fit, we try to interpolate p at Vi^k = ^{Vi + Vk) and Vkj — ^(Vfe + 1^). This yields a set of fitting equations r o.ai-ak% = qi,k, .^^y where a, and Uj are known. Thus, afc can be obtained as the least-squares solution of (4.4): . ttfe = 2(7fc-(aj7J-qi,k + qk,j-Q;j7j))- ^ (4.5) The ,8fe-factor is found by fitting the cross-boundary vectors at Vi^k and Vk^j, i.e., by solving the following system in the least-squares sense: {l3i,kSi,k — 2^l3i5i + I3k5k), /^ gs Pk,jh,j = l{Pkh + 0j5j), where /3j and /3j are known. If 6i^k = Sk = Skj, as is always the case for a planar curve, this system has no solution in the least-squares sense. The /3k factor can then easily be obtained by equation (3.6), i.e., by subdivision and upscaling. 4.3 The interior control points Finally, as soon as the user supplied edge curves have been approximated well enough, the interior control points at the eventual refinement level have to be calculated. We will J. Windmolders and P. Dierckx 66 <=j.2 FIG. 5. The refinement and post-iteration steps. FIG. 6. The hole and the triangular patches. discuss three possibilities by the help of an example; Figure 6 shows a hole (left) and two filling patches (right). Copy From Initial. The interior control points are obtained directly from the initial solution by subdivision. This guarantees that the interior of the patch is smooth. A disadvantage is that the inner of the first approximation in general has no connection with the shape of the edge curves. This can cause unwanted artefacts near the boundary, after a few iterations (see Figure 7, left). The next option will therefore take edge features into account. Averaging. We will fill the hole gradually by calculating a ring of control triangles during each pass, going from the edge towards the inner of the patch. Figure 5, right shows an example where each ring has a different shade of grey. At each step, a control triangle of the current ring is obtained by averaging six surrounding control triangles. These come from the initial solution, or, if possible, from a previously calculated ring. Edge features are now smoothed out towards the inner of the patch. However, there is a main disadvantage to this approach, if averaging is applied after the last iteration step: the unwanted artefacts mentioned before are now repeated for every ring, smoothed out towards the inner of the surface, as shown on Figure 7, middle. Instant Update. A good compromise would be to take edge features into account before we finish iterating. 
This can be accomplished by subdividing the initial solution at each refinement step, but, we always overwrite its edge with the most recent boundary approximation. The results of this strategy are depicted in Figure 7, right. In any case can the user change the interior control triangles, and still he has a C^continuous filling patch, fitting the specified edge curves with demanded precision. Uniform Powell-Sabin splines FIG. 7. Copy from initial solution and averaging (4 iterations); instant update (3 iterations). FIG. 4.4 8. Cases with 4, 5 and 6 boundary curves. A note on the number of edges The algorithm sketched in Section 4 is immediately applicable to problems with 4, 5 and 6 boundary curves as well. Figure 8 shows the configuration of the initial solution for each of these cases. If we are working with 5 edges, there are 2 edges having a control triangle at its midpoint (shaded darker). This requires a tiny modification to the calculation of the initial solution for those edges. The a-factors are obtained by solving (4.4) to the unknown ai,aj and Uk. The ^S-factors of the outer control poygons are obtained as usual; for the middle polygon one can apply (3.6). Also, for the cases of 5 and 6 boundary curves, an interior control triangle (unshaded) has to be calculated for the initial solution. This can be done by averaging the six surrounding control polygons. Bibliography 1. Charrot, P. and A. Gregory, A pentagonal surface patch for computer aided geometric design. Computer Aided Design 1, pp 87-94. 2. Chui, C. K. and M.-J. Lai (2000), Filling polygonal holes using C^ cubic triangular spline patches, Computer Aided Geometric Design 17, pp 297-307. 3. Dierckx, P. (1997), On calculating normalized Powell-Sabin B-spfines, Computer Aided Geometric Design 15, pp 61-78. 4. Gregory, J.A., V. K. H. Lau, and J. M. Hahn (1993) , High order continuous polygonal patches, in Geometric Modelling, G. Farin, H. Hagen and H. Noltemeier (eds.), Springer-Verlag Wien. ' 5. Windmolders, J., Dierckx, P. (1999), Subdivision of Uniform Powell-Sabin splines. Computer Aided Geometric Design 16, 301-315. 6. Windmolders, J. and P. Dierckx, NURPS for Special Effects and Quadrics: Oslo 2000, Tom Lyche and L. L. Schumaker (eds.), Vanderbilt Press, Nashville 2001. 67 Chapter 2 Differential Equations 69 Iterative refinement schemes for an ill-conditioned transfer equation in astrophysics Mario Ahues, Filomena d'Almeida, Alain Largillier, Olivier Titaud and Paulo Vasconcelos Universite de Saint Etienne, France and Universidade do Porto, Portugal Abstract Let X := L'([0, To]), where To represents the optical depth of a stellar atmosphere. The weakly singular integral operator T : X—*X defined by (r^)(T) = f/;''£i(|T-T'|MT')rfr', where zj 6]0,1[ is the albedo of the atmosphere and Ei denotes the first exponential-integral function, is such that ||r||i = 07(1 — £2(TO/2)), where E2 denotes the second exponential-integral function. If zo is close to 1, and To is large, then ||T||j is close to 1. In that case, the transfer problem given fGX, find ip€X such that Tip — ip + f is ill-conditioned, and the convergence of the fixed-point iteration (pk+i = T<pk—f, which is commonly used by numerical astronomers, becomes prohibitively slow. The purposes of this work are to approximate (p through different sequences whose terms solve wellconditioned approximate equations, and to compare their eflSciency and computational costs. 
1 Introduction For a given TQ > 0, let p be a function defined on ]0, TQ] such that lim gM = (1.1) -|-CXD, T-»0+ pGC°(]0,To])nLiaO,To]), (1.2) ff(T) >OforallT€]0,To], g is a decreasing function on ]0, To]. (1.3) (1.4) We consider the integral operator T defined by /•To iTx){r):= / gi\r-T'\)x{r')dr'. Jo (1.5) /•To/2 Theorem 1 T is a linear compact operator in Proof: See [2]. L^([0,TO]) and \\T\\^ = 2 / Jo ^(T) dr. D 70 Iterative refinement schemes 71 For z in the resolvent set of T, we consider the Predholm equation of the second kind T(p = z(p + f. (1.6) Apphcations will concern the function g : ]0, TQ] —> M given by g{r) := (1.7) ^E^{T) where ro € ]0,1 [ and Ei is the exponential-integral function : T > 0. £?i is the first function of the sequence {E^)^>i, EI{T) := / ^^—— dfi, := / -—^^ d^i, Ji A* r > 0, f > 2, and it is the only one presenting a logarithmic singularity at r = 0. Following Theorem 1, when g is defined by (1.7), we have ||T||i = tx7[l - £'2(TO/2)] < 1. We recall that a bounded linear finite rank operator T„ in a normed linear space X can be written as E^{T) n Tn '■= ^__,\" I ^11,31^71,3 (l-o) where n e W*, and, for j e |l,n], ^„,j € X*, the topological adjoint space of X, and The resolution of the approximate equation r„(^„ = 2;<^„ + /, (1.9) where z belongs to the resolvent set of T„, leads to an n-dimensional linear system (A„ - 2;l„)x„ = b„ (1.10) where l„ is the identity matrix of order n, Kihj) •■= {en,j , L,i), K{i) ■ = (/, L,i), x„(j) := {(fn, 4,j)- (1-11) Once this system is solved, the solution of (1.9) is given by V^n = - I ^>^n{j)en,3 " / j • (1-12) We are interested in refining approximations obtained with T„ := 7r„T, where 7r„ is a sequence of projections with finite rank n. A bounded projection 7r„ of finite rank n is n defined by 7r„a; := X^ (x, e* ,,-)e„j- for all x e X, where {en,j)]=i is an ordered basis of j=i the range of 7r„, and (e* ,,)"^i is an adjoint basis of the former in X*. Hence n TnX:=Y.'^Tx,elj)en,3, xeX. (1.13) We suppose that 7r„ is pointwise convergent to the identity operator in the Banach X where the operator T is defined. Since T is compact, r„ converges to T in the operator 72 M. Ahues, F. d'Almeida, A. Largillier, 0. Titaud and P. Vasconcelos norm. Let R{z) := {T - zl)-'^ be the resolvent of T at z. Then Rn{z) ~ (T„ - ziy^ exists for n large enough and is uniformly bounded, that is, there exists no such that Co(2) := sup ||i?„(2)|| <+TO. (1.14) n>no We develop an application in the space X := L^([0, TQ]). Let (r„j)^=o be a grid on [0, TQ] such that 0=:r„,o <r„,i < ■■• < r„,„_i <T„,„ :=To, (1.15) and set K,j ■= T„j-Tn,j-i for j G [l,...,n]. (1.16) We define, for r G [0, To], „ /,'!._ J 1 e„,,^rj.-| Q if ■7'G (T„J_I,T„J) otherwise Cl 171 ^ ' and, for a; G L'^([0,To]), {x,e*„j):=-j^rxiT')dT'. (1.18) The product defined in (1.18) is a special case of the scalar product used in equation (1.8) when a grid such as (1.15) is set. In this case the operator in (1.13) is the operator in (1.8) if we choose £n,j = T*e^j. Let /x„ := min{/i„,3- : j G [1,... ,n]}, /i„ := max{/i„j : j G [1,... ,n]}, ?„:=-7^.(1.19) For quasi-uniform grids, there exists a constant q independent of n such that, for all n, q<qn- For uniform grids, ?„ = 1 for all n. Theorem 2 Let (p j^ 0 be the solution of (1.6) with T defined by (1-5). Let (fn be the solution of (1.9) with T„ defined by (1.8) and (1.15)-(1.17). Then, forn large enough, y-fnh ^8co{z) Z'^,,,^, II II - ~^ / ^(^ ^■^' ll'/'lli Qn Jo ^j^_20) where Co{z) is given by (1-14) and computed with the 1-norm. 
Proof: See [2]. □ In the case (1.7), the matrix A„ of the linear system (1.10) has entries A„(i,j):=-^/ / E,{\r-r'\)en,jir')dT'dr, (1.21) E,{\r-T'\)f{T')dT'dT. (1.22) and the second member b„ has entries b„(i):=-^/ Iterative refinement schemes 73 For more details, see [3]. An application to the transfer problem in astrophysics gives (1.6) with 2: = 1, and as free term, fr^y^l -1 ifO<r<ro/2, which describes a sudden drop of the temperature on the sphere. For further details on the physical model, see [4]. 2 23) T = ro/2 layer of the atmo- Iterative refinement of approximate solutions To attain a given precision on the approximate solution <^„, it may be necessary that the largest grid step ft„ be so small that the dimension of the corresponding hnear system will be prohibitively large from a computational point of view. Not only the algorithm's stability becomes poor but also the condition number of the matrix may increase if its size increases. Refinement schemes allow us to attain iteratively the exact solution of a large scale linear system by means of the resolution of a sequence of linear systems of moderate fixed size. Let us consider the general framework of a complex Banach space X and a linear compact operator T : X —> X. If z is in the resolvent set of T, then z j^ 0. Let T„ be a sequence of linear bounded operators in X such that ||T — r„|| —> 0 in the operator norm. Then, for n large enough, z belongs to the resolvent set of r„ and Rn{z) is norm-convergent to R{z). The most elementary way to refine the approximate solution tp„ := Rn{z)f is the following. r Scheme A xW := ^„, < [ a;(*^+i) . , (2.1) := a;W - i?„(z)(Ta;W - ^a;^ -/), k>0. We can interpret Rniz) as an approximation of the inverse of the Prechet derivative of the afiine operator a; 1-^ {T — zl)x - /, the exact one being R(z). Since R{z) satisfies the identities R{z) = \{R{z)T-I) = h:rR{z)-I) (2.2) two new different approximations of R{z) are thus motivated, ^„(z):=-(i?„(2)r-7), R^{z):=\TRr,{zy-I). (2.3) These approximate resolvent operators lead to the following iterative refinement schemes, J(0) Scheme B \ Scheme C ^ := Rn{z)!, j(fc+i) .= sW-^„(^)(rxW-;2S(fc)_/), fc>o, m (2.4) := Rn{z)j, (2.5) j(fc+i) .^ £(fe) _ ^„(^)(2-£(fc) _ ^j(fe) _ /)^ fc > 0. 74 , : M. Ahues, F. d'Almeida, A. Largillier, 0. Titaud and P. Vasconcelos Since the computation of residuals which tend to zero, as well as the resolution of almost homogeneous linear systems may be unstable, the following theorems are interesting for algorithmic purposes. Theorem 3 In (2.1), a;(*^+i) = x^^^ + Rn{z){Tn - T)x^^'> for fc > 0. Theorem 4 In (2.4), x^^+i) = x^^^ + -/l„(z)(T„ - T)Tx^''^ for k>0. z Theorem 5 In (2.5), x^''+^^ = JW + -Ti?„(2)(T„ - T)x^^^ for k>0. z Proof: For each fc > 0, in (3), \{k+i) ^ xW - i?„(;j)(Ta;W - 2:rW -/) = a;W+i?„(2)(T„-T)xW For (4) and (5), the proof follows the same idea but it is technically more complicated. In our application to the transfer equation in astrophysics, T is defined by (1.5) with g given by (1.7), and the equation (1.6) has z = 1. 3 Numerical computations The iterative refinement schemes allow us to obtain the exact solution of a large scale linear system by solving a sequence of moderate fixed size ones. Each of the three iterative refinement schemes presented in this work are based on an approximation, say Gn{z), of the resolvent operator R{z). Their common structure is the following. ^(0) ^(fc+i) := Gn{z)f, _ ^(0)^(j_G„(^)(T-2j))eW, ' 3,y ^ ' fc>0. 
Theorem 6 Letci{z) := 8co(z)max{l, ||r||i/|2|}, and(^(*^^)fc>o be any of the sequences (2.1), (2.4) or (2.5). Then M^^^<mrgir)dA''\ k>0. Ml '^ Qn Jo ^ ' ^ - Proof: Let us prove the bound for the sequence defined by (2.1). For the other two, the arguments are similar. Using Theorem 3, we have Hence, x^^^-ip = (i?„(z)(r„-T))'(x(°'-^), X^°^-ip = Rn{z){T-Tn)^. II^W-y.il, <||(i?„(z)(T-T„))'=+i|UML, and, in [2], we have shown that ||i?„(2)(T„ - T)||, < —^^ / Qn Jo ^(T) dr. n All the schemes need evaluations of T at some prescribed functions of X. In practice T is not used for this purpose but an operator Tm of the sequence {Tv)u>i is used instead, Iterative refinement schemes 75 where m > n. We consider the kernel g defined by (1.7) and the free term / defined by (1.23). Table 1 gives the number of iterations performed by each scheme for several values of ■co in order to obtain a first relative residual less than or equal to 10~^^, when a quasi-uniform grid (ry,j)J'_o is built such that f is a multiple of 10, TQ = 1000, ^ 2v n = 200, m = 1000, and Ki:= < 5i/ I^ if ie[i,...,|], if ie[f + i,...,f2J' if j p fii 4.1 (3.2) %1 , V '^ ie[fg + i,...,H. Albedo VJ 0.750 0.990 0.999 Scheme A (2.1) 29 46 385 Scheme B (2.4) 15 27 196 Scheme C (2.5) 14 26 195 TAB 1. Number of iterations. Figures 1, 2 and 3 show the last iterate of all schemes, as well as the corresponding convergence histories, for w G {0.750,0.990,0.999}. As we can see, the schemes B and C are much faster than Atkinson's formula A, specially when the albedo is close to 1. In the latter situation a wider boundary layer arises at the left of the atmosphere, and the decay at the middle point takes place along a wider subinterval. A survey on different discretization methods for integral operators can be found in [1], with special emphasis on spectral applications. In what concerns condition number of associated linear systems, the reader is refered to [7], [5] and [6]. Bibliography 1. M. Ahues, A. Largillier and B.V. Limaye, Spectral Computations with Bounded Operators, Chapman and Hall, Boca Raton, 2001. 2. M. Ahues, A. Largillier and O. Titaud, The roles of a weak singularity and the grid uniformity in the relative error bounds, Numer. Funct. Anal, and Optimiz. 22, 789-814, 2001. 3. M. Ahues, F. D'Almeida, A. Largillier, O. Titaud and P. Vasconcelos, An L^ Refined Projection approximate solution of the radiation transfer equation in stellar atmospheres. Journal of Computational and Applied Mathematics 140,13-26, 2002. 4. I. W. Busbridge, The Mathematics of Radiative Transfer, Cambridge University Press, 1960. i 76 M. Ahues, F. d'Almeida, A. Largillier, 0. Titaud and P. Vasconcelos 5. L. N. Desphande and B.V. Limaye, On the stability of singular finite-rank methods, SIAM J. Numer. Anal. 27, 792-803, 1990. 6. A. Largillier and B.V. Limaye, Finite-rank methods and their stability for coupled systems of operator equations, SIAM J. Numer. Anal. 2, 707-728, 1996. 7. R. Whitley, The stability of finite-rank methods with applications to integral equations, SIAM J. Numer. Anal. 23, 118-134, 1986. Residual zoo 300 W 70} 500 no WO FMitwrtisrittra FIG. 1. Solution and convergence history for w = 0.750: Scheme A Scheme B — dotted Une, Scheme C — solid line. Solution ( dashed line, Residual \ . ico!oo3oa4005cofla)7[nioDioo FIG. 2. Solution and convergence history for w = 0.990: Scheme A ^ dashed line, Scheme B — dotted line, Scheme C — solid line. Iterative refinement schemes Solution 0 1D0 S« 77 Residual 700 800 SCO ia» 3. 
Solution and convergence history for ru = 0.999: Scheme A — dashed line, Scheme B — dotted line, Scheme C — solid line. FIG. Geometrical symmetry in symmetric Galerkin BEM Alessandra Aimi and Mauro Diligenti Department of Mathematics, University of Parma, Italy. alessandra.aimi@uiiipr.it, mauro.diligentiQunipr.it Abstract We consider a symmetric boundary integral formulation associated with a mixed boundary value problem defined on a domain fi € H^ with piecewise smooth boundary T. We assume that fJ is mapped onto itself by a finite group Q of congruences having at least two distinct elements. Hence, we can decompose the related symmetric Galerkin BEM problem into independent subproblems of reduced dimension with respect to the complete one. Shape functions for each subproblem can be obtained from classical BEM basis, ordered as a vector, applying suitable restriction matrices constructed starting from group representation theory. 1 Introduction Let fi C R^, be a bounded domain with a piecewise smooth boundary T. The_boundary r is partitioned into two non intersecting open subset Ti and r2, with T = Ti Ur2 = \JJ::=IT\ P being an open strainght Une segments. In the following we always assume measTi > 0. The solution of the mixed boundary value problem L{x)u{x) = 0 u{x)=u*{x) onTi, infi, q{x):= — =q*{x) (1.1) on Ta, (1.2) xeQ. (1.3) can be expressed by the representation formula u{x)= U{x,y)q{y)dy- ■^Uix,y)u{y)dy, In (1.1) L(-) is an eUiptic partial differential operator of second order, U{x,y) its fundamental solution (see [4] for a general discussion). In (1.2) ^ denotes the derivative with respect to the outher normal n to T, and u* and q* are given functions. Applications of (1.1)-(1.2) are, for instance, boundary value problems in potential theory and in elastostatic. Prom (1.3) it is clear that if we want to recover u in $7 we have firstly to know the remaining Cauchy data, since in (1.2) these functions are given only partially. Taking the limit of u{x) for x eVa and the normal derivative ^(x) for a; G r2 in this formula and using the jump relations, one finds the system [2] / U{x,y)q{y)dy- f ■^—-U{x,y)u{y)dy = fi(x), 78 XGTI, Geometrical symmetry 79 In order to perform the Galerkin method, we need a family of finite-dimensional subspaces {Uh,p{T)} defined on T. Let us define a mesh T^ for each T^: T = Ui=ir'fe,i such that Tj^^ is an open segment. We define iov p > 0, h > 0, Uh,p{Ti) to be the set of functions on Fi whose restrictions to P C Fi belong to the set of all polynomials of degree < p on F^^^. Moreover, for p>l, U^p{T2) will denote those continuous functions on Fa whose restrictions to F^ c F2 belong to C°(F2) and which vanish at the end points of T2. The approximating boundary element shape functions of degree p > 0 are defined through the standard assembling of the local basis functions defined on each Fj^ ^. We then define Uh,p{T) := spanKyPi, V^) : ipi G U^JT2), ^e € Uh,p{^i)}. (1.5) The corresponding symmetric Galerkin boundary elements scheme for (1.4) leads to a Unear system of the form A^=b. (1.6) If the boundary F presents symmetry properties, we will exploit them to reduce the computational cost of the solution of (1.6), using a decomposition result for the Galerkin boundary element problem that we will introduce at the end of the next section. 2 Matrix representation of a finite group of congruences and projection operators Let ^ be a finite group of i congruences {t > 2) of the Euclidean space R'" (m = 2, 3). 
The group G can be described by orthogonal matrices γᵢ of order m. Let {γ₁, ..., γ_t} be the elements of G, γ₁ being the identity matrix. From the theory of group representations [5] it follows that any finite group G admits a finite number q of unitary irreducible, pairwise inequivalent matrix representations

  {ω^(1)(γᵢ)}, {ω^(2)(γᵢ)}, ..., {ω^(q)(γᵢ)}  (i = 1, ..., t).  (2.1)

Let d_ℓ be the order of the representation {ω^(ℓ)(γᵢ)}, i.e., the order of the matrices ω^(ℓ)(γᵢ). The number q of the representations (2.1) and the orders d₁, ..., d_q only depend on G. Any representation {ω^(ℓ)(γᵢ)} of order d_ℓ ≥ 2 can be replaced, in the system (2.1), by an equivalent unitary representation. Representations of order 1 are univocally determined. We observe that, if γᵢ and γⱼ are two elements of G, then

  ω^(ℓ)(γᵢ γⱼ) = ω^(ℓ)(γᵢ) ω^(ℓ)(γⱼ),  ω^(ℓ)(γᵢ⁻¹) = [ω^(ℓ)(γᵢ)]ᵗ,

where [ω^(ℓ)(γᵢ)]ᵗ denotes the transpose of the matrix ω^(ℓ)(γᵢ). Again from the theory of group representations it follows that q ≤ t and that the relation d₁² + d₂² + ⋯ + d_q² = t holds. Furthermore, q = t if and only if d₁ = d₂ = ⋯ = d_q = 1. Having set M = d₁ + d₂ + ⋯ + d_q, then q ≤ M ≤ t, and we have q = M = t if and only if G is an abelian group.

Let Ω be a bounded domain in ℝᵐ with a piecewise smooth boundary Γ, invariant with respect to G, i.e., sent onto itself by the congruences of G. The boundary Γ is also invariant with respect to G, i.e., for any γ_ℓ ∈ G and x ∈ Γ, (γ_ℓ x) ∈ Γ.

Let W(Γ) be the real vector space of real functions defined on Γ. We can associate to any element γᵢ of G a linear transformation Tᵢ defined, for any v ∈ W(Γ), by

  (Tᵢ v)(x) := v(γᵢ⁻¹ x),  x ∈ Γ,  (2.2)

where Tᵢ is a linear, invertible transformation from W(Γ) onto W(Γ), and T₁ is the identity.

Definition 2.1 A subset V(Γ) of W(Γ) is said to be invariant with respect to G (or G-invariant) if for any v ∈ V(Γ) and any γᵢ ∈ G, Tᵢ v ∈ V(Γ).

Obviously, if v is a function of W(Γ) not identically equal to zero, the set of functions {Tᵢ v, i = 1, ..., t} is invariant with respect to G.

Definition 2.2 Let C be a linear operator in V(Γ). We will say that C is invariant with respect to G if for any u ∈ V(Γ): C Tᵢ u = Tᵢ C u, i = 1, ..., t.

Example 2.3 Let V(Γ) be a suitable Sobolev space and (C f)(x) := ∫_Γ K(x,y) f(y) dΓ_y an integral operator defined on V(Γ), with kernel K(x,y). We have Tᵢ(C f)(x) = ∫_Γ K(γᵢ⁻¹ x, y) f(y) dΓ_y; since γᵢ ∈ G is an isometry, the mapping y → γᵢ y preserves the differential element dΓ_y. Thus

  C(Tᵢ f)(x) = ∫_Γ K(x,y) f(γᵢ⁻¹ y) dΓ_y = ∫_Γ K(x, γᵢ y) f(y) dΓ_y.

Then the integral operator C is G-invariant if the kernel K(x,y) satisfies the condition K(x,y) = K(γᵢ x, γᵢ y) for all x, y ∈ Γ, i = 1, ..., t.

Starting from the group G, the system of representations (2.1) and the linear transformations Tᵢ defined by (2.2), we can introduce M linear transformations of W(Γ),

  P_{ℓk} = (d_ℓ / t) Σ_{i=1}^{t} ω^(ℓ)_{kk}(γᵢ) Tᵢ  (ℓ = 1, ..., q; k = 1, ..., d_ℓ).  (2.3)

Owing to the properties of the representations (2.1), there holds

  P_{ℓk}² = P_{ℓk},  P_{ℓk} P_{ℓ'k'} = 0 if (ℓ,k) ≠ (ℓ',k'),  Σ_{ℓ=1}^{q} Σ_{k=1}^{d_ℓ} P_{ℓk} = T₁.  (2.4)

The linear transformations P_{ℓk}, which will be called projection operators, determine a decomposition of any vector space V(Γ) ⊂ W(Γ) invariant with respect to G into a direct sum of M subspaces V_{ℓk}(Γ); V_{ℓk}(Γ) is the co-domain of P_{ℓk}, viewed as a linear transformation from V(Γ) onto itself. If G is a non-abelian group, it is useful to consider in the space W(Γ) further linear transformations linked to the system (2.1). Let {ω^(ℓ)(γᵢ)} be a representation of G of order d_ℓ ≥ 2.
Let us consider d_ℓ² linear transformations, already introduced in [1], defined as follows:

  A^(ℓ)_{kr} = (d_ℓ / t) Σ_{i=1}^{t} ω^(ℓ)_{kr}(γᵢ) Tᵢ,  k, r = 1, ..., d_ℓ.  (2.5)

If k = r, then A^(ℓ)_{kk} = P_{ℓk}.

Definition 2.4 Let B(·,·) be a bilinear form from V(Γ) × V(Γ) to ℝ. We will say that B(·,·) is G-invariant if, for any u, v ∈ V(Γ),

  B(Tᵢ u, Tᵢ v) = B(u, v),  i = 1, ..., t.  (2.6)

Let V(Γ) be a Hilbert space and let us consider the following problem:

  find u ∈ V(Γ) : B(u, v) = F(v) for all v ∈ V(Γ),  (2.7)

where B(·,·) is continuous and coercive, and F(·) : V(Γ) → ℝ is a linear continuous functional. If F and V(Γ) are invariant with respect to G, and V(Γ) = ⊕_{ℓ=1}^{q} ⊕_{k=1}^{d_ℓ} V_{ℓk}(Γ) is the decomposition of V(Γ) defined by the projection operators (2.3), the following fundamental result holds.

Theorem 2.5 If B(·,·) verifies the condition (2.6) and the P_{ℓk} are the projection operators defined in (2.3), then the problem (2.7) can be decomposed into M independent problems:

  find u_{ℓk} ∈ V_{ℓk}(Γ) such that B(u_{ℓk}, v_{ℓk}) = F(v_{ℓk}) for all v_{ℓk} ∈ V_{ℓk}(Γ),  ℓ = 1, ..., q; k = 1, ..., d_ℓ.  (2.8)

The solution of (2.7) can be recovered as u = ⊕_{ℓ=1}^{q} ⊕_{k=1}^{d_ℓ} u_{ℓk}.

The above result can be applied, under the invariance hypothesis, in discrete form to the symmetric Galerkin BEM scheme if we choose the finite-dimensional subspace U_{h,p}(Γ) defined in (1.5) to be G-invariant too, and therefore decomposable as U_{h,p}(Γ) = ⊕_{ℓ=1}^{q} ⊕_{k=1}^{d_ℓ} U^{ℓk}_{h,p}(Γ). Then the symmetric Galerkin boundary element problem can be decomposed into M independent problems which have reduced dimension with respect to the original one and which can be solved on parallel processors. Now one has to construct boundary element basis functions for each subspace U^{ℓk}_{h,p}(Γ). With some simple geometries (and groups of congruences) this can be done directly, but in many cases this is a difficult task. We solve it here by applying restriction matrices, which we introduce in the next sections, to the basis of U_{h,p}(Γ), ordered as a vector. Since there is a one-to-one correspondence between the standard boundary element shape functions and the nodes of the mesh fixed on Γ, in the following we will work directly on the nodes of the boundary.
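To make the projection operators concrete, the following minimal sketch (our own hypothetical illustration, not code from the paper) builds the P_{ℓk} of (2.3) for the abelian group G = {identity, reflection} acting on nodal values, and checks the relations (2.4). The node layout, group choice and all names are assumptions; the Tᵢ become permutation matrices on the nodes.

```python
import numpy as np

# Hypothetical example: G = {gamma_1 = id, gamma_2 = reflection x -> -x},
# an abelian group with t = 2 and two order-1 representations:
# omega^(1) = (1, 1) and omega^(2) = (1, -1).
nodes = np.array([-2.0, -1.0, 1.0, 2.0])       # reflection-invariant node set
t = 2
omega = np.array([[1.0, 1.0],                   # omega^(l)(gamma_i), l = 1, 2
                  [1.0, -1.0]])

def transformation_matrix(gamma):
    # T_i acts on nodal values by (T_i v)(x) = v(gamma_i^{-1} x): a permutation
    # (gamma is self-inverse here, so we may apply gamma directly).
    T = np.zeros((len(nodes), len(nodes)))
    for j, x in enumerate(nodes):
        T[j, np.argmin(np.abs(nodes - gamma(x)))] = 1.0
    return T

Ts = [transformation_matrix(lambda x: x),       # identity
      transformation_matrix(lambda x: -x)]      # reflection

# Projection operators (2.3): P_l = (d_l/t) * sum_i omega^(l)(gamma_i) T_i.
P = [sum(omega[l, i] * Ts[i] for i in range(t)) / t for l in range(t)]

# Check the relations (2.4): idempotency, mutual orthogonality, sum = T_1.
for l in range(t):
    assert np.allclose(P[l] @ P[l], P[l])
assert np.allclose(P[0] @ P[1], 0.0)
assert np.allclose(P[0] + P[1], np.eye(len(nodes)))
print("symmetric part:\n", P[0], "\nantisymmetric part:\n", P[1])
```

With this group, P₁ extracts the symmetric part of a nodal vector and P₂ the antisymmetric part, which is exactly the decomposition exploited by the subproblems (2.8).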
3 Elementary restriction matrices

In this section we introduce suitable matrices, depending only on the group G and on the system of representations (2.1), which will be called elementary restriction matrices. In the following sections we will see how, starting from these, we can construct restriction matrices relative to a mesh defined on Γ.

We fix a finite group G = {γ₁, ..., γ_t} of congruences of ℝᵐ and a system (2.1) of orthogonal irreducible, pairwise inequivalent representations of G. G always admits the representation {1, 1, ..., 1}, which we indicate by {ω^(1)(γᵢ)}; let us order the remaining representations (2.1) with increasing order d_ℓ, and let {ω^(1)(γᵢ)}, ..., {ω^(s)(γᵢ)} be the representations of order 1. If G is an abelian group one has s = q = t and d₁ = d₂ = ⋯ = d_t = 1. If G is a non-abelian group, it holds s < q < t and therefore d₁ = d₂ = ⋯ = d_s = 1, 2 ≤ d_{s+1} ≤ ⋯ ≤ d_q.

Let G be an abelian group. We will call elementary restriction matrices the following t matrices, with 1 row and t columns:

  R_{ℓ1} = (1/√t) ( ω^(ℓ)(γ₁) ⋯ ω^(ℓ)(γ_t) ),  ℓ = 1, ..., t.  (3.1)

Since the representations {ω^(ℓ)(γᵢ)} are real, it follows that ω^(ℓ)(γᵢ) = ±1 for ℓ, i = 1, ..., t.

Let G be a non-abelian group. Corresponding to the representations {ω^(ℓ)(γᵢ)} of order 1 of the system (2.1), we introduce the matrices R_{ℓ1}, with 1 row and t columns,

  R_{ℓ1} = (1/√t) ( ω^(ℓ)(γ₁) ⋯ ω^(ℓ)(γ_t) ),  ℓ = 1, ..., s.  (3.2)

We obtain, in this case, s matrices. Let now {ω^(ℓ)(γᵢ)} be a representation of the system (2.1) of order d_ℓ, with d_ℓ ≥ 2. With k = 1, ..., d_ℓ fixed, let us consider the following matrix, with d_ℓ rows and t columns:

  R_{ℓk} = √(d_ℓ/t) [ ω^(ℓ)_{k1}(γ₁)   ω^(ℓ)_{k1}(γ₂)   ⋯  ω^(ℓ)_{k1}(γ_t)  ]
                    [ ω^(ℓ)_{k2}(γ₁)   ω^(ℓ)_{k2}(γ₂)   ⋯  ω^(ℓ)_{k2}(γ_t)  ]  (3.3)
                    [       ⋮                 ⋮                   ⋮          ]
                    [ ω^(ℓ)_{kd_ℓ}(γ₁)  ω^(ℓ)_{kd_ℓ}(γ₂)  ⋯  ω^(ℓ)_{kd_ℓ}(γ_t) ]

Due to the orthogonality properties of the representations {ω^(ℓ)(γᵢ)}, the matrix R_{ℓk} has pairwise orthonormal rows; therefore the rank of the matrix R_{ℓk} is d_ℓ. For any representation {ω^(ℓ)(γᵢ)} we obtain d_ℓ matrices R_{ℓk} (k = 1, ..., d_ℓ). The matrices R_{ℓk} (ℓ = 1, ..., q; k = 1, ..., d_ℓ) defined in (3.2) and (3.3) will be called elementary restriction matrices. The total number of these matrices is M, with M = d₁ + d₂ + ⋯ + d_q. The matrices defined in (3.1) or (3.2)-(3.3) satisfy some properties, easily deducible from the orthogonality relations (2.4), which we summarise in the following.

Theorem 3.1 ([1]) The M elementary restriction matrices defined by (3.1) or (3.2)-(3.3) verify the relations

  R_{ℓk} R_{ℓk}ᵗ = I_{d_ℓ},  R_{ℓk} R_{ℓ'k'}ᵗ = 0 if (ℓ,k) ≠ (ℓ',k'),  Σ_{ℓ=1}^{q} Σ_{k=1}^{d_ℓ} R_{ℓk}ᵗ R_{ℓk} = I,  (3.4)

where I_{d_ℓ} and I are identity matrices of order d_ℓ and t respectively.

4 H(Σ_a) spaces and elementary restriction matrices

Let Γ be the piecewise smooth boundary of Ω, invariant with respect to G, and a ∈ Γ. Consider the ordered set

  Σ_a = {a, γ₂⁻¹ a, ..., γ_t⁻¹ a},  (4.1)

and the space H(Σ_a) of real functions defined on Σ_a. A natural basis B in H(Σ_a) is formed by functions having value 1 at one point of Σ_a and 0 at the remaining points. Having indicated with χ the function of B with value 1 at the point a, we obtain the ordered basis B = {χ(x), χ(γ₂ x), ..., χ(γ_t x)}, such that, of course,

  H(Σ_a) = span{χ(x), χ(γ₂ x), ..., χ(γ_t x)}.  (4.2)

H(Σ_a) is a vector space with finite dimension n ≤ t, invariant with respect to G (since Σ_a is invariant with respect to G) and therefore decomposable into a direct sum of M subspaces H_{ℓk}(Σ_a). Having set n_ℓ = dim H_{ℓk}(Σ_a), we have n = Σ_{ℓ=1}^{q} d_ℓ n_ℓ.

Definition 4.1 We say that a is a generic point of Γ (with respect to the group G) if dim H(Σ_a) = t or, equivalently, if all the elements of Σ_a are distinct.

The following results hold.

Theorem 4.2 ([1]) Having fixed any point a ∈ Γ, if {ω^(ℓ)(γᵢ)} is a representation of order 1, then H_{ℓ1}(Σ_a) = span{P_{ℓ1} χ} and n_ℓ ≤ 1. If {ω^(ℓ)(γᵢ)} is a representation of order d_ℓ ≥ 2, one has

  H_{ℓk}(Σ_a) = span{A^(ℓ)_{k1} χ, ..., A^(ℓ)_{kd_ℓ} χ},  k = 1, ..., d_ℓ,  (4.3)

and therefore n_ℓ ≤ d_ℓ. If a is a generic point, then n_ℓ = d_ℓ for any ℓ.

Let now V^a be the column vector (χ(x), χ(γ₂ x), ..., χ(γ_t x))ᵗ, whose order is related to the one fixed for the elements of G. Corresponding to the representations of order 1 of G, for the elementary restriction matrices defined in (3.1), (3.2) we have R_{ℓ1} V^a = √t P_{ℓ1} χ. From Theorem 4.2, it follows that

  H_{ℓ1}(Σ_a) = span{R_{ℓ1} V^a}.  (4.4)

Corresponding to the representations of order d_ℓ ≥ 2, for the elementary restriction matrices defined in (3.3) we have R_{ℓk} V^a = √(t/d_ℓ) (A^(ℓ)_{k1} χ, ..., A^(ℓ)_{kd_ℓ} χ)ᵗ. From (4.3), it follows that

  H_{ℓk}(Σ_a) = span{R_{ℓk} V^a}.  (4.5)

In both cases, if a is a generic point, the components of the vector R_{ℓk} V^a constitute a basis in H_{ℓk}(Σ_a). Therefore, for any generic point a, the elementary restriction matrix R_{ℓk} represents the projection operator P_{ℓk} from H(Σ_a) onto H_{ℓk}(Σ_a), if we choose V^a as a basis in H(Σ_a). Now we want to construct elementary restriction matrices R_{ℓk} which represent the projection operators P_{ℓk} from H(Σ_a) onto H_{ℓk}(Σ_a) for nongeneric points.
Therefore let us suppose a to be a nongeneric point, i.e., such that the functions

  χ(x), χ(γ₂ x), ..., χ(γ_t x)  (4.6)

are linearly dependent. Let n be the maximum number of linearly independent functions among (4.6), and let the following functions be linearly independent:

  χ(γ_{i₁} x), ..., χ(γ_{i_n} x).  (4.7)

It is convenient to order the functions (4.7) with increasing index i_α; therefore let us suppose i₁ < i₂ < ⋯ < i_n. In this case the elementary restriction matrices R_{ℓk} will have n columns. The number n_ℓ of rows (n_ℓ ≤ d_ℓ) of each R_{ℓk} is not determined by i₁, i₂, ..., i_n. In general, we can only say that the matrices R_{ℓ1}, ..., R_{ℓd_ℓ} have the same number n_ℓ of rows, where n_ℓ = dim H_{ℓk}(Σ_a). We now consider a significant class of nongeneric points. Having fixed ℓ (ℓ = 2, ..., t), let I_ℓ(Γ) be the set of all points a ∈ Γ such that

  a = γ_ℓ⁻¹ a.  (4.8)

From (4.8) it follows, for any i: χ(γᵢ x) = χ(γ_ℓ γᵢ x). This implies that the functions (4.6) are naturally subdivided into subsets, and each subset contains coincident functions. Then we can obtain elementary restriction matrices for the space H(Σ_a) with a ∈ I_ℓ(Γ), starting from the elementary restriction matrices built in Section 3, with the following procedure.

• Let us sum to each column of index i_α (α = 1, ..., n) all the columns of index j, with j such that γⱼ⁻¹ a = γ_{i_α}⁻¹ a. We indicate with R̃_{ℓk} the matrices obtained, all with d_ℓ rows and n columns, but not all full-rank matrices; some of these may be zero matrices.
• Let us extract from the nonzero matrices R̃_{ℓk} submatrices R̂_{ℓk} made up of n_ℓ linearly independent rows.
• Finally, let us construct from R̂_{ℓk} matrices R_{ℓk} with a row-orthonormalization procedure.

The (nonzero) matrices R_{ℓk} verify the properties expressed by Theorem 3.1. Furthermore, the matrices R_{ℓk}, applied to the vector V^a = (χ(γ_{i₁} x), ..., χ(γ_{i_n} x))ᵗ corresponding to a point a ∈ I_ℓ(Γ), give vectors whose components constitute a basis for H_{ℓk}(Σ_a). For this reason they represent the projection operators from H(Σ_a) onto H_{ℓk}(Σ_a), for any a ∈ I_ℓ(Γ). Then we will say that the matrices R_{ℓk}, with n_ℓ rows and n columns, are elementary restriction matrices for the space H(Σ_a) relative to points a ∈ I_ℓ(Γ). Furthermore, n = Σ_{ℓ=1}^{q} d_ℓ n_ℓ.

5 H(Σ) spaces and restriction matrices

Let Γ be the piecewise smooth boundary of Ω, and Σ a set formed by N points of Γ constituting a (not necessarily uniform) mesh defined on Γ. Let us suppose Γ and Σ invariant with respect to G. Let H(Σ) be the vector space of real functions defined on Σ. H(Σ) is an N-dimensional vector space, invariant with respect to G; this is due to the fact that Σ is invariant with respect to G. A natural basis B in H(Σ), invariant with respect to G, is formed by functions having value 1 at one point of Σ and 0 at the remaining points. In order to more easily construct restriction matrices for the space H(Σ), or equivalently for the mesh Σ, it is convenient to introduce in the set Σ the following equivalence relation.

Definition 5.1 We say that a point a' is equivalent to a'' if there exists an element γᵢ ∈ G such that a'' = γᵢ⁻¹ a' (and therefore a' = γᵢ a'').

The points of the set Σ are then subdivided into r equivalence classes. If r = 1 one has H(Σ) = H(Σ_a), with a ∈ Σ. Then let us suppose r ≥ 2. We order the points of the set Σ as follows: having indicated with a₁, ..., a_r pairwise inequivalent points of Σ, we consider the following ordered points

  a₁, γ₂⁻¹ a₁, ..., γ_t⁻¹ a₁,  a₂, γ₂⁻¹ a₂, ..., γ_t⁻¹ a₂,  ...,  a_r, γ₂⁻¹ a_r, ..., γ_t⁻¹ a_r.  (5.1)

If the points (5.1) are distinct, we have N = rt.
If some points among (5.1) coincide, we erase from the sequence (5.1) any point equal to a previous one. A sequence of N points, with N < rt, will then remain, with n^(1) points equivalent to a₁, n^(2) equivalent to a₂, ..., n^(r) equivalent to a_r. In both cases

  H(Σ) = H(Σ_{a₁}) ⊕ H(Σ_{a₂}) ⊕ ⋯ ⊕ H(Σ_{a_r}),

with dim H(Σ_{a_j}) = n^(j) ≤ t, j = 1, ..., r, and N = n^(1) + n^(2) + ⋯ + n^(r). We indicate by C^(j)_{ℓk} the elementary restriction matrices relative to the space H(Σ_{a_j}), constructed as indicated in Section 4. Let n_ℓ^(j) be the number of rows of the matrix C^(j)_{ℓk}; having fixed j, the number of columns of the matrices C^(j)_{ℓk}, for any ℓ and k, is n^(j). We consider therefore the following M block-diagonal matrices

  R_{ℓk} = [ C^(1)_{ℓk}     0        ⋯      0     ]
           [    0       C^(2)_{ℓk}  ⋯      0     ]  (5.2)
           [    ⋮           ⋮       ⋱      ⋮     ]
           [    0           0       ⋯  C^(r)_{ℓk} ]

with N_ℓ = n_ℓ^(1) + n_ℓ^(2) + ⋯ + n_ℓ^(r) rows and N columns, from which we have to eliminate the possible zero rows. The matrices R_{ℓk} determined by this procedure, which we call restriction matrices for the space H(Σ) of dimension N, have rank equal to the number N_ℓ of the remaining rows, and for these matrices the properties expressed in Theorem 3.1 still hold. In both cases, we have the following theorem.

Theorem 5.2 Considering the basis B in H(Σ) as a column vector V^Σ, with the order deduced from (5.1), the components of the vector R_{ℓk} V^Σ form a basis in H_{ℓk}(Σ).

Therefore the M matrices R_{ℓk}, having fixed in H(Σ) the ordered basis V^Σ, determine a decomposition of H(Σ) into M subspaces, which coincides with the one obtained with the projection operators P_{ℓk}. Preliminary numerical results appear promising; algorithms for potential and linear elasticity problems are being implemented on parallel processors to analyse the efficiency of the proposed approach.

Bibliography
1. A. Aimi, L. Bassotti, and M. Diligenti, Groups of congruences and restriction matrices, submitted to BIT.
2. A. Aimi and M. Diligenti, Hypersingular kernel integration in 3D Galerkin boundary element method, J. Comp. Appl. Math. 138, 1 (2002), 51-72.
3. L. Bassotti Rizza, Operatori lineari T-invarianti rispetto ad un gruppo di congruenze, Ann. Mat. Pura ed Appl. 148 (1987), 173-205.
4. J. L. Lions and E. Magenes, Non-Homogeneous Boundary Value Problems and Applications I, Springer-Verlag, Berlin, Heidelberg, New York, 1972.
5. V. I. Smirnov, Linear Algebra and Group Theory, McGraw-Hill, New York, 1961.

The numerical simulation of the qualitative behaviour of Volterra integro-differential equations

John T. Edwards, Neville J. Ford and Jason A. Roberts
Chester College, Parkgate Road, Chester, CH1 4BJ, UK. j.edwards@chester.ac.uk, njford@chester.ac.uk, j.roberts@chester.ac.uk

Abstract
We consider the qualitative behaviour of exact and approximate solutions of integral and integro-differential equations with fading memory kernels. Over long time intervals the errors in numerical schemes may become so large that they mask some important properties of the solution. One frequently appeals to stability theory to address this weakness, but it turns out that, in some of the model equations we have considered, there remains a gap in the analysis. We consider a linear problem of the form

  y'(t) = −∫₀ᵗ e^{−λ(t−s)} y(s) ds,  y(0) = 1,

and we solve the equation using simple numerical schemes. We outline the known stability behaviour of the problem and derive the values of λ at which the true solution bifurcates.
We give the corresponding analysis for the discrete schemes and highlight that, for particular stepsizes, the methods give unexpected behaviour, and we show that, as the step size of the numerical scheme decreases, the bifurcation points tend towards those of the continuous problem. We illustrate our results with some numerical examples.

1 Introduction

The qualitative behaviour of numerical approximations to solutions of functional differential equations is an important area for analysis. We aim to investigate whether the behaviour of the numerical solution reflects accurately that of the true solution. We are particularly concerned with the behaviour of the solution over long time periods, when (in particular) the convergence order of the method gives us limited insight, since the error depends on a constant that grows with the time interval.

Many authors are concerned with the stability of solutions and of their numerical approximations. We have considered elsewhere (see [7]) the stability of numerical solutions of equations of this type (and of non-linear extensions). This analysis raised a number of questions, which we consider here, about just how well the full range of qualitative behaviour of even quite a simple equation is understood. Bifurcations (by which we shall mean any change in the qualitative behaviour of solutions) frequently arise only for systems or for higher order problems, and therefore one is particularly interested in finding suitable simple equations as the basis for analysis. In this paper, we consider the solution by numerical techniques of the integro-differential equation

  y'(t) = −∫₀ᵗ e^{−λ(t−s)} y(s) ds,  y(0) = 1.  (1.1)

The equation is a linear convolution equation with a separable fading memory convolution kernel and therefore is a simple example from an important class of problems familiar in applications. It is also possible to analyse the equation in the form of a second order ordinary differential equation. The equation has several key properties that make it an ideal basis for our analysis:

1. it depends on the value of the single parameter λ,
2. when λ varies through real values, four distinctive qualitative behaviours in the solution can be detected, and
3. equations with exponential convolution kernels frequently arise in applications and elsewhere in the literature.

For λ real and positive, the kernel is of fading memory type. For λ real and negative, the kernel has a growing memory effect. This linear equation displays surprisingly rich dynamical behaviour for real values of the parameter λ, and it is this behaviour that we want to consider for the numerical scheme. We note that the classical test equation

  y'(t) = g(t) + ξ y(t) + η ∫₀ᵗ y(s) ds,  η ≠ 0  (1.2)

([1, 2]) displays the same range of qualitative behaviour possibilities as (1.1) for varying values of the two real parameters ξ, η. This motivates us to consider equation (1.1) as a prototype problem that is interesting in its own right and that will also provide insight into the behaviour of more complicated equations. We propose to give a further analysis, where we consider the boundaries along which bifurcations occur for equation (1.2), in a sequel [3]. We consider the following questions.

1. Does the numerical scheme display the same four qualitatively different types of long term behaviour as are found in the true solution?
2. Are the interval ranges for the parameter λ that give rise to the changes in behaviour of the solution the same as in the original problem?

2 Behaviour of the exact solution

We consider the equation (1.1), which can be shown to have a unique continuous solution (see, for example, [10]). One can easily establish (by considering, for example, an equivalent ordinary differential equation) the general solution

  y(t) = A e^{(−λ + √(λ²−4)) t/2} + B e^{(−λ − √(λ²−4)) t/2},  (2.1)

where A, B are constants. For real values of λ the solution to (1.1) bifurcates (or changes qualitative behaviour) at λ = 0, ±2. We have the following qualitative behaviour.

A1. When λ > 2, y → 0 as t → ∞, with no oscillations.
A2. When 0 < λ < 2, y → 0 as t → ∞, with infinitely many oscillations.
A3. When λ = 0, y(t) = cos(t) (persistent oscillations).
A4. When −2 < λ < 0, the solutions contain infinitely many oscillations of increasing amplitude.
A5. When λ < −2, the solution grows (in magnitude) without any oscillations.

3 Numerical analysis

To apply a numerical method to an integro-differential equation of the type

  y'(t) = f(t, y(t), ∫₀ᵗ k(t, s, y(s)) ds),  y(0) = y₀,  (3.1)

we write the problem in the form

  y'(t) = f(t, y(t), z(t)),  (3.2)
  z(t) = ∫₀ᵗ k(t, s, y(s)) ds.  (3.3)

We solve (3.2), (3.3) numerically using a linear multistep method for equation (3.2), combined with a suitable quadrature rule for deriving approximate values of z from equation (3.3) (see [2]). Such a method is sometimes known as a DQ-method. For linear k-step methods, one also needs to provide a special starting procedure to generate the additional k − 1 initial approximations to the solution that are not given in the equation but are needed by the multistep method on its first application. It turns out that one needs to choose the quadrature, multistep method and starting schemes carefully to ensure that the resulting method is of an appropriate order of accuracy for the work involved. One should try to choose schemes of the same orders as one another, since the order of the overall method is equal to the lowest of the orders of the three separate methods (the multistep formula, the starting value scheme and the quadrature) used to construct it.

In this paper we have chosen to focus on one-step methods. There are two reasons for this: we have thereby avoided the need to construct special starting procedures, which would make our analysis more complicated; and, as Wolkenfelt showed in [11], methods with a repetition factor of 1 (such as the ones we consider) are always stable. We also draw attention (see [9] for example) to the fact that the trapezoidal rule is an A-stable 1-step method.
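As an illustration of such a DQ-method, the following Python sketch (our own hypothetical illustration, not code from the paper) solves (1.1) with a backward Euler rule for the differential part and a θ-weighted rectangle rule for the quadrature; the function names and the sample values of λ, h, θ₂ are assumptions.

```python
import numpy as np

def dq_solve(lam, h=0.1, theta2=0.5, nsteps=1000):
    """DQ-method for y'(t) = -int_0^t exp(-lam*(t-s)) y(s) ds, y(0) = 1:
    backward Euler for the ODE part, theta-weighted quadrature for the
    memory term; starting value y_1 = y_0 = 1 (since y'(0) = 0)."""
    E = np.exp(-lam * h)                 # one-step kernel decay factor
    y = np.empty(nsteps + 1)
    y[0] = y[1] = 1.0
    for n in range(1, nsteps):
        # weighted quadrature of the memory term up to t_{n+1}
        tail = theta2 * E**(n + 1) * y[0] + sum(
            E**(n + 1 - j) * y[j] for j in range(1, n + 1))
        y[n + 1] = (y[n] - h**2 * tail) / (1.0 + h**2 * (1.0 - theta2))
    return y

if __name__ == "__main__":
    for lam in (3.0, 1.0, 0.0, -1.0):    # sample values in regions A1-A4
        yy = dq_solve(lam)
        sign_changes = np.sum(np.diff(np.sign(yy)) != 0)
        print(f"lambda={lam:+.1f}: |y_N|={abs(yy[-1]):.2e}, "
              f"sign changes={sign_changes}")
```

For these sample values the iterates decay without sign changes for λ = 3, oscillate and decay for λ = 1, oscillate persistently for λ = 0 and oscillate with growing amplitude for λ = −1, mirroring the classification A1-A4 above.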
Furthermore (see, for example [8], [12]), one might expect the bifurcation points of the discrete Numerical simulation of volterra equations 89 scheme to approach the bifurcation points of the continuous problem as h -^ 0 and one could anticipate that, for a method of overall order p, the approximation of the true bifurcation point by the bifurcation point of the numerical scheme would also be to 0{hP). We will show in this paper that (for h—>0) the approximation of the bifurcation points in the methods we have chosen is at least to the order of the method. To keep the analysis reasonably simple, we consider the following discrete form of (3.2). We use a linear ^-method in each case so that we solve the system Vn+i = yn + h{eiFn + {l-ei)Fn+i),n = 0,l,..., Fn = f{nh,yn,Zn), Zn = (3.4) , (3.5) h{e2k{nh,0,yo) + Y^k{nh,jh,yj) + {l-e2)k{nh,nh,yn)\. i^.G) One could choose any combination of ^,,0 < ^j < 1 and a natural choice could be 01 =62- However, in order to start with a simple method where the algebraic problem is tractable we have considered first the cases where ^i = 0 and we consider a range of values of ^2One solves equations of the form ' n yn+i - yn = -h^ ^2e-^^"+')'^yo + 5]e-^'»("+i-^)%- + (1 - e2)yn+i \,yo = yi = 1. (3.7) Note that we ha,ve used a simple procedure to find the additional starting value yi = 1. We have observed from the integro-differential equation that j/'(0) = 0 and have deduced that y{h) = 2/(0) will provide a reasonable order 1 starting approximation. This choice of formula implies that we are combining a backward Euler scheme to discretise the differential equation, with, respectively, (for O2 = 1) the forward rectangular (Euler) rule, (for ^2 = 5) the trapezoidal rule and (for 62 = 0) the backward rectangular rule for the quadrature. We will return to consider other combinations of Oi, 62 later. The equation (3.7) is equivalent to (1 + h''{l - 92)) yn+2 + {h^62e-^^ - 1 - e"^'^) y„+i + e'^'^yn = 0. (3.8) The behaviour of the solution as i —> 00 depends on the roots of the characteristic equation {l + h^{l-e2))k''+{h^e2e-^^-l-e-^^)k + e-^^ = Q. (3.9) Any solution of (3.8) will be asymptotically stable if both roots of (3.9) are of magnitude less than one and unstable if either root of (3.9) has magnitude greater than one. The solutions will contain (stable or unstable) oscillations when the roots of|(3.9) are complex or, indeed, when at least one root is negative. It follows from this'(see [4]) that the bifurcations occur as follows (for reasonably small /i > 0). 90 J. T. Edwards, N. J. Ford, and J. A. Roberts Bl. When A > -^In I h^ei^H^+i )^ y„ -> 0 as n -4 oo with no oscillations. This condition can be written in the simpler form A > ^In (1 + 2/1^ - h^Oi + 2yJ-h? {h^e2 - 1 - h?)\ and we thank the anonymous referee for pointing out this simplification. B2. When i In (i+pfjre;)) < A < i In (l + 2/1^ - /i^^a + 2^-h? {h?e2 - 1 - ft^)), y„ ^ 0 as n —> 00, with infinitely many oscillations. B3. When A = ^ In (i^^an_g s) we obtain persistent oscillations. B4. When i In (l + 2h^ - h^2 - 2y/-h^ {K^Oi - 1 - h?)) < A < i In (^^^^J^-^), the solutions contain infinitely many oscillations of increasing amplitude. B5. When \ < }^\n (l + 2h? - h^Oi - 2^-h? {h?92 - 1 - h?)\, the solution grows (in magnitude) without any oscillations. 4 Bifurcation points of the numerical scheme as approximations to true bifurcation points We consider now the way in which the bifurcation points of the discrete scheme approximate those of the original problem. 
4 Bifurcation points of the numerical scheme as approximations to true bifurcation points

We consider now the way in which the bifurcation points of the discrete scheme approximate those of the original problem. We are using a numerical scheme of order 1. First we consider the value of

  Λ₁ = (1/h) ln(1 + 2h² − h²θ₂ + 2√(−h²(h²θ₂ − 1 − h²)))

as θ₂ varies and h → 0. It is easy to see that, as h → 0, the value Λ₁ satisfies Λ₁ → 2. In fact we can give greater precision to this: we can show that Λ₁ = 2 − θ₂ h + O(h²) as h → 0. This means that, for θ-methods in general, our scheme approximates the true value (2) to order 1 (the order of the method) as h → 0. In the particular case θ₂ = 0 the approximation is to order 2.

For Λ₂ = (1/h) ln(1/(1 + h²(1−θ₂))), it is straightforward to show that stability is lost at a value of λ that approximates the true value (0) to order 1 in general. In fact, for θ₂ = 1, the forward Euler scheme, the approximation is exact for all values of h. The analysis of

  Λ₃ = (1/h) ln(1 + 2h² − h²θ₂ − 2√(−h²(h²θ₂ − 1 − h²)))

follows in exactly the same way as for Λ₁ and leads to an identical conclusion: the approximation of the bifurcation point λ = −2 is in general to order 1 as h → 0, and to order 2 if θ₂ = 0.

We illustrate our results graphically. Each of the plots shown in Figure 1 illustrates, for varying h, the ranges of the parameter λ where

1. the solutions are unstable due to at least one real root greater than unity in magnitude (the darkest region in the figures): exponential growth if the root is positive, growing oscillations if the root is negative,
2. the solutions are unstable due to growing oscillations (the next darkest region in the figures),
3. the solutions are stable with asymptotically stable oscillations (the lightest region in the figures), and
4. the solutions are stable with exponentially stable decay.

We can compare with the right-hand plot in Figure 2, which shows the true regions for the original problem, and we can make the following observations.

1. As h → 0, the values of λ at which changes in the behaviour occur approach the true values. This coincides with our previous experience in delay equations (see [8]).
2. There is some extremely surprising behaviour for some values of h > 0.
(a) For the two values θ₂ = 0.5 and θ₂ = 1 we can see that the darkest region is in two parts: in the upper part there is a negative real root of magnitude greater than unity, leading to exponentially growing oscillations in the solution; in the lower part there is a positive real root of modulus greater than unity, leading to exponential growth in the solutions.
(b) There can be a critical value of h > 0 (h = 1/√θ₂ when θ₂ > 0) at which, for apparently arbitrarily large λ < 0, the numerical solution displays oscillatory behaviour.
(c) There can be an additional thin region (visible only in larger scale versions of the plots) between the darkest and lightest regions in which there is a real negative root of magnitude less than unity, leading to decaying oscillations.
(d) For θ₂ = 0.5 and θ₂ = 1 the upper part of the darkest region indicates some really strange behaviour: spurious oscillations may arise for arbitrarily large negative values of λ and even (see Figure 1) for some positive values of λ. Thus we can have the situation (for example, for λ small and positive) where the true solution tends to zero while the approximate solution exhibits oscillations of growing magnitude. Alternatively (for λ large and negative), the true solution could exhibit high index exponential growth while the approximate solution exhibits oscillations.

We draw attention also to the fact that, for θ₂ = 0.5 and θ₂ = 1, the stability boundary of the method is made up of parts of the boundaries of two regions, making the prediction of behaviour for varying h > 0 particularly difficult. We believe that these observations justify our view that more attention needs to be paid to changes in qualitative behaviour other than stability in reaching a good understanding of the behaviour of numerical methods for problems of this type.

We can consider next whether these observations are equally true for other choices of numerical method. We present in Figure 2 plots revealing the qualitative behaviour of solutions to equations (3.2), (3.3) with other choices of θ-method. It is easy to see that, even for combinations such as using the trapezium rule for both parts of the discretisation (a method characterised by θ₁ = θ₂ = 0.5 and known to do very well at preserving the stability boundary), there are problems in the preservation of other types of qualitative behaviour when h is not very small. Similarly, we can see that the choice θ₁ = θ₂ = 1 leads to a shrinking range (as h increases) of λ that leads to stable oscillatory solutions.

FIG. 1. Bifurcation points as h varies for θ₁ = 0 and θ₂ = 0, 0.5, 1 respectively.

FIG. 2. Bifurcation points as h varies for, respectively, θ₁ = θ₂ = 0.5, θ₁ = θ₂ = 1, and for the analytical problem.

5 Alternative approaches

The particular equation we have considered can be formulated as an integro-differential equation, as an integral equation or as a second order differential equation. We have shown in [4] that the interesting and somewhat surprising observations about numerical behaviour that we made in the previous section also apply in these other formulations.

6 Closing remarks

The results presented in this paper show that the well-established stability theory based on the analysis of equation (1.2) gives only a very limited insight into the qualitative behaviour of solutions of the class of convolution equations with exponential memory kernel that we have considered here. We have observed elsewhere (see [5, 6, 7]) that the qualitative behaviour of numerical solutions to equations of this type may have surprising features, and our consideration here of the prototype problem (1.1) illustrates how this unexpected behaviour may arise. We have seen in this paper how oscillations may arise in the numerical schemes when they should not, and how in other cases the numerical schemes may suppress genuine oscillatory behaviour. When one seeks good methods based on a stability analysis, the desire is to focus on those methods where the step-length h > 0 is not subject to some upper bound to ensure the stability of the method. However, our initial observations in this paper have shown that this may well prove an unreasonable
We draw attention also to the fact that, for 62 — 0.5 and 62 = 1 the stability boundary of the method is made up of parts of the boundaries of two regions, making the prediction of behaviour for varying h>0 particularly difficult. We believe that these observations justify our view that more attention needs to be paid to changes in qualitative behaviour other than stability in reaching a good understanding of the behaviour of numerical methods for problems of this type. We can consider next whether these observations are equally true for other choices of numerical method. We present in Figures 2 plots reveaUng the qualitative behaviour of solutions to equations (3.2), (3.3) with other choices of ^-method. It is easy to see that, even for combinations such as using the trapezium rule for both parts of the discretisation (a method characterised by 61 = 62 = 0.5 and known to do very well at preserving the stability boundary) there are problems in the preservation of other types of qualitative behaviour when h is not very small. Similarly, we can see that the choice 61 = 62 = I leads to a shrinking range (as h increases) for A that lead to stable oscillatory solutions. 91 92 J. T. Edwards, N. J. Ford, and J. A. Roberts 8,=0 ft;=05 * FIG. J -2. 0 ? 4 6 .8 -E -4 <2 1. Bifurcation points as h varies for 0i = 0, ^2 = 0, .5,1 respectively. W 5, Irlpillum rull (oi ielMtnt 3 &^T, bachwrt •litr for dims'W 4 e B fl * 4 -2 1 4 E 8 « * ■< -2 2 4 E 2. Bifurcation points as h varies for, respectively, 9i = O2 = 0.5,1 and for the analytical problem. FIG. 5 Alternative approaches The particular equation we have considered can be formulated as an integro-differential equation, as an integral equation or as a second order differential equation. We have shown in [4] that the interesting and somewhat surprising observations about numerical behaviour that we made in the previous section also apply in these other formulations. 6 Closing remarks The results presented in this paper show that the well-established stability theory based on the analysis of equation (1.2) gives only a very limited insight into the qualitative behaviour of solutions of the class of convolution equations with exponential memory kernel that we have considered here. We have observed elsewhere (see [5, 6, 7]) that the qualitative behaviour of numerical solutions to equations of this type may have surprising features and our consideration here of the prototype problem (1.1) illustrates how this unexpected behaviour may arise. We have seen in this paper how oscillations may arise in the numerical schemes when they should not, and how in other cases the numerical schemes may supress genuine oscillatory behaviour. When one seeks good methods based on a stability analysis, the desire is to focus on those methods where the step-length h> 0 is not subject to some upper bound to ensure the stability of the method. However our initial observations in this paper have shown that this may well prove an unreasonable Numerical simulation of volterra equations expectation when one is investigating these other changes in qualitative behaviour. We believe that this paper introduces a range of worthwhile investigations in a field that is still quite open. Space restrictions have prevented us from considering the behaviour of more general methods in this paper and also from extending our analysis to consider other problems. 
The results we have presented here show that, for these simple methods at least, the bifurcation parameters are approximated in the numerical scheme to at least the order of the method, for sufficiently small h > 0. It is also very clear that, even for what appears to be a simple problem, the choice of numerical scheme and the form in which the problem is presented provide us with a rich source of example behaviour.

Bibliography
1. H. Brunner and J. Lambert, Stability of numerical methods for Volterra integro-differential equations, Computing 12 (1974), pp. 75-89.
2. H. Brunner and P. J. van der Houwen, The Numerical Solution of Volterra Equations, North-Holland, 1986.
3. J. T. Edwards, N. J. Ford, and J. A. Roberts, Numerical approaches to bifurcations in solutions to integro-differential equations, Proceedings of HERCMA (2001).
4. J. T. Edwards, N. J. Ford, and J. A. Roberts, The numerical simulation of an integro-differential equation with exponential memory kernel close to bifurcation points, Tech. Rep. preprint, Manchester Centre for Computational Mathematics (ISSN 1360 1725), 2001.
5. J. T. Edwards and J. A. Roberts, On the existence of bounded solutions to a difference analogue for a nonlinear integro-differential equation, International Journal of Applied Science and Computations 6 (1999), pp. 55-60.
6. N. J. Ford, C. T. H. Baker, and J. A. Roberts, Nonlinear Volterra integro-differential equations: stability and numerical stability of θ-methods, Journal of Integral Equations and Applications 10 (1998), pp. 397-416.
7. N. J. Ford, J. T. Edwards, J. A. Roberts, and L. E. Shaikhet, Stability of a difference analogue for a nonlinear integro-differential equation of convolution type, Tech. Rep. 312, Manchester Centre for Computational Mathematics (ISSN 1360 1725), October 1997.
8. N. J. Ford and V. Wulf, The use of boundary locus plots in the identification of bifurcation points in numerical approximation of delay differential equations, Journal of Computational and Applied Mathematics 111 (1999), pp. 153-162.
9. J. Lambert, Numerical Methods for Ordinary Differential Systems, Wiley, 1991.
10. P. Linz, Analytical and Numerical Methods for Volterra Equations, SIAM, 1985.
11. P. H. M. Wolkenfelt, On the relation between the repetition factor and numerical stability of direct quadrature methods for second kind Volterra integral equations, SIAM Journal on Numerical Analysis 20 (1983), pp. 1049-1061.
12. V. Wulf, Numerical Analysis of Delay Differential Equations Undergoing a Hopf Bifurcation, PhD thesis, University of Liverpool, 1999.

Systems of delay equations with small solutions: a numerical approach

Neville J. Ford and Patricia M. Lumb
Chester College, Parkgate Road, Chester, CH1 4BJ, UK. njford@chester.ac.uk, P.Lumb@chester.ac.uk

Abstract
We consider systems of delay differential equations of the form y'(t) = A(t) y(t−1), where y ∈ ℝⁿ and A : ℝ → ℝ^{n×n}. We investigate whether a numerical method can be used to determine whether or not the equation has so-called small solutions. Our work builds on recent analysis and experimental work completed in the scalar case, and we are able to conclude that, at least when A is a suitable periodic matrix, one can predict small solutions by using a numerical approximation scheme of fixed step length.

1 Introduction and basic theory

The analysis of delay differential equations, both analytically and numerically, is well established. One distinctive feature is that even a scalar delay differential equation is an infinite dimensional problem.
For, if y satisfies

  y'(t) = b(t) y(t−1),  (1.1)

the initial conditions that need to be specified take the form

  y(t) = φ(t),  −1 ≤ t ≤ 0.  (1.2)

This infinite dimensionality has two significant implications for us: (1) the dimension of a system of delay equations is the same as the dimension of a scalar delay equation, and (2) the range of dynamical behaviour among solutions of delay equations is far wider than would be the case for ordinary differential equations. In the present paper we are investigating an infinite dimensional property (that of possessing small solutions) where the analysis and results for systems need to be presented quite separately from those for scalar equations, because there are some interesting and distinctive features.

One way in which delay equations may be analysed is to view the solution operator as a dynamical system. The dimension of the dynamical system then inherits the infinite dimensionality of the delay equation itself. Small solutions (those that satisfy x(t) e^{at} → 0 as t → ∞ for all values of the parameter a) can arise in these infinite dimensional problems but would not be observed in finite dimensional equations. They are important because, when a delay equation has small solutions, the eigenfunctions and generalised eigenfunctions of the solution map do not form a complete set. This means that some standard analytical results do not hold and that particular care must be taken in solving and analysing the equation. The easy detection of problems that have small solutions is still, in general, open, but we have seen [4, 5] that the use of a numerical approximation scheme can lead to good insights. Here we approximate the delay differential equation using a simple numerical scheme with fixed step length and then consider the spectrum of the resulting solution map.

In recent work (see, for example, [3, 5]) the scalar case has been considered with some success. We have been able to see that, for the equation (1.1) with b periodic of period 1, we can detect the existence of small solutions by exploring the (finitely many) eigenvalues of the numerical scheme. We also found that it was not necessary to use a sophisticated numerical scheme for the investigation, and this has justified us in focussing on the trapezium rule as the numerical method in this paper. For the scalar case (1.1) it is known (see, for example, [4, 5]) that, when b satisfies the periodicity condition b(t) = b(t−1), non-trivial small solutions arise if and only if the function b changes sign. For the vector-valued case we can give a theorem, recently proved by Verduyn Lunel ([11]).

Theorem 1.1 Consider the equation

  y'(t) = A(t) y(t−1), where A(t) = A(t−1),  (1.3)

and where y ∈ ℝⁿ. The equation has small solutions if and only if at least one of the eigenvalues λᵢ satisfies, for some t,

  ℜλᵢ(t−) × ℜλᵢ(t+) < 0,  λᵢ(t) = 0.  (1.4)

Remark 1.2 We shall describe the property (1.4) using the words "an eigenvalue passes through the origin". We note that, even for real matrices A, the eigenvalues may be complex, and it could be that a pair of complex conjugate eigenvalues will cross the y-axis away from the origin. In this case the equation has small solutions only if there is some other crossing of the y-axis by an eigenvalue where the crossing does take place at the origin.

2 Numerical methods and systems of order two

All the important relevant features of systems of delay equations turn out to be exhibited in systems of two equations, and so we shall focus on these for simplicity. We consider the equation

  y'(t) = A(t) y(t−1), for A ∈ ℝ^{2×2} and y ∈ ℝ², subject to y(t) = φ(t) for −1 ≤ t ≤ 0, where we assume that A(t) = A(t−1) for all t.  (2.1)
2 Numerical methods and systems of order two All the important relevant features of systems of delay equations turn out to be exhibited in systems of two equations and so we shall focus on these for simplicity. We consider the equation y'{t) = A{t)y{t - 1) for A e M^^^ and y e K^ subject to y{t) = (p{t) for -1 < t < 0 and we assume that A{t) = A{t - 1) for all t. (2.1) 96 N. J. Ford and P. Lumb We introduce We apply the trapezium rule with step length h = jj and introduce the approximations Xij f« xi{jh), and X2,j « X2{jh),j > 0; xij = (pi{jh),X2,j = </'2(j/i), -N <j < 0. Set yn = [ Xi^n,Xl^n-l,--- ,Xi,n-N^^2,n,X2,n-l,--- ,X2,n-N ) • (2.3) We note that, as in the one-dimensional case (see [3, 4, 5]), we can write the numerical scheme as j/„+i = A{n)yn, where the matrix A{n) now takes the form / 1 1 0 0 0 1 A{n) 0 |a„+i jan 0 0 0 0 0 1 0 2 7n+l 2 '" 0 0 1 1 0 0 Vo lPn+1 0 0 Sn + 1 0 1 2^" \ ■ (2.4) 2"" 0 / The sequence of matrices {j4(n)} is periodic, of period A'' (since the function A is periodic of period 1) and j/2 = A{l)yi,y3 = A(2)A{l)yi and so on. Therefore yw+i = Cyi where C = A{N)A{N- 1). • • ■ .^(2)^(1). Remark 2.1 The key to extending our discussion to larger systems, and indeed, to gaining a full understanding of the approach, is to note that in both the matrix A{n) and the matrix C the original block structure is retained. Therefore although the matrices A{n) and C are considerably larger than the original 2x2 matrix A{t) in the problem, they are made up of 4 blocks in a 2 x 2 formation. Indeed the contents of each block is completely determined by our numerical method (the trapezium rule) and the values of the corresponding function, respectively a, 0,^,6. There is no pollution of the blocks from the neighbouring functions. We consider three different cases: (1) P{t) = 7(i) = 0 so that the matrix A is diagonal, (2) either P{t) = 0 or 7(i) = 0 so that the matrix A is triangular, and Systems of delay equations with small solutions (3) the matrix A is neither diagonal nor triangular. The first two cases can be dealt with quite quickly because of the' fact that real diagonal and triangular matrices have only real eigenvalues and these eigenvalues lie on the diagonal. Therefore in these two cases we need consider only the question of whether the eigenvalues pass through zero; we do not need to concern ourselves with possible complex eigenvalues whose real parts change sign away from the origin. We can go further: a diagonal matrix A leads to a block diagonal matrix A{n) (with non-zero blocks top left and bottom right). Now by simple matrix theory we know that the eigenvalues of such a matrix are simply the union of the eigenvalues of the two blocks. A similar argument applies when there is a triangular matrix A because the matrices A{n) are then block triangular. It follows that, for both of cases 1 and 2, the 2—dimensional eigenvalue problem simply reduces to two 1—dimensional problems. Therefore, when we consider the eigenspectra of the numerical schemes in cases 1 and 2, we expect the result to be the superposition of the eigenspectra from the two block matrices on the diagonal of C. Case 3 is more complicated and we shall return to it after we give brief examples of Cases 1 and 2. 3 How to recognise small solutions: our previous work Space restrictions here prevent us from giving a great many details of our previous work, but we provide a summary to show how the current investigation builds on the scalar case. In [3] we considered the eigenspectra of the matrix C. 
We showed that there were three characteristic patterns for the eigenspectra, represented by Figure 1. We take the presence of the closed loops that cross the a;-axis to be characteristic of the cases where small solutions arise. FIG. 1. Eigenspectra where b{t) has no change of sign on [0,1] (left), where b{t) has a change of sign on [0,1] and /^ b{s)ds = 0 (centre), and where b{t) has a change of sign on [0,1] and /^ b{s)ds # 0 (right). 4 The cases when /3(i) = 0 and/or ^{t) = 0 As we have remarked already, the eigenspectrum when A is diagonal or triangular is just the same as the eigenspectra of the block matrices from the diagonal of C. We expect to 97 N. J. Ford and P. Lumh 98 find the eigenspectra superimposed, which is indeed what we see in the examples given. Here we assume that at least one of 7(<) or /3(i) is zero; the plots are then independent of the values taken by the other. Example 4.1 We solve (2.1) with the choice a{t) = sin 27ri+1.4 and 5{t) = sin 27ri+0.5. Here a does not change sign but 5 does change sign. We expect small solutions and Figure 2 provides confirmation. Example 4.2 Now we solve (2.1) with a{t) = sin 27ri and 6{t) = < ' . 4 ^ )i' ^i This time both a and S change sign and we expect small solutions (see Figure 2). FIG. 2. Eigenspectra for Example 4.1 (left) and Example 4.2 (right). 4.1 The general two dimensional case We now move on to consider the case when neither of P(t),j{t) is identically zero. In this situation the eigenvalues of A{t) can be complex and so may cross the y-axis away from the origin. First, we recall that det(yl) is the product of the eigenvalues of A so that, by Theorem 1.1, it follows that det{A) = 0 is a necessary condition for small solutions. However this condition cannot be used to characterise equations where small solutions arise; if the eigenvalues of A are real and one passes through the" origin, then det(yl) will change sign. If the eigenvalues of A are a complex conjugate pair and cross the y-axis at the origin then det(^) will instantaneously take the value zero but will otherwise remain positive (the same behaviour as when a real eigenvalue becomes zero but does not change sign). Therefore one cannot expect a change of sign in det(j4) whenever there are small solutions. The fact that the trace of A is the sum of the eigenvalues of A can be used to characterise this case. We summarise. For a real matrix A: (1) if det(i4) changes sign then there are small solutions, ,(2) if det(i4) becomes zero instantaneously and trace(A) simultaneously changes sign then there are small solutions, (3) if det(j4) becomes zero instantaneously and trace(j4) does not simultaneously change sign then there are no small solutions indicated. Systems of delay equations with small solutions 99 Example 4.3 We first consider the case when the matrix A takes the form Ait)^ sin 2nt + a sin 2iTt + c sin 2iTt + b sin 2irt + d By judicious choice of the constants a, b, c, d one can produce different types of behaviour. One can see that \A{t)\ = {a + d-b- c)sin27ri + {ad- be). We will illustrate with the following choices of the constants Case 1: a = 1.5, b = 0.7, c = 0.5, d = 0.5 where the determinant changes sign, Case 2: a = —2, b — 0.8, c = 1.8, d = 0.7 where, again, the determinant changes sign. Case 3: a = 1.6, b = 0.8, c = 1.8, d = 0.7 where the determinant never becomes zero. Prom the plots for cases 1 and 2, we can easily see the presence of small solutions in the eigenspectra shown in Figure 3. 
In the Case 3, the eigenspectra in Figure 3 indicate that, as expected, no small solutions are present. »ffV- i^t»t$*{»"*-«twti: XI I FIG. 3. Case 1. Case 2. Case 3 Example 4.4 Next, we consider the case when the matrix A takes the form sin 2Trt sin 2-jTt + b {sm2'Kt + b) sin 27ri We choose the constant 6 in the following ways Case 4: 6 := 0 so that det(A) becomes instantaneously zero at the same value that trace(yl) changes sign and the complex eigenvalues of A cross the y-axis at the origin. Case 5: 6 = 0.05 so that the complex eigenvalues of A cross the y-axis away from the origin. Here we can see that the characteristic shapes we familiar from our earlier work are not reproduced and further investigation is called for. We remark that (in the zoomed versions) the eigenspectrum where small solutions arise passes through the origin. This property is reproduced also for all other examples that we have tried. Example 4.5 Now we consider the case when the matrix A takes the form A{t) = t t+b -t-b t N. J. Ford and P. Lumb 100 FIG. 4. Left: Case 4. Right: Case 5 and (below) zoomed versions. for t e [-0.5,0.5), A(t) = A{t - 1) for t > 0.5 then it follows that A has complex eigenvalues that cross the y-axis at y = b when i = 0. We plot the eigenspectra for Case 6: 6 = 0 so the eigenvalues of A cross the y-axis at the origin, Case 7: b = 0.01 so the eigenvalues of A cross the y-axis away from the origin. FIG. 5. Left: Case 6. Right: Case 7 and (below) zoomed versions Systems of delay equations with small solutions 5 Conclusions We have seen that it is easy to extend the detection of small solutions by numerical methods from one-dimensional to two-dimensional problems where the eigenvalues are real. Initial experiments indicate that the method works also for problems possessing complex eigenvalues, but here the patterns that arise in the eigenspecra plots are unfamiliar and require further investigation. However, based on our experimental evidence, it seems that small solutions arise in the latter case if and only if the eigenspectra plots pass through the origin. Bibliography 1. O. Diekmann, S.A. van Gils, S. M. Verduyn Lunel, H.-O. Walther, Delay Equations, Springer Verlag, New York, 1995. 2. Y. A. Fiagbedzi, Characterization of Small Solutions in Functional Differential Equations, ^pp?. Mai/i. Lett. 10 (1997), 97-102. 3. N. J. Ford, P. M. Lumb, Numerical approaches to delay equations with small solutions, Proceedings of HERCMA 2001, to a,ppea,T. 4. N. J. Ford, S. M. Verduyn Lunel, Numerical approximation of delay differential equations with small solutions. Proceedings of 16th IMACS World Congress on Scientific Computation, Applied Mathematics and Simulation, Lausanne 2000, paper 173-3, New Brunswick, 2000 (ISBN 3-9522075-1-9). 5. N. J. Ford, S. M. Verduyn Lunel, Characterising small solutions in delay differential equations through numerical approximations, Applied Mathematics and Computation, to appear. 6. N. J. Ford, Numerical approximation of the characteristic values for a delay differential equation, MCCM Numerical Analysis Report No 350, Manchester University 1999 (ISSN 1360 1725). 7. J. K. Hale and S. M. Verduyn Lunel, Introduction to Functional Differential Equations, Springer Verlag, New York, 1993. 8. D. Henry, Small Solutions of Linear Autonomous Functional Differential Equations, J. Differential Equations. 8 (1970), 494-501. 9. S. M. Verduyn Lunel, A sharp version of Henry's theorem on small solutions, J. Differential Equations. &2{l%m),2m-2'JA. 10. S. M. 
Example 4.3 We first consider the case when the matrix A takes the form

  A(t) = [ sin 2πt + a   sin 2πt + b ]
         [ sin 2πt + c   sin 2πt + d ].

By judicious choice of the constants a, b, c, d one can produce different types of behaviour. One can see that |A(t)| = (a + d − b − c) sin 2πt + (ad − bc). We will illustrate with the following choices of the constants:

Case 1: a = 1.5, b = 0.7, c = 0.5, d = 0.5, where the determinant changes sign;
Case 2: a = −2, b = 0.8, c = 1.8, d = 0.7, where, again, the determinant changes sign;
Case 3: a = 1.6, b = 0.8, c = 1.8, d = 0.7, where the determinant never becomes zero.

From the plots for Cases 1 and 2, we can easily see the presence of small solutions in the eigenspectra shown in Figure 3. In Case 3, the eigenspectra in Figure 3 indicate that, as expected, no small solutions are present.

FIG. 3. Case 1, Case 2, Case 3.

Example 4.4 Next, we consider the case when the matrix A takes the form

  A(t) = [ sin 2πt          sin 2πt + b ]
         [ −(sin 2πt + b)   sin 2πt     ].

We choose the constant b in the following ways:

Case 4: b = 0, so that det(A) becomes instantaneously zero at the same value at which trace(A) changes sign, and the complex eigenvalues of A cross the y-axis at the origin;
Case 5: b = 0.05, so that the complex eigenvalues of A cross the y-axis away from the origin.

Here we can see that the characteristic shapes familiar from our earlier work are not reproduced, and further investigation is called for. We remark that (in the zoomed versions) the eigenspectrum where small solutions arise passes through the origin. This property is reproduced also for all other examples that we have tried.

FIG. 4. Left: Case 4. Right: Case 5 and (below) zoomed versions.

Example 4.5 Now we consider the case when the matrix A takes the form

  A(t) = [ t        t + b ]
         [ −t − b   t     ]

for t ∈ [−0.5, 0.5), with A(t) = A(t−1) for t ≥ 0.5. It then follows that A has complex eigenvalues that cross the y-axis at y = b when t = 0. We plot the eigenspectra for

Case 6: b = 0, so the eigenvalues of A cross the y-axis at the origin;
Case 7: b = 0.01, so the eigenvalues of A cross the y-axis away from the origin.

FIG. 5. Left: Case 6. Right: Case 7 and (below) zoomed versions.

5 Conclusions

We have seen that it is easy to extend the detection of small solutions by numerical methods from one-dimensional to two-dimensional problems where the eigenvalues are real. Initial experiments indicate that the method works also for problems possessing complex eigenvalues, but here the patterns that arise in the eigenspectra plots are unfamiliar and require further investigation. However, based on our experimental evidence, it seems that small solutions arise in the latter case if and only if the eigenspectra plots pass through the origin.

Bibliography
1. O. Diekmann, S. A. van Gils, S. M. Verduyn Lunel, H.-O. Walther, Delay Equations, Springer Verlag, New York, 1995.
2. Y. A. Fiagbedzi, Characterization of small solutions in functional differential equations, Appl. Math. Lett. 10 (1997), 97-102.
3. N. J. Ford, P. M. Lumb, Numerical approaches to delay equations with small solutions, Proceedings of HERCMA 2001, to appear.
4. N. J. Ford, S. M. Verduyn Lunel, Numerical approximation of delay differential equations with small solutions, Proceedings of 16th IMACS World Congress on Scientific Computation, Applied Mathematics and Simulation, Lausanne 2000, paper 173-3, New Brunswick, 2000 (ISBN 3-9522075-1-9).
5. N. J. Ford, S. M. Verduyn Lunel, Characterising small solutions in delay differential equations through numerical approximations, Applied Mathematics and Computation, to appear.
6. N. J. Ford, Numerical approximation of the characteristic values for a delay differential equation, MCCM Numerical Analysis Report No 350, Manchester University, 1999 (ISSN 1360 1725).
7. J. K. Hale and S. M. Verduyn Lunel, Introduction to Functional Differential Equations, Springer Verlag, New York, 1993.
8. D. Henry, Small solutions of linear autonomous functional differential equations, J. Differential Equations 8 (1970), 494-501.
9. S. M. Verduyn Lunel, A sharp version of Henry's theorem on small solutions, J. Differential Equations 62 (1986), 266-274.
10. S. M. Verduyn Lunel, Series expansions and small solutions for Volterra equations of convolution type, J. Differential Equations 85 (1990), 17-53.
11. S. M. Verduyn Lunel, private communication.

On an adaptive mesh algorithm with minimal distance control

Kamal Shanazari and Ke Chen
Department of Mathematical Sciences, The University of Liverpool, Liverpool L69 7ZL, UK. {kamals, k.chen}@liv.ac.uk
(Support of a studentship by the Ministry of Education (Iran) is gratefully acknowledged.)

Abstract
In this paper, we present a new technique for generating error-equidistributing meshes that satisfy both local quasi-uniformity and a preset minimal mesh spacing. This is firstly done in the one-dimensional case by extending the Kautsky and Nichols method [6], and then in the two-dimensional case by generalizing the tensor product methods to alternating curved line equidistributions. With the new meshing approach, we have achieved better accuracy in approximation using interpolatory radial basis functions (RBFs). Furthermore, improved accuracy in numerical results has been obtained for a class of linear and non-homogeneous PDEs solved by the dual reciprocity method (DRM).

1 Introduction

Adaptive mesh algorithms have been widely used in the numerical solution of partial differential equations (PDEs) for boundary value problems [1, 13]. One undesirable feature of an error-equidistributing mesh is that there is no guarantee of it being sufficiently smooth. For our applications of interpolation (using RBFs), the distance between points becoming too small can imply that the underlying interpolation matrix becomes ill-conditioned. In this paper, we propose a method to deal with this problem in Section 2. Essentially our method consists of modifying the error monitor function in a suitable way and then equidistributing the new function so that the minimal mesh size constraint can be satisfied. We deal with the extension of the adaptive mesh to two dimensions in Section 3. Finally, some numerical results will be given in Section 4.

2 An adaptive mesh with minimal mesh size control

In the 1D case, a typical adaptive mesh problem can be stated as follows: given a mesh (uniform or non-uniform) t₀, t₁, ..., t_m and its corresponding error values (usually estimated from the numerical solution using a monitor function [5]) f₀, f₁, ..., f_m, we wish
Note that if we were to equidistribute Z{x), the resulting mesh would not differ from Xj much; define the average value of the monitor function as d' = d'{Z)^^Y.^Zj + Zj^,)^. (2.2) Our aim now is to modify some Zj values so that the modified average value is the same as d' while the modified values ensure a preset minimal mesh size hmin is satisfied. To present our method, we note that insisting on hj > hmin implies Zj < Z where Zhmin ~ " (2-3) and Z is the critical constant to realize hmin- This points a way of modifying those large values of Zj. However it is not obvious how to ensure the new and modified average values are the same, i.e. equidistribution is maintained for the same error constant. Suppose that among the current Zj values, there are M +1 of them that are larger than Z (i.e. whose corresponding mesh size is less than hmin)', denote these values by Zk. for i = 0,1,..., M. This means that Zk^ < Z for j = M +1, M + 2,..., n. Here the sequence A;o,fci,.. .,fc„ represents a permutation of 0,1,2,.. .,n. It turns out that a suitable modification (from Zj to Zj) is the following: (i) Zkj=Z when Zkj>Z, i.e. for j = 0,1,.. .,M, M (ii) z,^=z,^+-^ J2iZk, - Z)h, ^k, ^2.4) l=M+l for j = M+l,M + 2,... where {hki+hki-i)/2 'T-ki = { ho/2 hn-i/2 when when when ki^Q,n, ki=0, ki = n. (2.5) 104 Shanazari and Chen For a simple illustration, see the plot of Fig 3b. To prove that the above modification is suitable, we first present the following result for a simple case. Theorem 2.1 Let xo,xi,...,Xn be a non-uniform mesh with the mesh sizes hj = Xj+i - Xj and Zo,Zi,. ..,Zn are the corresponding error values. If the critical constant value Z as in (2.3), and only one value Z\ > Z (i.e. M = 1 and all others Zj are less than or equal to Z), the modification (2.4) takes the following form, (i) (U) Zo = Zo, Zi-Z, Zj = Zj + ^n' r, [(^1 - ■^)(/io + /M)/2] l{hi + ft^-i)/2 forj = 2,3,.. .,n. Then the average value d = d{Z) of the modified values Zj is the same as d' = d'{Z) in (2.2). Note M = 1 here; in fact the results holds for any one value Zj > Z. Now we are readyto present the main result on equation (2.4) with regard to minimal mesh size control. Theorem 2.2 With the error function modified as in (2.4), the new mesh hj resulting from equidistribution satisfies (i) the average error value remains as d'; (ii) hj > hminHere hmin cannot be specified to be larger than h = 1/n (the uniform mesh size); practically we found hmin G [h^,h/2] is adequate. Full proofs to these results will be given in the full version of this paper [10]. In the method in (2.4), the values of ZA-^ which are less than but close to Z may become unnecessarily larger (e.g. larger than Z) and therefore we can propose a further refinement. We can keep some of the Zkj values which are between Z/2 and Z. In other words, we only modify the very large and very small values of Zkj (see plot of Fig 3b). Then our theorems are still valid but the proofs may need minor changes. Finally we summarise our adaptive method with minimal mesh size control as follows (see the plot of Fig 3b for an illustration). Algorithm 2.3. (Numerical algorithm) For given non-uniform mesh a = to,ti,..., tm = b, the error values fo,fi,...,fm,, values c and hmin'(1) Does the locally bounded mesh algorithm converge to the new m.esh a — XQ < Xi < ■ ■ ■ < Xn = b which is sub-equidistributing with respect to c and f, that is, for a sufficiently large value of the integer n such that J^ f < nc, and the inequalities / f<c, j = 0, l,...,n-l are satisfied. 
3 Extension to two dimensions
The concept of an adaptive mesh in one dimension is well known (see e.g. [5, 3]). Extension of this idea to two dimensions is not straightforward. For a given function $f(x,y)$ and a 2D domain $\Omega$, an obvious extension is to divide the domain $\Omega$ into subdomains $\Omega_i$ in such a way that
$$\iint_{\Omega_i} f(x,y)\,dx\,dy = \text{constant}. \qquad (3.1)$$

FIG. 1. In Fig (a) the monitor values corresponding to the new mesh are represented by '*' and the linear interpolation of these values is shown by '-.'; in Fig (b) the modified values of the padded function, represented by a dashed line, are compared with the original values.

FIG. 2. In Fig (a) equidistribution of slabs in the two coordinate directions and in Fig (b) the three stages of the new method are shown.

But such a partition is not unique, and furthermore satisfying condition (3.1) exactly is not simple. Consequently, this condition has to be relaxed. Among the methods given to satisfy condition (3.1) as closely as possible, two well known ones are transformation and dimension reduction. Transformation methods are based on mapping the physical domain into a simple domain with a uniform mesh and then applying the equidistribution condition to obtain an adaptive mesh in the physical domain [4, 12]. These methods are generally costly and complicated in theory. In this work we first consider the latter method, which is easier and cheaper than the former. We then present a new technique to generate a 2D mesh.

3.1 Dimension reduction
We assume that $\Omega$ is a rectangle of the form $\Omega = \{(x,y):\ a \le x \le b,\ c \le y \le d\}$. A simple idea is to produce the mesh
$$a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b, \qquad c = y_0 < y_1 < \cdots < y_{m-1} < y_m = d,$$
such that
$$\int_{x_i}^{x_{i+1}}\!\!\int_{y_0}^{y_m} f_x(x,y)\,dy\,dx = \text{constant}, \qquad (3.2)$$
$$\int_{y_j}^{y_{j+1}}\!\!\int_{x_0}^{x_n} f_y(x,y)\,dx\,dy = \text{constant}, \qquad (3.3)$$
where $f_x(x,y)$ and $f_y(x,y)$ are the monitors in the $x$ and $y$ directions respectively (see Fig. 2a). Obviously, the mesh generated by this method is quite different from the equidistributing mesh one would expect from (3.1). Another method, which leads to a non-rectangular grid, is dimensional splitting [11]. We now describe a new method of the dimension reduction type.

FIG. 3. In Fig (a) the mesh generated by the new method for the function in (3.6), and in Fig (b) the resulting mesh when the minimal mesh size is restricted to $h_{\min} = h/2$, are shown for the same function.

3.2 A new approach for a 2D mesh
The idea is based on the tensor product method and therefore leads to a non-rectangular grid. We start with a uniform mesh in a rectangular region $\Omega$ and perform the method in three stages. In the first stage, error equidistribution is performed for each line in the horizontal direction (see the first part of Fig. 2b), that is,
$$\int_{x_j}^{x_{j+1}} f_x(x, y_i)\,dx = \text{constant} \quad \text{for } i = 0, 1, \ldots, m. \qquad (3.4)$$
In the next stage, the mesh is redistributed in the vertical direction along the new grid lines (see the second part of Fig. 2b), that is,
$$\int_{s_j^{(i)}}^{s_j^{(i+1)}} f_y(x_j, y)\,dy = \text{constant} \quad \text{for } j = 0, 1, \ldots, n, \qquad (3.5)$$
where $s_j^{(i+1)} - s_j^{(i)}$ is the distance between two consecutive points $(x_j, y_i)$ and $(x_j, y_{i+1})$ along the new lines. In the final stage, equidistribution is repeated in the horizontal direction along the grid lines (the last part of Fig. 2b). One can observe that repeating this procedure usually leads to a convergent mesh. According to our experiments, the number of iterations needed to achieve convergence is at most five (see the sketch below).
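A minimal sketch of the alternating sweep is given here (our own illustrative code; it reuses the `equidistribute` helper defined earlier, performs one horizontal and one vertical pass per sweep, so the paper's third stage is the first pass of the next sweep, and assumes an arc-length-type monitor):

```python
import numpy as np

def alternating_equidistribute(u, X, Y, sweeps=5):
    """Alternate 1D equidistribution along the horizontal and vertical grid
    lines of a curvilinear mesh (X, Y), with monitor sqrt(1 + (du/ds)^2)."""
    for _ in range(sweeps):
        for i in range(X.shape[0]):     # along each (curved) horizontal line
            s = np.concatenate([[0.0], np.cumsum(np.hypot(np.diff(X[i]), np.diff(Y[i])))])
            w = np.sqrt(1.0 + np.gradient(u(X[i], Y[i]), s) ** 2)
            s_new = equidistribute(s, w, len(s) - 1)
            X[i], Y[i] = np.interp(s_new, s, X[i]), np.interp(s_new, s, Y[i])
        for j in range(X.shape[1]):     # along each (curved) vertical line
            s = np.concatenate([[0.0], np.cumsum(np.hypot(np.diff(X[:, j]), np.diff(Y[:, j])))])
            w = np.sqrt(1.0 + np.gradient(u(X[:, j], Y[:, j]), s) ** 2)
            s_new = equidistribute(s, w, len(s) - 1)
            X[:, j], Y[:, j] = np.interp(s_new, s, X[:, j]), np.interp(s_new, s, Y[:, j])
    return X, Y
```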
The mesh resulting from this procedure for the function
$$u(x,y) = e^{(x^2 - y^2)} \qquad (3.6)$$
when applying the arc-length monitor is shown in Figure 3a. The idea of controlling the mesh size can also be applied in this technique. The mesh generated for the same function, when the mesh sizes are restricted to $h_{\min} = h/2$ with $h$ the mesh size of the uniform mesh, is given in Figure 3b.

4 Numerical examples
In this part the effect of adapting the mesh on the accuracy of interpolation and of the DRM is considered. In the following examples, the infinity norm has been used to measure the accuracy; that is, if $u$ and $\hat u$ are the exact and approximate values respectively, then the error is calculated as
$$e_\infty = \|u(x) - \hat u(x)\|_\infty = \max_x |u(x) - \hat u(x)|.$$
A polynomial RBF, $1 + r^3$, has been employed in this work.

FIG. 4. The meshes resulting from the new method for the functions in Examples 1 and 2 are shown in Figures (a) and (b) respectively.

TAB. 1. The interpolation error for Examples 1-2 using the adaptive mesh with and without control of the mesh sizes.

                     uniform    adaptive, with control      adaptive, without control
  stage                 -       first    second   third     first    second   third
  Function (E1)      5.1E-2     5.4E-3   5.4E-3   3.8E-3    1.4E-2   2.2E-2   1.8E-2
  Derivative         9.5E-1     1.6E-1   3.0E-1   3.0E-1    9.9E-2   7.5E-1   6.0E-1
  Function (E2)      1.3E-2     2.5E-3   2.1E-3   3.7E-3    2.5E-3   2.1E-3   4.5E-3
  Derivative         2.2E-1     1.3E-2   1.0E-1   1.0E-1    1.5E-2   1.0E-1   1.2E-1

Example 4.1 We check interpolation in terms of the RBFs for the function
$$u(x,y) = (1 - e^{x-3})\sin(1.5\pi y) \qquad (4.1)$$
in a rectangular domain. The mesh generated for this function is shown in Figure 4a. Table 1 shows the effect of the adaptive mesh on the interpolation accuracy with and without control of the mesh sizes. As one can observe, using the adaptive mesh considerably improves the accuracy in comparison with the uniform mesh. Moreover, the result in the case of controlling the minimal mesh size is better still.
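A minimal sketch of the interpolation step follows, assuming the cubic-type basis $\phi(r) = 1 + r^3$ named above (our own illustrative code; the authors' DRM coupling is not reproduced here):

```python
import numpy as np

def rbf_fit(centers, values):
    """Solve the interpolation system A c = u with A_ij = 1 + |x_i - x_j|^3.
    A dense solve; conditioning degrades as points coalesce, which is what
    the minimal-spacing control is meant to avoid."""
    r = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    return np.linalg.solve(1.0 + r ** 3, values)

def rbf_eval(coef, centers, x):
    r = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
    return (1.0 + r ** 3) @ coef

# usage on the function of Example 4.1
pts = np.random.default_rng(0).uniform(0, 1, (200, 2))
u = (1 - np.exp(pts[:, 0] - 3)) * np.sin(1.5 * np.pi * pts[:, 1])
c = rbf_fit(pts, u)
grid = np.random.default_rng(1).uniform(0, 1, (500, 2))
exact = (1 - np.exp(grid[:, 0] - 3)) * np.sin(1.5 * np.pi * grid[:, 1])
print(np.max(np.abs(rbf_eval(c, pts, grid) - exact)))   # the error e_infinity
```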
Example 4.2 In this example we first check the function $f_2(x,y) = 0.5 - 0.5\tanh(-4 + 16x^2 + 16y^2)$ and then solve the linear PDE $\nabla^2 u + 2\,\partial u/\partial x + \partial u/\partial y + u = d$ with a Dirichlet boundary condition over the elliptic domain $x^2 + 4y^2 = 4$, where $d$ is a known function such that the exact solution is $u(x,y) = f_2(x,y)$. Again from Table 1 we see improved approximation. We apply the DRM [7] for the solution, where the domain integrals are approximated using RBF interpolation. The adaptive mesh for this function is given in Fig. 4b and has been observed to give rise to an improved DRM solution.

5 Conclusions
We considered a new algorithm for producing a locally bounded mesh with a preset minimal mesh size. Such a mesh is used to overcome the ill-conditioning problems associated with radial basis function interpolation. Extension of the idea to the 2D case is also considered. Some preliminary and improved numerical results are given.

Bibliography
1. Ainsworth, M. and Oden, T. J., A Posteriori Error Estimation in Finite Element Analysis, John Wiley, 2000.
2. Beckett, G., Mackenzie, J. A., Ramage, A. and Sloan, D. M., On the numerical solution of one-dimensional PDEs using adaptive methods based on equidistribution, Journal of Computational Physics, 167 (2), 372-392, 2001.
3. Carey, G. F. and Dinh, H. T., Grading Functions and Mesh Redistribution, SIAM J. Numer. Anal., 22 (5), 1028-1040, 1985.
4. Chen, K., Two-Dimensional Adaptive Quadrilateral Mesh Generation, Communications in Numerical Methods in Engineering, 10, 815-825, 1994.
5. Chen, K., Error Equidistribution and Mesh Adaptation, SIAM J. Sci. Comput., 15 (4), 798-818, 1994.
6. Kautsky, J. and Nichols, N. K., Equidistributing Meshes with Constraints, SIAM J. Sci. Statist. Comput., 1 (4), 499-511, 1980.
7. Partridge, P. W., Brebbia, C. A. and Wrobel, L. C., The Dual Reciprocity Boundary Element Method, Computational Mechanics Publications, 1992.
8. Pereyra, V. and Sewell, E. G., Mesh selection for discrete solution of boundary problems in ordinary differential equations, Numer. Math., 23, 261-268, 1975.
9. Profit, A., Chen, K. and Amini, S., Application of the DRBEM with Adaptive Internal Points to Nonlinear Dopant Diffusion, Proc. 2nd UK BIE Conf., Brunel University Press, 1999.
10. Shanazari, K. and Chen, K., On an adaptive mesh algorithm with minimal distance control for the dual reciprocity method. In preparation.
11. Sweby, P. K., Data-Dependent Grids, Numerical Analysis Report 7/87, University of Reading, UK, 1987.
12. Thompson, J. F., Warsi, Z. U. A. and Mastin, C. W., Numerical Grid Generation: Foundations and Applications, North-Holland, 1985.
13. White, A. B., On selection of equidistribution meshes for two-point boundary value problems, SIAM J. Numer. Anal., 19, 472-502, 1979.

An alternative approach for solving Maxwell equations

Wolfgang Sproessig
Freiberg University of Mining and Technology, Germany. sproessig@math.tu-freiberg.de

Ezio Venturino
Politecnico di Torino, Italia. egvv@calvino.polito.it

Abstract
At present the use of hypercomplex methods is pursued by a growing number of mathematicians, physicists and engineers. Quaternionic and Clifford calculus can be applied to wide classes of problems in very different fields of science. We explain Maxwell equations within the geometric algebras of real and complex quaternions. The connection between Maxwell equations and the Dirac equation will be elaborated.
Using the Teodorescu transform, we deduce an iteration procedure for solving weak time-dependent Maxwell equations in isotropic homogeneous media. Assuming the so-called Drude-Born-Feodorov constitutive laws, Maxwell equations in chiral media are deduced. Full time-dependent problems are reduced to the consideration of Weyl operators.

1 Historically oriented introduction
Classical Maxwell equations were discovered in the second half of the nineteenth century as a result of the stormy development of electromagnetic research at that time. The study of these equations has attracted generations of physicists and mathematicians, but some of their secrets are still hidden. At about the same time, new algebraic structures were also invented. W. R. Hamilton discovered in 1843 the algebra of real quaternions as a generalization of the field of complex numbers. Under the influence of H. Grassmann's extension theory and Hamilton's quaternions, W. K. Clifford created in 1878 a geometric algebra, which is nowadays called Clifford algebra. Its construction starts with a basis in the signed $\mathbb{R}^n = \mathbb{R}^{p,q}$ with units $e_1, \ldots, e_n$. Assume that $e_i^2 = -1$ for $i = 1, \ldots, q$, and $e_j^2 = 1$ for $j = 1, \ldots, p$, as well as the anticommutator relation
$$e_i e_j + e_j e_i = 0 \quad \text{for } i \neq j.$$
Together with $e_0 = 1$ one can construct a basis of the $2^n$-dimensional standard Clifford algebra $Cl_{p,q}$. Incidentally, in 1954 C. Chevalley [5] showed that each Clifford number, i.e. each element of $Cl_{p,q}$, can be identified with an antisymmetric tensor.

Let us go back to the electromagnetic field equations. Already J. C. Maxwell [15] himself and W. R. Hamilton [10] used these new algebraic techniques to try to simplify Maxwell's equations. The aim was to obtain an equation of the type $Du + au = F$ with suitable operators $D$ and $a$. For this reason Hamilton introduced his "Nabla operator" as well as the notion of a "vector". The tendency towards the algebraisation of physics continued in the first half of the last century, and a long list of important publications was devoted to this topic. We stress here only some of the milestones, beginning with the "Theory of Relativity" by L. Silberstein (1914) [18] and H. Weyl's book "Raum-Zeit-Materie" of 1921. Important results of Einstein/Mayer, Lanczos and Proca followed. In 1935 this development culminated in the thesis of M. Mercier (Geneva) [16]. After the reinvention of the concept of "spinors", which first appeared in 1911 in a paper by E. Cartan, D. Hestenes [11, 12, 13], F. Bolinder [3] and M. Riesz [17] wrote fundamental algebra papers with applications in electromagnetic theory, using the framework of Clifford numbers and spinor spaces. Meanwhile, in the late thirties the famous Swiss mathematician R. Fueter and his coworkers and followers used a function-theoretic approach for the same problems. These ideas were refreshed and fruitfully extended by R. Delanghe and his group and by A. Sudbery in the seventies and early eighties (cf. [4, 20]). Influenced by the success of complex analysis and Vekua theory, a generalized operator theory with corresponding singular integral operators [19] and a corresponding hypercomplex theory for boundary value problems of elliptic partial differential equations were developed [8, 9]. Making use of a transformation of Maxwell's equations into a system with homogeneous coordinates, we will propose an alternative solution method.
2 Maxwell equations
Let $G$ be a bounded domain with sufficiently smooth boundary $\Gamma$ that is filled with an isotropic homogeneous material. Using Gaussian units, the Maxwell equations read as follows:
$$c\,\mathrm{rot}\,H = 4\pi J + \partial_t D \quad \text{(Biot-Savart-Ampere's law)}$$
$$c\,\mathrm{rot}\,E = -\partial_t B \quad \text{(Faraday's law)}$$
$$\mathrm{div}\,D = 4\pi\rho \quad \text{(Coulomb's law)}$$
$$\mathrm{div}\,B = 0 \quad \text{(no free magnetic charge)}$$
Furthermore, the continuity condition has to be fulfilled:
$$\mathrm{div}\,J = -\partial_t\rho,$$
where $E = E(t,x)$ is the electric field, $H = H(t,x)$ the magnetic field, $J = J(t,x)$ the electric current density, $D = D(t,x)$ the electric flux density, $B = B(t,x)$ the magnetic flux density, $\rho = \rho(t,x)$ the charge density, and $c$ is the speed of light in a vacuum.

The relations between the flux densities and the electric and magnetic fields depend on the material. It is well known, for instance, that all organic materials contain carbon and in this way realize some kind of optical activity. Therefore Lord Kelvin introduced the notion of the chirality measure of a medium; this coefficient expresses the optical activity of the underlying material. The corresponding constitutive laws are the following (Drude-Born-Feodorov laws):
$$D = \varepsilon(E + \beta\,\mathrm{rot}\,E), \qquad B = \mu(H + \beta\,\mathrm{rot}\,H),$$
where $\varepsilon = \varepsilon(t,x)$ is the electric permittivity, $\mu = \mu(t,x)$ is the magnetic permeability, and the coefficient $\beta$ describes the chirality measure of the material. In isotropic cases one also has the possibility to use the so-called Tellegen representation
$$D = \varepsilon E + \alpha H, \qquad B = \mu H + \alpha^* E.$$
The connection between the electric field $E$ and the current density $J$ is given by
$$J = \sigma E + \sigma g,$$
where $\sigma$ is the electric conductivity and $g$ a given electric source. Starting with $\beta = 0$ and replacing $D$ and $B$ by $D = \varepsilon E$ and $B = \mu H$, we get in the case $\varepsilon = \varepsilon(x)$, $\mu = \mu(x)$:
$$-\varepsilon\,\partial_t E + c\,\mathrm{rot}\,H = 4\pi J, \qquad (2.1)$$
$$\mu\,\partial_t H + c\,\mathrm{rot}\,E = 0, \qquad (2.2)$$
$$\varepsilon\,\mathrm{div}\,E = 4\pi\rho - (\nabla\varepsilon\cdot E), \qquad (2.3)$$
$$\mu\,\mathrm{div}\,H = -(\nabla\mu\cdot H). \qquad (2.4)$$
After summing (2.1) and (2.4) as well as (2.2) and (2.3) we obtain
$$-\varepsilon\,\partial_t E + c\,\mathrm{rot}\,H + \mu\,\mathrm{div}\,H = -(\nabla\mu\cdot H) + 4\pi J, \qquad (2.5)$$
$$\mu\,\partial_t H + c\,\mathrm{rot}\,E + \varepsilon\,\mathrm{div}\,E = -(\nabla\varepsilon\cdot E) + 4\pi\rho. \qquad (2.6)$$
In the case of $\varepsilon, \mu$ being constants we can introduce new functions $E, H$ defined on a homogeneous space with a first coordinate $x_0$ and the other coordinates $x = (x_1, x_2, x_3)$:
$$E(t,x) =: E\Big(-\frac{c}{\varepsilon}\,t,\ x\Big), \qquad H(t,x) =: H\Big(-\frac{c}{\mu}\,t,\ x\Big).$$
The equations (2.5)-(2.6) then transform into
$$\partial_0 E + \mathrm{rot}\,H + \mu c\,\mathrm{div}\,H = 4\pi J, \qquad \partial_0 H + \mathrm{rot}\,E + \varepsilon c\,\mathrm{div}\,E = 4\pi\rho.$$

3 Quaternionic representations
Let $e_1, e_2, e_3$ be the generating units of the algebra of real quaternions $\mathbb{H}$, which fulfil the conditions
$$e_i e_j + e_j e_i = -2\delta_{ij} \qquad (i,j = 1,2,3).$$
This leads to the following multiplication rule for two quaternions $u = u_0 + \underline{u}$, $v = v_0 + \underline{v}$:
$$uv = u_0 v_0 - \underline{u}\cdot\underline{v} + u_0\underline{v} + v_0\underline{u} + \underline{u}\times\underline{v} \qquad (v_i\in\mathbb{R}),$$
where $\underline{u} = u_1e_1 + u_2e_2 + u_3e_3$ and $\underline{v} = v_1e_1 + v_2e_2 + v_3e_3$. Further, let $u = u_0 + \underline{u}$ be a quaternion; then $\bar u = u_0 - \underline{u}$ is called its conjugate quaternion. The operator defined by
$$D = \partial_1 e_1 + \partial_2 e_2 + \partial_3 e_3$$
is called the Dirac operator. It acts on a quaternion-valued function as follows:
$$Du = -\mathrm{div}\,\underline{u} + \mathrm{rot}\,\underline{u} + \mathrm{grad}\,u_0.$$
With the multiplication operator $m_\theta$,
$$m_\theta u = \theta u_0 + \underline{u}, \qquad u = u_0 + \underline{u},\ \theta\in\mathbb{R}^+,\ \underline{u} = u_1e_1 + u_2e_2 + u_3e_3,$$
we obtain
$$m_{\mu c}(\partial_0 E + DH) = 4\pi J, \qquad m_{\varepsilon c}(\partial_0 H + DE) = 4\pi\rho,$$
and so
$$\partial_0 E + DH = 4\pi\,m_{\mu c}^{-1}J, \qquad \partial_0 H + DE = 4\pi\,m_{\varepsilon c}^{-1}\rho.$$
Finally, we get
$$\partial(E+H) = \partial_0(E+H) + D(E+H) = 4\pi\big(m_{\mu c}^{-1}J + m_{\varepsilon c}^{-1}\rho\big) =: F_1,$$
$$\bar\partial(E-H) = \partial_0(E-H) - D(E-H) = 4\pi\big(m_{\mu c}^{-1}J - m_{\varepsilon c}^{-1}\rho\big) =: F_2,$$
where $\partial = \partial_0 + D$ is also called the Weyl operator and $\bar\partial = \partial_0 - D$ is its conjugate. By the way, a function $u$ is called quaternionic regular if $\partial u = 0$ and quaternionic anti-regular if $\bar\partial u = 0$.
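The quaternion product rule above is easy to exercise numerically; the following is a small self-contained sketch (our own illustrative code) that represents a quaternion as a pair `(u0, u_vec)` and checks the anticommutation relation $e_ie_j + e_je_i = -2\delta_{ij}$.

```python
import numpy as np

def qmul(u, v):
    """Product of quaternions u = (u0, u_vec), v = (v0, v_vec):
    uv = u0*v0 - u.v  +  u0*v_vec + v0*u_vec + u_vec x v_vec."""
    u0, uv = u
    v0, vv = v
    return (u0 * v0 - np.dot(uv, vv),
            u0 * vv + v0 * uv + np.cross(uv, vv))

e = [(0.0, np.eye(3)[i]) for i in range(3)]   # the units e1, e2, e3
for i in range(3):
    for j in range(3):
        s0, sv = qmul(e[i], e[j])
        t0, tv = qmul(e[j], e[i])
        assert np.isclose(s0 + t0, -2.0 * (i == j)) and np.allclose(sv + tv, 0)
print("e_i e_j + e_j e_i = -2 delta_ij verified")
```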
For simplicity we set $E + H =: v$ and $E - H =: w$. Then it follows that
$$\partial v = F_1(v,w), \qquad (3.1)$$
$$\bar\partial w = F_2(v,w). \qquad (3.2)$$
Let us have a closer look at the functions $F_1, F_2$. The electric current density $J$ is given by
$$J = \sigma E + \sigma g,$$
where $E$ and $g$ are vector functions. This leads to the following simplification:
$$F_1 = 4\pi\Big(\sigma(E+g) + \frac{\rho}{\varepsilon c}\Big) = 2\pi\sigma(v+w) + 4\pi\Big(\sigma g + \frac{\rho}{\varepsilon c}\Big),$$
$$F_2 = 4\pi\Big(\sigma(E+g) - \frac{\rho}{\varepsilon c}\Big) = 2\pi\sigma(v+w) + 4\pi\Big(\sigma g - \frac{\rho}{\varepsilon c}\Big).$$
Thus
$$\partial v = F_1(v,w), \qquad (3.3)$$
$$\bar\partial w = F_2(v,w). \qquad (3.4)$$

4 Integral representation
Let $G$ be a bounded domain in $\mathbb{R}^3$ and $a$ a positive constant. We consider in $\mathbb{R}^4$ the cylinder $Z = G\times[-a,a]$. A right inverse of the Weyl operator is the following Teodorescu transform:
$$(T_Z u)(x) = -\frac{1}{\sigma_3}\int_Z e(x-y)\,u(y)\,dy, \qquad Z = G\times[-a,a],$$
with $e(x) = x/|x|^3$ and $\sigma_3 = 2\pi^{3/2}/\Gamma(3/2)$. We obtain in a straightforward manner
$$\partial\,T_Z u = \begin{cases} u & \text{in } Z,\\ 0 & \text{in } \mathbb{R}^4\setminus Z,\end{cases} \qquad\text{and}\qquad T_Z\,\partial u = \begin{cases} u - \phi_Z & \text{in } Z,\\ 0 & \text{in } \mathbb{R}^4\setminus Z,\end{cases}$$
with $\phi_Z\in\ker\partial$. In complete analogy a conjugate Teodorescu transform $T_Z^*$ is introduced; we just have to replace $e(x)$ by its conjugate. It now follows from (3.3)-(3.4) that
$$v = T_Z F_1(v,w) + \phi_Z \quad (\partial\phi_Z = 0), \qquad w = T_Z^* F_2(v,w) + \phi_Z^* \quad (\bar\partial\phi_Z^* = 0).$$
Furthermore we have to introduce Cauchy-Bizadse-type operators, which are defined by the boundary data. These operators read as follows:
$$(F_{\partial Z} u)(x) = \frac{1}{\sigma_3}\int_{\partial Z} e(x-y)\,n(y)\,u(y)\,d(\partial Z)_y \qquad (x\notin\partial Z)$$
and
$$(F_{\partial Z}^* u)(x) := \frac{1}{\sigma_3}\int_{\partial Z} \bar e(x-y)\,\bar n(y)\,u(y)\,d(\partial Z)_y \qquad (x\notin\partial Z),$$
where $n(y) = (n_0 + \underline{n})(y)$ denotes the unit vector of the outer normal on $\partial Z$ at the point $y$.
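To make the Teodorescu transform concrete, here is a crude quadrature sketch (entirely our own construction, assuming the kernel form $e(x) = x/|x|^3$ read off above and a midpoint rule that simply skips the weakly singular cell; production codes use singularity-corrected quadrature):

```python
import numpy as np

def teodorescu(u, nodes, vol, x, sigma3=4.0 * np.pi):
    """Midpoint-rule approximation of (T u)(x) = -(1/sigma3) * integral of
    e(x - y) u(y) dy for a quaternion-valued density u sampled at nodes.
    u: (N, 4) array of quaternions (u0, u1, u2, u3); nodes: (N, 3); vol: cell volume."""
    out = np.zeros(4)
    for y, uy in zip(nodes, u):
        d = x - y
        r = np.linalg.norm(d)
        if r < 1e-12:
            continue                        # skip the singular cell
        kv = d / r ** 3                     # pure-vector kernel e(x - y)
        u0, uv = uy[0], uy[1:]
        # quaternion product (0, kv) * (u0, uv)
        out += np.concatenate([[-kv @ uv], u0 * kv + np.cross(kv, uv)])
    return -vol / sigma3 * out
```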
Setting Po = 6~^ we obtain DSDHi=^^=p', i.e. AHi = -f. If boundary values of iJi (trr/fi) are known i.e. trrHi = g the complete solution is given by Hi=Frg + TGVsDh + TGQsSTGf. (5.5) Here Vs and Qs are orthoprojections on subspaces in the quaternionic Hilbert space L2{G), namely L2{G) = SkeTDnL2{G)QDW2iG). s An alternative approach for solving Maxwell equations The scalar product is defined by iuSvdG e M. {u,v)5 := G The operator Vs can be seen as a generalized Bergman projection. In the representation formula from above is Fr the Cauchy-Bizadse operator on T and h a smooth continuation of g into G. Note that Vs and Qs can be explicitly defined (cf. [9])! Then Ei = -^VsDh + QsSTGf. 4TTK Let us prove that the boundary condition is fulfilled! Indeed, QsTof = Df with TGDf = f-Frf^O / €1^2 i-e- trrf = 0. (Borel-Pompeiu's formula). On the other hand, Plemelj-Sokhotzkij's formulae yield: trrHi = Prg + trrVsDh = Prg + trrTDh-trrTQsDh = Pr9 + 9-Pr9 + 0 = 9- Pr is the so-called Plemelj-projection onto that Hardy space of iff-regular extendible functions into G. Bibliography 1. H. Bahmann, K. Guerlebeck , M. Shapiro and W. Sproessig, On a modified Teodorescu transform. Integral Transforms and Special Functions 12 (2001), 213-226. 2. A. W. Bitsadze, On two-dimensional integrals of Cauchy-type, Akademii Nauk Grus. 55i? 16 (1955), 177-184 (Russian). 3. E. F. Bolinder, The classical electromagnetic equations expressed as complex fourdimensional quantities. J. Franklin Inst. 263 (1957), 213-223. 4. F. Brackx, R. Delanghe and F. Sommen, Clifford analysis, Pitman Research Notes m Mai/i., Boston, London, Melbourne, 1982. 5. C. Chevalley, The algebraic theory of spinors, Columbia University Press, New York, 1954. 6. W. K. Clifford, Applications of Grassmann's extensive algebra. Americ. J. of Math. Pure and Appl. 1 (1878), 350-358. 7. R. Fueter, Analytische Theorie einer Quaternionenvariablen. Comment. Math. Helv. 4 (1932), 9-20. 8. K. Guerlebeck and W. Sproessig, Quaternionic Analysis and Boundary Value Problems, Birkhuser Verlag, Basel, 1990. 9. K. Guerlebeck and W. Sproessig , Quaternionic and Clifford calculus for physicists and engineers. Mathematical Methods in Practice Vol. 1, John Wiley &: Sons, 1997. 117 118 Wolfgang Sproessig and Ezio Venturino 10. W. R. Hamilton, Elements of Quaternions (2 Vols), Chelsea, (reprint 1969) 1866. 11. D. Hestenes, Space-Time Algebra, Gordon and Breach, New York, 1966. 12. D. Hestenes, New foundations for classical mechanics, Reidel, Dordrecht, Boston, 1985. 13. D. Hestenes and G. Sobzyk, Clifford algebras for mathematics and physics, Reidel, Dordrecht, 1985. 14. V. V. Kravchenko and M. Shapiro, Integral representations for spatial models of mathematical physics. Pitman Research Notes in Math. Series 351, 1996. 15. J. C. Maxwell, The Scientific Papers (2 Vols), Dover, 1969. 16. M. Mercier, Expression des Equations de lectromagnetisme au moyen des nombres au Chfford, Thesis Nr. 953, University of Geneva, 1935. 17. M. Riesz, Clifford Numbers and Spinors, Lecture Series 38, Maryland, 1958. 18. L. Silberstein, The theory of relativity, Macmillan, London, 1914. 19. W. Sproessig and E. Venturino, The treatment of window problems by transform methods, Zeitschrift fur Analysis und Anwendungen 12 (1996), 6A3-654. 20. A. Sudbery, Quaternionic analysis, Math. Proc. Cambr. Phil. Soc. 85 (1979), 199225. 
Chapter 3

Metrology

Orthogonal distance fitting of parametric curves and surfaces

Sung Joon Ahn, Engelbert Westkamper, and Wolfgang Rauh
Fraunhofer Institute for Manufacturing Engineering and Automation (IPA), Nobelstr. 12, 70569 Stuttgart, Germany. {sja, wke, wor}@ipa.fhg.de

Abstract
Fitting of parametric curves and surfaces to a set of given data points is a relevant subject in various fields of science and engineering. In this paper we review the current orthogonal distance fitting algorithms for parametric models in a well organized and easily understandable manner, and present a new algorithm. Each of these algorithms estimates the model parameters by minimizing the square sum of the error distances between the model feature and the given data points. The model parameters are grouped and simultaneously estimated in terms of form, position, and rotation parameters. The form parameters determine the shape of the model feature, and the position/rotation parameters describe the rigid body motion of the model feature. The new algorithm is applicable to any kind of parametric curve and surface. We give fitting examples for a circle, a cylinder, and a helix in space.

1 Introduction
The use of parametric curves and surfaces is very common, and model fitting to a set of given data points is a relevant subject in various fields of science and engineering. For the fitting of curves and surfaces, orthogonal distance fitting is of primary concern because of the applied error definition, namely the shortest distance from the given point to the model feature [5, 9]. While there are orthogonal distance fitting algorithms for explicit [3] and implicit models [2, 7] in the literature, we consider in this paper fitting algorithms for parametric models [4, 6, 8, 10, 11] (Fig. 1). The goal of orthogonal distance fitting is the estimation of the model parameters minimizing the performance index
$$\sigma_0^2 = (X - X')^T P^T P\,(X - X') \qquad (1.1)$$
or
$$\sigma_0^2 = d^T P^T P\,d, \qquad (1.2)$$
where $X^T = (X_1^T, \ldots, X_m^T)$ and $X'^T = (X_1'^T, \ldots, X_m'^T)$ are the coordinate vectors of the $m$ given points and of the $m$ corresponding points on the model feature, respectively. Moreover, $d^T = (d_1, \ldots, d_m)$ is the distance vector with $d_i = \|X_i - X_i'\|$, and $P^T P$ is the weighting matrix. We call the fitting algorithms based on the performance indexes (1.1) and (1.2) the coordinate-based algorithm and the distance-based algorithm, respectively.

FIG. 1. Parametric features, and the orthogonal contacting point $x_i'$ in frame $xyz$ from the given point $X_i$ in frame $XYZ$: (a) curve; (b) surface.

In this paper, the model parameters $a$ are grouped and simultaneously estimated in three categories. First, the form parameters $a_g$ (e.g. the three axis lengths $a, b, c$ of an ellipsoid) describe the shape of the standard model feature defined in the model coordinate system $xyz$ (Fig. 1):
$$x = x(a_g, u) \quad\text{with}\quad a_g = (a_1, \ldots, a_l)^T. \qquad (1.3)$$
The form parameters are invariant to the rigid body motion of the model feature.
The second and the third parameter groups, respectively the position parameters $a_p$ and the rotation parameters $a_r$, describe the rigid body motion of the model feature in the machine coordinate system $XYZ$:
$$X = R^{-1}x + X_0 \quad\text{or}\quad x = R(X - X_0), \qquad (1.4)$$
where
$$R = R_\kappa R_\varphi R_\omega = (r_1\ r_2\ r_3)^T, \qquad R^{-1} = R^T, \qquad a_r = (\omega, \varphi, \kappa)^T, \qquad a_p = X_0 = (X_0, Y_0, Z_0)^T.$$
A subproblem of the orthogonal distance fitting of a parametric model is the finding of the location parameters $\{u_i\}_{i=1}^m$ which represent the nearest points $\{X_i'\}_{i=1}^m$ on the model feature from each given point $\{X_i\}_{i=1}^m$. The model parameters $a$ and the location parameters $\{u_i\}_{i=1}^m$ are generally estimated through iteration. By the total method [6, 10], $a$ and $\{u_i\}_{i=1}^m$ are simultaneously determined, while they are estimated separately by the variable-separation method [4, 8, 11] in a nested iteration scheme. There are four possible combinations of algorithmic approaches, as shown in Table 1. One of the algorithmic approaches in Table 1 results in an obviously underdetermined linear system for iteration and thus has no practical application. We describe and compare the three realistic algorithmic approaches in the following sections.

TAB. 1. Orthogonal distance fitting algorithms for parametric models.

                               Coordinate-based algorithm    Distance-based algorithm
  Total method                 I (ETH [6, 10])               Underdetermined system
  Variable-separation method   III (FhG, this paper)         II (NPL [4, 11])

2 Orthogonal distance fitting algorithm I (ETH)
The ETH algorithm [6, 10] is based on the performance index (1.1) and simultaneously estimates the model parameters $a$ and the location parameters $\{u_i\}_{i=1}^m$ of the nearest points on the model feature. We introduce the new estimation parameter vector $b$ containing $a$ and $\{u_i\}_{i=1}^m$ as follows:
$$b^T = (a^T, u_1^T, \ldots, u_m^T) = (a_g^T, a_p^T, a_r^T, u_1^T, \ldots, u_m^T).$$
The parameter vector $b$ minimizing the performance index (1.1) can be determined by the Gauss-Newton method
$$P\,\frac{\partial X'}{\partial b}\,\Delta b = P(X - X')\Big|_k, \qquad b_{k+1} = b_k + \alpha\,\Delta b, \qquad (2.1)$$
with the Jacobian matrix of each point $X_i'$ on the model feature, from (1.3) and (1.4),
$$J_{X_i',b} = \frac{\partial X'}{\partial b}\Big|_{x=x_i'} = \Big(R^{-1}\frac{\partial x}{\partial a_g}\ \Big|\ I\ \Big|\ \frac{\partial R^{-1}}{\partial a_r}\,x\ \Big|\ 0, \ldots, R^{-1}\frac{\partial x}{\partial u_i}, \ldots, 0\Big)\Big|_{x=x_i'}.$$
A disadvantage of the ETH algorithm is that the storage space and the computing time cost increase very rapidly with the number of data points, unless the sparse linear system (2.1) is first handled by a sparse matrix algorithm.

3 Orthogonal distance fitting algorithm II (NPL)
The NPL algorithm [4, 11] is based on the performance index (1.2) and separately estimates the model parameters $a$ and the location parameters $\{u_i\}_{i=1}^m$ in a nested iteration scheme:
$$\min_a\ \min_{\{u_i\}_{i=1}^m}\ \sigma_0^2\big(\{X_i'(a, u_i)\}_{i=1}^m\big).$$
The inner iteration determines the location parameters $\{u_i\}_{i=1}^m$ of the minimum distance points $\{X_i'\}_{i=1}^m$ on the current model feature from each given point $\{X_i\}_{i=1}^m$, and the outer iteration updates the model parameters. In this paper, in order to implement the parameter grouping $a^T = (a_g^T, a_p^T, a_r^T)$, we have modified the initial NPL algorithm.

3.1 Orthogonal contacting point
For each given point $x_i = R(X_i - X_0)$ in frame $xyz$, we determine the orthogonal contacting point $x_i'$ on the standard model feature (1.3). Then the orthogonal contacting point $X_i'$ in frame $XYZ$ to the given point $X_i$ is obtained through a backward transformation of $x_i'$ into $XYZ$.
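The parameterization (1.4) is straightforward to code; the following is a minimal sketch (our own, with a common z-y-x Euler-angle convention assumed for $R = R_\kappa R_\varphi R_\omega$, since the paper does not spell out the axis assignment):

```python
import numpy as np

def rotation(omega, phi, kappa):
    """R = R_kappa R_phi R_omega: successive rotations about the x, y, z axes
    (one common convention; an assumption, not taken from the paper)."""
    cw, sw = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    Rw = np.array([[1, 0, 0], [0, cw, -sw], [0, sw, cw]])
    Rp = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rk = np.array([[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]])
    return Rk @ Rp @ Rw

def to_model_frame(X, X0, ar):
    """x = R (X - X0), the forward transformation of (1.4)."""
    return rotation(*ar) @ (np.asarray(X) - np.asarray(X0))

def to_machine_frame(x, X0, ar):
    """X = R^T x + X0, using R^{-1} = R^T."""
    return rotation(*ar).T @ np.asarray(x) + np.asarray(X0)
```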
We search for the location parameters $u$ which minimize the error distance between the given point $x_i$ and the corresponding point $x$ on the model feature (1.3):
$$d^2 = \big(x_i - x(a_g,u)\big)^T\big(x_i - x(a_g,u)\big). \qquad (3.1)$$
The first order necessary condition for a minimum of (3.1) as a function of $u$ is
$$f(u) := \frac{\partial x}{\partial u}^T\big(x_i - x(a_g,u)\big) = 0. \qquad (3.2)$$
The condition (3.2) means that the error vector $(x_i - x)$ and the surface tangent vectors $\partial x/\partial u$ at $x$ should be orthogonal. We solve (3.2) for $u$ by using the Newton method (how to derive the Jacobian matrix $\partial f/\partial u$ is shown in Section 4):
$$\frac{\partial f}{\partial u}\,\Delta u = -f(u)\Big|_k, \qquad u_{k+1} = u_k + \alpha\,\Delta u. \qquad (3.3)$$

3.2 Orthogonal distance fitting
We update the model parameters $a$ minimizing the performance index (1.2) by using the Gauss-Newton method (outer iteration):
$$P\,\frac{\partial d}{\partial a}\,\Delta a = -P\,d\Big|_k, \qquad a_{k+1} = a_k + \alpha\,\Delta a.$$
From $d_i = \|X_i - X_i'\|$ and equations (1.3) and (1.4), we derive the Jacobian matrix of each orthogonal distance $d_i$:
$$J_{d_i,a} = \frac{\partial d_i}{\partial a} = -\frac{(X_i - X_i')^T}{\|X_i - X_i'\|}\,\frac{\partial X'}{\partial a}, \qquad \frac{\partial X'}{\partial a} = R^{-1}\Big(\frac{\partial x}{\partial a} + \frac{\partial x}{\partial u}\frac{\partial u}{\partial a}\Big) + \frac{\partial R^{-1}}{\partial a}\,x + \frac{\partial X_0}{\partial a}.$$
With (1.4) and (3.2) at $u = u_i'$, the term involving $\partial u/\partial a$ vanishes, since $(X_i - X_i')^T R^{-1}\,\partial x/\partial u = 0$, and
$$J_{d_i,a} = -\frac{(X_i - X_i')^T}{\|X_i - X_i'\|}\Big(R^{-1}\frac{\partial x}{\partial a_g}\ \Big|\ \frac{\partial X_0}{\partial a_p}\ \Big|\ \frac{\partial R^{-1}}{\partial a_r}\,x\Big)$$
is the resultant Jacobian matrix for $d_i$. A drawback of the NPL algorithm is that the convergence and the accuracy of 3D-curve fitting (e.g. fitting of a circle in space) are relatively poor; 2D-curve fitting and surface fitting with the NPL algorithm do not suffer from such problems.
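The nested scheme of (3.1)-(3.3) is easy to demonstrate on a feature whose contacting point is known in closed form; the sketch below (our own illustrative code, not the NPL implementation) fits a plane circle with parameters $a = (r, x_0, y_0)$ by a plain Gauss-Newton iteration on the distances.

```python
import numpy as np

def fit_circle(X, a, iters=20):
    """Distance-based Gauss-Newton fit of a plane circle, a = [r, x0, y0].
    For a circle the 'inner iteration' is explicit: the contacting point
    lies on the ray from the centre to the datum."""
    a = np.asarray(a, float)
    for _ in range(iters):
        r, c = a[0], a[1:]
        diff = X - c                          # vectors centre -> data
        rho = np.linalg.norm(diff, axis=1)
        d = rho - r                           # signed orthogonal distances
        # Jacobian of d_i with respect to (r, x0, y0)
        J = np.column_stack([-np.ones(len(X)), -diff / rho[:, None]])
        a = a + np.linalg.lstsq(J, -d, rcond=None)[0]
    return a

pts = np.array([[3.1, 0.2], [2.2, 2.1], [0.1, 2.9], [-2.0, 2.2], [-3.0, -0.1]])
print(fit_circle(pts, [1.0, 0.0, 0.0]))
```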
The elements of these three matrices are composed of simple hnear combinations of components of the error vector (x, - x) with elements of the following three vector/matrices dx/du, H, and G (XHG matrix): dx , x,), H^f';-' l^A , G= I Gi [Xi \X{ Xj X-iiy XJ Xyy \Xi ~ X) [X-i xj X xjGo - (xi xjGo - (xj x)TGi Xy) [X^ Xyj df af . d^r-^""^ ■"■") ' a„ da (4.4) xfG2 Now (4.3) can be solved for du/da at u = u^, and the Jacobian matrix (4.2) and the linear system (4.1) can be completed and solved for the parameter update Aa. We would like to stress that only the standard model equation (1.3), without involvement of the position/rotation parameters, is required in (4.4). The overall structure of the FhG algorithm remains unchanged for all dimensional fitting problems of parametric models. All that is necessary for a new parametric model is to derive the XHG matrix of (4.4) from (1.3) of the new model feature, and to supply a proper set of initial para- 127 Orthogonal distance fitting of parametric models ® Measured/given point /- >N Orthogonal contacting point/-~N ^ (Xr-X^ -^— (XT) i = \,...,m i^ ={a'„Xl,oi>,(p,K) x = R(X-X„) t = |(R(x,-x„))' T -5X •'":••" Sal FIG. X Y Z 5 1 -3 Orthogonal I au 5a l^Sx,. Sa Sa cohtactini %^ f(x,,x(ag,u)) = 0 Machine coordinate system XYZ X = R-'x + X„ |-(R-'x(a„u) + X„)| da Model coordinate system xyz © a.^i =a,+aAa Aa=P Machine coordinate system XYZ ■®- '' Model coordinate system xyz x(a_,u) 2. Information flow with the FhG algorithm. 6 3 -1 TAB. 5 4 1 5 6 3 3 5 5 2 4 7 0 2 9 -1 0 11 -1 -2 11 0 -5 11 3 -7 11 4 -8 11 7 -10 11 9 -9 10 2. Fourteen coordinate triples representing a helix. meter values ao for iteration (4.1). An overall schematic information flow with the FhG algorithm is shown in Fig. 2. The FhG algorithm shows robust and fast convergence with 2D/3D-curve and surface fitting. The storage space and computing time cost are proportional to the number of data points. A disadvantage of the FhG algorithm is that it additionally requires the second derivatives d'^-x./da.gdn as shown in (4.4). As a fitting example, we show the orthogonal distance fitting of a helix. The standard model feature (1.3) of a helix in frame xyz can be described as follows. x(ag,u) = x{r,h,u) = {rcosu,rsmu,hu/2'K) , with a constraint on the position and rotation parameters ^ /c(ap,ar) = (Xo-X) r3(a;,<^)=0, where r and h are respectively the radius and elevation of a helix. X is the gravitational center of the given points set and ra (see (1.4)) is the vector of direction cosines of the z-axis. We have obtained the initial parameter values from a 3D-circle fitting, and a cylinder fitting, successively. The helix fitting to the points set in Table 2 with the initial values of h — lO and K = 7r terminated after 0.22s, 8 iteration cycles for ||Aa|| = 3.2 xlO"'' with a Pentium 133 MHz PC (Table 3, Fig. 3). They were 0.33s, 10 iteration cycles for ||Aa||= 3.6x10"'^ with the ETH algorithm, and, 1.05s, 61 iteration cycles for ||Aa|| =8.8x10""'' with the NPL algorithm. The computing cost with the ETH algorithm increases rapidly with the number of the data points. The NPL algorithm showed slow convergences with the 3D-circle and the helix fitting (3D-curve fitting). 128 S. J. Ahn, E. Westkdmper, and W. Rauh Parameters a 3D-Circle a(a) Cylinder :cT(a) Helix Yo -2.7923 , 0.8421 -3.0042 0.4525 -1.5560 0.3934 TAB. 
TAB. 3. Results of the orthogonal distance fitting to the point set in Table 2 (for each fit, the first row gives the parameter estimates $a$ and the second row the standard deviations $\sigma(a)$).

              σ0       r       X0       Y0       Z0       ω        φ        h        κ
  3D-Circle  5.8913   8.3850   5.6999  -2.7923   5.2333  -0.6833   0.7882    -        -
  σ(a)                0.7355   0.9939   0.8421   0.8821   0.1177   0.1375
  Cylinder   1.6925   8.2835   4.7596  -3.0042   4.5081  -0.4576   1.1327    -        -
  σ(a)                0.2738   0.7465   0.4525   0.6513   0.3049   0.2116
  Helix      2.2301   6.1368   3.8909  -1.5560   6.4871   0.3003   0.5114  19.5811   2.4602
  σ(a)                0.4238   0.5488   0.3934   0.7500   0.0880   0.0663   1.3214   0.2881

FIG. 3. Orthogonal distance fitting to the point set in Table 2: (a) helix fit; (b) convergence of the fit. Iteration numbers 0-3: 3D-circle, 4-12: circular cylinder, 13-: helix fit, with the initial values $h = 10$ and $\kappa = \pi$.

5 Summary
In this paper we have reviewed the current orthogonal distance fitting algorithms for parametric curves and surfaces in an easily understandable manner, and presented a new algorithm. In each of the algorithms the model parameters are grouped and simultaneously estimated in terms of form/position/rotation parameters. The ETH algorithm demands a large amount of storage space and a high computing cost, and the NPL algorithm shows relatively poor performance for 3D-curve fitting. The new algorithm, the FhG algorithm, has neither of these drawbacks. A disadvantage of the FhG algorithm is that it requires the second derivatives $\partial^2 x/\partial a_g\partial u$. The FhG algorithm does not necessarily require a good set of initial parameter values, which can also be supplied internally, as demonstrated with the fitting examples. From the viewpoint of implementation and application to a new model feature, the FhG algorithm is universal and very efficient: merely the standard model equation (1.3) of the new model feature is required, which has only a few form parameters. The functional interpretation and treatment of the position/rotation parameters are basically identical for all parametric models. The storage space and the computing time cost are proportional to the number of given data points. Together with other orthogonal distance fitting algorithms for implicit models [2], the FhG algorithm has been certified by the German federal authority PTB [5, 9], with a certification grade stating that the parameter estimation accuracy is higher than 0.1 μm for length units and 0.1 μrad for angle units, for all parameters of all tested model features.

Bibliography
1. S. J. Ahn, W. Rauh, and H.-J. Warnecke, Least-squares orthogonal distances fitting of circle, sphere, ellipse, hyperbola, and parabola, Pattern Recognition 34 (2001), 2283-2303.
2. S. J. Ahn, W. Rauh, and H.-J. Warnecke, Best-Fit of Implicit Surfaces and Plane Curves, in Mathematical Methods for Curves and Surfaces: Oslo 2000, T. Lyche and L. L. Schumaker (Eds.), Vanderbilt University Press, TN, 2001, 1-14.
3. P. T. Boggs, R. H. Byrd, and R. B. Schnabel, A stable and efficient algorithm for nonlinear orthogonal distance regression, SIAM J. Sci. Stat. Comput. 8 (1987), 1052-1078.
4. B. P. Butler, A. B. Forbes, and P. M. Harris, Algorithms for Geometric Tolerance Assessment, Report no. DITC 228/94, NPL, 1994.
5. R. Drieschner, B. Bittner, R. Elligsen, and F. Waldele, Testing Coordinate Measuring Machine Algorithms: Phase II, BCR Report, EUR 13417 EN, Commission of the European Communities, Luxemburg, 1991.
6. W. Gander, G. H. Golub, and R. Strebel, Least-squares fitting of circles and ellipses, BIT 34 (1994), 558-578.
7. H.-P. Helfrich and D. Zwick, A trust region method for implicit orthogonal distance regression, Numerical Algorithms 5 (1993), 535-545.
8. H.-P. Helfrich and D. Zwick, A trust region algorithm for parametric curve and surface fitting, J. Comput. Appl. Math. 73 (1996), 119-134.
9. ISO/DIS 10360-6, Geometrical Product Specifications (GPS) - Acceptance test and reverification test for coordinate measuring machines (CMM) - Part 6: Estimation of errors in computing Gaussian associated features, ISO, Geneva, 1999.
10. D. Sourlier, Three Dimensional Feature Independent Bestfit in Coordinate Metrology, Ph.D. Thesis, ETH Zurich, 1995.
11. D. A. Turner, The approximation of Cartesian coordinate data by parametric orthogonal distance regression, Ph.D. Thesis, University of Huddersfield, 1999.

Template matching in the $\ell_1$ norm

Iain J. Anderson and Colin Ross
School of Computing and Mathematics, University of Huddersfield, UK. i.j.anderson@hud.ac.uk, c.ross@hud.ac.uk

Abstract
We present a method for matching a surface in three dimensions to a set of data sampled from the surface by means of minimising the distances from the data points to the closest points on the surface. This method of association is affine transformation invariant and as such is very useful in situations where the coordinate axes are essentially arbitrary. Traditionally, this problem has been solved by minimising the $\ell_2$ norm of the distances from the data points to the corresponding points on the surface, while the use of other $\ell_p$ norms is less well known. We present a method for template matching in the $\ell_1$ norm based upon a method of directional constraints developed by Watson for the related problem of orthogonal distance regression. An algorithm for this method is given, and numerical results show its effectiveness.

1 Introduction
Template matching is used in a variety of applications such as the quality assurance of manufactured artifacts [1] and dental metrology [2]. Given a fixed template, i.e. a curve or surface, and a set of data in a different frame of reference, template matching involves finding the frame transformation which maps the data onto the template. A typical strategy for finding the optimal transformation parameters in the template matching problem is to minimize, in some norm, the orthogonal distances between the transformed data and the template. In this case, the template matching problem can be viewed as a form of orthogonal distance regression (ODR) [3], which is a technique commonly used for fitting curves and surfaces to measured data. Therefore, most algorithms for solving the template matching problem are extensions of algorithms for ODR. Template matching in the $\ell_2$ norm is addressed by Turner [3] and in the $\ell_\infty$ norm by Butler et al. [1], as well as by Zwick [7] for the two-dimensional case.

In this paper, we are specifically concerned with the following problem. Given a fixed differentiable parametric surface $f(u,v)$ and a set of $m$ data $\{x_i\}_{i=1}^m\in\mathbb{R}^3$, find points $\{f(u_i,v_i)\}_{i=1}^m$, a rotation matrix $R_\theta$, and a translation vector $t_0$ such that the $\ell_1$ norm of the residual distances $\{\|R_\theta(x_i - t_0) - f(u_i,v_i)\|_2\}_{i=1}^m$ is minimal. This is the template matching problem in the $\ell_1$ norm, and although not as widely used as its $\ell_2$ and $\ell_\infty$ counterparts, it nonetheless has an important role to play. The importance of the $\ell_1$ norm is that, generally speaking, any outlying data are effectively ignored, with the result that an approximation is obtained which is largely independent of any unreliable data.
This has particular importance when our data arises as a result of some measurement process, perhaps involving ma,ny complicated and finely-tuned instruments. For such a measurement scenario, any change in the assumed measurement conditions can result in a datum which has gross error relative to other data. Thus, if we choose a measure which is susceptible to outlying data, we are in danger of obtaining an unrepresentative approximation. This situation is avoided by use of the ^i norm and we therefore advocate its use both here and in any situation involving measurement data where a representative approximation is required. A feature of optimal ii solutions is the Ukelihood of a small number of the data having a residual of zero, and it is therefore unclear whether the elements of the Jacobian matrix of partial derivatives are well-defined for these points. As a result, use of the usual Gauss-Newton method would appear to be handicapped due to its dependence upon the Jacobian matrix to calculate an updated transformation estimate. This difficulty also arises in the conventional ODR fitting problem and has recently been considered by Watson [6]. His solution is to adopt a method of fitting subject to directional constraints. By setting these directional constraints to be orthogonal to the approximant, Watson shows not only that the Jacobian is defined but also how to compute its elements without incurring a build-up of rounding error. In this paper, we extend Watson's constrained direction fitting routine to the template matching problem. We show that Watson's results are equally valid for ^i template matching. Finally, we exploit these results to give a reliable algorithm for the f i template matching problem. The structure of this paper is as follows. Section 2 provides the results necessary to justify the new technique. Section 3 describes the algorithm adopted to implement the theory. Section 4 gives some numerical results for both a simple case and a more challenging case. Finally, Section 5 concludes this paper and presents possibilities for future work. 2 Theory We are concerned with the minimisation of the quantity .E^\\{di,...,dm)\\, (2.1) where <^i = '^^J\^i-Hui,Vi)\\2, i = l,2,...,m, (2.2) and x = i?e(x-to), with respect to the rotation parameters (2.3) J32 : I. J. Anderson and C. Ross the translation parameters to = and the location parameters U= This is a constrained problem and can be solved using a separation-of-variables approach as described by Turner [3] among others. In this approach, the problem of obtaining the transformation parameters is separated from the subproblem of obtaining the location parameters U. At each iteration, the subproblem is solved to obtain an optimal U for the current transformation parameters t which is then used to obtain an update of the transformation parameters themselves. , 2.1 Considerations specific to tlie h problem Up to this point, we have not specified which norm we are using to measure the disparity between the transformed data and the template. Since we will be particularly interested in the h case, this section discusses problems inherent in the solution of such a problem. The major problem with solving non-linear ^i problems is that in order to use a technique such as the Gauss-Newton method, derivative information is required. Unfortunately, derivatives of the distances d are not defined when a distance has a value of zero. Such is the nature of ^i approximation that zeros are to be expected at an optimal solution [5]. 
Recent work by Watson [6] has considered how the related problem of orthogonal distance regression might be solved by considering the distances to be measured along fixed direction vectors $w_i$. Orthogonal distance regression involves the fitting of a curve or surface to a set of data where the residuals are taken to be the shortest distances from the data to the approximant [3]. Template matching can be seen as a variant of this, since the residuals are measured in the same way but we only alter the position and orientation of the approximant, rather than the actual shape itself. Thus, techniques for orthogonal distance regression can be used successfully in template matching. By means of these directional constraints, it is possible to show that if we choose the directions $w_i$ to be the orthogonal directions,
$$w_i = \frac{f(u_i, v_i) - \hat x_i}{\|f(u_i, v_i) - \hat x_i\|_2},$$
then the derivatives are well defined in the limit as $\|f(u_i, v_i) - \hat x_i\|_2 \to 0$. This result may be summarised in the following theorem (taken from Watson [6]).

Theorem 2.1 For parametric fitting, let the (usual) Gauss-Newton method produce a sequence $\{t\}$ such that there is a unique unit normal vector to the template at $f(u_i, v_i)$, and $\hat x_i$ remains on one side of the template. Then $\nabla_t d_i$ is well defined on this sequence.

If $f(u_i, v_i) \to \hat x_i$, then this formulation will lead to problems similar to those we are attempting to resolve, as a result of the quotient becoming undefined. Watson [6] therefore suggests leaving $w_i$ unchanged once $d_i$ becomes small. By this method, numerical problems arising as a result of a distance tending to zero may be avoided. The algorithm will still tend to the correct solution provided that the small residual corresponds to an interpolation point of the $\ell_1$ solution. If this is not the case, then the solution will not be optimal, but will still be close to the optimal solution.

2.2 Possible problems
The most immediate problem that arises is how to ensure that there exists a point on the template which is situated along the direction vector given from each datum. Clearly, in certain situations there will not exist such a point, corresponding to the case where the direction vector lies within the tangent plane of the template in the region of the datum. In such a situation there would seem to be two possible recourses available:
(1) ignore these data;
(2) choose the point on the template that is closest to the line through the datum defined by the direction vector.
It has been found through empirical results that, provided the problem only occurs on certain iterations rather than as a result of a poor choice of the direction vectors associated with the template, ignoring the problem data is the better option. Use of the second option has been found to prevent convergence of the algorithm.

3 Algorithm
The algorithm to implement this technique consists of two sub-algorithms, each related to a specific section of the main algorithm. These sub-algorithms are
(1) the constrained closest point problem,
(2) the calculation of a new transformation estimate.

3.1 Constrained closest point problem
For each data point $\hat x_i$, this problem is that of finding $u_i$ and $v_i$ such that the constraint
$$\hat x - f(u,v) = d\,w \qquad (3.1)$$
is satisfied (subscripts dropped for clarity). Pre-multiplying this equation by $a^T$, we obtain
$$a^T\hat x - a^T f(u,v) - d\,a^T w = 0. \qquad (3.2)$$
Thus, by choosing $a$ to be orthogonal to $w$, we are able to eliminate $d$ from equation (3.2). Similarly, multiplying equation (3.1) by $b^T$ gives
$$b^T\hat x - b^T f(u,v) - d\,b^T w = 0.$$
We may thereby reduce the system (3.1) to two (nonlinear) equations in the two unknowns $u$ and $v$. Our problem has been reduced to that of solving
$$F(u,v) = [a : b]^T\big(\hat x - f(u,v)\big) = 0,$$
which has derivative
$$\nabla_{u,v}F = -[a : b]^T(\nabla_u f : \nabla_v f),$$
by means of Newton's method, which involves adopting an iterative approach and solving
$$\nabla_{u,v}F\begin{pmatrix}\delta u\\ \delta v\end{pmatrix} = -F(u,v) \qquad (3.3)$$
at each stage to obtain better estimates $u + \delta u$ and $v + \delta v$. The quantities $F(u,v)$ and $\nabla_{u,v}F$ are straightforward to calculate, as they arise directly from the explicit parameterisation of the template. All that remains is the choice of $a$ and $b$. We obtain these vectors by taking the cross product of $w$ with two arbitrary vectors, resulting in two vectors which are orthogonal to $w$. More generally, the vectors $a$ and $b$ should be chosen to ensure that the system (3.3) is well-conditioned.
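A compact sketch of this constrained closest point solve follows (our own illustrative code; the template `f` and its partials are passed in as callables, and a and b are built by crossing w with a coordinate axis):

```python
import numpy as np

def constrained_closest_point(f, fu, fv, xhat, w, u, v, iters=15):
    """Newton iteration for F(u,v) = [a b]^T (xhat - f(u,v)) = 0, where
    a, b are chosen orthogonal to the fixed direction w (cf. eq. (3.3))."""
    # two vectors orthogonal to w: cross w with the axis least aligned with it
    k = np.argmin(np.abs(w))
    a = np.cross(w, np.eye(3)[k]); a /= np.linalg.norm(a)
    b = np.cross(w, a)
    AB = np.column_stack([a, b])                  # 3 x 2
    for _ in range(iters):
        F = AB.T @ (xhat - f(u, v))
        dF = -AB.T @ np.column_stack([fu(u, v), fv(u, v)])
        du, dv = np.linalg.solve(dF, -F)
        u, v = u + du, v + dv
    return u, v

# usage on the cylinder template of Section 4.2: f(u,v) = (cos u, sin u, v)
f  = lambda u, v: np.array([np.cos(u), np.sin(u), v])
fu = lambda u, v: np.array([-np.sin(u), np.cos(u), 0.0])
fv = lambda u, v: np.array([0.0, 0.0, 1.0])
xhat = np.array([1.2, 0.3, 0.5])
w = xhat - f(0.2, 0.5); w /= np.linalg.norm(w)
print(constrained_closest_point(f, fu, fv, xhat, w, 0.0, 0.0))
```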
3.2 Updating the transformation estimate
The method we adopt to obtain an update of the transformation parameters is the Gauss-Newton method. This involves solving, at each iteration, the problem
$$J\,\delta t = -d \qquad (3.4)$$
in the $\ell_1$ sense, where $J$ is the Jacobian matrix of partial derivatives with entries $J_{ij} = \nabla_{t_j} d_i$. The estimate of the optimal transformation parameters is then updated according to $t = t + \delta t$. Thus, since the distances $d$ are obtained from the constrained closest point subproblem, we are left with the task of calculating the Jacobian matrix. For each datum, from equation (3.1), we have that
$$\hat x(t) - f\big(u(t), v(t)\big) = w\,d\big(u(t), v(t)\big),$$
where we have explicitly included the dependency of the distance $d$ on the location parameters $U$. Differentiating and rearranging, we obtain
$$\nabla_t\hat x = w\,\nabla_t d + \nabla_U f\,\nabla_t U.$$
This is equivalent to the form
$$\nabla_t\hat x = [w : \nabla_U f]\begin{pmatrix}\nabla_t d\\ \nabla_t U\end{pmatrix}.$$
Therefore,
$$J = \nabla_t d = e_1^T\,[w : \nabla_U f]^{-1}\,\nabla_t\hat x,$$
where $e_1$ is the first component vector.
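In code, each row of $J$ is obtained by solving one small linear system rather than forming the inverse; a sketch (our own, with the same callable conventions as above):

```python
import numpy as np

def jacobian_row(w, fu, fv, grad_t_xhat):
    """Row of J for one datum: J_row = e_1^T [w : f_u : f_v]^{-1} grad_t_xhat.
    w, fu, fv: 3-vectors at (u_i, v_i); grad_t_xhat: 3 x p matrix of
    derivatives of xhat with respect to the p transformation parameters."""
    M = np.column_stack([w, fu, fv])        # 3x3, well defined even as d -> 0
    return np.linalg.solve(M, grad_t_xhat)[0]
```

Note that no division by the distance $d_i$ occurs here, which is precisely the point of the directional-constraint formulation.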
However, as we shall see, this is unattainable in general and we can, in fact, only expect interpolation at two points. As Watson states [6], in such a situation, the rate of convergence can be unacceptably slow. This is found to be the case. It can be seen that not only is the convergence slow, but an optimal solution Iteration 1 5 10 50 100 TAB. norm (residuals) norm (update) 0.6662 4.9901e-02 0.3008 3.5716e-04 0.3007 8.8545e-06 0.3006 9.1533e-06 0.3008 3.8514e-04 1 Progress of the Gauss-Newton method for planar data, is never obtained, with the objective function ||d||i increasing occasionally. 135 136 L J- Anderson and C. Ross To ensure convergence, a simple line-search algorithm was adopted which searches along the direction obtained from the Gauss-Newton step for the maximum reduction in the objective function. This modification affects convergence in 3 iterations. 4.2 A more challenging problem As a more challenging problem, we consider the matching of a set of 128 data which supposedly represent a cylinder but which contain 8 wild points. The cylinder is parametrised by u and u as (cosw sinu resulting in a cylinder with unit radius oriented along the z-axis. Again, the problem of matching the data onto this model is rank deficient. The rank deficiencies occur due to rotations about the 2-axis and translations along the z-axis. As such, we omit these possible transformations. Although we might initially expect to interpolate 4 data points at an optimal ^i solution, we find that in fact only two are guaranteed, although if a third point lies within two radii of one of these two points, then three points can be guaranteed. Typically, this will occur when the data is representative. For the data set we are considering, we expect three interpolation points due to the data representing the cylinder and in fact at the optimal solution, three interpolation points are obtained. In fact, the "missing" interpolation has the effect of slowing convergence of the Gauss-Newton method considerably so that in 100 iterations, the algorithm had not been deemed to converge. However, by the introduction of a simple line-search method, the algorithm converged in five iterations as displayed in Table 2. TAB. 5 Iteration norm(residuals) norm(update) 5.6796e-03 0.9654 1 6.2932e-04 0.9559 2 1.0141e-04 0.9557 3 2.5812e-07 0.9557 4 4.4006e-14 0.9557 5 2 Progress of the Gauss-Newton method for cylindrical data using a line-search. Conclusions This paper has shown how perceived problems in ii template matching can be avoided by use of the so-called "method of directional constraints". In this method, the closest point on the template along a given direction vector is calculated in order to obtain the residuals between data and template. By then altering this direction vector to be the normal to the surface at that projected point, the algorithm progresses to the expected ii solution. Problems regarding undefined quotients are avoided by no longer updating Template matching in the ii norm 137 the direction vectors corresponding to a datum when the residual associated with that point is below a certain tolerance. This work forms part of a larger project to consider novel approaches to ill-conditioned problems in metrology. It is hoped that the work presented in this paper will aid in the resolution of rank-deficient systems and ill-conditioned systems by altering the usual orthogonal distances to be these directional constraints, which should remove some of the rank deficiency. 
As an example, consider the template matching problem where the template to be matched is an infinite cylinder with axis along the $z$-axis. Using typical template matching algorithms, this problem is rank deficient by two at the solution, due to the possible translation along the $z$-axis and the possible rotation about the $z$-axis. By introducing these directional constraints, the rotational rank deficiency is almost completely resolved (there are now two possible rotations to obtain the optimal matching rather than the infinite number previously). The $\ell_1$ norm is also being used to attempt to resolve any rank deficiencies and ill-conditioning present in the problem. This is achieved by ensuring that any local deviations from the template (caused by, for example, wear) are "ignored" so that regions of local deviations might be compared. This will then result in a resolution of the uncertainty in the transformation parameters.

Bibliography

1. B. P. Butler, A. B. Forbes, and P. M. Harris. Algorithms for geometric tolerance assessment. Technical Report DITC 228/94, National Physical Laboratory, Teddington, UK, 1994.
2. V. Jovanovski. Three-dimensional Imaging and Analysis of the Morphology of Oral Structures from Co-ordinate Data. Ph.D. Thesis, Department of Conservative Dentistry, St Bartholomew's and the Royal London, School of Computing and Dentistry, Queen Mary and Westfield College, London, UK, 1999.
3. D. A. Turner. The approximation of Cartesian coordinate data by parametric orthogonal distance regression. Ph.D. Thesis, School of Computing and Mathematics, University of Huddersfield, UK, 1999.
4. D. A. Turner. Least squares profile matching using directional constraints. Preprint, 2001.
5. G. A. Watson. Approximation Theory and Numerical Methods. Wiley, New York, US, 1980.
6. G. A. Watson. On curve and surface fitting by minimizing the $\ell_1$ norm of orthogonal distances. Preprint.
7. D. S. Zwick. A planar minimax algorithm for analysis of coordinate measurements. Advances in Computational Mathematics, 2:4, 1994, 375-391.

A bootstrap algorithm for mixture models and interval data in inter-comparisons

P. Ciarlini and G. Regoliosi
Istituto per le Applicazioni del Calcolo "M. Picone", CNR, Roma, Italy
F. Pavese
Istituto di Metrologia "G. Colonnetti", Torino, Italy

Abstract

To combine the information from several laboratories to output a representative value $X_r$ and its probability distribution function is the main aim of an inter-comparison in Metrology. Here, the proposed procedure identifies a simple model for this probability function, by taking into account only the probability interval estimates as a measure of the uncertainty in each laboratory. A mixture density model is chosen to characterize the stochastic variability of the inter-comparison population considered as a whole. The bootstrap method is applied to approximate the distribution function of the comparison output in an automatic way.

1 Introduction

The "mise en pratique" of the Mutual Recognition Arrangement (MRA), issued by national metrological Institutions in 1999, prompted new studies and projects in Metrology, mainly concerning the inter-laboratory comparisons area. Recently, considerable effort has been devoted to the problem of the choice of a suitable statistical procedure to summarise inter-comparison data. The problem solution is influenced by both metrological and statistical considerations, but it can also depend on the physical quantity under comparison.
Some of the critical issues now emerging have several different causes. For instance, the statistical information supplied by each laboratory is synthetic, since it comes from a data reduction process performed on several experimental datasets. In each laboratory, assumptions and statistical reduction procedures may be different and sometimes not fully documented, or the a priori information on the original data may be insufficient to define a "credible" probability distribution function (pdf) for output quantities of the inter-comparison. The use of the whole sets of original data from each laboratory might be an unfeasible approach in the inter-comparison case, due to the unavailability of all needed data or to practical reasons. At present, the practice is for each participant to supply synthetic information $x_i$ to the inter-comparison and to use a location estimator to output the representative value.

Efforts should be devoted to improving the reliability of inter-comparison results by asking for the use of any a priori information and of its "credibility", towards the direct estimation of the output of the comparison, $X_r$. This paper proposes the identification of a solution without resorting to the synthetic values and their point estimates of the standard uncertainty, but only to the probability interval estimates as the measure of the uncertainty. This approach consists of two parts: a modelling procedure to identify a simple mixture model able to approximate the stochastic variability of the inter-comparison population as a whole; and a parametric Monte Carlo algorithm to automatically estimate the probability distribution of the output $X_r$ and any accuracy measures at a prescribed precision.

The concept of a mixture of distribution functions occurs when a population made up of distinct subgroups is sampled; for example, in biostatistics, when it is required to measure certain characteristics in natural populations of a particular species. In an inter-comparison, each participant constitutes a subgroup. The Monte Carlo method, based on the principle of mimicking sampling behaviour, can always compute a numerical solution in an automatic way, even when the required analytic calculations may not be simple. If the Monte Carlo method is applied with the principle of substitution (of the unknown probability function with a probability model estimated from the given sample), the approach is known as the bootstrap approach [4] and is already used in Metrology [2]. In [1] the case of a multivariate normal mixture model is considered and the standard errors are estimated by means of the parametric bootstrap. The present algorithm will be applied to a thermometric inter-comparison, where data cannot be assumed to be normally distributed.

2 Data structure of an inter-comparison with interval data

The number, $N$, of laboratories involved in an inter-comparison is typically small. In the $i$-th laboratory, the measurements are supposed to pertain to a single probability distribution function, say $F_i(\Lambda)$, where $\Lambda$ is the parameter vector, which may be partially unknown. The measurements are statistically analysed and reduced to provide to the comparison the synthetic value $x_i$ and its uncertainty $u_i$ at 95% confidence level, or a 95% uncertainty interval (95%CI): $((x_1, u_1), \dots, (x_N, u_N))$. In this work the uncertainty is considered as "a 95%CI rather than as a multiple of the standard deviation" (see 4.3.4 in [6]).
Then an aim of an inter-comparison is to combine the input data from the laboratories to characterise a representative value of the inter-comparison, i.e., the random variable $\theta$ and its pdf $F$. Hence a good estimate of the 95%CI for $\theta$ can be obtained if the output pdf $F$ is a simple known function describing the stochastic variability of the inter-comparison data. In other cases a suitable approximation of the expected value $E_F[X] = \int x\,dF(x)$ could be accepted to output the reference value $X_r$. The inter-comparison data structure is summarised here in terms of interval estimates:

INPUT Sample - Each one of the $N$ participants originates a 95%CI that is one element of the inter-comparison sample:
$$\{[u_{il}, u_{iu}],\ i = 1, \dots, N\}. \qquad (2.1)$$
Here no value $x_i$ in the interval $[u_{il}, u_{iu}]$ is chosen as representative; possible information on $F_i$ (such as limited or unlimited support, symmetric or not) should be added. If a laboratory does not supply any information on the pdf, the uniform distribution is assumed.

Comparison OUTPUT - It includes the representative value and its 95%CI:
$$(\hat{\theta}, [\varepsilon_l, \varepsilon_u]). \qquad (2.2)$$
In many inter-comparisons, the differences to $\theta$ are also defined: $(y_i, [w_{il}, w_{iu}])$, where $y_i = x_i - \hat{\theta}$, $i = 1, \dots, N$.

3 A classical approach to inter-comparisons

Let us recall the solution to the inter-comparison problem through the traditional estimator, the weighted mean. It is a location statistic that combines several measures and their standard uncertainties $(x_i, u_i)_{i=1}^{N}$. It provides an estimate $\hat{\theta}_w$ for $\theta$, and the following symmetric 95%CI,
$$\hat{\theta}_w \pm k u_w, \qquad (3.2)$$
where the coverage factor $k$ is taken as the value $t_{N-1,0.95}$ of the Student distribution, $N$ being small. In this approach, each $x_i$ is viewed as an unbiased estimate of the laboratory mean value and the random variable $\hat{\theta}_w$ is defined to be a linear combination of $N$ independent random variables $X_1, \dots, X_N$, where $\{x_1, \dots, x_N\}$ is an observed sample. $\hat{\theta}_w$ is supposed to be asymptotically normally distributed [6]. This estimator can be correctly adopted to solve an inter-comparison problem if the assumption of the homogeneity of the data is valid. This is equivalent to saying that, after considering the extent of the real effect and bias in each laboratory, the laboratories yield on average the same value, so that the differences between the estimates are entirely due to random error. In this case, the selected estimator $\hat{\theta}_w$ appropriately estimates $\theta$ and (3.2) accurately estimates its 95%CI. Obstacles to applying this approach to a key-comparison have been discussed in [3]. The "credibility" of the representative values $x_i$ and of their uncertainties can critically affect the accuracy of the estimate of the representative value $X_r$. Moreover, the peculiar characteristics of a typical inter-comparison sample ((1) its very limited size, from a statistical point of view, and (2) the different experimental methods used in each laboratory) often imply that the statistical assumptions are not satisfied, as for example in several thermometric cases. Indeed, the first characteristic implies that the Central Limit Theorem and the asymptotic theory do not hold. Then the normal distribution cannot be properly used to infer the estimates in (3.2). Another example of the inadequacy of the weighted mean approach is when some laboratories provide data affected by bias, resulting from skewed distributions underlying their measurements. The symmetric confidence interval of (3.2) cannot be considered an accurate approximation¹ of the true one, since it does not adjust for the skewness. (For reference, the classical combination is sketched below.)

¹A 95%CI $[\varepsilon_l, \varepsilon_u]$ for $\theta$ is defined to be accurate if the following holds for every possible value of $\theta$: $\mathrm{Prob}_G\{\theta > \varepsilon_u\} = 0.025$ and $\mathrm{Prob}_G\{\theta < \varepsilon_l\} = 0.025$.
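A minimal sketch of the classical weighted-mean combination follows. The weighting is an assumption on our part (the conventional inverse-variance weights; the estimator's exact definition is not reproduced in the text above), and all names are illustrative.

```python
import numpy as np
from scipy.stats import t as student_t

def weighted_mean_output(x, u, P=0.95):
    """Classical weighted-mean combination with symmetric CI, cf. eq. (3.2).

    x : laboratory values x_i,  u : their standard uncertainties u_i.
    """
    w = 1.0 / np.asarray(u) ** 2
    theta_w = np.sum(w * np.asarray(x)) / np.sum(w)   # weighted mean estimate
    u_w = 1.0 / np.sqrt(np.sum(w))                    # its standard uncertainty
    k = student_t.ppf(0.5 + P / 2.0, len(x) - 1)      # coverage factor t_{N-1,0.95}
    return theta_w, (theta_w - k * u_w, theta_w + k * u_w)
```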
Finally, it is necessary to point out that the homogeneity condition among the laboratories must be assured in some sense, otherwise it would be impossible to attempt the computation of any summary estimate and its associated uncertainty.

4 The approach based on interval data

4.1 The mixture density function

This paper proposes to construct a simple model for the output pdf, and to estimate its expected value $\theta$ without requiring strong assumptions such as $N$ large or each $F_i$ normal. This approach enables us to compute the probability interval of the output value in terms of the identified density in each laboratory. The stochastic variability of the population of inter-comparison data is directly considered in the modelling approach as a whole, by means of a so-called mixture distribution model [5]. This model, being a linear superposition of several (say $N$) component densities, appears to be suitable from a computational point of view and can be embedded in a bootstrap algorithm to simulate the data needed to predict the output quantities. In an inter-comparison, let us suppose that a density function $f_i(x; \Lambda^{(i)})$ is assumed for the $i$-th laboratory; then the following density mixture is identified to model the output pdf, where the parameter vector is $\Lambda = (\Lambda^{(1)}, \dots, \Lambda^{(N)})$ and the given weights $\pi_i > 0$, $i = 1, \dots, N$, have summation normalised to one:
$$g(x; \Lambda) = \sum_{i=1}^{N} \pi_i f_i(x; \Lambda^{(i)}). \qquad (4.1)$$
To compute the output as an estimate of the expected value of the mixture, $\theta = E_{G(\Lambda)}[X]$, the probability function $G(\Lambda)$ corresponding to the density in (4.1) must be known. When some laboratory provides only partial information on a pdf, we propose to identify its experimental variability by one of the following simple probabilistic models: uniform, normal or triangular pdf (right or left or symmetric triangular). Indeed, in thermometric experiments these three probabilistic models can represent several common stochastic variabilities for measurements, such as a limited or unlimited support, symmetric or not. We want the mixture parameters to be estimated by means of the INPUT Sample (2.1), as required in a bootstrap approach. Let us call $J_i$ the probability interval to which 100% of the measurements of the laboratory are supposed to pertain. For the uniform and the triangular types, the $\Lambda^{(i)}$ parameters are defined to be the extremes of $J_i = [\Lambda_{il}, \Lambda_{iu}]$. For the normal model the parameters are the mean and the variance, while $J_i$ becomes $(-\infty, +\infty)$. A right triangular pdf (RT), a left triangular pdf (LT) or a symmetric triangular pdf (ST) is chosen according to the position where the maximum of the probability density occurs, i.e., one extreme or the middle point of $J_i$.

To compute the two components of the vector $\Lambda^{(i)} = (\Lambda_{il}, \Lambda_{iu})^T$ given the $i$-th input interval, a 0.025 portion of probability mass is added outside each extreme, according to the supplied density shape. For example, if the ST density is chosen, the parameters are computed by:
$$\Lambda_{il} = (0.89\,u_{il} - 0.11\,u_{iu})/0.78, \qquad \Lambda_{iu} = (0.89\,u_{iu} - 0.11\,u_{il})/0.78.$$
The mixture weights could be used to associate a degree of "credibility" to each laboratory. (A sketch of this parameter identification is given below.)
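The identification of Section 4.1 can be sketched directly; `st_parameters` implements the ST formulas above, and `mixture_pdf` evaluates the density (4.1). Names are ours.

```python
def st_parameters(u_l, u_u):
    """Extremes of a symmetric triangular density covering a 95% interval.

    Places a 0.025 probability mass outside each extreme of the reported
    interval [u_l, u_u], as in Section 4.1.
    """
    lam_l = (0.89 * u_l - 0.11 * u_u) / 0.78
    lam_u = (0.89 * u_u - 0.11 * u_l) / 0.78
    return lam_l, lam_u

def mixture_pdf(x, components, weights):
    """Mixture density g(x; Lambda) = sum_i pi_i f_i(x; Lambda^(i)), eq. (4.1)."""
    return sum(w * f(x) for f, w in zip(components, weights))
```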
Then the choice $\pi_i = 1/N$, $i = 1, \dots, N$, implies that every laboratory contributes equally to the inter-comparison. When the mixture $G(\Lambda)$ is completely identified, it can be used to simulate data and to approximate the output value in the Monte Carlo algorithm.

4.2 The bootstrap algorithm

To avoid integral computations in estimating $\theta$ and its variance, the Monte Carlo method is commonly used to approximate them within a given precision. Since the parametric bootstrap approach resamples from a parametric distribution model, in this case the mixture model $G(\Lambda)$ is adopted to approximate the following distribution,
$$H(x) = \mathrm{Prob}_G\{\theta^* \le x\}. \qquad (4.2)$$
The Monte Carlo method simulates a sufficiently high number $B$ of data $\theta^*_b$ from $G = G(\Lambda)$ to compute
$$\hat{H}_B(x) = \frac{1}{B} \sum_{b=1}^{B} \mathbb{1}\{\theta^*_b \le x\}, \qquad (4.3)$$
where $\mathbb{1}\{A\}$ is the indicator function of the set $A$. With probability one, the Monte Carlo approximation converges to the true value as $B \to \infty$. The Monte Carlo algorithm has been developed for a mixture density to estimate the comparison output. A hierarchical resampling strategy is used to reproduce the hierarchical variability in the inter-comparison population, through the following steps:

(1) (a) Choose at random an index, say $k$, of the $k$-th laboratory by randomly resampling with replacement from the set $\{1, \dots, N\}$, with $\mathrm{Prob}\{K = k\} = \pi_k$.
(b) Given $k$, generate at random, from the selected distribution $F_k$, a bootstrap value $\theta^*$ in $[\Lambda_{kl}, \Lambda_{ku}]$.
Repeat Step 1 $B$ times to simulate the full bootstrap sample $\theta^*_1, \dots, \theta^*_B$.

(2) Approximate the bootstrap mixture distribution as in (4.3) to compute:
- the bootstrap estimate of the expected mean
$$\hat{\theta}^*_B = \frac{1}{B} \sum_{b=1}^{B} \theta^*_b; \qquad (4.4)$$
- the bootstrap standard deviation $Sd^*$;
- the 95%CI $[\varepsilon^*_l, \varepsilon^*_u]$, where the two extremes are computed as quantiles² ($\alpha = 0.025$) of the bootstrap distribution, $(\hat{H}_B(\alpha))^{-1} = q^*_\alpha$, hence $\varepsilon^*_l = q^*_\alpha$.

In Step 1(b) the inverse transformation method has been used for simulating a random variable $X$ having a continuous distribution $F_k$; for example, $X = F_k^{-1}(U)$ for a uniform random variable $U$. In Step 2 the bootstrap CI has been computed by means of the percentile method (see footnote). However, when the normal distribution is involved in the mixture, the t-bootstrap method gives more appropriate results [4]. To determine $B$ in approximating the bootstrap confidence interval, the coefficient of variation [4] can be used. The value of $B$ is increased until the coefficient of variation $cv$ of the sample quantile approaches the given precision $\delta_0$. Indeed, from a metrological point of view, it appears easier to choose $\delta_0$ instead of $B$ as the stopping rule in Step 1.

We would also like an automatic tool to investigate how well every laboratory contributes to the comparison, or to detect the possible presence of heterogeneous data. Here the concept of jackknife-after-bootstrap has been adopted to compute the mean and the bootstrap 95%CI. It is simply obtained by the following algorithm (sketched in code after this list):
- for $i = 1, \dots, N$, leave out the $i$-th lab and compute $\hat{\theta}^*_B(-i)$ and $q^*_B(-i)$;
- compare the $N$ jackknife estimates to detect outlier values.

²The percentile method for a statistic $\hat{\theta}$, based on $B$ bootstrap samples, simply takes for the $\alpha$-percentile the $(\alpha B)$-th largest of the $\theta^*_b$.
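A minimal sketch of the hierarchical resampling of Section 4.2, with the percentile 95%CI of Step 2. For brevity the uniform components use the reported intervals directly as supports (the 0.025 extension of Section 4.1 is omitted), and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_mixture(inv_cdfs, weights, B):
    """Hierarchical parametric bootstrap for the mixture output.

    inv_cdfs[k] is the inverse CDF F_k^{-1} of the k-th laboratory model,
    so F_k^{-1}(U) with U ~ U(0,1) simulates one bootstrap value.
    """
    theta = np.empty(B)
    for b in range(B):
        k = rng.choice(len(weights), p=weights)   # Step 1(a): pick a laboratory
        theta[b] = inv_cdfs[k](rng.uniform())     # Step 1(b): inverse transform
    ci = np.quantile(theta, [0.025, 0.975])       # Step 2: percentile 95% CI
    return theta.mean(), theta.std(ddof=1), ci

# Seven uniform components on intervals [lam_l, lam_u] (cf. Table 1):
intervals = [(-0.347, 0.247), (-0.564, 0.624), (-0.117, 0.477), (-0.257, 0.337),
             (0.413, 1.007), (-0.307, 0.287), (-0.327, 0.267)]
inv_cdfs = [lambda u, a=a, b=b: a + u * (b - a) for a, b in intervals]
mean, sd, ci = bootstrap_mixture(inv_cdfs, [1.0 / 7] * 7, B=2209)
```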
5 An application in thermometry

The proposed method is shown applied to an inter-comparison of Temperature Fixed Points, involving $N = 7$ laboratories [7]. Each lab provided data $x_i$ with the 95% standard uncertainty (Table 1: first item). The second item (square brackets in the same table) represents the interval data generated with (3.2), which were used to perform this simulated example.

Lab1 (-0.05; 0.15) [-0.347, 0.247]    Lab2 (0.03; 0.30) [-0.564, 0.624]
Lab3 ( 0.18; 0.15) [-0.117, 0.477]    Lab4 (0.04; 0.15) [-0.257, 0.337]
Lab5 ( 0.71; 0.15) [ 0.413, 1.007]    Lab6 (-0.01; 0.15) [-0.307, 0.287]
Lab7 (-0.03; 0.15) [-0.327, 0.267]

TAB. 1. Inter-comparison of 7 laboratories [7]: point estimates and simulated interval data.

Since no specific pdf was supplied, the mixture distribution density has been constructed assuming the uniform type for each participant and equal weights. The parameters of every uniform density were computed using the interval data, and the obtained mixture density was used in the resampling step of the algorithm to compute the representative value and its probability interval with $\delta_0 = 0.05$.

FIG. 1. Bootstrap histograms, $B = 2209$: left, mixture of 7 uniform densities; right, mixture of 6 uniform densities plus one RT density for Lab5.

In Figure 1 (left) the bootstrap histogram, which approximates the mixture density, shows a bimodal behaviour. The computations are obtained for $\delta_0 = 0.05$, or $B = 2209$: $\hat{\theta}^* = 0.14$, bootstrap standard deviation $Sd^* = 0.33$, 95%CI $[-0.35, 0.92]$. The proposed algorithm was also applied with a mixture of seven normal densities, and the results are $\hat{\theta}^* = 0.13$, $Sd^* = 0.43$, bootstrap 95%CI $[-0.61, 1.1]$ for $B = 4752$. The effect of assuming unlimited symmetric distributions to model the output pdf results in a wider 95%CI for a mixture of normal densities. By comparing the jackknife results in Table 2, Lab5 appears to supply unusual values. To directly consider this behaviour in the inter-comparison, a mixture of six uniform densities plus an RT density, identifying Lab5, has been constructed. The approximated bootstrap distribution is displayed in Figure 1 (right), with bootstrap estimates $\hat{\theta}^* = 0.15$, standard deviation $Sd^* = 0.35$ and $[-0.35, 0.96]$ for the bootstrap 95%CI, obtained for $B = 2209$.

6 Conclusions

The problem of inter-comparison data has been described, and a new approach has been proposed. It is based on the uncertainty estimates that should be provided by each laboratory as an interval estimate at 95% confidence level, together with information, possibly partial, on the probability function. The constructive procedure directly characterises the stochastic variability of the reference value of the inter-comparison by means of a mixture density model. The result of an inter-comparison is then viewed as a random variable, not directly measured, being the output of a complex process that involves measures, statistical information and metrological considerations. These considerations suggest constructing a mixture, with weights $\pi_i$, to take into account each participating laboratory according to its credibility.

Lab1 0.34 [-0.45, 0.92]    Lab2 0.32 [-0.31, 0.94]
Lab3 0.34 [-0.40, 0.91]    Lab4 0.34 [-0.35, 0.92]
Lab5 0.23 [-0.42, 0.48]    Lab6 0.34 [-0.36, 0.95]
Lab7 0.34 [-0.42, 0.92]

TAB. 2. Jackknife-after-bootstrap estimates: standard deviation and 95%CI for the mixture of 6 uniform densities ($B = 1000$); in the $i$-th item, Lab$i$ is left out.
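The jackknife-after-bootstrap computation behind Table 2 can be sketched by re-running the bootstrap with one laboratory left out at a time, reusing `bootstrap_mixture` from the sketch above (the helper name is ours).

```python
def jackknife_after_bootstrap(inv_cdfs, B=1000):
    """Leave-one-lab-out re-runs of the bootstrap to flag unusual labs.

    With N labs, each re-run uses the remaining N - 1 components with
    equal weights; compare the N (mean, sd, CI) triples to detect outliers.
    """
    N = len(inv_cdfs)
    results = []
    for i in range(N):
        rest = inv_cdfs[:i] + inv_cdfs[i + 1:]
        results.append(bootstrap_mixture(rest, [1.0 / (N - 1)] * (N - 1), B))
    return results
```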
The parametric bootstrap approach has been adopted to estimate, in a simple and automatic way, the inter-comparison output, where information, even partial, on the probability distributions of the participating laboratories has been taken into account. The method can be applied even with a limited number of laboratories, as shown in the thermal example ($N = 7$), where the experimental conditions implied the adoption of skewed distributions. The automatic jackknife method of detecting heterogeneous data succeeded in revealing an unusual value. To take this condition into account, a mixture of six uniform densities plus an RT density identifying Lab5 could better be used. The choice of equal weights emphasises that all the standards have contributed equally to the inter-comparison. The bootstrap procedure, completely developed for a class of five simple distribution functions often used in thermal metrology, could be adapted to consider other distributions, when the synthetic data information provided by the laboratories, as summarised in Section 2, allows computation of the mixture parameters.

Bibliography

1. K. E. Basford, D. R. Greenway, G. J. McLachlan and D. Peel, Standard errors of fitted component means of normal mixtures, Computational Statistics 12, 1-17, 1997.
2. P. Ciarlini et al., Non-parametric bootstrap with application to metrological data. In: Advanced Mathematical Tools in Metrology, Series on Advances in Mathematics for Applied Sciences 16, Ciarlini, Cox, Monaco, Pavese eds., World Scientific, Singapore, 219-230, 1994.
3. M. Cox, A discussion of approaches for determining a reference value in the analysis of key-comparison data. In: Advanced Mathematical and Computational Tools in Metrology IV, Series on Advances in Mathematics for Applied Sciences 53, Ciarlini, Cox, Pavese, Richter eds., World Scientific, Singapore, 45-65, 2000.
4. B. Efron and R. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, London, 1993.
5. B. S. Everitt, Finite Mixture Distributions, Chapman and Hall, London, 1981.
6. ISO, Guide to the Expression of Uncertainty in Measurement, Geneva, Switzerland, 1995.
7. F. Pavese, Monograph 84/4 of Bureau International des Poids et Mesures, BIPM, Sevres, 1984.

Efficient algorithms for structured self-calibration problems

Alistair B. Forbes
National Physical Laboratory, Teddington, Middlesex TW11 0LW, UK. alistair.forbes@npl.co.uk

Abstract

Self-calibration techniques have been used extensively in co-ordinate metrology. At their most developed, they are able to extract all systematic error behaviour associated with the measuring instrument as well as determining the geometry of the artefact being measured. However, this is generally at the expense of introducing extra parameters, leading to moderately large observation matrices. Fortunately, these matrices tend to have sparse, block structure in which the nonzero elements are confined to much smaller submatrices. This structure can be exploited either in direct approaches in which QR factorisations are performed or in iterative algorithms which depend on matrix-vector multiplications. In this paper, we describe self-calibration approaches associated with high accuracy, dimensional assessment by co-ordinate measuring systems, highlighting how the associated optimisation problems can be presented compactly and solved efficiently. The self-calibration techniques lead to uncertainties significantly smaller than can be expected from standard methods.
1 Introduction

An important activity in metrology is the calibration of instruments and artefacts. Calibration defines a rule which converts the values output by the instrument's sensor(s) to values that can be related to the appropriate standard (SI or derived) units. Importantly, to these calibrated values it is required to assign uncertainties that reliably take into account the uncertainties of all quantities that have an influence. As a consequence, the size and complexity of the computational tasks associated with the data analysis can be significant, even for instruments that appear to be of simple design and operation. It is thus beneficial to design and implement algorithms that are efficient with respect to computation and memory. Fortunately, many of the calibration problems give rise to systems of equations with a well defined sparsity structure.

The rest of this paper is organised as follows. In Section 2 we review least squares approaches to calibration problems and go on to describe self-calibration problems in co-ordinate metrology in Section 3. Sections 4 and 5 describe solution methods for two types of sparsity structure. Our concluding remarks are given in Section 6.

2 Least squares solution to calibration problems

In many calibration problems, the observation equations involving measurements $y_i$ can be expressed as $y_i = \phi_i(\mathbf{a}) + \epsilon_i$, where $\phi_i$ is a function depending on parameters $\mathbf{a} = (a_1, \dots, a_n)^T$ specifying the behaviour of the instrument, and $\epsilon_i$ represents random measurement error. For a set of measurement data $\{y_i\}_1^m$, best estimates $\mathbf{a}^*$ of the calibration parameters $\mathbf{a}$ are determined by solving
$$\min_{\mathbf{a}} \sum_{i=1}^{m} f_i^2(\mathbf{a}) = \mathbf{f}^T\mathbf{f}, \qquad (2.1)$$
where $f_i(\mathbf{a}) = y_i - \phi_i(\mathbf{a})$. The most common approach to solving this problem is derived from the Gauss-Newton algorithm; see, for example, [5]. If $\mathbf{a}$ is an estimate of the solution and $J$ is the Jacobian matrix defined at $\mathbf{a}$ by $J_{ij} = \partial f_i/\partial a_j$, then an updated estimate of the solution is $\mathbf{a} + \mathbf{p}$, where $\mathbf{p}$ solves the Jacobian system $J\mathbf{p} = -\mathbf{f}$ in the least squares sense. Starting with an appropriate initial estimate of $\mathbf{a}$, these steps are repeated until convergence criteria are met. A numerically stable method of solving the Jacobian system is to find a factorisation $J = QR$, where $Q$ is an $m \times n$ orthogonal matrix and $R$ is an upper-triangular matrix of order $n$ (see, e.g., [1, 6]). The solution $\mathbf{p}$ is determined efficiently by solving the upper-triangular system
$$R\mathbf{p} = -Q^T\mathbf{f}$$
using back substitution. The matrix $Q$ can be constructed using either Householder reflections, which process the Jacobian matrix a column at a time, or Givens plane rotations, which process the matrix row-wise. For either approach the orthogonal factorisation requires $O(mn^2)$ operations. An alternative to the direct approaches to solving matrix equations is to use iterative procedures based on conjugate gradients. The advantage of these approaches is that they involve only matrix-vector multiplications, and for sparse matrices these multiplications can be made efficient. In particular, the LSQR algorithm of Paige and Saunders [7] implements an iterative approach to solving linear least squares problems. Often, linear equality constraints on the parameters of the form $C^T\mathbf{a} = \mathbf{c}$, where $C$ is an $n \times p$ matrix, $p < n$, are required to eliminate degrees of freedom in the problem. However, we can use orthogonal projections to eliminate these constraints, as described next (after the following sketch of the basic Gauss-Newton/QR step).
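A minimal sketch of the Gauss-Newton iteration with a QR solve, using a dense thin factorisation (all names are ours; structured and constrained variants follow in the text).

```python
import numpy as np

def gauss_newton_qr(f, jac, a0, tol=1e-10, max_iter=100):
    """Gauss-Newton iteration solving R p = -Q^T f at each step.

    f(a) returns the residual vector, jac(a) the m x n Jacobian J.
    """
    a = np.asarray(a0, dtype=float)
    for _ in range(max_iter):
        F, J = f(a), jac(a)
        Q, R = np.linalg.qr(J)                 # thin QR: Q is m x n, R is n x n
        p = np.linalg.solve(R, -Q.T @ F)       # upper-triangular back substitution
        a = a + p
        if np.linalg.norm(p) < tol * (1 + np.linalg.norm(a)):
            break
    return a
```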
Suppose $C$ is of full column rank and has QR factorisation
$$C = [V_1\ V_2] \begin{pmatrix} S \\ 0 \end{pmatrix},$$
where $V_1$ and $V_2$, respectively, are the first $p$ and last $n - p$ columns of the orthogonal factor $V$. If $\mathbf{a}_0$ is a solution of $C^T\mathbf{a} = \mathbf{c}$, then for any $(n-p)$-vector $\boldsymbol{\alpha}$, $\mathbf{a} = \mathbf{a}_0 + V_2\boldsymbol{\alpha}$ automatically satisfies the constraints, and the optimisation problem can be reformulated as the unconstrained non-linear least squares problem
$$\min_{\boldsymbol{\alpha}} \sum_{i=1}^{m} f_i^2(\mathbf{a}_0 + V_2\boldsymbol{\alpha}),$$
involving the reduced set of parameters $\boldsymbol{\alpha}$. We note that the associated Jacobian matrix is simply $\tilde{J} = JV_2$, where $J_{ij} = \partial f_i/\partial a_j$, as before. Unfortunately, even if $J$ has structure, $\tilde{J} = JV_2$ could be full. For indirect approaches, this is of little consequence since the matrix-vector multiplications can be formed in two stages (e.g., $\mathbf{y} = V_2\mathbf{x}$, $\mathbf{z} = J\mathbf{y}$), each of which can be implemented efficiently. For a direct approach, it may be possible to implement the constraints in such a way as to minimise the amount of fill-in during the orthogonal factorisation stage.

3 Self-calibration problems in co-ordinate metrology

Co-ordinate metrology is concerned with defining the geometry of two and three dimensional artefacts from measurements of the co-ordinates of points related to the surface of the artefacts. It is a key discipline in quality and process control in manufacturing industry. In a (conventional) co-ordinate measuring machine (CMM) with three mutually orthogonal linear axes, the position of the probe tip centre is inferred from scale readings on each of the three machine axes. In practice, CMMs have imperfect geometry with respect to the straightness of the axes, the squareness of pairs of axes and rotations describing roll, pitch and yaw, and these systematic errors have to be taken into account if the accuracy potential of the CMM is to be more fully realised. Two approaches can be adopted to nullify the effect of these systematic errors. The first, error mapping, involves performing a set of experiments to characterise as completely as possible the error behaviour of the instrument and then using error correction software to produce more accurate co-ordinate estimates. The disadvantages of this approach are, firstly, that the set of experiments is expensive to perform and, secondly and more importantly, that the error behaviour of the CMM is likely to drift so that, for example, an error correction valid on Monday will only be partially valid on Friday and may be of limited value a month later. The second approach, self-calibration, attempts to use any approximate symmetries, rotational or translational, of the artefact so that systematic errors associated with the measuring system are identified as part of the measurement process [4]. The advantage of this method is that the effect of systematic error behaviour of the instrument is cancelled out and the accuracy of the measurements is limited only by the smaller, random component.

3.1 Calibration of reference artefacts in 2-dimensions

As an example, we consider the accurate calibration of 2-dimensional artefacts by a two dimensional CMM. The artefacts define the location of targets nominally aligned in a grid pattern. Let $\mathbf{y}_j$, $j = 1, \dots, n_Y$, be the locations of the targets in a fixed frame of reference, and let $\mathbf{y}_{j,k} = T(\mathbf{y}_j, \mathbf{t}_k)$ be the location of the $j$th target in the $k$th measuring position. Here, the roto-translation $T$ is specified by three parameters $\mathbf{t}$ defining the translation vector and angle of rotation. We suppose the systematic error of the two dimensional CMM can be expressed as
$$\mathbf{x}^* = \mathbf{x}^*(\mathbf{x}, \mathbf{b}) = \mathbf{x} + \mathbf{e}(\mathbf{x}, \mathbf{b}),$$
where $\mathbf{x}^*$ are the true point co-ordinates, $\mathbf{x}$ are the indicated point co-ordinates output by the machine and $\mathbf{e}(\mathbf{x}, \mathbf{b})$ is the error correction term depending on $\mathbf{x}$ and error parameters $\mathbf{b}$. For instance, suppose the model describes scale and orthogonality errors, so that
$$x^* = x(1 + b_1) + y(1 + b_2)\sin b_3, \qquad y^* = y(1 + b_2)\cos b_3.$$
If $\mathbf{x}_i$ is the measurement of the $j$th target with the artefact in the $k$th position then the associated observation equation is
$$\mathbf{x}_i + \mathbf{e}(\mathbf{x}_i, \mathbf{b}) = \mathbf{y}_{j,k} + \boldsymbol{\epsilon}_i. \qquad (3.1)$$
Given a set of such measurements $\{\mathbf{x}_i\}_{i=1}^m$ and associated index functions $(j(i), k(i))$ specifying the targets and artefact positions, estimates of the model parameters can be determined by solving a non-linear least squares problem
$$\min_{\{\mathbf{y}_j\}, \{\mathbf{t}_k\}, \mathbf{b}} \sum_{i=1}^{m} \mathbf{f}_i^T\mathbf{f}_i, \qquad \text{where } \mathbf{f}_i(\mathbf{y}_{j(i)}, \mathbf{t}_{k(i)}, \mathbf{b}) = \mathbf{x}_i + \mathbf{e}(\mathbf{x}_i, \mathbf{b}) - \mathbf{y}_{j(i),k(i)}.$$
The model involves three sets of parameters: the target locations $\{\mathbf{y}_j\}$, transformation parameters $\{\mathbf{t}_k\}$ and the error parameters $\mathbf{b}$. Each observation equation depends on only one target and one transformation, so that the Jacobian matrix $J$ of partial derivatives can be ordered to have a block-angular structure [2],
$$J = \begin{bmatrix} K_1 & & & J_1 \\ & K_2 & & J_2 \\ & & \ddots & \vdots \\ & & K_{m_X} & J_{m_X} \end{bmatrix},$$
where $K_j$ corresponds to the parameters $\mathbf{y}_j$ and the border blocks $\{J_i\}$ correspond to the border parameters $\mathbf{a} = \{\{\mathbf{t}_k\}, \mathbf{b}\}$. The frame of reference for the targets $\{\mathbf{y}_j\}$ can be specified by applying three appropriate linear equality constraints on the transformation parameters $\{\mathbf{t}_k\}$.

While scale and orthogonality errors are often major contributors to the systematic error behaviour of a CMM, there is no guarantee, nor does experience show, that they explain the full extent of the behaviour. For this reason, more comprehensive models have been developed [3, 9]. However, they all depend on the approximation of actual behaviour by empirical functions such as polynomials, and the adequacy of the approximation is often difficult and expensive to evaluate. However, if we always rotate and translate the artefact according to the symmetries of the reference artefact, so that the targets are always located (nominally) at a subset of a fixed grid of points in the CMM's working volume, then measurements are made at a finite number of machine locations. To the $l$th location we associate a machine error $\mathbf{e}_l$. If the $i$th measurement is made at the $l$th location then the observation equation corresponding to (3.1) is
$$\mathbf{x}_i + \mathbf{e}_l = \mathbf{y}_{j,k} + \boldsymbol{\epsilon}_i.$$
The advantage of this error model is that it entails no significant approximation: the systematic errors are modelled exactly. An apparent disadvantage is that there are likely to be as many error parameters as target parameters, giving rise to a sparsity structure in the Jacobian matrix for which direct, structure-exploiting methods provide relatively minor efficiency gains. (The explicit error model above can be sketched as follows.)
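For concreteness, the scale-and-orthogonality correction of Section 3.1 can be written directly in code (the function name is ours):

```python
import numpy as np

def error_corrected(x, b):
    """Scale and orthogonality model: returns the true co-ordinates x*.

    x* = x(1 + b1) + y(1 + b2) sin(b3),   y* = y(1 + b2) cos(b3).
    """
    b1, b2, b3 = b
    return np.array([x[0] * (1 + b1) + x[1] * (1 + b2) * np.sin(b3),
                     x[1] * (1 + b2) * np.cos(b3)])
```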
FIG. 1. Sparsity structure of the transpose of the Jacobian matrix associated with the measurement of a 5 × 5 hole plate in eight positions (nz = 1784).

Figure 1 shows on the left the sparsity structure of the Jacobian matrix $J$ associated with the measurement of a 5 × 5 hole plate in eight positions, the first four corresponding to rotations by 0, 90, 180 and 270 degrees, the second four incorporating a translation as well as a rotation. In each position the locations of the targets $\mathbf{y}_j$ are measured in order. The nonzero elements of the matrix are represented by a dot. The first (second) 50 columns correspond to the derivatives with respect to the machine error parameters $\mathbf{e}_l$ (target parameters $\mathbf{y}_j$), and the last 24 correspond to the eight sets of transformation parameters $\mathbf{t}_k$. On the right the sparsity structure of the triangular factor of $J$ is illustrated and shows the substantial fill-in that occurs. In the next two sections, we describe approaches for dealing efficiently with block-angular and more general sparse-block structure.

4 Algorithms for block-angular systems

We consider non-linear least squares problems where the optimisation parameters can be partitioned into two sets $\eta = \{\mathbf{y}_j\}_{j=1}^{n}$ and $\mathbf{a}$, and such that each observation equation involves $\mathbf{a}$ and at most one set of parameters $\mathbf{y}_j$. Corresponding to (2.1), we have instead an objective function of the form
$$F(\eta, \mathbf{a}) = \mathbf{f}_0^T(\mathbf{a})\mathbf{f}_0(\mathbf{a}) + \sum_j \mathbf{f}_j^T(\mathbf{y}_j, \mathbf{a})\mathbf{f}_j(\mathbf{y}_j, \mathbf{a}).$$
The associated Jacobian matrix $J$ and its triangular factor $R$ can be arranged to have the form
$$J = \begin{bmatrix} K_1 & & & J_1 \\ & K_2 & & J_2 \\ & & \ddots & \vdots \\ & & K_n & J_n \\ & & & J_0 \end{bmatrix}, \qquad R = \begin{bmatrix} R_1 & & & B_1 \\ & R_2 & & B_2 \\ & & \ddots & \vdots \\ & & R_n & B_n \\ & & & R_0 \end{bmatrix}.$$
The nonzero blocks of the matrix $R$ can be stored compactly in a vector $\mathbf{r}$, row by row. Efficient updating strategies for such triangular factors have been incorporated into a non-linear least-squares solver to deal with block-angular problems. It is assumed that the Jacobian matrix is composed of $n_B$ blocks of rows, with the $i$th block depending on at most one set of parameters $\mathbf{y}_j$, $j = j(i)$. The user is required to supply a function and gradient evaluation module that, given $\eta$, $\mathbf{a}$ and $1 \le i \le n_B$, returns $j = j(i)$ and $\mathbf{f}_i(\mathbf{a})$, $J_i$ (if $j = 0$), or $\mathbf{f}_i(\mathbf{y}_j, \mathbf{a})$, $J_i$, $K_i$ (if $j > 0$). For each $i$, the triangular factor and right-hand side vector are updated by the $i$th block of rows:
$$\begin{bmatrix} R_{j(i)} & B_{j(i)} \\ K_i & J_i \end{bmatrix} \longmapsto \begin{bmatrix} R_{j(i)} & B_{j(i)} \\ 0 & \tilde{J}_i \end{bmatrix}, \qquad \begin{bmatrix} R_0 \\ \tilde{J}_i \end{bmatrix} \longmapsto \begin{bmatrix} \tilde{R}_0 \\ 0 \end{bmatrix}.$$
Linear equality constraints on the border parameters $\mathbf{a}$, implemented using the orthogonal projection approach, can be incorporated by setting $J_i := J_i V_2$ at the appropriate stage.

5 Algorithms for sparse-block matrices

Let the $m \times n$ matrix $S$ be composed of $n_B$ submatrices $S_k$ of dimension $m_k \times n_k$. We assume that $S_k$ is stored (column-wise or row-wise) as a column vector $\mathbf{s}_k$. The information in $S$ can be encoded in a column vector $\mathbf{s}_I$ and an indexing set $I_S$ such that $I_S(1{:}5, k) = (i_k, j_k, m_k, n_k, l_k)$, where $(i_k, j_k)$ specifies the location of $S_k(1,1)$ in $S$ and $l_k$ indicates that $\mathbf{s}_k = \mathbf{s}_I(l_k : l_k + m_k n_k - 1)$. Blocks of such matrices can be easily represented by concatenating the $\mathbf{s}$-vectors and index matrices $I_S$ and performing some trivial index modifications. Matrix-vector multiplications of the form $\mathbf{y} := \alpha S\mathbf{x} + \beta\mathbf{y}$ are easily implemented through a sequence of full matrix multiplications: $\mathbf{y} := \beta\mathbf{y}$, followed by
$$\mathbf{y}(i_k : i_k + m_k - 1) := \mathbf{y}(i_k : i_k + m_k - 1) + \alpha S_k \mathbf{x}(j_k : j_k + n_k - 1), \qquad k = 1, \dots, n_B.$$
A similar scheme calculates $\mathbf{x} := \alpha S^T\mathbf{y} + \beta\mathbf{x}$. The storage and multiplication scheme can be modified to take into account the type or structure of the submatrices $S_k$. To implement linear equality constraints, it is required to perform matrix multiplication by a submatrix $V_2$ of the orthogonal factor of the constraint matrix $C$. A simple scheme can be implemented using the LAPACK routines DGEQRF (orthogonal factorisation) and DORMQR (matrix multiplication by an orthogonal matrix stored as a product of Householder matrices) [8].
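The block-wise multiplication scheme of Section 5 can be sketched as follows, storing each submatrix densely together with its position (a simplified illustration with 0-based indices; all names are ours).

```python
import numpy as np

def sparse_block_matvec(blocks, x, y, alpha=1.0, beta=0.0):
    """y := alpha * S x + beta * y for a sparse-block matrix S.

    blocks is a list of (i, j, Sk) triples: Sk is a dense m_k x n_k array
    whose (1,1) element sits at position (i, j) of S. y is updated in place.
    """
    y *= beta
    for i, j, Sk in blocks:
        mk, nk = Sk.shape
        y[i:i + mk] += alpha * (Sk @ x[j:j + nk])
    return y

def sparse_block_rmatvec(blocks, y, x, alpha=1.0, beta=0.0):
    """x := alpha * S^T y + beta * x, by the same block-wise scheme."""
    x *= beta
    for i, j, Sk in blocks:
        mk, nk = Sk.shape
        x[j:j + nk] += alpha * (Sk.T @ y[i:i + mk])
    return x
```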
FIG. 2. Residual errors associated with the first 1000 observations for models a) with no error separation (dots) and b) with error separation.

We have implemented a non-linear least squares solver for sparse-block systems. The user is required to supply a module that takes as input the current estimate $\mathbf{a}$ of the optimisation parameters and outputs the function values $\mathbf{f}(\mathbf{a})$ and the Jacobian matrix stored in sparse-block form $(\mathbf{s}_I, I_S)$. The solver implements a Gauss-Newton approach using the LSQR solver to find the Gauss-Newton step, and caters in a straightforward way for linear equality constraints. The solver has been successfully tested in a number of self-calibration problems. For example, it was used recently in the calibration of a 13 × 13 grid of targets on a glass plate by a CMM with an optical probing system. The problem involved approximately 15,000 observation equations in over 800 optimisation parameters and was solved in a few tens of seconds using a standard laboratory PC (450 MHz). The advantage of the error separation model is illustrated in Figure 2, which shows the residual errors associated with the first 1000 observations for models a) with no error separation (dots) and b) with error separation. The fit for the error separation model is much superior. The practical metrological consequence of adopting the enhanced model is that uncertainties associated with the target locations can be reduced by a factor of five. Importantly, because the model is a realistic approximation of the measuring system, we can have confidence in the uncertainty estimates derived from the model.

6 Concluding remarks

The move to more accurate measurement systems has led to more comprehensive models of the measuring instrument and its interaction with the physical quantity being measured. These models include parameters that describe properties of the instrument and those of the measurand. The aim of self-calibration experiments is to determine as much as possible about both sets of parameters from a set of measurement experiments. For models with a small to modest set of parameters, a full matrix approach may be acceptable. For larger systems, exploitation of sparsity structure in the defining equations is highly desirable and often a stark necessity if the computations are to be made in an acceptable time using the computing resources to hand. The exploitation of block-angular structure has been well known and well used in some areas of metrology. The supporting numerical technology based on structured orthogonal factorisations is mature, compact and easily implemented using standard numerical linear algebra. However, these techniques could be applied more widely in metrology, making feasible approaches that would have to be rejected if only full matrix methods were used. The use of sparse matrix techniques is relatively rare within metrology. We have attempted to show here that, in self-calibration problems in dimensional metrology, they allow us to develop improved models that provide vastly superior fits to the data, with corresponding improvements in the evaluated uncertainties in the fitted parameters. The supporting numerical technology is maturing and accessible.

Acknowledgements: This work has been supported by the Department of Trade and Industry's National Measurement System Software Support for Metrology Programme and undertaken by a project team at the Centre for Mathematics and Scientific Software, National Physical Laboratory. The author is particularly thankful to Maurice Cox, Peter Harris and Ian Smith for their contributions.

Bibliography

1. A. Björck.
Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.
2. M. G. Cox. The least-squares solution of linear equations with block-angular observation matrix. In M. G. Cox and S. Hammarling, editors, Advances in Reliable Numerical Computation, pages 227-240. Oxford University Press, 1989.
3. M. G. Cox, A. B. Forbes, P. M. Harris, and G. N. Peggs. Experimental design in determining the parametric errors of CMMs. In V. Chiles and D. Jenkinson, editors, Laser Metrology and Machine Performance IV, pages 13-22, Southampton, 1999. WIT Press.
4. A. B. Forbes and I. M. Smith. Self-calibration and error separation techniques in metrology. In P. Ciarlini, M. G. Cox, E. Filipe, F. Pavese, and D. Richter, editors, Advanced Mathematical and Computational Tools in Metrology V, pages 149-163, Singapore, 2001. World Scientific.
5. P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, London, 1981.
6. G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, third edition, 1996.
7. C. C. Paige and M. A. Saunders. LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Transactions on Mathematical Software, 8(1), 1982.
8. LAPACK Users' Guide, third edition. SIAM, Philadelphia, 1999.
9. G. Zhang, R. Ouyang, B. Lu, R. Hocken, R. Veale, and A. Donmez. A displacement method for machine geometry calibration. Annals of the CIRP, 37:515-518, 1998.

On measurement uncertainties derived from "Metrological Statistics"

Michael Grabe
Am Hasselteich 5, 38104 Braunschweig, Germany. michael.grabe@ptb.de

Abstract

As measurement uncertainties are closely tied up with error models, it might be of interest to review a model which the author assigns to "Metrological Statistics". Given that the random errors are normally distributed, the experimentalist could either refer to B. L. Welch's concept of "effective degrees of freedom" or to the multidimensional Fisher-Wishart distribution density. In the first case, different numbers of repeated measurements are admissible; in the latter it is strictly required to have equal numbers of repeated measurements. In error propagation, however, only the latter mode of action opens up the possibility of designing confidence intervals according to Student and confidence ellipsoids according to Hotelling. Another point of view, closely linked to the choice of the numbers of repeated measurements, refers to the customary practice of attributing equal rights to statistical expectations and empirical estimators. However, the Fisher-Wishart distribution density suggests using only the information which is realistically accessible to experimentalists, namely empirical estimators. For the handling of unknown systematic errors, either the existence of a (rectangular) distribution density may be assumed or, and this is proposed here, they may be classified as time-constant quantities, biasing expectations and suspending many tools and procedures of error calculus that are otherwise well established.

1 Introduction

The joint propagation of random errors and unknown systematic errors currently places the experimentalist in the following dilemma. In regard to the propagation of random errors, there are, at least in principle, two different choices. If one is willing to accept unequal numbers of repeated measurements of the physical quantities to be combined within a given function, one has, in order to express the influence of random errors, to resort to B. L.
Welch's sophisticated concept of so-called numbers of effective degrees of freedom [8]. However, this procedure is tied up with difficulties: it is restricted to independent variables. Though B. L. Welch's concept completely exhausts the information implied in measured data, unfortunately, from a metrological point of view, it is cumbersome to handle and obstructs the view of existing simpler procedures. On the other hand, if the experimentalist preferred equal numbers of repeated measurements, he would, if need be, have to give away part of his information, namely that which is carried by the excessive numbers of repeated measurements of the variables involved. Up to now, the disregarding of excessive numbers has been regarded as unfavourable. In spite of this view, just this precaution opens up a toolbox of applied statistics hitherto closed to metrologists, as only with equal numbers of repeated measurements is the experimentalist in a position to call upon the standard model of statistics for jointly normally distributed random variables, i.e. the Fisher-Wishart density [3]. The advantages gained in that way outweigh by far the "lost information", as relatively few repeated measurements of experimental set-ups, operating in a stationary mode, are able to locate accurately the respective physical quantities. After all, in error propagation the experimentalist may define confidence intervals according to Student (Gosset) including any number of variables. In least squares, he may even establish multidimensional confidence intervals, and, last but not least, certain problems of classical error calculus, such as the Fisher-Behrens problem, no longer arise.

In regard to the interpretation and propagation of unknown systematic errors, the situation is not simpler. Let us assume that an unknown systematic error $f$, constant in time, is confined to an interval of the kind¹
$$-f_s \le f \le f_s, \qquad f_s > 0. \qquad (1.1)$$
Now, the experimentalist may either assign a postulated probability density to $f$, usually a rectangular density [7],
$$p(f) = \frac{1}{2f_s}, \qquad (1.2)$$
or he may set, without exception,
$$f = \text{constant}, \qquad (1.3)$$
where $f$ lies anywhere within (1.1). The latter interpretation introduces biased estimators, leading to a break-down of many procedures of error calculus that are otherwise well established. Seen mathematically, both interpretations should be justified. In the case of (1.2), the combination of random and systematic errors should be carried out geometrically; in the case of (1.3), arithmetically. Regarding (1.3), the author suggests adding linearly Student's confidence intervals to appropriately designed worst-case estimates of the propagated systematic errors, and no probability statements should be associated with so-defined overall uncertainties.

2 Error propagation

The fundamental error equations of Metrological Statistics are given as follows [4]. Let $x_0$ designate the true value of the physical quantity $x$ to be measured. Furthermore, let $\epsilon_l$ be the random error and $f_x = \text{constant}$ the unknown systematic error corresponding to (1.1). We then have
$$x_l = x_0 + \epsilon_l + f_x, \qquad l = 1, \dots, n. \qquad (2.1)$$
Let $\mu_x = x_0 + f_x$ be the expectation of the random variable $X = \{x_1, x_2, \dots, x_n\}$, so that the $x_l$ are some of its realizations. We then find
$$x_l = \mu_x + \epsilon_l, \qquad l = 1, \dots, n. \qquad (2.2)$$

¹Should the interval be unsymmetrical about zero, it could be symmetrized by subtracting the halved sum of the upper and lower boundaries; the same quantity would have to be subtracted from the data.
Furthermore, let $\bar{x} = \frac{1}{n}\sum_{l=1}^{n} x_l$ denote the arithmetic mean. We then have the useful identities
$$x_l = x_0 + (x_l - \mu_x) + f_x, \qquad \bar{x} = x_0 + (\bar{x} - \mu_x) + f_x. \qquad (2.3)$$
While the arithmetic mean is biased, the empirical variance
$$s_x^2 = \frac{1}{n-1}\sum_{l=1}^{n}(x_l - \bar{x})^2 \qquad (2.4)$$
is not. For the time being, let us consider just two quantities to be measured, $x$ and $y$. As robust and simple uncertainty assessments are a matter of linearization, the overall uncertainty $u_\phi$ of a given function $\phi(x, y)$ is proposed to be [5]
$$u_\phi = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\left(\frac{\partial\phi}{\partial x}\right)^2 s_x^2 + 2\frac{\partial\phi}{\partial x}\frac{\partial\phi}{\partial y}s_{xy} + \left(\frac{\partial\phi}{\partial y}\right)^2 s_y^2} + \left|\frac{\partial\phi}{\partial x}\right| f_{s,x} + \left|\frac{\partial\phi}{\partial y}\right| f_{s,y}, \qquad (2.5)$$
where $t_P(n-1)$ is the Student factor corresponding to a confidence level $P$. We distinctly see how the empirical covariance
$$s_{xy} = \frac{1}{n-1}\sum_{l=1}^{n}(x_l - \bar{x})(y_l - \bar{y})$$
enters the empirical variance of the $\phi(x_l, y_l)$, $l = 1, \dots, n$, given by
$$s_\phi^2 = \left(\frac{\partial\phi}{\partial x}\right)^2 s_x^2 + 2\frac{\partial\phi}{\partial x}\frac{\partial\phi}{\partial y}s_{xy} + \left(\frac{\partial\phi}{\partial y}\right)^2 s_y^2.$$
The final result
$$\phi(\bar{x}, \bar{y}) \pm u_\phi \qquad (2.6)$$
is expected to localize the true value $\phi(x_0, y_0)$ with "reasonable certainty", but no proper confidence statement should be added, as $u_\phi$ is a mixture of a statistical and a non-statistical component. The last term in (2.5) may overestimate the uncertainty; on the other hand, linearization errors have been neglected. After all, this uncertainty statement should fulfil the prerequisite of being safe, robust and simple. If there are $m$ quantities to be measured, we replace the notation $x, y$ by $x_1, x_2, \dots, x_m$. Then the overall uncertainty $u_\phi$ of the final result
$$\phi(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_m) \pm u_\phi$$
is given by
$$u_\phi = \frac{t_P(n-1)}{\sqrt{n}}\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{m}\frac{\partial\phi}{\partial x_i}\frac{\partial\phi}{\partial x_j}s_{ij}} + \sum_{i=1}^{m}\left|\frac{\partial\phi}{\partial x_i}\right| f_{s,i}. \qquad (2.7)$$
When (2.5) and (2.7) are compared, it becomes obvious that the proposed formalism of error propagation works like a building kit: perspicuous and easy to handle. There are arguments against (2.7), in particular that an experimentalist who wishes to design his uncertainties in this way would have to know the complete set of repeated measurements, in other words, the complete empirical variance-covariance matrix
$$s = (s_{ij}), \qquad i, j = 1, 2, \dots, m, \qquad (2.8)$$
of the input data. Arguably, this is true, but in the days of computers and the internet such a challenge should no longer be apt to provoke difficulties worth mentioning. Another argument, that (2.7) might overestimate overall uncertainties, should be judged in view of the unique role of metrology in science. Standing "between" theory and experiment, metrology pursues the idea of localizing reliably the value of the physical quantity in question.
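Equation (2.7) translates directly into code; a minimal sketch, assuming `scipy` for the Student factor (all names are ours):

```python
import numpy as np
from scipy.stats import t as student_t

def overall_uncertainty(grad_phi, s, f_s, n, P=0.95):
    """Overall uncertainty u_phi of eq. (2.7).

    grad_phi : gradient of phi at the means (length m)
    s        : m x m empirical variance-covariance matrix of the input data
    f_s      : bounds f_{s,i} of the unknown systematic errors
    n        : number of repeated measurements
    """
    g = np.asarray(grad_phi, dtype=float)
    t_factor = student_t.ppf(0.5 + P / 2.0, n - 1)   # two-sided Student factor
    random_part = t_factor / np.sqrt(n) * np.sqrt(g @ s @ g)
    systematic_part = np.abs(g) @ np.asarray(f_s)    # worst case, added linearly
    return random_part + systematic_part
```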
Writing (3.4) in vector form, we have X = xo + (x - fi) + f (3.5) 158 Michael Grabe where X = {Xi,X2,...,Xm) , Xo = {XO,1,XQ^2,--- ,Xo,m) M = (M1)A*2)-•• j/^m) , / = (/l,/2,--- ,/TO) , , -fs,i < ft < fs,i- Given equal variances a^ = EUXi -fit) [, the minimized sum Qmm of squared residuals of the adjusted system (3.1) should yield, according to quite familiar procedures, an estimator s^ i=» 0-^. However, from Qmin = {X- Px) {X - Px) , we obtain something different, namely E{Qr^^] = a\m-r) + ff-fPf. (3.6) As we see, even the simplest of all associated least squares procedures breaks down, should the model of time-constant unknown systematic errors be accepted. At the same time the related basic tool hnked to Qmin and frequently used, namely the test of consistency oi the'mp\it data, based on the criterion breaks down as well. Indeed, during many decades, time and again, the observation Qmin/s^ '>m-r has stunned experimentalists [2], so that, in the adjustments of the fundamental physical constants, even the abolition of least squares has been considered [1]. However, in view of (3.6), these observations are understandable. After all, a least squares adjustment of biased input data requires arithmetic means 5i = Xo,i + {xi - iii) + fi, i = 1,... , m, (3.7) so that the empirical variances and covariances Sij = 1 " r y ^ (Xjl — Xi) {Xjl — Xj), Sii = Sj , (."J-"] are known a priori. Replacing (3.5) by x = xo + {x-fi) + f (3.9) P={A'^Ay'A'^x. (3.10) . instead of (3.3), we find A matter of similar concern refers to the break-down of the Gauss-Markoff theorem. In view of (3.9), the solution vector P is biased, so that the experimentalist is no longer in a position to obtain a weight-matrix from the variance-covariance matrix of the input vector X. Consequently, simple, optimized adjustments, to which we are customarily used, must be ruled out. Nevertheless, we may multiply (3.1) from the left with any On measurement uncertainties from metrological statistics 159 non-singular weighting matrix, e.g. with a diagonal one, G = {gi,g2,.-. ,gm}, 9i = —, (3-11) and adjust the weights g, by trial and error in order to find the shortest possible uncertainty intervals. As has been shown, this method is also able to detect inconsistencies among the input data, [6]. Indeed, as a non-singular weight-matrix cannot shift the true solution vector /3o, we are allowed to proceed this way. To assign uncertainties to the components ^fe; A; = 1,... , r of the solution vector (3, we refer to (2.7). To abbreviate the notation, we set in (3.10) B^AiA^Af^ (3.12) where the elements of the matrix B will be designated by 6,^. Upon insertion of (3.9) into (3.10), we arrive at p = B''xo + B^ix-^i) + B'^f. (3.13) Evidently, /3o = B'^XQ is the true value of the estimator p. Setting fx^ = E {^] = (3Q + B'^ f, we may define the theoretical variance-covariance matrix E[{^-iXp){^-iipf}, which, however, remains numerically inaccessible. Consequently, the only thing we can do is to resort to the empirical variance-covariance matrix s^=(s^.^,,)=-B'"sB, /c=l,2,...,r, (3.14) whose elements are given by 771 Clearly, the Sij are the elements of the empirical variance-covariance matrix s of the input data, as has been stated in (2.8) and (3.8). These procedures presuppose, as has been pointed out, equal numbers of repeated measurements within each of the m means (3.7). The components Pk of the solution vector may be written as ^ n m h = -y^3ki with hi = 'yMkXii\ k = l,...,r. 
(3.16) 1=1 1=1 Evidently, the Pki are independent and normally distributed. Let p^^ denote the expectations H0^^E{h}, fc = l,...,r (3.17) of the Pk- Looking for just any one of the ^k, 0k - -^-^ ^0. ^ M^. ^^k + ^ s^, (3.18) 160 Michael Grabe is a confidence interval according to Student, where ts,p {n - 1) is the Student-factor. This interval localizes fx^ with confidence P. The components of the third term on the right-hand side of (3.13) are given by fh=T.^ikfi, k = l,...,r. (3.19) Worst-case estimates are m fsA=Y,\^i''^f^^i^^ ■ fc = l,...,r. (3.20) t=i After all, the overall uncertainties u^^ of the components of the solution vector ^, considered and employed individually, are proposed to be "fe= ^ g^. + /.A. k = l,...,r. (3.21) 4 Uncertainty spaces The component representation of (3.13), m m 0k = Po,k + 5^ bik (xi - Hi) + ^ bikfi (4.1) reveals the couplings between the least squares estimators. Those due to random errors may be expressed by means of Hotelling's density [3]. The last term on the right-hand side of (4.1), m fpk = Yl^''•-•^'' k=l,... ,r, (4.2) expresses the couplings due to systematic errors. The r components /^^ map the mdimensional hypercuboid -fs,i < fi< /«,i, i = 1,... ,m, (4.3) onto the r-dimensional space, yielding a convex polytope. Both solids may be combined to an overall uncertainty space, resembling a "convex potato". Figures 1-3 show the confidence ellipsoid, the "security polytope" and the combination of both to an overall uncertainty space for the example of a least squares adjustment of a circle. 5 Conclusion As computer simulations reveal, the approach presented here leads to measurement uncertainties safeguarding physical objectivity in the sense that uncertainty intervals reliably locate the values of the physical quantities in question. With such a distinct statement, the traceability of units and standards will certainly be maintained. On measurement uncertainties from metrological statistics 1. Confidence ellipsoid, security polytope, overall uncertainty space resembling a "convex potato". FIG. References 1. Bender, P.L., B. N. Taylor, E.R. Cohen, J.S. Thomas, P. Pranken, and C. Eisenhart, Should least squares adjustment of the fundamental constants be abolished?, NBS Special Publication 343, United States Department of Commerce, Washington D.C., 1971. 2. Cohen, E.R. and B.R. Taylor, The 1986 adjustment of the fundamental physical constants, CODATA BULLETIN Nr. 63 (1986). 3. Cramer, H., Mathematical Methods of Statistics, Princeton University Press, Princeton 1961. 4. Grabe, M., Principles of "metrological statistics", metrologia 23 (1986/87) 213-219. 5. Grabe, M. Estimation of measurement uncertainties, an alternative to the ISO-Guide, metrologia 38 (2001) 97-106. 6. Grabe, M., An alternative algorithm for adjusting the fundamental physical constants, Physics Letters A 213 (1996) 125-137. 7. ISO, Guide to the expression of uncertainty in measurement, 1993.1, Rue de Varambe, Boite postale 56, CH-1211 Geneva 20, Switzerland. 8. Welch, B.L., The generalization of Student's problem when several different population variances are involved, Biometrika 34 (1947) 28-35. 161 /i and loo ODR fitting of geometric elements Hans-Peter Helfrich Mathematisches Seminar der Landwirtschaftlichen Fakultat der Universitat Bonn helfrich@uni-bonn.de Daniel S. Zwick Wilcox Associates, Inc. 
Abstract

We consider the fitting of geometric elements, such as lines, planes, circles, cones, and cylinders, in such a way that the sum of distances or the maximal distance from the element to the data points is minimized. We refer to this kind of distance based fitting as orthogonal distance regression or ODR. We present a separation of variables algorithm for l_1 and l_\infty ODR fitting of geometric elements. The algorithm is iterative and allows the element to be given in either implicit form f(x, \beta) = 0 or in parametric form x = g(t, \beta), where \beta is the vector of shape parameters, x is a 2- or 3-vector, and t is a vector of location parameters. The algorithm may even be applied in cases, such as with ellipses, in which a closed form expression for the distance is either not available or is difficult to compute. For l_1 and l_\infty fitting, the norm of the gradient is not available as a stopping criterion, as it is not continuous. We present a stopping criterion that handles both the l_1 and the l_\infty case, and is based on a suitable characterization of the stationary points.

1 Introduction

Let us be given N points \{z_i\}_{i=1}^N \subset R^d and a geometric object S in

- implicit form \{x : f(x, \beta) = 0\} with a scalar function f, or
- parametric form x = g(t, \beta) with a vector function g,

where the shape parameter vector \beta \in C lies within a closed, convex subset C of R^m. Denote by

   \phi_i(\beta) = \inf\{ \|z_i - x_i\|_2 : x_i on S \}

the distance of the point z_i to the geometric object S. Let

   \phi(\beta) = (\phi_1(\beta), \dots, \phi_N(\beta))^T

be the distance vector with norm \Phi(\beta) = \|\phi(\beta)\|, where \|\phi(\beta)\| denotes either the l_\infty-norm

   \Phi(\beta) = \max(\phi_1(\beta), \dots, \phi_N(\beta))

or the l_1-norm

   \Phi(\beta) = \sum_{i=1}^{N} \phi_i(\beta).

We consider the problem: Find \beta \in C and points \{x_i\}_{i=1}^N on S such that \Phi(\beta) = \|\phi(\beta)\| is minimal. If the minimum is attained, each function \phi_i(\beta) = \|z_i - x_i\|_2 is minimal for the point x_i \in S. Then z_i - x_i is orthogonal to S for interior points of S, hence the term "orthogonal distance regression" or "ODR". Nonlinear l_1 ODR problems are treated in WATSON [10, 12]. A survey for linear problems is given in ZWICK [13].

As stated, the problem has dimension Nd + m. In typical metrology applications, the data set is very large, so that a direct approach to the problem becomes computationally expensive. We use a separation of variables algorithm that was used in [2, 4] and TURNER [9] for the l_2 ODR problem. Each iteration of our algorithm consists of two steps. In the first step, the foot points \{x_i\}_{i=1}^N on S, i.e., the location parameters, are calculated for a fixed parameter vector \beta. These d-dimensional subproblems can be efficiently handled by trust region methods [3]. In the second step, a first order approximation of \phi_i(\beta) is employed, that can be given without explicit knowledge of the dependence of the optimal points x_i(\beta) on \beta. At this stage, the norm of the correction to the parameter vector \beta is limited by a trust region strategy. The correction can be computed by solving a linear programming problem. For general nonlinear minimax problems such methods were proposed in MADSEN AND SCHJAER-JACOBSEN [6], HALD AND MADSEN [1] and JONASSON AND MADSEN [5]. Our convergence analysis follows the general approach given in POWELL [8] and MORE [7]. But in order to handle the l_1 and l_\infty case we cannot use the norm of the gradient as a stopping or convergence criterion, since the gradient is not continuous. Moreover, a necessary condition for a minimum is that the subgradient contains the zero functional; see, e.g., WATSON [11]. In order to overcome this difficulty, we introduce a replacement for the norm of the gradient that serves both as a stopping criterion and as an essential tool in the convergence proof.
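For a concrete feeling of the first step, consider the ellipse example of Section 4 below. A minimal sketch of the foot point subproblem for a parametric ellipse might look as follows (an illustration only, not the authors' implementation; numpy/scipy and the parameterization by centre, semi-axes and rotation angle are our assumptions):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def ellipse_point(t, beta):
        # Parametric form x = g(t, beta) of an ellipse with centre (x0, y0),
        # semi-axes (a, b) and rotation angle theta.
        x0, y0, a, b, theta = beta
        c, s = np.cos(theta), np.sin(theta)
        u, v = a * np.cos(t), b * np.sin(t)
        return np.array([x0 + c * u - s * v, y0 + s * u + c * v])

    def foot_point_distance(z, beta):
        # Minimize ||z - g(t, beta)||_2 over the location parameter t.
        res = minimize_scalar(lambda t: np.linalg.norm(z - ellipse_point(t, beta)),
                              bounds=(0.0, 2.0 * np.pi), method='bounded')
        return res.fun, ellipse_point(res.x, beta)   # phi_i(beta) and foot point x_i

A bounded one-dimensional search suffices here; in practice a few restarts from different initial angles guard against local minima.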
2 The trust region algorithm

At each iteration of our algorithm we solve the low-dimensional subproblems (P_i) for \beta = \beta_k for each fixed i, i = 1, \dots, N:

   Minimize \|z_i - x_i\|_2 subject to f(x_i, \beta) = 0 or x_i = g(t_i, \beta).

In order to apply the trust region method to l_1 and l_\infty ODR we need a first order approximation \psi_i(\beta, \alpha) to \phi_i(\beta). With appropriate regularity assumptions, this can be computed without knowledge of the dependence of the optimal points x_i(\beta) on \beta ([2], [4]). This means that the iterative improvement in \beta is uncoupled from the calculations of x_i(\beta), whereby a true first order approximation of the objective function is attained.

In the case of the implicit form f(x, \beta) = 0, the first order approximation \phi_i(\beta + \alpha) = \psi_i(\beta, \alpha) + o(\alpha) is given by

   \psi_i(\beta, \alpha) = \pm\phi_i(\beta) + \frac{\nabla_\beta f(x_i, \beta)\,\alpha}{\|\nabla_x f(x_i, \beta)\|_2}    (2.1)

as a first order approximation to the signed distance \pm\phi_i(\beta + \alpha). For the parametric form x = g(t, \beta), we have

   \psi_i(\beta, \alpha) = \|z_i - x_i\|_2 - \frac{(z_i - x_i)^T}{\|z_i - x_i\|_2}\, \nabla_\beta g(t_i, \beta)\,\alpha.    (2.2)

Note that (2.1) makes sense even for points on the surface. For an orientable hypersurface in parametric form, the expression (z_i - x_i)/\|z_i - x_i\|_2 in (2.2) should be replaced by the unit normal for points on the surface.

Denote by

   \psi(\beta, \alpha) = (\psi_1(\beta, \alpha), \dots, \psi_N(\beta, \alpha))^T

the vector of the linearized distances and let

   \Psi(\beta, \alpha) = \|\psi(\beta, \alpha)\| - \|\phi(\beta)\|.

The main algorithm:

- Step 0: An initial \beta_0 \in R^m, a trust region radius \Delta_0 > 0, and constants 0 < \mu < 1, 0 < \gamma_1 < 1 < \gamma_2, and \bar\Delta > 0 are given. Set k = 0.
- Step 1: Minimize \Psi(\beta_k, \alpha) subject to \|\alpha\|_2 \le \Delta_k and \beta_k + \alpha \in C. Let \alpha_k denote the solution with minimal norm.
- Step 2: If \alpha_k = 0, stop.
- Step 3: Compute the ratio of actual to predicted reduction,

   \rho_k = \frac{\Phi(\beta_k) - \Phi(\beta_k + \alpha_k)}{-\Psi(\beta_k, \alpha_k)}.

- Step 4: (1) Successful step. If \rho_k > \mu, set \beta_{k+1} = \beta_k + \alpha_k and choose \Delta_{k+1} such that

   \Delta_k \le \Delta_{k+1} \le \min(\gamma_2 \Delta_k, \bar\Delta).    (2.3)

  (2) Unsuccessful step. Otherwise, set \beta_{k+1} = \beta_k and 0 < \Delta_{k+1} \le \gamma_1 \Delta_k.
- Step 5: Increment k by one and go to Step 1.
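With the linearizations (2.1)/(2.2) in hand, Step 1 becomes a linear program once a polyhedral trust region is used. As a hedged sketch for the l_\infty case (our own illustration: the trust region is taken in the l_\infty norm so that scipy's linprog applies directly, whereas the algorithm above bounds \|\alpha\|_2, and C is taken to be all of R^m):

    import numpy as np
    from scipy.optimize import linprog

    def linf_step(phi, G, delta):
        # Minimize max_i |phi_i + g_i^T alpha| over ||alpha||_inf <= delta,
        # written as an LP in the variables (alpha, t).
        # phi : (N,) signed linearized distances at the current beta
        # G   : (N, m) rows g_i = gradients of the linearizations psi_i w.r.t. alpha
        N, m = G.shape
        c = np.zeros(m + 1)
        c[-1] = 1.0                                  # minimize t
        A_ub = np.block([[ G, -np.ones((N, 1))],     #   phi + G alpha <= t
                         [-G, -np.ones((N, 1))]])    # -(phi + G alpha) <= t
        b_ub = np.concatenate([-phi, phi])
        bounds = [(-delta, delta)] * m + [(0.0, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method='highs')
        return res.x[:m], res.x[-1]                  # step alpha_k, predicted max deviation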
3 Global convergence

In an abstract setting our problem may be formulated as: Minimize \Phi(\beta) = \|\phi(\beta)\| on a closed, convex set C. To solve this problem, at each stage of the iteration we solve the following constrained, linearized problem:

   Minimize \Psi(\beta, \alpha) subject to \beta + \alpha \in C and \|\alpha\| \le \Delta.

In order to get the linearization in our case, we solve the least distance subproblems (P_i), i = 1, \dots, N, with a shape parameter \beta, and use (2.2) or (2.1). For the purpose of characterizing stationary points, we introduce the quantity

   V_1(\beta) = -\inf\{ \Psi(\beta, \alpha) : \|\alpha\| \le 1, \beta + \alpha \in C \}.

Note that V_1(\beta) \ge 0, since \Psi(\beta, 0) = 0. By convexity, V_1(\beta) = 0 implies that \alpha = 0 is a solution of the linearized minimization problem. MADSEN AND SCHJAER-JACOBSEN [6] have shown that the latter condition is equivalent to a condition given therein for the functional to have a stationary point. In order to prove Theorem 3.3 we prove a lemma that was given in a similar form for the l_\infty case in MADSEN AND SCHJAER-JACOBSEN [6] and JONASSON AND MADSEN [5]. We give a different proof that is applicable to both the l_1 and l_\infty cases.

Lemma 3.1 Let V_1(\beta) \ge \epsilon and \Delta \le \bar\Delta. For the solution of the linearized problem the estimate

   \Psi(\beta, \alpha) \le -C \epsilon \Delta    (3.1)

holds, with a constant that depends only on \epsilon and \bar\Delta.

Proof: According to the definition of V_1(\beta) and the continuity of \Psi there exists a feasible \alpha_1 with \|\alpha_1\| \le 1 such that \Psi(\beta, \alpha_1) = -\epsilon. Let \alpha = t\alpha_1, where t = \min(1, \Delta). Since \Psi(\beta, \alpha) is a convex function, we get

   \Psi(\beta, \alpha) \le (1 - t)\Psi(\beta, 0) + t\Psi(\beta, \alpha_1) = -t\epsilon.

Since t \ge \Delta \min(1, 1/\bar\Delta), we get the conclusion with C = \min(1, 1/\bar\Delta).  \Box

Proposition 3.2 For a minimum point, V_1(\beta) = 0 holds.

Proof: Assume the contrary; then V_1(\beta) = \epsilon > 0 holds. According to the definition of \Psi(\beta, \alpha) we have

   \Phi(\beta + \alpha) = \Phi(\beta) + \Psi(\beta, \alpha) + o(\alpha).

By Lemma 3.1, we can find an \alpha with \|\alpha\| \le \Delta such that (3.1) holds. As in the proof of the Lemma, we may conclude that

   \Phi(\beta + t\alpha) \le \Phi(\beta) - C\epsilon t \Delta + o(t\alpha)

for 0 < t < 1. If we let t \to 0 we get a contradiction to the minimum property.  \Box

Theorem 3.3 Either the algorithm ends in a finite number of steps, or a sequence \beta_k is generated for which \liminf_{k \to \infty} V_1(\beta_k) = 0.

Proof: Assume the contrary. Then there exists \epsilon > 0 such that V_1(\beta_k) \ge \epsilon holds for all k. By the definition of \rho_k and the lemma, it follows for a successful step that

   \Delta_k \le c\,(\Phi(\beta_k) - \Phi(\beta_{k+1}))   with   c = 1/(\mu C \epsilon),

and by the updating rule for \Delta_{k+1} we get \Delta_{k+1} \le \gamma_2 c\,(\Phi(\beta_k) - \Phi(\beta_{k+1})). Combining this inequality with the updating rule for an unsuccessful step yields

   \Delta_{k+1} \le \gamma_1 \Delta_k + \gamma_2 c\,(\Phi(\beta_k) - \Phi(\beta_{k+1})).

By summation and the monotonicity of \Phi(\beta_k) it follows that for all N

   \sum_{k=0}^{N} \Delta_{k+1} \le \gamma_1 \sum_{k=0}^{N} \Delta_k + \gamma_2 c\,(\Phi(\beta_0) - \Phi(\beta_{N+1})).

Since \gamma_1 < 1, this implies the convergence of \sum \Delta_k, and we get \lim \Delta_k = 0. From \|\beta_{k+1} - \beta_k\| \le \Delta_k we obtain the convergence of \beta_k. From the definition of \rho_k it then follows that \lim \rho_k = 1. But then the updating rule (2.3) implies that eventually \Delta_{k+1} \ge \Delta_k, which gives a contradiction.  \Box

Theorem 3.4 (Global Convergence, cf. MORE [7], POWELL [8]) Assume that V_1(\beta) is uniformly continuous. Then either the algorithm ends in a finite number of steps, or a sequence \beta_k is generated for which \lim_{k \to \infty} V_1(\beta_k) = 0.

Proof: Assume the contrary. Then there exists an \epsilon_1 such that for each k_0 there exists a k \ge k_0 with V_1(\beta_k) \ge \epsilon_1. By Theorem 3.3 we can find an index l > k such that V_1(\beta_l) \le \epsilon_1/2 (k_0 will be determined later). We choose the smallest such l. As in the proof of Theorem 3.3, it follows for a successful step with k \le i < l that

   \|\beta_{i+1} - \beta_i\| \le \Delta_i \le 2c_1 (\Phi(\beta_i) - \Phi(\beta_{i+1})).

Clearly, this also holds for an unsuccessful step. This yields

   \|\beta_l - \beta_k\| \le 2c_1 (\Phi(\beta_k) - \Phi(\beta_l)).

Since \Phi(\beta_j) converges by monotonicity, we can make \|\beta_l - \beta_k\| arbitrarily small for large enough k_0. By the uniform continuity of V_1(\beta) we infer

   |V_1(\beta_k) - V_1(\beta_l)| \le \epsilon_1/2,

which is a contradiction.  \Box

4 A numerical example

As an illustrative example, we fit an ellipse to data, given as coordinate pairs in R^2. There are 24 data points and five components to the shape parameter vector (i.e., n = 2, d = 2, m = 5, N = 24). We used a standard parameterization involving a center (x_0, y_0), the axes (a, b), and a rotation angle \theta. The output is shown below. The initial values for the parameters and the obtained parameters in three different norms are given in Table 1. In the l_2 case, we give as the error the root mean square error, in the l_1 case the mean absolute deviation, and in the l_\infty case the maximum deviation.

FIG. 1. l_2, l_1, and l_\infty approximation.

                    x_0         y_0          a           b           \theta (deg)   Error
   Initial values   0.4989881   -1.4262126   4.6719913   0.4364267   20.75913       -
   l_2              0.6637511   -1.3987826   5.5124671   0.3376480   20.90124       0.11520
   l_1              0.5368646   -1.4465520   5.2778061   0.3358224   20.88869       0.09047
   l_\infty         0.7694412   -1.3829474   4.9731226   0.4491259   20.66893       0.23489

TAB. 1. Parameters for different norms.
The number of iterations in each case was five or six. We note that the deviations for the best fit l_1 and l_\infty ellipses exhibit behavior typical of these norms: five of the data points lie on the best fit l_1 ellipse and there are six deviations of largest magnitude in the l_\infty case.

Bibliography

1. J. Hald and K. Madsen. Combined LP and quasi-Newton methods for minimax optimization. Mathematical Programming, 20:49-62, 1981.
2. H.-P. Helfrich and D. Zwick. A trust region method for implicit orthogonal distance regression. Numerical Algorithms, 5:535-545, 1993.
3. H.-P. Helfrich and D. Zwick. Trust region algorithms for the nonlinear least distance problem. Numerical Algorithms, 9:171-179, 1995.
4. H.-P. Helfrich and D. Zwick. A trust region algorithm for parametric curve and surface fitting. J. Comput. Appl. Math., 73:119-134, 1996.
5. K. Jonasson and K. Madsen. Corrected sequential linear programming for sparse minimax optimization. BIT, 34:372-387, 1994.
6. K. Madsen and H. Schjaer-Jacobsen. Linearly constrained minimax optimization. Mathematical Programming, 14:208-225, 1978.
7. J. J. More. Recent developments in algorithms and software for trust region methods. In A. Bachem, M. Grotschel, and B. Korte, editors, Mathematical Programming Bonn 1982, The State of the Art, pages 259-287. Springer, 1983.
8. M. J. D. Powell. Convergence properties of a class of minimization algorithms. In O. L. Mangasarian, R. R. Meyer, and S. M. Robinson, editors, Nonlinear Programming 2, pages 1-27. Academic Press, 1975.
9. D. A. Turner, I. J. Anderson, J. C. Mason, M. G. Cox, and A. B. Forbes. An efficient separation-of-variables approach to parametric orthogonal distance regression. In P. Ciarlini, M. G. Cox, F. Pavese, and D. Richter, editors, Advanced Mathematical and Computational Tools in Metrology IV, pages 246-255, Singapore, 2000. World Scientific.
10. G. A. Watson. The use of the l_1 norm in nonlinear errors-in-variables problems. In S. Van Huffel, editor, Recent Advances in Total Least Squares Techniques and Errors-in-Variables Modeling, pages 183-192, Philadelphia, 1997. SIAM.
11. G. A. Watson. Choice of norms for data fitting and function approximation. Acta Numerica, pages 337-377, 1998.
12. G. A. Watson. Some robust methods for fitting parametrically defined curves and surfaces to measured data. In P. Ciarlini, A. B. Forbes, F. Pavese, and D. Richter, editors, Advanced Mathematical and Computational Tools in Metrology IV, volume 53 of Series on Advances in Mathematics for Applied Sciences, pages 256-272. World Scientific, 2000.
13. D. Zwick. Algorithms for orthogonal fitting of lines and planes: a survey. In P. Ciarlini, M. G. Cox, F. Pavese, and D. Richter, editors, Advanced Mathematical Tools in Metrology II, pages 272-283. World Scientific, 1996.

Evaluation of measurements by the method of least squares

Lars Nielsen
Danish Institute of Fundamental Metrology (DFM), Lyngby, Denmark^1
LN@dfm.dtu.dk

^1 Address: Building 307, Matematiktorvet, DK-2800 Lyngby, Denmark

Abstract

In this paper, a general technique for evaluation of measurements by the method of Least Squares is presented. The input to the method consists of estimates and associated uncertainties of the values of measured quantities together with specified constraints between the measured quantities and any additional quantities for which no information about their values is known a priori. The output of the method consists of estimates of both groups of quantities that satisfy the imposed constraints and the uncertainties of these estimates. Techniques for testing the consistency between the estimates obtained by measurement and the imposed constraints are presented. It is shown that linear regression is just a special case of the method. It is also demonstrated that the procedure for evaluation of measurement uncertainty that is currently agreed within the metrology community can be considered as another special case in which no redundant information is available. The practical applicability of the method is demonstrated by two examples.
1 Introduction

In 1787, the French mathematician and physicist Laplace (1749-1827) used the method of Least Squares to estimate 8 unknown orbital parameters from 75 discrepant observations of the position of Jupiter and Saturn taken over the period 1582-1745. Since then, the method of Least Squares has been used extensively in data analysis. Like Laplace, most people use a special case of the method, known as unweighted linear regression. The calculation of the average and the standard deviation of a repeated set of observations is the most simple example of that. The unweighted regression analysis is based on the assumptions that the observations are independent and have the same (unknown) variance. In addition, the linear regression is based on the assumption that the observations can be modelled by a function that is linear in the unknown quantities to be determined by the regression analysis. For most measurements carried out in practice, none of these assumptions can be justified. In order to evaluate the result of a general measurement, in which some redundant information has been obtained, one therefore has to apply the method of Least Squares in its general form.

This paper describes how measurements can be evaluated by the method of Least Squares in general. The paper is based on an earlier work of the author [2] but includes several new features not published before as well as practical examples from the daily work at DFM. An alternative approach is described in [6].

2 Measurement model

In a general measurement, a number m > 0 of quantities is either measured directly using measuring instruments or known a priori, for example from tables of physical constants etc. The (exact) values of these m quantities are denoted

   \zeta = (\zeta_1, \dots, \zeta_m)^T.

Due to measurement uncertainty, the values

   z = (z_1, \dots, z_m)^T

obtained by the measurement (or from tables etc.) are only estimates of the values \zeta. The standard uncertainties of the estimates z_i,

   u(z_i),   i = 1, \dots, m,

are determined in accordance with the GUM [1] and depend on the accuracy of the instruments and the reliability of any tabulated value used. In general, some of the estimates z_i may be correlated. If r(z_i, z_j) is the correlation coefficient between the estimates z_i and z_j, then the covariance u(z_i, z_j) between these two estimates is given by

   u(z_i, z_j) = u(z_i)\, r(z_i, z_j)\, u(z_j).

Because of the uncertainty, the estimates z can be considered as an outcome of an m-dimensional random variable Z with expectation \zeta (the exact values of the quantities) and covariance matrix \Sigma,

   \Sigma = u(z, z^T) =
   ( u^2(z_1)      u(z_1, z_2)   ...   u(z_1, z_m) )
   ( u(z_2, z_1)   u^2(z_2)      ...   u(z_2, z_m) )
   (    ...           ...        ...      ...      )
   ( u(z_m, z_1)   u(z_m, z_2)   ...   u^2(z_m)    ).

In addition to the m quantities for which prior information is available either from direct measurement or from other sources, a general measurement may involve a number k > 0 of quantities for which no prior information is available. The values of these quantities are denoted by

   \beta = (\beta_1, \dots, \beta_k)^T.

In general, the values \beta and \zeta are constrained by a number n of physical or empirical laws. These constraints may be written in terms of an n-dimensional function f,

   f(\beta, \zeta) = (f_1(\beta, \zeta), \dots, f_n(\beta, \zeta))^T = 0,   k \le n < m + k.    (2.1)

It is assumed that f_i : \Omega \to R, i = 1, \dots, n, are differentiable functions (with continuous derivatives) defined in a region \Omega \subset R^{k+m} around (\beta, \zeta).
The values of these quantities are denoted by In general, the values /3 and C are constrained by a number n of physical or empirical laws. These constraints may be written in terms of an n-dimensional function / /i(/3,C) \ /0\ 0 , k<n<m + k. (2.1) V /n(AC) / It is assumed that fi : fl ^^ R l...n, are differentiable functions (with con- 172 Lars Nielsen tinuous derivatives) defined in a region ft C ii'=+™ around (/3, C)- As indicated in (2.1), the number n has to be larger than or equal to the number k of quantities for which no prior information is available; otherwise some of the values (3 cannot be determined. In addition, the number n of constraints has to be smaller than the total number fc + m of quantities involved; otherwise the values of /3 and C would be uniquely determined by the constraints and no measurements would be needed. The estimates z, the covariance matrix S and the n-dimensional function f{f3, C) are the input to the general Least Squares method. It should be stressed that no probabiHty distribution has to be assigned to the input estimates z. On the contrary, if a probability distribution has been assigned to an estimate, it should be used to calculate the mean value and the variance of the estimate which should then serve as input to the Least Squares method. Like any other covariance matrix, the covariance matrix u(z,z^) = S is positive semi-definite. Otherwise, at least one linear combination x^z of the estimates z would have negative variance u(x^z, z^x) = x^Sx. In the following it is assumed that S is positive definite and therefore non-singular. 3 Normal equations Least Squares estimates 3 and C, of the values /3 and C, are found by minimizing the chi-square function x'(C;z) = (z-C)^s-i(z-C) subject to the constraints f(/3,C) = 0. It is convenient to solve this minimization problem by using Lagrange multipliers [5]: If a solution (/3, C,) to the minimization problem exists, the solution satisfies the equation V(^,C,A)^(/9,C,A;z) = 0 where $(A C, A; z) = (z - C)^S-i(z - C) + 2A^f (A 0 for a particular set of Lagrange multipliers A = (Ai,... j^A„)'^. By taking the gradient of the function $, the following n-\-m-{-k equations in 0, C„ A) evolve: V/3f(/3,C)^A -S-i(z-C)-fVcf(ACTA f(/3,C) where = = = (3.1) MA. \ / 9A and V;3f: . d^ \ 001 0, 0, 0, ... MIL df}k V^f^ \ aci df„ , acm / The equations (3.1) are called the normal equations of the Least Squares problem. Evaluation of Measurements 4 173 Solving the normal equations If {(3i,Ci,M) is an approximate solution to the normal equations, a refined solution (/3;_,_j, Ci+i, A;+i) can be found by the iteration I \ , I = 1,... ,00. The step (A^;, ACi, A;+i) is given by D(/3„0 AC, V Am ; = S-^z^C;) h (4.1) Vcf(A,C,r 0("-") , / (4.2) V -f(A,CO where , D(/3„C;)= 0(™'^) V V/3f(A,C;) S-i Vcf(A,C) is a symmetric matrix. This iteration procedure is similar to Newton iteration except that the second order partial derivatives of the functions /, have been neglected as it is practice to do in non-linear Least Squares estimation [4]. In order to reduce the effects of numerical rounding errors, it is recommended to calculate the step (A/3;, A^;, A^+i) by solving the hnear equations (4.1) by Gauss-Jordan elimination with full pivoting [4]. This algorithm also provides the inverse matrix D(/3,, C;)""^ which is needed at the final stage for estimating the covariance matrix of the solution as shown in Section 5. 
If proper starting values (\beta_1, \zeta_1) are selected, the iteration is expected to converge towards the solution (\hat\beta, \hat\zeta). Since the solutions \hat\zeta are expected to be close to the estimates z of \zeta available a priori, the estimates z are obviously the proper starting values \zeta_1 to be selected for the iteration. The selection of proper starting values \beta_1 is more difficult in general. If, however, f(\beta, \zeta) are linear functions in the variables \beta, the iteration process will converge after a few iterations, independent of the choice of \beta_1.

Most differentiable functions f(\beta, \zeta) can be handled by the described method. In order to get reliable standard uncertainties, it is required, however, that the function can be approximated by a first order Taylor expansion, i.e.

   f(\beta, \zeta) \approx f(\hat\beta, \hat\zeta) + \nabla_\beta f(\hat\beta, \hat\zeta)(\beta - \hat\beta) + \nabla_\zeta f(\hat\beta, \hat\zeta)(\zeta - \hat\zeta),

when the values \beta and \zeta are varied around the solution \hat\beta and \hat\zeta on a scale comparable to the standard uncertainties of the solution. If this vaguely formulated criterion is met, the function f(\beta, \zeta) is said to be linearizable. Note that almost any differentiable function will be linearizable if the standard uncertainties are sufficiently small. On the other hand, if the uncertainties are sufficiently high, all non-linear functions will no longer be linearizable. The requirement that f(\beta, \zeta) is linearizable is considered to be the only major limitation of the method of Least Squares!

It should be mentioned that the minimization using Lagrange multipliers will fail in case the gradients \nabla_\beta f_i and \nabla_\zeta f_i of one of the constraint functions f_i are both equal to zero at the point of the solution (\hat\beta, \hat\zeta). This gives some restrictions on how a constraint may be formulated. A function f_i defining a constraint may, for example, not be replaced by the square of that function, f_i^2: since f_i(\hat\beta, \hat\zeta) = 0, the gradient of f_i^2 will be zero at the point of the solution (\hat\beta, \hat\zeta) although the gradient of f_i is not.

5 Properties of the solution

Since the solution (\hat\beta, \hat\zeta, \hat\lambda) depends on the estimates z, which are considered as the realization of the multivariate random variable Z, the solution (\hat\beta, \hat\zeta, \hat\lambda) can also be regarded as a multivariate random variable. If the functions f_i(\beta, \zeta) are linearizable, the estimates (\hat\beta, \hat\zeta) are linear in Z:

   ( \hat\beta   )   ( \beta )                         ( 0                       )
   ( \hat\zeta   ) = ( \zeta ) + D(\beta, \zeta)^{-1}  ( \Sigma^{-1}(Z - \zeta)  )    (5.1)
   ( \hat\lambda )   ( 0     )                         ( 0                       )

In that case, the expectation of the solution is E\{(\hat\beta, \hat\zeta)\} = (\beta, \zeta), which means that (\hat\beta, \hat\zeta) are central estimators of the values (\beta, \zeta). Under the same assumption, the covariances of the solution are given by the symmetric matrix D(\hat\beta, \hat\zeta)^{-1} provided by the Gauss-Jordan elimination algorithm^2:

   ( u(\hat\beta, \hat\beta^T)   u(\hat\beta, \hat\zeta^T)   ( )                             )
   ( u(\hat\zeta, \hat\beta^T)   u(\hat\zeta, \hat\zeta^T)   ( )                             ) = D(\hat\beta, \hat\zeta)^{-1}.    (5.2)
   ( ( )                         ( )                         -u(\hat\lambda, \hat\lambda^T)  )

This relation can be derived as follows. Partition the symmetric matrix D^{-1} into nine sub-matrices similar to the left hand side of (5.2) or similar to the partitioning of D according to the definition (4.2). Express the covariance matrix of the solution (5.1) in terms of the covariance matrix \Sigma of the random variable Z and the matrix D^{-1}. Insert the partitioned D^{-1} into the resulting matrix double product and express the covariances of the solution in terms of \Sigma^{-1} and the sub-matrices of D^{-1}. Reduce these expressions to the final result by multiple use of the nine relations between the sub-matrices of D^{-1} and D derived from the identity D D^{-1} = I.

^2 The empty brackets in the left hand matrix indicate the parts of D^{-1} that do not contain information about covariances.
From equations (5.1) and (5.2) the covariance matrices between (\hat\beta, \hat\zeta) and the estimates z are found to be

   u(z, \hat\beta^T) = u(\hat\zeta, \hat\beta^T),   u(z, \hat\zeta^T) = u(\hat\zeta, \hat\zeta^T).

From the last of these two relations, a relation of particular interest is derived,

   u(z - \hat\zeta, z^T - \hat\zeta^T) = u(z, z^T) - u(\hat\zeta, \hat\zeta^T).

For the diagonal elements, this relation reads

   u^2(z_i - \hat\zeta_i) = u^2(z_i) - u^2(\hat\zeta_i),   i = 1, \dots, m.

That is, the variance of the difference between the initial estimate z_i of \zeta_i and the refined estimate \hat\zeta_i is equal to the variance of z_i minus the variance of \hat\zeta_i. This relation is useful when testing if the difference z_i - \hat\zeta_i is significantly different from its zero expectation.

6 \chi^2 test for consistency

When the estimates (\hat\beta, \hat\zeta) have been found, the minimum \chi^2 value

   \chi^2(\hat\zeta; z) = (z - \hat\zeta)^T \Sigma^{-1} (z - \hat\zeta)

can be used to test if the measured values z are consistent with the measurement model (2.1) within the uncertainties defined by the covariance matrix \Sigma. If the model is linearizable, the expectation of the random variable \chi^2(\hat\zeta; Z) is equal to the number m of measured quantities, minus the number m + k of adjusted quantities, plus the number n of constraints, that is

   E\{\chi^2(\hat\zeta; Z)\} = m - (m + k) + n = n - k = \nu.

If, in addition, the random variables Z are assumed to follow a multivariate normal distribution with mean values \zeta and covariance matrix \Sigma, the random variable \chi^2(\hat\zeta; Z) will follow a \chi^2(\nu) distribution with \nu = n - k degrees of freedom. In that case, the probability p of finding a \chi^2 value larger than the value \chi^2(\hat\zeta; z) actually observed can be calculated from the \chi^2(\nu) distribution,

   p = P\{\chi^2(\nu) > \chi^2(\hat\zeta; z)\} = 1 - P\{\chi^2(\nu) \le \chi^2(\hat\zeta; z)\}.

If this probability p is smaller than a certain value \alpha, the hypothesis that the measured values are consistent with the measurement model has to be rejected at a level of significance equal to \alpha. As the results of measurements are normally quoted at a 95% level of confidence, an \alpha = 5% level of significance is a reasonable choice for the consistency test.

Although the assumption of a normal distribution of Z may not be fulfilled, it is suggested to carry out the test of consistency as described above anyway. This is justified by the fact that a value of \chi^2(\hat\zeta; z) significantly higher than the expectation \nu indicates inconsistency no matter what the distribution of Z might be. The calculated probability p simply describes how unlikely the observed \chi^2 value is if a normal distribution is assigned to Z.
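The test is one line given a chi-square distribution function; a sketch follows (assuming scipy; the function name and signature are ours):

    from scipy.stats import chi2

    def consistency_test(chi2_min, m, k, n, alpha=0.05):
        # chi2_min : observed minimum value of chi^2(zeta_hat; z)
        # m, k, n  : numbers of measured quantities, unknowns and constraints
        nu = n - k                    # expectation of chi^2 for a linearizable model
        p = chi2.sf(chi2_min, nu)     # p = P{chi^2(nu) > chi2_min}
        return nu, p, p >= alpha      # degrees of freedom, p-value, accept at level alpha

For the balance example of Section 9 below, consistency_test(8.6, 23, 6, 19) reproduces \nu = 13 and p of about 80%.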
7 Normalized deviations

If the test described in the previous section leads to a rejection of the measurements, a tool for identifying the outlying measurements is desirable. A measured value z_i is defined as an outlier if the difference z_i - \hat\zeta_i is significantly different from zero, taking into account the standard uncertainty u(z_i - \hat\zeta_i) of that difference. This leads to the introduction of the normalized deviation d_i defined by^3

   d_i = \frac{z_i - \hat\zeta_i}{u(z_i - \hat\zeta_i)} = \frac{z_i - \hat\zeta_i}{\sqrt{u^2(z_i) - u^2(\hat\zeta_i)}},   i = 1, \dots, m.

The normalized deviation d_i has zero expectation and variance 1. A normalized deviation with |d_i| larger than 2 or 3 is therefore rather unlikely no matter what the distribution of the random variable d_i might be. If a multivariate normal distribution is assigned to Z and the model function f(\beta, \zeta) is linearizable, the normalized deviation d_i is normally distributed,

   d_i \in N(0, 1),   i = 1, \dots, m.

In that case

   P\{|d_i| > 2\} = 5%,

and a measurement with |d_i| > 2 is therefore identified as an outlier at a 5% level of significance. It is suggested to use the criterion |d_i| > 2 to identify potential outliers even if the distribution assigned to Z is not normal.

8 Adjustment of a variance \sigma^2

If some values z_i have a common but unknown variance u^2(z_i) = \sigma^2, this variance can be estimated by adjusting \sigma^2 by an iterative procedure until the "observed" \chi^2 value becomes equal to its expectation value \nu,

   \chi^2(\hat\zeta; z) = (z - \hat\zeta)^T \Sigma^{-1} (z - \hat\zeta) = \nu,

where the covariance matrix \Sigma is a function of the unknown variance \sigma^2. As the estimates \hat\zeta depend on the value assigned to \sigma^2, these estimates have to be updated together with the estimates \hat\beta each time the value of \sigma^2 is changed during the iteration. This way of estimating the unknown variance \sigma^2 leads to the well-known expression for the standard deviation in the case of a repeated measurement of a single quantity, as shown in Section 13.

^3 If u(z_i - \hat\zeta_i) = 0, the difference z_i - \hat\zeta_i will be zero as well and d_i may be set equal to zero. This situation occurs whenever there is no redundant information available regarding the value of the quantity \zeta_i.
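Because the observed \chi^2 decreases monotonically as \sigma grows, the adjustment of Section 8 can be delegated to any bracketing root finder. A sketch (our illustration, assuming scipy; chi2_of_sigma is a hypothetical user-supplied closure that rebuilds \Sigma(\sigma^2), re-solves the adjustment, for instance with ls_adjust above, and returns the observed \chi^2):

    from scipy.optimize import brentq

    def adjust_sigma(chi2_of_sigma, nu, lo=1e-9, hi=10.0):
        # Find sigma such that the observed chi^2 equals its expectation nu.
        # The bracket (lo, hi) must be chosen so that the sign of
        # chi2_of_sigma(sigma) - nu changes across it.
        return brentq(lambda s: chi2_of_sigma(s) - nu, lo, hi)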
As a model for the '*The conventional mass value of a body is defined as the mass of a hypothetical weight of density 8000 kg/m* that balances the body when weighed in air of density 1.2 kg/m^ and temperature 20 "C. Lars Nielsen 178 z u{z) c uiO d ms ruR [g] [g] 199.988816 0.000010 199.988814 0.000010 1.66 h z u{z) c u(0 d z u{z) c u(0 d [div] 199.988608 0.000023 199.988620 0.000011 -0.56 174.992133 0.000023 174.992149 0.000012 -0.77 175.009992 0.000023 175.010024 0.000012 -1.61 ho [div] 99.982925 0.000023 99.982899 0.000013 1.38 /8 /9 [div] 124.984217 0.000023 124.984212 0.000014 0.25 [div] 100.005650 0.000023 100.005632 0.000013 0.93 d 0 u0) / [g/div] 1.00000186 0.00000019 «(0 TAB. h [div] /l4 c h [div] [dlv] 49.974992 0.000023 49.974995 0.000013 -0.19 z u{z) 199.999043 0.000008 199.999043 0.000008 -1.66 PR [kg/m^] 7833.01 0.29 7833.01 0.29 1.66 /l5 H/m'] 7965.76 0.71 7965.76 0.71 -1.66 1.1950 0.0035 1.1946 0.0035 1.66 h h [div] 149.980675 0.000023 149.980672 0.000012 0.14 /i [div] 199.988617 0.000023 199.988620 0.000011 -0.16 /7 [div] 24.996417 0.000023 24.996432 0.000011 -0.77 [div] 150.013558 0.000023 150.013558 0.000013 0.03 In [div] 74.986433 0.000023 74.986450 0.000014 -0.87 In [div] 199.998867 0.000023 199.998851 0.000011 0.78 mi m2 [div] 199.998875 0.000023 199.998851 0.000011 1.19 ma [g] [g] [g] [g] 100.005774 0.000011 50.007963 0.000010 24.978601 0.000010 24.996476 0.000010 /l6 [div] 24.978533 0.000023 , 24.978557 0.000011 -1.17 A [1/div] -4.4E-09 l.OE-09 a P [kg/m«l [div] 125.002083 0.000023 125.002087 0.000014 -0.20 /l2 /l3 [div] 75.004325 0.000023 75.004325 0.000014 0.03 [div] 50.007892 0.000023 50.007881 0.000012 0.54 /l8 77)4 2. Measured and estimated values and associated standard uncertainties. calibration curve of the balance, a second order polynomial through zero is assumed In = f{l + AI^) where / and A are unknown quantities to be determined from the calibration data. In this example, there are m = 23 quantities for which prior information is available from the measurements performed: C,={ms,mR,pR,p,a,Ii,...,Iisf whereas there are fc = 6 quantities for which no prior information is available: /3 = (/,>!, mi,7712, m3,m4)^. 179 Evaluation of Measurements f A mi m2 ms 7X14, TAB. f 1 -0.945 0.021 0.071 0.096 0.096 A -0.945 1 0.124 -0.016 -0.094 -0.094 TTll 0.021 0.124 1 -0.194 -0.269 -0.268 m2 0.071 -0.016 -0.194 1 -0.287 -0.287 ms 0.096 -0.094 -0.269 -0.287 1 -0.287 ni4 0.096 -0.094 -0.268 -0.287 -0.287, 1 3. Correlation coefBcients of the estimated $ values. Between these quantities, there are n — 19 constraints: / (mi+m2+m3+m4)(l-(a-ao)(i-^))-/(/i + ^/?) ^ =0 . f(/3,C) = V run (l - (a - ao) (^ - ^)) - / (^is + Alf,) ms - (mi + m3 + ms +1714) / The measured values z and associated standard uncertainties are given in Table 2 under the row headings z and u{z). All measured values are assumed to be uncorrelated. By solving the normal equations, the estimates C a^nd /3 and associated standard uncertainties given in Table 2 under the row headings C, u{Q, $ and u(/3) are obtained. Selected correlation coefficients derived from D(/3, C)~-^ are given in Table 3. The observed minimum x^ value is x(C) z) = 8.6 which should be compared to the expectation value 1/ = n - fc = 19 - 6 = 13. Since P {x^{13) > 8.6} = 80.3%, it is concluded that the measured values are consistent with the specified constraints taking into account the measurement uncertainties. 
This conclusion is confirmed by the calculated normahzed deviations given in Table 2 under the row heading d; all normahzed deviations satisfy the criterion \d\ < 2. Prom the estimates of the quantities / and A and the associated covariance matrix, the error of indication E, defined as E = I-lR = I-f{l + AI^), and the associated standard uncertainty u{E) can be calculated as a function of the indication /. The result is shown in Figure 1 as the full lines representing E — u{E), E, and E + u{E). The measured points Ei,i = 1,..., 18 shown in the figure are the observed average balance indications li minus the corresponding reference values IR. The error bars of the measured points indicate the standard uncertainties u{Ei) that have been calculated taking into account the covariance between li and IR. 10 Example: Evaluation of calibration history A weight (named Rlmg) of nominal mass 1 mg has been calibrated 39 times in the period 1992-2001. For calibration number i, the mass rrii of the weight at the time ti and the associated standard uncertainties u{mi) and u{ti) are given. The calibration history of the weight is shown in Figure 2 as dots with error bars indicating the standard 180 Lars Nielsen Balance BP221S 0.00 ^—Fit -—Fit+u(Fil) — Rt-u(Fil) O 100 Measured 150 Balance indication //g FIG. 1. Error of indication of the calibrated balance. uncertainties; the scale mark 1992-01 on the time axis indicates the position of the date 1 January 1992 etc. Due to wear and changes in the amount of dirt adsorbed to the surface, the mass of the weight is expected to change in time. A reasonable model of the change in mass as a function of time is a superposition of a deterministic linear drift and a random variation rrii = ai + a2ti + Srrii , i = l,...,39, where Srrii is a random variable with zero expectation and variance cr^. The drift parameters ai, a2 and the associated covariance matrix as well as the variance a^ are unknown a priori and are to be estimated from the calibration history available. Once the estimates ai and 0,2 have been found, it is possible to predict a value m of the mass of the weight as a function of time t ' 171 = 0,1 + ogi -f- 6m, where Sm = 0 with standard uncertainty u{5m) — a. The standard uncertainty of the predicted mass value is given by u^{m) =u^{ai)-\-t^u^{a2) + 2tu{ai,a2)+(P. The measurement model used for evaluating the calibration history is C, = {mi,...,m3c,,ti,...,tsci,Smi,...,5m3<if , /3 = (ai,a2)^, 181 Evaluation of Measurements R1 mg (Before adjustment of a) 1.0705 f- 1.0703 ^ 1.0701 g" 1.0699 5 1,0697- ""^"^i= 44^ 8 1993-01 1994-01 Result a Ignore ii fU'(37)>66.6| = 0.2% 1992-01 ■ 1995-01 1996-01 1997-01 1996-01 1999-01 2000-01 2001-01 Time of Calibration _it™( 1992-01 1993-01 1 * 1994-01 y^—«•_ 1995-01 1—• ii 1996-01 ♦ ♦* 1997-01 Io ♦ 1 ♦♦ * 1998-01 3-01 1 ♦ 2000-01 * ♦ h«-^ 2001-01 2002-01 Time of Calibration FIG. 2. Evaluation of the calibration history of a 1 mg weight assuming that a = 0. R1 mg (After adjustment of a) 1992-01 1993-01 1994-01 1995-01 6-01 1997-01 2000-01 2001-01 2002-01 Time of Calibration J 4.000 ■5 2.000 ■ ■H • »v—«»-5—I—• ^ * ♦^ 10 O 1 ♦» \ * I* * \ !♦ Deviation | I -4.000 1992-01 1993-01 1994-01 1995-01 1996-01 1997-01 1998-01 1999-01 2000-01 2001-01 2002-01 Time of Calibration 3. Evaluation of the calibration history of the 1 mg weight with a adjusted to 0.092 fig. FIG. ( mi - (ai + a2ti + 5mi) \ = 0. 
f(AC) \ ^39 - («1 + ^2*39 + ^^39) / The measured values z are given by the calibration history, except for the values of 182 Lars Nielsen Sniiji = 1,..., 39 which are set equal to the expectation vakie zero. The associated covariance matrix u{z, z^) = S is built up from the uncertainties w(m,:) and u{ti) available from the calibration history and a negligible but finite^ initial value of the unknown variance a^. Since the standard uncertainties u{mi) are of the order 0.1 fig, the value (T=lE-07 fig is considered negligible and is selected as a starting point. By solving the normal equations, estimates Oi and 0,2 of the drift parameters and the associated covariance matrix are found after a few iterations. The predicted value m of the mass of the weight and the associated standard uncertainty u{rh) as a fimction of time are shown in Figure 2 as solid lines. The normalized deviations d associated with the mass values rui are shown in Figure 2 as well''. The observed minimum chi-square value is x^ = 66.6 which is large compared to the expectation value i^ = 39 - 2 = 37. Since P{x^(37) > 66.6} = 0.2%, the hypothesis a = 0, or no random variation in the mass, is rejected at a 0.2% level of significance. The value of a is therefore increased as described in Section 8 until the calculated minimum x^ value becomes equal to its expectation value u = 37. In this way the standard uncertainty reflecting the random variation of the mass of the weight is found to be 17=0.092 fig. The result of the evaluation of the calibration history after adjustment of cr is shown in Figure 3. Note the significant increase in the standard uncertainty of the predicted value of the mass of the weight and the decrease in the absolute value of the normalized deviations d. The calibration history can also be evaluated by an iterative technique based on linear regression [3]. The results obtained are identical to the results presented in this section. 11 Case I: Univariate output quantity, Y = h{Xi,..., X^) In this section it is shown that the evaluation of measurements by the method of least squares is consistent with the generally accepted principles for evaluating measurement uncertainty as described in the GUM [1]. Using the nomenclature of the GUM, a univariate output quantity Y is assumed to be related to N input quantities Xj,.. .,XN through a specified function h, The values assigned to the input and output quantities are denoted xi,.. .,XN and by 2/ respectively. In the nomenclature of this paper, the measurement model is C = (Xi,...,X;vf , /3 = (n f(AC) = (r-M^i,---,^w)) = o. The measured values are z = (.Ti,...,a;iv)^ ^If the variance cr'^ is assumed to be exactly zero, the quantities Smi have to be removed from the model. Otherwise the covariance matrix S will be singular. ^The absolute value of normalized deviations of tj and Sm-i is equal to the absolute value of the normalized deviation of mj. ; 183 Evaluation of Measurements with the known covariance matrix u^{xi) U{XI,XN) u{xN,Xi) V?{XN) S=W(Z,Z^):= The coefficient matrix D of the normal equations is / D(/3,C)= 0 0(1'^) OWD \ 1 where Vx/l^ 1 S-i . -Vx/i(xf -Vx/i(x) 0 dh dh axi' ■ ■ ■' dXN In the present case, the solution to the normal equations is found after one iteration, y = $ = h{xi,...,XN) , C = (a;i,...,a;N)^, A = 0. The associated covariances are given by ^Hy) u{y,f) .■^ ^T, u{C,y) u{CCl ()(1.1) Q(1,N) if''^ \ O^^''^ = D(/3,C)-^ _^2(;^) J Vx/i(x) S Vx/i(x)^ S Vx/i(x)^ 1 Vx/i(x)S S 0 1 0 0 In other words. 
JV N w^(y) = VxMx)SVx/i(x)^ = ^^Ciu(a;i,a;j)cj, dh Cj = ^ (a;,) which is identical to the linear variance propagation formula given in the GUM. 12 Case II: Linear regression, Y = Xa Linear regression is applied when there is a linear relationship Y = Xa between some observed quantities Y and some unknown quantities a. The design matrix X is made up of known elements that may be given as specified functions of one or several independent variables. In the notation of this paper, the measurement model for the linear regression problem is C = Y = {Yi,...,Ynf , /3 = a=(ai,...,afcf, . f(C,/3) = Y-Xa = 0, where X^"'*^^ is the known design matrix. The measured values are z = y = {yi,---,ynf 184 Lars Nielsen with known covariance matrix / u^y^) i; = u(z,z^) = : ••• \ u{yn,yi) ■■• The coefficient matrix D of the normal equations is (0(*^'*^) Q(n,k) of'''") -X^ 5.-1 i(n,n) Again, the solution to the normal equations is found after one iteration, a = 3 = CX^S-V , Y = C = Xa , X^-l^-^Y-y) where C = (X^S'^X)"^. The associated covariances are given by ^ / u(a,a^) u(Y,a^) V Q(n,fc) u(a,Y^) u(Y,Y^) Q(n,n) Q''^'") \ ()("•"' _u(A, A^) / D(3,c)-^ c cx'' -cx^s-^ XC XCX^ I-XCX^E V -s-^xc i-s^^xcx^ s-'xcx'^E^^-s-^ that is, a = CX^S-V , u(a,a^) = C = (X^S-^X)-! as is known from the theory of linear regression. 13 Case III: Repeated observations of a single quantity Assume that a quantity X is measured n times with the same uncertainty a. Such a measurement can be modelled by n quantities Xi,..., X„ having a common value /i C = X = (Xi,...,X„f mc)= I ; , /3=(/x), I =0. The measured values are Z = X = \Xi, . . . , Xji) , and under the assumption that the measurement results are mutually independent, the associated covariance matrix is given by i; = u(z,z^) = / (T2 ... 0 i V 0 ■•. ■.. ; a^ Evaluation of Measurements 185 The coefficient matrix D of the normal equations is (0(1,1) 0(n,l) 0(1'") f^-2i(n,n) -1(1.") i{n,n) l(n,™) 0^"'") _l(n.l) where 1 denotes a matrix with all elements equal to 1. The solution of the normal equations is found after one iteration, n n •^—' The associated covariances are given by = D(/3,C)-V u(x,A) ^(x,x^) ()("■") Q(„,1) Q(n,n) -u{X,X^) I (T^n-i (T^n"!!'"'!) ^-lj^(n,l) c72n-il(i'") Cr^n~ll^"'") l(n,n) _yj-ll(n,n) n-il(i'") l(n,n) _ ^-lj_(n,n) a-~^(ri'"ll("'"' - l("'")) As expected, 1 /i=-Va;i %=\ , u^ {ft) =0^/71. If a^ is not known a priori, it can be estimated by solving the equation i.e., 1 1 " " \-t^i — i=\ which is the well known expression for the experimental standard deviation s. 14 Gonclusion A general technique for evaluation of measurements by the method of Least Squares has been presented. The applicability of the method has been demonstrated by two examples. It has been shown that the method is fully compatible with the generally accepted principles for evaluation of measurement uncertainty laid down in the GUM and that ordinary linear regression is just a special case of the method. The input to the method consists of • An estimate of the value of each measured quantity, including any relevant influence quantity. • The covariance matrix of these estimates formed by the standard uncertainties of the estimates and the correlation coefficients between the estimates. 186 Lars Nielsen • A measurement model describing all the known relations between the measured quantities and some additional quantities (if needed) for which no prior information is available. 
The output of the method consists of • An adjusted estimate of the value of each measured quantity and an estimate of each additional quantity introduced in the measurement model. • The covariance matrix of all these estimates from which the standard uncertainties and correlation coefficients can be calculated. • A chi-square value which is a measure of the degree of consistency between the measurement model, the input estimates, and the covariances of the input quantities. The adjusted estimate of the value of a measured quantity differs from the input estimate only if the measurement model imposes additional information regarding the value of that particular quantity. In that case the standard uncertainty of the adjusted estimate will be smaller than the standard uncertainty of the input estimate. For a good measurement, the difference between the adjusted estimate and the input estimate of a measured quantity should not be large compared to the standard uncertainty of that difference. It has therefore been suggested that the ratio d of the difference to its standard uncertainty is calculated and assessed against a selected criterion, e.g. |d| < 2. By plotting the d values of the adjusted estimates it is possible to assess whether a too high chi-square value is caused by a few poor input estimates or is due to a poor model. Bibliography 1. BIPM, lEC, IFCC, ISO, lUPAC, lUPAP, OIML, Guide to the expression of uncertainty in measurement, ISO, 1995. 2. L. Nielsen, Least-squares estimation using Lagrange multipliers, Metrologia 35 (1998), 115-118. Erratum, Metrologia 37 (2000), 183. 3. L. Nielsen, Evaluation of the calibration history of a measurement standard, DFM report DFM-01-R25, 2001, 1-6. 4. W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, Numerical Recipies in C, 2nd ed., Cambridge, Cambridge University Press, 1992, 36-40 and 681-688. 5. T. L. Saaty and J. Bram, Nonlinear Mathematics, Dover Publications, New York, 1981,93-95. 6. K. Weise and W. Woger, Uncertainty and Measurement Data Evaluation, WileyVCH, 1999, 183-224 [in German]. An overview of the relationship between approximation theory and filtration Paul J. Scott Taylor Hobson Limited, Leicester, UK. PScottStaylor-hobson.com Xiang Q. Jiang University of Huddersfield, Huddersfield, UK. x.jiang@hud.ac.uk Liam A. Blunt University of Huddersfield, Huddersfield, UK. l.a.blunt@hud.ac.uk Abstract This paper gives an overview of the similarities and differences between the requirements and techniques used in mathematical approximation theory and filtration in surface metrology. Although the two fields tend to use the same or similar mathematical objects to produce functions that simplify a function in a controlled manner, it is the way that this simplification is achieved which is the main difference between the two. Approximation theory uses norms to judge the closeness of the approximation while filtration uses the concept of wavelength to control the "smoothness" of the result of filtration. The new ISO definition of a filter is stated, together with a generalisation of the concept of wavelength through "brickwall" filters. This new ISO definition of a filter illustrates the closeness of approximation theory and filtration. The paper then proceeds to survey some recent developments in filtration in the hope that there can be some cross-fertilisation between approximation theory and filtration. 
These include wavelets, robust filters and non-Unear filters such as the family of morphological filters, which includes envelope filters and alternating sequence filters (non-linear multiresolution). Examples from surface texture are used throughout the paper. 1 Introduction This paper gives an overview of the similarities and differences between the requirements and techniques used in mathematical approximation theory and filtration in surface metrology. It is not the intention of this paper to give full mathematical detail but to survey recent developments in filtration in the hope that there can be some crossfertilisation between approximation theory and filtration. Although the two fields tend to use the same or similar mathematical objects to produce functions that simplify the original function in a controlled manner, it is the 188 Overview of approximation theory and filtration way that this simpUfication is achieved which is the main difference between the two. Mathematical approximation theory is concerned with best and good approximation of a large family of functions from a smaller set (usually finitely generated, linear or non-hnear) in certain normed spaces (such as Lp), the construction of good approximants (if possible) and the determination of approximation order. Classical tools to achieve this include polynomial tools and splines. More recent tools include wavelets and multiresolution that decompose the normed spaces. Filtration uses the concept of "wavelength" to control the "smoothness" of the result of filtration. In surface metrology, filtration is concerned with the extraction of features within a prescribed "wavelength" band defined by "wavelength cut-offs". Classical tools to achieve this include Gaussian filters [1], polynomials and splines [4]. Recently there has been a resurgence of activity, both fundamentally and practically, in filtration for surface metrology. The International Standards Organisation Technical Committee 213 (ISO TC/213), whose remit includes surface metrology, has recently set up an Advisory Group (AG9) to explore filtration for surface metrology. They are producing a series of technical specifications (ISO/TS 16610 series [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]) to standardise filter terminology and to introduce to industry other filtration tools, which include spline wavelets [5], morphological filters [9] and scale-space techniques [10]. Other groups are also producing filtration for surface metrology. The University of Huddersfield has used second generation wavelets to produce an improved spline wavelet [12]. The University of Hanover is exploring robust Gaussian filtration [6]. PTB has developed a Robust Spline filter [7]. The rest of the paper surveys some of the results of this recent activity. 2 Basic concepts of filtration This section is a summary of the basic concepts of filtration as given in ISO/TS 16610 part 1 [2]. Let V be the space of real surfaces. Let Vx be a set of nested subspaces indexed by A € 7^^+ (here 7?.+ is the set of positive reals which includes zero) such that yX> n>0;VxCVf,CV and Vo is dense on 'P. The nesting index A a number indicating the relative level of nesting for a particular subspace in such a way that given a particular nesting index, subspaces with lower indices contain more surface information and subspaces with higher nesting indices contain less surface information. 
By convention, as the nesting index approaches zero there exists a surface in that indexed subspace that approximates the real surface to within any given measure of closeness as defined by a suitable norm. Thus approximation theory is used to define Filtration. The usual norm used in filtration is L2 but others are used such as the one-sided Chebychev for morphological filters. Let $A : 'P —> VA be a projection from the space of real surfaces onto the subspace indexed by A > 0 which satisfies the following two properties. 189 190 P. J. Scott, X. Q. Jiang, and L. A. Blunt • The sieve criterion: VA,/x > 0 and Va e V; $A($/i(«)) = ^sup(A,A«)(^)• The projection criterion: VA > 0 and Va e VA; ^x{a) = a. $A is called the brickwall filter (or primary mapping) and is a method of choosing a particular surface belonging to a subspace with a specified nesting index, to represent the real surface, which satisfies the projection and sieve criteria [16]. The sieve criterion allows brickwall filters to have the property that once the surface has been brickwall filtered at a particular nesting index, subsequent brickwall filtering with a higher nesting index will produce the same surface as brickwall filtering the original surface with the brickwall filter with the higher nesting index. The projection criterion is required in order that the nesting index is a scale or size. For define the set operator ^x :V —fV as VA > 0 and VP C 7?;'I'A(P) := {p • P e P and $A(P) == p}. That is to say p 6 *A(P) if and only ii p e P and $A(P) = P- Then it is easily demonstrated that the set operator 9\ is a granulometry [16] on V and A is the scale/size of the granulometry. Since the nesting index of brickwall filters is a scale/size and it satisfies the sieve criterion, it can be used to define the generalised concept of wavelength. An example of a brickwall filter is a morphological closing filter using a sphere as the structuring element. Here the nesting index is the radius of the sphere. Other filters can be constructed using brickwall filters (e.g. weighted mean of brickwall filters, supremum of brickwall filters, etc.). 3 Wavelet filters An important example of the concepts discussed in the previous section is wavelet filtration. The multiresolution form of the wavelet transform consists of constructing a ladder of smooth approximations to the profile. The first rung is the original profile. Each rung in the ladder consists of a filter bank where the profile Ai is split into two components giving, a smoother version ^Ij+i of the profile which becomes the next rung and a component JDj+i that is the "difference" between the two rungs. The multiresolution ladder structure lends itself naturally to a set of nested mathematical models of the profile, with the ith model m,, reconstructed from (Dj, D2, D3, ..., Di, Ai). The nesting index is the order of the model, the higher the model the smoother the representation with less detail. Thus m,_i_i is a smoother version of the profile than rui . As part of a research programme at the University of Huddersfield, the use of biorthogonal wavelets for surface analysis has been investigated because of their significant merits [12]. A very fast, second-generation, in-place algorithm, which uses the lifting scheme, has been developed at Bell Laboratories for biorthogonal wavelets [13]. 
One important property of biorthogonal wavelets is that they allow the construction of symmetric wavelets and thus linear phase filters that preserves the location of surface features with far less distortion than phase shift filters. Overview of approximation theory and filtration Surface texture analysis usually breaks down a surface into defined wavelength components of the surface called roughness, waviness and form. There are many well-known problems with the current standardised filter [14], i.e. Gaussian filter [1], including lost data at the edges, distortion due to form, retention of unwanted wavelengths, etc., Huddersfield has investigated the possibility of using a 'lifting wavelet' model to overcome some of these problems and enhance the extraction accuracy for roughness, waviness and form. This is achieved by using the wavelet transform to break down the surface into subsets at different scales and recombining only those subsets of the scales of interest (i.e. setting all the other subsets to zero and applying the inverse wavelet transform). Figure 1 shows the application of the wavelet filtering technique a femoral head from an artificial hip joint. Full details of the particular biorthogonal wavelet and its associated lifting scheme together with some engineering apphcations are given in reference [12]. FIG. 4 1. Metallic femoral head showing original, reference and roughness surfaces. Envelope filters Traditional linear filters, such as the Gaussian filter [1], produce a smoothed mean surface through a measured surface. Many engineering apphcations of functional surfaces involve mechanical contact where the envelope of the surface is of interest rather than the mean surface. But what exactly is the envelope of a surface? The fohowing are defining properties of the envelope of a surface used by ISO TC/213 AG9 [8]: • the envelope filter must be Extensive, i.e., VA,JF(A) > J4, • the envelope filter must be Increasing, i.e., A< B implies F{A) < F{B), • the envelope filter must be Idempotent, i.e., F{F{A)) = F{A), where A, B are surfaces and F(A) is the filtered surface of surface A. But these are also the defining properties of a morphological closing filter [15]; hence all envelope filters are morphological closing filters. A morphological closing filter using a disk as the structuring element is illustrated in Figure 2. Unfortunately, envelope filters, by definition, are not very robust to outliers, consisting of large spikes, in the surface. Scale-space is an attempt to overcome this problem with the morphological closing filter. 191 192 P. J. Scott, X. Q. Jiang, and L. A. Blunt FIG. 5 2. An envelope filter using a closing filter with a disk as a structural element. Scale-space Scale-space is a way of breaking down a signal or image into objects of different scales. To define scale-space we need to define the size of objects in a signal or image. This is achieved using Alternating Sequence Filters [10]. Alternating Sequence Filters (ASFs) are defined in terms of matched pairs of closing and opening filters. A closing followed by an opening both at a given scale (radius of the circle, length of the horizontal segment, etc.) will eliminate features of the surface whose "scales" are smaller than the given scale. ASFs begin by eliminating very small features, then eliminating slightly larger features, and then eliminating slightly larger features still etc., in a systematic way up to a given scale. Usually there is a constant ratio between successive scales. 
This process produces a ladder structure similar to that of wavelet analysis. At each rung in the ladder the profile is filtered by a matched pair of closing and opening filters at a given scale to obtain the next rung profile, together with a component that is the "difference" between the two rungs. The ladder structure leads to a multiresolution analysis, similar to wavelet analysis, with all of the associated analysis techniques. An example of the scale-space of a profile from a ceramic surface is given in Figure 3. The top part of this figure shows the original non-smoothed profile together with the final smoothed profile.

FIG. 3. Successively smoothed profiles of a ceramic profile using an ASF with a disk (profile height in mm against spacing in mm).

6 Robustness

Robustness of filtration is an increasingly important area of interest in surface metrology. Robustness is not in general an absolute property of a filter but a relative one. One can only say that a particular filter is more robust than an alternative filter against a particular phenomenon if there is less distortion in that filter's response to that phenomenon than in the alternative filter's response. To make robustness an absolute property of filters we need to define a reference class of profile filters with which to compare. The reference class of filters defined in ISO TC/213 AG9 is the class of linear filters [3]. Hence, by this definition, all robust filters must be non-linear. There are several well-known techniques (all non-linear) which can produce robust filters for a particular phenomenon. These are indicated in the next sections.

6.1 Metric based

Here the metric used to fit the filter to the surface is altered to a more "robust" metric. For example, the metric based on the $L_1$ norm is more robust against spike discontinuities than the metric based on the least-squares ($L_2$) norm, which in turn is more robust than the metric based on the Chebyshev ($L_\infty$) norm. The Robust Spline Filter given in ISO/TS 16610 part 32 uses an $L_1$ metric rather than the usual $L_2$ norm to make it more robust [7].

6.2 Robust statistics

Here each point on the surface is weighted according to its height relative to the filter's smooth response, with points further away being given less influence on the filter response than points nearer in height. This is an attempt to make the filter more robust against spike discontinuities. There are several standard functions used to allocate the weights to points (Huber, Beaton functions, etc.), which can be found in any standard book on robust statistics [17]. The Robust Gaussian regression filter given in ISO/TS 16610 part 31 uses a Beaton function to reduce the influence of outliers [6].

6.3 Pre-filtering

Pre-filtering is a technique whereby a phenomenon in the surface (such as spikes, form, etc.) is removed, or greatly reduced, by other means before filtration, thus removing or greatly reducing any effect the phenomenon can have on the filter's response. This approach has the advantage that once a method has been found to remove an unwanted phenomenon, that method will work with any filter. Form pre-filtering, which removes the form of the surface before filtration, is a very common technique in surface metrology. Less common is scale-space pre-filtering, which removes singularities and other features of a certain size before filtration.
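Returning to the reweighting idea of Section 6.2, the following is a schematic sketch only (not the ISO/TS 16610-31 filter itself): a Gaussian weighted mean whose point weights are damped by a Beaton-type biweight function of the residual height. The kernel normalisation, the tuning constant c and the scale estimate are all illustrative assumptions.

```python
import numpy as np

def robust_smooth(x, z, cutoff, iters=5, c=4.4):
    """Weighted-mean reference line with Beaton (biweight) damping of outliers."""
    gauss = np.exp(-np.pi * ((x[:, None] - x[None, :]) / cutoff) ** 2)
    delta = np.ones_like(z)                   # robustness weights, all 1 at first
    for _ in range(iters):
        w = gauss * delta[None, :]
        ref = (w @ z) / w.sum(axis=1)         # smooth reference surface
        res = z - ref
        s = np.median(np.abs(res)) + 1e-15    # robust scale estimate
        u = res / (c * s)
        delta = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)  # Beaton weights
    return ref

x = np.linspace(0, 1, 500)
z = np.sin(8 * x) + 0.02 * np.random.randn(500)
z[100] += 5.0                                 # an outlying spike
ref = robust_smooth(x, z, cutoff=0.08)        # barely disturbed by the spike
```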
7 Conclusions

The paper has given an overview of the similarities and differences between the requirements and techniques used in mathematical approximation theory and in filtration for surface metrology. Some recent work on filtration has been reported. It is hoped that this paper can generate some cross-fertilisation between the two areas of approximation theory and filtration.

Bibliography

1. ISO 11562: 1996. Geometrical product specifications (GPS) — Surface texture: Profile method — Metrological characteristics of phase correct filters.
2. ISO/TS 16610-1. Geometrical product specifications (GPS) — Filtration Part 1: Overview and basic terminology.
3. ISO/TS 16610-20. Geometrical product specifications (GPS) — Filtration Part 20: Linear profile filters; Basic concepts.
4. ISO/TS 16610-22. Geometrical product specifications (GPS) — Filtration Part 22: Linear profile filters; Spline filters.
5. ISO/TS 16610-29. Geometrical product specifications (GPS) — Filtration Part 29: Linear profile filters; Spline wavelets.
6. ISO/TS 16610-31. Geometrical product specifications (GPS) — Filtration Part 31: Robust profile filters; Gaussian regression filters.
7. ISO/TS 16610-32. Geometrical product specifications (GPS) — Filtration Part 32: Robust profile filters; Spline filters.
8. ISO/TS 16610-40. Geometrical product specifications (GPS) — Filtration Part 40: Morphological profile filters; Basic concepts.
9. ISO/TS 16610-41. Geometrical product specifications (GPS) — Filtration Part 41: Morphological profile filters; Disk and horizontal line segment filters.
10. ISO/TS 16610-49. Geometrical product specifications (GPS) — Filtration Part 49: Morphological profile filters; Scale space techniques.
11. ISO/TS 16610-60. Geometrical product specifications (GPS) — Filtration Part 60: Linear areal filters; Basic concepts.
12. X. Q. Jiang, L. A. Blunt and K. J. Stout. Development of a lifting wavelet representation for surface characterization, Proc. R. Soc. Lond. A 456 (2000), 2283–2313.
13. W. Sweldens. The lifting scheme: A construction of second generation wavelets, SIAM J. Math. Anal. 29 (1997), No. 2, 511–546.
14. X. Q. Jiang, L. A. Blunt and K. J. Stout. Application of the lifting wavelet to rough surfaces, Precision Engineering 25 (2001), 83–89.
15. J. Serra. Image Analysis and Mathematical Morphology, Vol. 1, Academic Press, New York, 1982.
16. G. Matheron. Random Sets and Integral Geometry, John Wiley & Sons, New York, 1976.
17. P. J. Huber. Robust Statistics, John Wiley & Sons, New York, 1981.

Chapter 4

Radial Basis Functions

Applications of radial basis functions: Sobolev-orthogonal functions, radial basis functions and spectral methods

M. D. Buhmann
Mathematisches Institut, Justus-Liebig University, 35392 Giessen, Germany
buhmann@uni-giessen.de

A. Iserles
DAMTP, University of Cambridge, Silver Street, Cambridge, CB3 9EW, UK
ai@amtp.cam.ac.uk

S. P. Nørsett
Department of Mathematics, Norwegian University of Science and Technology, Trondheim, Norway
norsett@math.ntnu.no

Abstract

In this paper we consider an application of Sobolev-orthogonal functions and radial basis functions to the numerical solution of partial differential equations. We develop the fundamentals of a spectral method, present examples via reaction-diffusion partial differential equations, and discuss briefly some links with the theory of wavelets.
1 Introduction

Radial basis functions are a well-known and useful tool for functional approximation in one or more dimensions. The general form of the approximation is always a linear combination of a (finite or infinite) number of shifts of a single function, the radial basis function. In more than one dimension, this function is made rotationally invariant by composing a univariate function, usually called $\phi$, with the Euclidean norm. In one dimension such an approximation usually simplifies to univariate polynomial splines. For a recent review of radial basis function approximations, see [5].

This note is about applications of radial basis functions and other approximation schemes, such as Sobolev-orthogonal polynomials and more general Sobolev-orthogonal functions, to the numerical solution of partial differential equations. The basic ideas stem from the theory of Sobolev-orthogonal polynomials ([13]), and in this paper a remarkable connection is developed between applications of Sobolev-orthogonality and radial basis functions (e.g. [5]); wavelets are mentioned as well (e.g. [8, 9]). Sobolev-orthogonal polynomials are a device to extend the standard theory of orthogonal polynomials (see, for instance, [12]) by requiring orthogonality with respect to non-selfadjoint inner products of the form
$$(f,g)_\lambda = \int_a^b f(x)g(x)\,dx + \lambda \int_a^b f'(x)g'(x)\,dx$$
for a positive parameter $\lambda$ and a suitable interval $(a,b)$, $a, b \in \mathbb{R}\cup\{\pm\infty\}$. The $dx$ in the two integrals is often replaced by more general Borel measures, $d\psi$, say.

The scheme which we want to discuss in this short article is of spectral type: in lieu of, e.g., finite element spaces as the underlying piecewise polynomial approximation spaces for the solution, we take purpose-built approximations which make the linear systems that we need to solve particularly simple, sometimes even diagonal. Therefore, in the first instance, we develop a theory of applying Sobolev-orthogonal polynomial basis functions to the numerical solution of partial differential equations via a spectral method. Then we extend this idea to general classes of radial basis function-type methods, where shift-invariant approximation spaces are generated by Sobolev-orthogonal basis functions. Due to the introductory character of this paper, our discussion is restricted to relatively simple cases. Our presentation is illustrated with the one-dimensional reaction-diffusion partial differential equation.

This is the place to note that radial basis functions have found a number of other applications in the discretisation of PDEs. Thus, for example, Driscoll and Fornberg [10] have used fast-converging 'flat' multiquadrics in pseudospectral methods, while Frank and Reich [11] applied radial basis functions with particle methods in order to conserve enstrophy in the solution of certain shallow-water equations. Our application is of an altogether different nature.

1.1 Examples of PDEs and Sobolev-orthogonality

Consider the partial differential equation
$$\frac{\partial u}{\partial t} = \nabla\cdot(a\nabla u) + bu + c, \tag{1.1}$$
where $u = u(\mathbf{x},t)$ is of sufficient smoothness with respect to $\mathbf{x}$ and $t$, $\mathbf{x}$ is given in a cube $\mathcal{V}\subset\mathbb{R}^d$ (more generally, in a finite domain), $t \geq 0$, $a = a(\mathbf{x}) > 0$, $b = b(\mathbf{x})$ and $c = c(\mathbf{x})$. We impose zero Dirichlet boundary conditions. The stipulation of a cube as the domain and of zero Dirichlet conditions is unduly restrictive, but it will suffice for the short presentation in this paper and adequately illustrates the main novel concepts in our presentation.
In the next section we shall also introduce a nonlinearity into the underlying PDE. We wish to approximate the solution $u(\mathbf{x},t)$ as a finite linear combination of the generic form
$$u(\mathbf{x},t) = \sum_{l=1}^{m} a_l(\mathbf{x})\, w_l(t),$$
where $t$ is nonnegative and $\mathbf{x}$ resides in the domain. In the sequel we shall also use expansions into infinite series with $l\in\mathbb{Z}$. Thus, a Galerkin ansatz (in the usual $L^2$ inner product on $\mathbb{R}^d$, which we denote by $(\cdot,\cdot)$ in contrast to the specialised Sobolev inner product $(\cdot,\cdot)_\lambda$ above) gives
$$\sum_{l=1}^{m}(a_l,a_k)\,w_l' = \sum_{l=1}^{m}\big(\nabla\cdot(a\nabla a_l),a_k\big)\,w_l + \sum_{l=1}^{m}(b\,a_l,a_k)\,w_l + (c,a_k), \qquad k = 1,2,\ldots,m.$$
Integration by parts in the second term above, and substitution of the requisite zero boundary conditions, yield the alternative formulation
$$\sum_{l=1}^{m}(a_l,a_k)\,w_l' = -\sum_{l=1}^{m}(a\nabla a_l,\nabla a_k)\,w_l + \sum_{l=1}^{m}(b\,a_l,a_k)\,w_l + (c,a_k), \qquad k = 1,2,\ldots,m. \tag{1.2}$$
We solve the ODE system (1.2) with respect to $t$, for example with the backward Euler scheme (we use backward Euler for the sake of simplicity, but it should be noted that the same analysis applies to any implicit multistep method, because our use of Sobolev-orthogonality is linked only to the implicitness of the solution method)
$$w_l^{n+1} = w_l^{n} + \Delta t\,F_l(\mathbf{w}^{n+1}), \qquad n\in\mathbb{Z}^+,\ l = 1,2,\ldots,m, \tag{1.3}$$
where the function $F_l$ is given implicitly by the equations (1.2) and where $\mathbf{w}^{n+1}$ in the expression above is the vector with components $w_l^{n+1}$, $l = 1,2,\ldots,m$.

Let us now multiply expression (1.3) by $(a_l,a_k)$ and sum over $l = 1,2,\ldots,m$. Then, exploiting (1.2), a little algebra yields
$$\sum_{l=1}^{m}\left\{\int_{\mathcal{V}}[1-\Delta t\,b(\mathbf{x})]\,a_l(\mathbf{x})a_k(\mathbf{x})\,d\mathbf{x} + \Delta t\int_{\mathcal{V}}a(\mathbf{x})\,\nabla^{T}a_l(\mathbf{x})\nabla a_k(\mathbf{x})\,d\mathbf{x}\right\}w_l^{n+1} = \sum_{l=1}^{m}\int_{\mathcal{V}}a_l(\mathbf{x})a_k(\mathbf{x})\,d\mathbf{x}\;w_l^{n} + \Delta t\int_{\mathcal{V}}c(\mathbf{x})a_k(\mathbf{x})\,d\mathbf{x}. \tag{1.4}$$
The connection with Sobolev inner products is clear. Indeed, let us now choose the set $W_m := \{a_1,a_2,\ldots,a_m\}$ as a set of functions that are orthogonal with respect to the homogeneous Sobolev inner product (see, e.g., [13])
$$(f,g)_{\Delta t} := \int_{\mathcal{V}}[1-\Delta t\,b(\mathbf{x})]\,f(\mathbf{x})g(\mathbf{x})\,d\mathbf{x} + \Delta t\int_{\mathcal{V}}a(\mathbf{x})\,\nabla^{T}f(\mathbf{x})\nabla g(\mathbf{x})\,d\mathbf{x} \tag{1.5}$$
(this of course requires that $\Delta t\,b(\mathbf{x}) < 1$, and hence may restrict in a minor way the choice of the time step $\Delta t$). Further below we shall also use infinite sets $W$ instead of the finite set $W_m$. It is important to note that in general the Sobolev inner product depends upon the step size. Subject to this formulation, the linear system (1.4) diagonalises and its numerical solution becomes trivial. We turn now to a more elaborate example in the next subsection, namely the reaction-diffusion equation.

1.2 Reaction-diffusion as a paradigm for nonlinear PDEs

Let us consider the nonlinear partial differential equation
$$\frac{\partial u}{\partial t} = \nabla\cdot(a\nabla u) + f(u), \tag{1.6}$$
where otherwise all the quantities are as in (1.1), including the boundary conditions. Suppose that an approximation $u^n$ to $u(\mathbf{x},n\Delta t)$ is available at all the spatial grid points. We commence by interpolating $u^n$ to requisite precision by some function $v$. Thus $v$ is defined throughout the cube $\mathcal{V}$ and coincides with $u^n$ at the grid points. This allows us to linearise the source function $f$ about $u^n$, the outcome being
$$\frac{\partial u}{\partial t} = \nabla\cdot(a\nabla u) + c + bu + g(u), \tag{1.7}$$
where
$$b(\mathbf{x}) = f'(v(\mathbf{x})), \qquad c(\mathbf{x}) = f(v(\mathbf{x})) - f'(v(\mathbf{x}))\,v(\mathbf{x}), \qquad g(\mathbf{x},u) = f(u) - f(v(\mathbf{x})) - f'(v(\mathbf{x}))[u - v(\mathbf{x})].$$
Note that $g(\mathbf{x},u) = O(|u-v|^2)$. We can now solve the nonlinear system (1.7) by functional iteration, i.e. by letting, as a start,
$$w_l^{n+1,0} = w_l^{n}, \qquad l = 1,2,\ldots,m,$$
and recurring, employing the inner product (1.5),
$$\sum_{l=1}^{m}(a_l,a_k)_{\Delta t}\,w_l^{n+1,j+1} = \sum_{l=1}^{m}(a_l,a_k)\,w_l^{n} + \Delta t\,(c,a_k) + \Delta t\left(g\Big(\cdot,\sum_{l=1}^{m}a_l\,w_l^{n+1,j}\Big),a_k\right), \qquad k = 1,2,\ldots,m, \tag{1.8}$$
for $j\in\mathbb{Z}^+$.
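The payoff of Sobolev-orthogonality is that the left-hand side of (1.4) becomes diagonal. Below is a minimal sketch, for the toy case $a\equiv 1$, $b = c = 0$ on $[0,1]$ with zero Dirichlet conditions, where the functions $\sin(k\pi x)$ happen to be orthogonal in both the $L^2$ and the $(\cdot,\cdot)_{\Delta t}$ inner products, so that each backward-Euler step costs only $O(m)$. The basis choice and parameter values are illustrative assumptions.

```python
import numpy as np

m, dt = 20, 1e-3
k = np.arange(1, m + 1)
l2 = 0.5                                     # (a_k, a_k) = int_0^1 sin(k pi x)^2 dx
sob = 0.5 + dt * 0.5 * (k * np.pi) ** 2      # (a_k, a_k)_dt, per (1.5)
factor = l2 / sob                            # the diagonal solve in (1.4)

x = np.linspace(0, 1, 401)
w = np.array([2 * np.trapz(np.sin(np.pi * x) ** 3 * np.sin(j * np.pi * x), x)
              for j in k])                   # coefficients of u(x, 0) = sin(pi x)^3
for _ in range(100):                         # 100 backward-Euler steps
    w = factor * w
u = np.sin(np.pi * np.outer(x, k)) @ w       # the approximate solution at t = 0.1
```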
If, as in the previous subsection, we choose $W_m$ so as to diagonalise the linear system, each step of (1.8) becomes relatively cheap. Hence this approach might offer a realistic means to derive spectral approximations to nonlinear PDEs. Indeed, a special one-dimensional case can be treated straightforwardly, and it is presented in the sequel.

1.3 The one-dimensional case using polynomial splines

Let (1.1) be given in one space dimension and without source terms, whence it becomes the familiar diffusion equation with variable diffusion coefficient,
$$\frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\left(a\,\frac{\partial u}{\partial x}\right).$$
Thus, provided that $0 \leq x \leq 1$ and $t$ is nonnegative, we require the 'usual' Sobolev orthogonality [13] with respect to the inner product
$$(f,g)_{\Delta t} = (f,g)_\lambda = \int_0^1 f(x)g(x)\,d\psi(x) + \int_0^1 f'(x)g'(x)\,d\varphi(x),$$
where
$$\frac{d\psi(x)}{dx} = 1 - \Delta t\,b, \qquad \frac{d\varphi(x)}{dx} = \Delta t\,a.$$
We emphasise again the dependence of the Sobolev inner product on the step size.

Taking the approach of the previous subsection as our point of departure, an obvious option is to use Sobolev-orthogonal polynomials. An alternative approach, which can be worked out explicitly and which we wish to demonstrate in this subsection, is to use univariate polynomial spline approximations. It has the advantage of being more amenable to a generalisation to several space dimensions. We suppose that the unit interval $[0,1]$ is divided into $N$ intervals of length $h := \frac{1}{N}$ and consider a piecewise-quadratic basis of continuous functions $s_1, s_2, \ldots, s_N$ such that
$$s_l(x) = \begin{cases} \frac{1}{h}[x-(l-1)h] + \alpha_l\,(x-lh)[x-(l-1)h], & (l-1)h \leq x \leq lh,\\ \frac{1}{h}[(l+1)h-x] + \beta_l\,(x-lh)[x-(l+1)h], & lh \leq x \leq (l+1)h,\\ 0, & |x-lh| \geq h.\end{cases}$$
Clearly, $s_l$ is a continuous, $C[0,1]$ cardinal function of Lagrange interpolation at the knots (hence a quadratic spline with double knots, cf. Powell [16], the added degree of freedom being taken up by the requirement of Sobolev-orthogonality). Next, we need only impose Sobolev orthogonality and solve for the coefficients $\alpha_l$ and $\beta_l$. This is equivalent to the requirement that
$$(s_l, s_{l+1})_{\Delta t} = 0, \qquad l = 1,2,\ldots,N-1.$$
In the special case $a(x) = 1$, $b(x) = c(x) = 0$, we have $\psi(x) = x$, $\varphi(x) = \Delta t\,x$, and a direct computation over the shared interval of support $[lh,(l+1)h]$ (substituting $x = lh + h\xi$, $\xi\in[0,1]$) gives
$$(s_l, s_{l+1})_{\Delta t} = h\left[\tfrac{1}{6} - \tfrac{1}{12}h^2\beta_l - \tfrac{1}{12}h^2\alpha_{l+1} + \tfrac{1}{30}h^4\alpha_{l+1}\beta_l\right] + \Delta t\left[-\tfrac{1}{h} + \tfrac{1}{3}h^3\alpha_{l+1}\beta_l\right].$$
Let $\mu = \Delta t/h^2$ be the Courant number. Since we have two degrees of freedom for each $l$, and because each equation is otherwise independent of $l$, we may fix $\alpha = \alpha_l = \beta_l$. Then, letting $\tilde\alpha := h^2\alpha$, the requirement $(s_l, s_{l+1})_{\Delta t} = 0$ is equivalent to
$$5 - 5\tilde\alpha + \tilde\alpha^2 + 10\mu\tilde\alpha^2 - 30\mu = 0, \quad\text{i.e.}\quad (1+10\mu)\tilde\alpha^2 - 5\tilde\alpha + 5 - 30\mu = 0. \tag{1.9}$$
We wish to solve this quadratic equation for $\tilde\alpha$ for a suitable range of Courant numbers. Indeed, the equation (1.9) has two real solutions $\tilde\alpha$ for every $\mu > 0$, since its discriminant $25 - 4(1+10\mu)(5-30\mu) = 1200\mu^2 - 80\mu + 5$ is positive. In the case $\mu = \frac{1}{6}$, $s_l$ reduces, upon the choice $\tilde\alpha = 0$, to a chapeau function. Otherwise we obtain $\tilde\alpha = O(1)$. We may give up small support, characteristic of spline functions (which, anyway, is of marginal importance, since we do not solve linear systems!); this is a case discussed in the next section. Another obvious alternative is to construct an orthogonal basis from chapeau functions.
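As a numerical sanity check of (1.9) as reconstructed here, the following sketch solves the quadratic for $\tilde\alpha$ and verifies $(s_l, s_{l+1})_{\Delta t} = 0$ by quadrature; the values of $h$ and $\Delta t$ are arbitrary.

```python
import numpy as np

h, dt = 0.1, 0.005
mu = dt / h**2                                 # Courant number
A = np.roots([1 + 10 * mu, -5, 5 - 30 * mu])[0]
alpha = A / h**2                               # since A = h^2 * alpha

def s(l, x):     # the piecewise-quadratic basis with alpha_l = beta_l = alpha
    y = np.zeros_like(x)
    L = (x >= (l - 1) * h) & (x <= l * h)
    R = (x >= l * h) & (x <= (l + 1) * h)
    y[L] = (x[L] - (l - 1) * h) / h + alpha * (x[L] - l * h) * (x[L] - (l - 1) * h)
    y[R] = ((l + 1) * h - x[R]) / h + alpha * (x[R] - l * h) * (x[R] - (l + 1) * h)
    return y

def ds(l, x):    # its derivative, piecewise linear
    y = np.zeros_like(x)
    L = (x >= (l - 1) * h) & (x <= l * h)
    R = (x >= l * h) & (x <= (l + 1) * h)
    y[L] = 1 / h + alpha * (2 * x[L] - (2 * l - 1) * h)
    y[R] = -1 / h + alpha * (2 * x[R] - (2 * l + 1) * h)
    return y

x = np.linspace(0, 1, 200001)
ip = np.trapz(s(3, x) * s(4, x), x) + dt * np.trapz(ds(3, x) * ds(4, x), x)
print(abs(ip))   # ~0 up to quadrature error: neighbours are Sobolev-orthogonal
```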
The chapeau-function alternative, however, is easily seen to be identical to the LU factorisation of the standard FEM mass matrix,
$$\begin{pmatrix} \frac{2}{3} & \frac{1}{6} & & \\ \frac{1}{6} & \frac{2}{3} & \frac{1}{6} & \\ & \ddots & \ddots & \ddots \\ & & \frac{1}{6} & \frac{2}{3} \end{pmatrix}.$$

2 Applications of radial basis functions and wavelets

2.1 Sobolev-orthogonal translates of a radial basis function

In this section we wish to develop a more general approach, employing the concepts of wavelets and radial basis functions, and employ shift-invariant spaces of approximants for our spectral methods. We begin by giving up the compactness of the domain $\mathcal{V}$ and work on the entire real line instead. For this, we shall demonstrate the use of Sobolev inner products and shift-invariant spaces, and concentrate solely on this part of the analysis in the present article. So, in particular, the set $W$ above is of the form $\{\phi(\cdot - nh) \mid n\in\mathbb{Z}\}$. In the sequel we shall add several remarks about how to find compactly supported $\phi$ that allow the treatment of partial differential equations on compact domains. We remark that $n$ is no longer used for the time-steps in the differential equation solver, but for the shifts of the radial functions.

To start with, we wish to find a function $\phi\in H^1(\mathbb{R})$, where $H^1(\mathbb{R})$ is a nonhomogeneous Sobolev space, such that for a positive constant $\lambda$ and positive spacing $h$ it is true that
$$\int_{-\infty}^{\infty}\phi(x)\phi(x-hn)\,dx + \lambda\int_{-\infty}^{\infty}\phi'(x)\phi'(x-hn)\,dx = \delta_{0n}, \qquad n\in\mathbb{Z}. \tag{2.1}$$
We multiply both the left- and right-hand side of the general pattern (2.1) by $\exp(i\theta n)$ and sum over $n\in\mathbb{Z}$,
$$\sum_{n=-\infty}^{\infty}\exp(i\theta n)\left\{\int_{-\infty}^{\infty}\phi(x)\phi(x-hn)\,dx + \lambda\int_{-\infty}^{\infty}\phi'(x)\phi'(x-hn)\,dx\right\} = 1, \qquad \theta\in[-\pi,\pi]. \tag{2.2}$$
In order to be able to exchange summation and integration and apply the Poisson summation formula (Stein and Weiss [17], p. 252) we make a number of assumptions. The version of the Poisson summation formula that we wish to use states that for a univariate function $f$ with
$$|f(x)| = O\big((1+|x|)^{-1-\varepsilon}\big), \qquad |\hat f(x)| = O\big((1+|x|)^{-1-\varepsilon}\big)$$
and positive $\varepsilon$, the following equality holds (note that the first bound above implies existence and continuity of the one-dimensional Fourier transform):
$$\sum_{j=-\infty}^{\infty} f(j) = \sum_{j=-\infty}^{\infty}\hat f(2\pi j).$$
Specifically, we assume that the following three decay estimates hold:
$$|\phi(x)| \leq c(1+|x|)^{-1-\varepsilon}, \qquad |\phi'(x)| \leq c(1+|x|)^{-1-\varepsilon}, \qquad |\hat\phi(\xi)| \leq c(1+|\xi|)^{-3/2-\varepsilon},$$
where $c$ is a generic positive constant, $\varepsilon > 0$ and $\hat\phi$ denotes the Fourier transform; we demand the faster rate of decay in the last display because we shall later require summability of translates of the Fourier transform multiplied by the square of its argument. Note in particular that the first decay condition renders the Fourier transform $\hat\phi$ continuous and well defined.

An example of a function $\phi$ that satisfies the three decay conditions above is the second divided difference of the multiquadric radial basis function [4] $\sqrt{x^2+C^2}$, that is
$$\phi(x) = \tfrac12\sqrt{(x-1)^2+C^2} - \sqrt{x^2+C^2} + \tfrac12\sqrt{(x+1)^2+C^2}.$$
Here $C$ is a positive constant parameter. The above function decays cubically [4], and its Fourier transform even decays exponentially, due to the exponential decay of the modified Bessel function $K_1$ [1] that features in the generalised Fourier transform of the multiquadric, here stated only in the one-dimensional case (cf. Jones [14]).
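The cubic decay of this second divided difference is easy to confirm numerically. A small sketch (the value of C is an arbitrary choice):

```python
import numpy as np

C = 1.0
mq = lambda x: np.sqrt(x**2 + C**2)
phi = lambda x: 0.5 * mq(x - 1) - mq(x) + 0.5 * mq(x + 1)

for x in [10.0, 100.0, 1000.0]:
    print(x, phi(x) * x**3)   # tends to a constant (C^2/2), i.e. cubic decay
```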
Once summation and integration are interchanged, (2.2) becomes
$$\int_{-\infty}^{\infty}\phi(x)\sum_{n=-\infty}^{\infty}\exp(i\theta n)\phi(x-hn)\,dx + \lambda\int_{-\infty}^{\infty}\phi'(x)\sum_{n=-\infty}^{\infty}\exp(i\theta n)\phi'(x-hn)\,dx = 1, \qquad \theta\in[-\pi,\pi], \tag{2.3}$$
or, applying the Poisson summation formula (Stein and Weiss [17], p. 252),
$$\int_{-\infty}^{\infty}\phi(x)\sum_{n=-\infty}^{\infty}\exp\big(ih^{-1}x(\theta+2\pi n)\big)\,\hat\phi\big(h^{-1}(\theta+2\pi n)\big)\,dx + i\lambda h^{-1}\int_{-\infty}^{\infty}\phi'(x)\sum_{n=-\infty}^{\infty}\exp\big(ih^{-1}x(\theta+2\pi n)\big)\,(\theta+2\pi n)\,\hat\phi\big(h^{-1}(\theta+2\pi n)\big)\,dx = h, \tag{2.4}$$
where $\theta\in[-\pi,\pi]$. Because $\phi$ vanishes at infinity, integration by parts in the second term of (2.4) gives
$$\sum_{n=-\infty}^{\infty}\hat\phi\big(h^{-1}(\theta+2\pi n)\big)\,\hat\phi\big(-h^{-1}(\theta+2\pi n)\big)\Big(1+\lambda h^{-2}(\theta+2\pi n)^2\Big) = h.$$
Since $\phi$ is real, $\hat\phi(-\xi) = \overline{\hat\phi(\xi)}$, and this implies
$$\sum_{n=-\infty}^{\infty}\Big|\hat\phi\big(h^{-1}(\theta+2\pi n)\big)\Big|^2\Big(1+\lambda h^{-2}(\theta+2\pi n)^2\Big) = h, \qquad \theta\in[-\pi,\pi]. \tag{2.5}$$
This is the condition that leads to the required Sobolev-orthogonality. In summary, we have established the following theorem.

Theorem 2.1 If the decay conditions on $\phi$, as stated above, hold in tandem with the expression (2.5), then the required orthogonality condition (2.1) is satisfied.

We note that, if we are given a $\psi$ such that
$$\sum_{n=-\infty}^{\infty}\Big|\hat\psi\big(h^{-1}(\theta+2\pi n)\big)\Big|^2 = h, \qquad \theta\in[-\pi,\pi], \tag{2.6}$$
then
$$\hat\phi(\xi) := \frac{\hat\psi(\xi)}{\sqrt{1+\lambda\xi^2}} \tag{2.7}$$
satisfies (2.5). This expression can be used to derive an explicit transformation which takes a $\psi$ satisfying (2.6) into a $\phi$ satisfying (2.5), although its practical computation may be nontrivial. Indeed, by the Parseval–Plancherel theorem [17], we get the useful identity
$$\phi(x) = \frac{1}{\pi\sqrt\lambda}\int_{-\infty}^{\infty}\psi(x-y)\,K_0\!\left(\frac{|y|}{\sqrt\lambda}\right)dy, \tag{2.8}$$
which is a convolution and whose Fourier transform is therefore (2.7) (cf., for instance, Jones [14]). In (2.8), $K_0$ is the zeroth modified Bessel function (Abramowitz and Stegun [1]), which is positive on the positive reals and satisfies $K_0(t)\sim -\log t$ near zero and $K_0(t)\sim\sqrt{\pi/(2t)}\,e^{-t}$ for large $t$, similar to the asymptotics we have used before for the modified Bessel function $K_1$. Hence, by a lemma in [7] (see also Light and Cheney [15]), $\phi$ decays algebraically of a certain order if $\psi$ does. Moreover, because $1/\sqrt{1+\lambda x^2}$ is positive, integer translates of $\phi$ are dense in $L^2$, say, provided that this is the case for integer translates of $\psi$ [18]. In some trivial cases we may evaluate the integral (2.8) explicitly, for instance for $\psi(x) = \cos x$, where the integral is again a constant multiple of the cosine function (Abramowitz and Stegun [1]). Otherwise, the smoothness and fast exponential decay of the modified Bessel function can be used together with a quadrature formula.

We may now use the translates of such Sobolev-orthogonal functions in the spectral approximation of a PDE as above, letting $W := \{\phi(\cdot - nh) \mid n\in\mathbb{Z}\}$. An example of a function $\psi$ that satisfies (2.6) is given by taking $\hat\psi$ to be $\sqrt h$ times the characteristic function of the interval $[-h^{-1}\pi, h^{-1}\pi]$; in that case $|\psi(x)|$ decays like $1/|x|$. In fact, any $\psi$ that satisfies $|\hat\psi(\xi)| \leq c(1+|\xi|)^{-1/2-\varepsilon}$ for positive $\varepsilon$ can be made to satisfy (2.6) by subjecting it to the transformation
$$\hat\psi(\xi) \mapsto \frac{\sqrt h\,\hat\psi(\xi)}{\Big(\sum_{n=-\infty}^{\infty}\big|\hat\psi(\xi + h^{-1}2\pi n)\big|^2\Big)^{1/2}}, \tag{2.9}$$
see for instance (Battle [2]). If $\psi$ is compactly supported then the transformed $\psi$ will not necessarily be compactly supported, but will decay exponentially [6]. In order to find a class of examples of compactly supported $\psi$ that satisfy (2.6), see Daubechies [8] for her compactly supported scaling functions $\psi$, which are fundamental for the construction of Daubechies wavelets.
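The passage from (2.6) to (2.5) via (2.7) can be checked numerically. A hedged sketch for the example just mentioned, with $\hat\psi = \sqrt h$ times the characteristic function of $[-\pi/h, \pi/h]$ (the parameter values are arbitrary):

```python
import numpy as np

h, lam = 0.5, 0.3
psi_hat = lambda xi: np.sqrt(h) * (np.abs(xi) <= np.pi / h)      # satisfies (2.6)
phi_hat = lambda xi: psi_hat(xi) / np.sqrt(1 + lam * xi**2)      # transformation (2.7)

n = np.arange(-50, 51)
for theta in [-2.0, 0.0, 1.3]:
    xi = (theta + 2 * np.pi * n) / h
    total = np.sum(np.abs(phi_hat(xi)) ** 2 * (1 + lam * xi**2)) # left side of (2.5)
    print(theta, total)                                          # equals h each time
```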
In Daubechies' construction, for example, the following conditions are sufficient for a $\psi$, defined through its Fourier transform, to satisfy (2.6) for $h = 1$ (other $h$ can be used by scaling):
$$\hat\psi(\xi) = \prod_{j=1}^{\infty} h\big(2^{-j}\xi\big),$$
where, for some suitable coefficients $h_k$,
$$h(\xi) = \sum_{k=0}^{2N-1} h_k\,e^{-ik\xi}$$
has to satisfy $h(0) = 1$, $h(\pi) = 0$, and
$$|h(\xi)|^2 + |h(\xi+\pi)|^2 = 1.$$
For the construction of such $h$, see [8]. Compactly supported basis functions are important for approximating the numerical solution of a PDE, as in the above example, defined on a compact $\mathcal{V}$. Moreover, any $\psi$ with the aforementioned decay property can be made to satisfy (2.5) by the transformation
$$\hat\psi(\xi) \mapsto \frac{\sqrt h\,\hat\psi(\xi)}{\Big(\sum_{n=-\infty}^{\infty}\big|\hat\psi(\xi+h^{-1}2\pi n)\big|^2\big(1+\lambda(\xi+h^{-1}2\pi n)^2\big)\Big)^{1/2}}. \tag{2.10}$$
Such functions can also be found by applying the transformation (2.10) after using the transformation (2.9). We note, finally, that when $\psi$ is, for instance, a B-spline, then its translates are dense in $L^2$ if we allow $h$ to become arbitrarily small (see, for instance, Powell [16], and the last section of this paper).

2.2 Sobolev-orthogonal translates of a function in higher dimensions

Applying the approach of the previous subsection to the Sobolev inner product
$$\int f(\mathbf{x})g(\mathbf{x})\,d\mathbf{x} + \lambda\int\nabla^{T}f(\mathbf{x})\nabla g(\mathbf{x})\,d\mathbf{x},$$
the outcome is the orthogonality condition
$$\sum_{n\in\mathbb{Z}^d}\Big|\hat\phi\big(h^{-1}(\theta+2\pi n)\big)\Big|^2\Big(1+\lambda h^{-2}\|\theta+2\pi n\|^2\Big) = h^d, \qquad \theta\in[-\pi,\pi]^d, \tag{2.11}$$
which replaces (2.5). We are now also interested in the more general case of Sobolev-type inner products
$$\int f(\mathbf{x})g(\mathbf{x})\,\mu(\mathbf{x})\,d\mathbf{x} + \lambda\int\nabla^{T}f(\mathbf{x})\nabla g(\mathbf{x})\,\nu(\mathbf{x})\,d\mathbf{x},$$
where the weights $\mu$ and $\nu$ are positive. Here the orthogonality condition becomes more complicated. Specifically, it is
$$\sum_{n\in\mathbb{Z}^d}\Big[\hat\phi^{\mu}\big(h^{-1}(\theta+2\pi n)\big)\,\overline{\hat\phi^{\mu}\big(h^{-1}(\theta+2\pi n)\big)} + \lambda h^{-2}\,\hat\phi^{\nu}\big(h^{-1}(\theta+2\pi n)\big)\,\overline{\hat\phi^{\nu}\big(h^{-1}(\theta+2\pi n)\big)}\Big] = h^d, \qquad \theta\in[-\pi,\pi]^d,$$
where $\hat\phi^{\mu} := \hat\phi * \hat\mu$, $\hat\phi^{\nu} := (\|\cdot\|\,\hat\phi) * \hat\nu$, and $*$ denotes continuous convolution, used as in (2.8), where $\psi$ is convolved with a modified Bessel function.

2.3 Error estimates

We can offer error estimates for the Sobolev-orthogonal bases: firstly, in the case when $\phi$ is a univariate spline of fixed degree $m$, say, with knots on $h\mathbb{Z}$; and, secondly, in the case when $\phi$ is a linear combination of translates along $h\mathbb{Z}$ of the radial Gauss kernel $e^{-x^2/\gamma}$, $x\in\mathbb{R}$. In the former case it is known that the uniform approximation error to a sufficiently smooth function from the linear space spanned by $\phi(\cdot-nh)$, $n\in\mathbb{Z}$, is at most a constant multiple of $h^{m+1}$ ([16]). We have already mentioned that we require $\lambda = O(h^2)$; therefore it can be deduced by twofold integration by parts that the Sobolev error is indeed $O(h^{m+1})$. This can be generalised in a straightforward way to higher dimensions by tensor-product B-splines.

Our $L^2(\mathbb{R})$ error estimates can be carried out as follows. Let $f$ be a band-limited function, that is, one with a compactly supported Fourier transform, which satisfies assumptions that imply that the best least-squares approximation with respect to the Sobolev inner product,
$$s_h(x) = \sum_{n=-\infty}^{\infty}\big(f,\phi(\cdot-hn)\big)_{\lambda,h}\,\phi(x-nh), \qquad x\in\mathbb{R}, \tag{2.12}$$
is well defined. For instance, we may require that $(f,f)_{\lambda,h} < \infty$, as well as sufficient decay of the radial basis function $\phi$, i.e.
$$|\phi(r)| \leq c(1+|r|)^{-1-\varepsilon}, \qquad |\phi'(r)| \leq c(1+|r|)^{-1-\varepsilon}, \qquad |\hat\phi(r)| \leq c(1+|r|)^{-3/2-\varepsilon}$$
for a positive $\varepsilon$. Here $(\cdot,\cdot)_{\lambda,h}$ is the Sobolev inner product which we study in this note, and it is helpful to emphasise its dependence on $h$ in the subscript. We begin with the piecewise polynomial, i.e. spline, case. Hence, let $\phi$ be from the space of splines of degree $m$ with knots on $h\mathbb{Z}$ such that its translates are Sobolev-orthogonal.
Theorem 2.2 Subject to the assumptions of the last paragraph, we have the error estimate
$$\|s_h - f\|_2 = O(h^{m+1}), \qquad h\to 0. \tag{2.13}$$

Proof: We shall establish in the course of this proof an error estimate for the first derivative of the error function in (2.13) as well, so that an order of convergence can also be concluded for the norm associated with our Sobolev inner product. Indeed, because the Fourier transform is an $L^2(\mathbb{R})$ isometry, we may prove (2.13) by considering
$$\|\hat s_h - \hat f\|_2 \tag{2.14}$$
instead of the left-hand side of (2.13). The Fourier transform of (2.12) is
$$\hat s_h(\theta) = \sum_{n=-\infty}^{\infty}\big(f,\phi(\cdot-hn)\big)_{\lambda,h}\,e^{-i\theta hn}\,\hat\phi(\theta).$$
The absolute convergence of the above is guaranteed by the decay conditions on $\phi$. Hence the square of (2.14) is, by the Parseval–Plancherel formula and periodisation of the integrand with respect to $\theta$,
$$\int_{-\pi/h}^{\pi/h}\sum_{k=-\infty}^{\infty}\bigg|\hat f(\theta+2\pi k/h) - \hat\phi(\theta+2\pi k/h)\sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty}\hat f(\xi)\overline{\hat\phi(\xi)}\,e^{i\xi hn}\big(1+\lambda\xi^2\big)\,d\xi\;e^{-i\theta hn}\bigg|^2 d\theta. \tag{2.15}$$
The factor $(1+\lambda\xi^2)$ in the above comes from the derivative term in the Sobolev inner product, via the Fourier transform. Because $f$ is band-limited, for small enough $h$ (2.15) assumes the form
$$\int_{-\pi/h}^{\pi/h}\sum_{k=-\infty}^{\infty}\bigg|\hat f(\theta)\delta_{0k} - \hat\phi(\theta+2\pi k/h)\sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty}\hat f(\xi)\overline{\hat\phi(\xi)}\,e^{i\xi hn}\big(1+\lambda\xi^2\big)\,d\xi\;e^{-i\theta hn}\bigg|^2 d\theta. \tag{2.16}$$
Using again the band-limitedness of $f$, together with the Poisson summation formula, (2.16) can be brought into the form
$$\int_{-\pi/h}^{\pi/h}\sum_{k=-\infty}^{\infty}\bigg|\hat f(\theta)\delta_{0k} - \frac1h\,\hat\phi(\theta+2\pi k/h)\sum_{n=-\infty}^{\infty}\hat f(\theta+2\pi n/h)\,\overline{\hat\phi(\theta+2\pi n/h)}\,\big(1+\lambda(\theta+2\pi n/h)^2\big)\bigg|^2 d\theta. \tag{2.17}$$
In the case when $\phi$ is in the aforementioned spline space, it can be expressed as the inverse Fourier transform of
$$\hat\phi(\xi) = \frac{\sqrt h\,\hat r(\xi)}{\Big(\sum_{n=-\infty}^{\infty}\big|\hat r(\xi+h^{-1}2\pi n)\big|^2\big(1+\lambda(\xi+h^{-1}2\pi n)^2\big)\Big)^{1/2}}, \qquad \xi\in\mathbb{R}, \tag{2.18}$$
where $\hat r(\xi) = \xi^{-m-1}$. This follows from (2.5) and from the fact that all splines from our space are linear combinations of integer translates of $r(x) := (x)_+^m$, whose generalised Fourier transform is a multiple of $\xi^{-m-1}$ [14]. Since any constant factors in front of the function $\xi^{-m-1}$ in $\hat r$ cancel in the expression for $\hat\phi$ above, we have ignored them straightaway. Substituting (2.18) into (2.17), we get the integral over $[-\pi/h,\pi/h]$ of
$$\sum_{k=-\infty}^{\infty}\bigg|\hat f(\theta)\delta_{0k} - \frac{\hat r(\theta+h^{-1}2\pi k)\,\overline{\hat r(\theta)}\,\hat f(\theta)\big(1+\lambda\theta^2\big)}{\sum_{n=-\infty}^{\infty}\big|\hat r(\theta+h^{-1}2\pi n)\big|^2\big(1+\lambda(\theta+h^{-1}2\pi n)^2\big)}\bigg|^2. \tag{2.19}$$
Considering (2.19) for each $k$ separately, it follows from (2.19) and from $\hat r(\xi) = \xi^{-m-1}$ that our claim is true. Indeed, for the sum over all terms with $k\neq 0$, it is evident that we obtain a factor of $h^{m+1}$ from the numerator, because the denominator is periodic, containing one term independent of $h$, and the nonvanishing expression $h^{-1}2\pi k$ in the argument of $\hat r(\theta+h^{-1}2\pi k)$ guarantees $\hat r(\theta+h^{-1}2\pi k)\sim h^{m+1}$, due to $\hat r(\xi) = \xi^{-m-1}$. Of course, the squares then taken provide the $h^{2m+2}$ instead of $h^{m+1}$. On the other hand, for $k = 0$ we have, for small enough $h$,
$$|\hat f(\theta)|\,\frac{\sum_{n\neq 0}\big|\hat r(\theta+h^{-1}2\pi n)\big|^2\big(1+\lambda(\theta+h^{-1}2\pi n)^2\big)}{\big|\hat r(\theta)\big|^2\big(1+\lambda\theta^2\big) + \sum_{n\neq 0}\big|\hat r(\theta+h^{-1}2\pi n)\big|^2\big(1+\lambda(\theta+h^{-1}2\pi n)^2\big)},$$
which is also of the required order, because the numerator provides an $O(h^{2m+2})$, according to the rate of decay of $\hat r$ and the power of $h$ in its argument; this is then squared to provide an $O(h^{4m+4})$, which is in particular $O(h^{2m+2})$, as required. As for the derivatives, one only has to multiply the Fourier transform of the error function in (2.14) by $\theta$, and we get the same error estimate by multiplying the integrands in all the preceding integrals by $|\theta|^2$. □
The same analysis remains valid when considering integer translates of the Gauss kernel $e^{-x^2/\gamma}$ in order to form $\phi$. In this case we make use of the fact that the Gauss kernel has a Fourier transform which is a multiple of $e^{-\gamma\xi^2/4}$. We put this, instead of $\hat r$, into (2.19), and we then get arbitrarily high orders of convergence from (2.14) as long as we take $\gamma = O(h)$, see also [3]. For this choice $\phi$ is exponentially decaying, whereas for splines of degree $m$ we merely get algebraic decay at infinity of order $-m-1$.

Bibliography

1. Abramowitz, M. and I. A. Stegun (1970) Handbook of Mathematical Functions, Dover Publications.
2. Battle, G. (1987) "A block-spin construction of ondelettes. Part I: Lemarié functions", Comm. Math. Phys.
3. Beatson, R. K. and W. A. Light (1992) "Quasi-interpolation in the absence of polynomial reproduction", in Numerical Methods of Approximation Theory, D. Braess and L. L. Schumaker (eds.), Birkhäuser-Verlag, Basel, 21–39.
4. Buhmann, M. D. (1988) "Convergence of univariate quasi-interpolation using multiquadrics", IMA Journal of Numerical Analysis 8, 365–384.
5. Buhmann, M. D. (2000) "Radial basis functions", Acta Numerica 9, 1–38.
6. Chui, C. K. (1992) An Introduction to Wavelets, Academic Press, New York.
7. Buhmann, M. D. and N. Dyn (1993) "Spectral convergence of multiquadric interpolation", Proc. Edinburgh Math. Soc. 36, 319–333.
8. Daubechies, I. (1988) "Orthonormal bases of compactly supported wavelets", Comm. Pure Appl. Math. 41, 909–996.
9. DeVore, R. A. and B. Lucier (1992) "Wavelets", Acta Numerica 1, 1–55.
10. Driscoll, T. A. and Fornberg, B. (2001) "Interpolation in the limit of increasingly flat radial basis functions", to appear in Computers & Mathematics with Applications.
11. Frank, J. and Reich, S. (2001) "A particle-mesh method for the shallow water equations near geostrophic balance", Tech. Rep., Imperial College, London.
12. Gautschi, W. (1996) "Orthogonal polynomials: applications and computation", Acta Numerica 5, 45–119.
13. Iserles, A., P. E. Koch, S. P. Nørsett and J. M. Sanz-Serna (1991) "On polynomials orthogonal with respect to certain Sobolev inner products", J. Approx. Th. 65, 151–175.
14. Jones, D. S. (1982) The Theory of Generalised Functions, Cambridge University Press, Cambridge.
15. Light, W. A. and E. W. Cheney (1992) "Quasi-interpolation with translates of a function having non-compact support", Constr. Approx. 8, 35–48.
16. Powell, M. J. D. (1981) Approximation Theory and Methods, Cambridge University Press, Cambridge.
17. Stein, E. M. and G. Weiss (1971) Introduction to Fourier Analysis on Euclidean Spaces, Princeton.
18. Wiener, N. (1933) The Fourier Integral and Certain of its Applications, Cambridge University Press, Cambridge.

Approximation with the radial basis functions of Lewitt

J. J. Green
Dept. Applied Mathematics, University of Sheffield, UK.
j.j.green@sheffield.ac.uk

Abstract

R. M. Lewitt has introduced a family of compactly supported radial basis functions which are particularly useful in discretising, for inversion, ill-posed problems involving line integrals. We consider some practical issues in their use and implementation, compare square and triangular grids of the functions in two dimensions, and describe some particularly favourable choices of the defining parameters.

1 Introduction
In the article [5], R. M. Lewitt introduced a family of window functions
$$\psi(r) = \begin{cases}\big(1-(r/a)^2\big)^{m/2}\,I_m\big(\alpha(1-(r/a)^2)^{1/2}\big)\big/I_m(\alpha), & 0 \leq r \leq a,\\ 0, & r > a,\end{cases} \tag{1.1}$$
where $I_m$ is the modified Bessel function of order $m$ (see Ch. III, 3.7 of [13]). The implicit dependence of $\psi$ on the parameters $a > 0$, $\alpha > 0$ and $m\in\mathbb{N}$ is discussed below. Lewitt's motivation for studying these functions is the use of translates of the radially symmetric function $\Psi(x) = \psi(\|x\|)$, $x\in\mathbb{R}^d$ (see Figure 1), as a basis for the discretisation of tomographic problems [8, 9]. Such a basis overcomes a number of difficulties associated with the usual, pixel-based, representation in problems involving the recovery of a function from a set of line, curve or strip integrals across its domain, while retaining the advantage of a sparse discretisation. The author's interest in these functions arises in their application to a Radon-like problem in the remote sensing of ocean waves [15], a detailed exposition of which may be found in [3].

FIG. 1. Lewitt's radial basis function in dimension 2 with m = 2, a = 3.

2 Discretising x-ray problems

The discretisation of an x-ray transform inversion problem with Lewitt's basis is straightforward. Given a set of centres $x_i\in\mathbb{R}^d$, one represents the (unknown) function $f$ as a linear combination of the translates of $\Psi$,
$$f(x) = \sum_i \xi_i\,\Psi(x-x_i), \qquad x\in\mathbb{R}^d. \tag{2.1}$$
The given data in such problems are the values $I_j$ of integrals of $f$ over lines (or, more generally, submanifolds) $L_j$:
$$I_j = \int_{L_j} f = \sum_i \xi_i\int_{L_j}\Psi(\cdot-x_i). \tag{2.2}$$
The latter integral in (2.2) is the projection, or Abel transform, of $\Psi$, which can be calculated explicitly in the linear case. For a line $L_j$ whose closest point to $x_i$ is at a distance $s$ from it, and with the dependence of $\psi$ on $m$ here made explicit,
$$\int_{L_j}\Psi(\cdot-x_i) = 2\int_0^{\sqrt{a^2-s^2}}\psi_m\big(\sqrt{s^2+t^2}\big)\,dt = \frac{a}{I_m(\alpha)}\sqrt{\frac{2\pi}{\alpha}}\left(\sqrt{1-(s/a)^2}\right)^{m+1/2} I_{m+1/2}\!\left(\alpha\sqrt{1-(s/a)^2}\right)$$
(see A7 of [5]). Thus (2.2) reduces to a linear system which may be solved for the coefficients $\xi_i$. If the support of the basis functions is small (i.e., if $a$ is small) then this linear system has an unstructured sparsity which can be exploited by, for example, an iterative row-action solution method [2]. The computational cost of such a discretisation lies mainly in the evaluation of the Abel transform, which requires the calculation of a Bessel function. Fortunately, Bessel functions of half-integer order can be calculated efficiently from their recurrence relations (see the Atlas [12] for details). The discretisation techniques described here can also be applied to problems in which the integrals are over curves of sufficient smoothness to allow a local linear approximation.

3 Fourier transform and invertibility

The Fourier transform of the $d$-dimensional basis function $\Psi_m$ is radially symmetric and given in (A3) of [5]; up to a positive constant factor it is
$$\hat\Psi_m(x) \;\propto\; \frac{J_{m+d/2}\big(\sqrt{(2\pi a\|x\|)^2-\alpha^2}\big)}{\big((2\pi a\|x\|)^2-\alpha^2\big)^{(m+d/2)/2}}. \tag{3.1}$$
The presence of the Bessel function $J_{m+d/2}$ in this expression clearly implies that it is not non-negative, and so, by Bochner's characterisation of positive definite translation-invariant functions, $\Psi$ is not positive definite for any choice of the parameters. This fact denies us the attractive approximation theory of the compactly supported radial functions of Wu, Wendland and Buhmann (Section 3 of [1]). In particular, there is no guarantee, per se, of the invertibility of the interpolation matrix $[\Psi(x_i-x_j)]$, needed to ensure that (2.1) can represent an arbitrary function at its centres.
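A hedged sketch of the window (1.1) and its line projection as reconstructed above; a cross-check against brute-force quadrature guards against transcription error. The parameter values are arbitrary.

```python
import numpy as np
from scipy.special import iv   # modified Bessel function I_m

a, alpha, m = 2.0, 10.4, 2

def psi(r):
    t = np.sqrt(np.maximum(1 - (r / a) ** 2, 0.0))
    return np.where(r <= a, t**m * iv(m, alpha * t) / iv(m, alpha), 0.0)

def projection(s):
    """Integral of psi(|x|) along a line at distance s from the centre."""
    t = np.sqrt(np.maximum(1 - (s / a) ** 2, 0.0))
    return (a / iv(m, alpha)) * np.sqrt(2 * np.pi / alpha) \
        * t ** (m + 0.5) * iv(m + 0.5, alpha * t)

s = 0.7
t = np.linspace(0, np.sqrt(a**2 - s**2), 20001)
brute = 2 * np.trapz(psi(np.sqrt(s**2 + t**2)), t)
print(brute, projection(s))   # the two values agree
```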
However, the interpolation matrix $[\Psi(x_i-x_j)]$ is invertible if it is strictly diagonally dominant (Corollary 5.6.17 of [4]), which, for a set of centres on a uniform grid $\Gamma$, holds if
$$\Psi(0) > \sum_{\xi\in\Gamma\setminus\{0\}}\Psi(\xi). \tag{3.2}$$
Values of the parameters for which (3.2) is satisfied for the square planar grid $\Delta\mathbb{Z}^2$ are shown in Figure 2.

As is noted in [5], there are several reasons why a rapid decay of the Fourier transform of the basis function is advantageous in functional representation for the inversion of x-ray and related transforms.

• Such inversions may be complicated by functions in the nullspace of the transform, so-called ghosts. For some transforms [7] it can be shown that such functions have a Fourier transform which is small close to zero in the frequency domain, and so representation by a basis with Fourier transform localised around zero will suppress these ghosts.
• These inversions are often ill-posed and the given data noisy. Representation of the sought function by a basis with localised Fourier transform imposes smoothness, and so acts to regularise the problem in the sense of Tikhonov.
• It is often convenient to sample the inverted function on a grid which differs from the set of centres $x_i$ of the basis. With a localised Fourier transform, such a sampling can be performed without significant aliasing.

The asymptotic estimate $\hat\Psi_m(x) = O\big(1/\|x\|^{m+(d+1)/2}\big)$ may be derived from (3.1) and estimates of $J_{m+d/2}$ for large argument (see Eq. A4 of [5]), a fact which should inform our choice of $m$.

4 Choice of parameters

One agreeable feature of Lewitt's radial functions is that the choices of the parameters of the functions correspond in a natural way to the balance between representation quality and efficiency of computation. For example, the asymptotic rate of decay of the Fourier transform increases with $m$ (see above), but so does the cost of the calculation of $I_m$. A similar choice arises when the centres lie on a uniform (square or triangular) grid $\Gamma$. Let $\Delta$ denote the grid spacing of such a grid, i.e., the minimum distance between distinct centres in $\Gamma$. It is desirable that the grid ratio $a/\Delta$ be small, as this results in sparsity of the discretisation.

As a guide to fixing the values of $\alpha$ and the grid ratio, Lewitt suggests the error in quasi-interpolation to a constant: the error with which the function
$$g(x) := \sum_{\xi\in\Gamma}\Psi(x-\xi)$$
approximates the function whose constant value is that of $g$ at the centres (edge effects are ignored here). In Figure 2 the root mean square of this representation error (estimated numerically) is shown for the square planar grid, $m = 2$ and a range of values of $\alpha$ and $a/\Delta$. The distinctive "trenches" in the error can be explained with the Poisson summation formula (see [11]),
$$\sum_{n\in\mathbb{Z}^2}\Psi(x+\Delta n) = \frac{1}{\Delta^2}\sum_{n\in\mathbb{Z}^2}\exp(2\pi i\,n\cdot x/\Delta)\,\hat\Psi(n/\Delta). \tag{4.1}$$
The summand for $n = 0$ in the second sum is $\hat\Psi(0)$, so the representation error depends only on the values of $\hat\Psi$ on the dual grid, $\mathbb{Z}^2/\Delta$. Provided that $\hat\Psi$ decays rapidly, we would expect a small error when $\hat\Psi$ is zero, or close to zero, at the dual grid nodes closest to the origin. By (3.1), $\hat\Psi(x)$ is zero exactly when
$$J_{m+d/2}\big(\sqrt{(2\pi a\|x\|)^2-\alpha^2}\big) = 0,$$
i.e., for the radial values $\|x\| = R_k$ with $\sqrt{(2\pi a R_k)^2-\alpha^2} = \eta_k$, where $\eta_k$ is the $k$-th zero of $J_{m+d/2}$. Thus the requirement that the $k$-th zero of $\hat\Psi(x)$ occur at the radius of the closest non-zero dual grid node (i.e., $R_k = 1/\Delta$) is a constraint on the values of $\alpha$ and $a/\Delta$:
$$\alpha = \sqrt{(2\pi a/\Delta)^2 - \eta_k^2}. \tag{4.2}$$
The contours (4.2) agree well with the trenches evident in Figure 2.
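In practice one can walk along a trench directly: given a grid ratio, (4.2) fixes $\alpha$. A short sketch, using scipy for the Bessel zeros ($d$, $m$ and the ratios are illustrative choices):

```python
import numpy as np
from scipy.special import jn_zeros

d, m = 2, 2
def alpha_on_trench(grid_ratio, k):
    eta_k = jn_zeros(m + d // 2, k)[-1]        # k-th zero of J_{m+d/2}
    val = (2 * np.pi * grid_ratio) ** 2 - eta_k**2
    return np.sqrt(val) if val > 0 else None   # (4.2) needs a large enough ratio

for ratio in [1.5, 2.0, 2.5]:
    print(ratio, alpha_on_trench(ratio, 1))    # alpha on the principal trench
```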
With the same intent we can require that the $l$-th zero of $\hat\Psi(x)$ occur at the radius of the second closest dual grid node ($R_l = \sqrt2/\Delta$). Points satisfying both of these constraints can be expected to have a particularly small representation error. In Figure 2 these favourable choices are labelled $k\!:\!l$.

FIG. 2. The representation error of the square planar grid for m = 2. The lower contour map shows the root mean square error in representation for different values of the grid ratio a/Δ and localisation α. The upper figure shows the error along the trenches evident in the lower. Favourable choices of the parameters are marked 1:2, 1:3, ..., and are also shown in the lower figure. Values of the parameters to the left of the dashed line give rise to a diagonally dominant interpolation matrix.

The above argument can also be applied to the triangular grid. Establishing the Poisson summation formula for such a grid is straightforward (either generalised from VII, Section 2 of [11], or specialised from the formula for topological groups in [6]), and one finds that the dual grid is the triangular grid with node spacing $2/(\Delta\sqrt3)$. The representation error is, qualitatively, similar to that shown in Figure 2. To make a quantitative comparison we plot, in Figure 3, the representation error on the principal trench (i.e., along the contour (4.2) for $k = 1$ in the case of the square grid) for each grid type and a number of values of $m$. To ensure a fair comparison, the horizontal scale in Figure 3 is adjusted for each grid type to give equal node densities. As is seen, the two grid types have similar error performance, suggesting that the square grid (with its attendant ease of implementation) is to be preferred in practice.

FIG. 3. Error in the principal trench for square (dashed) and triangular (solid) grids, plotted against the adjusted grid ratio.

5 The functions of Wendland

It is interesting to compare Lewitt's functions with the radial basis functions of Wendland [1, 14], positive definite functions whose window functions are piecewise polynomial. The positive definiteness of Wendland's functions indicates their usefulness in approximation, for which extensive results exist, and a number of recent papers have explored their use in the discretisation of partial differential equations. The use of Wendland's functions in x-ray problems does not appear to have been investigated, although their Abel transforms can be obtained analytically. We do not address this question here, but indicate why Lewitt's functions may offer some advantages for such problems. The Fourier transform of Wendland's function $\Phi_{2,0}$, whose window function is $\phi_{2,0}(r) = (1-r)_+^2$, is proportional to
$$r^{-4}\int_0^{2\pi r}(2\pi r - t)^2\,t\,J_0(t)\,dt = O(r^{-3}), \qquad r = \|x\|$$
(see Section 3 of [14]). In Figure 4, $\hat\Phi_{2,0}$ is plotted along with the Fourier transform $\hat\Psi_2$ of Lewitt's function with $a = 1$ and the parameter choice 1:2 of Figure 2. Although both have the same asymptotic decay of the Fourier transform, Lewitt's is more localised about zero, and thus may offer better suppression of ghosts in x-ray problems.

FIG. 4. Fourier transforms of basis functions.

Finally we mention that Buhmann has shown, in [1], that Wendland's window function
We note that (5.1) may be solved for g, since substituting a; = r^ in (5.1) allows it to be reduced to a standard integral equation whose solution, (-1)" 9{x) = ^f^"Hx), m=pir), can be found in Article 1.1-4.32 of [10]. In the case that p is Lewitt's window V-'m, one may use the differentiation formula, All of [5], to find the corresponding weight g. For n = 1 we find that 1 Im-iia) 9{x) = --.2a™-2 /^(c^) -i>m-\{r), a weight qualitatively different from that of Wendland's function. Acknowledgements: The author wishes to thank L. R. Wyatt and the referees for a number of helpful comments, and acknowledges the financial support provided by the EC with the grant MAS3-CT98-0168. Bibliography 1. M. D. Buhmann. Radial basis functions. In Acta Numerica, volume 9, pages 1-38. Cambridge University Press, 2000. 2. Y. Censor. Row-action methods for huge and sparse systems and their applications. SMMiJemew, 23(4):444-466, October 1981. 3. J. J. Green. Discretizing Barrick's equations. Submitted. On the functions of Lewitt 4. R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1990. 5. R. M. Lewitt. Multidimensional digital image representations using generalized Kaiser-Bessel window functions. J. Opt. Soc. Am. A, 7(10):1834-1846, October 1990. 6. L. H. Loomis. An Introduction to Abstract Harmonic Analysis. D. Van Nostrand, 1953. 7. A. K. Louis. Orthogonal function series expansion and the null space of the Radon transform. SIAM J. Math. Anal, 15(3):621-633, May 1984. 8. S. Matej, G. T. Herman, T. K. Narayan, S. S. Furuie, R. M. Lewitt, and RE. Kinahan. Evaluation of task-oriented performance of several fully 3d PET reconstruction algorithms. Phys. Med. Biol, 39:355-367, 1994. 9. S. Matej and R. M. Lewitt. Practical considerations for 3-d image reconstructions using spherically symmetric volume elements. IEEE Transactions on Medical Imaging, 15(l):68-78, 1996. 10. A. D. Polianin and V. Manzhirov. Handbook of Integral Equations. CRC Press, 1998. 11. E. M. Stein and G. Weiss. Fourier Analysis on Euclidean Spaces. Princetion University Press, 1971. 12. William J. Thompson. Atlas for computing mathematical functions. John Wiley & Sons Inc., New York, 1997. 13. G. N. Watson. A Treatise on the Theory of Bessel Functions. Cambridge, second edition, 1944. 14. H. Wendland. Error estimates for interpolation by compactly supported radial basis functions of minimal degree. Journal of Approximation Theory, 93:258-272, 1998. 15. L. R. Wyatt. A relaxation method for integral inversion apphed to HF radar measurement of the ocean wave directional spectrum. International J. Remote Sensing, 11:1481-1494,1990. 219 Computing with radial basic functions the Beatson-Light way! Will Light Department of Mathematics and Computer Science, University of Leicester, UK. pwl@mcs.le.ac.uk Abstract In this paper we discuss a number of recent developments in the practice of how to compute with radial basic functions. The two main problems addressed are how to develop fast evaluation schemes for radial basic functions, and how to efficiently carry out the solution of the interpolation problem. The approach is to mainly describe work which has involved the author and Professor Rick Beatson as contributors, and to include an idiosyncratic selection of works by other researchers which have attracted the attention of the author. 1 Introduction Research into radial basic functions has been active now for about 30 years. The basic setup is as follows. 
A function $\psi : \mathbb{R}^n\to\mathbb{R}$, which we refer to as the basic function, is specified. A subspace is then constructed by reference to points $x_1,\ldots,x_m$ in $\mathbb{R}^n$. The members of this subspace all have the form
$$s(x) = \sum_{i=1}^{m} a_i\,\psi(x-x_i), \qquad x\in\mathbb{R}^n,$$
where $a_1,\ldots,a_m$ are real numbers. It is important to appreciate at the outset that throughout this paper, and indeed in most of the papers appearing in this area, the underlying assumption is that the points $x_1,\ldots,x_m$ are distinct. One of the most common tasks for which these functions are used is interpolation. A small amount of research has been carried out where the points at which an interpolant is developed are arbitrary distinct points in $\mathbb{R}^n$, but by far the majority of the work relates to interpolation carried out at the same points as those used to effect the translation. Accordingly, data $d_1,\ldots,d_m$ are given at $x_1,\ldots,x_m$, and we require that
$$d_j = s(x_j) = \sum_{i=1}^{m} a_i\,\psi(x_j-x_i), \qquad j = 1,\ldots,m. \tag{1.1}$$
Two immediate observations present themselves. Firstly, at the present level of generality there is absolutely no guarantee that the equations (1.1) will have a unique solution. Secondly, one knows from the work of Mairhuber [14] that there are no Haar subspaces of significant dimension in any space $\mathbb{R}^n$ for $n\geq 2$. What this means is that if we are to construct interpolation problems which have a unique solution for each location of the data points $x_1,\ldots,x_m$ and for each choice of the data $d_1,\ldots,d_m$, then the subspace used must vary as the interpolation points vary. If we pause for a moment and consider how we might in some sensible and orderly way vary the subspace as the points $x_1,\ldots,x_m$ vary, then using simple shifts of a single basic function $\psi$ is one of the most natural choices.

It is very common to work with a function $\psi$ which is a radial function. Thus we take a function $\phi : \mathbb{R}^+\to\mathbb{R}$ and determine $\psi$ by the rule $\psi(x) = \phi(|x|)$ for all $x\in\mathbb{R}^n$. Note that throughout this account, the symbol $|\cdot|$ will stand for the Euclidean norm in $\mathbb{R}^n$. At this point a common inaccuracy arises. The function $\psi$ can correctly be referred to as a radial basic function. However, many authors give this appellation to the function $\phi$, whose radiality is of no consequence whatsoever, since it would imply only that $\phi$ is an even function on $\mathbb{R}$. Since $\phi$ acts only on $\mathbb{R}^+$, the idea that $\phi$ can be radial is vacuous.

Let us continue in this spirit of criticism a little while longer. As far as the author is aware, only two people in the world would refer to $\psi$ as a basic function, or a radial basic function. All other authors would use the word basis in place of basic. There are very obvious problems with this terminology. We are seeking to generate subspaces which are suitable for interpolation. Such subspaces will naturally have the same dimension as the number of data, and the functions $\{\psi(\cdot-x_i) : i = 1,\ldots,m\}$ should form a basis for the subspace. The use of the word basis in two completely different senses seems to the author to be misleading and unhelpful, whereas use of the word basic (a difference of one character) eliminates any possibility of confusion, and avoids using the word basis, which has a very specific mathematical meaning, in a context where its meaning is not the usual mathematical one.

The problem of whether interpolation is possible has a highly satisfactory answer in the work of Micchelli [15].
We direct the reader to the book of Cheney and Light [10] for a full account of these matters. A couple of examples will be helpful. If one chooses
$$s(x) = \sum_{i=1}^{m} a_i\,\phi(|x-x_i|) = \sum_{i=1}^{m} a_i\exp\big(-|x-x_i|^2\big), \qquad x\in\mathbb{R}^n,$$
or
$$s(x) = \sum_{i=1}^{m} a_i\,\phi(|x-x_i|) = \sum_{i=1}^{m} a_i\,|x-x_i|, \qquad x\in\mathbb{R}^n,$$
then the resulting interpolation problem is uniquely solvable for any choice of $x_1,\ldots,x_m$ and for any data $d_1,\ldots,d_m$. This result contrasts very strongly with the case of polynomial interpolation, where the data points $x_1,\ldots,x_m$ have to be constrained not to lie on an algebraic surface of appropriate degree. Indeed, the alternative formulation of the above result for the second example is quite often surprising to mathematicians who are uninitiated in the theory of radial basic functions.

Theorem 1.1 Let $x_1,\ldots,x_m$ be distinct points in $\mathbb{R}^n$. Then the matrix $(|x_j-x_i|)$ is invertible.

Having drawn a clear distinction between polynomial approximation and approximation by (radial) basic functions, it is at this point that we must consider including some polynomial ingredients in our interpolant. This is done in a very standard way, by a process we call augmentation by polynomials. We consider interpolants of the form
$$s(x) = \sum_{i=1}^{m} a_i\,\phi(|x-x_i|) + p(x), \qquad x\in\mathbb{R}^n.$$
Here $p$ is a polynomial of total degree at most $k-1$. We still wish to interpolate $m$ pieces of information, but now have more than $m$ parameters to determine with this information. The remaining parameters are determined via the 'natural' boundary conditions. The full set of equations is
$$d_j = s(x_j) = \sum_{i=1}^{m} a_i\,\phi(|x_j-x_i|) + p(x_j), \qquad j = 1,\ldots,m,$$
$$0 = \sum_{i=1}^{m} a_i\,q(x_i) \quad\text{for all } q\in\pi_{k-1}(\mathbb{R}^n).$$
Here $\pi_{k-1}(\mathbb{R}^n)$ represents the space of polynomials of total degree $k-1$ in $\mathbb{R}^n$. Two questions present themselves pretty quickly from this additional hypothesis. Why should polynomials be added to the interpolant, and why are the boundary conditions chosen in this particular way? In some sense it is essential that we allow ourselves the possibility of adding polynomial terms to some of the interpolants, as we shall soon see. The most important example of a radial basic function interpolant which has a polynomial part is the thin-plate spline. We will make considerable reference to this interpolant in $\mathbb{R}^2$, where it has the form
$$s(x) = \sum_{i=1}^{m} a_i\,|x-x_i|^2\ln|x-x_i| + a\cdot x + b, \qquad x\in\mathbb{R}^2.$$
Note here that the parameter $a$ is a vector with two entries, as is $x$, so that $a\cdot x$ stands for the dot product between $a$ and $x$. The parameter $b$ is a real number. The natural boundary conditions take the form
$$\sum_{i=1}^{m} a_i = \sum_{i=1}^{m} a_i s_i = \sum_{i=1}^{m} a_i t_i = 0,$$
where $x_i = (s_i,t_i)$, $i = 1,\ldots,m$. This particular interpolant exhibits a feature common to all the cases where augmentation by polynomials is either necessary or desirable: the degree of the polynomial added is very low. The usual choices are $k = 0$ (when no polynomial term is added), $k = 1$ (when the term is a constant polynomial) and $k = 2$ (when the added polynomial is linear). It is now no longer possible to carry out interpolation for all choices of the points $x_1,\ldots,x_m$: one must avoid distributions of these points which lie on a zero surface of the corresponding polynomial subspace. In the explicit case we considered above (thin-plate splines), the very mild restriction needed is that $x_1,\ldots,x_m$ should not all lie on a single straight line. The theory developed by Micchelli [15] includes the case of augmentation by polynomials.
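As a concrete illustration of Theorem 1.1, the following sketch interpolates arbitrary data through the distance matrix; the points and data are random and merely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((30, 2))                             # distinct points in R^2
d = rng.random(30)                                  # arbitrary data
A = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
a = np.linalg.solve(A, d)                           # unique, by Theorem 1.1
s = lambda x: a @ np.linalg.norm(X - x, axis=1)
print(max(abs(s(X[i]) - d[i]) for i in range(30)))  # ~0: s interpolates
```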
We now propose to take a look at a very simple example which we hope will give the Computing with radial basic functions reader a feel for some of the ideas and concepts we have introduced so far. We consider s{x) = 2_.^i\^ ^ ^i\'^^ (a; e R). Here the parameter 6 is a real number, and the natural boundary condition gives us YllLi flj = 0. A unique feature of the univariate case is that we can order the interpolation points iri < a;2 < • ■ • < Xm- Now consider the function s in one of the intervals [xi,Xi+i], i = 1,..., m — 1. It is clear that in such an interval s is simply a linear function. The demand that s interpolates the data at xi,... ,Xm means that s must be the piecewise linear interpolant to the data in the interval [xi, Xm]- What is the effect of the 'natural' boundary conditions? In the interval [xm,oo) we can write m m s{x) = y^^ai{x - Xj) + b = -y^^ajXj + b. i=l i=l Thus 8 is constant in [xm, oo). A similar calculation reveals that s is constant in (—oo, xi]. Combining all these observations shows that s is the natural Unear spline interpolant to the data at xi,...,Xm- This goes some way to explaining why the word 'natural' is appended the boundary or extra conditions. But we can go a little further. It is well known that the natural splines satisfy a variational principle. For the linear spline, if we examine X = {feS':f'€L\]R)}, then -oo is'f < ./—oo / iff for all f e X which also interpolate the data. This variational principle is very useful in developing error estimates, and we shall return to this general thread of ideas later in this account. However, we ought to observe that S' is the space of tempered distributions, and that the first derivative is to be taken in the distributional sense. There are ways of getting round this distributional approach (see Cheney and Light [10] for an example which corresponds closely to the discussion here), but it does give the most succinct description, and creates the technical background which will underpin all the theory which has been developed in this area. Notice also that the quantity being minimised can be used to specify a seminorm on X simply by taking the square root of the integral. This seminorm has as kernel 7ro(]R), which is precisely the polynomial subspace we use to augment the original radial basic function. Something very fundamental is happening here. Most mathematicians would regard this seminorm as being a measure of smoothness of the corresponding function. The natural linear spline therefore interpolates the data, and is the smoothest interpolant to the data from X in the sense that it possesses the smallest derivative in the X^-norm. If we are to pursue this very natural idea of making higher derivatives of s small, then we will naturally develop seminorms with polynomial kernels. This goes a long way towards explaining the need for augmentation. Finally in this introduction, we want to discuss briefly the uses to which radial basic 223 224 Will Light function interpolation is put. There are two significant feelings about interpolation by these functions. Firstly, it is thought that radial basic function interpolation is very good for treating scattered data. Loosely speaking, data is scattered when there is no possibility of determining either a natural choice of coordinate axes, or an origin. It is at the opposite end of the spectrum to gridded data. 
Finally in this introduction, we want to discuss briefly the uses to which radial basic function interpolation is put. There are two significant feelings about interpolation by these functions. Firstly, it is thought that radial basic function interpolation is very good for treating scattered data. Loosely speaking, data is scattered when there is no possibility of determining either a natural choice of coordinate axes, or an origin. It is at the opposite end of the spectrum to gridded data. In the presence of a cartesian product structure for the data sites, it is much more efficient to use univariate methods together with tensor product constructions to do the interpolation. Secondly, radial basic function interpolation is thought to be very good for dealing with high dimensional data. There is some evidence from the realm of neural networks that this is indeed the case, but we will not venture into the area of high dimensional data interpolation in this paper. Finally, many of the data sets we want to treat have very large numbers of data sites, and so our aim is to develop methods which will handle 10,000 to 1,000,000 data sites or more.

2 Computational difficulties and fast evaluation

In this section, we want to discuss the difficulties that arise when a large radial basic function interpolation problem is posed. We shall also deal with one of the essential tools for overcoming some of the difficulties. The system we want to solve has the form

$$d_j = s(x_j) = \sum_{i=1}^m a_i\phi(|x_j - x_i|) + p(x_j), \qquad j = 1,\dots,m, \tag{2.1}$$
$$0 = \sum_{i=1}^m a_i q(x_i), \qquad \text{for all } q \in \pi_{k-1}(\mathbb{R}^n). \tag{2.2}$$

If we declare a basis for $\pi_{k-1}(\mathbb{R}^n)$ then we can write these equations in matrix form as

$$\begin{pmatrix} A & Q \\ Q^T & 0 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} d \\ 0 \end{pmatrix}.$$

Here the matrix $A$ has entries $\phi(|x_j - x_i|)$ and is $m \times m$. The matrix $Q$ has entries $p_\ell(x_j)$, where $p_1,\dots,p_\nu$ is a basis for $\pi_{k-1}(\mathbb{R}^n)$, and is of size $m \times \nu$. Recall from our assumptions that only low degree polynomials are used, and so $Q$ is a long thin matrix. In the case of thin-plate splines in $\mathbb{R}^2$ it would have size $m \times 3$. However, $A$ is a very large matrix, with absolutely no sparsity. In fact, for thin-plate splines, the matrix $A$ is zero on the diagonal, and has large off-diagonal entries. In solving a large system of linear equations, the only effective strategy is to use an iterative solver. Such a solver will involve many multiplications of the matrix $A$ with a vector $a$, and the full nature of $A$ makes this a very costly process. One of the key discoveries in this area was the Beatson and Newsam [8] result which showed how fast multipole algorithms could be applied to this area. If we consider the expression

$$s(x_j) = \sum_{i=1}^m a_i|x_j - x_i|^2\ln(|x_j - x_i|) + p(x_j)$$

for some $x_j \in \mathbb{R}^2$, then this can be considered as an evaluation of the function $s$ at the point $x_j$, or the formation of an element in the matrix vector product $Aa$. Because of this, most authors tend only to consider how to evaluate the function $s$ in an efficient way — generating what are known as fast evaluation algorithms. It is impossible to estimate properly the importance of this discovery. Anyone involved in programming iterative solutions to the thin-plate spline equations with tens of thousands of points would find that any such algorithm would just grind itself into the dust without this technology. The technology really has two aspects: a mathematical tool, and a programming structure. Here we intend to give only the flavour of the argument. The reader who really wants to know the details is advised to look either at the original paper [8], or the later paper of Beatson and Light [5] which deals with polyharmonic splines. She can also look at two papers which give clear explanations of simple cases. The first is found in a survey paper by Beatson and Greengard [3]. The second is a technical report by Beatson, Levesley and Light [7]. This last paper discusses fast evaluation methods on the circle and higher dimensional spheres, and the reader will find a very careful and full account of the one-dimensional circle case.
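For concreteness, here is a minimal sketch (ours, not from the paper) assembling and solving the bordered thin-plate spline system (2.1)-(2.2) directly; the dense direct solve is exactly what becomes infeasible for large $m$:

```python
import numpy as np

def tps(r):
    # phi(r) = r^2 ln r, with the removable singularity at r = 0
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out

rng = np.random.default_rng(2)
m = 200
X = rng.random((m, 2))                        # centres in the unit square
d = np.cos(3 * X[:, 0]) * X[:, 1]             # sample data (arbitrary)

R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
A = tps(R)                                    # m x m, dense, zero diagonal
Q = np.hstack([np.ones((m, 1)), X])           # basis 1, x, y of pi_1(R^2)

K = np.block([[A, Q], [Q.T, np.zeros((3, 3))]])
sol = np.linalg.solve(K, np.concatenate([d, np.zeros(3)]))
a, b = sol[:m], sol[m:]
print(np.max(np.abs(A @ a + Q @ b - d)))      # interpolation residual
```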
The first trick with problems in $\mathbb{R}^2$ is to consider complex variables, rather than points in $\mathbb{R}^2$. Let $z$ be a point at which we wish to evaluate $s$, and $u$ a data point, or centre. Then

$$|z-u|^2\ln|z-u| = \Re\!\left(|z-u|^2\ln(z-u)\right) = \Re\!\left(|z-u|^2\ln z\right) + \Re\!\left(|z-u|^2\ln\!\left(1 - \frac{u}{z}\right)\right).$$

Look at the last two expressions here. The first of them has the centre $u$ only in the square of the modulus term, and this expression is quite cheap to evaluate, even if there are many centres $u$. The effect of many centres on the second term is however quite profound, and it is with this term that we must work. The idea is to set a tolerance, and only aim to evaluate $s$ to within this tolerance, rather than exactly. The appropriate series expansion can then be used:

$$\ln\!\left(1 - \frac{u}{z}\right) = -\sum_{p=1}^{\infty}\frac{1}{p}\left(\frac{u}{z}\right)^p \approx -\sum_{p=1}^{N}\frac{1}{p}\left(\frac{u}{z}\right)^p.$$

The value of $N$ depends on the tolerance demanded of the evaluation and the relative sizes of $u$ and $z$. For this reason, we think of $z$ as far away from the origin in $\mathbb{R}^2$, and $u$ close to the origin. If there are now many centres $u_1,\dots,u_m$ near the origin, and $z$ is far away from the origin, then we can summarise the effects of linear combinations of all these centres, for appropriate functions $\ell_p$, as follows:

$$\sum_{i=1}^m a_i|z-u_i|^2\ln\!\left(1 - \frac{u_i}{z}\right) \approx \sum_{i=1}^m a_i\sum_{p=1}^{N}\ell_p(u_i)\,z^{-p} = \sum_{p=1}^{N}\left(\sum_{i=1}^m a_i\ell_p(u_i)\right)z^{-p} = \sum_{p=1}^{N}g_p(u_1,\dots,u_m)\,z^{-p}.$$

The principle now is to use the last expression above to make an approximate evaluation of $s$. Of course, the assumption that $z$ was far from the origin and $u_1,\dots,u_m$ were close to the origin is not important. It is simply important that $z$ be far away from the cluster of centres $u_1,\dots,u_m$. The summarising expression is referred to as a Laurent type expansion, because it summarises the contribution of the centres $u_1,\dots,u_m$ in terms of series involving negative powers of $z$.

FIG. 1. Fast evaluation panelling.

There is now a lot of preprocessing to go on before the fast evaluation algorithm is ready to roll. Figure 1 shows how the algorithm proceeds. The shaded square at the bottom left of the domain is the panel which contains $z$, the evaluation point. All the squares around this one which are the same size are deemed to be 'close' to the evaluation square. All other squares are 'far away'. Of course, as the squares get further away from $z$ it becomes possible to use our summarising technique to total up the contributions of larger and larger numbers of points. This is done in a very explicit manner, which is represented by the shading in Figure 1. As we get further away, we double the size of the squares over which we summarise, and there is a band of same-size squares (or a ring, if the evaluation square was in the middle of the domain) two squares wide surrounding the evaluation square. Once all the preprocessing is done, and we shall discuss this a little more in a moment, all the needed coefficients $g_p$ are available, and evaluation can be carried out in about $O(\log m)$ flops instead of $O(m)$.

The above account does not quite reveal the whole story. The coefficients $g_p$ are calculated in an orderly manner which greatly improves the efficiency of the algorithm. Suppose our problem is located in $[0,1]^2$. An initial decision is made to divide the original domain into squares of size $2^{-n}$. There is then a parent-child relationship derived through a quad-tree data structure. The parent $[0,1]^2$ has four children: $[0,0.5]^2$, $[0.5,1]^2$, $[0,0.5]\times[0.5,1]$ and $[0.5,1]\times[0,0.5]$. This parent-child relationship helps in setting up the coefficients $g_p(u_1,\dots,u_m)$ in an efficient way.
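To see the truncation behave, here is a tiny sketch of our own, simplified to the $\ln(1 - u/z)$ factor alone (the full algorithm also carries the $|z-u|^2$ multiplier): the moments $g_p$ summarise a whole cluster once, and the truncation error decays like $|u/z|^N$.

```python
import numpy as np

rng = np.random.default_rng(3)
us = 0.1 * (rng.random(500) + 1j * rng.random(500))   # cluster of centres near 0
a = rng.standard_normal(500)                          # coefficients
z = 2.0 + 2.0j                                        # evaluation point far away

exact = np.sum(a * np.log(1 - us / z))
for N in (2, 5, 10, 20):
    p = np.arange(1, N + 1)
    # moments g_p summarise all the centres; computed once per cluster
    g = np.array([-np.sum(a * us ** k) / k for k in p])
    approx = np.sum(g * z ** (-p))
    print(N, abs(exact - approx))                     # error decays like |u/z|^N
```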
There is also a further idea involving Taylor series, which gives more efficiency. We omit any description of this technique.

3 Inverting the interpolation matrix

Recall from the beginning of the previous section that the equations specifying the interpolation problem are as follows:

$$d_j = s(x_j) = \sum_{i=1}^m a_i\phi(|x_j - x_i|) + p(x_j), \qquad j = 1,\dots,m, \tag{3.1}$$
$$0 = \sum_{i=1}^m a_i q(x_i), \qquad \text{for all } q \in \pi_{k-1}(\mathbb{R}^n). \tag{3.2}$$

In matrix terms we have

$$\begin{pmatrix} A & Q \\ Q^T & 0 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} d \\ 0 \end{pmatrix},$$

where $A$ is a full matrix which tends to exhibit poor conditioning. The poor conditioning of $A$ is similar to problems experienced by researchers in the theory of finite elements — as the interpolation points become very dense in a given region, the conditioning gets worse. In fact, there are formal statements relating some impression of the condition number of $A$ (usually the smallest eigenvalue of $A$) to the minimum interpoint distance. The following table represents the condition number of $A$ when the interpolation points are given on a uniform 5 x 5 grid in $[0,a]^2$.

    Scale parameter a    Condition number
    1.0                  3.6458 × 10^…
    0.1                  2.5179 × 10^…
    0.01                 2.4364 × 10^…
    0.001                2.4349 × 10^…

    TAB. 1. Two-norm condition numbers of A.

Of course, on a philosophical level, it does not make any sense whatsoever to describe an interpolation problem as being ill-conditioned. Let's discuss this point in a little more depth. Suppose $x_1,\dots,x_m$ are points in $\mathbb{R}^n$, and $G_1,\dots,G_m$ are a set of functions from $\mathbb{R}^n$ to $\mathbb{R}$ which are linearly independent over $\{x_1,\dots,x_m\}$. That is, interpolation to arbitrary data at $x_1,\dots,x_m$ by linear combinations of $G_1,\dots,G_m$ is always uniquely possible. Then there is a basis $F_1,\dots,F_m$ for the linear span of $G_1,\dots,G_m$ such that $F_i(x_j)$ is 1 if $i = j$ and is zero for all other values of $i,j$ between 1 and $m$. If the given data is $d_1,\dots,d_m$, then the interpolant can be written down immediately as

$$\sum_{i=1}^m d_i F_i(x), \qquad x \in \mathbb{R}^n.$$

If one has in one's hands the basis $\{F_1,\dots,F_m\}$ and wants to know the coefficients which must be used, then one need only invert the identity matrix to obtain the solution, and there are not many matrices which are better conditioned than the identity matrix! Of course, getting one's hands on the basis $F_1,\dots,F_m$ is usually rather difficult — as hard as solving the original problem in fact. It has become traditional to refer to the basis $F_1,\dots,F_m$ as the Lagrange basis (in sympathy with the fact that Lagrange was a person who wrote down this basis for polynomial interpolation in one dimension) or the cardinal basis. This last term seems to the author to be quite appropriate, indicating that the basis is special. However, it does not find favour with spline theorists, since they think of the word cardinal in a very technical sense (the interpolation points are $\mathbb{Z}$). Terminology aside, the point is still made that the conditioning of any interpolation problem is a function of the available basis.

A more practical case of this phenomenon is the problem of natural cubic splines in $\mathbb{R}$. They fit into the radial basic function interpolation scenario, because a natural cubic spline with knots at $x_1,\dots,x_m$ can be written as

$$s(x) = \sum_{i=1}^m a_i|x - x_i|^3 + ax + b, \qquad x \in \mathbb{R}.$$

If we require this spline to interpolate data $d_1,\dots,d_m$ at $x_1,\dots,x_m$ then we have to require that $s(x_j) = d_j$ for $j = 1,\dots,m$. The natural property comes, as expected, from the natural boundary conditions:

$$\sum_{i=1}^m a_i = \sum_{i=1}^m a_i x_i = 0.$$
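The scale experiment behind Table 1 is easy to rerun. The sketch below is our own; it will not reproduce the table's figures exactly (the printed exponents are in any case illegible in our copy), but it shows how the two-norm condition number of the raw thin-plate matrix behaves as the 5 x 5 grid is shrunk into $[0,a]^2$:

```python
import numpy as np

def tps_matrix(X):
    R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    A = np.zeros_like(R)
    nz = R > 0
    A[nz] = R[nz] ** 2 * np.log(R[nz])
    return A

g = np.linspace(0.0, 1.0, 5)
grid = np.array([(x, y) for x in g for y in g])   # 5 x 5 uniform grid in [0,1]^2
for a in (1.0, 0.1, 0.01, 0.001):
    A = tps_matrix(a * grid)                      # grid scaled into [0,a]^2
    print(a, np.linalg.cond(A))
```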
The ill-conditioning illustrated in Table 1 would be equally present in this example, and the remark that the condition number increases as the interpoint spacing decreases would also hold good. Of course, to suggest the use of this basis to a spline practitioner would not be a good idea! We are well used to the idea that B-splines are the correct basis to use in this situation. I suppose the two principles to emerge from the above discussion are that the basis we have used thus far to describe the interpolation problem is not satisfactory from a computational point of view, and that in at least some of the cases under discussion (all of them one-dimensional) there are other bases which are superior.

There are other ways to conceptualise the difficulties we experience with the radial basic functions. Most of them tend to grow at infinity, and have small value at zero. As a general principle, we would like a basis to mimic the B-spline basis. That is, we would like the basis to be local if possible — each basis function having a fairly small support around one of the interpolation points. The first people to make progress in this area were Dyn and Levin [11] in 1983. There is a later paper with Rippa [12] in 1986 which is also worth looking at. Their technique was based on the observation that if $F(x) = |x|^2\ln|x|$ and $x \in \mathbb{R}^2$, then $\nabla^4 F = 8\pi\delta$. Here, $\nabla^4$ represents the bilaplacian, and $\delta$ is the Dirac delta distribution whose action on each rapidly decreasing function in $\mathcal{S}$ is to evaluate it at zero. This description alone should alert us to the fact that $\nabla^4 F = 8\pi\delta$ is a distributional equation, and as such must be handled with care. However, numerical analysts dash in where others fear to tread, and we can approximate the Laplacian as follows:

$$(\nabla^2 F)(x) \approx h^{-2}\left\{F(x - he_1) + F(x + he_1) + F(x - he_2) + F(x + he_2) - 4F(x)\right\}, \qquad x \in \mathbb{R}^2.$$

Here $h$ is a real parameter, and $e_1$ and $e_2$ are the usual unit vectors in $\mathbb{R}^2$. Pictorially, we can represent this approximation by the stencil shown in Figure 2. The bilaplacian stencil is shown in Figure 3.

FIG. 2. The stencil for the Laplacian.

This observation is used in a straightforward way if the interpolation points lie on a grid. Instead of using the thin-plate spline radial basic function to generate a basis, one uses the appropriate linear combinations which represent the bilaplacian of this function. Because one has a distributional equation relating this quantity to the $\delta$ function, one does not expect to get the $\delta$ function exactly, but one certainly does expect to get a function which decays rapidly at $\infty$, and this is exactly what happens. Dyn and Levin provide some encouraging numerical results. Of course, there remains the problem of what to do when the data is not gridded. Here one must develop first the appropriate stencil for the Laplacian on a point by point basis. This may seem laborious, but in fact the next few methods we will describe all compute better basis elements on a point by point basis.
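The decay can be seen directly. The sketch below is our own illustration, using the standard 13-point discrete bilaplacian (our assumption for the stencil of Figure 3, which is not fully legible in our copy: weight 20 at the centre, -8 on the axes, 2 on the diagonals and 1 two steps out along each axis):

```python
from math import log

def F(x, y):                       # thin-plate kernel |x|^2 ln |x| centred at 0
    r2 = x * x + y * y
    return 0.5 * r2 * log(r2) if r2 > 0.0 else 0.0

h = 0.25
pts = {(0, 0): 20, (1, 0): -8, (-1, 0): -8, (0, 1): -8, (0, -1): -8,
       (1, 1): 2, (1, -1): 2, (-1, 1): 2, (-1, -1): 2,
       (2, 0): 1, (-2, 0): 1, (0, 2): 1, (0, -2): 1}

def psi(x, y):                     # discrete bilaplacian applied to F
    return sum(w * F(x - i * h, y - j * h) for (i, j), w in pts.items()) / h ** 4

for d in (1.0, 2.0, 4.0, 8.0):
    print(d, psi(d, 0.0))          # values fall off quickly away from the centre
```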
Perhaps the most successful class of schemes of this nature — computing a new basis on a point by point approach — comes from Beatson, Goodsell and Powell [2] and Beatson, Cherrie and Mouat [1]. Their approach is perhaps simpler to appreciate and implement than that of Dyn and Levin. They begin with the observations I made earlier — what we are really after is the cardinal basis $F_1,\dots,F_m$ with the property that $F_i(x_j)$ is 1 if $i = j$ and is zero for all other values of $i,j$ between 1 and $m$. However, because this problem is as difficult to solve as the original one, we proceed as follows.

FIG. 3. The stencil for the bilaplacian.

Consider the job of trying to construct $F_i$. This function is supposed to be 1 at $x_i$ and zero at all other points. Choose about 50 near neighbours of $x_i$, say $y_j \in \{x_1,\dots,x_m\}$. This choice must include $x_i$. Then take

$$F_i(x) = \sum_{j=1}^{50} a_j|x - y_j|^2\ln|x - y_j| + b\cdot x + c, \qquad x \in \mathbb{R}^2.$$

We demand that

$$F_i(y_j) = \begin{cases} 1 & \text{if } y_j = x_i,\\ 0 & \text{otherwise,} \end{cases}$$

and that the natural boundary conditions are also satisfied. Thus we are producing approximate cardinal functions which have the value 1 at the required point, but are only zero on about 50 neighbouring points. This suggestion is based on the fact, observed by many workers, that such functions are often small elsewhere in the domain. We produce some pictures to illustrate this. In the first (Figure 4), 289 points are spaced on a regular grid in $[0,1]^2$. The approximate cardinal function is based on the 13 points shown in bold in Figure 4. Figure 5 illustrates the same situation, but now the points used to develop the cardinal function are all clustered in one corner of the domain. The effect is to produce significant values at the opposite corner of the domain. One can infer from this that whenever the data is pretty much uniformly distributed, the cardinal functions using points well inside the domain will have good properties, while those at the edge will be poor. Similarly, in a non-uniform distribution, those interior to a cloud of points will behave well, while those at cloud boundaries might not.

FIG. 4. Approximate cardinal function with points central to the domain.
FIG. 5. Approximate cardinal function with points at one corner of the domain.

There are two methods for dealing with the difficulties which have shown up above. Firstly, one can pin all the cardinal functions at a fixed set of judiciously chosen points — so that every cardinal function must have the value zero at these points. This is very effective in the case of regularly spaced data, as Figures 6 and 7 show. One can imagine, however, that a data set with a number of clouds might benefit from a judicious choice of points at which to carry out the pinning. What one would really like is a method which does not rely on any user intelligence in the choice of points. As mentioned before, a desirable feature of a good basis function is one which decays at infinity. This decay should be at some rate if possible. The Beatson, Cherrie and Mouat prescription for thin-plate splines in $\mathbb{R}^2$ is that the elements should decay like a negative power of $|x|$ as $|x| \to \infty$. There is a problem here, in that if we opt for decay elements everywhere, then we will not obtain a basis for our space. To get around this problem, we accept an element $F_i$ as a decay element if its coefficients satisfy appropriate additional moment conditions and

$$|F_i(x)| = O(|x|^{-\epsilon}) \quad\text{as } |x| \to \infty.$$

Otherwise, we use the $F_i$ which is defined by the previous conditions of cardinality.

FIG. 6. Approximate pinned cardinal function with points central to the domain.
FIG. 7. Approximate pinned cardinal function with points at one corner of the domain.

Again there are a few bells and whistles needed to make this method operate efficiently, but we hope that sufficient detail is present for the reader to be able to see the general idea.
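Constructing one approximate cardinal function is a small local solve. The following is our own minimal sketch of the idea, under assumed sizes (50 neighbours, random points), not the authors' implementation:

```python
import numpy as np

def tps(R):
    A = np.zeros_like(R)
    nz = R > 0
    A[nz] = R[nz] ** 2 * np.log(R[nz])
    return A

rng = np.random.default_rng(4)
X = rng.random((289, 2))                       # all centres
i = 144                                        # build F_i for this centre
order = np.argsort(np.linalg.norm(X - X[i], axis=1))[:50]
Y = X[order]                                   # ~50 nearest neighbours, x_i first

A = tps(np.linalg.norm(Y[:, None] - Y[None, :], axis=2))
P = np.hstack([np.ones((50, 1)), Y])
K = np.block([[A, P], [P.T, np.zeros((3, 3))]])
e = np.zeros(53)
e[0] = 1.0                                     # value 1 at x_i, 0 at the others
coef = np.linalg.solve(K, e)                   # local cardinality + natural conditions

def Fi(x):                                     # evaluate F_i at an arbitrary point
    r = np.linalg.norm(Y - x, axis=1)
    return tps(r) @ coef[:50] + coef[50] + coef[51:] @ x

print(Fi(X[i]), Fi(Y[1]))                      # approximately 1 and approximately 0
```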
All the above methods are providing ways of constructing a better conditioned basis with which to solve the problem. A method still has to be selected to invert the matrix associated with the new basis, which is now much better conditioned than the original matrix corresponding to the conventional basis. The method of choice for most authors is some version of GMRES. Beatson called the points at which decay could be obtained 'good' points, and points at which decay could not be obtained 'bad' points. This idea has been built on in a recent technical report by Beatson and Levesley [4]. The general spirit is to define good and bad points in the same way as Beatson, and then to develop an iterative solver, solving first on the good, then the bad, then returning to the good and so on.

Finally, a very successful method has recently emerged from the researches of Beatson, Light and Billings [6]. This method has the advantage that it is a fast iterative solver which may be regarded as a preconditioner in its own right (thus it may be combined with a solver such as GMRES). We will describe it here as a solver. It is essentially the domain decomposition method, although as with previous solvers, our description will be very much at a 'bare bones' level, and the interested reader is referred to [6] for the fine details, which include some error estimates, some interesting comments on an alternative basis, and a good deal of theory. We shall describe the method as applied to data on the unit square $[0,1]^2$ in $\mathbb{R}^2$, and we will not make any attempt to make the method adaptive in character. The reader will be able to see these improvements for herself. We will test our method on randomly chosen data in $[0,1]^2$.

We begin with a set of nodes $X = \{x_1,\dots,x_m\}$ at which interpolation is to be carried out. We will describe the algorithm as it is implemented for solving the thin-plate spline interpolation problem on the node set $X$. We divide up the square $[0,1]^2$ into a fairly large number of subdomains $X_1,\dots,X_\ell$. There are two constraints on these subdomains. It is important that they are constructed so that about equal numbers of points lie in each subdomain — about 50 points per subdomain is ideal. Secondly, it is essential that each subdomain overlap all surrounding subdomains. In our terminology, two subdomains overlap if they have a (small) number of points in common. In each subdomain there are some points in $X$ which lie only in that subdomain and not in any other. We call these points the inner points of the subdomain. A coarse set $Y$ of inner points in the node set $X$ is also chosen. We will say more about this coarse set in a moment, but at this stage it simply consists of a small number of inner points from each subdomain. The algorithm will then construct the interpolant $s$, and proceeds as follows. We initialise the interpolant $s$ as $s = 0$. We want to solve the equations

$$d_j = s(x_j) = \sum_{i=1}^m a_i|x_j - x_i|^2\ln|x_j - x_i| + \alpha\cdot x_j + \beta, \qquad j = 1,\dots,m, \tag{3.3}$$

subject to the boundary conditions

$$\sum_{i=1}^m a_i = \sum_{i=1}^m a_i s_i = \sum_{i=1}^m a_i t_i = 0, \tag{3.4}$$

where $x_i = (s_i, t_i)$. In matrix form these equations are

$$\begin{pmatrix} A & Q \\ Q^T & 0 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} d \\ 0 \end{pmatrix},$$

as we have already seen. Our method will operate by residual correction, so we begin by setting the residual $r = d$. It is important to recall that $\alpha$ is a vector of length 2, which we write as $\alpha = (\alpha_1,\alpha_2)$. Suppose now we have begun our iterative procedure and generated an approximation $s$ with a residual $r$. The next few steps describe how to update the approximation and the residual.
Step 1. We construct $s_1,\dots,s_\ell$, such that each $s_i$ is an interpolant based only on the points of the subdomain $X_i$, using as data the residual vector $r$ restricted to $X_i$.

Step 2. For each inner point $x$ we now have a single real number $a_x$, which is the coefficient of $|\cdot - x|^2\ln|\cdot - x|$. If we look at the collection of coefficients belonging to all the inner points of all domains, then this collection is not in general orthogonal to $\pi_1$. That is, they fail to satisfy boundary conditions of the type given in Equation (3.4). We now correct so that the collection of coefficients corresponding to all inner points of all domains is orthogonal to $\pi_1$.

Step 3. We set

$$s_1 = \sum\left\{a_x|\cdot - x|^2\ln|\cdot - x| : x \text{ is an inner point}\right\}. \tag{3.5}$$

Step 4. We evaluate the residual $\tilde r = r - s_1$ at the coarse grid points, and then construct the interpolant $s_2$ to this residual on the coarse grid points $Y$.

Step 5. We update $s$ by $s \leftarrow s + s_1 + s_2$. The new residual is then given by $\binom{z}{0}$, where $z_i = d_i - s(x_i)$, $i = 1,\dots,m$.

This iterative process can either be continued to convergence, or used as a preconditioner followed by GMRES. Table 2 shows some run times taken to obtain an error of less than 1 × 10^… for the Franke 1 function (see [13] for the definition of this function). Random nodes were generated in $[0,1]^2$ and an Intel Celeron PC was used.

    Number of nodes    Number of iterations    Time (seconds)
    10,000             8                       7.0
    20,000             8                       17.5
    40,000             6                       35.5
    80,000             6                       105.7
    160,000            7                       407.8

    TAB. 2. Run times for domain decomposition.

Recently, the group at Leicester, using a twin processor Compaq PC, has obtained solutions to a problem with 1,000,000 random points in less than 9 minutes, and we can safely say that the combination of domain decomposition methods and multipole fast evaluation has produced a robust and effective method. Most practitioners will be aware of other ways to run a domain decomposition algorithm. In particular, one can use a nesting approach where one starts with only four subdomains, each containing large numbers of points. To solve each subdomain problem, one subdivides again and does domain decomposition in the subdomain.
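A bare-bones sketch of Steps 1-5 follows. It is our own schematic rendering under crude assumptions (slab-shaped subdomains, a least-squares projection for the Step 2 correction), not the code of [6]; how fast the residual shrinks depends heavily on the subdomain layout.

```python
import numpy as np

def tps_mat(X, Y):
    R = np.linalg.norm(X[:, None] - Y[None, :], axis=2)
    M = np.zeros_like(R)
    nz = R > 0
    M[nz] = R[nz] ** 2 * np.log(R[nz])
    return M

def local_solve(Xs, rs):
    # thin-plate spline interpolant on a subdomain (Step 1)
    m = len(Xs)
    P = np.hstack([np.ones((m, 1)), Xs])
    K = np.block([[tps_mat(Xs, Xs), P], [P.T, np.zeros((3, 3))]])
    return np.linalg.solve(K, np.concatenate([rs, np.zeros(3)]))

rng = np.random.default_rng(5)
N = 300
X = rng.random((N, 2))
d = np.sin(4 * X[:, 0]) + X[:, 1] ** 2

order = np.argsort(X[:, 0])                 # crude overlapping slab subdomains
blocks = [order[k:k + 62] for k in range(0, N, 50)]
counts = np.zeros(N, dtype=int)
for b in blocks:
    counts[b] += 1                          # inner points lie in exactly one block
coarse = order[::8]                         # the coarse set Y

P = np.hstack([np.ones((N, 1)), X])         # 1, s, t evaluated at the nodes
s_val = np.zeros(N)                         # current approximation at the nodes

for sweep in range(8):
    r = d - s_val
    a = np.zeros(N)
    for b in blocks:                        # Step 1: local interpolants to r
        c = local_solve(X[b], r[b])
        inner = counts[b] == 1              # keep inner-point coefficients only
        a[b[inner]] = c[:len(b)][inner]
    a -= P @ np.linalg.solve(P.T @ P, P.T @ a)    # Step 2: make a orthogonal to pi_1
    s1 = tps_mat(X, X) @ a                        # Step 3
    c = local_solve(X[coarse], (r - s1)[coarse])  # Step 4: coarse correction
    s2 = tps_mat(X, X[coarse]) @ c[:len(coarse)] + P @ c[len(coarse):]
    s_val += s1 + s2                              # Step 5
    print(sweep, np.max(np.abs(d - s_val)))       # residual after each sweep
```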
Bibliography

1. Beatson, R.K., J.B. Cherrie and C.T. Mouat, Fast fitting of radial basis functions: methods based on preconditioned GMRES iteration, Advances in Computational Mathematics 11 (1999), 253-270.
2. Beatson, R.K., G. Goodsell and M.J.D. Powell, On multigrid techniques for thin plate spline interpolation in two dimensions, Lectures in Applied Mathematics 32 (1996), 77-97.
3. Beatson, R.K. and L. Greengard, A short course on fast multipole methods, in Wavelets, Multilevel Methods and Elliptic PDEs, M. Ainsworth, J. Levesley, W.A. Light and M. Marletta (eds), Oxford University Press, Oxford (1997), 1-38.
4. Beatson, R.K. and J. Levesley, Good point/bad point iterations for solving the thin-plate spline interpolation equations, University of Leicester Technical Report 2001/34 (2001).
5. Beatson, R.K. and W.A. Light, Fast evaluation of radial basis functions: Methods for two-dimensional polyharmonic splines, IMA Journal of Numerical Analysis 17 (1997), 343-372.
6. Beatson, R.K., W.A. Light and S. Billings, Domain decomposition methods for solution of the radial basis function interpolation problem, SIAM Journal on Scientific Computing 22(5) (2001), 1717-1740.
7. Beatson, R.K., J. Levesley and W.A. Light, Fast evaluation of radial basic functions on spheres, preprint.
8. Beatson, R.K. and G.N. Newsam, Fast evaluation of radial basis functions I, Computers and Mathematics with Applications 24(12) (1992), 7-19.
9. Beatson, R.K. and M.J.D. Powell, An iterative method for thin plate spline interpolation that employs approximations to the Lagrange functions, in Numerical Analysis 1993, D.F. Griffiths and G.A. Watson (eds), Longmans, Harlow, 1994.
10. Cheney, E.W. and W.A. Light, A Course in Approximation Theory, Brooks/Cole, Pacific Grove, CA, 1999.
11. Dyn, N. and D. Levin, Iterative solution of systems originating from integral equations and surface interpolation, SIAM J. Numer. Anal. 20 (1983), 377-390.
12. Dyn, N., D. Levin and S. Rippa, Numerical procedures for surface fitting of scattered data by radial functions, SIAM Journal on Scientific and Statistical Computing 7 (1986).
13. Franke, R., Scattered data interpolation: Tests of some methods, Mathematics of Computation 38 (1982), 181-200.
14. Mairhuber, J.C., On Haar's theorem concerning Chebychev approximation problems having unique solution, Proc. Amer. Math. Soc. 7 (1956), 609-615.
15. Micchelli, C.A., Interpolation of scattered data: distance matrices and conditionally positive definite functions, Constr. Approx. 2 (1986), 11-22.
16. Sibson, R. and G. Stone, Computation of thin-plate splines, SIAM Journal on Scientific and Statistical Computing 12 (1991), 1304-1313.

Application of orthogonalisation procedures for Gaussian radial basis functions and Chebyshev polynomials

John C Mason and Andrew Crampton
School of Computing and Mathematics, University of Huddersfield, Huddersfield, UK.
j.c.mason@hud.ac.uk, a.crampton@hud.ac.uk

Abstract
Procedures for orthogonalisation of Gaussians and B-splines are recalled and it is shown that, provided Gaussians are negligible in appropriate regions, the same recurrence formulae may be adopted in both cases and render the computation relatively efficient. Chebyshev polynomial collocation is well known to be rapidly defined by discrete orthogonalisation, and similar ideas are commonly applicable to partial differential equations (PDEs) and integral equations (IEs). However, it is shown that the most elementary mixed methods (both boundary conditions and PDEs being satisfied) for the Dirichlet problem in rectangular types of domain can lead to a singular linear system, which may be rendered non-singular, for example, by a small modification of interpolation nodes.

1 Introduction

Gaussian radial basis functions (RBFs) are negligible outside a certain range, which depends on the accuracy required and the exponent used. For example, if four decimal place accuracy is sufficient, then outside $[-2,2]$ the function $e^{-\lambda x^2}$ is negligible for $\lambda \ge 2.5$. Indeed the translated RBFs

$$\phi_i(x) = e^{-\lambda(x-i)^2}, \qquad i = -1, 0,\dots,n+1, \tag{1.1}$$

resemble, at least superficially, a set of translated cubic B-splines, each having a support of four sub-intervals of length one, contained in $[i-2, i+2]$. Following work of Mason et al [4] and Goodman et al [1], we show that these RBFs, rounded to the required accuracy, may be conveniently and efficiently orthogonalised so that

(i) a 4 term recurrence may be adopted, identical to the one in [4] for cubic B-splines,
(ii) inner products may be determined very simply in terms of 4 parts of a normal distribution (see the sketch following this list),
(iii) a well conditioned calculation results, and best $\ell_2$ approximations may be obtained immediately with an orthogonalised basis,
(iv) a continuous or discrete inner product (and best approximation) may be adopted.
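Item (ii) can be made concrete. Over the whole real line the Gaussian inner products have a simple closed form (the paper instead restricts the integral to the common support $I_{k,r}$, which brings in pieces of a normal distribution; the untruncated value below is our simplified illustration, and with $\lambda = 2.5$ the difference is below the stated four-decimal accuracy):

```python
import numpy as np
from math import sqrt, pi, exp

lam = 2.5

def b_full(r):
    # Integral over R of exp(-lam x^2) exp(-lam (x + r)^2) dx, independent of k:
    # completing the square gives sqrt(pi / (2 lam)) * exp(-lam r^2 / 2).
    return sqrt(pi / (2.0 * lam)) * exp(-lam * r * r / 2.0)

# check against a crude Riemann sum
x = np.linspace(-10.0, 10.0, 400001)
dx = x[1] - x[0]
for r in range(4):
    f = np.exp(-lam * x ** 2) * np.exp(-lam * (x + r) ** 2)
    print(r, b_full(r), f.sum() * dx)
```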
In a second application of orthogonalisation, this time to polynomials, it is shown that a two-dimensional $(n+1)\times(n+1)$ polynomial collocation problem, which includes amongst its nodes $n$ Chebyshev polynomial zeros on each of 4 sides of a square, leads to a singular (rank one deficient) system. For all $n$, one superfluous equation is readily identified and a suitable replacement equation is readily found. Discrete orthogonalisation is used to combine and greatly simplify the equations and prove singularity.

2 Orthogonalised Gaussians

An orthogonal system $\{P_j\}$ is developed from the Gaussians $\phi_i$ in (1.1) using

$$P_k = \phi_k - a_{k1}P_{k-1} - a_{k2}P_{k-2} - a_{k3}P_{k-3}, \qquad k = -1,\dots,n+1, \tag{2.1}$$

where $a_{13} = a_{03} = a_{02} = a_{-1,3} = a_{-1,2} = a_{-1,1} = 0$. Now we define coefficients $b_{kr}$, for $r = 0,\dots,k+1$ and $k = -1,\dots,n+1$, as the inner products

$$b_{kr} = (\phi_k, \phi_{k-r}) = \int_{I_{k,r}}\phi_k(x)\phi_{k-r}(x)\,dx, \tag{2.2}$$

where $I_{k,r}$ is the common support of $\phi_k$ and $\phi_{k-r}$, and normalising constants $n_k$ are the squared norms

$$n_k = \|P_k\|^2 = (P_k, P_k), \tag{2.3}$$

where $(\cdot,\cdot)$ is the inner product (2.2) and $\|\cdot\|$ is the corresponding norm. Then, setting $(P_k, P_{k-r}) = 0$ for $r = 1, 2, 3$ gives

$$(\phi_k, P_{k-r}) = a_{kr}n_{k-r}. \tag{2.4}$$

Taking the inner product of (2.1) with itself gives

$$n_k = b_{k0} + \sum_{r=1}^3\left[-2a_{kr}(\phi_k, P_{k-r}) + a_{kr}^2 n_{k-r}\right], \tag{2.5}$$

which, by using (2.4), gives

$$n_k = b_{k0} - \sum_{r=1}^3 a_{kr}^2 n_{k-r}. \tag{2.6}$$

This is the first basic equation for writing $\{n_k\}$ in terms of $\{a_{kr}\}$ and $\{b_{kr}\}$. Now, using (2.1) with $k$ replaced by $k-1$, $k-2$, $k-3$, we obtain

$$(\phi_k, P_{k-3}) = b_{k3} = a_{k3}n_{k-3}, \tag{2.7}$$
$$(\phi_k, P_{k-2}) = b_{k2} - a_{k-2,1}(\phi_k, P_{k-3}), \quad\text{hence}\quad a_{k2}n_{k-2} = b_{k2} - a_{k-2,1}b_{k3}. \tag{2.8}$$

Finally

$$(\phi_k, P_{k-1}) = b_{k1} - a_{k-1,1}(\phi_k, P_{k-2}) - a_{k-1,2}(\phi_k, P_{k-3}),$$

so that

$$a_{k1}n_{k-1} = b_{k1} - a_{k-1,1}(a_{k2}n_{k-2}) - a_{k-1,2}b_{k3}. \tag{2.9}$$

Equations (2.6), (2.7), (2.8) and (2.9) may be solved to determine all the required coefficients $\{a_{kr}\}$ and $\{n_k\}$ explicitly by substitution, starting from $n_{-1} = \|\phi_{-1}\|^2$. This involves $O(n)$ operations for $n+3$ basis functions.

The best approximation to a function $f$ (either continuous $f = f(x)$ or discrete $f = (f_1,\dots,f_m)^T$) by orthogonalised Gaussians may be determined explicitly as

$$\sum_{j=-1}^{n+1} c_j P_j, \qquad\text{where}\quad c_j = (P_j, P_j)^{-1}(f, P_j) = n_j^{-1}(f, P_j).$$

2.1 Numerical example

Here we use the procedure for constructing orthogonalised Gaussians to produce an interpolant to data obtained from a fast response oscilloscope¹. To the left of Figure 1 we see the first three orthogonalised Gaussian functions, with centres specified at the integers -1, 0 and 1, with support growing from left to right. The figure on the right shows the oscilloscope data and the fitted o-Gaussian interpolant.

In this example we use 512 centres and choose $\lambda = 2.5$ in (1.1). Since our choice for $\lambda$ requires only four decimal place accuracy, the normal equations produce the usual identity matrix, and the coefficient vector $\{c_{-1},\dots,c_{n+1}\}$ can then be determined by the equations $c = A^T f$, where $f = \{f_1,\dots,f_m\}$ and $A_{ij} = P_j(x_i)$. The fit is extremely good and vindicates the neglecting of the Gaussians outside the interval considered.

¹ Oscilloscope data supplied by Centre for Electromagnetic and Time Metrology, National Physical Laboratory, London, UK.

FIG. 1. First three orthogonalised basis functions and o-Gaussian fit to oscilloscope data.

2.2 Extensions to orthogonalised Gaussians

The following extensions are clearly possible.

(i) Use of generally placed centres (knots) and/or a discrete inner product.
(ii) Use of higher dimensions — as in Anderson et al [2].
(iii) Replacement of the interval $(-\infty,\infty)$ in a continuous norm by $[0,n]$, and of $[0,n]$ by $[0,1]$ using scaling.
(iv) Consideration of a function with wider (approximate) support, such as $[-3,3]$ or more generally $[-r,r]$ for $r > 2$.

3 Chebyshev polynomials in two-dimensional collocation

The (first kind) Chebyshev polynomial $T_i(x)$ of degree $i$ is defined by

$$T_i(x) = \cos i\theta, \qquad i = 0,\dots,m, \quad -1 \le x \le 1, \tag{3.1}$$

where $x = \cos\theta$ and $0 \le \theta \le \pi$. Among its many properties is the discrete orthogonality property

$$\sum_{k=1}^m T_i(x_k)T_j(x_k) = \begin{cases} 0 & \text{for } i \ne j;\ i, j \le m-1,\\ m & \text{for } i = j = 0,\\ m/2 & \text{for } i = j \ne 0, \end{cases} \tag{3.2}$$

where $x_k$ are the $m$ zeros of $T_m(x)$, namely

$$x_k = \cos\left(\frac{(2k-1)\pi}{2m}\right), \qquad k = 1,\dots,m. \tag{3.3}$$
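Property (3.2) is quick to verify numerically. The following sketch is our own check, not the authors' code; it tabulates the Gram matrix of $T_0,\dots,T_{m-1}$ at the zeros (3.3):

```python
import numpy as np

m = 8
k = np.arange(1, m + 1)
xk = np.cos((2 * k - 1) * np.pi / (2 * m))        # the m zeros of T_m

def T(i, x):
    return np.cos(i * np.arccos(x))

G = np.array([[np.sum(T(i, xk) * T(j, xk)) for j in range(m)] for i in range(m)])
print(np.round(G, 10))    # diag(m, m/2, ..., m/2), zeros elsewhere: property (3.2)
```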
(iii) Replacement of interval (-co, oo) in a continuous norm by [0, n] and [0, n] by [0,1] using scaling. (iv) Consideration of a function with wider (approximate) support, such as [—3,3] or more generally [—r, r] for r > 2. Chebyshev polynomials in two-dimensional collocation The (first kind) Chebyshev polynomial T,(a;) of degree i is defined by Ti{x) = cos iO 0,... ,m, -1 < a; < 1, (3.1) where x = cos 0 and 0 < ^ < TT. Among its many properties is the discrete orthogonaUty property m f 0 Y,Ti{xk)Tj{xk) = \ m fe=i \m where Xk are the m zeros of Tm{x), namely ioi for for i ^ j; i,j <m—1 i=i=0 t = j ^ 0, (3.2) 240 J. Mason and A. Crampton Xk = cos(^^——j, k = l,...,m. (3.3) The orthogonality property of (3.2) is not a unique one amongst the Chebyshev polynomials of four kinds. Indeed, Mason and Venturino [5] showed that there are at least fourteen such formulae, depending on alternative weights, choices of Chebyshev-related abscissae and kinds of Chebyshev polynomial. 3.1 The elliptic problem — mixed methods Let us now exploit this property (3.2) in a pseudo-spectral method for a hnear elhptic PDE problem on a square. The PDE Lu = f{x,y), |a;|,M<l, (3.4) subject to u = 9{x,y), (3.5) where g{x, y) is a function known explicitly only on x = ±1 and y = ±1, can be solved approximately in the form m i=0 n j=0 where a dashed summation denotes that the first term in a sum is halved. To obtain equations for aij, we solve Lum.n = f, Umn =9) Umn = 9, at the (m-l) X {n~l) zeros oiTm-i{x)Tn-iiy), on X = ±1 at zeros of Tn{y) (2n equations), on y = ±1 at zeros of Tm{x) (2m equations). (3.7) (3.8) (3.9) Together (3.7)-(3.9) form (m -t-1) x (n -f-1) equations for {atj}. However, we claim that the included equations (3.8), (3.9) are singular of joint rank 2m -f 2n - 1. If this is so, then the system is singular without consideration of the PDE collocation equations (3.7). The equations (3.8), (3.9) become m n m 5fc,±i = E' E' ^iM^k)Ti{±l\ i=0 g±u = E' E' aijTi{±l)Tj{ye), i=0 j=0 n j=0 where Xk,yi are zeros of Tm{x),Tn{y) respectively and where 5i/ = 5(i.yc)9k,i = 9{xk,'^), 5-i,< = 5(-i,y«)i gk,-i = g{xk,-l)- (3.10) 241 Orthogonalisation procedures If we add/subtract the first pair and also the second pair of equations in (3.10), noting that T,(i) = i, T,-(-i) = (-l)^ we deduce that m m n 4°^ = ^'^'^iyTi(a:fc), t=0 4'^ = E!' Z]' ("iM^k), k = l,...,m, j=0 {j even) m i=0 •,=0 (jodd) m n n 4"=>;'>;' n aijTjiye), e^^^ = }^' >7 a,,-T;(w), ^ = l,...,n, (3.11) (3.12) j=0 j=o (iodd) i=0 j=o {i even) where, 4' = 5(fffc,i + c/fc,-i), 4^^ = UdkA -9k-i), ef e'j^'^ = ^{gi,e - g-i,i). = UdU + 9-i,e), Multiplying (3.11) by 2Tr{xk)/{m + 1) and summing over k, and multiplying; (3.12) by 2Ts{ye)l{n + 1) and summing over i, discrete orthogonality (3.2) gives p(0) -\^/ „ = b^Z' R^^l^^arj = bi% j=0 {j even) r = 0,...,m-l. (3.13) s-O (3.14) j=r (jodd) m m -M r^i) =W. -c^i) i=0 (iodd) j=0 (i even) m-l , where ,,(0) _ ^+1 - m '^+1 - n fc=l /s=l |-fi:4»ir.to) c2. = ^E4"T.(»). This constitutes a greatly simplified system to replace (3.10). Indeed we may verify that, for m = n, m—l m—1 4=0 {m — i odd) i=0 {m — i odd) E'^a^ = E'e. (3.15) 1 242 J. Mason and A. Crampton where i = 0,1 for m = odd, even, respectively, and hence that the equations (3.13) and (3.14) are singular. For example, for m {= n) = 2, we seek equations in AQO, ..., 022, and (3.13) gives j?(o) = i„__ + ^ ao2. „_. nf^ = |aoo R(0) -"■2 _ 1 , = 2^*10 + ^^12, p(i) = „„. 
For example, for $m (= n) = 2$, we seek equations in $a_{00},\dots,a_{22}$, and (3.13) gives

$$R_0^{(0)} = \tfrac12 a_{00} + a_{02}, \quad R_1^{(0)} = \tfrac12 a_{10} + a_{12}, \quad R_0^{(1)} = a_{01}, \quad R_1^{(1)} = a_{11}, \tag{3.16}$$

meanwhile (3.14) gives

$$C_0^{(0)} = \tfrac12 a_{00} + a_{20}, \quad C_1^{(0)} = \tfrac12 a_{01} + a_{21}, \quad C_0^{(1)} = a_{10}, \quad C_1^{(1)} = a_{11}. \tag{3.17}$$

Clearly $R_1^{(1)} = C_1^{(1)}$, consistent with (3.15) for $m = 2$.

Which equation do we eliminate? For simplicity, in the case of $m$ even, we delete the equation for $C_{m-1}^{(1)}$ and replace it by an extra equation $R_{m+1}$. It is easy to verify that, within the system (3.13) and (3.14), this leads to full rank, and $R_{m+1}$ is equivalent to boundary specifications of either of

$$u(0,1) + u(0,-1), \qquad u(1,1) + u(-1,1) + u(1,-1) + u(-1,-1). \tag{3.18}$$

For $m = n = 2$, this is equivalent to an equation determining

$$\tfrac12 a_{20} + a_{22}. \tag{3.19}$$

In the case when $m$ is odd, we delete the equation for $C_{m-1}^{(0)}$ and replace it by the equation for $C_{m+1}$, the latter being equivalent to adding four boundary point conditions anti-symmetrically, i.e.,

$$u(1,1) - u(-1,1) + u(-1,-1) - u(1,-1). \tag{3.20}$$

If $g(x,y)$ is known everywhere in the square, then we could of course consider replacing a mixed collocation problem by an interior collocation problem by including the boundary conditions automatically in the form of approximations. For example, we could replace the form (3.6) by

$$u_{mn} = (x^2-1)(y^2-1)\sum_{i=0}^{m-2}{}'\sum_{j=0}^{n-2}{}' a_{ij}T_i(x)T_j(y) + g(x,y), \tag{3.21}$$

or by an alternative form such as

$$u_{mn} = \sum_{i=0}^{m}{}'\sum_{j=0}^{n}{}' a_{ij}\left(T_i(x) - \tilde T_i(x)\right)\left(T_j(y) - \tilde T_j(y)\right) + g(x,y),$$

where $\tilde T_i = T_0(x)$ or $T_1(x)$ according as $i$ is even or odd. These forms have the disadvantage of being difficult to generalise to other kinds of (non-rectangular) boundaries, although (3.21) is adaptable to the case where an equation of the boundary is known (see Mason [3]). The best Chebyshev method available for the Poisson problem on a rectangle is probably a "differentiation matrix" method, such as is described in Trefethen [6], which represents the solution by nodal values rather than Chebyshev coefficients.

Acknowledgement: We thank the referees for their perceptive remarks.

Bibliography

1. T. N. T. Goodman, C. A. Micchelli, G. Rodriguez and S. Seatzu, On the Cholesky factorization of the Gram matrix of locally supported functions, BIT 35(2) (1995), 233-257.
2. I. J. Anderson, J. C. Mason, G. Rodriguez and S. Seatzu, Training radial basis function networks using separable and orthogonalised Gaussians, in Mathematics of Neural Networks, S. W. Ellacott, J. C. Mason and I. J. Anderson (eds), Kluwer, 1997, 265-269.
3. J. C. Mason, Chebyshev polynomial approximations for the L-membrane eigenvalue problem, SIAM J. of Appl. Math. 15 (1967), 172-186.
4. J. C. Mason, G. Rodriguez and S. Seatzu, Orthogonal splines based on B-splines with applications to least squares, smoothing and regularisation problems, Numerical Algorithms 5 (1993), 25-40.
5. J. C. Mason and E. Venturino, Integration methods of Clenshaw-Curtis type based on four kinds of Chebyshev polynomials, in Multivariate Approximation and Splines, G. Nuernberger, J. W. Schmidt and G. Walz (eds), Birkhauser, Basel, 1997, 158-165.
6. L. N. Trefethen, Spectral Methods in MATLAB, SIAM, 2000.
Geometric knot selection for radial scattered data approximation

Rossana Morandi and Alessandra Sestini
Dipartimento di Energetica, Universita di Firenze, IT.
morandi@de.unifi.it, sestini@de.unifi.it

Abstract
Scattered exact and non-exact data are approximated by means of radial basis functions with compact support, and the related knot selection is based on the information given by the discrete Gaussian curvature defined on a data triangulation. In case of non-exact data, a strategy to obtain a sign-reliable estimate of its distribution is given, extending an approach already studied by the authors for non-exact 2D data.

1 Introduction

It is well known that, for any interpolation/approximation scheme, data shape preservation is often a desirable quality and, as a consequence, the determination of some criteria to establish the data shape is a very important topic. For this purpose, the use of the discrete curvature in case of exact 2D data is a standard approach. On the other hand, in case of non-exact data, the proposal in [6] allows the determination of a reasonable and sign-reliable discrete curvature estimate if the maximum data error is a priori given. In recent literature, interesting formulas have been introduced [3, 4] for defining the discrete Gaussian curvature when scattered 3D exact data are given and a related triangulation is assigned. Starting from these formulas, the approach considered in [6] is extended to the case of 3D scattered non-exact data in order to define a reasonable and sign-reliable estimate of the Gaussian curvature at the data points, thereby obtaining important shape information. Thus we get some suggestions for determining the supports of the local radial basis functions [8] used in the approximation scheme, together with the number, the position and the multiplicity of the related knots. The result is a good approximating surface (in particular with respect to its shape) with a high data reduction [2, 7].

The outline of the paper is as follows. In Section 2 the discrete Gaussian curvature is defined and an inequality is given to check its sign-reliability in case of non-exact data. In Section 3 the approximation scheme is presented and the knot selection strategy is given. Finally, in Section 4 some numerical results are presented to illustrate the features of the proposed approach.

2 Information about the shape

In this section, following the approach presented in [3, 4], we define the discrete Gaussian curvature (dGc) to obtain information about the shape suggested by the data. For this purpose, we need the following notation:

• $V_{xy} := \{X_j = (x_j, y_j),\ j = 1,\dots,N\} \subset \mathbb{R}^2$ is the set of the assigned distinct vertices on the xy-plane;
• $V := \{P_j = (X_j, z_j),\ j = 1,\dots,N\} \subset \mathbb{R}^3$ is the data set, with $z_j = f(X_j)$;
• $T := \{I_j \in \mathbb{N}^3,\ 1 \le I_{k,j} \le N,\ k = 1, 2, 3,\ j = 1,\dots,T\}$ is a given triangulation of $V_{xy}$.

Thus, for any $X_j \in V_{xy}$ not belonging to the boundary of the convex hull of $V_{xy}$, we can define the integral Gaussian curvature with respect to a related area $S_j$ [3],

$$\hat K_j = 2\pi - \sum_{k=1}^{n_j}\alpha_k^{(j)},$$

where the angles $\alpha_k^{(j)}$, $k = 1,\dots,n_j$, are as follows:

$$\alpha_k^{(j)} := \widehat{\left(e_k^{(j)}, e_{k+1}^{(j)}\right)}, \qquad e_k^{(j)} := V_k^{(j)} - P_j, \quad k = 1,\dots,n_j, \qquad e_{n_j+1}^{(j)} := e_1^{(j)},$$

and $\{V_1^{(j)},\dots,V_{n_j}^{(j)}\} \subset V$ is the set of ordered neighboring points of $P_j$ given by the assigned triangulation. To derive the curvature at the vertex $P_j$ from the above integral value, we normalize by the Voronoi area $S_j$ [4],

$$K_j := \frac{\hat K_j}{S_j}. \tag{2.1}$$

If $X_j$ is on the boundary of the convex hull of $V_{xy}$, some auxiliary suitable "phantom" points should be defined in order to obtain a reliable estimate of the Gaussian curvature from (2.1).
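The angle-deficit formula is easy to state in code. The helper below is our own illustration, not the authors' software; the ordered one-ring of neighbours and the Voronoi area $S_j$ are supplied by the caller, as assumptions:

```python
import numpy as np

def discrete_gauss_curvature(P, neighbours, voronoi_area):
    """Angle-deficit curvature  K = (2*pi - sum of wedge angles) / S_j.

    P: the vertex; neighbours: its ordered one-ring from the triangulation."""
    total = 0.0
    nj = len(neighbours)
    for k in range(nj):
        e1 = neighbours[k] - P
        e2 = neighbours[(k + 1) % nj] - P
        c = np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2))
        total += np.arccos(np.clip(c, -1.0, 1.0))
    return (2.0 * np.pi - total) / voronoi_area

# a flat one-ring: the angle deficit, and hence the curvature, vanishes
P = np.array([0.0, 0.0, 0.0])
ring = [np.array([np.cos(t), np.sin(t), 0.0])
        for t in np.linspace(0, 2 * np.pi, 7)[:-1]]
print(discrete_gauss_curvature(P, ring, voronoi_area=1.0))   # ~0
```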
FIG. 1. The triangulation (left) and the discrete Gaussian curvature (right).

Shown on the left of Figure 1 is the Delaunay triangulation related to a set $V_{xy}$ of 441 scattered vertices in the unit square, and shown on the right is the discrete Gaussian curvature distribution related to the Franke function sampled on $V_{xy}$.

In case of non-exact data, we need to check the sign-reliability of $K_j$ for deriving some useful information about the shape suggested by the data. For this purpose, we use the theorem below, where

$$c_k^{(j)} := \frac{e_k^{(j)}\cdot e_{k+1}^{(j)}}{|e_k^{(j)}|\,|e_{k+1}^{(j)}|}, \qquad k = 1,\dots,n_j. \tag{2.2}$$

Remark 2.1 $\tilde K_j$ is an approximation of $K_j$ obtained by replacing the angle $\alpha_k^{(j)}$ with $2(1 - c_k^{(j)})$, $k = 1,\dots,n_j$.

Theorem 2.2 Let $P_j \in \mathbb{R}^3$, $j = 1,\dots,N$, be assigned distinct non-exact data points and let $\epsilon$ be a positive quantity such that $|P_j - P_j^e| \le \epsilon$, $j = 1,\dots,N$, where $P_j^e$ is the (unknown) exact data point corresponding to $P_j$. If $\epsilon$ is sufficiently small and

$$|\tilde K_j| > \frac{8\epsilon}{S_j}\sum_{k=1}^{n_j}\frac{1}{|e_k^{(j)}|}, \tag{2.3}$$

then $\tilde K_j\,\tilde K_j^e > 0$, where $\tilde K_j^e$ is defined as $\tilde K_j$ using the exact data points.

Proof: Let us consider a point $P_j$ and its neighboring points $\{V_1^{(j)},\dots,V_{n_j}^{(j)}\} \subset V$, and let us write the corresponding (unknown) exact points as follows:

$$P_j^e := P_j - \epsilon_0 w_0, \qquad V_k^{(j)e} := V_k^{(j)} - \epsilon_k w_k, \quad k = 1,\dots,n_j,$$

with $0 \le \epsilon_0, \epsilon_1,\dots,\epsilon_{n_j} \le \epsilon$ and $|w_0| = |w_1| = \cdots = |w_{n_j}| = 1$. So, if $\epsilon$ is sufficiently small, we can define the non-zero vectors

$$e_k^{(j)e} := V_k^{(j)e} - P_j^e,$$

and we have

$$e_k^{(j)e} = e_k^{(j)} - \epsilon_k w_k + \epsilon_0 w_0.$$

Thus, if

$$c_k^{(j)e} := \frac{e_k^{(j)e}\cdot e_{k+1}^{(j)e}}{|e_k^{(j)e}|\,|e_{k+1}^{(j)e}|},$$

using a first order Taylor approximation, we obtain

$$c_k^{(j)e} = c_k^{(j)} + \frac{A_k + B_k}{|e_k^{(j)}|\,|e_{k+1}^{(j)}|} + O(\epsilon^2), \tag{2.4}$$

where $A_k$ collects the first order perturbation of the inner product $e_k^{(j)}\cdot e_{k+1}^{(j)}$ and $B_k$ that of the normalising product $|e_k^{(j)}|\,|e_{k+1}^{(j)}|$. Thus, we can write

$$\tilde K_j^e = \tilde K_j + \frac{2}{S_j}\sum_{k=1}^{n_j}\frac{A_k + B_k}{|e_k^{(j)}|\,|e_{k+1}^{(j)}|} + O(\epsilon^2).$$

So, if $\epsilon$ is sufficiently small, $\tilde K_j\,\tilde K_j^e > 0$ if

$$|\tilde K_j| > \frac{2}{S_j}\sum_{k=1}^{n_j}\frac{|A_k| + |B_k|}{|e_k^{(j)}|\,|e_{k+1}^{(j)}|}. \tag{2.5}$$

Now, from (2.4) it is easy to verify that

$$|A_k| \le 2\epsilon\left(|e_k^{(j)}|^{-1} + |e_{k+1}^{(j)}|^{-1}\right), \qquad |B_k| \le 2\epsilon\left(|e_k^{(j)}|^{-1} + |e_{k+1}^{(j)}|^{-1}\right)|e_k^{(j)}|\,|e_{k+1}^{(j)}|.$$

Using these inequalities, after a little algebra, we obtain that, if $\epsilon$ is sufficiently small, (2.3) implies (2.5). □

If $\epsilon$ is an assigned small positive quantity such that $|P_j - P_j^e| \le \epsilon$, $j = 1,\dots,N$, and if (2.3) holds, we use (2.1) to define $K_j$, because we consider it sign-reliable. Otherwise, we try to get information about the sign of the Gaussian curvature at the point $P_j$, repeating the check after substituting the neighboring points of $P_j$ with other new suitable $n_j$ points. In particular, these are chosen among the neighbors of all the $V_k^{(j)}$, $k = 1,\dots,n_j$, and they are uniformly spaced as much as possible with respect to the azimuth (defined relative to $P_j$). If after this substitution (2.3) holds, the new neighboring points are used to define $K_j$ through (2.1); otherwise this strategy is repeated until we consider that the new neighbors are too far from $P_j$. In the last case, we put the curvature value equal to 0.

3 Knot selection in radial approximation

Let $\phi : \mathbb{R}_{\ge0} \to \mathbb{R}$ be a compactly supported radial basis function. We approximate the given data by the surface

$$z(X) = \sum_{l=1}^M a_l\,\phi\!\left(\frac{|X - X_l^*|}{\delta_l}\right),$$

where the set of knots $\{X_l^*,\ l = 1,\dots,M\} \subset V_{xy}$ and the set of positive $\delta$-parameters $\{\delta_l,\ l = 1,\dots,M\}$ are previously chosen. The coefficients $a_1,\dots,a_M$ are determined by minimizing $\sum_{j=1}^N(z_j - z(X_j))^2$. The knot number and their positions are selected considering the information given by the discrete Gaussian curvature distribution as defined in the previous section.
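Here is a hedged sketch of the least squares step, assuming the approximation form given above; the knot positions, $\delta$ values and the Wendland-type $\phi$ are placeholder choices of ours, not the paper's computed selections:

```python
import numpy as np

def phi(r):
    # a compactly supported Wendland-type function (assumed form; cf. [8])
    return np.where(r < 1.0, (1.0 - r) ** 3 * (1.0 + 3.0 * r), 0.0)

rng = np.random.default_rng(7)
V = rng.random((441, 2))                            # scattered data sites
z = 0.35 * (np.sin(2 * np.pi * V[:, 0]) + np.sin(2 * np.pi * V[:, 1]))

knots = np.array([[0.25, 0.25], [0.75, 0.75], [0.25, 0.75], [0.75, 0.25]])
delta = np.array([0.6, 0.6, 0.4, 0.4])              # one delta-parameter per knot

# collocation matrix  C[j, l] = phi(|V_j - X_l| / delta_l)
C = phi(np.linalg.norm(V[:, None] - knots[None, :], axis=2) / delta)
a, *_ = np.linalg.lstsq(C, z, rcond=None)
print(np.sqrt(np.mean((z - C @ a) ** 2)))           # mean error of the fit
```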
Inspired by the algorithm proposed in [6], the strategy for the choice of $X_l^*$ and $\delta_l$, $l = 1,\dots,M$, can be summarized as follows:

• an input tolerance $tol_G$ is given;
• a first set of distinct knots $\{X_l^*,\ l = 1,\dots,M_0\} \subset V_{xy}$ with $M_0 \le M$ is chosen. This is done by selecting the areas where the absolute value of the discrete Gaussian curvature is greater than $tol_G$. A knot is located in the middle of an area if the sign of the related curvature is positive. In case of negative curvature, four knots are located near the boundary of the area, also taking into consideration the suggestions given by the data distribution;
• initial values for the $\delta$-parameters $\delta_l$, $l = 1,\dots,M_0$, are determined considering the knot separation distance;
• the final set of knots is defined by possibly increasing the multiplicity of the previously selected knots. In this case, the $\delta$-parameters associated to the same knot must be different.

Remark 3.1 We observe that, to be sure that the least squares problem has a unique solution, it should be proved that the related collocation matrix is of full rank, and this is clearly equivalent to the uniqueness of the corresponding interpolation problem (the only result we know about uniqueness of the radial interpolant defined with different scales is given in a submitted paper [1], where interesting sufficient conditions are given). However, we believe that the least squares problem is much more robust than the corresponding interpolation problem, and in all the numerical experiments we have never had problems related to the rank of the collocation matrix (see also [5, 7]).

4 Numerical results

In this section we use the compactly supported radial basis function [8]

$$\phi(r) := (1-r)_+^3(1+3r)$$

for checking the features of the proposed approach on two test functions. The first is the well known Franke function and the second is the function

$$z(X) = 0.35\left(\sin(2\pi x) + \sin(2\pi y)\right), \qquad X \in [0,1]^2.$$

For both tests, $N = 441$ data points are considered. The exact data are obtained by evaluating the functions at the vertices represented on the left of Figure 1. The corresponding non-exact data are defined by adding a random noise to the exact values. In particular, in the first test we have used $\epsilon = 0.07$, and in the second we have used $\epsilon = 0.08$ in $[0,0.5]^2 \cup [0.5,1]^2$ and $\epsilon = 0.008$ otherwise. The related discrete Gaussian curvature (dGc) distributions, computed with the strategy sketched at the end of Section 2, are reported in Figure 2.

FIG. 2. dGc for the first (left) and second (right) set of non-exact data.

Figures 3 and 4 relate to the first test with exact and non-exact data, respectively. The distinct knots are $X_1^* = (0.207, 0.205)$, $X_2^* = (0.449, 0.797)$, $X_3^* = (0.756, 0.349)$, and each of them is repeated three times with three different $\delta$-parameter values, 0.6, 0.4, 0.3.

FIG. 3. The parent Franke surface (left) and its approximation (right).

The mean error $\sqrt{\sum_{j=1}^N(z_j - z(X_j))^2/N}$ is about 0.016 in Figure 3 and 0.025 in Figure
The non-exact set of data (left) and its approximation (right). FIG. 5. The parent surface (left) and its approximation (right). 2. C. Conti, R. Morandi, C. Rabut and A. Sestini, Cubic spline data reduction: choosing the knots from a third derivative criterium, to appear in Numerical Algorithms. 3. R. van Damme and L. Alboul, Tight triangulations. Mathematical Methods for Curves and Surfaces, M. Daehlen, T. Lyche and L.L.Schumaker (eds), Vanderbilt University Press, 1995, 517-526. 4. N. Dyn, K. Hormann, S. J. Kim and D. Levin, Optimizing 3D triangulations using discrete curvature analysis. Mathematical Methods for Curves and Surfaces: Oslo 2000, T. Lyche and L. L. Schumaker (eds), Vanderbilt University Press, 2001, 135146. 5. R. Franke, H. Hagen and G. Nielson, Least squares surface approximation to scattered data using multiquadric functions, Adv. Comp. Math. 2 (1994), 81-99. 6. R. Morandi, D. Scaramelli and A. Sestini, A geometric approach for knot selection in convexity-preserving spline approximation, Curve and Surface Design, P. J. Laurent, P. Sablonniere and L.L. Schumaker (eds), Vanderbilt University Press, 2000, 287296. Geometric knot selection FIG. 6. The non-exact set of data (left) and its approximation (right). 7. R. Morandi and A. Sestini, Data reduction in surface approximation, Mathematical Methods for Curves and Surfaces: Oslo 2000, T. Lyche and L. L. Schumaker (eds), Vanderbilt University Press, 2001, 315-324. 8. H. Wendland, Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree, Adv. Comp. Math. 4 (1995), 389-396. 251 On the boundary over distance preconditioner for radial basis function interpolation C. T. Mouat and R. K. Beatson Dept. of Mathematics and Statistics, Univ. of Canterbury, Christchurch, New Zealand. cam@mouat.net, R.BeatsonSmath.canterbury.ac.nz Abstract In this paper we consider the boundary over distance preconditioner for radial basis function interpolation problems. We give both theoretical and numerical results indicating that it performs extremely well. 1 Introduction Let $ : TZ"^ -^TZ, X = {xi,... ,XN} be a set of N distinct points in T?."^ and / be a real valued function which we can evaluate at least at the x,:'s. Define ^'"^ \ where E.ti ^jQi^'j) = 0, for &\\ q € irf J ' ^ ' We consider the problem of finding an element s of 5$,x + 7rf satisfying the interpolation conditions s{xi) == fixi), for all Xi e X. (1.2) Assume # is strictly conditionally positive definite of order 2 (SCPD2) and X is unisolvent for rrf. Then there is a unique element of S^^x + ^^f satisfying the interpolation conditions (1.2). This setting includes popular choices of the basic function such as the thin-plate spline, $(•) = | ■ |^log| • |, and minus the ordinary multiquadric, #(•) = -\/| • p + c^- In this paper we consider various ways of formulating the interpolation problem, showing in particular that a certain inexpensive change of basis can dramatically improve its conditioning. The usual way to formulate this problem is in terms of the functions {$(• — rcj)} and some basis {po,Pi,. ■ ■ ,Pd} for irf. Then the interpolation conditions together with the side conditions taking away the extra degrees of freedom introduced by the polynomial part can be written as AX + Pc = f and P^A = 0, (1.3) where Aij = ^{xi — Xj), Pij = Pj{xi), and / = [fixi),..., f{xN)Y. 
It is well known [3, 4, 5] that the matrix

$$\hat A = \begin{pmatrix} A & P \\ P^T & O \end{pmatrix} \tag{1.4}$$

of this usual formulation is frequently badly conditioned, even when the number of nodes is small. Indeed, many authors have commented on the numerical difficulties that solving this system presents [3, 4, 5]. Results of Narcowich and Ward show that conditioning of the system (1.4) depends very heavily on the geometry of the nodes. However, frequently in numerical analysis a change of basis, or other reformulation, can make a highly intractable problem tractable. Hence, our goal is to find an inexpensive but highly effective preconditioner for RBF interpolation systems.

In this paper we establish properties of a preconditioning method for the RBF interpolation equations which was first presented in Sibson and Stone [5]. In the following section we give a detailed account of the preconditioning method. In Section 3 we prove that the construction produces an $N\times(N-3)$ matrix $Q$ whose columns are orthogonal to $P$, and which is of full rank whenever the nodes $X$ are unisolvent for $\pi_1^2$. Finally, Section 4 contains numerical results for different SCPD2 basic functions over a range of data sets and scales. These numerical results show that using this inexpensive $O(N\log N)$ flop preconditioner and variants of it dramatically improves the conditioning of RBF interpolation problems. See Figure 1 below.

FIG. 1. Sorted 2-norm condition numbers of the unpreconditioned matrices, $\hat A$, (top) and of the preconditioned matrices, $S$, (bottom) for fifty thousand random data sets of size one hundred: (a) multiquadric basic function; (b) thin-plate spline basic function.

2 A preconditioning method

A general approach to preconditioning interpolation problems with SCPD2 basic functions in $\mathbb{R}^2$ [1, 5] is to choose $Q$ as any $N\times(N-3)$ matrix whose columns are orthogonal to $P$ and which has rank $N-3$. Letting $\lambda = Q\mu$ and premultiplying (1.3) by $Q^T$ gives the new system to be solved for $\mu$, or equivalently $\lambda$,

$$B\mu = Q^T f \qquad\text{where}\quad B = Q^TAQ. \tag{2.1}$$

The three polynomial coefficients can then be found by a small subsidiary calculation.

In this section we present the boundary over distance method of Sibson and Stone [5] for constructing the matrix $Q$. We will prove in the subsequent section that $Q$ has full rank and is orthogonal to $P$ for any set of distinct nodes $X = \{x_1,\dots,x_N\} \subset \mathbb{R}^2$ which is unisolvent for $\pi_1^2$. These properties of $Q$ are well known (see e.g. [1, 5]) to imply that the matrix of the preconditioned system $B = Q^TAQ$ is positive definite. The construction of $Q$ is appealing in that for "interior" points $x_j$ of $X$ it is local. That is, for such points the entries in the $j$-th column of $Q$ depend only on the geometry of the nodes near $x_j$ and not on any properties of nodes far away.

Choose a closed bounded convex polygonal region $W$ of $\mathbb{R}^2$ such that $X \subset W$. Suppose without loss of generality that $\{x_{N-2}, x_{N-1}, x_N\}$ is unisolvent for $\pi_1^2$. We will refer to these points as special points. They are generally chosen so that they are well spread throughout $W$. In our experience, and that of Sibson and Stone, for typical data sets the choice of special points is not at all critical, as long as the triangle they define has largish area. However, for contrived data sets, such as all but a very few points on a straight line, the choice of special points becomes important. In these cases we have observed that bad choices of special points can lead to large condition numbers. However, the strategy of choosing the three special points to maximise the area of the corresponding triangle has always led to small condition numbers.

The region $W$ is divided into panels by intersecting a Voronoi diagram of the points of $X$ with the region $W$. We denote this panelling of $W$ by

$$V_W(X) = \bigcup_{i=1}^N V_i,$$

where $V_i$ is the Voronoi panel about the $i$th centre and is defined by

$$V_i = \{x \in W : |x - x_i| \le |x - x_j|, \text{ for all } 1 \le j \le N \text{ with } j \ne i\}.$$

Recall that the locus of points equidistant from two fixed points is the perpendicular bisector of the segment connecting the points. It follows that each Voronoi region is polygonal. Associated with a panel $V_i$ are its edges. These are a finite number of distinct closed line segments of non-zero length. They are the boundaries between different Voronoi panels, or between a Voronoi panel and $W^c$. The collection of all edges of all the Voronoi panels will be denoted by $\mathcal{E}$.

Definition 2.1 Two polygonal regions of $\mathbb{R}^2$ will be said to be strongly contiguous if they have a common boundary of non-zero length.
In these cases we have observed that bad choices of special points can lead to large condition numbers. However, the strategy of choosing the three special points to maximise the area of the corresponding triangle has always led to small condition numbers. The region W is divided into panels by intersecting a Voronoi diagram of the points of X with the region W. We denote this panelling of W by N VwiX) = [jVi i=l , ■ where Vi is the Voronoi panel about the ith centre and is defined by Vi = {x e W : \x - Xi\ < \x - Xj\, for all 1 < j < A'' with j ^ i\. Recall that the locus of points equidistant from two fixed points is the perpendicular bisector of the segment connecting the points. It follows that each Voronoi region is polygonal. Associated with a panel V, are its edges. These are a finite number of distinct closed line segments of non-zero length. They are the boundaries between different Voronoi panels, or between a Voronoi panel and W^. The collection of all edges of all the Voronoi panels will be denoted by (S. Definition 2.1 Two polygonal regions ofTZ^ will he said to be strongly contiguous if they have a common boundary of non-zero length. Definition 2.2 a sequence Two Voronoi regions Vi and Vj will be said to be C-related if there is {Vi,Ve„Ve,,...,Ve„„Vj}, I <i,j,ei,...Jm < N-3, Preconditioning RBF interpolation 255 in which all adjacent pairs are strongly contiguous. Loosely speaking Vt and Vj are C-related if they are connected by a chain of strongly contiguous pairs. C-related is an equivalence relation on the set {V^}^'^^ of Voronoi regions of non-special points. Therefore it breaks this set into a finite number of nonempty equivalence classes {Qi : 1 <l < k}. Lemma 2.3 Let Qi he any of the equivalence classes above. Then there is at least one Voronoi region Vi in Ge which is strongly contiguous to either W^ or one of Proof: Consider T= \^ Vi . i:Vi€ge This union is a closed bounded connected polygonal set whose boundary can be written as the union of some of the hne segments from £. Recall in particular that all these hne segments have non-zero length. Pick one line segment < a,b > from the boundary of T. Since it forms part of the boundary of T on one side of it Ues a Voronoi region Vj from Qe. On the other side hes either W"^ or another Voronoi region Vj. In the first case the Lemma is proven. Consider the second case. If 1 < j < A'' — 3 then Vi is strongly contiguous to T^. Consequently, Vj eGe- This contradicts < o, 6 > being on the boundary of T. Hence, N — 2 < j < N and the Lemma follows. □ We now detail the construction of the N x{N — 3) matrix Q using boundary over distance weights. Note that because most elements of Q are zero sparse storage of Q requires only 0{N) memory. A non-special point from {xi : 1 <i < N — 3} which has Voronoi tile that is strongly contiguous to W'-' will be called a Voronoi external point. Define VE{X) as the set of indices of all Voronoi external points. All other points are referred to as Voronoi internal points. The corresponding indices are Vi{X) = {1, —, A^ — 3} — VE{X). We first consider forming a column of Q for an index, j, such that j £ Vj{X). In this case the panel Vj shares non-trivial edges only with other Voronoi panels and not with W'^. The column is formed using boundary over distance weights, found from the Voronoi diagram. For j € Vi{X) the boundary over distance weight r^j is OyXij XjJ for all Vi strongly contiguous to V^-, (2.2) where b{xi,Xj) is the length of the boundary between Vi and Vj. 
For other values of $i \ne j$, $r_{ij}$ is set to zero. In order that column $j$ of $Q$ is orthogonal to constants, the diagonal element $r_{jj}$ is specified as

$$r_{jj} = -\sum_{i \ne j} r_{ij}. \qquad (2.3)$$

Finally, the $j$th column of $R$ is scaled by dividing by the area of $V_j$ to obtain the $j$th column of $Q$. Note that the column is by construction diagonally dominant, but not strictly so.

If $j \in V_E(X)$ then $V_j$ is strongly contiguous to the complement of $W$, $W^c$. The boundary segment corresponds to a Voronoi edge between $x_j$ and an artificial point, the reflection of $x_j$ in the boundary (see Figure 3 in [7]). The reflected point, $\hat{x}_j$, can be written as a linear combination of the special points, i.e.,

$$\hat{x}_j = \lambda_N x_N + \lambda_{N-1} x_{N-1} + \lambda_{N-2} x_{N-2}, \qquad (2.4)$$

where $\lambda_N + \lambda_{N-1} + \lambda_{N-2} = 1$. If $V_j$ has $k$ edges with $W^c$ then $k$ reflected points $\{\hat{x}_j^1, \dots, \hat{x}_j^k\}$ are required. Associated with each reflected point, $\hat{x}_j^\ell$, are the coefficients $\{\lambda_N^\ell, \lambda_{N-1}^\ell, \lambda_{N-2}^\ell\}$. The boundary over distance weights for the $\hat{x}_j^\ell$ are partitioned amongst the special points to obtain, for all $j \in V_E(X)$ and $i \ne j$,

$$r_{ij} = \begin{cases} \dfrac{b(x_i, x_j)}{|x_i - x_j|}, & V_i \text{ strongly contiguous to } V_j, \\[2ex] \displaystyle\sum_{\ell=1}^{k} \lambda_i^\ell \, \dfrac{b(\hat{x}_j^\ell, x_j)}{|\hat{x}_j^\ell - x_j|}, & i \in \{N, N-1, N-2\}. \end{cases} \qquad (2.5)$$

Of course, $V_j$ could be strongly contiguous with a Voronoi panel associated with a special point. If this is the case, the two contributions are added:
$$r_{ij} = \frac{b(x_i, x_j)}{|x_i - x_j|} + \sum_{\ell=1}^{k} \lambda_i^\ell \, \frac{b(\hat{x}_j^\ell, x_j)}{|\hat{x}_j^\ell - x_j|}.$$
Again, for other values of $i \ne j$, $r_{ij}$ is set to zero. Finally, $r_{jj}$ is specified as in (2.3) and column $j$ of $Q$ is defined as column $j$ of $R$ scaled by dividing by the area of $V_j$.

Partition $Q$ as
$$Q = \begin{pmatrix} E \\ F \end{pmatrix}, \qquad (2.6)$$
where $E$ is $(N-3) \times (N-3)$. Thus $E$ results from interactions between non-special points, and $F$ from those between special and non-special points. Note in the construction above that for $1 \le i, j \le N-3$, $e_{ij}$ is non-zero if and only if $V_i$ is strongly contiguous to $V_j$. Furthermore, note that $E$ is necessarily column diagonally dominant, with strict dominance in column $j$ whenever $V_j$ is strongly contiguous to the Voronoi region of a special point, or to $W^c$.

Relabelling if necessary, we can assume the indices of the Voronoi regions in each of the equivalence classes $\mathcal{G}_\ell$ form a contiguous subset of $\{1, \dots, N-3\}$. Similarly, we can also assume that the indices corresponding to any $\mathcal{G}_\ell$ precede those corresponding to $\mathcal{G}_{\ell+1}$. Furthermore, by construction, if $i \ne j$ none of the regions in $\mathcal{G}_i$ is strongly contiguous with a region in $\mathcal{G}_j$. Thus, the corresponding entries in the matrix $E$ constructed using boundary over distance weights and artificial points are zero. That is, $E$ is block diagonal with the square matrix $E_{\ell\ell}$ on the main diagonal corresponding to the equivalence class of Voronoi regions $\mathcal{G}_\ell$. More precisely, $Q$ will have the form

$$Q = \begin{pmatrix} E_{11} & 0 & \cdots & 0 \\ 0 & E_{22} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & E_{kk} \\ F_1 & F_2 & \cdots & F_k \end{pmatrix}. \qquad (2.7)$$
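The interior part of this construction is straightforward to implement with standard computational-geometry tools. The following Python sketch (an illustration, not the authors' code) builds the boundary over distance weights (2.2)–(2.3) for points whose Voronoi tiles are bounded, using scipy; the reflection step for external points and the special-point columns (2.4)–(2.5) are omitted.

```python
import numpy as np
from scipy.spatial import Voronoi

def interior_bod_columns(points):
    """Boundary-over-distance weights (2.2)-(2.3), columns scaled by tile area.
    Sketch only: tiles meeting the boundary (external points) are skipped."""
    vor = Voronoi(points)
    N = len(points)
    R = np.zeros((N, N))
    for (i, j), rv in zip(vor.ridge_points, vor.ridge_vertices):
        if -1 in rv:
            continue  # unbounded edge: at least one of the two tiles is external
        b = np.linalg.norm(vor.vertices[rv[0]] - vor.vertices[rv[1]])  # shared edge length
        d = np.linalg.norm(points[i] - points[j])                      # centre distance
        R[i, j] = R[j, i] = b / d
    np.fill_diagonal(R, -R.sum(axis=0))  # (2.3): each column sums to zero
    for j in range(N):                   # scale column j by 1/area(V_j) where bounded
        verts = vor.regions[vor.point_region[j]]
        if -1 in verts or len(verts) == 0:
            continue
        v = vor.vertices[verts]
        v = v[np.argsort(np.arctan2(*(v - v.mean(axis=0)).T[::-1]))]   # order convex tile
        area = 0.5 * abs(np.dot(v[:, 0], np.roll(v[:, 1], 1))
                         - np.dot(v[:, 1], np.roll(v[:, 0], 1)))        # shoelace formula
        R[:, j] /= area
    return R
```

The sparsity promised above is visible here: row $i$ of $R$ has non-zeros only at the Voronoi neighbours of $x_i$.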
3 Properties of the matrix Q

In this section we establish the fundamental properties of the matrix $Q$ of (2.7), namely that it is of full rank and that its columns are orthogonal to those of $P$.

Definition 3.1 For $m \ge 2$, an $m \times m$ matrix $K$ is irreducible if there does not exist an $m \times m$ permutation matrix $P$ such that
$$PKP^T = \begin{pmatrix} M_{11} & M_{12} \\ 0 & M_{22} \end{pmatrix},$$
where $M_{11}$ is $r \times r$, $M_{22}$ is $(m-r) \times (m-r)$, and $1 \le r < m$.

The following result is well known; see for example Varga [6].

Theorem 3.2 Suppose the square matrix $K$ is irreducible and row (column) diagonally dominant with strict row (column) diagonal dominance in at least one row (column). Then $K$ is invertible.

Lemma 3.3 Let $X$ be a finite set of distinct points unisolvent for $\pi_1$. Let $E_{\ell\ell}$ be one of the square blocks from the diagonal of $Q$ constructed in the previous section. Then $E_{\ell\ell}$ is invertible.

Proof: From the construction, $E_{\ell\ell}$ is column diagonally dominant. Furthermore, by Lemma 2.3 the diagonal dominance is strict for at least one column of $E_{\ell\ell}$. From the definition of the equivalence relation C-related there is a chain of strongly contiguous pairs of Voronoi regions connecting any two Voronoi regions in $\mathcal{G}_\ell$. This implies the corresponding entries in $E_{\ell\ell}$ are non-zero and hence from [6, Theorem 1.6] $E_{\ell\ell}$ is irreducible. It follows from Theorem 3.2 that $E_{\ell\ell}$ is invertible. □

Theorem 3.4 The matrix $Q$ described in Section 2 is orthogonal to $P$, i.e. $Q^T P = O$.

Proof: Omitted, see [2] and [7]. □

Theorem 3.5 Let $X$ be a set of distinct points unisolvent for $\pi_1$. Let $Q$ be formed by the construction in Section 2 and $A_{ij} = \Phi(x_i - x_j)$ where $\Phi$ is strictly conditionally positive definite of order 2. Then $B = Q^T A Q$ is positive definite.

Proof: From Lemma 3.3 each of the matrices $E_{\ell\ell}$ occurring in the block partitioning of $Q$ given in equation (2.7) is invertible. Hence $Q$ has full rank. Also, from Theorem 3.4 the columns of $Q$ are orthogonal to the columns of $P$. Let $\mu$ be any non-zero vector in $\mathbb{R}^{N-3}$, and define $\lambda = Q\mu$. Then $\lambda \ne 0$, $P^T \lambda = P^T Q \mu = 0$, and
$$\mu^T B \mu = \mu^T Q^T A Q \mu = \lambda^T A \lambda.$$
Hence, by the definition of strictly conditionally positive definite, $\mu^T B \mu > 0$ whenever $\mu \ne 0$, and $B$ is symmetric positive definite. □

Theorem 3.6 Let $\Phi$ be strictly conditionally positive definite of order 2 and such that
$$\Phi(hx, hy) = h^\kappa \Phi(x, y) + p_h(x - y), \qquad h > 0,$$
with $p_h \in \pi_1$. The preconditioned matrix $B_h$, which corresponds to preconditioning on the point set $hX$, is a homogeneous function of scale. Thus its condition number and the relative clustering of its eigenvalues are the same over all scales.

Proof: Omitted, see [7]. □

Theorem 3.6 applies in particular to the usual thin-plate spline, $\Phi(\cdot) = |\cdot|^2 \log |\cdot|$, in $\mathbb{R}^2$. The extended version of this paper [7] contains a proof that the elements $B_{ij}$ decay like $|x_i - x_j|^{-\kappa}$ when $|x_i - x_j|$ is large. For the multiquadric $\kappa$ is three and for the thin-plate spline $\kappa$ is two.

Definition 3.7 The preconditioned matrix $S$ is obtained from $B$ by pre-multiplying and post-multiplying $B$ by the diagonal matrix $D$ with $ii$ entry $1/\sqrt{b_{ii}}$.

4 Numerical results

In this section we present numerical results for the thin-plate spline and multiquadric basic functions. In the following tables the matrix $A_\Phi$ is defined in (1.4), $B$ in (2.1), $S$ in Definition 3.7, and the homogeneous matrix, $C$, is presented in [1]. Throughout the tables, $x(k)$ denotes $x \times 10^k$.

In Table 1 we show 2-norm condition numbers of matrices for the various preconditioning techniques over seven different scales. It is clear that the algorithm in Section 2 gives a matrix which dramatically improves the conditioning of the interpolation problem, in one case by about fourteen orders of magnitude!

Tables 2 and 3 contain condition numbers of the matrices resulting from applying the preconditioning techniques of this paper for the thin-plate spline and multiquadric basic functions. For $N < 3200$, the entries in the tables are the maximum over one hundred random point sets of size $N$. For $N = 3200$, the tables contain the maximum over twenty random point sets of size 3200. In all cases the preconditioning results in a smaller condition number. For these basic functions the maximum observed condition number of the scaled preconditioned matrix, $S$, grows very slowly with $N$. Certainly there is no numerical evidence of power growth with $N$.
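The qualitative behaviour reported in the tables below is easy to reproduce in outline. The following Python sketch uses a dense orthonormal $Q$ (any matrix with $Q^T P = 0$ and full rank, as allowed by the general framework at the start of Section 2) rather than the sparse boundary over distance construction, so it reproduces the conditioning comparison but not the $O(N \log N)$ cost; all names are illustrative.

```python
import numpy as np

def tps(r):
    """Thin-plate spline basic function, with the convention phi(0) = 0."""
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out

rng = np.random.default_rng(0)
X = rng.random((100, 2))                                  # 100 random centres in [0,1]^2
A = tps(np.linalg.norm(X[:, None] - X[None, :], axis=-1))
P = np.hstack([np.ones((100, 1)), X])                     # basis of pi_1: 1, x, y
M = np.block([[A, P], [P.T, np.zeros((3, 3))]])           # the matrix (1.4)

Qfull, _ = np.linalg.qr(P, mode='complete')
Q = Qfull[:, 3:]                                          # orthonormal, Q^T P = 0
B = Q.T @ A @ Q                                           # preconditioned matrix (2.1)
d = 1.0 / np.sqrt(np.abs(np.diag(B)))
S = d[:, None] * B * d[None, :]                           # scaling of Definition 3.7

print(np.linalg.cond(M), np.linalg.cond(B), np.linalg.cond(S))
```

Running this for a few random seeds shows condition numbers for $B$ and $S$ several orders of magnitude below that of $M$, in line with the tables.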
TAB. 1. Condition numbers for one hundred points in $[0,1]^2$ and the thin-plate spline. The point set for scale $a$ is $X_a = aX_1$.

    Scale parameter a | Conventional matrix A_Phi | Homogeneous matrix C | Preconditioned matrix B | Scaled matrix S
    0.001             | 1.531(11)                 | 1.534(5)             | 4.905(1)                | 2.405(1)
    0.01              | 1.544(9)                  | 1.534(5)             | 4.905(1)                | 2.405(1)
    0.1               | 1.597(7)                  | 1.534(5)             | 4.905(1)                | 2.405(1)
    1                 | 3.107(5)                  | 1.534(5)             | 4.905(1)                | 2.405(1)
    10                | 1.915(6)                  | 1.534(5)             | 4.905(1)                | 2.405(1)
    100               | 1.271(11)                 | 1.534(5)             | 4.905(1)                | 2.405(1)
    1000              | 4.006(15)                 | 1.534(5)             | 4.905(1)                | 2.405(1)

In an attempt to rule out the possibility that our numerical results were flukes due to the small number of 100 experiments, we also conducted 50,000 trials with random data sets of size 100. The results of these trials are shown in Figure 1. The maximum condition number over all trials with the thin-plate spline was 1.2465(9) for the matrix $A_\Phi$, 1.5750(9) for the matrix $C$, and 1.8066(2) for the matrix $S$.

In our experiments the matrix $S$ is always well conditioned. This held even for geometries of centres for which the matrix $A_\Phi$ is very badly conditioned. To test further the behaviour of $S$ for "bad" configurations of points, a similar experiment was run with one thousand trials of one hundred points almost on a circle. The maximum condition numbers of the $A_\Phi$, $C$ and $S$ matrices over the 1000 trials were 1.2885(9), 7.2692(8) and 6.6005(2) respectively. Even though the Voronoi regions are long and thin, the matrix $S$ is still well conditioned!

TAB. 2. Maximum condition numbers encountered over a sample of 100 random point sets of size $N$ in $[0,1]^2$ with the thin-plate spline.

    Number of data points | Conventional matrix A_Phi | Homogeneous matrix C | Preconditioned matrix B | Scaled matrix S
    200                   | 6.555(7)                  | 3.068(7)             | 1.617(3)                | 6.028(1)
    400                   | 5.675(8)                  | 3.397(8)             | 1.945(3)                | 8.946(1)
    800                   | 1.960(10)                 | 1.348(10)            | 2.034(3)                | 9.775(1)
    1600                  | 1.092(10)                 | 8.413(9)             | 8.099(3)                | 1.258(2)
    3200                  | 4.997(10)                 | 3.783(10)            | 1.261(4)                | 1.569(2)

TAB. 3. Maximum condition numbers encountered over a sample of 100 random point sets of size $N$ in $[0,1]^2$ with the multiquadric function, parameter $c = 1/\sqrt{N}$.

    Number of data points | Conventional matrix A_Phi | Preconditioned matrix B | Scaled matrix S
    200                   | 2.014(8)                  | 1.532(2)                | 4.224(1)
    400                   | 2.045(10)                 | 5.932(2)                | 7.669(1)
    800                   | 6.641(10)                 | 4.559(2)                | 5.826(1)
    1600                  | 1.554(10)                 | 7.025(2)                | 5.601(1)
    3200                  | 2.477(11)                 | 9.362(2)                | 6.280(1)

Bibliography

1. R. K. Beatson, W. A. Light and S. Billings, Fast solution of the radial basis function interpolation equations: domain decomposition methods, SIAM Journal on Scientific Computing 22 (2000), 1717–1740.
2. N. H. Christ, R. Friedberg and T. D. Lee, Weights of links and plaquettes in a random lattice, Nuclear Physics B 210 (1982), 337–346.
3. N. Dyn, D. Levin and S. Rippa, Numerical procedures for surface fitting of scattered data by radial functions, SIAM Journal on Scientific and Statistical Computing 7 (1986), 639–659.
4. F. J. Narcowich and J. D. Ward, Norm estimates for the inverse of a general class of scattered-data radial-function interpolation matrices, Journal of Approximation Theory 69 (1992), 84–109.
5. R. Sibson and G. Stone, Computation of thin-plate splines, SIAM Journal on Scientific and Statistical Computing 12 (1991), 1304–1313.
6. R. S. Varga, Matrix Iterative Analysis, Prentice-Hall, New Jersey, 1962.
7. C. T. Mouat and R. K. Beatson, Some properties of the boundary over distance preconditioner for radial basis function interpolation, Research report UCDMS 2001/6, Department of Mathematics and Statistics, University of Canterbury, 2001.
What are 'good' points for local interpolation by radial basis functions?

Robert P. Tong
The Numerical Algorithms Group Ltd, Jordan Hill, Oxford, OX2 8DR, UK. robert.tong@nag.co.uk

Andrew Crampton
School of Computing and Mathematics, University of Huddersfield, Huddersfield, UK. a.crampton@hud.ac.uk

Anne E. Trefethen
The Numerical Algorithms Group Ltd, Jordan Hill, Oxford, OX2 8DR, UK. anne.trefethen@nag.co.uk

Abstract
Radial basis function interpolation has an advantage over other methods in that the interpolation matrix is nonsingular under very weak conditions on the location of the interpolation points. However, we show that point location can have a significant effect on the performance of an approximation in certain cases. Specifically, we consider multiquadric and thin plate spline interpolation to small data sets where derivative estimates are required. Approximations of this type are important in the motion of unsteady interfaces in fluid dynamics. For data points in the plane, it is shown that interpolation to data on a circle can be related to the polynomial case. For scattered data on the sphere, a comparison is made with the results of Sloan and Womersley.

1 Introduction

Radial basis functions (RBFs) such as multiquadrics or thin plate splines have been successfully used for scattered data approximation in many applications. They have been shown to perform well for data fitting, although problems of ill-conditioning and the computational cost of processing large data sets must be handled carefully. In general, when considering the accuracy of an RBF interpolant, a balance must be achieved between the reduction in fill distance necessary for convergence of the approximation to an assumed underlying function and the need to maximise the separation distance between data points to avoid problems of ill-conditioning [4].

In the present study, we focus on the use of RBF approximation as one stage of a larger algorithm to compute the evolution of an unsteady interface in fluid dynamics. The accuracy of the approximations made in the algorithm and the interaction between its different stages determine whether the output is close to the true solution of the governing equations or whether spurious effects are produced. In the three-dimensional setting, a typical example is described by Zinchenko et al. [8] where the deformation of liquid drops in a viscous medium is studied. A critical feature of the algorithm is the approximation of the normal directions and curvatures of the droplet surface defined at a number of discrete points.

The focus here is algorithmic rather than theoretical, and we investigate the performance of multiquadric and thin plate spline local interpolants applied to the determination of normal directions and curvatures of a smooth, closed surface. Certain configurations of data points, such as points located on a circle, impose constraints on the interpolant. A framework for understanding the behaviour of the RBF interpolants is provided by a comparison with the multivariate polynomial interpolant of de Boor and Ron [1] and by considering the free parameter in the multiquadric as a tensioning parameter [2].

2 Approximation method

A common approach to solving fluid dynamics problems that include moving interfaces combines a computational grid with meshless approximation methods.
The governing partial differential equations, or corresponding integral equation formulation, are solved on the grid, while quantities characterising the interface are computed as meshless scattered data approximations. Here we examine the behaviour of local RBF approximations in the general context described by Zinchenko et al. [8]. For a given data set, a particular point is selected together with its nearest neighbours, giving a set of typically 6 or 7 points. The initial locations of these points may be determined by a regular mesh, but the surface is allowed to deform so that the approximation is essentially to a small set of scattered data.

The constructed RBF interpolant, $S$, can be expressed as
$$S(x) = \sum_{j=1}^{N} a_j \phi(\|x - x_j\|) + \sum_{i=1}^{K} b_i p_i(x),$$
with the constraint
$$\sum_{j=1}^{N} a_j p_i(x_j) = 0, \quad \text{for } 1 \le i \le K,$$
where $x \in \mathbb{R}^2$ and $\{p_i(x)\}_{i=1:K}$ is a basis for the space of bivariate polynomials of degree $\le m - 1$ with $K = m(m+1)/2$. The chosen forms for $\phi$ are the thin plate spline
$$\phi(\|x - x_j\|) = \|x - x_j\|^2 \log \|x - x_j\|, \qquad \text{(TPS)}$$
and the multiquadric
$$\phi(\|x - x_j\|) = (\|x - x_j\|^2 + c^2)^{1/2}, \qquad \text{(MQ)}$$
with $\|\cdot\|$ taken to be the Euclidean norm.
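For concreteness, a minimal Python sketch of this interpolant (assuming numpy, with $m = 2$ so the polynomial part is linear; the random data mirror the example of Section 3) is:

```python
import numpy as np

def rbf_coeffs(X, f, phi):
    """Solve the constrained system for RBF coefficients a and polynomial coefficients b."""
    N = len(X)
    A = phi(np.linalg.norm(X[:, None] - X[None, :], axis=-1))
    P = np.hstack([np.ones((N, 1)), X])        # 1, x, y: degree <= m - 1 with m = 2
    K = P.shape[1]
    M = np.block([[A, P], [P.T, np.zeros((K, K))]])
    c = np.linalg.solve(M, np.concatenate([f, np.zeros(K)]))
    return c[:N], c[N:]

def mq(c):
    return lambda r: np.sqrt(r ** 2 + c ** 2)   # multiquadric (MQ)

def tps(r):                                      # thin plate spline (TPS)
    out = np.zeros_like(r)
    out[r > 0] = r[r > 0] ** 2 * np.log(r[r > 0])
    return out

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (6, 2))                   # 6 random points in the plane
f = rng.uniform(-1, 1, 6)                        # random data values
a, b = rbf_coeffs(X, f, mq(0.4))
```

Evaluating $S$ at a point is then the sum of $a_j \phi(\|x - x_j\|)$ plus $b \cdot (1, x, y)$; the same routine serves for both basic functions by swapping `phi`.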
A framework for interpreting the computed results in the context being considered can be derived from [2], where the arbitrary parameter, $c$, of the MQ function is viewed as a tensioning parameter. As $c \to \infty$ the MQ interpolant approaches the corresponding polynomial interpolant to the given data, while as $c \to 0$ the MQ surface is tensioned. Multivariate polynomial interpolation can fail on particular point sets and this has provided a motivation for using RBF methods. However, the algorithm of de Boor and Ron [1] provides a reliable means of computing the 'least' polynomial interpolant. This algorithm is used to compute a polynomial fit as one reference point for the interpretation of the MQ interpolants. A second reference point is provided by the TPS interpolant, which gives a minimum energy surface in a certain norm. This is shown to correspond closely to the MQ fit for a 'small', but nonzero, value of $c$. The MQ interpolant can thus be shown to connect the minimum energy, tensioned, TPS surface with the polynomial fit to given data as $c$ increases. In a fluid dynamics context a fluid-fluid interface is often assumed to be represented by a $C^\infty$ function (although cusps may occur, requiring a change in the representation). This would suggest that a high degree polynomial would be preferred to a TPS surface.

FIG. 1. Interpolants to random data at 6 points (+) in the plane: (left) polynomial (upper) and multiquadric (c = 10) (lower), contours [0:0.1:2]; (right) thin plate spline (upper) and multiquadric (c = 0.4) (lower), contours [0:0.1:1.1].

3 Scattered data in the plane

To illustrate the behaviour of local interpolation by MQ and TPS methods, random points in the $xy$-plane (with $-1 \le x_i, y_i \le 1$, for $i = 1:6$) are associated with random data values $f_i$ ($-1 \le f_i \le 1$). Figure 1 shows, in the upper frames, the two reference interpolating surfaces: (left) the polynomial surface computed by the algorithm of [1] and (right) the TPS surface.

FIG. 2. Effect of varying the parameter c on multiquadric interpolants to random data in the plane: (left) norms of the difference between multiquadric and polynomial interpolants (upper curve the uniform norm, lower curve the scaled 2-norm); (right) curvature (kappa = 2H) computed at the centroid for the multiquadric, thin plate spline and polynomial.

The lower frames of Figure 1 give the contours of the MQ interpolants for $c = 10.0$ (left) and $c = 0.4$ (right). There is a close correspondence between the upper and lower frames on each side, but a large difference between the polynomial and TPS surfaces. Figure 2 (left) shows the difference between the MQ surface and the polynomial reference interpolant computed on a regular grid on the interior of the circle with centre at the centroid of the data points (0.44, −0.09) and radius the maximum distance from the centroid to a data point. There is convergence of the MQ surface to the polynomial as $1/c \to 0$, but the condition number of the interpolation matrix increases until the calculation cannot be continued; for $c = 10.0$ it is already very large.

As an indication of the behaviour of first and second partial derivatives of the interpolating surfaces, we calculate the curvature at the centroid of the data points for the polynomial and TPS, together with the MQ as $c$ varies, using $\kappa = 2H$ where $H$ is the mean curvature. Figure 2 (right) shows that $\kappa_{MQ}$ for the MQ interpolant coincides with the value $\kappa_{TPS} = -0.46$ for the TPS when $c \approx 0.4$. When $c < 0.4$, $\kappa_{MQ} < \kappa_{TPS}$, while $\kappa_{MQ} \to \kappa_P = -9.78$, the polynomial curvature, as $c$ increases.

An interesting example is presented in [1] of polynomial interpolation for points located at the vertices of a regular hexagon
$$(x_i, y_i) = \left( \cos\frac{2\pi i}{6}, \; \sin\frac{2\pi i}{6} \right), \qquad i = 1, \dots, 6, \qquad (3.1)$$
with data values $f_i = (-1)^i$. This gives the interpolant
$$p(x, y) = x^3 - 3xy^2. \qquad (3.2)$$
Since the points lie on the unit circle, the quadratic polynomial
$$p_2(x, y) = 1 - x^2 - y^2 \qquad (3.3)$$
vanishes at the data points and this causes difficulties for general polynomial methods. MQ interpolants do not suffer from these difficulties. When $c = 10.0$, the MQ surface is very close to (3.2). As $c$ becomes smaller, the MQ surface approaches that of the TPS, with the data values becoming local maxima or minima as the surface is tensioned. In addition, the restriction of the data points to a circle implies that the interpolating polynomial is harmonic, but the convergence of the approximation is only first order [1]. The MQ surface for large $c$ inherits the properties of the polynomial fit. Thus, points on a circle are 'good' if the data being interpolated correspond to a harmonic function, but 'bad' if the data describe a function which has a maximum or minimum within the circle, or a singularity. These constraints on the interpolant are discussed further in Section 5.

4 Scattered data on the sphere

In this section we examine the accuracy obtained from three separate methods for interpolating scattered data on the unit sphere $S^2 \subset \mathbb{R}^3$. In particular we compare the results obtained using the MQ basis function in $\mathbb{R}^3$ with those obtained using the spherical harmonics of Sloan and Womersley¹ [6] and the $C^1$ Hermite interpolant of Renka [5]. For the multiquadric function, we list the uniform norm interpolation errors calculated using a range of values for the shape parameter $c$.

FIG. 3. Minimum energy points and spherical cap.

The point distribution used is the 256 'minimum energy' points of Fliege and Maier [3] and the uniform norm interpolation errors are calculated at points distributed on a spherical cap (see [5]). The following functions are used for the comparisons in Table 1, where the results presented in [7] are labelled 'W&S':
$$F_1 = e^{x+y+z}, \qquad F_2 = -5\sin(1 + 10z), \qquad F_3 = \|x\|_1/10, \qquad F_4 = \sin^2(1 + \|x\|_1)/10.$$

¹Uniform norm errors used for comparison are approximate only and were taken from graphical representations presented in Womersley and Sloan [7].
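The multiquadric comparisons below can be sketched in a few lines of Python. This is an illustration, not the authors' code: it assumes the MQ system is solved without a polynomial tail, which is permissible because the MQ interpolation matrix is nonsingular for distinct points; the point set, data values and cap of evaluation points are assumed given.

```python
import numpy as np

def mq_sphere_fit(pts, fvals, c):
    """Interpolate values at unit vectors pts (N x 3) with the MQ in R^3."""
    r = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    return np.linalg.solve(np.sqrt(r ** 2 + c ** 2), fvals)

def mq_sphere_eval(x, pts, a, c):
    """Evaluate the interpolant at evaluation points x (M x 3)."""
    r = np.linalg.norm(x[:, None] - pts[None, :], axis=-1)
    return np.sqrt(r ** 2 + c ** 2) @ a

# uniform-norm error over a cap of evaluation points (cap and F assumed given):
# a = mq_sphere_fit(pts, F(pts), c)
# err = np.max(np.abs(mq_sphere_eval(cap, pts, a, c) - F(cap)))
```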
We note from Table 1 that the multiquadric function provides consistently better interpolants to the four test functions compared with the spherical harmonics.

TAB. 1. Comparison of uniform norm errors.

    Method      | F1         | F2     | F3     | F4
    W&S         | 2.0000e-10 | 0.5000 | 0.1100 | 0.0500
    Renka       | 0.0013     | 0.1951 | 0.0054 | 0.0055
    MQ c = 0.01 | 6.0128e-04 | 0.3276 | 0.0051 | 0.0051
    MQ c = 1    | 4.5807e-10 | 0.0175 | 0.0076 | 0.0062
    MQ c = 2    | 2.2615e-13 | 0.0227 | 0.0079 | 0.0065

Here, the points have been chosen to minimise the interpolation errors for the harmonic functions, yet we see from results given in [7] that increasing the number of points in the distribution (which also increases the degree of the interpolating function) does not necessarily produce better accuracy. However, these point distributions, when used for the multiquadric function, provide consistently better accuracy. Further evidence suggests that points considered optimal for the spherical harmonics are also 'good' for the multiquadric function when compared to an equal number of generally scattered points. However, this is due to the uniformity of the point distributions, and similar results can be obtained on a refined icosahedral mesh.

The Renka algorithm produced similar results to those obtained using the multiquadric (for small $c$) for the F3 and F4 functions, although the results for the functions F1 and F2 were poor. Further comparisons with the Renka algorithm have been made using 12, 92 and 362 icosahedral points to interpolate the function $f(x, y, z) = \sin(x + y) + \sin(xz)$. The uniform norm interpolation errors have been calculated on the previously mentioned spherical cap.

TAB. 2. Multiquadric vs Renka for $f(x, y, z) = \sin(x + y) + \sin(xz)$.

    Method      | 12 pts | 92 pts     | 362 pts
    Renka       | 0.1730 | 0.0103     | 0.8230e-03
    MQ c = 0.01 | 0.2596 | 0.0170     | 0.0020
    MQ c = 1    | 0.0715 | 7.7662e-05 | 1.9678e-10
    MQ c = 2    | 0.0442 | 3.8206e-05 | 3.4113e-11

Again we see that the multiquadric function produces better accuracy than the Renka method when the number of interpolation points is increased.

5 Evolution of a smooth closed surface

In this section we return to the local interpolation scheme of §3 and apply it to scattered data on a smooth closed surface. This is the setting described in [8], where initially the interface is spherical with the point locations determined by subdivision of an icosahedral mesh. Each set of points consists of a central point together with its nearest neighbours, giving sets of 6 points associated with the 12 vertices of the icosahedron and sets of 7 points otherwise. The local method of Renka [5] is followed and, for a chosen point, a local coordinate system is defined with this point on the z-axis. The local point set
for MQ the error is ||n - nMglh = 3 x 10"-'''). However, taking MQ with c = 10 and a sphere of radius 9, if the central point is displaced from the origin to (0.01,0.01) the error in the normal is 3 x 10~^. To illustrate convergence for an irregular point set, the hexagon points are perturbed by the addition of a factor (i - l)eh[l, 1]^ for points i = 1,2..., 6 with h the radius of the circumcircle and taking e = 0.05. For MQ with c = 10, the error in the surface normal is 0{h^) whereas, for c = 0.4, the error is larger and the rate of convergence varies (see Table 3). h 1.0 0.5 0.1 0.05 TAB. lln-riA'/glb, c = 10 3.15 X 10-^ 3.85 X lO-'' 3.06 X 10-» 4.99 X 10-^ \\n-nMQ\ 6.14 X 1.26 X 6.35 X 8.47 X 2, c = 0.4 10-^ 10-3 10-'^ 10-^ 3. Error in MQ approximation to surface normal of sphere, irregular point set. Accurate curvature values are essential for an interface which is driven by surface tension. The exact value of K = -2/9 for a sphere of radius 9, together with the computed values, are shown in Table 5. The polynomial and MQ with c = 10 are close to the exact value. Method exact polynomial MQ c = 0.1 MQ c = 10.0 TAB. 4. Curvature, K K -0.222... -0.222912 -1.638002 -0.225387 = 2H, evaluated at the central point of a regular hexagon. It is found that, for the icosahedral mesh with N = 362, the local point sets are sufficiently regular to give good accuracy for surface normals and curvature using MQ interpolants when c is chosen to be 'large' in relation to the point spacing. This mesh also gives a corresponding accuracy for the discretised integral equation. These points can thus be considered 'good' for the MQ approximation. However, if the mesh is further refined Or the surface deforms during its evolution, then the approximation becomes Good points for RBF interpolation 'less good' as the regularity of the point locations is lost. Numerical experiments suggest second order convergence with point separation for irregular local point sets. 6 Conclusions The behaviour of MQ and TPS interpolants can be interpreted by reference to the corresponding 'least' polynomial interpolant, with the MQ connecting the polynomial C°° surface to the tensioned surface of the TPS as the parameter c decreases. The MQ interpolant with 'large' c (relative to the point separation) exhibits the properties of the polynomial case and is similarly affected by the location of data points. Thus, points on a circle in the plane can be 'good' if the function to be represented is harmonic, but in general give only first order convergence on the interior. For data on the sphere, 'good' points for polynomial interpolation are also good for the MQ with 'large' c, but other near equispaced point distributions appear to give similar accuracy with MQ. The tensioning effect of smaller values of c can improve the results if the underlying function is not C°°. When appHed to an evolving interface, starting from an initially spherical shape and a refined icosahedral point distribution, it is found that local MQ approximations to the surface derivatives are affected by the point locations. This can be understood by reference to the polynomial interpolant to data located on a circle and causes an irregularity in the convergence as N increases. Bibliography 1. C. de Boor and A. Ron, Computational aspects of polynomial interpolation in several variables, Math. Comp. 58 (1992), 705-727. 2. M. Eck, MQ-curves are curves in tension, in Mathematical Methods in Computer Aided Geometric Design II, T. Lyche and L. L. Schumaker (eds). 
Academic Press, 1992, 217–228.
3. J. Fliege and U. Maier, The distribution of points on the sphere and corresponding cubature formulae, IMA J. Numer. Anal. 19 (1999), 317–334.
4. A. Iske, Perfect centre placement for radial basis function methods, preprint (1999).
5. R. J. Renka, Interpolation of data on the surface of a sphere, ACM Trans. Math. Softw. 10 (1984), 417–436.
6. I. H. Sloan and R. S. Womersley, The search for good polynomial interpolation points on the sphere, in Numerical Analysis 1999, D. F. Griffiths and G. A. Watson (eds), Chapman and Hall, 2000, 211–229.
7. R. S. Womersley and I. H. Sloan, How good can polynomial interpolation on the sphere be?, preprint (1999).
8. A. Z. Zinchenko, M. A. Rother and R. H. Davis, A novel boundary-integral algorithm for viscous interaction of deformable drops, Phys. Fluids 9 (1997), 1493–1511.

Chapter 5

Regression

Generalised Gauss-Markov regression

Alistair B Forbes, Peter M Harris and Ian M Smith
National Physical Laboratory, Teddington, Middlesex, TW11 0LW, UK.
alistair.forbes@npl.co.uk, peter.harris@npl.co.uk, ian.smith@npl.co.uk

Abstract
Experimental data analysis is a key activity in metrology, the science of measurement. It involves developing a mathematical model of the physical system in terms of mathematical equations involving parameters that describe all the relevant aspects of the system. The model specifies how the system is expected to respond to input data and the nature of the uncertainties in the inputs. Given measurement data, estimates of the model parameters are determined by solving the mathematical equations constructed as part of the model, and this requires developing an algorithm (or estimator) to determine values for the parameters that best explain the data. In many cases, the parameter estimates are given by the solution of a least-squares problem. This paper discusses how various uncertainty structures associated with the measurement data can be taken into consideration and describes the algorithms used to solve the resulting regression problems. Two applications from NPL are described which require the solution of generalised distance regression problems: the use of measurements of primary standard natural gas mixtures to estimate the composition of a new natural gas mixture, and the analysis of calibration data to estimate the effective area of a pressure balance.

1 Introduction

Many metrology experiments involve determining the behaviour of a response variable $y$ as a function of a set of independent variables $x = (x_1, x_2, \dots, x_n)$. Model building involves establishing the functional relationship between these quantities, usually involving a set of model parameters $a$, i.e.,
$$y^* = \phi(x^*, a),$$
where $y^*$ and $x^*$ represent exact values of the variables. The terms $a$ parametrize the range of possible response behaviour, and the actual behaviour is specified by determining values for these parameters from measurement data. In practice, measurements are subject to error, and the error structure must be taken into account, firstly in order to determine effective methods for obtaining parameter estimates and secondly in determining the uncertainty in the fitted model parameters. For a set of measurement data $\{x_i, y_i\}_{i=1}^m$, the data analysis problem involves the accurate estimation of the parameters $a$, taking into account knowledge of the uncertainties in $\{x_i\}$ and/or $\{y_i\}$, and typically leads to a least-squares problem [4].
This paper describes the various uncertainty structures that arise and the corresponding regression problems for determining estimates of the model parameters. If the covariance information associated with the measurements is structured so that only the $i$th set of measurement errors are correlated with each other, a generalised distance regression approach is appropriate. However, some applications have quite general correlation structure, and a full Gauss-Markov estimation approach is required to make efficient use of the statistical model [7]. This leads to a generalised Gauss-Markov regression problem to take into account the errors in the variables and the general correlation structure. While the covariance structure may dictate which solution algorithms are to be employed, the information required of the model function $\phi$ is limited to the evaluation of the function and its derivatives with respect to $a$ and $x$. This means that solution algorithms can be based on a compact set of model-dependent modules and a generic set of harnessing routines that link the models to general purpose least-squares optimisation software.

The layout of the paper is as follows. In Section 2 we consider the various error structures and corresponding regression problems. Section 3 introduces two measurement problems encountered at NPL: the use of measurements of primary standard natural gas mixtures to estimate the composition of a new natural gas mixture, and the analysis of calibration data to estimate the effective area of a pressure balance. Although the functional models for these measurement systems are simple, taking the form of low-order polynomials, the statistical models need to account for (a) uncertainties in both the dependent and independent variables, and (b) possible correlations between measurements. These requirements lead us to solve generalised regression problems. An overview of solution algorithms for the various problems is given in Section 4. Concluding remarks are made in Section 5.

2 Error structures and regression problems

Within metrology, various error structures arise, all of which can be taken into account. We now consider the main types.

2.1 Error in one variable only

2.1.1 Ordinary (weighted) least squares

The simplest type of error structure occurs when only one of the system variables is subject to error and there is no correlation between errors. The model is summarised by
$$y_i^* = \phi(x_i^*, a), \qquad y_i = y_i^* + \epsilon_i, \qquad x_i = x_i^*,$$
where it is assumed that
$$E(\epsilon_i) = 0, \qquad \mathrm{var}(\epsilon_i) = \sigma_i^2, \qquad \mathrm{cov}(\epsilon_i, \epsilon_j) = 0, \quad i \ne j.$$
Good estimates of $a$ can be found by solving
$$\min_a \sum_{i=1}^{m} w_i^2 \left[ y_i - \phi(x_i, a) \right]^2, \qquad (2.1)$$
where $w_i = 1/\sigma_i$, $i = 1, \dots, m$.
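In practice (2.1) is a standard weighted non-linear least-squares problem. A minimal Python sketch using scipy (names illustrative; the model function `phi`, the data `x`, `y` and the standard deviations `sigma` are assumed given):

```python
from scipy.optimize import least_squares

def wls_fit(phi, x, y, sigma, a0):
    """Solve (2.1): minimise the sum of ((y_i - phi(x_i, a)) / sigma_i)^2 over a."""
    residuals = lambda a: (y - phi(x, a)) / sigma
    return least_squares(residuals, a0).x
```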
2.2.1 Orthogonal distance regression The simplest case arises when the covariance matrix associated with the ith set of measurements is a multiple of the identity matrix and there is no correlation between any of the errors, summarised by the model y* = <j){x*,ai), Vi = y* + u, Xj = X* + 5i, with Eim) = 0, var(77,) = pfl, (2.3) where rj^ = {euSj)'^. In this case, appropriate estimates of the parameters are determined by the solution of min y"vf{(Xi-x*)'^(xi-xn + (j/,:-<^(x*,a))2}, where Vi = l/pi, i = 1,. ■ ■ ,m. Note that this optimisation problem involves m sets of parameters xt as well as the parameters a specifying the model y = (^(x, a). 2.2.2 Generalised distance regression If we assume that the errors r}i are correlated with var(rjj) = Vi with Vi full rank, but that cov{rii,r]j) =0,1^ j, then the appropriate regression problem is mm K},a E j/i-0(xt,a) Vi X,; - X ■ 2/i-0(x*,a) X,: - X* (2.4) 2.2.3 Generalised Gauss-Markov regression The most complicated error structure arises when all variables are subject to measurement error and there is general correlation between the errors. If ^ (4*) is the vector of measurements {xj (variables {x^}), then the corresponding regression problem is mm y-</'(ea) T V -1 y-0(ea) (2.5) i Generalised Gauss-Markov regression where the ith element of <f>{^,a) is </>(x*,a). 3 Examples from metrology 3.1 Preparation of primary standard natural gas mixtures Within the Centre for Optical and Analytical Measurement at NPL, one part of the work of the Environmental Standards Group is to prepare primary standard natural gas mixtures. These are cylinders containing natural gas prepared gravimetrically to contain known compositions of each of the 11 constituent components (methane, ethane, propane, 1-butane, n-butane, 1-pentane, n-pentane, neo-pentane, hexane, nitrogen and carbon dioxide). Mixtures are prepared to cover various concentration ranges, e.g., methane: 64% — 98%. These primary standard mixtures are used as the basis for determining the composition of a new mixture and hence its calorific value. Given a number of primary standard natural gas mixtures containing known concentrations of one of the constituent components (e.g., CO2), the detector response for each mixture and the detector response for the new mixture, we wish to determine the concentration of CO2 in the new mixture. An approach to solving this problem is firstly to use the calibration data (relating to the primary gas mixtures) to calibrate the detector and, secondly, to use the calibration curve so constructed with the new measurement to predict the concentration in the new mixture. Errors to be accounted for are: • the calibration data is known inexactly. The process of preparing the primary standards means that they are known inexactly, and indeed the errors in the standards may be correlated (this is a consequence of the gravimetric process used to prepare the standard mixtures which involves comparing on a balance each standard mixture at each stage of preparation against calibrated masses selected from a common set of masses), • the data returned by the detector (which is based on the analytical technique of chromatography) is subject to measurement error. Consequently, we wish our data analysis to account for the inexactness of the measurement data and to quantify the resulting uncertainty associated with the final measurement result. 
Figure 1 shows a sample set of measurement data, with the ellipses around the calibration points illustrating the errors in the concentrations and detector responses. (The error ellipses have been magnified greatly for illustrative purposes.) The figure also shows a calibration curve which is used to estimate the concentration of the component for which the detector response (and its uncertainty) is known. 3.2 Calibration of pressure balances The principal role of the Pressure and Vacuum Section in the Centre for Mechanical and Acoustical Metrology at NPL is the development and maintenance of primary measurement standards for pressure and vacuum and their dissemination to industry. Pressure balances are pressure generators and consist essentially of finely-machined pistons moun- 273 Forbes et al. 274 »x10' FIG. 1. Sample data (+), fitted calibration curve and predicted measurement (o). ted vertically in close-fitting cylinders. The pressure required to support a piston and associated ring-weights depends on the mass of the piston and ring-weights and the cross-sectional area of the piston [5]. Due to various fluid dymamic effects, the effective area Aip, a) of the piston-cylinder assembly is a function of pressure, usually taken to be a hnear function A{p,a) = ai-V a2P- Many other factors such as temperature and air buoyancy have to be taken into account but for our purposes here, the pressure generated satisfies aip + a2p'^ = j/(m), where a are the instrument parameters and y{m) is a simple function of the applied load m. This equation determines p implicitly as a function of m and a. Suppose a reference pressure balance has been calibrated so that estimates of the instrument parameters a and their uncertainties are known. The reference balance can be used to calibrate a test balance in a cross-floating experiment in the following way. A load m; is applied to the reference balance to generate pressure pi — p{mi,a). A load Jii is applied to the test balance so that the pressures generated are matched. The test calibration curve is determined from a best fit to the data (rii,Pi) hp* + b2iPi)^ = yini), Pi=p*+ei, ni = n* + £i, where 6i and e^ represent measurement error associated with the pressures and masses, respectively. However, the following must be taken into account. Firstly, the pressures Pi all depend on the common estimates a of the instrument parameters of the reference balance, leading to correlation of the measurement errors S,. Secondly, the masses n, and rrii are made up from the same ensemble of masses /it = (/xi,..., (J-M)^ SO that n,; T rrii mjfi, i 275 Generalised Gauss-Markov regression where n, and nii are binary coefiScient vectors. This means that measurement errors associated with the masses Hk give rise to (further) correlation between 6i and e^. Taking this general correlation into account, estimates of the the instrument parameters b, are found from solving mm b,p' P-p* V- y p- (3.1) where the ith elements of (f) and y are bip* + &2(p*)^ and y{ni), respectively, and V is the appropriate covariance matrix determined from the dependence of y and 0 on a and /x. This is a generalised Gauss-Markov regression problem. 4 Algorithms for generalised regression Algorithms for ordinary least squares problems of the form miua Xli fii^) ^'^^ "'^^^^ known and include QR factorisation methods for linear models or the Gauss-Newton algorithm for non-linear models; see, e.g., [2, 6]. 
The latter algorithm requires the user to supply a software module to evaluate the vector of function values f (a) and the Jacobian matrix J of partial derivatives *Ji'. dai If fi{a) =yi — (^(xj, a) as considered above, the user has to supply a module to calculate 0(x, a) and d(j)/daj. If V is symmetric and strictly positive definite, the Gauss-Markov regression problem (2.2) can be formulated as an ordinary least squares problem. li V = LL'^ where L is lower-triangular, then the problem becomes min^?(a), Bi where f = L~^{. The associated Jacobian matrix is J = L~^J. If the matrix V is well-conditioned, matrix operations with V or L~^ should not lead to unnecessary loss of precision. However, explicit calculations involving V can be avoided by using the generalised QR factorisation [2, 8, 9], leading to solution algorithms with good numerical properties. The generalised distance regression problem (2.4) can be solved efficiently by making use of the fact that the parameters x* appear only in the ith summand. The associated Jacobian matrix has a block-angular structure that can be exploited effectively in the QR factorisation stage [2, 3]. Alternatively, a separation-of-variables approach can be adopted in which the parameters x* (a) are first determined as functions of a specified by the solution of the corresponding footpoint problem mm J/i-0(x,^a) X,: - x? T Vr Vi <?!'(x*,a) i-x* and the problem formulated as a non-linear least squares problem in a [1, 4]. Either approach yields an algorithm requiring 0(mv?) flops while a full matrix approach requires 0{m^) flops. 276 Forbes et al. The generalised Gauss Markov problem (2.5) can be solved as a Gauss-Markov problem problem in the variables {x*} and a, but ideally, we would like to develop algorithms that exploit problem structure as in generalised distance regression algorithms. In particular, while the covariance matrix V may well be full, in many situations it is constructed from smaller matrices and for which more efficient algorithms could be developed. From the user's point of view, all the regression algorithms discussed here require only the calculation of the model function (p and its derivatives ^ and ^. Thus, a wide range of regression problems can be solved using standard optimisation modules along with generic harness modules that perform the conversion without input from the user over and above the calculation of </> and its derivatives. For example, we have implemented a generalised Gauss-Markov solver to solve problems such as (3.1) for any explicit model y = (j){x,a). However, issues of efficiency and numerical stability need to be taken into account. As part of the UK Department of Trade and Industry's Software Support for Metrology programme, NPL is developing and making available to metrologists a suite of routines for the generalised regression problems discussed above. By combining structure exploiting linear algebra and numerically stable components such as the orthogonal factorisation, it is hoped that metrologists will be able to use these routines with the same confidence and effectiveness that they currently experience with standard, wellengineered regression modules available in numerical libraries. 5 Concluding remarks In metrology, we are interested in the determination of accurate estimates of the parameters that describe a physical process. It is imperative that knowledge of the measurement system should be used to describe the error structure as accurately as possible. 
We have described the five types of regression problems that can occur in metrology depending on the error structures that are assumed. In all cases it is important that we employ efficient, numerically stable algorithms and exploit any structure in both the Jacobian and covariance matrices. Acknowledgements. This work has been supported by the Department of Trade and Industry's National Measurement System Software Support for Metrology Programme and undertaken by a project team at the Centre for Mathematics and Scientific Software, National Physical Laboratory. The authors are particularly grateful to Paul Holland (Centre for Optical and Analytical Measurement) and the Pressure and Vacuum Section for their contributions. Bibliography 1. M. Bartholomew-Biggs, B. P. Butler, and A. B. Forbes, Optimisation algorithms for generalised distance regression in metrology, in Advanced Mathematical and Computational Tools in Metrology IV, P. Ciarlini, A. B. Forbes, F. Pavese and D. Richter (eds), 21-31, World Scientific, Singapore, 2000. 2. A. Bjorck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996. i Generalised Gauss-Markov regression 3. M. G. Cox, The least-squares solution of linear equations with block-angular observation matrix, in Advances in Reliable Numerical Computation, M. G. Cox and S. Hammerhng (eds), 227-240, Oxford University Press, 1989. 4. M. G. Cox, A. B. Forbes, and P. M. Harris, Software Support for Metrology Best Practice Guide 4: Modelling Discrete Data, National Physical Laboratory, Teddington, 2000. 5. A. B. Forbes, and P. M. Harris, Estimation algorithms in the calculation of the effective area of pressure balances, Metrologia, 36(6): 689-692, 1999. 6. G. H. Golub and C. F. Van Loan, Matrix Computations, John Hopkins University Press, Baltimore, third edition, 1996. 7. K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis, Academic Press, London, 1979. 8. C. C. Paige, Fast numerically stable computations for generalized least squares problems, 5L4M J. ATMmer. .4na/., 16:165-171, 1979. 9. SIAM, Philadelphia, The LAPACK Users' Guide, third edition, 1999. 277 Nonparametric regression subject to a given number of local extreme value Ali Majidi and Laurie Davies Department of Mathematics and Computer Science, University of Essen, Germany. {ali.majidi,laurie.davies}@stat-math. iini-essen.de Abstract We consider the problem of nonparametric regression. The aim is to get a smooth function which represents the dataset and has a reasonable number of extreme values. An iterative method, the QSOR method is introduced. Problems with the slow convergence of the method are reduced using multigrid techniques. 1 Introduction Given a dataset {y{ti),i = 1,... , n} which we denote by y, we look for a decomposition y{ti) = f{U) + r{ti), {ti = i/n, i = l,...,n) where / is a simple function and the {r{ti), (i = 1,... ,n)} are the resulting residuals which approximate white noise. We use two different concepts of simplicity. The first is the number of local extreme values. The second is the smoothness of the function as measured by the standard smoothness functional S{f) := f\f^'Ht)rdt, Jo where /^^^ is the second derivative of /. The number of local extremes is taken to have priority over smoothness. The number of local extremes and their locations are determined by the taut string method developed in [3]. This is described briefly in the next section. 
The residuals are required to look like white noise in the sense that the means over certain dyadic intervals are required to lie within given bounds [3]. The multiresolution coefficients for (n = 2") are defined by: Wij := 2'---'/^^J2k=j2^iir{tk),ii = 0,...,i^),(i = 0,..., 2^''"') - 1). The multiresolution condition now requires that -c„ < Wij < c„, where c„ represents some form of thresholding. The defatilt value of c„ which we use is c„ = cT„i/2.51og(n) where CT„ = 1.482 • median(|2/2 - J/il, • ■ ■ , IVn - 2/n-i|)/\/2Supported by SFB 475, University of Dortmund 278 i Smooth regression subject to extreme values 279 1. The top-left caption shows the original doppler function and the top-right caption shows the noisy version. The bottom-left caption shows the result of the taut string algorithm with the resulting residuals being shown in the bottom-right caption. FIG. 2 Taut string A short description of the taut string method is as follows. We write / = (/i,..., /„)-^ := (/(ii),... ,/(i„))-^ € R" and denote the cumulative sums of y and f hy Y and F respectively, Yi = X)}=i Vj, Fi = Yl]=i /j, (« = 0,..., n), with YQ = FQ = 0. We specify bounds defined by A = (Ai,..., A„)^ G R" and consider the tube {G:|y-G| <A}. (2.1) The taut string V{X) is now the function defined by a taut string attatched to the points {0,Yo) and (n,y„) and constrained to lie within the tube (2.1). It can be shown that the taut string minimizes the number of extreme values of the functions g whose cumulative sums G lie within the tube. The taut string is continuous and piecewise Unear. Its derivative v{X) is taken as a candidate regression functions. The vector A is determined in a data dependent manner by the requirement that the residuals associated with v(\) {r{\)i = Vi — v{X)i,i = l,...,n} satisfy the multiresolution condition. If such a condition fails on an interval then the A-values associated with that interval are reduced in size. An application of the taut string method to the doppler data of Donoho and Johnstone (see e.g. [4]) is shown in Figure 1. The function is defined by f{t) = 2lA/(i;(l - t))sin (27ri^^^ j . The derivative v(A) is piecewise constant as may be seen from Figure 1. The function v{X) determines the number of local extremes. We take the midpoints of the intervals associated with a local extremes as the locations of the local extremes for the smoothing algorithm. 280 A. Majidi and L. Davies 3 The smoothing problem We make the smoothing problem precise as follows. The number, locations and type of extreme values are taken from the taut string as explained in the last section. We further require the function / to lie in the tube determined by the taut string. This is to prevent the smoothing procedure from moving too far from the data. These restrictions may be described in the form Af>b (3.1) for an appropriate matrix A and vector b. This leads to the following problem: minimize ^"^i(/i+i - 2/i +/i_i)2 subject to (3.1), or equivalently minimize F^QsF subject to (3.1), for some quadratic form Q3. We denote this latter quadratic programming problem by QP3. Clearly the matrix associated with the quadratic form Yll^iifi+i ~ ^fi + fi-i)^ is singular. Nevertheless the solution of QP3 may be unique. We have the following theorem. Theorem 3.1 Let V{X) be the result of the taut string method. Assume that V{\) has one extreme value. We define the bounds L,U by L:=Y-X,U ■.= Y + X. Let Fi,F2 be two solutions of the corresponding quadratic program. Additionally let Fi touch three bounds alternately (i.e. 
$U_{i_1}, L_{i_2}, U_{i_3}$ or $L_{i_1}, U_{i_2}, L_{i_3}$, with $i_1 < i_2 < i_3$, are active). Then $F_1 = F_2$.

We call a problem with a unique solution a nondegenerate problem. From now on we assume that our problem is nondegenerate.

3.1 Quadratic programming

There are many algorithms which solve quadratic programming problems directly. Unfortunately most of them are expensive in terms of memory requirements and are not feasible for data sets of the order of, say, $n = 8192$. To overcome this we look for iterative methods which converge to the solution. Gradient projection methods (e.g. as defined in [8], [2] or [9]) are not appropriate for this purpose, as the monotonicity constraints make the projection into the feasible set too expensive. Instead we use a modified version of the QSOR (quasi successive over-relaxation) method developed by Metzner in [7]. QSOR is a very cheap iteration and converges to the solution of QP3. Unfortunately the convergence is very slow on sections where the solution is smooth. To overcome this we use multigrid methods, which have to be adapted to our requirements.

4 QSOR

The QSOR algorithm is an iterative method which produces a feasible sequence $\{F^k\}_{k=0}^\infty$ converging towards the solution of QP3. For simplicity, we describe the iteration only for a convexity interval. Let $F^0 \in \mathbb{R}^n$ be an arbitrary feasible vector; the obvious candidate is the derivative of the taut string. Let $Q = Q_3$ and $\omega \in (0, 2)$. The following defines a QSOR iteration.

• While convergence is not achieved:
  - For $i = 1$: $F_i = F_i - \frac{\omega}{Q_{ii}} (QF)_i$; $L_i = \max\{2F_{i+1} - F_{i+2}, L_i\}$; $U_i = U_i$; $F_i = \mathrm{med}\{L_i, U_i, F_i\}$.
  - For $i = 2$: $F_i = F_i - \frac{\omega}{Q_{ii}} (QF)_i$; $L_i = \max\{2F_{i+1} - F_{i+2}, L_i\}$; $U_i = \min\{(F_{i+1} + F_{i-1})/2, U_i\}$; $F_i = \mathrm{med}\{L_i, U_i, F_i\}$.
  - For $i$ in $3 : (n-2)$: $F_i = F_i - \frac{\omega}{Q_{ii}} (QF)_i$; $L_i = \max\{2F_{i+1} - F_{i+2}, \; 2F_{i-1} - F_{i-2}, \; L_i\}$; $U_i = \min\{(F_{i+1} + F_{i-1})/2, U_i\}$; $F_i = \mathrm{med}\{L_i, U_i, F_i\}$; if $i$ is active, mark $i$.
  - For $i = n-1$ and $i = n$ the updates mirror those for $i = 2$ and $i = 1$, using the backward bound $L_i = \max\{2F_{i-1} - F_{i-2}, L_i\}$.
  - Correct the active intervals: let $[F_{i_\ell}, F_{i_\ell + k}]$ be an active interval; set
    $$F_i = F_{i_\ell} + \frac{i - i_\ell}{k} \left( F_{i_\ell + k} - F_{i_\ell} \right), \qquad i_\ell \le i \le i_\ell + k.$$
    Denoting the $i$th unit vector in $\mathbb{R}^n$ by $e_i$, and with $a, b$ the minimisers of the quadratic form over the interval (see the proof of Theorem 4.1), set $F_i := F_i - (a t_i + b)$ on the interval, with $F_i$ unchanged for all $i$ in other intervals.

Theorem 4.1 (convergence) Let $(F^k)_{k=0}^\infty$ be the sequence in $\mathbb{R}^n$ produced by the QSOR algorithm and let the problem QP3 be nondegenerate. Then
• $(F^k)_{k=0}^\infty$ converges in $\mathbb{R}^n$;
• $F^* := \lim_{k \to \infty} F^k$ is the solution of QP3.
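The core of each sweep is an SOR update followed by the median projection onto the current bounds. The following Python sketch (a simplification, not the QSOR method itself: it enforces only fixed tube bounds $L \le F \le U$ and omits the extreme-value bound updates and the active-interval correction) shows that core:

```python
import numpy as np

def projected_sor_sweep(F, Q, L, U, w=1.5):
    """One Gauss-Seidel-style sweep for min F^T Q F subject to L <= F <= U.
    Assumes Q has positive diagonal entries; w in (0, 2)."""
    for i in range(len(F)):
        F[i] = F[i] - w * (Q[i] @ F) / Q[i, i]   # SOR step on coordinate i
        F[i] = np.median([L[i], U[i], F[i]])      # med{L_i, U_i, F_i}: clamp to bounds
    return F
```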
5 Multigrid QSOR Multigrid techniques are general techniques to speed up iterative methods which indeed have other good properties. The ideas are given for example in [1] or [5]. We will give here a short description of the multgrid idea in our case. First some notation. Given a grid G = Gf = {ti,...,tn) we define the coarse gridGc = (<i,*J2, • •■ ,i.„,^i,tn), Ji = l,im = n with ij £ {1,... ,n}. We define the projection down IcX - (Fi,Fi^,. F F 1^ and the projection up Px = y where yi = Fi {I € {ij\j = l,...,m}). and by hnear interpolation elsewhere, i.e.. (ij_i < / < ij). yi = ''ij *i We define now the multigrid QSOR with only two grids, i.e., of level two. The general case of level i/ G N is defined similarly. Let QSOR{G, A, b, fx, x) denote the result of the Smooth regression subject to extreme values BOOD OSOR Iteratfonsn QSOR a Multlgrld FIG. 3. The left figure shows the result of QSOR after 5000 iterations. The right figure shows the result of (1000) multigrid QSOR with one coarsing step (i.e. the right figure is "cheaper" than 2000 QSOR streps). QSOR method apphed to the problem on the grid G after fx iterations on the Grid G with starting vector x and constraints defined by A, b. Additionally let F'' be given. • Multigrid QSOR * while precision not achieved o F = QSOR{G,A,b,n,F'') o F = PQSOR{Gc,AcA,ti,IcF) o F^+^^QSOR{G,A,b,n,F) o k^k + l where A^ be are the corresponding constraints for the coarser grid. The question is now how to define the projection of the constraints. One can think of an example where the canonical projection of bounds like Gc can fail. This happens for example if strong constraints (e.g. tight bounds) are not on the coarse grid. To overcome this problem one has to think of a method of defining the problem QP3 on the coarser grid in a reasonable way. One way to handle this problem is to define Li. := ra&yi{Lk\ij-i < k < ij+i} and "min" for the upper bounds. Special cases have to be treated but we do not go into details here. A coarser grid means that the QSOR step on this grid converges much faster. On the other hand the approximation of the solution gets worse by coarsening the grid. In our case (see Figure 4) we have n = 2048. The coarsest grid was made by taking every eighth gridpoint. We iterated until there was no recognizable improvement. 6 Proofs Proof of Theorem 3.1: We set D = F2 — Fi. One simply verifies that D has to be a fine, i.e., there are a, 6 £ R such that Di = ati + b. Touching three bounds alternately means that D changes its sign at least two times which leads to D = 0. □ 283 A. Majidi and L. Davies 284 MuHlgrtdOSOR FIG. 4. Multigrid QSOR applied to the doppler data with n = 2048. The figure took less than 6 seconds comparing to three hours without multigrid on the same computer. Proof of Theorem 4.1: We set Q = Qs. We have to show: 1) (S'3(F'=))fceNo decreases; 2) (F'^)fegN is feasible; 3) If F^ is a stationary point of QSOR, then F^ minimizes S3 in the feasible set. • our feasible set is compact, so the sequence has a convergent subsequence, • a hmit of a subsequence of (F'')^i is a stationary point of QSOR, • the problem has only one solution. To the first point, we only remark that a, 6 as defined in the algorithm, are the minimizers of the term: f i/+fc i/+fe N [z-ixY^tiei + y'^tiei]] QIZ The others are treated as in [6]. The second point is clear, because by the definition we start with a feasible vector and we retain the feasibility in every single step. It remains to show the third point. 
6 Proofs

Proof of Theorem 3.1: We set $D = F_2 - F_1$. One simply verifies that $D$ has to be affine, i.e., there are $a, b \in \mathbb{R}$ such that $D_i = a t_i + b$. Touching three bounds alternately means that $D$ changes its sign at least two times, which leads to $D = 0$. □

Proof of Theorem 4.1: We set $Q = Q_3$. We have to show: 1) $(S_3(F^k))_{k \in \mathbb{N}_0}$ decreases; 2) $(F^k)_{k \in \mathbb{N}}$ is feasible; 3) if $F^*$ is a stationary point of QSOR, then $F^*$ minimizes $S_3$ on the feasible set. These facts suffice, because
• our feasible set is compact, so the sequence has a convergent subsequence,
• a limit of a subsequence of $(F^k)_{k \ge 1}$ is a stationary point of QSOR,
• the problem has only one solution.

For the first point, we only remark that $a, b$, as defined in the algorithm, are the minimizers of the term
$$\Big(z - a\sum_{i=\nu}^{\nu+k} t_i e_i - b\sum_{i=\nu}^{\nu+k} e_i\Big)^T Q\, \Big(z - a\sum_{i=\nu}^{\nu+k} t_i e_i - b\sum_{i=\nu}^{\nu+k} e_i\Big);$$
the other steps are treated as in [6]. The second point is clear, because by definition we start with a feasible vector and we retain feasibility in every single step. It remains to show the third point.

Let $F^*$ be a stationary point of the algorithm. It is sufficient to show that $\langle QF^*, Z - F^* \rangle \ge 0$ for all feasible vectors $Z$ (see [6]), where $\langle \cdot,\cdot \rangle$ denotes the standard inner product in $\mathbb{R}^n$. To show this we first note that $Q = D^T Q_2 D$, where $D$ is the bidiagonal difference matrix
$$D = \begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix}$$
and $Q_2$ is the matrix according to QP3, i.e., to the direct problem. So we can deduce that
$$\langle QF^*, Z - F^* \rangle = (Z - F^*)^T Q F^* = (Z - F^*)^T D^T Q_2 D F^* = \langle Q_2 \hat F^*, \hat z - \hat F^* \rangle \qquad (\hat F^* := DF^*,\ \hat z := DZ).$$
Now we only have to look at the "active points", because $(Q_2 \hat F^*)_i$ is zero everywhere else. Let $Z$ be an arbitrary feasible vector and $j$ an index with $\hat F^*_j = L_j$ and $(Q_2 \hat F^*)_j \ne 0$; the stationarity of $F^*$ then forces $(Q_2 \hat F^*)_j > 0$, since otherwise the QSOR step would move $\hat F^*_j$ away from the bound. With the feasibility of $Z$ it follows that $(Q_2 \hat F^*)_j (\hat z_j - \hat F^*_j) = (Q_2 \hat F^*)_j (\hat z_j - L_j) \ge 0$. With the same argument we can derive $(Q_2 \hat F^*)_j (\hat z_j - \hat F^*_j) \ge 0$ if $\hat F^*$ touches the upper bound. It remains to show the inequality for the linearity intervals. Let $[t_\nu, t_{\nu+k}]$ be a linearity interval of $F^*$. Then obviously $[t_{\nu+1}, t_{\nu+k}]$ is a constancy interval for $\hat F^*$. Furthermore it follows from the stationarity of $F^*$ that $a, b$ as defined in the algorithm are zero. This is equivalent to
$$\sum_{i=\nu}^{\nu+k} (QF^*)_i = 0, \qquad \sum_{i=\nu}^{\nu+k} t_i (QF^*)_i = 0 \qquad (t_i = i/n),$$
which implies, using $\langle QF^*, X \rangle = \langle Q_2 \hat F^*, \hat x \rangle$ for arbitrary $X \in \mathbb{R}^n$ with $\hat x = DX$, that
$$\sum_{i=\nu}^{\nu+k} (Q_2 \hat F^*)_i = 0, \qquad \sum_{i=\nu+1}^{\nu+k} (Q_2 \hat F^*)_i = 0.$$
This case was proved by Lowendick [6]. □

Bibliography

1. William L. Briggs. A Multigrid Tutorial. SIAM, New York, 1994.
2. Paul H. Calamai and Jorge J. Moré. Projected gradient methods for linearly constrained problems. Mathematical Programming, 39:93-116, 1987.
3. P. L. Davies and A. Kovac. Modality, runs, strings and multiresolution. To appear in Annals of Statistics, 2001.
4. D. L. Donoho and I. M. Johnstone. Ideal spatial adaption by wavelet shrinkage. Biometrika, 81:425-455, 1994.
5. W. Hackbusch. Multi-Grid Methods and their Applications. Springer, Berlin, 1985.
6. M. Lowendick. On Smoothing under Bounds and Geometric Constraints. Dissertation, Universität Essen, 2000.
7. L. Metzner. Facettierte Nichtparametrische Regression. Dissertation, Universität Essen, 1997.
8. Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer, Berlin, 1999.
9. Gerardo Toraldo and Jorge J. Moré. On the solution of large quadratic programming problems with bound constraints. SIAM J. Optimization, 1:93-113, 1991.

Model fitting using the least volume criterion

Chris Tofallis
University of Hertfordshire Business School, Dept. of Statistics, Economics, Accounting and Management Systems, Mangrove Rd, Hertford, SG13 8QF, UK. c.tofallis@herts.ac.uk

Abstract

Given data on multiple variables, we present a method for fitting a function to the data which, unlike conventional regression, treats all the variables on the same basis, i.e. there is no distinction between dependent and independent variables. Moreover, all variables are permitted to have error, and we do not assume any information is available regarding the errors. The aim is to generate law-like relationships between variables where the data represent quantities arising in the natural and social sciences. Such relationships are referred to as structural or functional models. The method requires that a (monotonic) relationship exists; thus, in the two-variable case we do not allow cases where there is zero correlation. Our fitting criterion is simply the sum of the products of the deviations in each dimension, and so corresponds to a volume, or more generally a hyper-volume.
One important advantage of this criterion is that the fitted models will always be units (i.e. scale) invariant. We formulate the estimation problem as a fractional programming problem. We demonstrate the method with a numerical example in which we try to uncover the coefficients from a known data-generating model. The data used suffer from multicollinearity, and there is preliminary evidence that the least volume method is much more stable against this problem than least squares.

1 On the undeserved ubiquity of least squares regression

In fitting a function to data, conventional regression requires one variable to be 'special': this is the dependent variable. In the sciences, however, one often wishes to rearrange the model equation by changing the subject of the formula. Statisticians tell us that in that case we should carry out a second regression. Yet scientists are uncomfortable with having separate models for each variable, which are not equivalent to each other and yet are meant to represent the same relationship. Calibration is another case where one would like mutual equivalence: e.g. in psychology one can have two tests that are intended to measure the same ability; a formula or conversion table is required to relate the score on one test to that on the other.

Another case where regression is inappropriate is where one wants to deduce a parameter such as the rate of change (slope). If both variables are subject to error then ordinary least squares will under-estimate the slope, and regressing x on y will over-estimate it. A simple example involves plotting galaxy speed (or redshift) against distance from the observer. The slope of the fitted line gives what is called the Hubble constant, whose value crucially determines the future of the universe: will it continue expanding or will it eventually begin to collapse in on itself? Conventional regression gives different values for the Hubble constant depending on which variable is treated as being dependent, but there is no apparent reason for choosing one variable as against the other.

An oft-cited reason for using least squares fitting is that under certain assumptions on the errors, it will provide the best linear unbiased estimate ('BLUE') of the slope. This is the Gauss-Markov theorem, where 'best' is taken to mean minimum variance. What is not widely appreciated is that 'linear' here refers not to the form of the fitted model, but rather to the requirement that the expression for the estimated coefficient be linear in y. One can find estimators with even lower variance by removing this non-essential condition; e.g. other Lp-norm estimators are not linear in y.

In multiple regression it is widely, and mistakenly, believed that the fitted coefficients tell us the contribution that a particular variable makes to the dependent variable. In fact, not even the sign of the coefficient can be relied upon to tell us the direction of the relationship, i.e. a particular x-variable may be positively correlated with the y-variable, and yet have a negative coefficient in the regression model. This is the problem of multicollinearity: if there are near-linear relations among the explanatory variables, then the coefficients produced by regression will not only be highly uncertain (large standard error) but also not be open to sensible interpretation.

We shall present a technique for model-fitting which treats all variables on the same basis.
The method has the important property of being units-invariant; this is an advantage not shared by the total least squares approach (also known as orthogonal regression), and arises from the fact that we use the product of the deviations in each direction rather than the sum (or sum of squares) when calculating the fitting criterion.

2 The least areas criterion

Consider a set of data points in two dimensions as in Figure 1. By drawing the vertical and horizontal deviations from the line we create a right-angled triangle for each data point. Our fitting criterion is simply to minimise the sum of these areas. A key advantage of this approach is that changing the units of measurement will not affect the resulting line. In other words it is a scale-invariant method. Furthermore, we can add a constant to either variable and the geometry is such that the line merely gets shifted vertically or horizontally. Combining the scale and translation invariance implies that the least areas line is invariant to linear transformations of the data. It is also apparent that switching the axes has no effect: the variables are treated symmetrically. (A textbook discussion of this method appears in Draper and Smith [5].)

We note that it is essential that there be a non-zero correlation in the data, otherwise the method fails. For those seeking to quantify relationships between data variables in the experimental sciences this would hardly seem to be a restrictive requirement. However, for those working in the area of design who are concerned with geometrical shapes, it does rule out fitting to data scattered around a vertical or horizontal line, or circle, or rectangle with sides parallel to the co-ordinate axes, etc. We shall not discuss fitting curves in this paper, but we note that this method is not suitable for fitting a relationship that is not monotone over the range of the data, i.e. there cannot be maxima or minima over the data range, otherwise the area deviation associated with a given point may not be uniquely specified. Such problems may be avoided by breaking up the data set into subsets at the optima and fitting a monotone function to each subset, thus producing a piecewise monotone function.

FIG. 1. Sum of areas to be minimised in least area calculation.

The least areas method has an interesting history; it has surfaced under different guises in diverse research literatures throughout the twentieth century. In astronomy it is known as Stromberg's impartial line. In biology it is the line of organic correlation. In economics it is the method of minimised areas, or diagonal regression. In statistics it is sometimes referred to as the 'standard or reduced major axis'. This derives from the fact that if the data are standardised by dividing by their standard deviation, then the fitted line corresponds to the major (i.e. principal) axis of the ellipse of constant probability for the bivariate normal distribution. Yet another name for this technique is the geometric mean functional relationship. This is because the slope has a magnitude equal to the geometric mean of the two slopes arising from ordinary least squares (OLS) (proved in Barker, Soh and Evans [2], and Teissier [20]); i.e. if we regress y on x and get a slope $b_1$, and then regress x on y (so as to minimise the sum of squared deviations in the x-direction) and obtain a regression line $y = a + b_2 x$, then the geometric mean slope is $b = (b_1 b_2)^{1/2}$.
It is interesting to note that the two OLS slopes are connected via the correlation between the variables: $b_1 = r^2 b_2$. This implies that as the correlation falls, the disagreement between the two OLS slopes increases; for example, even with a correlation as high as 0.71 one of these slopes will be twice as large as the other! It also follows that the magnitude of the slope of the least areas line lies between those of the two OLS lines. This is intuitively satisfying in a technique that aims to treat x and y deviations symmetrically. Specifically, for the case of positive but imperfect correlation, we have $b_2 > b > b_1$ because $b/r > b > rb$.

From the geometric mean property and the expressions for the OLS slopes, one can deduce that the magnitude of the slope of the least areas line takes a particularly simple closed form: it is the standard deviation of y divided by the standard deviation of x. The sign of the slope is provided by the sign of the correlation between y and x.

Numerical experiments have been carried out to compare this fitting technique against five others (Babu and Feigelson [1]). A specified underlying model was used to generate data (mostly bivariate normal samples) and the aim was to see which method could best recover the slope of the model. The simulations involved varying the sample size, correlation and variances. Orthogonal regression gave the poorest accuracy. Two methods came out with the highest accuracy: the least areas method and the least squares bisector. The latter bisects the smaller angle formed between the two OLS lines. Unfortunately the OLS bisector is not units-invariant and so does not suit our purposes (Ricker [17]).

Turning now to applications, the method seems to have appeared most often in the field of biometrics (the application of statistics to biological data). For example, in relating the size of one body part to another (or to the total weight or height) in humans and other animals, one may collect data from an individual at successive stages in their growth, or from many individuals at different points in their development. It is not generally possible to distinguish between dependent and independent variables in such a context. Isometric growth is the special case where the two body parts grow such that their size ratio remains constant. Miller and Kahn [13] argue in favour of our method thus: 'there is usually no clear justification for saying, e.g. that increase in skull length is dependent upon increase of body length; it is more realistic to consider changes in skull length and body length as due to a set of common factors'.

Ricker [16] discusses the value of the method in fishery research. Applications include modelling relationships between weight and length, between weight and fecundity (the number of eggs), and estimating the 'catchability' of fish (the fraction of the stock taken by one unit of fishing effort). Rayner [15] gives an application to the flight speed of birds as related to the windspeed. We have already noted the scope for application in astronomy. Babu and Feigelson [1] point out that 'differences in regression methods on similar data may be responsible for a portion of the long-standing controversy over the value of Hubble's constant, which quantifies the recession rate of the galaxies'. The earliest appearance of our method in the astronomical literature seems to be that of Stromberg [19].
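Given the closed form just described, the least areas line takes only a few lines of code; a minimal sketch in Python (the function name is ours):

```python
import numpy as np

def least_areas_line(x, y):
    """Least areas (reduced major axis) line y = a + b*x:
    |b| = std(y)/std(x), with the sign of b taken from the correlation."""
    r = np.corrcoef(x, y)[0, 1]            # the method requires r != 0
    b = np.sign(r) * np.std(y) / np.std(x)
    a = np.mean(y) - b * np.mean(x)        # the line passes through the centroid
    return a, b
```

Rearranging the fitted equation to express x in terms of y gives exactly the line obtained by calling the routine with the arguments interchanged, which is the symmetry the method is designed to provide.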
The method has also been proposed in the context of educational and psychological testing; a very early reference is that of Otis [14], who named it the 'relation line'. If two tests are meant to measure the same aptitude or attainment, one may need to match pairs of equivalent scores on the pair of tests for creating a conversion table. The direction of the conversion should obviously not affect which values are paired off, hence the need for a symmetric approach. Greenall [7] proposes the 'equivalence line' for this purpose:
$$\frac{y - \mu_y}{\sigma_y} = \frac{x - \mu_x}{\sigma_x}.$$
This turns out to be yet another name for our least areas line. For standardised scores the line equation reduces to y = x. He also proves a very interesting uniqueness result: 'When we seek a relation that will deem a pair of scores mutually equivalent if and only if the proportion of x-scores less than X equals the proportion of y-scores less than Y, we aim at pairing off scores that give rise to equal percentile ranks. In the case of continuous bivariate distributions which satisfy a simple condition [$F(x, y) = F(y/c, cx)$], only the equivalence relation will provide this relation'. The normal distribution is one case which satisfies this condition. A relevant theoretical result due to Kruskal [12] is that if the two variables are normally distributed and a line is needed to predict x from y as often as y from x, then the least areas line maximises the probability of correct prediction (i.e. the probability of being within z standard deviations, for any given z-value). This provides another justification for the use of this line.

Hirsch and Gilroy [9] show how it can be useful in hydrology and geomorphology, where one may be interested in relationships between, e.g., stream slope versus elevation, or stream length versus basin area, etc. 'In such cases there is no clear direction of causality but there is clearly an inter-relation of variables'. 'A major motivation for the use of the line lies in the equivalence of the cumulative function of $y$ and $y_{\mathrm{est}}$'.

In general terms, when should the least areas method be used? Rayner [15] cites the result of Kendall and Stuart [10] that if no error information is available, then this method gives the least-bias or maximum likelihood estimate of the functional relation. Rayner goes on to demonstrate that this line also has the property of being independent of the correlation between the errors of the two variables. Ricker [17] deals with the question of usage by first distinguishing between random measurement error and mutual natural variability (as arises, for example, in biology). In the former case, for each observation there is an associated true point which would arise if the errors in both variables were zero. If one can estimate the variances of the errors by replicating the measurements, then measurement error models can be used to estimate the line. One monograph on such models is Cheng and Van Ness [4]. If one cannot estimate the error variances (or their ratio, λ), then Ricker recommends the use of the least areas line as being the best approximation: it gives y and x equal weight and will be exact if λ = var(y)/var(x), i.e. when the ratio of error variances equals the ratio of data variances. For the case of mutual natural variability 'there is no basis for assigning separate vertical and horizontal components to the deviation', i.e. 'it is impossible to say whether it is y or x that is responsible for the deviations from the line'.
In this case Ricker concludes that if the data are binormally distributed then the least areas line should be used to describe the central trend, and least squares to estimate one variable from the other. For the mixed case, i.e. having both measurement error and natural variability, 'the best that can be done is to treat them in terms of whichever source of variation makes the larger contribution to the total. In biological work this will usually be natural variability'.

Despite appearing in so many other fields, it is remarkable that this technique does not seem to have appeared in the numerical analysis/approximation literature. For example, it is not listed in Grosse's Algorithms for Approximation catalogue [8]. The present paper looks at an obvious way of extending the approach to any number of variables by minimising volumes.

3 Least volume fitting

We now intend to fit a linear function of the form $\sum_{j=1}^p a_j x_j = c$ to data $\{x_{ij}\}$ in p dimensions; in other words, we have data on p variables and we seek a linear relationship between them. Of course this is not uniquely specified, because we can divide through by any non-zero constant. Thus we are free to impose a constraint on the coefficients, such as c = 1. Note that we shall not permit any of the coefficients $a_j$ to be zero, because that would imply the associated variable $x_j$ is unrelated to the other variables.

One obvious way of generalising the least areas procedure to higher dimensions is to minimise the volumes (or hypervolumes). Each data point will have associated with it a 'volume deviation', which is simply the product of its deviations from the fitted plane in each dimension. We must take care to make all these non-negative by taking absolute values. For the ith data point this volume deviation $V_i$ is proportional to
$$\frac{\big|\sum_j a_j x_{ij} - 1\big|^p}{\prod_j a_j}.$$
We now introduce non-negative variables $u_i, v_i$ to deal with the absolute value of the numerator. The positive $u_i$ represent points on one side of the fitted plane, and positive $v_i$ refer to points on the opposite side. Setting c = 1 allows us to model the bracketed term thus:
$$u_i - v_i = \sum_j a_j x_{ij} - 1.$$
At least one of each of the pairs $\{u_i, v_i\}$ will be forced to be zero by their presence in the objective function which is being minimised. Consequently the numerator can be represented as $\sum_i (u_i^p + v_i^p)$. We shall suppose the denominator is positive; if it is not, we can always make it so by multiplying one of the x-variables by −1 so that its coefficient, and hence the product of coefficients, also changes sign. We can now formulate our problem as the following fractional programme:
$$\text{Minimise } \sum_i (u_i^p + v_i^p) \Big/ \prod_j a_j$$
$$\text{such that } u_i - v_i = \sum_j a_j x_{ij} - 1 \text{ and } u_i, v_i \ge 0.$$
The field of fractional programming is comprehensively covered by Stancu-Minasian [18]. We note that Draper and Yang [6] used a different route to generalising the technique to more than two dimensions: they minimised the pth root of the squared volumes and showed that the estimated coefficients were a convex combination of those from the p OLS estimates.
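As a quick illustration of the criterion (though not of the fractional programming formulation itself), the ratio can be handed directly to a general-purpose optimiser; a hedged sketch in Python, where y is treated as the pth variable so that the fitted plane is $\sum_j a_j z_j = 1$, and the derivative-free start-and-minimise strategy is our choice, not the paper's:

```python
import numpy as np
from scipy.optimize import minimize

def least_volume_fit(X, y):
    """Minimise sum_i |a.z_i - 1|^p / prod_j a_j over the coefficients a,
    where z_i = (x_i1, ..., x_i,p-1, y_i).  Sketch only: a derivative-free
    optimiser from an OLS-type start, not the fractional programme."""
    Z = np.column_stack([X, y])
    m, p = Z.shape

    def volume(a):
        return np.sum(np.abs(Z @ a - 1.0) ** p) / np.abs(np.prod(a))

    a0 = np.linalg.lstsq(Z, np.ones(m), rcond=None)[0]   # starting point
    a = minimize(volume, a0, method="Nelder-Mead").x
    # the model y = c0 + c1*x1 + ... is recovered by solving a.z = 1 for y
    return a
```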
4 Numerical test

We shall now apply the least volume criterion to try to uncover the coefficients from data that have been generated from a known underlying model with some randomness thrown in. In order to make this a difficult test we shall choose data which suffer from multicollinearity. This means that there is a near-linear dependence within the data, i.e., one of the variables almost lies in the space spanned by the remaining variables, and so we are close to being rank-deficient. The data are taken from Belsley's [3] comprehensive monograph on collinearity. The generating model is
$$y = 1.2 - 0.4x_1 + 0.6x_2 + 0.9x_3 + \epsilon$$
with ε normally distributed with zero mean and variance 0.01. The absolute correlations between the variables ranged from 0.35 to 0.61, and so these in themselves would not have alerted the researcher to any difficulty associated with multicollinearity. Two very similar data sets (A, B) are tabulated in Belsley based on this model. For set A, ordinary least squares (OLS) gives:
$$y = 1.26 + 0.97x_1 + 9.0x_2 - 38.4x_3.$$
The fit as measured by $R^2$ is very good at 0.992, but the underlying model is far from being uncovered. In particular, the coefficient of $x_2$ is 15 times too high and two of the coefficients have the wrong sign! Getting the signs wrong is very serious if one is trying to understand how variables are related to each other. Turning to the least volume approach we find:
$$y = 1.20 - 0.43x_1 + 0.37x_2 + 1.97x_3.$$
We now have all the correct signs and the magnitudes are much closer to the true ones. Repeating this for data set B:

OLS: $y = 1.275 + 0.25x_1 + 4.5x_2 - 17.6x_3$
Least volume: $y = 1.20 - 0.43x_1 + 0.37x_2 + 1.98x_3$.

Once again the least volume approach produces a superior model. Moreover, it is also worth noting that the two OLS models are very different from each other, whereas the least volume models seem to be more stable to small variations in the data. This is noteworthy because of how similar the two data sets were: the y-values were identical for sets A and B, and the x-values never varied by more than one in the third digit. Thus our method seems to be much more stable than OLS. Of course, a comprehensive set of Monte Carlo simulations is required to fully explore this aspect.

5 Conclusion

We have presented a fitting method whose criterion combines the deviations in each dimension by multiplying them together. This simple device means that re-scaling any of the variables, e.g. by changing the units of measurement, will give rise to an equivalent model. This property of units-invariance is not shared by the total least squares approach (or orthogonal regression, where the sum of the perpendicular distances to the fitted plane is minimised). By taking the product of the deviations we ensure that all variables are treated on the same basis, and this is useful if the purpose is to find an underlying relationship rather than to predict one of the variables. When we applied the technique to data we were able to recover the underlying relationship much more closely than when least squares was used. Not only were the signs of the coefficients correctly reproduced (which is crucial for understanding directions of change), but the magnitudes were also much closer to the true values than the least squares estimates. It appears that the least volume method may be superior when there is multicollinearity in the data. Much more simulation needs to be done to investigate this potentially very valuable feature.
Bibliography

1. G. J. Babu and E. D. Feigelson, Analytical and Monte Carlo comparisons of six different linear least squares fits, Communications in Statistics: Simulation and Computation 21 (2) (1992), 533-549.
2. F. Barker, Y. C. Soh, and R. J. Evans, Properties of the geometric mean functional relationship, Biometrics 44 (1988), 279-281.
3. D. A. Belsley, Conditioning Diagnostics, Wiley, New York, 1991.
4. C.-L. Cheng and J. W. Van Ness, Statistical Regression with Measurement Error, Arnold, London, 1999.
5. N. R. Draper and H. Smith, Applied Regression Analysis (3rd edition), Wiley, New York, 1998.
6. N. R. Draper and Y. Yang, Generalization of the geometric mean functional relationship, Computational Statistics and Data Analysis 23 (1997), 355-372.
7. P. D. Greenall, The concept of equivalent scores in similar tests, British J. of Psychology: Statistical Section 2 (1949), 30-40.
8. E. Grosse, A catalogue of algorithms for approximation, in Algorithms for Approximation II, eds. J. C. Mason and M. G. Cox, Chapman and Hall, London, 1990.
9. R. M. Hirsch and E. J. Gilroy, Methods of fitting a straight line to data: examples in water resources, Water Resources Bulletin 20 (5) (1984), 705-711.
10. M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, 4th edition, vol. 2, 391-409, Griffin, London, 1979.
11. D. K. Kimura, Symmetry and scale dependence in functional relationship regression, Systematic Biology 41 (2) (1992), 233-241.
12. W. H. Kruskal, On the uniqueness of the line of organic correlation, Biometrics 9 (1953), 47-58.
13. R. L. Miller and J. S. Kahn, Statistical Analysis in the Geological Sciences, Wiley, New York, 1962.
14. A. S. Otis, The method for finding the correspondence between scores in two tests, J. of Educational Psychology XIII (1922), 524-545.
15. J. M. V. Rayner, Linear relations in biomechanics: the statistics of scaling functions, J. Zool., Lond. (A) 206 (1985), 415-439.
16. W. E. Ricker, Linear regressions in fishery research, J. Fisheries Research Board of Canada 30 (1973), 409-434.
17. W. E. Ricker, Computation and uses of central trend lines, Canadian J. of Zoology 62 (1984), 1897-1905.
18. I. M. Stancu-Minasian, Fractional Programming: Theory, Methods and Applications, Kluwer Academic, Dordrecht, 1997.
19. G. Stromberg, Accidental and systematic errors in spectroscopic absolute magnitudes for dwarf G0-K2 stars, Astrophysical J. 92 (1940), 156-169.
20. G. Teissier, La relation d'allométrie, Biometrics 4 (1) (1948), 14-48.

Some problems in orthogonal distance and nonorthogonal distance regression

G. A. Watson
Department of Mathematics, University of Dundee, Dundee DD1 4HN, Scotland. gawatson@maths.dundee.ac.uk

Abstract

Of interest here is the problem of fitting a curve or surface to given data by minimizing some norm of the distances from the points to the surface. These distances may be measured orthogonally to the surface, giving orthogonal distance regression, and for this problem the least squares norm has attracted most attention. Here we will look at two other important criteria, the $l_1$ norm and the Chebyshev norm. The former is of value when the data contain wild points, the latter in the context of accept/reject criteria. There are, however, circumstances when it is not appropriate to force the distances to be orthogonal, and two possibilities of this kind are also considered. The first arises when the distances are aligned with certain fixed directions, and the second when angular information is available about the measured data points. For the least squares norm, we will consider some algorithmic developments for these problems.

1 Introduction

Of interest here is the problem of fitting to given data a curve or surface which depends on a vector $a \in \mathbb{R}^n$ of parameters.
The underlying approach is such that (1) a point on the surface is associated with each data point, (2) the fit of the surface is measured by a norm of the vector whose components are the distances between each pair of corresponding points, and (3) the (correct) Gauss-Newton steps in $a$ are used as a basis for minimizing this norm. The distances may be orthogonal to the surface, giving orthogonal distance regression (ODR), or may be forced to satisfy some other criterion which makes them non-orthogonal in general. We consider both situations. For the ODR problem, most attention has been given to the least squares norm (e.g. [5], [8], [9], [16], [17], [22]). Here we will look at two other important criteria, the $l_1$ norm and the Chebyshev norm. The former is of value when the data contain wild points, the latter in the context of accept/reject criteria. For the non-orthogonal distance problem we will restrict attention to the least squares case. In terms of a vector $a \in \mathbb{R}^n$ of parameters, the curve or surface may be defined in two ways: (a) parametrically, when a point $x$ on the surface is given by $x = x(a, t)$, with $t$ the parameters whose values define the particular point, or (b) implicitly, when the surface is defined by the set of points $x$ satisfying the scalar equation $f(a, x) = 0$. It is also assumed here that the expressions required in these representations are differentiable functions of their parameters.

2 $l_1$ and $l_\infty$ ODR

Consider first the $l_1$ case. Then the problem is
$$\text{minimize } \sum_{i=1}^m \|x_i - z_i(a)\|,$$
where the points $z_i(a)$ are the nearest points to $x_i$ on the surface defined by $a$, and where we will assume throughout that unadorned norms are Euclidean norms. Let
$$\delta_i = \|x_i - z_i(a)\|, \qquad i = 1, \dots, m.$$
Then the problem is effectively now defined in terms of the vector $a$ alone. It is easy to calculate the correct Gauss-Newton step in $a$, which minimizes $\|\delta + \nabla_a\delta\, d\|_1$ with respect to $d$. Now
$$\nabla_a \delta_i = -\frac{(x_i - z_i(a))^T}{\delta_i}\,\nabla_a z_i(a), \qquad \delta_i \ne 0,$$
so that there are potential problems if any $\delta_i \to 0$. Given the nature of the $l_1$ problem, we cannot exclude that possibility. In fact, although $\delta$ is not a smooth function, because derivative discontinuities only occur at zero values it is a strong semi-smooth function, as defined in [12]. Ideas from smooth analysis and from strong semi-smooth analysis as developed in [11] can then be combined to give a local convergence analysis for the present problem. Fast local convergence for the usual smooth problem relies on strong uniqueness [4]; for the $l_1$ norm, this can be interpreted in terms of a requirement that the sequence of solutions $d^k$ is "well-behaved" in a certain sense [1]. An analogous requirement can be stated here. Let the current approximation be $a^k$ and let $J^k$ denote the Jacobian matrix $\nabla_a\delta(a^k)$, assuming this exists. Then the Gauss-Newton step $d^k$ minimizes $\|\delta(a^k) + J^k d\|_1$. It is well known (see for example [18]) that if $J^k$ has full rank then there always exists a solution $d^k$ and an index set $Z^k$ containing $n$ indices such that
$$\delta_i(a^k) + e_i^T J^k d^k = 0, \qquad i \in Z^k,$$
where $e_i$ is the $i$th coordinate vector. Let $a^*$ be a limit point of the iteration. Then for $a^k$ close enough to $a^*$, assume that $J^k$ exists and
(i) $\delta(a^k) + J^k d^k$ has exactly $n$ zeros, corresponding to an index set $Z^k$,
(ii) $Z^k = Z^*$, independent of $k$,
(iii) the $n \times n$ matrices whose rows are $e_i^T J^k$, $i \in Z^*$, are bounded away from singularity.
In practice these conditions ensure that $d^k$ is unique, and there is no redundancy in the zero components.
An analysis is given in [21] for both parametric and implicit fitting. The main result is the following.

Theorem 2.1 [21] Let the Gauss-Newton method produce a sequence $a^k \to a^*$, where $\delta(a^k)$ has no zero components, and let (i)-(iii) above hold. In the parametric case, assume that for all $i \in Z^*$ there exists a unique unit normal vector $n_i$ (up to change of sign) at the point $z_i$ on the surface defined by $a^*$. Then the (undamped) Gauss-Newton method converges to $a^*$ at a second order rate.

The significance of this result is that, for both parametric and implicit fitting, any $\delta_i$ tending to zero is not by itself necessarily an obstacle to good performance of the Gauss-Newton method in the $l_1$ case. What is more significant is the possibility of very slow convergence, and this has more to do with the number of zero components of $\delta$ at a limit point than with just their presence. A fundamental requirement for condition (ii) is that the number of zero components of $\delta(a^*)$ is $n$. Of course, this condition is a rather special one and, for many problems, will not be satisfied. There is slow (possibly very slow) convergence associated with this case.

Turning now to the $l_\infty$ problem, this can be stated
$$\text{minimize } \max_i \|x_i - z_i(a)\|,$$
with $z_i(a)$ defined as before. Again $\delta_i = \|x_i - z_i(a)\|$ is not a smooth function, but a solution normally occurs in a region where $\delta$ is smooth. Therefore the problem does not differ significantly from the usual nonlinear minimax problem: the main requirement for fast local convergence is that at a limit point the norm is attained at $n + 1$ indices [4]. Two simple examples in 2 dimensions are given by way of illustration. A standard line search is incorporated to force global convergence, although trust region methods are a popular alternative. Indeed, local convergence is the main concern here, and we have not begun to address important issues to do with the development of robust general purpose algorithms.

Example 2.2 Consider the Spath data set [13] (m = 7), and consider fitting an ellipse defined implicitly, using the $l_\infty$ and $l_1$ norms. The solutions are illustrated in Figure 1, where the dashed ellipse and dashed lines are the $l_\infty$ solution and corresponding orthogonal directions, and the solid ellipse and solid lines are the $l_1$ solution and corresponding directions. Both ellipses were obtained using the Gauss-Newton method starting from the circle with centre (5,5) and radius 2, in 4 and 5 iterations respectively for 5 figure accuracy.

Example 2.3 Consider next the GGS data set [6], which has m = 8. Similar fits to those for Example 2.2 are shown in Figure 2. Again the Gauss-Newton method was used starting from the circle with centre (5,5) and radius 2, to give convergence in 6 iterations ($l_\infty$) and 7 iterations ($l_1$).

For both these examples n = 5, and favourable conditions hold so that there is quadratic convergence in both the $l_1$ and $l_\infty$ cases.

FIG. 1. $l_1$ and $l_\infty$ fits to Spath data set.
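For completeness we note that the $l_1$ Gauss-Newton subproblem $\min_d \|\delta(a^k) + J^k d\|_1$ is a small linear programme; a sketch of one standard LP formulation in Python follows, which is illustrative only and not necessarily the solver used in [21]:

```python
import numpy as np
from scipy.optimize import linprog

def l1_gauss_newton_step(delta, J):
    """Solve min_d ||delta + J d||_1 as the LP
       min sum(t)  s.t.  -t <= delta + J d <= t,
    in the variables (d, t)."""
    m, n = J.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])   # minimise sum of t
    A = np.block([[J, -np.eye(m)],                  #  J d - t <= -delta
                  [-J, -np.eye(m)]])                # -J d - t <=  delta
    b = np.concatenate([-delta, delta])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    return res.x[:n]                                # the Gauss-Newton step d
```

The optimal basis of this LP also exposes the index set $Z^k$ of zero residuals discussed above.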
= 0, i e I*\jExample 2.4 Fitting an l^o ODR hne in R^ to 100 random data points (equivalent to finding the circumscribing cylinder of smallest radius) gives slow convergence of the basic method, because |/*| = 3 and n = 4. But once we identify I* = {4,42,58}, only 5 iterations of the NAG Fortran subroutine E04UCF are required for 6 figure accuracy. 3 3.1 Non-orthogonal I2 distance regression Using fixed directions Suppose that the data come from sampling the surface of a manufactured part, using a coordinate measuring machine with a touch probe. It has been argued by Hulting [10] that choosing the directions to be the known probe directions Vj (relative to a fixed frame of reference) not only makes explicit use of the measurement design, but G. A. Watson 298 FIG. 2. h and ^oo fits to GGS data set. also complies with traditional fixed-regressor assumptions (enabling standard inference theory to apply). Let Xj, i = l,...,m as usual be the data points, and let Zj be the corresponding points on the surface reached by travelling along the lines from x,; in the direction v,. Then we require to minimize \\5\\ where 5i = ||xi -Zi(a)||, i = 1,.. . ,m, with Zj(a) defined by Zi(a) - x,-= (5iVi, i = 1,... ,m, where Vj satisfying vf v^ = 1 is given for each i. In case of ambiguity, the smallest value of 5i is chosen. The basic idea in efficient algorithmic development is again to treat the problem as one in a alone, which can be solved as before by the Gauss-Newton method (or variants). Let a be given. Then for each point Xj, the point where the line through Xj in the direction Vj first cuts the surface can be obtained (this calculation replaces the "footpoint problem" of calculating Zj (a) as the point on the surface in the orthogonal distance problem), giving 5,; as a function of a. Methods based on Gauss-Newton steps are developed for the parametric case in [19], [20], and for the implicit case in [7]. By way of illustration, the 2 data sets previously considered in Examples 1 and 2 are used to fit ellipses defined implicitly with a particular choice of directions Vj. The initial (circles) and final ellipses (together with the data points and the directions v,;) are shown in Figures 3 and 4. The calculations needed respectively 19 and 17 iterations, reflecting the fact that, unlike the h and /QO cases, the convergence rate is linear. Some problems in distance regression 10 3.2 12 299 14 FIG. 3. ^2 fit to Spath data set: fixed Vj. FIG. 4. h fit to GGS data set: fixed Vj. Using angular information Berman and Griffiths [2, 3] consider fitting a circle when angular differences between successively measured data points are known, with apphcations in physics and archaeology. This fitting problem has been extended to the case of ellipses and ellipsoids by Spath in [14, 15] and it is this kind of problem which is of interest here. The methods of [14] and [15] are based on the alternating algorithm, and while this can be perhaps surprisingly effective (particularly with a reparameterization of the problem), we consider here a correct separated Gauss-Newton method similar to that used before. In addition to (usually) better local convergence properties, standard step-length control can be incorporated. G. A. Watson 300 To illustrate, consider fitting an ellipse in general position. It is convenient to do this by allowing the data to rotate, and fitting to those an ellipse in normal position, aligned with the axes. Let [x,y) denote the components of x. 
3.2 Using angular information

Berman and Griffiths [2, 3] consider fitting a circle when angular differences between successively measured data points are known, with applications in physics and archaeology. This fitting problem has been extended to the case of ellipses and ellipsoids by Spath in [14, 15], and it is this kind of problem which is of interest here. The methods of [14] and [15] are based on the alternating algorithm, and while this can be perhaps surprisingly effective (particularly with a reparameterization of the problem), we consider here a correct separated Gauss-Newton method similar to that used before. In addition to (usually) better local convergence properties, standard step-length control can be incorporated.

To illustrate, consider fitting an ellipse in general position. It is convenient to do this by allowing the data to rotate, and fitting to the rotated data an ellipse in normal position, aligned with the axes. Let $(x, y)$ denote the components of $x$. Then we work with the data
$$x_i(\phi) = x_i\cos\phi + y_i\sin\phi, \qquad y_i(\phi) = -x_i\sin\phi + y_i\cos\phi, \qquad i = 1, \dots, m,$$
where $\phi$ is an unknown parameter. Therefore we require to minimize, with respect to the 6 parameters $a, b, p, q, \alpha, \phi$, the function
$$\sum_{i=1}^m \Big[\big(x_i(\phi) - a - p\cos(\alpha + t_i)\big)^2 + \big(y_i(\phi) - b - q\sin(\alpha + t_i)\big)^2\Big],$$
where the numbers $t_i$ are given. Because $(\alpha + t_{i+1}) - (\alpha + t_i) = t_{i+1} - t_i$ for each $i$, we can interpret this as saying that the angular differences are known, with a degree of freedom given by the parameter $\alpha$. Note that at a solution to this problem, the directions between pairs of points $(x_i(\phi), y_i(\phi))$ and the corresponding points on the ellipse will not generally be orthogonal to the ellipse. Differentiating the above expression with respect to $a, p, b, q$ gives
$$A_1\begin{pmatrix} a \\ p \end{pmatrix} = c_1, \qquad (3.1)$$
where
$$A_1 = \begin{pmatrix} m & \sum_{i=1}^m \cos(\alpha + t_i) \\ \sum_{i=1}^m \cos(\alpha + t_i) & \sum_{i=1}^m \cos^2(\alpha + t_i) \end{pmatrix}, \qquad c_1 = \begin{pmatrix} \sum_{i=1}^m x_i(\phi) \\ \sum_{i=1}^m x_i(\phi)\cos(\alpha + t_i) \end{pmatrix},$$
and
$$A_2\begin{pmatrix} b \\ q \end{pmatrix} = c_2, \qquad (3.2)$$
where
$$A_2 = \begin{pmatrix} m & \sum_{i=1}^m \sin(\alpha + t_i) \\ \sum_{i=1}^m \sin(\alpha + t_i) & \sum_{i=1}^m \sin^2(\alpha + t_i) \end{pmatrix}, \qquad c_2 = \begin{pmatrix} \sum_{i=1}^m y_i(\phi) \\ \sum_{i=1}^m y_i(\phi)\sin(\alpha + t_i) \end{pmatrix}.$$
Then (3.1) and (3.2) give $(a, b, p, q)$ as functions of $\alpha$ and $\phi$, provided that $A_1$ and $A_2$ are nonsingular: this will be assumed. For given $\alpha$ and $\phi$, we can therefore define the function to be minimized as
$$F(\alpha, \phi) = \sum_{i=1}^m \delta_i(\alpha, \phi)^2, \qquad \delta_i(\alpha, \phi) = \|w_i\|, \quad i = 1, \dots, m, \qquad (3.3)$$
with
$$w_i = \big(x_i(\phi) - a - p\cos(\alpha + t_i),\ y_i(\phi) - b - q\sin(\alpha + t_i)\big)^T,$$
and with $a, b, p, q$ defined by (3.1) and (3.2). Then we can apply the Gauss-Newton method to the minimization of $F(\alpha, \phi)$. The basic step $d = (\delta\alpha, \delta\phi)^T$ is given by finding
$$\min_d \|\delta + Jd\|, \qquad (3.4)$$
where $J \in \mathbb{R}^{m\times 2}$ has $i$th row given by $e_i^T J = \nabla_{\alpha,\phi}\,\delta_i(\alpha, \phi)$, $i = 1, \dots, m$. Now
$$\nabla_{\alpha,\phi}\,\delta_i(\alpha, \phi) = \frac{w_i^T}{\delta_i}\big(\nabla_{\alpha,\phi}\, w_i + (\nabla_{a,p,b,q}\, w_i)\,M\big), \qquad \delta_i \ne 0, \qquad (3.5)$$
where $M \in \mathbb{R}^{4\times 2}$ is the Jacobian of $(a, p, b, q)$ with respect to $(\alpha, \phi)$. It is easy to compute $M$ from (3.1) and (3.2), which can be interpreted as identities in $\alpha$ and $\phi$. The details are omitted, but all the linear systems use just the matrices $A_1$ and $A_2$; apart from the solution of (3.4) (a least squares problem in two variables), there remains only the evaluation of expressions.

Example 3.1 Consider Example 1 from [14], which has m = 11. Starting from $\alpha = 0$, $\phi = 0$, 15 iterations are required to satisfy the stopping criterion $\|d\|_\infty < 0.001$. The resulting value of $\|\delta\|^2$ is 7.7211, with a = 2.1253, b = −0.1700, p = 4.1281, q = 3.0931, α = 13.2348°, φ = 34.7309°.

Example 3.2 Next consider Example 2 from [14], which has m = 8. Again starting from $\alpha = 0$, $\phi = 0$, 9 iterations are required to satisfy the stopping criterion $\|d\|_\infty < 0.001$. The resulting value of $\|\delta\|^2$ is 4.4946, with a = 4.3608, b = 1.9537, p = 5.3717, q = 3.3704, α = −0.6215°, φ = 26.3889°.
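The separated structure above is straightforward to prototype: for given $(\alpha, \phi)$, the linear parameters are obtained from the two $2\times 2$ systems (3.1) and (3.2), after which the residuals $\delta_i(\alpha, \phi)$ can be handed to any nonlinear least squares routine. A sketch in Python, using numerical derivatives in place of the analytic Jacobian (3.5):

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, x, y, t):
    """delta_i(alpha, phi) with (a, p, b, q) eliminated via (3.1)-(3.2)."""
    alpha, phi = params
    xr = x * np.cos(phi) + y * np.sin(phi)      # rotated data x_i(phi)
    yr = -x * np.sin(phi) + y * np.cos(phi)     # rotated data y_i(phi)
    c, s = np.cos(alpha + t), np.sin(alpha + t)
    a, p = np.linalg.solve([[len(t), c.sum()], [c.sum(), (c * c).sum()]],
                           [xr.sum(), (xr * c).sum()])    # system (3.1)
    b, q = np.linalg.solve([[len(t), s.sum()], [s.sum(), (s * s).sum()]],
                           [yr.sum(), (yr * s).sum()])    # system (3.2)
    return np.hypot(xr - a - p * c, yr - b - q * s)       # ||w_i||

# e.g.: fit = least_squares(residuals, x0=[0.0, 0.0], args=(x, y, t))
```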
4 Conclusions

We have examined some aspects of fitting curves and surfaces to given data. The underlying criterion involves associating with each data point a point on the surface and minimizing some norm of the vector whose components are the distances between pairs of points. The distances can be orthogonal to the surface, or fixed in some other way. But the problems have in common that methods based on separated Gauss-Newton steps can readily be developed.

Bibliography

1. Anderson, D. H. and M. R. Osborne, Discrete, nonlinear approximation problems in polyhedral norms, Numer. Math. 28, 143-156 (1977).
2. Berman, M., Estimating the parameters of a circle when angular differences are known, Appl. Statist. 32, 1-6 (1983).
3. Berman, M. and D. Griffiths, Incorporating angular information into models for stone circle data, Appl. Statist. 34, 237-245 (1985).
4. Cromme, L., Strong uniqueness: a far reaching criterion for the convergence of iterative processes, Numer. Math. 29, 179-193 (1978).
5. Forbes, A. B., Least squares best fit geometric elements, in Algorithms for Approximation II, eds. J. C. Mason and M. G. Cox, Chapman and Hall, London, 311-319 (1990).
6. Gander, W., G. H. Golub and R. Strebel, Fitting of circles and ellipses: least squares solution, BIT 34, 556-577 (1994).
7. Gulliksson, M., I. Soderkvist and G. A. Watson, Implicit surface fitting using directional constraints, BIT 41, 331-344 (2001).
8. Helfrich, H.-P. and D. Zwick, A trust region method for implicit orthogonal distance regression, Numer. Alg. 5, 535-545 (1993).
9. Helfrich, H.-P. and D. Zwick, A trust region algorithm for parametric curve and surface fitting, J. Comp. Appl. Math. 73, 119-134 (1996).
10. Hulting, F. L., Discussion contribution to the paper by M. M. Dowling, P. M. Griffin, K.-L. Tsui and C. Zhou, Statistical issues in geometric feature inspection using coordinate measuring machines, Technometrics 39, 18-20 (1997).
11. Qi, L., Convergence analysis of some algorithms for solving nonsmooth equations, Math. of Operations Research 18, 227-244 (1993).
12. Qi, L. and G. Jiang, Semismooth Karush-Kuhn-Tucker equations and convergence analysis of Newton methods and quasi-Newton methods for solving these equations, Math. of Operations Research 22, 301-325 (1997).
13. Spath, H., Least squares fitting by circles, Computing 57, 179-185 (1996).
14. Spath, H., Estimating the parameters of an ellipse when angular differences are known, Comput. Stat. 14, 491-500 (1999).
15. Spath, H., Least squares fitting of spheres and ellipsoids using not orthogonal distances, Math. Comm. 6, 89-96 (2001).
16. Turner, D. A., The Approximation of Cartesian Co-ordinate Data by Parametric Orthogonal Distance Regression, PhD Thesis, University of Huddersfield (1999).
17. Turner, D. A., I. J. Anderson, J. C. Mason, M. G. Cox and A. B. Forbes, An efficient separation-of-variables approach to parametric orthogonal distance regression, in Advanced Mathematical and Computational Tools in Metrology IV, eds. P. Ciarlini, A. B. Forbes, F. Pavese and D. Richter, Series on Advances in Mathematics for Applied Sciences, Volume 53, World Scientific, Singapore, 246-255 (2000).
18. Watson, G. A., Approximation Theory and Numerical Methods, John Wiley, Chichester (1980).
19. Watson, G. A., Least squares fitting of circles and ellipses to measured data, BIT 39, 176-191 (1999).
20. Watson, G. A., Least squares fitting of parametric surfaces to measured data, ANZIAM J. 42 (E), C68-C95 (2000).
21. Watson, G. A., On the Gauss-Newton method for $l_1$ orthogonal distance regression, IMA J. Num. Anal. (to appear).
22. Zwick, D. S., Applications of orthogonal distance regression in metrology, in Recent Advances in Total Least Squares and Errors-in-Variables Techniques, ed. S. Van Huffel, SIAM, Philadelphia, 265-272 (1997).

Chapter 6

Splines and Wavelets

Nonlinear multiscale transformations: From synchronization to error control

F. Arandiga and R. Donat
Dept. Matematica Aplicada, University of Valencia, Spain. arandiga@uv.es donat@uv.es

Abstract

Data-dependent interpolatory techniques can be used in the reconstruction step of a multiresolution "à la Harten". These interpolatory techniques lead to nonlinear multiresolution schemes. When dealing with nonlinear algorithms, the issue of stability needs to be carefully considered. In this paper we analyze and compare several strategies for image compression and their ability to effectively control the global error due to compression.

1 Introduction

Multiscale transformations are being used in recent times in the first step of transform coding algorithms for image compression. Ideally, a multiscale transformation allows for an efficient representation of the image data, which is then processed using a (non-reversible) quantizer and passed on to the encoder, which produces the final compressed set of data ready to be transmitted or stored. Compression is indeed achieved during the second and third steps: the quantization and the encoding of the transformed set of discrete data. It is quite clear that the properties of the multiscale transformation are most important in the overall performance of the transform coding algorithm.

Until recently, the multiscale transformations used for image compression were always based on linear filter banks; however, the nonlinear alternative has been explored lately by various authors from different points of view, and preliminary results show the alternative to be very promising [12, 8, 6, 2, 3]. The key question when using, or even designing, a nonlinear multiscale transformation is that of stability. In order for such transformations to be useful tools in image coding, it is absolutely necessary to keep a tight control on the effect of quantization errors in the decoding process. In this paper we examine the question of stability for nonlinear multiscale transformations within Harten's framework for multiresolution [14, 15]. Harten's framework is broad enough to include all classical wavelet transformations as particular cases (just as happens in the Lifting framework of W. Sweldens [17], developed slightly later in time but independently); however, the design of the multiscale transformation is done directly on the spatial domain.
1 Introduction Multiscale transformations are being used in recent times in the first step of transform coding algorithms for image compression. Ideally, a multiscale transformation allows for an efficient representation of the image data, which is then processed using a (nonreversible) quantizer and passed on to the encoder which produces the final compressed set of data which is ready to be transmitted or stored. Compression is indeed achieved during the second and third steps: the quantization and the encoding of the transformed set of discrete data. It is quite clear that the properties of the multiscale transformation are most important in the overall performance of the transform coding algorithm. Until recently, the multiscale transformations used for image compression were always based on linear filter banks, however, the nonlinear alternative has been explored lately by various authors from different points of view, and preliminary results show the alternative to be very promising [12, 8, 6, 2, 3]. The key question when using, or even designing, a nonlinear multiscale transformation is that of stability. In order for such transformations to be useful tools in image coding, it is absolutely necessary to keep a tight control on the effect of quantization errors in the decoding process. In this paper we examine the question of stability for nonlinear multiscale transformations within Marten's framework for multiresolution [14, 15]. Harten's framework is broad enough to include all classical wavelet transformations as particular cases (just as it happens in the Lifting framework of W. Sweldens [17], developed slightly later in time but independently), however the design of the multiscale transformation is done directly on the spatial domain. 306 Nonlinear multiscale transformations 307 The building blocks of Harten's multiresolution framework are two operators that connect adjacent resolution levels. The Decimation (or also, Restriction) operator is a linear operator which acts as a low-pass filter, extracting low-resolution information from a discrete data set. The Prediction operator (also Projection) uses low-resolution data to predict discrete data at a higher resolution level. It is precisely the design of this operator what distinguishes Harten's framework from all other multiresolution frameworks. The prediction operator is based on a consistent Reconstruction technique, and this opens up a tremendous number of possibilities in the design of multiresolution schemes. The use of the reconstruction process as a design tool makes it, conceptually, a simple matter to introduce adaptivity into the multiscale transformation; we only need to make the reconstruction process data-dependent [5, 4, 14]. This paper is organized as follows. In Section 2 we recall the so-called cell-average framework, an appropriate setting for image compression, and describe a class of nonlinear prediction operators obtained by mean-average interpolation [10,14,15]. In Section 3 we examine the question of stability for nonlinear multiscale transformations and relate it to the synchronization of the data-dependent choices made in the encoder and the decoder. We also include a set of numerical experiments that illustrate he performance of several nonlinear multiscale transformations. 2 Multiscale transformations in the cell-average setting Harten's general framework for multiresolution [15] relies on two operators. Decimation and Prediction, that define the basic interscale relations. 
These operators act on finite dimensional Unear vector spaces, W, that represent the different resolution levels {j increasing implies more resolution) (a) D^ : V^ -^ V^-\ (b) Pj : V^'^ -> W, (2.1) and must satisfy two requirements of algebraic nature; D^ needs to be a linear operator and D^Pj = /yj-i, i.e., the identity operator on the lower resolution level represented by V^~^. For all practical purposes, V^ can be considered as spaces of finite dimensional sequences. Using these two operators, a vector (i.e., a discrete sequence) v^ G V^ can be decomposed and reassembled as follows (-) \ ..■ gj ^ — „i_P..i-i' yj — p.yi (b) -'=P^-'-'+-\ (2-2) where e^ represents the error in trying to predict the jth level data, v^, from the low resolution data v^~^ = D^v^, using the prediction operator Pj . In the cell-average setting, the discrete data are interpreted as the cell-averages of a function on an underlying grid, which determines the level of resolution of the given data. The one dimensional case, in which one considers a set of nested dyadic grids on the interval [0,1], {X^}, j > 0 of size/ij = 2-^'/io, X^=^{xi} xi=i-hj, i = 0,...,Nj Nj-hj = l (2.3) 308 F. Arandiga and R. Donat is the easiest one to describe, and it is also directly applicable to two-dimensional (2D) data via tensor product [2, 3] (the cell-average framework in several dimensions and non-tensor product (unstructured) grids is considered in e.g. [1]). In this simple one-dimensional setting, the cell-average framework is characterized by the following decimation operator D^ ' (i5V), = -(4_i+4), 1 <i< Ar,-_i, (2.4) where Nj is the number of equally spaced intervals on X^, the grid on [0,1] that represents the jth resolution level. The consistency requirement for the prediction operator, i.e., D^Pj = lyj--^ which is the only necessary requirement for the prediction in Harten's framework, becomes then (2.5) Observe that (2.4) and (2.5) imply that Hence 4i-x = <-i - iPjv'-')2i-i = iPjv^-%i ''21 -2i- Therefore the prediction errors at even and odd grid points on the jth level in (2.2) are not independent. By considering only the prediction errors at (for example) the odd points of the grid X-', one immediately gets a one-to-one correspondence between ,Ni the sets {vaSi ^ {{vrmi\{dimn, with dl = 4,^, and v^-' = DhK The one-dimensional multiscale transformation and its inverse can be written as follows. Mv' (^°,d\...,d^) I FoTJ = L,...,l For i = 1,... ,Nj-i + <-i)/2 iPjV'-')2i "ii-i (R Vd = iv'',d\:..,d^)^M-'vd Forj = l,...,L For i = l,...,Nj-i {PjV^-')2i-l + di ''2i-l ''2i = 2vr (2.6) (2.7) ^2i-l Observe that since d] = e^i^i = -63,;, the consistency relation (2.5) implies that the computation of vl^ in (2.7) is equivalent to vi, = 2vr'-vi,_,=={Py-'hi-4 = iPjV^-')2i + 4i- (2-8) Therefore (2.6) and (2.7) are just the repeated application of the decomposition and reassembling specified in (2.2)(a) and (2.2)(b). Thus (2.6) defines a multiscale transformation and (2.7) is the inverse transformation, whether or not the prediction operator is linear. Next, we follow [4, 14, 15] to describe a class of linear prediction operators that leads to the (1, M) branch of the Cohen-Daubechies-Feauveau family [7], which is biorthogonal Nonlinear multiscale transformations 309 to the box function [11, 15]. This class is also considered in [6] within the lifting framework, where it is described as a particular case of Donoho's average interpolation [9]. 
Given an integer $s \ge 1$, for each $1 \le i \le N_{j-1}$ we construct a polynomial $p_i(x)$ of degree $2s$ such that
$$\frac{1}{h_{j-1}}\int_{I_{i+l}} p_i(x)\,dx = v^{j-1}_{i+l}, \qquad l = -s, \dots, s, \qquad (2.9)$$
where $I_k$ denotes the $k$th cell of the grid $X^{j-1}$. There are various ways to prove that $p_i(x)$ in (2.9) always exists and is uniquely defined by the $2s + 1$ conditions in (2.9) [1, 9, 14]. Then we define
$$(P_j v^{j-1})_{2i-1} = \frac{1}{h_j}\int_{x^j_{2i-2}}^{x^j_{2i-1}} p_i(x)\,dx, \qquad (P_j v^{j-1})_{2i} = \frac{1}{h_j}\int_{x^j_{2i-1}}^{x^j_{2i}} p_i(x)\,dx. \qquad (2.10)$$
The prediction operator defined by (2.10) is data-independent, hence linear, and it clearly satisfies the consistency relation (2.5). It can be shown that the multiscale transformations (2.6) and (2.7) for this class of prediction operators turn out to be the (1, M = 2s + 1) branch of the Cohen-Daubechies-Feauveau family.

A nonlinear prediction operator is obtained if we construct $p_i(x)$ in a data-dependent way. An example of a nonlinear multiresolution transformation constructed in this fashion is considered in [14, 4, 2], where a nonlinear ENO-type technique (Essentially Non-Oscillatory, see [16]) is used to construct $p_i(x)$. The key idea, which is in essence common to the approach used in designing nonlinear filter banks, is to avoid using data across an edge in the prediction step. The ENO nonlinear technique is better described if we associate to each polynomial piece $p_i(x)$ a stencil, $\mathcal{S}_i$, which is the set of indices of the values used to define $p_i(x)$. In the linear case $\mathcal{S}_i = \{i - s, \dots, i + s\}$; the stencil is independent of the data set $\{v^{j-1}\}$ and, as a consequence, $P_j$ is a linear operator. In the ENO technique described in [16], the selection of the stencil is made in a data-dependent way, using the divided differences of the data as a measure of its smoothness. Large divided differences occur when considering data across an edge, while divided (or undivided) differences of data on smoother regions tend to be smaller in size. The information contained in the divided differences is then used to decide what $\mathcal{S}_i$ is for each $i$, with the only restriction that $i \in \mathcal{S}_i$ (to satisfy the consistency requirement (2.5)). We follow [4] and consider all polynomial pieces of the same degree, so that $\#\mathcal{S}_i = 2s + 1$; but in principle one could decide to lower the degree of $p_i(x)$, or that of some of its neighbours, whenever an edge-detection mechanism finds an edge at the $i$th interval. By lowering the degree of some polynomial pieces close to an edge, one can avoid crossing the edge in the prediction step as much as possible. This option is closely related to the nonlinear multiscale transformation considered in [6] (within the Lifting framework), where the nonlinearity comes in from adaptively choosing from the (1, M) family of linear filters.
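For the smallest case s = 1 the half-cell averages (2.10) have a simple closed form, and the stencil-selection idea can be illustrated with a piecewise-linear variant; a Python sketch follows. This low-order version is for illustration only and is not the third-order scheme used in the experiments of Section 3, and the constant boundary extension is our choice, not specified in the paper.

```python
import numpy as np

def predict_odd_centred(v):
    """(P_j v)_{2i-1} for s = 1: the left half-cell average of the quadratic
    p_i reproducing the cell averages v_{i-1}, v_i, v_{i+1}, which works
    out to v_i + (v_{i-1} - v_{i+1})/8."""
    vp = np.pad(v, 1, mode="edge")       # constant boundary extension
    return v + (vp[:-2] - vp[2:]) / 8.0

def predict_odd_eno(v):
    """ENO flavour with linear pieces: take the one-sided slope from the
    smoother side (smaller undivided difference), so that data are not
    used across an edge; the left half-cell average is v_i - slope/4."""
    vp = np.pad(v, 1, mode="edge")
    dl, dr = v - vp[:-2], vp[2:] - v     # left/right undivided differences
    slope = np.where(np.abs(dl) < np.abs(dr), dl, dr)
    return v - slope / 4.0
```

Either routine can be passed directly as `predict_odd` to the transform sketch above, since both satisfy the consistency relation by construction of the even samples.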
<Si_i 0*^1+1 — ^- ^^ ^ t^'^f' one-dimensional edge (a jump) on the ith cell, the function I fv Fiy) = T- f^J Jxi,_, 1 r4i Pi-iix)dx+— pi+i{x)dx rij Jy will have a zero on the ith cell [13], say r), and the location of r] is used to substitute the polynomial piece Pi{x) by the discontinuous piecewise polynomial function "W={^:;;[x1 III p-'^) The prediction operator is again defined by (2.10) at nonsingular cells (cells in which no edge has been detected), while at the singular cell ■ . 1 r'^i {Pjv^-%i = q,{x)dx, ■ , 1 Hi-i qi{x)dx. {Pjv^-')2i-i -^j- In practice it is unnecessary to compute explicitly the value of r/; only its location with respect with X2i_i is needed, which can be found by a sign check. We refer the reader to [4] (and references therein) for specific details on this technique, in particular on the detection mechanism, and on its performance. 3 The question of stability: Error control versus synchronization, with numerical examples Lossy coding schemes introduce errors into the transform coefficients, and it becomes crucial that the nonlinearities do not unduly amplify these errors. In lossy compression the decoder only has the quantized detail coefficients. If we use a nonlinear prediction operator (whether it is constructed as described in the previous section or based on locally adapted filters, as in [6] within the Lifting framework), the quantization errors in coarse scales could cascade across the scale ladder and cause a series of incorrect choices (either on the filters or on the stencils) leading to serious reconstruction errors. To avoid incorrect choices in the prediction step, whether within Harten's or the Lifting framework, one would need to send side information on which filter was used (Lifting) or what was the interpolatory stencil (Harten's). This is clearly inappropriate when trying to design a compression scheme. One way to avoid storing (and sending) side information is to somehow synchronize the nonlinear prediction operators in the encoder Nonlinear multiscale transformations and the decoder, so as to ensure that at a given spatial location on a given scale, the prediction operator will select the same stencil (filter bank), both in the encoding and the decoding steps. Within the Lifting framework, synchronization is achieved in [6] by changing the typical Split-Predict-Update steps to Split-Update-Predict. In doing so, it is possible to base the choice of predictor directly on already 'quantized data', thus synchronizing the nonlinear decisions made by the encoder and the decoder. Within Harten's framework, synchronization is just a consequence of a strategy that is designed to fully control the compression error. Because the main design tool in Harten's framework for multiresolution is a reconstruction technique, and because A. Harten had already worked with nonlinear reconstruction techniques in the context of the numerical simulation for hyperbolic conservation laws, so-called Error-Control (EC) strategies can be found already in the early papers of Harten on multiresolution [14]. Harten's mechanism to control the global accumulated error is based on a modification of the direct multiscale transformation, M, that ensures a prescribed tolerance on the global prediction errors (explicit error bounds can be found in [4, 13]). 
The modified transformation incorporates the quantizer into the direct multiscale transformation in such a way that the prediction operator in the encoder also acts on already 'quantized' data; hence synchronization is achieved, because the nonlinear prediction operators in both $\mathcal{M}$ and $\mathcal{M}^{-1}$ work on the same set of discrete data at each resolution level.

To illustrate the effect of the different techniques, we take a particular nonlinear prediction operator, a third-order ENO reconstruction technique with Subcell Resolution, as described in the last section. We denote by $\mathcal{M}^{SR}$ the multiscale transformation (2.6), while $\mathcal{M}^{SR}_{EC}$ denotes the EC-modified transform as described in [2, 4], and $\mathcal{M}^{SR}_{S}$ a multiscale transformation in which only synchronization is enforced, as proposed in [6]. The quantization step is carried out as follows:

  $\mathrm{qu}(d^j) = 2\varepsilon_j\,\mathrm{round}\!\left[d^j/(2\varepsilon_j)\right]$,

and it is incorporated into the direct transformation in $\mathcal{M}^{SR}_{EC}$ and $\mathcal{M}^{SR}_{S}$ (see [2, 6] for specific details), while in $\mathcal{M}^{SR}$ it is applied to the scale coefficients obtained after the transformation. In the numerical tests we report, we take $\varepsilon_L = 8$ with $L = 4$ and $\varepsilon_j = \varepsilon_{j+1}/2$.

We consider two different images: the familiar image of Lena as an example of a 'real' image, and a purely geometrical image to which texture has been added, as in [6]. After the direct transformation (plus the quantization step) has taken place, a lossless Lempel-Ziv compression algorithm is applied to reduce the size of the transformed image; a compression ratio is then computed as the number of bits of the original image over the number of bits of the compressed representation. To recover the original image, we undo the lossless compression and transform back using (2.7) in all three cases. The full compression algorithm is identified in each case by an acronym: 'ST' for $\mathcal{M}^{SR}$, 'EC' for $\mathcal{M}^{SR}_{EC}$ and 'SYNC' for $\mathcal{M}^{SR}_{S}$.

In Tables 1 and 2 we compile a number of quantities that measure the 'quality' of the reconstructed image, and therefore the robustness and reliability of each multiresolution-based compression algorithm: the magnitude of the global compression error, measured in various norms, the compression ratio $r_c$ and the entropy of the transformed image.

  Method   ||e||_inf   ||e||_1   ||e||_2   r_c      entropy
  ST        258         5.71      9.08     11.3:1   .6449
  SYNC      195         6.45      9.82      7.9:1   .8875
  EC         25.4       4.47      5.73      9.7:1   .6850

  TAB. 1. Geometrical image.

FIG. 1. Geometrical image: (a) original, (b) ST, (c) EC, (d) SYNC.

  Method   ||e||_inf   ||e||_1   ||e||_2   r_c      entropy
  ST        318         5.66     10.59     -        .8261
  SYNC      277         5.97     10.56     7.5:1    .9430
  EC         26.4       3.59      4.84     8.2:1    .8704

  TAB. 2. Lena.

FIG. 2. Lena: (a) original, (b) ST, (c) EC, (d) SYNC.

The reconstructed images in both cases can be observed in Figures 1 and 2. It can be clearly observed that the absence of any type of synchronization procedure can lead to a very poor reconstructed image. Synchronization alone improves the quality, but is not as robust as the full EC mechanism, designed in this case to enforce a certain error bound in the 2-norm (as observed in Tables 1 and 2, the 2-norm of the global error is kept below $\varepsilon_L = 8$). It is worth mentioning that the compression ratios and the entropies of the compressed data are all very close; however, the visual quality of the reconstructed image is significantly better for the EC compression algorithm.

Bibliography

1. R. Abgrall and A. Harten. Multiresolution representation in unstructured meshes. SIAM J. Numer. Anal. 35, 2128-2146 (electronic), 1998.
2. S. Amat, F. Arandiga, A. Cohen, and R. Donat. Tensor product multiresolution analysis with error control for compact image representation. Submitted to Signal Processing, 2000.
3. S. Amat, F. Arandiga, A. Cohen, R. Donat, G. Garcia, and M. von Oehsen. Data compression with ENO schemes. Applied and Computational Harmonic Analysis 11, 273-288, 2001.
4. F. Arandiga and R. Donat. Nonlinear multi-scale decompositions: The approach of A. Harten. Numer. Algorithms 23, 175-216, 2000.
5. F. Arandiga, R. Donat, and A. Harten. Multiresolution based on weighted averages of the hat function II: Nonlinear reconstruction operators. SIAM J. Sci. Comput. 20, 1053-1093, 1999.
6. R. L. Claypoole, G. Davis, W. Sweldens, and R. Baraniuk. Nonlinear wavelet transforms for image coding via lifting scheme. Submitted to IEEE Trans. on Image Processing, 1999.
7. A. Cohen, I. Daubechies, and J. C. Feauveau. Biorthogonal bases of compactly supported wavelets. Comm. Pure Applied Math. 45, 485-560, 1992.
8. R. L. de Quieroz, D. A. Florencio, and R. W. Schafer. Non-expansive pyramid for image coding using a non-linear filter bank. IEEE Trans. Image Processing 7, 246-252, 1998.
9. D. L. Donoho. Interpolating wavelet transforms. Technical report, Department of Statistics, Stanford University, 1992.
10. D. L. Donoho and Thomas P. Y. Yu. Nonlinear pyramid transforms based on median-interpolation. SIAM Journal on Mathematical Analysis 31, 1030-1061, 2000.
11. M. Guichaoua. Analyses Multiresolution Biorthogonales associees a la Resolution d'Equations aux Derivees Partielles. PhD thesis, Ecole Superieure de Mecanique de Marseille, Universite de la Mediterranee Aix-Marseille II, 1999.
12. F. J. Hampson and J. C. Pesquet. A nonlinear subband decomposition with perfect reconstruction. In Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., 1996.
13. A. Harten. ENO schemes with subcell resolution. J. Comput. Phys. 83, 148-184, 1989.
14. A. Harten. Discrete multiresolution analysis and generalized wavelets. J. of Applied Num. Math. 12, 153-193, 1993.
15. A. Harten. Multiresolution representation of data: A general framework. SIAM J. Numer. Anal. 33, 1205-1256, 1996.
16. A. Harten, B. Engquist, S. Osher, and S. R. Chakravarthy. Uniformly high-order accurate essentially non-oscillatory schemes, III. J. Comput. Phys. 71, 231-303, 1987.
17. W. Sweldens. The lifting scheme: a custom-design construction of biorthogonal wavelets. Appl. Comput. Harmon. Anal. 3, 186-200, 1996.

Splines: a new contribution to wavelet analysis

Amir Z. Averbuch and Valery A. Zheludev
School of Computer Science, Tel Aviv University, Israel.
amir@math.tau.ac.il, zhel@post.tau.ac.il

Abstract

We present a new approach to the construction of biorthogonal wavelet transforms using polynomial splines. The construction is performed in a "lifting" manner and we use interpolatory, as well as local quasi-interpolatory and smoothing splines, as predicting aggregates in this scheme. The transforms contain some scalar control parameters which enable their flexible tuning in either the time or the frequency domain. The transforms can be implemented efficiently, and they have demonstrated their efficiency when applied to image compression.

1 Introduction

Until recently, two methods have been used for the construction of wavelet schemes using splines. One is to construct orthogonal and semi-orthogonal wavelets in the spline spaces (Battle-Lemarie [2, 7], Chui-Wang [6], Unser-Aldroubi-Eden [12]).
Another way was introduced by Cohen, Daubechies and Feauveau [3], who constructed symmetric compactly supported spline wavelets whose duals, while remaining compactly supported and symmetric, do not belong to a spline space. However, since the introduction of the lifting scheme for the design of wavelet transforms [11], a new way has been open to use splines as a tool for devising a fully discrete scheme of wavelet transforms. Namely, various splines can be employed as predicting aggregates in lifting constructions.

2 Lifting scheme of biorthogonal wavelet transform

The sequences $\{a(k)\}_{k=-\infty}^{\infty}$, which belong to the space $l_1$, we call discrete-time signals. The $z$-transform of a signal $\{a(k)\}$ is defined as follows: $a(z) = \sum_{k=-\infty}^{\infty} z^{-k} a(k)$. Throughout the paper we assume that $z = e^{i\omega}$. We introduce a family of biorthogonal wavelet-type transforms that operate on the signal $x = \{x(k)\}_{k=-\infty}^{\infty}$, which we construct through lifting steps. The lifting scheme for the wavelet transform of a signal can be implemented in primal or dual modes. For brevity we consider only the primal mode.

Decomposition

Generally, the primal lifting scheme for the decomposition of signals consists of three steps: 1. Split. 2. Predict. 3. Update or lifting.

SPLIT - We split the array $x$ into even and odd sub-arrays:
  $e_1 = \{e_1(k) = x(2k)\}$, $d_1 = \{d_1(k) = x(2k+1)\}$, $k \in \mathbb{Z}$.

PREDICT - We use the even array $e_1$ to predict the odd array $d_1$, and redefine the array $d_1$ as the difference between the existing array and the predicted one. To be specific, we apply some filter with transfer function $zU(z)$ to the sequence $e_1$ and predict the function $d_1(z^2)$, which is the $z^2$-transform of $d_1$. The $z^2$-transform of the new $d$-array is defined as follows:

  $d_1^u(z^2) = d_1(z^2) - zU(z)\,e_1(z^2).$  (2.1)

From now on the superscript $u$ means an updated version of the array. Obviously, the prediction $zU(z)\,e_1(z^2)$ should approximate $d_1(z^2)$ well.

LIFTING - We update the even array using the new odd array:

  $e_1^u(z^2) = e_1(z^2) + \beta(z)\,z^{-1} d_1^u(z^2).$  (2.2)

Generally, the goal of this step is to eliminate the aliasing which appears while downsampling the original signal $x$ into $e_1$. Further on we will discuss how to achieve this effect by a proper choice of the filter $\beta$.

Reconstruction

The reconstruction of the signal $x$ from the arrays $e_1^u$ and $d_1^u$ is implemented in reverse order: 1. Undo Lifting. 2. Undo Predict. 3. Unsplit.

UNDO LIFTING - We restore the even array: $e_1(z^2) = e_1^u(z^2) - \beta(z)\,z^{-1} d_1^u(z^2)$.

UNDO PREDICT - We restore the odd array: $d_1(z^2) = d_1^u(z^2) + zU(z)\,e_1(z^2)$.

UNSPLIT - The last step represents the standard restoration of the signal from its even and odd components. In the $z$-domain this is $x(z) = e_1(z^2) + z^{-1} d_1(z^2)$.

The lifting scheme presented above yields an efficient algorithm for the implementation of the forward and backward transforms $x \leftrightarrow e_1^u \cup d_1^u$. These operations can be interpreted as a transformation of the signal by a filter bank that possesses the perfect reconstruction property and is associated with biorthogonal pairs of bases in the space of discrete-time signals. These basis signals are the synthesis and analysis wavelets. Further steps of the transform are implemented in an iterative way by the same lifting operations.
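The three decomposition steps and their exact inverses can be written down compactly in the time domain. The following sketch (assuming the predict and update filters are supplied as callables acting on sample arrays, and an even-length signal) is a minimal illustration of the scheme, not the authors' implementation:

```python
import numpy as np

def lift_decompose(x, predict, update):
    """Primal lifting decomposition (split-predict-update) of a 1-D signal.
    `predict` maps the even samples to a prediction of the odd samples;
    `update` maps the new detail array to a correction of the evens."""
    x = np.asarray(x, dtype=float)
    e, d = x[0::2].copy(), x[1::2].copy()
    d = d - predict(e)          # predict: d^u = d - P(e)
    e = e + update(d)           # lifting: e^u = e + U(d^u)
    return e, d

def lift_reconstruct(e, d, predict, update):
    """Exact inverse: undo lifting, undo predict, unsplit."""
    e = e - update(d)
    d = d + predict(e)
    x = np.empty(e.size + d.size)
    x[0::2], x[1::2] = e, d
    return x
```

Perfect reconstruction holds by construction: each step is undone by subtracting (or adding) exactly the quantity that was added (or subtracted), regardless of the filters chosen.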
3 Polynomial splines

We will construct polynomial splines of various kinds using the even subarray of a signal, calculate their values at the midpoints between nodes, and use these values for the prediction of the odd array. In this section we discuss some properties of such splines and derive the corresponding filters $U$.

3.1 B-splines

The central B-spline of first order on the grid $\{kh\}$ is defined as follows:

  $M_h^1(x) = \begin{cases} 1/h, & x \in [-h/2, h/2], \\ 0, & \text{elsewhere.} \end{cases}$

The central B-spline of order $p$ is the convolution $M_h^p(x) = M_h^{p-1}(x) * M_h^1(x)$, $p \ge 2$. Note that the B-spline of order $p$ is supported on the interval $(-ph/2, ph/2)$. It is positive within its support and symmetric around zero. The nodes of B-splines of even orders are located at the points $\{kh\}$, and of odd orders at the points $\{h(k+1/2)\}$, $k \in \mathbb{Z}$. It is readily verified that $hM_h^p(hx) = M^p(x)$, where $M^p(x) := M_1^p(x)$. Let

  $u^p := \{hM_h^p(hk) = M^p(k)\}$, and $w^p := \{hM_h^p(h(k+1/2)) = M^p(k+1/2)\}$, $k \in \mathbb{Z}$.  (3.1)

Due to the compact support of B-splines, these sequences are finite. We will use for our constructions only splines of odd orders $p = 2r+1$. In Table 1 we present the sequences for the initial values of $r$, which are of practical importance.

   k    u^3 x 8   u^5 x 384   w^3 x 2   w^5 x 24
  -3       0          0           0          0
  -2       0          1           0          1
  -1       1         76           1         11
   0       6        230           1         11
   1       1         76           0          1
   2       0          1           0          0
   3       0          0           0          0

  TAB. 1. Values of the sequences u^p and w^p.

We need the $z^2$-transforms of the sequences $u^p$ and $w^p$:

  $u^p(z^2) := \sum_{k=-\infty}^{\infty} z^{-2k} u^p(k)$, $w^p(z^2) := \sum_{k=-\infty}^{\infty} z^{-2k} w^p(k)$.

These functions are Laurent polynomials, and are called the Euler-Frobenius polynomials [10].

Proposition 3.1 ([9]) On the circle $z = e^{i\omega}$ the Laurent polynomials $u^p(z^2)$ are strictly positive. Their roots are all simple and negative. Each root $\gamma$ can be paired with a dual root $\theta$ such that $\gamma\theta = 1$. Thus, if $p = 2r+1$ is odd, then $u^p(z^2)$ can be represented as follows:

  $u^p(z^2) = C \prod_{n=1}^{r} (1 + \gamma_n z^2)(1 + \gamma_n z^{-2})$, with $C > 0$, $0 < \gamma_n < 1$.  (3.2)

We denote

  $U_I^p(z) := z^{-1}\,\frac{w^p(z^2)}{u^p(z^2)}.$  (3.3)

Proposition 3.2 The rational functions $U_I^p(z)$ are real-valued and $U_I^p(-z) = -U_I^p(z)$. If $p = 2r+1$ is odd then

  $1 - U_I^p(z) = \frac{(a-2)^{r+1}\,c_r(a)}{u^p(z^2)}$, $1 + U_I^p(z) = \frac{(-a-2)^{r+1}\,c_r(-a)}{u^p(z^2)}$,  (3.4)

where $a := z + z^{-1}$ and $c_r(a)$ is a polynomial of degree $r-1$.

3.2 Interpolatory splines

The shifts of B-splines form a basis in the space $S_h^p$ of splines of order $p$ on the grid $\{kh\}$. Namely, any spline $S_h^p \in S_h^p$ has the following representation:

  $S_h^p(x) = h \sum_l q(l)\,M_h^p(x - lh).$  (3.5)

Let $q := \{q(l)\}$, and let $q(z^2)$ be the $z^2$-transform of $q$. We introduce also the sequences $s^p := \{S_h^p(hk) = S^p(k)\}$ and $m^p := \{S_h^p(h(k+1/2)) = S^p(k+1/2)\}$ of values of the spline at the grid points and at the midpoints. Let $s^p(z^2)$ and $m^p(z^2)$ be the corresponding $z^2$-transforms. We have

  $S^p(k) = \sum_l q(l)\,M^p(k-l)$, and $S^p(k+\tfrac12) = \sum_l q(l)\,M^p(k-l+\tfrac12)$.  (3.6)

Respectively, $s^p(z^2) = q(z^2)\,u^p(z^2)$ and $m^p(z^2) = q(z^2)\,w^p(z^2)$. From these formulae we can derive an expression for the coefficients of a spline which interpolates a given sequence $e := \{e(k)\}$ at the grid points:

  $S_h^p(hk) = e(k),\ k \in \mathbb{Z} \iff q(z^2)\,u^p(z^2) = e(z^2) \iff q(z^2) = \frac{e(z^2)}{u^p(z^2)}.$  (3.7)

The $z^2$-transform of the sequence $m^p$ is then

  $m^p(z^2) = q(z^2)\,w^p(z^2) = zU_I^p(z)\,e(z^2).$  (3.8)

Our further construction exploits the super-convergence property of the interpolatory splines of odd orders (even degrees).

Theorem 3.3 ([13]) Let a function $f \in L^2(-\infty,\infty)$ have $p+1$ continuous derivatives and let $S_h^p \in S_h^p$ interpolate $f$ on the grid $\{kh\}$. Denote $f_{k+1/2} = f((k+1/2)h)$. Then, in the case of odd $p = 2r+1$, the following asymptotic relation holds:

  $S_h^p(h(k+\tfrac12)) = f_{k+1/2} + h^{2r+2} f^{(2r+2)}(h(k+\tfrac12))\,\frac{(2r+1)\,b_{2r+2}(1/2)}{(2r+2)!} + o(h^{2r+2}),$  (3.9)

where $b_s(x)$ is the Bernoulli polynomial of degree $s$. Recall that, in general, the interpolatory spline of order $2r+1$ approximates the function $f$ with accuracy $O(h^{2r+1})$. Therefore, we may claim that $\{(k+1/2)h\}$ are points of super-convergence of the spline $S_h^p$.

Note that the spline of order $2r+1$ which interpolates the values of a polynomial of degree $2r$ coincides with this polynomial. However, the spline of order $2r+1$ which interpolates the values of a polynomial of degree $2r+1$ on the grid $\{kh\}$ also restores the values of this polynomial at the midpoints $\{(k+1/2)h\}$. This property will result in the vanishing-moment property of the wavelets to be constructed later.

3.3 Quasi-interpolatory splines

We can see from (3.7) and (3.8) that, in order to find the values at the midpoints of the spline interpolating the signal $e$, the signal has to be filtered with the filter whose transfer function is $zU_I^p(z)$. This filter has an infinite impulse response (IIR). However, the property of super-convergence at the midpoints is not an exclusive attribute of the interpolatory splines. It is also inherent to the so-called local quasi-interpolatory splines of odd orders, which can be constructed using finite impulse response (FIR) filtering.
Note, that the spline of order 2r -|-1, which interpolates the values of a polynomial of degree 2r, coincides with this polynomial. However, the spline of order 2r -)- 1 which interpolates the values of a polynomial of degree 2r +1 on the grid {kh} restores the values of this polynomial at the mid-points {(fc-M/2)/i}. This property will result in the vanishing moments property of the wavelets to be constructed later. 3.3 Quasi-interpolatory splines We can see from (3.7) and (3.8) that in order to find values at the midpoints of the spline interpolating the signal e, the signal has to be filtered with the filter whose transfer function is zUf{z). This filter has infinite impulse response (IIR). However, the property of super-convergence at the midpoints is not an exclusive attribute of the interpolatory splines. It is also inherent to the so called local quasi-interpolatory splines of odd orders, which can be constructed using finite impulse response (FIR) filtering. Definition 3.4 Let the function f have p continuous derivatives and f := {fk = f{hk)}, fc € Z. The spline S^ e S^ of order p given by (3.5) is said to be the local 318 Z. Averbuch and V. A. Zheludev quasi-interpolatory spline if the array q of its coefficients is derived by FIR filtering the array of samples f q{z')=T{zyiz^), (3.10) where r{z'^) is a Laurent polynomial, and the difference \f{x) — Sl{x)\ — 0{f^P^hP). If f is a polynomial of degree p — I, then the spline Sf^{x) = f{x). If wP is the sequence defined in (3.1) then the midpoint values m'' are produced by the following FIR filtering of the array of samples f: mP{z'^) = ZUP{Z){{Z^), U^{Z) := z~^T{z'^)w'''{z'^). Explicit formulas for the construction of quasi-interpolatory splines as well as the estimations of the differences were established in [13]. In the present work we are interested in splines of odd orders p = 2r + 1. There are many FIR filters which generate quasi-interpolatory splines but only one filter of minimal length 2r -f 1 for each order p = 2r-I-1. Let A(2;) := x"^ - 2-I-z^. Theorem 3.5 A quasi-interpolatory spline of order p = 2r -|- 1 can be produced by filtering (3.10) with filters F of length no less than 2r -|- 1. There exists a unique filter FJ„ of length 2r+l which produces the minimal quasi-interpolatory spline S^''"'"^(x) . Its transfer function is: TUz') = l + j:f3lX\z), n^I^pim = B-l)'/^^*''- (3-11) // the function f has 2r -|- 3 derivatives then the following asymptotic relations hold for the midpoint values of the minimal quasi-interpolatory spline of odd order: Sl'^\h{k + 1/2)) = f{h{k + 1/2)) + /l2r+2j(2r+2)(^(^ ^ ^j^^^^j^r ^ Q(^f{2r+3),^2r+3^^ A'--='(2r-H)b2r+2(0) 'l.jX^r-^Ui^ (2r + 2)! (3.12) where bs{x) is the Bernoulli polynomial of degree s. This implies that the super-convergence property is similar to that of the interpolatory splines. The asymptotic representation (3.12) provides tools for custom design of predicting splines retaining or even enhancing the approximation accuracy of the minimal spline at the midpoints. Proposition 3.6 // the coefficients of the spline Sf^^^ G ^"h"^^ of order 2r + 1 are derived as in (3.10) using the filter F^ of length 2r -I- 3, with the transfer function Fp(z^) = T^niz'^) + pX^'^^iz), then the spline restores polynomials of degree 2r -h 1 at the midpoints between nodes, for any real value p. However, if p = —A^ then the spline restores polynomials of degree 2r -I- 3. If the parameter p is chosen such that p = {-lY\p\ then the spline S^'^J'^ possesses the smoothing property [14]. 
3.4 Examples

3.4.1 Quadratic splines

Interpolatory spline. Let $a = z^{-1} + z$. Then

  $U_I^1(z) = \frac{4a}{z^2 + 6 + z^{-2}}$, and $1 - U_I^1(z) = \frac{(a-2)^2}{z^2 + 6 + z^{-2}}$.

Minimal spline. The filters are

  $U_m^1(z) = \frac{-z^{-3} + 9z^{-1} + 9z - z^3}{16}$, and $1 - U_m^1(z) = \frac{(a-2)^2\,(z^{-1} + 4 + z)}{16}$.

Extended spline.

  $1 - U_e^1(z) = \frac{(2-a)^3\,(3z^{-2} + 18z^{-1} + 38 + 18z + 3z^2)}{256}.$

Remark 3.7 In [5] Donoho presented a scheme where an odd sample is predicted by the value at the central point of the polynomial of odd degree which interpolates adjacent even samples. One can observe that our filter $U_m^1$ coincides with the filter derived by Donoho's scheme using the cubic interpolatory polynomial, and the filter $U_e^1$ coincides with the filter derived using the interpolatory polynomial of fifth degree. On the other hand, the filter $U_I^1$ is closely related to the commonly used Butterworth filter [8]. Namely, in this case the filter transfer functions $\Phi^{(l)}(z) := (1 + U_I^1(z))/2$ and $\Phi^{(h)}(z) := (1 - U_I^1(z))/2$ coincide with the magnitude squared of the transfer functions of the discrete-time low-pass and high-pass half-band Butterworth filters of order 4, respectively.

3.4.2 Splines of fifth order (fourth degree)

Interpolatory spline.

  $U_I^2(z) = \frac{16\,(z^3 + 11z + 11z^{-1} + z^{-3})}{z^4 + 76z^2 + 230 + 76z^{-2} + z^{-4}}$, and $1 - U_I^2(z) = \frac{(a-2)^3\,(a-10)}{z^4 + 76z^2 + 230 + 76z^{-2} + z^{-4}}$.

Minimal spline. The filter is

  $U_m^2(z) = \frac{47(z^{-7} + z^7) + 89(z^{-5} + z^5) - 2277(z^{-3} + z^3) + 15965\,a}{27648}.$

4 Wavelet transforms using spline filters

4.1 Choosing the filters for the lifting step

In the previous section we presented a family of filters $U$ for the prediction step, which originated from splines of various types. But, as is seen from (2.2), to accomplish the transform we have to define the filter $\beta$. There is remarkable freedom in the choice of this filter. The only requirement needed to guarantee the perfect reconstruction property of the transform is that $\beta(-z) = \beta(z)$. In order to make the synthesis and analysis filters similar in their properties, we choose $\beta(z) = \tilde{U}(z)/2$, where $\tilde{U}$ is one of the filters $U$ presented above. In particular, $\tilde{U}$ may coincide with the filter $U$ which was used for the prediction. We say that a wavelet $\psi$ has $m$ vanishing moments if the following relations hold:

  $\sum_{k \in \mathbb{Z}} k^s\,\psi(k) = 0$, $s = 0, 1, \dots, m-1$.

Proposition 4.1 Suppose the filters $U(z)$ and $\beta(z) = \tilde{U}(z)/2$ are used for the prediction and lifting steps, respectively. If $1 - U(z)$ contains the factor $(z - 2 + 1/z)^r$, then the high-frequency analysis wavelets $\psi^a$ have $2r$ vanishing moments. If, in addition, $1 - \tilde{U}(z)$ contains the factor $(z - 2 + 1/z)^p$, then the synthesis wavelet $\psi^s$ has $2q$ vanishing moments, where $q = \min\{p, r\}$.

4.2 Implementation of the transforms

Suppose we have chosen the filter $\beta = \tilde{U}/2$. The functions $zU(z)$ and $z\tilde{U}(z)$ depend on $z^2$, and we write $F(z^2) := zU(z)$ and $\tilde{F}(z^2) := z\tilde{U}(z)$. Then the decomposition procedure is (see (2.1), (2.2)):

  $d_1^u(z) = d_1(z) - F(z)\,e_1(z)$, $e_1^u(z) = e_1(z) + \tfrac12 z^{-1}\tilde{F}(z)\,d_1^u(z).$  (4.1)

Equation (4.1) means that, in order to obtain the detail array $d_1^u$, we must process the even array $e_1$ with the filter $\mathcal{F}$ with transfer function $F(z)$ and subtract the filtered array from the odd array $d_1$. In order to obtain the smoothed array $e_1^u$, we must process the detail array $d_1^u$ with the filter $\Phi$ that has the transfer function $\Phi(z) = z^{-1}\tilde{F}(z)/2$ and add the filtered array to the even array $e_1$. But the filter $\Phi$ differs from $\tilde{\mathcal{F}}/2$ only by a one-sample delay, and it operates similarly. Thus, both operations of the decomposition are, in principle, identical. For the reconstruction, the same operations are conducted in reverse order.

Therefore, it is sufficient to outline the implementation of the filtering with the function $F(z)$. The implementation of FIR filters originating from local splines is straightforward, and therefore we only make a few remarks on the IIR filters originating from interpolatory splines. A detailed description can be found in [1]. Equations (3.2) and (3.3) imply that, when the interpolatory spline of order $2r+1$ is used, the transfer function is $F(z) = P(z)/\prod_{n=1}^{r}(1 + \gamma_n z)(1 + \gamma_n z^{-1})$, where $P(z)$ is a Laurent polynomial. This means that the IIR filter $\mathcal{F}$ can be split into a cascade consisting of an FIR filter with transfer function $P(z)$, $r$ elementary causal recursive filters, denoted $R(n)$, and $r$ elementary anti-causal recursive filters, denoted $\bar{R}(n)$. The causal and anti-causal filters operate as follows:

  $y = R(n)x \iff y(l) = x(l) - \gamma_n\,y(l-1)$, $y = \bar{R}(n)x \iff y(l) = x(l) - \gamma_n\,y(l+1)$.
For the reconstruction the same operation is conducted in reverse order. Therefore, it is sufficient to outline the implementation of the filtering with the func: tion F{z). Implementation of FIR filters originating from local splines is straightforward and, therefore we only make a few remarks on IIR filters originating from interpolatory sphnes. A detailed description can be found in [1]. Equations (3.2) and (3.3) imply that, while the interpolatory spline of order 2r + 1 is used, the transfer function F{z) = P{z)/Iln=i T-{1 + JnZ){l + 'JnZ'^), whcrc P{z) is the Laurent polynomial. It means that the IIR filter F can be split into a cascade consisting of a FIR filter with the transfer function P(2), r elementary causal recursive filters denoted by R{n), and r elementary anti-causal recursive filters, denoted by R{n). The causal and anti-causal filters operate as follows: y = R{n)x^=^y{l) = x{l) + j„yil-l), y = ^)K ^=^ y{l) = x{l)+ jny{l+ 1)- Example 4.2 (Example of recursive filter) We present IIR filters derived from the interpolatory splines of third order. Splines and wavelets 321 Let 7J = 3 - 2v^ « 0.172. Then The filter can be implemented with the following cascade: xo{k) = A^l{x{k)+x{k + l)), xi{k) = xo{k)-jlxi{k-l), y{k)=xi{k)-jly{k + l). Bibliography 1. A. Z. Averbuch, A. B. Pevnyi and V. A. Zheludev, Butterworth wavelets derived from discrete interpolatory splines: Recursive implementation, to appear in Signal Processing, www.math.tau.ac.il/~amir (~zhel). 2. G. Battle, A block spin construction of ondelettes. Part I. Lemarie functions, Comm. Mffli/i. P/i?/s. 110 (1987), 601-615. 3. A. Cohen, I. Daubechies and J.-C. Feauveau, Biorthogonal bases of compactly supported wavelets, Commun. on Pure and Appl. Math. 45 (1992), 485-560. 4. I. Daubechies, Ten lectures on wavelets, SIAM, Philadelphia, PA, 1992. 5. D. L. Donoho, Interpolating wavelet transform, Preprint 408, Department of Statistics, Stanford University, 1992. 6. C. K. Chui and J. Z. Wang, On compactly supported spline wavelets and a duality principle. Trans. Amer. Math. Sac. 330 (1992), 903-915. 7. P. G. Lemarie, Ondelettes a localisation exponentielle, J. de Math. Pures et Appl. 67 (1988), 227-236. 8. A. V. Oppenheim, R. W. Shafer, Discrete-time signal processing, Englewood Cliffs, New York, Prentice Hall, 1989. 9. I. J. Schoenberg, Contribution to the problem of approximation of equidistant data by analytic functions, Quart. Appl. Math. 4 (1946), 112-141. 10. L J. Schoenberg, Cardinal spline interpolation, CBMS 12, SIAM, Philadelphia, 1973. 11. W. Sweldens, The Ufting scheme: A custom design construction of biorthogonal wavelets, Appl. Comput. Harm. Anal. 3 (1996), 186-200. 12. M. Unser, A. Aldroubi and M. Eden, A family of polynomial spline wavelet transforms, 5«3naZ Processmt? 30 (1993), 141-162. 13. V. A. Zheludev, Local spline approximation on a uniform grid, U.S.S.R. Comput. Math. & Math. Phys. 27 (1987), 8-19. 14. V. A. Zheludev, Local smoothing splines with a regularizing parameter, Comput. Math. & Math Phys. 31 (1991), 193-211. Knot removal for tensor product splines T. Brenna Dept. of Informatics, Univ. of Oslo, Oslo. trondbre@ifi.ulo.no Abstract Given a spline function as a B-spline expansion the object of knot removal is to remove as many knots as possible without perturbing the spline by more than a specified tolerance. In 1987 Lyche and M0rken proposed an efficient knot removal algorithm which determines both the number of remaining knots and their position automatically. 
In this paper we show how their method can be extended to knot removal techniques for multivariate tensor product splines. We propose a number of new strategies for removing as many knots as possible, and discuss some of the advantages and challenges posed by the special structure Of tensor product splines. 1 Introduction Given a spline function we are often interested in an approximate representation requiring less data. The object of knot removal is to remove as many knots as possible from a given spline without perturbing the spline by more than a given tolerance. An efficient knot removal strategy presented in [6] determines both the number of remaining knots and their location automatically. This strategy was later extended to parametric curves and surfaces in [5], and incorporated with various constraints such as monotonicity and convexity in [1]. An efficient implementation of knot removal for the special case of trilinear splines is given in [3]. In this paper we address some of the questions and problems arising when extending the knot removal technique to multivariate tensor product splines. The outline of this paper is as follows. We start by fixing notation and presenting techniques for representing tensor product splines. We then proceed with generalizations of coefficient norms, approximation methods, methods for ranking the knots etc., as we review the central parts of the knot removal strategy. Two different ways of perforrriing knot removal are given together with accompanying strategies for finding the desired approximations. We end the paper with two examples demonstrating various aspects of the knot removal techniques presented. 2 Notation Let d = (dfc),m = (m/t) € Z* with 0 < d < m (component-wise) for some positive integer s. Also let t*" = {t'-}'^-^'^'''^^ be a knot vector with dk + 1 equal knots at both ends and with no knot value occurring more than dk + 1 times, for fc = 1,..., s. In this paper we will treat the collection t = {t''}^^i as a "single" knot vector with "length" 322 Knot removal for tensor product splines 323 m + d +1 defined to be the sum of the length of the knot vectors t'', k = l,...,s. Given such a knot vector we may form products of the basis functions associated with each individual knot vector t*^. By letting s Bi{-x.):=Bi^d,t{^) = Y[Bi^^dkM^k) for l<i<m, fc=i where i = (ik) S Z® and x = {xk) G K* ,we get a total of nfe=i '^fc ^^^ basis functions s for the tensor product space Sd,t = <Si Sd^.f^- In this paper we let Bj^, ^^ t*- be the i^th fc=i ' ' ' B-spline of degree dk associated with t'^, for k = 1,. ..,s. To represent an element of Sd,t we use a variant of the classical Kronecker product of matrices. Recall that if A = (aijj[^\'^ii ^ M!^^''^\ B = (bi,j)™^i'"!^i e R^^.ns then this product is given by A ® B = (ai,jB)™\'j^j. In this paper we will use the "equivalent" product defined by A (g) B = (Abij)?^^j^'jf^j, which gives a more convenient ordering of the matrix elements for our use. Also recall that for real matrices A,B,C,D we have the following useful relations (assuming that the matrix products and inverses are defined) (A(8)B)(C<8)D) = (AC)(g)(BD), (A®B)-i = A-^OB-i andA(8)B = Pi(B(g)A)P2, for some permutation matrices Pi and P2. In addition we have that the product A<8)B will have linearly independent columns, provided the same holds for A and B. For further properties of the Kronecker product we refer to [4]. An element Ttii rUs s /W = I] • • • I]/n,-,^. 
n ^'^A.t*(^fc) = H/'^'.d,t(x) G §d,t ii=l 25=1 fc=l i;^!^ can now be written /(x) = B?^f, where Bt = <8) Btk with Btk = (Bidk.t", • • •jB^^^d^.tk)''^ for k = l,...,s. Here f is a vector containing the B-spline coefficients F = (fii,..,!^) of / given by f = vec(F) := s 2Ji<m fi^i) where ei = ® e^^ with e^j. £ M™*=. Finally we state that for a tensor of real "" fe=i coefficients F = (fi)i<i<m € R™ we let F^'^''' denote the tensor F with its elements rearranged according to the cyclic permutation of the s-tuple {1,2,..., s] given by (Tk = {fc, fc + 1,..., s, 1,..., fc — 1}, for/c = 1,..., s. Finally, for a spline / = Z)i<m/i-^i,d,t(x) we define a class of weighted P-norms of its B-spline coefficients, given by ^{{HiKr^^m"?'", ll/IU'',t—s max I/iI, a<i<m for l<p<0O, for p = 00, where the weights are given by Wi = Ylk=i '^"^/+i ''' > for 1 < i < m. Using the notation introduced above we have that || / ||ip,t=|| W^'^f ||ip, {p > 1) where Wt is a 324 T. Brenna diagonal scaling matrix given by Wt= ® Wtk, k=i with Wtk=diag. . ;'•••'» \\ak + l/ V ^ _L i dk + l These coefBcient norms are easy to compute and are known to approximate the ordinary LP-norms well for splines of moderate degree [2,6]. In the algorithms we use p = 2 when computing approximations and p = oo to measure the error. 3 The knot removal algorithm Given an element / G Sd,t, a tolerance e > 0 and some norm || • |1 the goal of the knot removal algorithm presented in [6] is to find a subspace Sd.r of Sd,t (T C t) and an element g 6 Sa.r with || / - g ||< e, and where we want r to be of minimal length. In this section we review the basic parts of this algorithm as we extend the theory to tensor product splines. Further details of the material in this section can be found in [2]. 3.1 Finding approximations To approximate / € Sd,t hi a subspace Sd,T, where r is of "length" n + d + 1 with n < m, we use the sphne g which is the best approximation to / in the /^,t-norm. In other words, the spline we seek will be the solution to the minimization problem min II / - hWfi ^- Solving this problem is equivalent to solving the linear least squares problem given by min ||Wy^Ac-f)||?., (3.1) where A = (gi Ak is the knot insertion matrix from r to t (i.e. Ak is the knot insertion k=l matrix from r'' to t*", for k = l,...,s), f = vec(F) are the given B-spline coefficients of / in §d,t and c = vec(C) are the unknown B-spline coefficients of g in Sd,T- Since the knot insertion matrix A has full rank and Wt is non-singular, the normal equations A'^^WtAc = A'^Wtf associated with the system (3.1) will have a unique solution which can be found ([2,3]) by solving a series of s tensor equation systems given by (AjWt.Ak)D[r'<'= (A^WtODt''i\ (3.2) for k = l,...,s. HereDk e M""^ with Uk = (ni,.. .,nfc,mi:+i,.. .,ms), and we let Do = F, and set the coefficients of the approximation g equal to the solution of the last tensor equation system, C = Dg. The tensor equations (3.2) can be efficiently solved by calculating the Cholesky factorization of the banded coefficient matrix (AjJ'WtkAk) and solving for each right hand side in the tensor {A'^Wtk)'D^^_^^. 3.2 Ranking the knots The final approximation to the initial spline is found by searching through a sequence of approximations, constructed by using the approximation method of the previous section, on subsets of the knots of the initial spline. These subsets are calculated by associating a weight with each interior knot, representing a rough measure of its importance. 
3.2 Ranking the knots

The final approximation to the initial spline is found by searching through a sequence of approximations, constructed by using the approximation method of the previous section on subsets of the knots of the initial spline. These subsets are calculated by associating a weight with each interior knot, representing a rough measure of its importance. See [6] for the details. For higher-dimensional tensor product splines we set the weight for a given knot to the maximum of the weights corresponding to this knot when the calculation is iterated over the "remaining" parameter directions. We refer to [2] for further details.

4 Knot removal methods

When removing knots from a tensor product spline we are faced with more options than in the case of a spline curve. In this section we present two different ways of performing knot removal. The first one, studied in [2] and based on a symmetric approach, treats all the parameter directions of a tensor product spline simultaneously, while the second one treats one parameter direction at a time.

4.1 Knot removal based on a symmetric approach

If we let $G_f(\boldsymbol{\tau})$ denote the approximation to $f \in \mathbb{S}_{\mathbf{d},\mathbf{t}}$ defined on the knot vector $\boldsymbol{\tau}$, we see that the approximations in the sequence mentioned above can be written $\{G_f(\boldsymbol{\tau}_j)\}_{j=0}^{N}$, where $\boldsymbol{\tau}_j$ is constructed from $\mathbf{t}$ by removing $j$ of its interior knots, and $N = \sum_{k=1}^{s} [m_k - (d_k + 1)]$ is the total number of interior knots of $\mathbf{t}$. Given such a sequence of approximations, we can perform a search on the index $j$ to determine an approximation $g^* = G_f(\boldsymbol{\tau}^*)$ to the initial spline $f$ with a preferably short knot vector $\boldsymbol{\tau}^*$, and with the property that $\|f - g^*\|_{\ell^\infty,\mathbf{t}} \le \varepsilon$, where $\varepsilon$ is the specified tolerance. If the knot vector $\boldsymbol{\tau}^*$ is not equal to either of the two knot vectors $\boldsymbol{\tau}_0$ or $\boldsymbol{\tau}_N$, we may repeat the process to find a new approximation based on $g^*$, as proposed in [6]. Taking into account how the sequence $\{G_f(\boldsymbol{\tau}_j)\}_{j=0}^{N}$ was constructed, we expect the error $\|f - G_f(\boldsymbol{\tau}_j)\|_{\ell^\infty,\mathbf{t}}$ to decrease, but not necessarily strictly, for decreasing values of the search parameter $j$. How the search among the possible approximations is done will generally depend on a number of factors, including some which will be discussed later through examples. Also note that we only have to compute approximations for the indices actually used in the search.

By treating all the directions simultaneously, we take into consideration the inherent symmetry of the problem. As we will see later, this will in some cases enable us to remove more knots than by treating one parameter direction at a time, but it will also lead to more complicated and slower code in an implementation.

4.2 Knot removal for one parameter direction at a time

In the second knot removal method we start by thinking of a spline $f \in \mathbb{S}_{\mathbf{d},\mathbf{t}}$ as a series of parametric curves in corresponding high-dimensional spaces. We can then perform a parametric knot removal for each parameter direction. The advantage of this approach is that it is easy to implement, since we may use existing knot removal routines for spline curves with only minor modifications.
In the following discussion we let $\varepsilon = \sum_{i=1}^{s} \varepsilon_i$, with $\varepsilon_i > 0$ for all $i$, be a given tolerance. Also let $f(\mathbf{x}) = \sum_{\mathbf{i} \le \mathbf{m}} f_{\mathbf{i}}\,B_{\mathbf{i},\mathbf{d},\mathbf{t}}(\mathbf{x}) = \mathbf{B}_{\mathbf{t}}^T \mathbf{f}$ be a spline in $\mathbb{S}_{\mathbf{d},\mathbf{t}} = \bigotimes_{k=1}^{s} \mathbb{S}_{d_k,\mathbf{t}^k}$, with $\mathbf{B}_{\mathbf{t}} = \bigotimes_{k=1}^{s} \mathbf{B}_{\mathbf{t}^k}$ and $\mathbf{f} = \mathrm{vec}(F)$. We start by identifying a series of parametric curves which may be naturally associated with this tensor product spline. We say that the spline $f$ consists of the curves $f_k(x_k)$, for $k = 1, \dots, s$, where $f_k(x_k)$ is the parametric curve in $\mathbb{R}^{M_k}$, for $M_k = (\prod_{i=1}^{k-1} n_i)(\prod_{i=k+1}^{s} m_i)$, given by

  $f_k(x_k) = \left[\left(\bigotimes_{i=1}^{k-1} I_{n_i}\right) \otimes \mathbf{B}_{\mathbf{t}^k}^T \otimes \left(\bigotimes_{i=k+1}^{s} I_{m_i}\right)\right] \mathbf{f}_{k-1}.$

We now return to the problem of finding a preferably short knot vector $\boldsymbol{\tau} \subseteq \mathbf{t}$ and a spline $g(\mathbf{x}) = \sum_{\mathbf{j} \le \mathbf{n}} c_{\mathbf{j}}\,B_{\mathbf{j},\mathbf{d},\boldsymbol{\tau}}(\mathbf{x}) \in \mathbb{S}_{\mathbf{d},\boldsymbol{\tau}} = \bigotimes_{k=1}^{s} \mathbb{S}_{d_k,\boldsymbol{\tau}^k}$ with the property that $\|f - g\|_{\ell^\infty,\mathbf{t}} \le \varepsilon$. To apply knot removal to $f \in \mathbb{S}_{\mathbf{d},\mathbf{t}}$ we can now go through the following steps, for $k = 1, \dots, s$.

1. Apply parametric knot removal with the tolerance $\varepsilon_k$ to the parametric curve
   $f_k(x_k) = \left[\left(\bigotimes_{i=1}^{k-1} I_{n_i}\right) \otimes \mathbf{B}_{\mathbf{t}^k}^T \otimes \left(\bigotimes_{i=k+1}^{s} I_{m_i}\right)\right] \mathbf{f}_{k-1}$,
   defined on $\mathbf{t}^k$, starting with $\mathbf{f}_0 = \mathbf{f}$.
2. This produces a new parametric curve $\tilde{f}_k$, defined on the knot vector $\boldsymbol{\tau}^k \subseteq \mathbf{t}^k$, with coefficients $\mathbf{f}_k = \mathrm{vec}(F_k)$ for $F_k \in \mathbb{R}^{n_1,\dots,n_k,m_{k+1},\dots,m_s}$.
3. We also have that
   $\tilde{f}_k(x_k) = \left[\left(\bigotimes_{i=1}^{k-1} I_{n_i}\right) \otimes \mathbf{B}_{\boldsymbol{\tau}^k}^T \otimes \left(\bigotimes_{i=k+1}^{s} I_{m_i}\right)\right] \mathbf{f}_k = \left[\left(\bigotimes_{i=1}^{k-1} I_{n_i}\right) \otimes \mathbf{B}_{\mathbf{t}^k}^T \otimes \left(\bigotimes_{i=k+1}^{s} I_{m_i}\right)\right] \left[\left(\bigotimes_{i=1}^{k-1} I_{n_i}\right) \otimes A_k \otimes \left(\bigotimes_{i=k+1}^{s} I_{m_i}\right)\right] \mathbf{f}_k$,
   where $A_k$ is the knot insertion matrix from $\boldsymbol{\tau}^k$ to $\mathbf{t}^k$.
4. Consequently,
   $\|f_k - \tilde{f}_k\|_{\ell^\infty,\mathbf{t}^k} = \left\|\mathbf{f}_{k-1} - \left[\left(\bigotimes_{i=1}^{k-1} I_{n_i}\right) \otimes A_k \otimes \left(\bigotimes_{i=k+1}^{s} I_{m_i}\right)\right] \mathbf{f}_k\right\|_{\ell^\infty} \le \varepsilon_k.$

Finally, we let the coefficients of the function $g(\mathbf{x}) = \mathbf{B}_{\boldsymbol{\tau}}^T \mathbf{c} \in \mathbb{S}_{\mathbf{d},\boldsymbol{\tau}}$ be $\mathbf{c} = \mathrm{vec}(F_s)$, and we have the following result.

Theorem 4.1 If we let $f(\mathbf{x}) = \mathbf{B}_{\mathbf{t}}^T \mathbf{f} \in \mathbb{S}_{\mathbf{d},\mathbf{t}}$ and $g(\mathbf{x}) = \mathbf{B}_{\boldsymbol{\tau}}^T \mathbf{c} \in \mathbb{S}_{\mathbf{d},\boldsymbol{\tau}}$ be the tensor product splines from the discussion above, then we have $\|f - g\|_{\ell^\infty,\mathbf{t}} \le \varepsilon$.

Proof: Let $A = \bigotimes_{k=1}^{s} A_k$ be the knot insertion matrix from $\boldsymbol{\tau}$ to $\mathbf{t}$, and let $f_0(\mathbf{x}) = \mathbf{B}_{\mathbf{t}}^T \mathbf{f}_0$ be equal to $f$ and $f_s(\mathbf{x}) = \mathbf{B}_{\boldsymbol{\tau}}^T \mathbf{f}_s$ be equal to $g$, i.e. $\mathbf{f}_0 = \mathbf{f}$ and $\mathbf{f}_s = \mathbf{c}$. Writing $\mathbf{f}_0 - A\mathbf{f}_s$ as a telescoping sum over the parameter directions, and using the fact that the $\ell^\infty$ coefficient norm is not increased by knot insertion (the rows of each $A_k$ are non-negative and sum to one), we obtain

  $\|f - g\|_{\ell^\infty,\mathbf{t}} = \|\mathbf{f}_0 - A\,\mathbf{f}_s\|_{\ell^\infty} \le \sum_{k=1}^{s} \left\|\mathbf{f}_{k-1} - \left[\left(\bigotimes_{i=1}^{k-1} I_{n_i}\right) \otimes A_k \otimes \left(\bigotimes_{i=k+1}^{s} I_{m_i}\right)\right] \mathbf{f}_k\right\|_{\ell^\infty} \le \sum_{k=1}^{s} \varepsilon_k = \varepsilon.$  □
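Organizationally, the per-direction method is a loop over axes in which each direction is handed to an existing univariate routine. A sketch of this driver (the univariate routine `remove_curve` is a placeholder standing in for a parametric knot-removal implementation such as that of [6]; the even split $\varepsilon_k = \varepsilon/s$ follows Example 5.1 below):

```python
import numpy as np

def knot_removal_per_direction(F, knots, eps, remove_curve):
    """Per-direction knot removal of Section 4.2. `remove_curve(C, t, tol)`
    takes a curve with coefficient matrix C (one row per coefficient, one
    column per component), its knot vector t and a tolerance, and returns
    the reduced coefficients and reduced knot vector."""
    F = np.asarray(F, dtype=float)
    s = F.ndim
    new_knots = []
    for k in range(s):
        C = F.reshape(F.shape[0], -1)          # direction k as a curve
        C, tau_k = remove_curve(C, knots[k], eps / s)
        new_knots.append(tau_k)
        F = np.moveaxis(C.reshape((C.shape[0],) + F.shape[1:]), 0, -1)
    return F, new_knots
```

As in the tensor solver of Section 3.1, cycling the leading axis to the end after each direction restores the original axis order after $s$ passes, so the returned tensor holds $\mathrm{vec}(F_s) = \mathbf{c}$ directly.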
5 Examples

The knot removal methods presented above have been implemented and tested on a computer. In this section we present trivariate examples from this implementation and propose different knot removal strategies, depending on the problem at hand. See [3] for a detailed description of this implementation.

Example 5.1 In this first example we compare two different strategies for searching through a list of approximations $\{G_f(\boldsymbol{\tau}_j)\}_{j=0}^{N}$, introduced above. We consider the knot removal method treating one parameter direction at a time, which means that we end up solving a parametric knot removal problem with tolerance $\varepsilon_i = \varepsilon/3$, $i = 1, 2, 3$, for each of the three parameter directions. To improve efficiency, the parametric knot removal routine implemented is constructed in a way that lets it abort the computation if an approximation for any component of the parametric curve fails to lie within the specified tolerance. This fact suggests a search strategy where we compute successive approximations to the initial spline by adding one interior knot at a time, starting with zero interior knots, and where the final approximation is given by the first of these approximation processes to be completed. Intuitively, we would expect such a sequential search strategy to perform best for "large" tolerances and/or large problems, where there is more to gain by aborting an approximation process. In this example we have compared this search strategy with a strategy proposed in [6] using a binary search.

In all the tests we have used an initial trilinear spline constructed by sampling the function given by $f(x, y, z) = \frac13[\sin(2\pi x) + \sin(2\pi y) + \sin(2\pi z)]$ at the points specified by a uniform 3-dimensional grid on the domain $\Omega = [0,1]^3$, for four selected grid sizes. Each spline was reduced by using both of the search strategies mentioned above, for tolerances varying from $\varepsilon = 0.001$ to $\varepsilon = 0.01$. Both of the search strategies produced approximately the same end grid size in each test. In Figure 1 the CPU-time of the two search strategies is plotted against the tolerance for the selected grid sizes. We observe that the reductions utilizing a binary search perform best on small problems, while the sequential search strategy turns out to be superior for large problems.

FIG. 1. A comparison of two different search strategies (CPU-time against tolerance): (a) problem size 25^3, (b) problem size 100^3, (c) problem size 250^3, (d) problem size 400^3.

Example 5.2 In this example we compare the two different knot removal methods presented in this paper. Here we have used an initial trilinear spline constructed by sampling an exponential-type test function $f(x, y, z)$ at the points specified by a uniform 3-dimensional grid on the domain $\Omega = [0,1]^3$, for varying grid sizes. Each spline was reduced by both the method based on the symmetric approach and the method treating one parameter direction at a time. The results are presented in Table 1. We see that, in our implementation, the method using the symmetric approach is by far the slowest. However, at least for the type of function considered in this example, the method based on the symmetric approach gives a much better reduction than the other.

  Knot removal for trilinear splines, tolerance eps = 0.005
  Start grid | Parametric, binary search              | Symmetric, binary search
             | CPU    End grid       Error            | CPU    End grid       Error
  100^3      | 16.53  72 x 65 x 65   4.93800e-3       | 63.23  54 x 53 x 53   4.92080e-3
  150^3      | 56.44  81 x 71 x 71   4.80243e-3       | 122.2  51 x 49 x 49   4.77236e-3
  200^3      | 99.48  68 x 66 x 66   4.91142e-3       | 300.9  54 x 50 x 51   4.98275e-3
  250^3      | 165.3  74 x 62 x 62   4.74970e-3       | 584.8  61 x 56 x 56   4.85916e-3
  300^3      | 256.8  72 x 62 x 62   4.85316e-3       | 1094   60 x 54 x 53   4.81551e-3
  350^3      | 391.4  75 x 65 x 63   4.77028e-3       | 1312   54 x 50 x 50   4.92422e-3
  400^3      | 494.6  71 x 59 x 63   4.79631e-3       | 1865   54 x 50 x 50   4.81064e-3

  TAB. 1. Knot removal for the trilinear splines of Example 5.2.

Bibliography

1. Arge, E., Daehlen, M., Lyche, T. and Mørken, K. (1990). Constrained spline approximation of functions and data based on constrained knot removal. In: Algorithms for Approximation II, J. C. Mason and M. G. Cox (eds.), Chapman and Hall, London, 4-20.
2. Brenna, T. (1998). Knot removal for multivariate tensor product splines. Master thesis, part I. Dept. of Informatics, Univ. of Oslo.
3. Brenna, T. (1998). Knot removal for linear, bilinear and trilinear splines. Master thesis, part II. Dept. of Informatics, Univ. of Oslo.
4. Graham, A. (1981). Kronecker Products and Matrix Calculus With Applications. Ellis Horwood Series: Mathematics and its Applications.
5. Lyche, T. and Mørken, K. (1986). Knot removal for parametric B-spline curves and surfaces. Computer Aided Geometric Design, 4, 217-230.
6. Lyche, T. and Mørken, K. (1987). A data reduction strategy for splines with applications to the approximation of functions and data. IMA Journal of Numerical Analysis, 8, 185-208.
Fixed- and free-knot univariate least-squares data approximation by polynomial splines

Maurice Cox, Peter Harris and Paul Kenward
National Physical Laboratory, Teddington, Middlesex, TW11 0LW, UK
maurice.cox@npl.co.uk, peter.harris@npl.co.uk, paul.kenward@npl.co.uk

Abstract

Fixed- and free-knot least-squares data approximation by polynomial splines is considered. Classes of knot-placement algorithms are discussed. A practical example of knot placement is presented, and future possibilities in free-knot spline approximation are addressed.

1 Introduction

The representation of univariate polynomial splines in terms of B-splines is reviewed (Section 2), leading to the problem of obtaining fixed- and free-knot $\ell_2$ spline approximations (Section 3). The accepted approach to the fixed-knot case is recalled (Section 4) and the manner in which spline uncertainties can be evaluated is given (Section 5). The importance of families of spline approximants is emphasised (Section 6). The free-knot problem is formulated (Section 7) and several of the established and some lesser-known knot-placement strategies are reviewed (Section 8). Conclusions are drawn and future possibilities indicated (Section 9).

2 Univariate polynomial splines

Let $I := [x_{\min}, x_{\max}]$ be an interval of the $x$-axis, and

  $x_{\min} = \lambda_0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{N-1} \le \lambda_N < \lambda_{N+1} = x_{\max}$

a partition of $I$. A spline $s(x)$ of order $n$ (degree $n-1$) on $I$ is a piecewise polynomial of order $n$ on $(\lambda_j, \lambda_{j+1})$, $j = 0, \dots, N$. The spline $s$ is $C^{n-k-1}$ at $\lambda_j$ if $\mathrm{card}(\{i \in \{1, \dots, N\} : \lambda_i = \lambda_j\}) = k$. The partition points $\lambda = \{\lambda_j\}_1^N$ are the (interior) knots of $s$. To specify the complete set of knots needed to define $s$ on $I$ in terms of B-splines, the knots $\{\lambda_j\}_1^N$ are augmented by knots $\{\lambda_j\}_{j=1-n}^{-1}$ and $\{\lambda_j\}_{j=N+2}^{q}$, $q = N + n$, satisfying

  $\lambda_{1-n} \le \cdots \le \lambda_0$, $\lambda_{N+1} \le \cdots \le \lambda_q$.

For many purposes, a good choice [10] of additional knots is

  $\lambda_{1-n} = \cdots = \lambda_0$, $\lambda_{N+1} = \cdots = \lambda_q$.
Thus, local bounds for s can readily be found: min j<k<j+n Ck < six) < • max j<k<j+n c^, x e fA,-,A,+i]. ^ ^' ^^ ' These bounds imply a mimicking property for s, viz., that the elements of c tend to vary in much the same way that $ varies. Figure 2 depicts a spline curve s of order 4 with "non-polynomial" shape having interior knots at x = (1,2,5)'^, coincident end knots at a; = 0 and 10, and B-spline coefficients (0.00,0.20,0.60,0.22,0.18,0.14,0.12)'^. 332 M. Cox, P. Harris and P. Kenward To reproduce this shape to visual accuracy with a polynomial would require a high degree and hence many more defining coefficients. The mimicking property is evident: successive elements of c rise, fall sharply and then gently, behaving in a similar way to s. FIG. 2. A spline curve with "non-polynomial" shape illustrating the mimicking prop- erty. 3 Fixed- and free-knot approximation Two types of data approximation (or data modelling) in the £2 norm by splines are regularly considered. One is the determination of the B-spline coefficients c for given data, a prescribed order n and prescribed knots A. The other is the determination of c and X for given data and spline order n. The former problem is linear with respect to the parameters of the spline, just c being regarded as unknown. The latter is nonlinear, both c and A being unknown. The linear case is well understood, with highly satisfactory algorithms [10] and software implementations [1, 16] available. The nonlinear case remains a research problem, although useful algorithms (Section 8) have been proposed, implemented and used. Many of these algorithms "iterate" with respect to A, where for each choice of knots the resulting linear problem is solved for c. Thus, the linear problem (Section 4) is important in its own right and as part of the solution strategy for knot-placement algorithms. 4 Least-squares data approximation by splines with fixed knots The £2 data approximation problem for splines with fixed knots can be posed as follows. Given are data points {(x,,?/,)}?', with Xi < ■■■ < x^, and corresponding weights {wi}f or standard uncertainties {ui}f. The Wi reflect the relative quality of the j/;,^ Ui is the standard uncertainty of yj and corresponds to the standard deviation of possible "measurements" a.t x = Xi of the function underlying the data, y, being one realisation. Given also are the N knots A = {Aj}f and the order n of the spline s. When weights are specified, the problem is to determine the spline s{x) of order n, with knots A, such that the two-norm of {wiei}"^ is minimised with respect to c. ■ ^The Xi are taken as exact for the treatment here. A generahsed treatment is possible, in which the Xj are alsc regarded as inexact. The problem becomes nonlinear (in c). Data approximation by polynomial splines 333 When standard uncertainties are specified, the two-norm of {«~-^ej}™ is minimised with respect to c. If u;i = u" ,i = l,...,m, the two formulations are identical in terms of the spline produced. When weights are specified, s is referred to as a spline approximant. When uncertainties are prescribed, s is known as a spline model. There are differences (Section 5) in interpretation in terms of the statistical uncertainties associated with the solution and in terms of validating the spline model so obtained. 
The use of a formulation in terms of standard uncertainties, together with the B-spline representation (2.1) of s, gives the linear algebraic formulation^ mine'^V^~^e, e = y - ^c, (4.1) where y = (yi, • • • ,J/m)^, A is an m x q matrix with a^j = Nnj{xi), and Vy = diag(uf,...,u^). Matrix computational methods can be applied to this formulation. As a consequence of property (2.2) of the B-splines, ^ is a rectangular banded matrix of bandwidth n [8]. The linear algebraic solution can be effected using Givens rotations to triangularise the system, back-solution then yielding the coefficients c [6]. The number of floatingpoint operations (flops) required is to first order 0{mn'^), i.e., independent of the number of knots. Hence computing a sphne model for many knots is hardly more expensive than one for a few knots. Moreover, since for many problems cubic sphnes (n = 4) yield a good balance between approximation properties and smoothness (continuity class C^), regarding the order as/ixed gives a flop count 0(m). The vector c is unique [11] if there is a strictly ordered subset t = {tj}l of x such that the Schoenberg-Whitney conditions [21] tjes\ipp{Nn,j{\;x)), j = l,:..,q, (4.2) hold. In a case where the conditions (4.2) do not hold^, an appropriate member can be selected from the space of possible solutions. Such a selection is also advisable if the conditions are in a practical sense "close" to being violated. A particular solution can be determined by augmenting the least-squares formulation by a minimal number of equality constraints for c such that A has full column rank [10]. An instance of the type of data set to which the algorithms of this paper are addressed is shown in Figure 3.- Such a data set (cf. Section 2) has the variety of behaviour that cannot readily be reproduced by some other classes of approximating functions. 5 Spline uncertainties Once a vahd spline model has been obtained, the uncertainties associated with the spline can be evaluated [9]. Uncertainty evaluations are essential in metrology, where all measurement results are to be accompanied by a quantification of their reliability [2], and important in other fields. The key entity is the covariance matrix Vc of the spline ^A further generalisation is possible in which mutual dependencies are permitted among the measurement errors. In this case, Vy is non-diagonal. ^A set of knots giving rise to this circumstance may be a consequence of an automatic knot-placement procedure. M. Cox, P. Harris and P. Kenward 334 i(Klepfln<lent vartab'B 3. A data set representing heat flow as a function of temperature. Such data forms the basis of the determination of thermophysical properties of materials under test. For clarity only every fifth data point is shown. FIG. coefficients c. Using recognised procedures of linear algebra, Vc = iA'^Vf'Ar\ (5.1) Prom this result, the standard uncertainty of any quantity that depends on c can be evaluated. Specifically, for a given constant vector p, the standard uncertainty u(p'^c) of p'^c is given by «2(pTc) = pTycP. By setting p to contain the values of the B-spline basis at a point x e I, the standard uncertainty of s{x) can be formed. The standard uncertainty of a nonlinear function of c can be estimated by first linearising the expression about the solution value of c. 
If weights rather than uncertainties are specified for the data, (5.1) takes the form V, = a\A^W^Ar\ where & estimates the standard deviation of the weighted residuals {wiEiJI', W = diag(wi,...,i(;TO), and evaluated at the solution. 6 Families of approximants When dealing with certain classes of approximating function it is natural and useful to consider families of approximants. A simple example is polynomial approximation, for polynomials pj{x) of order j = 1,2,...,N, for some maximum order N. Each member of the family "contains" the previous member. It is then meaningful to consider the approximation measure, e.g., the ^a-norm here, with respect to indices denoting members. Thus, the value of the fa-norm for the polynomial approximant of order j can be inspected with respect to index j for j = 1,2,..., iV. For data approximation, it is more Data approximation by polynomial splines 335 meaningful to use as the measure the root-mean-square residual given by dividing the ^2-norm by (m - j)^/^. For representative data, the expectation is that as j increases this quantity should stabilise to an essentially constant value. This property provides a useful validation procedure. If weights u~^ are used as in Section 4 this measure should settle to the value unity. Thus the approximant with index j (normally the smallest such) that achieves the value one is sought. Within most of the strategies outlined in Section 8 it is possible to produce results for A'' = 1,2,... knots, and thus to study the effect of the number of knots on the quality of the approximant. Prom such information it may be possible to select an acceptable solution. If for each number of knots, the knots contain those for the previous number, and an £2 approximant is determined, the sequence of approximants for A'' = 1,2,... knots forms a family. A family has the property that the sequence of values of the ^2-norm is monotonically decreasing. 7 Least-squares data approximation by splines with free knots The problem of least-squares data approximation by sphnes with free knots can be formulated in the same way as that for fixed knots (Section 4), except that the knots are not specified a priori, either in location or number. The formulation (4.1) no longer yields a hnear problem, since the matrix A of B-sphne values is now a function of A. Instead, e(A) = y — ^(A)c, and it is required to solve mine^(A)y-ie(A). A;c (7.1) In order to reflect the fact that for any given knot set the B-spline coefficients are given by solving a relatively simple, linear problem, formulation (7.1) can be expressed as min('mme^(A)Fjrie(A)V (7.2) Extensive use is made of this elementary result. 8 Knot-placement strategies Many knot-placement strategies have been proposed and used. Some of these strategies are outlined and their properties indicated. Several of the strategies generate a family of candidate spline approximants, with advantages for model validation. 8.1 Manual methods Manual methods can be classed as those methods for which the user examines the general "shape" of the function underpinning the data, selecting the number and location of the knots on this basis. With practice and visual aids, acceptable solutions can often be obtained [6]. Naturally, knots are chosen to be more concentrated where "things are happening" in contrast to regions where the underpinning behaviour is innocuous. 
8.2 Strategies that depend only on abscissa values Strategies based on the manner in which the values of the independent variable are distributed may be used to place the knots (at points that are not necessarily the data 336 M. Cox, P. Harris and P. Kenward abscissae themselves). A facility in DASL (the NPL Data Approximation Subroutine Library) [1] provides one such strategy, based on the Schoenberg-Whitney conditions (4.2) in the following way. Intuitively, these conditions imply that there is no region where there are "too many" knots compared with the number of data points. Mathematically, these conditions guarantee uniqueness. Numerically, their satisfaction does not ensure that the solution is well-defined. If the conditions are "close" to being violated, c will be sensitive to perturbations in the data. In particular, since the behaviour of c "controls" that of s (Section 2), the spline is likely to exhibit spurious behaviour such as large undesirable oscillations if ||cl|2 > ||y||2It follows that a sensible choice of knots would be such that the Schoenberg-Whitney conditions are satisfied "as well as possible" for a data subset. Such a choice is made in DASL [1] for spline approximation of arbitrary order. It is also made in a cubic spline interpolation routine in the NAG Library [16], regarding spline interpolation as a special case of spline approximation in which q- m and N = m - n. The choice made is seen most simply by first applying it to spline interpolation. Consider the choice 1 >'j = 2^^j+[n/2\+Xj+i{n+i)/2}), j =^l,...,m-n, where [v\ is the largest integer no larger than u. For n even, \j = Xjj^n/2- Thus, the choice tj = Aj_„/2 would be made. However (Section 2), supp(JV„,j) = [Aj_„,Aj]. Thus, indexwise, the Schoenberg-Whitney conditions are satisfied as well as possible in the sense that the index of Xj-n/2 falls halfway between the indices of the support endpoints Aj_„ and Xj. Comparable considerations apply for n odd. Precisely this choice is recommended [14, 16] in the context of cubic sphne interpolation. It is the "not a knot" criterion, as a practical alternative to the classical use of boundary derivatives. A knot is placed at each "interior" data value Xi apart from X2 and Xm-iThe above choice can be interpreted as follows. Consider the graph x = F(£) given by the join of the points {{i,Xi)}^. The jth interior knot, Xj, for j = 1,... ,m - n, is given by F(j + n/2). The successive spacings between the index arguments of F for j = 0,...,N + 1, using F(0) = x„,in and F{N + 1) = Xmax, are therefore 1 + n/2,1,..., 1,1 + n/2. AT-l For approximation, these successive spacings are proportionally increased to account for the fact that there are fewer knots. The resulting expression for the jth interior knot is A,- = F(l + (m - l)(j + n/2 - l)/(g - 1)), j = l,...,N. The choice can be interpreted as placing the interior knots such that there is an approximately equal number of data points in each knot interval (interval between adjacent knots), except that in the first and the last interval there are approximately n/2 times as many points. 
The strategy [1] has the property that when N is such that the data is interpolated, the choice of knots agrees with one of the recommended choices for spline interpolation. (The approach tends to give better knot locations if the data is gathered in a manner which ensures that the local density of the data is greater in regions where the behaviour of y is more marked.)

Figure 4 illustrates the above strategy for a spline interpolant and approximant of order 4 to data with abscissae x = (0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 3, 4, 5, 7.5, 10)ᵀ. Each figure shows the graph x = F(ξ). For the interpolant (left-hand graph), ten knots are chosen to coincide with the abscissa values x_3, ..., x_12. For the approximant (right-hand graph), four knots are chosen such that there are two points in each interval, excepting the first and last interval where there are four points, i.e., n/2 = 2 times as many. The distribution of the knots reflects that of the abscissa values.

FIG. 4. A knot-placement strategy depending only on the abscissa values.

A simpler strategy is to select uniformly spaced knots. The Schoenberg-Whitney conditions will not necessarily automatically be satisfied by such a choice, and the spline approximant would therefore not necessarily be unique, although the approach indicated at the end of Section 4 could be applied.

8.3 Sequential knot-insertion strategies

In a sequential knot-insertion strategy, a succession of approximants is obtained, in which for each approximant a knot is inserted in the knot interval that gives rise to the greatest contribution to the ℓ2 error. A knot interval is an interval between adjacent knots, where the endpoints of I count as knots for this purpose. Previously inserted knots are retained undisturbed. Several variants are possible (also see Section 8.10), e.g.:

• Start the process with a number of knots already in place, perhaps obtained from information specific to the application.

• Candidate positions for a new knot are
  * The continuum of points within the interval. This approach gives rise to the minimisation of a univariate function that may possess local minima.
  * The subset within the interval of a discrete set of points chosen a priori, e.g., the data abscissae themselves or a uniformly spaced set of x-values. This approach gives rise to a finite computation for the globally-best choice of knot, relative to the discretisation, with respect to previous knots.

• More than one knot can be inserted at a time. Doing so gives an approach that is intermediate between full optimisation (Section 8.6) and sequential (single) knot insertion. Computation times rise rapidly with the number of "simultaneous" knots so inserted, so in practice only a small number, say two or three, might be feasible.

The "upper set" of crosses in Figure 5 shows the root-mean-square residual as a function of the number of knots for the application of this strategy to the thermophysical data of Figure 3. The figure depicts the root-mean-square residual on a logarithmic scale, so its value varies by a factor of 1000 from 1 to 81 knots.

FIG. 5. The root-mean-square residual as a function of the number of knots for the application of knot-insertion and knot-removal strategies to the thermophysical data of Figure 3. The "upper set" of crosses indicates the values obtained for knot insertion and the lower for knot removal. The knot-removal strategy starts with the knot set provided by the knot-insertion strategy, which was terminated after 81 knots had been placed.
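A minimal sketch of the basic strategy (ours, with assumptions: SciPy's LSQUnivariateSpline for the fixed-knot fits, and the crude choice of the median abscissa of the worst interval as the new knot):

    # Sketch: sequential knot insertion; at each step the knot interval
    # contributing most to the l2 error receives one new knot.
    import numpy as np
    from scipy.interpolate import LSQUnivariateSpline

    def insert_knots(x, y, n_insert, k=3):
        knots = [float(np.median(x))]            # seed with a single knot
        for _ in range(n_insert):
            s = LSQUnivariateSpline(x, y, knots, k=k)
            r2 = (y - s(x)) ** 2
            edges = np.concatenate(([x[0]], knots, [x[-1]]))
            # l2 error contribution of each knot interval
            contrib = [r2[(x >= a) & (x <= b)].sum()
                       for a, b in zip(edges[:-1], edges[1:])]
            i = int(np.argmax(contrib))
            inside = x[(x > edges[i]) & (x < edges[i + 1])]
            # crude position rule; assumes the worst interval holds data
            knots = sorted(knots + [float(np.median(inside))])
        return knots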
8.4 Sequential knot-removal strategies

In a sequential knot-removal strategy, the starting point is an initial spline approximant having a "large" number of knots, one that typically would be regarded as an acceptable approximant to the data and that contains (perhaps many) more knots than desired. Also see Section 8.10. Each successive approximant is obtained from the previous approximant by deleting one (or more) knots. The knot selected for removal is chosen as that having least effect in terms of the change in the ℓ2 error. The process is continued until an acceptable approximant is no longer obtained.

The initially large number of knots (Section 8.10) provides an appreciable number of candidate knots for removal and thus greater flexibility. The rationale is that, in contrast to successive knot insertion, a succession of acceptable approximants is obtained, as opposed to a succession of unacceptable approximants, until the final "solution" is provided. There are variants, as with sequential knot insertion. For example, several knots can be removed at each stage.

A different class of knot removal algorithms [20] is based on a general class of ℓ_p norms. It is not concerned specifically with data approximation, but with replacing an initial spline (which may itself be an approximant) by one that is acceptably close according to the measure. The "lower set" of crosses in Figure 5 shows the root-mean-square residual as a function of the number of knots for the application of this strategy to the thermophysical data part-depicted in Figure 3.

8.5 Theory-based approaches

The distance of a spline s(x) with knots λ from a sufficiently differentiable function f(x) is proportional to hⁿ|f⁽ⁿ⁾(ξ)|, where h is the local knot spacing and ξ is a value of x [14]. Consider inverting this expression in order approximately to equalise the error with respect to x. The lengths of the knot intervals should consequently be chosen to be proportional to |f⁽ⁿ⁾(ξ)|^{−1/n}, where ξ is a value in the neighbourhood of the respective knot interval. Consider the function

    F(x) = ∫_{x_min}^{x} |f⁽ⁿ⁾(t)|^{1/n} dt / ∫_{x_min}^{x_max} |f⁽ⁿ⁾(t)|^{1/n} dt.        (8.1)

Take knots given by

    F(λ_j) = j/(N + 1),    j = 1, ..., N.                                     (8.2)

This result corresponds to dividing the range of the monotonically increasing function F(x), for x ∈ I, into N + 1 contiguous subranges of equal length, taking the values of x corresponding to the subrange endpoints as the knots.

In practice f, let alone F, is unknown. Various efforts have been made to estimate f and hence F from the data points. For instance, if the data is approximated by a spline of order n + 1, its nth derivative, a piecewise-constant function, can be used to estimate F [3]. It is then straightforward to form the required knots. The approach begs the question in the case of data: in order to estimate knots for a spline of order n, it is first necessary to construct a spline approximant of order n + 1 for the data, the construction of which itself requires a choice of knots.

Alternatively [13], a spline approximant of order n for the data can be constructed for some convenient choice of knots. Its nth derivative is of course zero (except at the knots). However, its (n − 1)th derivative is piecewise constant, a function that can be approximated by the join of the mean values at the knots of the constant pieces to the immediate right and left, with special consideration at the endpoints of I. The derivative of this piecewise-linear function then provides a piecewise-constant representation of the nth derivative, which can be used as before. Knots can then be deduced from this form as above. The advantage of this approach is that it can be iterated [13]. If the process "converges", the result can be used to provide the required knot set. The process can work well, but is capable of producing disappointing results. Several variants of the basic concept are possible. The approach warrants careful re-visiting.
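Assuming |f⁽ⁿ⁾| is available (in practice estimated as described above; here it is supplied directly for illustration), (8.1) and (8.2) translate into a few lines:

    # Sketch: knots equidistributing the integral (8.1).
    import numpy as np

    def equidistributed_knots(t, fn_abs, n, N):
        """t: dense grid on I; fn_abs: |f^(n)| sampled on t;
        n: spline order; N: number of interior knots."""
        density = fn_abs ** (1.0 / n)
        F = np.concatenate(([0.0],
                            np.cumsum(0.5 * (density[1:] + density[:-1])
                                      * np.diff(t))))
        F /= F[-1]                                 # normalised, as in (8.1)
        targets = np.arange(1, N + 1) / (N + 1.0)  # F(lambda_j) = j/(N+1)
        return np.interp(targets, F, t)            # invert F, giving (8.2)

    t = np.linspace(0.0, 1.0, 1001)
    fn = 625.0 * np.exp(-5.0 * t)                  # |f''''| for f(t) = exp(-5t)
    print(equidistributed_knots(t, fn, n=4, N=5))

The knots are crowded where |f⁽ⁿ⁾| is large, as intended.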
However, its (n — l)th derivative is piecewise constant, a function that can be approximated by the join of the mean values at the knots of the constant pieces to the immediate right and left, with special consideration at the endpoints of /. The derivative of this piecewise-linear function then provides a piecewise-constant representation of the nth derivative, that can be used as before. Knots can then be deduced from this form as above. The advantage of this approach is that it can be iterated [13]. If the process "converges", the result can be used to provide the required knot set. The process can 340 M. Cox, P. Harris and P. Kenward work well, but is capable of producing disappointing results. Several variants of the basic concept are possible. The approach warrants careful re-visiting. 8.6 "Overall" optimisation approaches For any given value of A^, the problem is regarded as an optimisation problem with respect to the overall error measure. It is necessary to provide a sensible initial estimate of the knot positions. Local solutions which may be grossly inferior to the global solution are possible [4]. At an optimal solution, knots may coalesce, thus reducing the continuity of the spline at such points [19]; the same comment applies to the sequential-knot-insertion and optimisation approach (Section 8.7). 8.7 Sequential knot insertion and optimisation Sequential knot insertion with optimisation is identical to the sequential knot-insertion strategy (Section 8.3) except that, after each knot is inserted, all previously-inserted knots are adjusted such that the complete set of knots at that stage are (locally) optimal with respect to the overall error measure. One such strategy [15] carries out the optimisation at each stage by adjusting in turn each knot in the current knot set in order to achieve satisfactory reduction in the £2 norm, and repeating the complete adjustment as necessary. This strategy is not as poor as the traditional one-variable-at-a-time strategy for nonlinear optimisation because knots far from the newly-inserted knot tend to have little effect on the error measure. Buffering to prevent knots coalescing and reducing the continuity of the approximant can be used. Various features can be incorporated to improve computational efficiency, including the use of contemporary nonlinear least-squares optimisation. It is emphasised that for each choice of knots the problem is linear (cf. Section 7). 8.8 Optimal discontinuous piecew^ise-polynomial approximation Consider the class S^ of splines having N interior knots of multiplicity n (i.e., nN interior knots in all, counting coincidences). An s € SAT will in general be discontinuous at these knots. It is possible to determine the globally optimal locations of such knots, using the principle of dynamic programming [4]. The approach is based on the fact that the best approximant SN e SN to the leading p (> nN) data points is given by the best over g = nN -n + l,nN - n + 2,. ..,p- N of s^-i 6 SN-I for the leading q <p- N points, together with a polynomial piece of order n over points g -|-1 to p. By this simple recursive means the globally best knots for splines of any order that are discontinuous at any number of knots can be computed. Such a solution may not be suitable as the final result in an application. However, it can be useful as part of a knot placement strategy. For example, suppose good knots for a spline of order n are required. An approach would be to determine an optimal discontinuous sphne of order n+l. 
8.9 Knot dispersion

A set of knots of multiplicity n is positioned using an appropriate strategy, such as that in Section 8.8, and a C^{(−1)}(I) spline with these knots determined. Each of these multiple knots is "dispersed", viz., replaced by n nearby simple knots, and a replacement C^{(n−2)}(I) spline computed. A careful strategy for knot dispersion is required. Again, informal experiments have been made by the authors and mixed results obtained.

8.10 Knot initialisation and candidate knot locations

Several of the above procedures require, or can benefit from, an initial placement of the knots. Some make use of "candidate knot locations".

The solution to the free-knot spline approximation problem returned by iterative algorithms typically depends on the starting set of knots. Although an algorithm may return a result that satisfies the necessary and sufficient conditions for a solution [17], this result may be locally rather than globally optimal. There is no known characterisation of a globally optimal solution. The careful interpretation of solutions is therefore important.

The use of candidate knot positions can be helpful. For instance, it may be decided that for splines of even order, only knots that coincide with data abscissae are in the candidate set, or, for splines of odd order, only knots at points mid-way between adjacent data abscissae may be so regarded. Such criteria are consistent with the choice for interpolating splines and the generalisation covered in Section 8.2. The Lyche-Mørken knot removal algorithms [20] use data abscissae as candidate knots.

The use of a finite number of candidate knot locations helps to reduce the dimensionality of the problem: there can then only be a finite number of possible knot sets. For large N this number can be extremely large, making it prohibitive to examine all possibilities. However, for small N, e.g., 1, 2 and 3, it may indeed be possible, and can pay dividends. Knot insertion and knot removal algorithms can also implement the concept. For example, at each stage of a knot insertion strategy, two or three knots can be inserted "simultaneously". By the method of their introduction these new knots will be optimal relative to the knots previously used and the available candidate knot locations.

Another aspect of a candidate knot set is that if it is sufficiently dense it will contain, to a degree of approximation dictated by its "spacing", the optimal knots for the given data set [19]. For instance, consider a set of m ≫ 100 data points specified over an interval I normalised to [−1, 1]. Take 100 uniformly spaced points spanning this interval. This set will contain, to approximately two figures, each globally optimal knot set having N ≤ 98 knots (the two endpoints do not constitute interior knots), assuming all knots are simple. If a spline based on these 98 candidate interior knots provided a valid model, a suitable knot removal algorithm might be expected to be able to identify reasonably closely the optimal knot sets. Work is required to determine the degree of success in this regard.
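For small N the exhaustive examination mentioned above is easily sketched (our code, with assumptions: SciPy, a uniformly spaced candidate set, and data resembling the skew-symmetric example of the next section):

    # Sketch: exhaustive search over a finite candidate knot set for a
    # small number N of interior knots (feasible for N = 1, 2, 3).
    import numpy as np
    from itertools import combinations
    from scipy.interpolate import LSQUnivariateSpline

    def best_knots(x, y, candidates, N, k=3):
        best_err, best_t = np.inf, None
        for t in combinations(candidates, N):
            try:
                s = LSQUnivariateSpline(x, y, t, k=k)
            except ValueError:        # Schoenberg-Whitney conditions violated
                continue
            err = float(np.sum((y - s(x)) ** 2))
            if err < best_err:
                best_err, best_t = err, t
        return best_t, best_err

    rng = np.random.default_rng(1)
    x = np.linspace(-1.0, 1.0, 201)
    y = np.sign(x) * np.minimum(np.abs(x), 0.5) + rng.normal(0.0, 0.01, x.size)
    candidates = np.linspace(-0.95, 0.95, 39)     # finite candidate set
    print(best_knots(x, y, candidates, N=1))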
9 Conclusions, discussion and future possibilities

There are theoretical difficulties associated with existence, uniqueness and characterisation of best free-knot ℓ2 spline approximants, which influence practical considerations.

A best spline in the class of splines required may not exist. Take as {x_i}, i = 1, ..., m, m = 21, uniformly spaced values in [−1, 1], and y_i = |x_i|. To see that a best ℓ2 spline s of order 4 with three interior knots for this data may not exist, consider the choice λ_1 = −ε, λ_2 = 0 and λ_3 = ε. The ℓ2 error can be made smaller than any given δ > 0 for some ε > 0. However, if the ℓ2 error is made zero by the choice ε = 0, the resulting three coincident knots at x = 0 mean that s has lower continuity than the class of splines considered.

In practice, allowing knots to come "too close" together can introduce undesirable "sharpness" into the approximant. Buffering of knots [15], to ensure a minimal separation, helps in this regard. The use of a candidate knot set introduces a form of buffering. In some circumstances the coalescing of knots would be ideal in terms of the resulting closeness of s to the data. In some applications the loss of smoothness would be unacceptable. Therefore, whether buffering is appropriate depends on the use to be made of s.

The solution may not be unique. Figure 6 shows a set of 201 uniformly spaced points in [−1, 1] taken from f(x) = sign(x) min(|x|, 1/2). Figure 7 shows the root-mean-square residual as a function of knot location for ℓ2 splines of order 4 with one interior knot. There are two best approximants, one with its knot at x = −0.63 and the other at x = +0.63. One of the two approximants is shown in Figure 6. The other spline is its skew-symmetric counterpart.

FIG. 6. 201 uniformly spaced points in [−1, 1] taken from f(x) = sign(x) min(|x|, 1/2) and a best ℓ2 spline approximant with one knot.

FIG. 7. The root-mean-square residual as a function of knot location for ℓ2 spline approximants with one knot to the data of Figure 6.

It is rarely required to determine an ℓ2 spline approximant that is globally, or even locally, optimal with respect to its knots. An approximant that meets some closeness requirement with the smallest possible number of knots is an academic rather than a pragmatic objective. Today, the more important consideration is to obtain an approximant that represents the data in that its smoothness is consistent with that of the function underlying the data and the uncertainties in the data. (This statement must be qualified for situations where the continuity class of splines is a consideration, as discussed above.) These ends may be achieved by seeking an approximant with a reasonable but not necessarily optimal number of knots.

The use of knot removal strategies is likely to attract research effort in the future. One reason for this statement is that the need to work with large initial knot sets is not as computationally prohibitive with today's powerful personal and other computers. Another reason is that the approach can be expected to produce better approximants, i.e., smaller ℓ2 errors for the same number of knots.
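As a sketch of the basic removal step (ours; it assumes SciPy and uses the root-mean-square criterion of Section 6 as the acceptability test, removing at each stage the knot whose deletion least increases the ℓ2 error):

    # Sketch: sequential knot removal; stop when the approximant is no
    # longer acceptable.
    import numpy as np
    from scipy.interpolate import LSQUnivariateSpline

    def remove_knots(x, y, knots, rms_target, k=3):
        knots = list(knots)
        while len(knots) > 1:
            trials = []
            for i in range(len(knots)):
                t = knots[:i] + knots[i + 1:]
                s = LSQUnivariateSpline(x, y, t, k=k)
                trials.append((float(np.sum((y - s(x)) ** 2)), i))
            err, i = min(trials)                  # cheapest knot to delete
            q = (len(knots) - 1) + k + 1          # coefficients after removal
            if np.sqrt(err / (x.size - q)) > rms_target:
                break                             # removal no longer acceptable
            del knots[i]
        return knots

    x = np.linspace(0.0, 1.0, 400)
    y = np.sin(8 * np.pi * x ** 2)
    start = list(np.linspace(0.05, 0.95, 40))     # redundant initial knot set
    print(len(remove_knots(x, y, start, rms_target=0.02)))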
The two sets of crosses in Figure 5 correspond to the values of the root-mean-square residual as a function of the number of knots for the application of the knot-insertion strategy followed by the knot-removal strategy for the thermophysical data of Figure 3. The two sets, where the "progress" takes place from left to right along the "top set", followed by right to left along the "bottom set", constitute a form of hysteresis. The behaviour in the two directions is distinctly different. In particular, the figure indicates that once an acceptable approximation has been obtained by knot insertion, the use of knot removal can deliver an approximation of comparable quality with many fewer knots, or alternatively, for the same number of knots, an appreciably better approximation can be obtained. In this case, with 30 knots, knot removal gives an ℓ2 error that is one quarter of that for knot insertion. For an ℓ2 error of 0.005, 30 knots are required using knot removal and 43 using knot insertion.

Large data sets, as are now frequently being produced in metrology from computer-controlled measuring systems, are ideal for the purpose of obtaining a sound initial approximant in the form of a valid model containing possibly many more knots than the minimum possible. Their size permits initial approximants to be obtained, even with large numbers of uniformly spaced knots, that provide valid but highly redundant models for the data. The fact that such sets do not contain "appreciable gaps", because of the manner in which they were gathered, together with the quantity of data far outweighing the initial number of knots, goes a long way towards ensuring that this initial approximant is valid. There is much scope for an appreciable number of knots to be removed.

The initial large number of knots may also have been obtained by the use of a knot insertion strategy. It is the experience of the authors that knot insertion can introduce appreciably more knots than given by the optimal choice. Because the early approximants may be far from optimal, an insertion algorithm can produce knots that are totally different from those in an optimal approximant. In contrast, a knot removal algorithm has a possibility of obtaining good knots. (See Section 8.10.) For instance, because of the sequential manner in which knots are inserted, there may be two or more close or even coincident knots, although a good knot set might not have this property. It is also possible that such knots, although not part of an optimal set, are influential in their effect on a knot removal algorithm, with the result that they appear in the "final" approximant.

The problem of data containing wild points is not addressed satisfactorily by existing knot placement algorithms. Because such points are responsible for a large contribution to the ℓ2 error, more knots would be placed in the neighbourhood of such a point than would otherwise have been done. The knot placement strategy can then be influenced more by the errors in the data than by the properties of the underlying function. Formulations, and hence algorithms, are needed that have greater resilience to such effects.

In solving the fixed-knot spline approximation problem as part of the free-knot problem, a knot set differs from a previous knot set only by the addition or removal of a small number of knots. In linear algebraic terms the "new" matrix A(λ'), say, differs in only a few rows from the previous matrix A(λ). Considerable gains in computational efficiency can be obtained by accounting for this fact. This paper has not addressed this issue, concentrating more on the concepts in the area. There is much scope, however, for the application of the recognised stable updating and downdating techniques of linear algebra [17]. Their application will not reduce the computational complexity of a procedure, but could reduce computation times for large problems by an appreciable factor.
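As an indication of the kind of updating meant here (our sketch, not the authors' implementation; it assumes SciPy's qr_insert routine, available from SciPy 0.16, and simply appends a few new rows to an existing QR factorization rather than refactorizing from scratch):

    # Sketch: updating a QR factorization when a few rows are added;
    # deletions would use scipy.linalg.qr_delete similarly.
    import numpy as np
    from scipy.linalg import qr, qr_insert

    rng = np.random.default_rng(2)
    A = rng.normal(size=(200, 12))
    Q, R = qr(A)                        # full QR of the current matrix

    new_rows = rng.normal(size=(3, 12))
    Q1, R1 = qr_insert(Q, R, new_rows, k=A.shape[0], which='row')

    A1 = np.vstack([A, new_rows])
    print(np.allclose(Q1 @ R1, A1))     # True: factorization updated cheaply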
The work described here was supported by the National Measurement System Policy Unit of the UK Department of Trade and Industry as part of its NMS Software Support for Metrology programme. The referee provided carefully considered comments that permitted the paper to be improved.

Bibliography

1. G. T. Anthony and M. G. Cox. The National Physical Laboratory's Data Approximation Subroutine Library. In J. C. Mason and M. G. Cox, editors, Algorithms for Approximation, pages 669-687, Oxford, 1987. Clarendon Press.
2. BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, and OIML. Guide to the Expression of Uncertainty in Measurement, 1995. ISBN 92-67-10188-9, Second Edition.
3. H. G. Burchard. On the degree of convergence of piecewise polynomial approximations on optimal meshes. Trans. Amer. Math. Soc., 234:531-559, 1977.
4. M. G. Cox. Curve fitting with piecewise polynomials. J. Inst. Math. Appl., 8:36-52, 1971.
5. M. G. Cox. The numerical evaluation of B-splines. J. Inst. Math. Appl., 10:134-149, 1972.
6. M. G. Cox. A survey of numerical methods for data and function approximation. In D. A. H. Jacobs, editor, The State of the Art in Numerical Analysis, pages 627-668, London, 1977. Academic Press.
7. M. G. Cox. The incorporation of boundary conditions in spline approximation problems. In G. A. Watson, editor, Lecture Notes in Mathematics 630: Numerical Analysis, pages 51-63, Berlin, 1978. Springer-Verlag.
8. M. G. Cox. The least squares solution of overdetermined linear equations having band or augmented band structure. IMA J. Numer. Anal., 1:3-22, 1981.
9. M. G. Cox. The NPL Data Approximation Subroutine Library: current and planned facilities. NAG Newsletter, 2/87:3-16, 1987.
10. M. G. Cox. Algorithms for spline curves and surfaces. In L. Piegl, editor, Fundamental Developments of Computer-Aided Geometric Modelling, pages 51-76, London, 1993. Academic Press.
11. M. G. Cox and J. G. Hayes. Curve fitting: a guide and suite of algorithms for the non-specialist user. Technical Report NAC 26, National Physical Laboratory, Teddington, UK, 1973.
12. C. de Boor. On calculating with B-splines. J. Approx. Theory, 6:50-62, 1972.
13. C. de Boor. Good approximation by splines with variable knots II. In G. A. Watson, editor, Numerical Solution of Differential Equations, Lecture Notes in Mathematics No. 363, pages 12-20. Springer-Verlag, Berlin, 1974.
14. C. de Boor. A Practical Guide to Splines. Springer-Verlag, New York, 1978.
15. C. de Boor and J. R. Rice. Least squares cubic spline approximation II - variable knots. Technical Report CSD TR 21, Purdue University, 1968.
16. B. Ford, J. Bentley, J. J. du Croz, and S. J. Hague. The NAG Library 'machine'. Software - Practice and Experience, 9:56-72, 1979.
17. P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, London, 1981.
18. IEEE. IEEE standard for binary floating-point arithmetic. Technical Report ANSI/IEEE Standard 754-1985, IEEE Computer Society, New York, USA, 1985.
19. D. Jupp. Non-linear least square spline approximation. Technical report, Flinders University, Australia, 1971.
20. T. Lyche and K. Mørken. A discrete approach to knot removal and degree reduction for splines. In J. C. Mason and M. G. Cox, editors, Algorithms for Approximation, pages 67-82, Oxford, 1987. Clarendon Press.
21. I. J. Schoenberg and Anne Whitney. On Pólya frequency functions III. Trans. Amer. Math. Soc., 74:246-259, 1953.

On the approximation power of local least squares polynomials

Oleg Davydov
Universität Giessen, Mathematisches Institut, D-35392 Giessen, Germany. oleg.davydov@math.uni-giessen.de

Abstract

We discuss the relationship between the norm of the local discrete least squares polynomial approximation operator, the minimal singular value σ_min(P_Ξ) of the matrix P_Ξ of the evaluations of the basis polynomials, and the norming constant of the set of data points Ξ with respect to the space of polynomials. Since these three quantities are equivalent up to bounded constants, and since σ_min(P_Ξ) can be efficiently computed, it is feasible to use σ_min(P_Ξ) as a tool for distinguishing good local point constellations, which is useful for scattered data fitting. In addition, we give a simple new proof of a bound by Reimer for the norm of the interpolation operators on the sphere and extend it to discrete least squares operators.

1 Introduction

Let Ω be a bounded domain in R^d, d ≥ 1, and let Ξ = {ξ_1, ..., ξ_m} be a set of scattered points in Ω. Given the values f|_Ξ = (f(ξ_1), ..., f(ξ_m))ᵀ of an otherwise unknown function f : Ω → R, we want to reconstruct f from these data. The least squares method consists in choosing some linearly independent functions p_1, ..., p_n on Ω, n ≤ m, and computing the coefficients a_1, ..., a_n ∈ R that minimise the ℓ2 norm of the residual on Ξ,

    ||f|_Ξ − p|_Ξ||₂ = ( Σ_{i=1}^m |f(ξ_i) − p(ξ_i)|² )^{1/2} → min,

with p = a_1 p_1 + ... + a_n p_n ∈ P := span{p_1, ..., p_n}. Let P|_Ξ := span{p_1|_Ξ, ..., p_n|_Ξ}. If dim P|_Ξ = n, then the least squares solution is unique, and we denote it by L_{P,Ξ} f. Note that the minimum norm solution available in the case of a rank deficient problem (dim P|_Ξ < n) seems less useful, since in general it does not reproduce the elements of P exactly.

The computation of the least squares approximation L_{P,Ξ} f of f is expensive if m and n are large. To obtain a scattered data fitting algorithm with linear complexity with respect to the size of the data, a two-stage method [8] can be employed, which consists in 1) covering the original domain Ω with a number of subdomains Ω_k, each containing only a small subset Ξ_k = Ξ ∩ Ω_k of Ξ, and computing local approximations to the data in Ξ_k, and 2) using the information obtained from these local approximations to build the final approximation of the (possibly huge) original data set. The least squares method can be employed in the local approximation stage, especially to deal with "real world" data usually contaminated with errors or just containing undesirable "high frequency" components.
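As a sketch of such a local stage (ours; the bivariate quadratic basis, the point count m = 11 and the test function are illustrative assumptions), the fit and the quantity σ_min studied below can be computed directly:

    # Sketch: a local least squares fit on one subset Xi_k for the
    # bivariate quadratic space (d = 2, q = 2, n = 6).
    import numpy as np

    def quad_basis(pts):
        x, y = pts[:, 0], pts[:, 1]
        return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

    rng = np.random.default_rng(3)
    xi = rng.uniform(-1, 1, size=(11, 2))          # m = 11 local points
    f = np.exp(xi[:, 0]) * np.cos(xi[:, 1])        # sampled unknown function

    P = quad_basis(xi)                             # the matrix P_Xi (m x n)
    a, *_ = np.linalg.lstsq(P, f, rcond=None)      # coefficients of L_{P,Xi} f
    sigma_min = np.linalg.svd(P, compute_uv=False)[-1]
    print("sigma_min(P_Xi) =", sigma_min)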
If, however, the polynomial degree is not allowed to exceed a fixed small value, then a common practical approach is to choose larger sets H^ C E, with m substantially greater than n, see e.g. [4] where it is suggested to use for local least squares approximation m = 11 points if P = III with n = 6 and m = 15 points if P = 11^ with n = 10. However, even these higher m provide no guaranty that the matrix Ps^-=\Pj{^i) ■ i = 'i-,---,rn, j = l,...,n] of the local least squares problem will always be well-conditioned. Moreover, for some data, this method may lead to the use of inappropriately distant points for the local approximation. The purpose of this paper is to draw attention to the fact that the conditioning of the matrix PH^ is not only the issue of numerical stability of the computation of least squares. Indeed, the reciprocal of the minimal singular value u^miPs) of PH provides a bound for the norm of the least squares operator L-p^s if both m and n are small. Therefore, the approximation power of local least squares depends on (JminC-Fk) and the best approximation from V. Since cr^in{Ps) can be efficiently computed for a small matrix PE by well known numerical algorithms, it is feasible to use it as a tool to decide whether a particular portion of data is suitable for building local least squares approximation from P with reasonable approximation power. If cr^in{Ps) is too small, then either H or V should be modified, e.g. by adding more points to E or using an appropriate subspace of P. A two-stage algorithm for fitting large irregularly distributed scattered data sets employing the conditioning of the local observation matrices PH^. is studied in [3, 5]. The paper is organized as follows. In Section 2 we discuss the relationship between the norm of the discrete least squares approximation operator, the minimal singular value (Tmin(-PH), and the norming constant !/(P,H). As a by-product, we obtain a new proof of a known bound for the norm of the interpolation operators on the sphere [7], and extend it to the discrete least squares operators. Section 3 illustrates the above concepts in the univariate case, when they are also related to the separation distance of S, while Section 4 is devoted to a discussion of the least squares multivariate polynomial approximation. 347 348 Oleg Davydov 2 Bounds for HI'P.HII and approximation error Let pi,..., p„ be linearly independent continuous functions on n c R'' spanning a linear space V. Since all norms on a finite dimensional linear space are equivalent, there are positive constants Ki, K2 such that < K2\\a\\2 ^l||«l|2< (2.1) c(n) for any coefficient vector a = (fli,..., a„)'" e R". Given H = {6,-..,^m} C fi, we consider the matrix Ps € R™^" as defined in the introduction. Obviously, rankPs = dimT'lH- If PE has full rank, then dimPln = n, and the least squares approximation Lp^sf is uniquely determined, giving rise to the operator Lp,E : C'(fi) -* "P C C{Q). It is easy to see that ip = exactly reproduces the elements of V, i.e., Lp,BP = P, all p£V. (2.2) Therefore, a standard argument shows that \\f-Lp,Bf\\c(n)<{l + \\Lv,B\\Mf,V)cin), (2.3) where E{f, V)c{n) denotes the error of the best approximation of/ from V in Chebyshev norm, E{f,P)cm:=ini\\f-p\\ciu)Thus, an estimate for ||LP,E|| immediately gives an upper bound for ||/ — L-p=f\\c(Q)The norming constant v{V, H) of S with respect to V [6] can be defined by KP,H) = min||p|H||oo/lbllc(n). pEV (2.4) Given any matrix A, we denote by CTmmi-^) the minimal singular value o-min(^)= min ||^a;||2. 
Since ||L_{P,Ξ} f||_{C(Ω)} ≤ K_2 ||a||₂ and ||f|_Ξ||₂ ≤ √m ||f|_Ξ||_∞ ≤ √m ||f||_{C(Ω)}, the upper bound in (2.5) follows. To prove the lower bound in (2.5), we choose a function f ∈ C(Ω) such that

    ||P_Ξ⁺ f|_Ξ||₂ = ||P_Ξ⁺||₂ ||f|_Ξ||₂,    ||f|_Ξ||_∞ = ||f||_{C(Ω)},

which is obviously possible. Then by (2.1) we have

    ||L_{P,Ξ} f||_{C(Ω)} ≥ K_1 ||P_Ξ⁺ f|_Ξ||₂ = K_1 σ_min⁻¹(P_Ξ) ||f|_Ξ||₂,

which implies the desired lower bound since ||f|_Ξ||₂ ≥ ||f|_Ξ||_∞ = ||f||_{C(Ω)}.

Since ||L_{P,Ξ} f||_{C(Ω)} ≤ ν⁻¹(P, Ξ) ||(L_{P,Ξ} f)|_Ξ||_∞, the upper bound in (2.6) follows by

    ||(L_{P,Ξ} f)|_Ξ||_∞ ≤ ||(L_{P,Ξ} f)|_Ξ||₂ ≤ ||f|_Ξ||₂ ≤ √m ||f||_{C(Ω)}.

To prove the lower bound, we denote by p an element of P for which the minimum in (2.4) is attained and choose a function f ∈ C(Ω) such that f|_Ξ = p|_Ξ and ||f||_{C(Ω)} = ||f|_Ξ||_∞. Then by (2.2),

    ||L_{P,Ξ} f||_{C(Ω)} = ||p||_{C(Ω)} = ν⁻¹(P, Ξ) ||p|_Ξ||_∞ = ν⁻¹(P, Ξ) ||f||_{C(Ω)},

which implies ||L_{P,Ξ}|| ≥ ν⁻¹(P, Ξ).

We finally establish (2.7). For any p ∈ P, let p = Σ_{j=1}^n a_j p_j and a = (a_1, ..., a_n)ᵀ. Then p|_Ξ = P_Ξ a and hence

    ||p|_Ξ||_∞ ≤ ||P_Ξ a||₂ ≤ √m ||p|_Ξ||_∞.

Since σ_min(P_Ξ) = min_{a≠0} ||P_Ξ a||₂ / ||a||₂, (2.7) follows by (2.1). □

In view of (2.3), the upper bound in (2.5) implies

    ||f − L_{P,Ξ} f||_{C(Ω)} ≤ (1 + K_2 √m / σ_min(P_Ξ)) E(f, P)_{C(Ω)},      (2.8)

which shows that the approximation power of discrete least squares reduces proportionally if σ_min(P_Ξ) (or ν(P, Ξ)) is small. We will discuss some practical consequences of this fact in the next two sections. Although ν(P, Ξ) gives tighter bounds for ||L_{P,Ξ}||, σ_min(P_Ξ) has the clear practical advantage that it is easily computable, by using e.g. the singular value decomposition of the small "local" matrix P_Ξ. On the other hand, the norming constants were used in [6, 9] to derive estimates for the approximation error of radial basis function interpolation and moving least squares, respectively.

Remark 2.2 If p_1, ..., p_n is an orthonormal basis for P, then ||a||₂ = ||p||_{L₂(Ω)}, p = Σ_{j=1}^n a_j p_j, and the constants K_1, K_2 in (2.1) are closely related to Nikolskii constants of the space P, namely,

    K_1 = N_{2,∞}⁻¹(P),    K_2 = N_{∞,2}(P),

where

    N_{q1,q2}(P) := max_{p∈P} ||p||_{L_{q1}(Ω)} / ||p||_{L_{q2}(Ω)},    1 ≤ q1, q2 ≤ ∞.

In particular, if Ω = S^{d−1}, the unit sphere in R^d, and {p_1, ..., p_n} is the set of spherical harmonics forming an orthonormal basis for the space P = H_q^d of spherical polynomials of degree q in d variables, then it is not difficult to prove that K_2 = N_{∞,2}(H_q^d) = √(n/|S^{d−1}|), where |S^{d−1}| denotes the surface area of S^{d−1}. Therefore, for any set Ξ ⊂ S^{d−1} with #Ξ = m ≥ n, we have by (2.5),

    ||L_{H_q^d,Ξ}|| ≤ √(nm/|S^{d−1}|) / σ_min(P_Ξ),                           (2.9)

which recovers, in the case of interpolation (m = n), an error bound by Reimer [7] originally proved by using Lagrangian square sums (see also [10]).

3 Univariate polynomials

Let Ω be an interval [−h, h] on the real line R, and let p_1, ..., p_n be a basis for the space P given by the restriction to [−h, h] of the space Π_{n−1}^1 of all univariate polynomials of degree at most n − 1. By the well-known interpolation properties of the univariate polynomials, rank P_Ξ = n for any Ξ = {ξ_1, ..., ξ_m} ⊂ [−h, h], m ≥ n, with distinct ξ_i's. For any Ξ' = {ξ_{i_1}, ..., ξ_{i_n}} ⊂ Ξ, let q_{Ξ'} denote the separation distance,

    q_{Ξ'} := min_{l≠j} |ξ_{i_l} − ξ_{i_j}|.

The Lebesgue constant ||L_{P,Ξ'}|| of the corresponding interpolation scheme can be easily estimated as

    ||L_{P,Ξ'}|| ≤ C_n (h/q_{Ξ'})^{n−1},

for a constant C_n depending only on n. Since Ξ' may be any subset of Ξ of cardinality n, and since ν(P, Ξ) ≥ ||L_{P,Ξ'}||⁻¹, we get

    ν⁻¹(P, Ξ) ≤ C_n (h/q_{Ξ,n})^{n−1},    where    q_{Ξ,n} := max_{Ξ'⊂Ξ, #Ξ'=n} q_{Ξ'}.

Hence, by (2.3) and (2.6),

    ||f − L_{Π_{n−1},Ξ} f||_{C[−h,h]} ≤ (1 + C_n √m (h/q_{Ξ,n})^{n−1}) E(f, Π_{n−1})_{C[−h,h]}.    (3.1)

This last estimate shows that the univariate least squares polynomials have the approximation power of the best local polynomial approximation as h → 0, provided h/q_{Ξ,n} remains bounded. However, if the scattered points ξ_1, ..., ξ_m ∈ [−h, h] are clustered together in at most n − 1 very tight groups, then q_{Ξ,n} may be arbitrarily small, thus forcing the right hand side of (3.1) to blow up.
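The quantity q_{Ξ,n} can be computed exhaustively for small point sets; the following sketch (ours) illustrates its collapse for clustered data:

    # Sketch: q_{Xi'} for an n-point subset and q_{Xi,n} as the maximum
    # over all such subsets (exhaustive, so only sensible for small m).
    import numpy as np
    from itertools import combinations

    def q_subset(sub):
        sub = np.sort(np.asarray(sub))
        return float(np.min(np.diff(sub)))   # min pairwise distance (sorted)

    def q_n(xi, n):
        return max(q_subset(sub) for sub in combinations(xi, n))

    xi = [-1.0, -0.999, -0.5, -0.001, 0.0, 0.7]   # two tight clusters
    print(q_n(xi, n=3))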
,^m} C [-h,h], m > n, with distinct ^,:'s. For any E' = {^jj,... ,4in} C E, let qs' denote the separation distance, 9E':= The Lebesgue constant estimated as HLP.E'II ;Jmjn|6/-C^J■ of the corresponding interpolation scheme can be easily Since S' may be any subset of E of cardinality n and since i/{V, E) > HI-P.E' W"^, we get ''"'(^'^^-(Sj!^''/^-")""'' where QE,n ■= max qs Hence, by (2.3) and (2.6), 11/ - Ln^^_^,sf\\ci-hM < (l + T^Tiyr^'^/'^^'"^""') ^(/'nn-i)c[-...i- (3.1) This last estimate shows that the univariate least squares polynomials have the approximation power of the best local polynomial approximation as /i —> 0 provided h/qs,n remains bounded. However, if the scattered points £,i,...,^vi € [-h,h] are clustered together in at most n-l very tight groups, then qs,n may be arbitrarily small, thus forcing the right hand side of (3.1) to blow up. To figure out what happens to 11/ - Lxii ,sf\\c[-h,h] in these circumstances, we consider the following example. Approximation power of local least squares 351 Let h = l,n = 2, f{t) = i^ _ 1/2, and 5 = {-^, 0, £,} for some 0 < ^ < 1. It is easy to see that Lnj^/ = -1/2 + 2^2/3, gince jE;(/,n])c[-i,i] = 1/2, we have 11/- inl,H/llc[-M] = 1/2 + |l/2 - 2^V3| < 2£;(/,ni)c[-i,i] even though, by a simple calculation, l|ini,Hll = 1/3 + 1/^, ^/2/<Tmin(PE) = l/K^,S) = l/fe,2 = 1/^-^00 as ^->0. This may contribute to the opinion that Hini^ll) crmin(-f=), i^i'P,'^) and qE,n are not the right quantities to describe the behaviour of the approximation. Indeed, as the three points -^,0,^ coalesce, L^i ^f converges to a Hermite interpolation polynomial provided the entries of PE as well as the values of /|H are exact. However, if we simulate "real world" data by adding to /(-0)/(0)i/(0 normally distributed errors with standard deviation 10^^, then the picture substantially changes. Table 1 shows that 11/ — Ljii^fWci-i,!] does blow up in this case. For comparison we also include in the table the error of 11/- Ln\H/llc[-i,i] for the same contaminated data. TAB. 1 Average {d^ean) and maximum (dJnax) of 11/ - inJ,H/llc[-i,iI as well as maximum of ||/ - Lni,E/llc[-i,i] (<^max) ™ 1000 t^^t^ with contaminated data e '^mean ''max '^max 10-3 1.06 1.24 1.00018 10-^ 1.56 3.39 1.00018 10-5 6.63 24.9 1.00018 10-6 57.3 10-^ 240 2390 1.00018 1.00018 564 10-^ 5630 23900 1.00018 Thus, if qB,n is too small, we cannot practically achieve with least squares the approximation order of E{f, 'n.n_i)c[-h,h] simply because the points lying too close to each other carry redundant information and we have at most n — 1 clusters of such points. Therefore, we should adjust the polynomial degree to the given data paying attention to the trade-off between higher approximation power of higher degree polynomials and the "pollution" caused by the factor q^\ that increases with n. In practice one may choose maximal n such that h/qs,n is smaller than a prescribed tolerance value 0 < E <oo. 4 Multivariate polynomials The situation becomes substantially more complicated when we turn to multivariate polynomials. Let fi be a bounded domain in IRf and let {pi,... ,p„}, n = ( ^'), be a basis of the space V = Hg of polynomials in d variables of total degree q satisfying (2.1) on n. (For example, we may consider a properly scaled standard power basis with the center at a point in Q or the Bernstein-Bezier basis with respect to some simplex overlapping Q or a significant part of it.) Let, furthermore, S be an arbitrary finite set of points in fi such that m = #H >n. 
4 Multivariate polynomials

The situation becomes substantially more complicated when we turn to multivariate polynomials. Let Ω be a bounded domain in R^d and let {p_1, ..., p_n}, n = C(d+q, d), be a basis of the space P = Π_q^d of polynomials in d variables of total degree q satisfying (2.1) on Ω. (For example, we may consider a properly scaled standard power basis with the centre at a point in Ω, or the Bernstein-Bézier basis with respect to some simplex overlapping Ω or a significant part of it.) Let, furthermore, Ξ be an arbitrary finite set of points in Ω such that m = #Ξ ≥ n.

The first problem we face in the case d ≥ 2 is that the matrix P_Ξ may be rank deficient. It is clear, however, that there is no practical difference between this situation and the one when P_Ξ has full rank but is extremely ill-conditioned, i.e., σ_min(P_Ξ) is very small. Moreover, (2.8) shows that even moderately small σ_min(P_Ξ) may significantly reduce the approximation power of L_{P,Ξ}. Clearly, the same can also happen in the univariate case if q_{Ξ,n} is too small.

The real difficulty of the multivariate case seems to be that simple characteristics of Ξ, like the separation distance q_{Ξ,m}, do not give much information about the norm of L_{P,Ξ}. For example, six equidistant points on the unit circle in R² are well separated and look reasonably distributed. However, they are not good for least squares approximation from the space Π_2², since the matrix P_Ξ is rank deficient. Suitably perturbed, these points will give rise to a least squares operator L_{Π_2²,Ξ} with a very large norm. More generally, the norm of L_{Π_q^d,Ξ} will be large if the points in Ξ ⊂ R^d lie "too close" to an algebraic hypersurface of order q.

If the data are comparatively dense in Ω, namely if the fill distance

    h_{Ξ,Ω} := sup_{x∈Ω} min_{ξ∈Ξ} |x − ξ|

does not exceed some small positive constant depending on Ω and the polynomial degree, then the estimates of the norming constant ν(Π_q^d|_Ω, Ξ) given in [9] provide a bound for ||L_{Π_q^d|_Ω,Ξ}|| in view of (2.6). For example, if Ω is a ball of radius r, then ν(Π_q^d|_Ω, Ξ) ≥ 1/2 whenever h_{Ξ,Ω} is at most a fixed small multiple of r/q². On the other hand, without any density assumptions we can always rely on (2.8), where σ_min(P_Ξ) can be efficiently computed by well known algorithms of numerical linear algebra. In some sense, small σ_min(P_Ξ) indicates that the local data has "hidden redundancies" (e.g. too many points lying very close to the same straight line or the same ellipse) that prevent it from carrying enough information for a "full power" approximation of the underlying function from Π_q^d.

Similar to the univariate case, but using σ_min(P_Ξ) instead of q_{Ξ,n}, we can adaptively choose the polynomial degree according to the following algorithm, which has proven to be useful for scattered data fitting [3, 5]. Let Ω ⊂ R^d, Ξ ⊂ Ω, #Ξ = m. Denote by P_Ξ^q the matrix of the evaluations of appropriate basis functions for Π_q^d, q ≥ 0, at the points ξ ∈ Ξ.

Algorithm 4.1 Starting with some q = q_0 > 0 such that C(d+q_0, d) ≤ m, compute σ_min(P_Ξ^{q_0}). If 1/σ_min(P_Ξ^{q_0}) is smaller than a prescribed tolerance E < ∞, then compute the least squares Π_{q_0}^d-approximation to the data in Ξ and accept it as a reliable approximation on Ω. Otherwise, repeat the same with q = q_0 − 1, and successively reduce the degree q to q_0 − 2, ..., 0, while 1/σ_min(P_Ξ^q) > E. For q = 0 no comparison of 1/σ_min(P_Ξ^q) with E is needed, since ||L_{Π_0^d|_Ω,Ξ}|| is bounded for any Ω and Ξ.

Note that, optionally, the condition number ||P_Ξ^q||₂ / σ_min(P_Ξ^q) of P_Ξ^q can be used in the above algorithm instead of 1/σ_min(P_Ξ^q), as it is in fact formulated in [5].
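A direct transcription of Algorithm 4.1 (ours; the scaled power basis for d = 2 and the test data are illustrative assumptions):

    # Sketch of Algorithm 4.1: reduce the degree q until 1/sigma_min of the
    # observation matrix passes the tolerance E.
    import numpy as np
    from math import comb

    def power_basis_matrix(xi, q):
        """P^q_Xi for a bivariate power basis (assumed already scaled)."""
        x, y = xi[:, 0], xi[:, 1]
        cols = [x ** i * y ** (j - i) for j in range(q + 1) for i in range(j + 1)]
        return np.column_stack(cols)

    def adaptive_degree_fit(xi, f, q0, E):
        for q in range(q0, -1, -1):
            if comb(2 + q, 2) > len(xi):
                continue
            P = power_basis_matrix(xi, q)
            sigma_min = np.linalg.svd(P, compute_uv=False)[-1]
            if q == 0 or 1.0 / sigma_min < E:     # q = 0 is always accepted
                a, *_ = np.linalg.lstsq(P, f, rcond=None)
                return q, a

    rng = np.random.default_rng(5)
    xi = rng.uniform(-1, 1, size=(15, 2))
    f = xi[:, 0] ** 2 + xi[:, 1]
    print(adaptive_degree_fit(xi, f, q0=3, E=1e3)[0])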
Bibliography

1. Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.
2. C. de Boor and A. Ron, On multivariate polynomial interpolation, Constr. Approx. 6 (1990), 287-302.
3. O. Davydov and F. Zeilfelder, Scattered data fitting by direct extension of local polynomials with bivariate splines, 2002, preprint.
4. T. A. Foley, Scattered data interpolation and approximation with error bounds, Comput. Aided Geom. Design 3 (1986), 163-177.
5. J. Haber, F. Zeilfelder, O. Davydov and H.-P. Seidel, Smooth approximation and rendering of large scattered data sets, in Proceedings of IEEE Visualization 2001, Th. Ertl, K. Joy and A. Varshney (eds), IEEE, 2001, 341-347, 571.
6. K. Jetter, J. Stöckler and J. Ward, Error estimates for scattered data interpolation on spheres, Math. Comp. 68 (1999), 733-747.
7. M. Reimer, Interpolation on the sphere and bounds for the Lagrangian square sums, Results in Mathematics 11 (1987), 144-164.
8. L. L. Schumaker, Two-stage methods for fitting surfaces to scattered data, in Approximation Theory, R. Schaback and K. Scherer (eds), Lecture Notes in Mathematics 556, Springer, Berlin, 1976, 378-389.
9. H. Wendland, Local polynomial reproduction and moving least squares approximation, IMA J. Numer. Anal. 21 (2001), 285-300.
10. R. S. Womersley and I. H. Sloan, How good can polynomial interpolation on the sphere be?, Advances in Comp. Math. 14 (2001), 195-226.

A wavelet-based preconditioning method for dense matrices with block structure

Judith M. Ford* and Ke Chen
Department of Mathematical Sciences, University of Liverpool, Liverpool L69 7ZL, UK. Judyford@liv.ac.uk, k.chen@liv.ac.uk
(*The first author was supported by the Engineering and Physical Sciences Research Council, UK.)

Abstract

In recent years application of a discrete wavelet transform (DWT) has become an established tool for the design of preconditioners for smooth, dense matrices, such as those that arise in the solution of certain integral equations. In this paper we consider the higher dimensional case, where the matrix A is not itself smooth, but has a smooth block structure. To precondition such matrices, we use repeated application of a level 1 blockwise DWT to exploit the fact that corresponding entries in adjacent blocks are close in value. We illustrate the effectiveness of our methods by means of numerical examples.

1 Introduction

We have previously ([9]) considered wavelet-based preconditioning methods for dense matrices having the property that the entries vary smoothly (that is to say, adjacent entries are close in value) apart from known areas of singularity, for example a non-smooth diagonal band. The main idea is to use wavelet compression (see, for example, [14]) to convert "smoothness" in the original matrix into "smallness" in the transformed matrix, and then to approximate the transformed matrix by dropping small entries. Smooth matrices arise in a range of applications (see, for example, [6, 8, 10]) involving an essentially 1-dimensional discretization process. In higher dimension cases the corresponding matrices have a block structure: each block is smooth and corresponding entries in different blocks vary smoothly; but discontinuities at the edges of the blocks mean that standard application of a DWT does not give good compression. In this paper we extend the ideas of [9] to enable preconditioners to be designed for such matrices. Throughout we use Daubechies wavelets, which are orthogonal and have compact support.

2 DWT-based preconditioners

We are interested in the fast solution of linear systems

    Ax = b,    x, b ∈ Cⁿ,  A ∈ C^{n×n},                                       (2.1)

where A is a large, dense matrix. Krylov subspace iterative methods, such as GMRES (described in [13]), can be used to solve (2.1), but in most cases preconditioning is required in order to obtain good convergence. One method of preconditioning is to seek a matrix M ≈ A such that M⁻¹v can be calculated cheaply for any vector v.
For smooth dense A the task is usually made easier by transforming (2.1) into a wavelet basis (see e.g. [4, 5, 6, 10, 11]). When a DWT is applied to such an A, the resulting matrix Ã has many small elements. A sparse approximation to Ã can be obtained by setting its small elements to zero. This is the main idea underlying most wavelet-based preconditioners.

2.1 Preconditioners for 1-D problems

Typically A is smooth apart from a narrow diagonal band. When a level k standard DWT is applied, Ã has a 'finger' pattern of large entries (caused by the non-smooth diagonal feature) and an n/2ᵏ × n/2ᵏ block of large entries at the top-left corner. Here n should be a power of 2. We can form a preconditioner M ≈ A by setting to zero entries that fall below some chosen threshold, but, because of the finger pattern, a large amount of fill-in occurs under LU factorization. To avoid this problem, M can be obtained by setting to zero entries in Ã that fall outside of a diagonal band. We describe this approach as a "band cut".

The finger pattern can be avoided by using DWTPer (DWT with permutations, first proposed in [6]; see also [7]), which centres the fingers to form a sparse diagonal band whose width can be predicted accurately. M can then be formed by applying a band cut to Ã and (optionally) imposing a threshold. An alternative way of avoiding the creation of a finger pattern matrix is to use the Non-Standard forms (NS-forms) of Beylkin, Coifman and Rokhlin (see [3]) to represent A in terms of the blocks of a larger matrix. In [9] we presented a new way of using the NS-form submatrices to precondition A based on the Schur complement and recursive application of a flexible GMRES iteration. We compared four alternative DWT-based preconditioning methods:

P1 standard DWT preconditioner with band cut ([5]),
P2 DWTPer preconditioner with band cut ([6, 10]),
P3 NS-form preconditioner with threshold ([3, 11]),
P4 recursive Schur complement preconditioner ([9]),

and found that, for smooth matrices with a diagonal singularity, P4 gave consistently good performance, P1 performed well for moderate singularities and P2 was best when the diagonal singularity was very pronounced. When we came to consider 2-D problems, the robustness of P4 encouraged us to consider ways of extending it to higher dimensions.

2.2 Extension to matrices with block structure

In the 2-dimensional case we are concerned with matrices that have a smooth block structure. We can compress dense block matrices of this type using two different types of DWT.

The block DWT has a transform matrix of the form

    W_block^{(m,n)} = I_m ⊗ W^{(n)} = diag(W^{(n)}, ..., W^{(n)}),            (2.2)

where W^{(n)} is a standard n × n DWT matrix, I_m is the m × m identity matrix, and there are m diagonal blocks. It exploits smoothness within blocks.

The Big Block DWT (BBDWT) exploits smoothness between blocks. Its transform matrix is obtained from the standard m × m DWT matrix by replacing each low-pass filter coefficient h_0, ..., h_{D−1} by the block h_i I and each high-pass filter coefficient g_0, ..., g_{D−1} by g_i I (D being the order of the wavelet transform), where I is the n × n identity matrix; equivalently,

    W_BB^{(m,n)} = W^{(m)} ⊗ I_n.                                             (2.3)

The resulting transformed matrix has a 'finger' structure of blocks, each with a diagonal structure. We can avoid the finger pattern by permuting the rows and columns of the transformed matrix so as to centre the blocks containing large entries. We call this modified big block transform BBDWTPer, because it is a big block version of the DWTPer transform described in [10].
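Both transforms are Kronecker products and can be sketched in a few lines (ours; an order-2 Haar filter stands in for the Daubechies filters used in the paper):

    # Sketch: blockwise DWT = I_m (x) W_n, and big block DWT = W_m (x) I_n.
    import numpy as np

    def haar_dwt_matrix(n):
        """Level-1 Haar DWT matrix of size n x n (n even)."""
        W = np.zeros((n, n)); s = 1.0 / np.sqrt(2.0)
        for k in range(n // 2):
            W[k, 2 * k] = W[k, 2 * k + 1] = s               # low-pass rows
            W[n // 2 + k, 2 * k] = s                        # high-pass rows
            W[n // 2 + k, 2 * k + 1] = -s
        return W

    m, n = 4, 8
    W_block = np.kron(np.eye(m), haar_dwt_matrix(n))        # (2.2)
    W_bb    = np.kron(haar_dwt_matrix(m), np.eye(n))        # (2.3)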
We anticipate that BBDWTPer may be useful for preconditioning block matrices with a very strong block diagonal singularity (see the comparison of DWTPer and other DWT-based preconditioners in [9]), but we have not yet found example matrices for which BBDWTPer provides a good preconditioner. Preconditioners based on BBDWT and BBDWTPer are tested in Section 4; we now present a more effective method.

3 Recursive BBDWT-based preconditioning

An alternative way of avoiding the 'finger' pattern is to use a 'Big Block' version of the NS-forms presented in [3]. We define the Big Block NS-form (BBNS-form) of a matrix as follows. To transform a matrix consisting of m² blocks, each of dimension n (where m and n are powers of 2), we define P_i, Q_i to be the mn/2^i × mn/2^{i−1} matrices such that

    W_BB^{(m/2^{i−1}, n)} = ( P_i )
                            ( Q_i ),                                          (3.1)

with P_i holding the low-pass rows and Q_i the high-pass rows. Given an mn × mn matrix A, define T_0 = A and

    A_i = Q_i T_{i−1} Q_iᵀ,    B_i = Q_i T_{i−1} P_iᵀ,
    C_i = P_i T_{i−1} Q_iᵀ,    T_i = P_i T_{i−1} P_iᵀ,                        (3.2)

so that the level 1 transform of T_i is arranged as

    T̃_i = ( A_{i+1}  B_{i+1} )
          ( C_{i+1}  T_{i+1} ).                                               (3.3)

The level k BBNS-form of A comprises T_k together with A_i, B_i, C_i, i = 1, 2, ..., k. (The blocks of T̃_i are arranged differently from those of the standard level 1 DWT of T_i. We have used this ordering in order to be consistent with the notation of [3].)

We propose to use banded approximations to the submatrices of the BBNS-form as the basis for our preconditioner. If the blocks of A vary smoothly apart from a diagonal block band, then each of A_{i+1}, B_{i+1}, C_{i+1} will have small entries except for a wraparound diagonal block band. So we can approximate them by Ã_{i+1}, B̃_{i+1}, C̃_{i+1}, formed by cutting to a block band, giving an approximation T̂_i to T̃_i:

    T̂_i = ( Ã_{i+1}  B̃_{i+1} )
          ( C̃_{i+1}  T_{i+1} ).                                              (3.4)

(In practice, it is unnecessary to compute A_{i+1}, B_{i+1}, C_{i+1} and then to set entries outside the block band to zero; instead we can compute only the non-zero entries of Ã_{i+1}, B̃_{i+1}, C̃_{i+1}. This enables us to reduce the cost of forming T̂_i.)

We now show how this can help us to solve (2.1). We use a flexible GMRES iteration (see [12]) preconditioned by approximate solution of an equation of the form Ay = v at each step. To do this we first apply a level 1 BBDWT with a block band cut to give

    ( Ã_1  B̃_1 ) ( y_1 )   ( v_1 )
    ( C̃_1  T_1 ) ( y_2 ) = ( v_2 ),                                          (3.5)

where y_1 = Q_1 y, y_2 = P_1 y, v_1 = Q_1 v, v_2 = P_1 v. We solve this equation using the Schur complement S_1 = T_1 − C̃_1 Ã_1⁻¹ B̃_1. This requires us to solve an equation of the form

    S_1 y_2 = w_2,                                                            (3.6)

which we do by a further GMRES iteration. We expect that T_1 will be an effective preconditioner for S_1 (see [1], Section 9.3), so we now seek a cheap way of applying T_1⁻¹ to a vector. To do this we repeat the process of applying a level 1 BBDWT and using the Schur complement. In summary, during the solution of (3.6) we solve a preconditioning equation of the form

    T_1 y = v,    y, v ∈ C^{mn/2}.                                            (3.7)

To do this cheaply we repeat the process of applying a level 1 BBDWT and using the Schur complement and obtain an equation of the form

    S_2 z = w,    z, w ∈ C^{mn/4}.                                            (3.8)

This in turn can be solved using flexible GMRES preconditioned by T_2. We continue recursively, solving equations of the form

    S_i z = w,    z, w ∈ C^{mn/2^i},                                          (3.9)

iteratively, preconditioning by solving equations of the form

    T_i y = v,    y, v ∈ C^{mn/2^i},                                          (3.10)

until the matrix T_i is small enough that T_i⁻¹ can be applied directly, by means of LU factorization, at low cost.
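One level of the construction, with dense algebra and the banding step omitted for clarity, can be sketched as follows (ours; again with Haar filters standing in for Daubechies filters):

    # Sketch: one level of (3.1)-(3.4) plus the Schur complement for (3.6).
    import numpy as np

    def haar(n):                                  # level-1 Haar DWT matrix
        W = np.zeros((n, n)); s = 1.0 / np.sqrt(2.0)
        for k in range(n // 2):
            W[k, 2 * k] = W[k, 2 * k + 1] = s             # low-pass rows
            W[n // 2 + k, 2 * k], W[n // 2 + k, 2 * k + 1] = s, -s
        return W

    m, n = 4, 8
    W = np.kron(haar(m), np.eye(n))               # level-1 BBDWT
    rng = np.random.default_rng(6)
    A = rng.normal(size=(m * n, m * n))           # stand-in for T_0 = A

    half = m * n // 2
    P1, Q1 = W[:half], W[half:]                   # low- and high-pass parts
    A1, B1 = Q1 @ A @ Q1.T, Q1 @ A @ P1.T         # (3.2)
    C1, T1 = P1 @ A @ Q1.T, P1 @ A @ P1.T
    S1 = T1 - C1 @ np.linalg.solve(A1, B1)        # Schur complement for (3.6)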
Therefore, at level i, each GMRES iteration requires a preconditioning step that in turn calls for iterative solution by GMRES of a coarser level equation. At the coarsest level the preconditioner is applied directly using an LU factorization of Tj+i. This process is summarized in Algorithm 3.1. Algorithm 3.1 Approximate solution of Tiy = v. (!) (2) (3) (4) (5) Compute vi = Qi+iv, V2 = Pi+iv. Solve Ai+iWi =Vi for Wi. Setw2 = V2-Ci+i'Wi. Define Si+i = T^+i - Ci+iAl^^Bi+i. Solve Si+iy2 = W2 for y2 by flexible GMRES iteration, preconditioning with T,:+i, using Algorithm S.lifi + l<k and using matrices Li+i, Ui+i othenvise. {&) Set yi = wi - A-^^Bi+iy2. ''>^"-(^:)=(f:l)' To solve equation (2.1), we start the solution process for level i = 0 and apply a GMRES iteration with the preconditioner Ti to the Schur complement of the transformed To = A. The overall method is presented in Algorithm 3.2. Algorithm 3.2 Solution of Ax = b by recursively preconditioned flexible GMRES. (1) Set up (a) Input matrix A, vector b, tolerance t. (b) Decide on values for: • maximum wavelet level, k, • tolerance ti for inner iterations, • block band width for approximating the submatrices. (c) Set To = A and i = 0. (d) Recursively, for i = 1.. .k + 1, compute Ti, At, Bi, Ci, and factorize Ai. (e) Factorize Tk+i into Lk+i, Uk+i(2) Solve TQX = b by flexible GMRES preconditioned using Algorithm. 3.1. Note that the relatively expensive step of computing the BBNS-form matrices Ai,Bi, Ci,Ti is done only once. Wavelet-based preconditioning 4 359 Numerical results Here we illustrate the effectiveness of our method, and compare it with some alternative approaches, by considering two example mn x mn matrices: A.ni+j,nk+l \ log((i - kY + [j - Z)^) k and j = I, otherwise. (4.1) for i, fc = 0,1,... , m — 1; j,l = 0,1,... , n — 1; c a constant. B.ni+j,nk+l ^-((i-kf+u-if) (4.2) for i,fc = 0,1,... ,m — 1; j,Z = 0,1,... ,n - 1. Tables 1 and 2 give typical results for the matrices A and B respectively. The cost of reducing the relative residual norm to a tolerance of 10"^ is shown for matrices of various sizes using the following preconditloners: PI simple band preconditioner, P2 standard BBDWT + band cut preconditioner, P3 BBDWTPer + band cut preconditioner, P4 recursive BBDWT-based preconditioner. In each case GMRES was restarted after 10 iterations. '*' denotes non-convergence of GMRES(IO). Unpreconditioned GMRES(IO) failed to converge to the required tolerance for any size of matrix, so it is omitted from the tables. m n A'' = mn 8 16 32 64 8 16 32 64 64 256 1024 4096 its. 30 58 86 * TAB. m n N = mn 8 16 32 64 8 16 32 64 64 256 1024 4096 its. 8 62 67 69 TAB. PI Mflops 0.65 17 393 * Preconditioned GMRES P2 P3 its. Mflops its. Mflops 49 1.2 38 0.99 * * * * * * * * * * * * its. 6 7 7 7 P4 Mflops 0.32 5.5 150 6300 Direct solution Mflops 0.21 12 720 46000 its. 4 6 6 6 P4 Mflops 0.26 5.6 120 1700 Direct solution Mflops 0.21 12 720 46000 1. Cost of solving Ax = b. PI Mflops 0.19 19 310 5000 Preconditioned GMRES P2 P3 its. Mflops its. Mflops 8 0.26 .8 0.26 66 21 63 21 76 74 380 370 78 6000 78 6000 2. Cost of solving Bx = b. Clearly the recursive BBDWT approach gives better performance -than the alternat- 360 M. Ford and K. Chen ive preconditioners that we tested and offers substantial savings compared with direct solution. 
5 Conclusion and future work We have designed a preconditioning method that exploits smoothness between the blocks of a class of dense matrices giving useful savings compared with both direct solution and preconditioned GMRES using band preconditioners. In the future we plan to explore a number of ways of further improving our methods including: (a) using a block DWT, in addition to the BBDWT, to exploit smoothness within each block; (b) using biorthogonal wavelets or multiwavelets (particularly the new supercompact Haar multiwavelets presented in [2]) to give improved compression; (c) preprocessing the matrix to enhance smoothness. Bibliography , 1. 0. Axelsson. Iterative solution methods. Cambridge University Press, Cambridge, UK, 1996. 2. R. M. Beam and R. F. Warming. Multiresohition analysis and supercompact multiwavelets. SIAM J. Sci. Comput, 22:1238-1268, 2000. 3. G. Beylkin, R. R. Coifman, and V. Rokhlin. Fast wavelet transforms and numerical algorithms I. Comm. Pure Appl. Math., XLIV:141-183, 1991. 4. T. F. Chan and K. Chen. Two-stage preconditioners using wavelet band splitting and sparse approximation. Report CAM 00-26, UCLA, 2000. 5. K. Chen. On a class of preconditioning methods for dense linear systems from boundary elements. SIAM J. Sci. Comput, 20:684^698, 1998. 6. K. Chen. Discrete wavelet transforms accelerated sparse preconditioners for dense boundary element systems. Electron. Trans. Numer. Anal., 8:138-153, 1999. 7. J. Ford and K. Chen. An algorithm for accelerated computation of DWTPer-based band preconditioners. TVwm. ^/(7., 26(2):167-172, 2001. 8. J. Ford and K. Chen. Wavelet-based preconditioners for dense matrices with nonsmooth local features. BIT, 41(2):282-307, 2001. 9. J. Ford, K. Chen, and D. Evans. On a recursive Schur preconditioner for iterative solution of a class of dense matrix problems. Int. J. Comput. Math., 79: to appear. 10. J. Ford, K. Chen, and L. Scales. A new wavelet transform preconditioner for iterative solution of elastohydrodynamic lubrication problems. Int. J. Comput. Math., 75:497-513, 2000. 11. D. Gines, G. Beylkin, and J. Dunn. LU factorization of non-standard forms and direct multiresohition solvers. Appl. Comput. Harmon. Anal, 5:156-201, 1998. 12. Y. Saad. A flexible inner-outer preconditioned GMRES algorithm. SIAM J. Sci. Compttt, 14(2) :461-469, 1993. 13. Y. Saad. Iterative Methods for Sparse Linear Systems. PWS, Boston, 1996. 14. G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, USA, 1996. Some properties of the perturbed Haar wavelets A. L. Gonzalez Departmento de Matemdtica, Universidad Nacional de Mar del Plata, Funes 3350, 7600 Mar del Plata, Argentina. algonzal@mdp.edu.ar R. A. Zalik Department of Mathematics, Auburn University, AL 36849-5310. zalik@auburn.edu Abstract One of the authors has studied the properties of a family of Riesz bases obtained by perturbing the Haar function using B-splines. Although these bases cannot be obtained by multiresolution analyses, they have other interesting properties. The present paper discusses how a discrete signal {ar;0 < r < AT — 1} can be studied by considering a suitable function of the form f{t) := X^^_Q arfr{t), so that the existing theory for functions defined over a continuous domain can be applied. 1 Introduction In what follows Z will denote the integers and R the real numbers; t and x will always denote real variables. 
The support of a function / will be denoted by supp(/), its quadratic norm by ||/|| and if / G L{TR) its Fourier transform is defined by fix) := f e-'"f{t)dt. ./IR In [3] we found a family of affine wavelet Riesz bases of L^(]R), of bounded support and arbitrary degrees of smoothness, obtained by smoothing the discontinuities of the Haar function using B-splines. Although these bases are not orthogonal they are symmetric, a feature that is lacking in orthogonal wavelets. Our bases can be constructed so that the difference between the frame bounds (which are given explicitly) can be made as small as desired. In general, orthogonal wavelets are represented by infinite series, and for computational purposes values are generated over a discrete set using the cascade algorithm [2, 5]. Our bases, on the other hand, are given in closed form. We now briefly describe how these wavelets are defined and introduce additional notation and make assumptions that will be used in the subsequent discussion. Let Nmit) denote the B-spline of order m (m > 2) ([1], Chapter 4), X[o,m-i]{t) the 362 Some properties of the perturbed Haar wavelets 363 characteristic function of [0, m — 1]), m-2 g{t)~Xlo,m-i]{t)J2Nm{t-k), gi{t):=git-m + l), m-2 h{t) := (1/2) J^ ^m{t-k), and q{t) := gi{t) - h{t). For 0 < 5 < 1/2, let QI = -a2 = -a^ = a^ = 2(m - 1)15, A = 2(m - 1),/32 = 2(m - 1)(1 + 5)/(5,/33 =-/34 = (m - l)/5, p«(i):=(-l)'-i9(Qii+A),i = 1,2,3,4, P^'>W:=-(X[i/2-5,i/2)(i)-X[i/2,i/2+5)W), 6 P^'^(i) := X[o,i/2)(t) - X[i/2,i)W, and • ^(t) := ^pW(i). We will call V' ^/te perturbed Haar wavelet. In [3] we proved that supp (V') C [-5,1 + ^], V- e C""-2(]R), and that if Vj,fc(i) := lil'^i,{2H - k), then {^j,fe; j, fc e Z} is a Riesz basis, and we provided explicit upper and lower frame bounds. Moreover, in [7] we showed that given a function fi, the wavelet coefRcients {ii,tpj^k) can be computed in O(A^) steps (where A'' is the sample size), just as in the orthogonal case. In this paper we will discuss the application of the perturbed Haar wavelet to the study of discrete signals. Let us first look at the orthogonal case for comparison. Let fi be an orthogonal wavelet associated with a multiresolution analysis {Vj; j 6 Z} and a scaling function (j), with the caveat that the definition of multiresolution analysis that we are adopting is that of [1] and [4], and therefore Vj C Vj+i,j e Z, whether other authors, Uke [2] and [5] assume that V^+i C V^-. If a := {a^; Q <r <N -1} is an arbitrary sequence of real or complex numbers, then this discrete signal is transformed into a continuous one by considering the function v{t) := Ylr=o a-r4>{t - 'r)The study of the signal u{t) has two stages: the analysis stage consists in computing the wavelet coefficients,whereas the synthesis stage consists in reconstructing the signal from the wavelet coefficients. If Wj denotes the closure of the Unear span of the functions Hj^k, 3 € Z, then the Wj are mutually orthogonal and VQ = ®j<oWj. Since u € Vb, it turns out that the wavelet coefficients {v,^ij,k) vanish for j > 0. Moreover, since u{t) has compact support, for each j < 0 there is only a finite number of nonzero wavelet coefficients. With the perturbed Haar wavelet we face an additional problem: the spaces Wj are no longer orthogonal, and we can therefore no longer assume that all the wavelet coefficients corresponding to positive values of j must vanish. 
Moreover, we may not even have a scaling function: in [8] we showed that ii 6 = 2^, where ^ is a negative integer, then the perturbed Haar wavelet ip that corresponds to this value of 5 cannot be generated by a multiresolution analysis. To overcome these difficulties, we proceed as follows. Let n € Z be such that 2" > 4(m- 1), 6{i}(i) := X[o,2(m-i))W^W, b^^^t) := q{Aim-l)-t), b{t) := b^^^t) + b^^\t), fr{t) := arb{2H - 4(m - l)r), and f{t) := Y!^^^ U{t). By a direct application of [3] Lemma 6 we obtain the following Lemma 1 The function b{t) has the following properties: (a) supp{b) C [0,4(m - 1)], (b) 6 e C^-^M), (c) 6(2(m - 1)) = 1, 364 A. L. Gonzalez and R. A. Zalik (d) 1^6(0) = j5,6(2(m-l)) = ^6(4(m-l)) = 0, 1 < fc < m - 2, (e) The total variation ofb does not exceed 4(m - 1), (f) \b{t)\ < 1. Prom the preceding lemma we conclude that supp(/) C [0,1], and that the functions fr have disjoint supports. This imphes that ||/||^ = ||6|p||a|p2~", where ||a|p := YI^IQ \arf. We will also use the ii norm: ||a||i := E^"o^ l«r|- Note, moreover, that / G (7'"-2(IR), and that/(2i-"(m-l)(2r + l)) =/r(2i-"(m-l)(2r + l)) = ar6(2(m1)) = 0^. In theory, given all its wavelet coefficients, the function / can be reconstructed using the frame algorithm or other, even faster, algorithms [5]. However, since there may be an infinite number of nonzero wavelet coefficients, the application of such algorithms may not always be practical. We will adopt an approximation approach, li A = A{S, m), and B = B{d,m) are respectively the lower and upper frame bounds of the Riesz basis generated by ip, hj^k ■= {f,'ipj,k), and Lf := Ej,fc,ez'*j,fcV'j,it, then from the error estimates for the frame algorithm we know that \\Lf - f\\ < {{B - A)/{B + A))||/||. Since, as remarked above, we can make A and B as close to 1 as we want by making 5 sufficiently small, we conclude that for every e > 0 there is a SQ such that iiO <S <So, then \\Lf- f\\ < e\\f\\. To approximate / using the wavelet coefficients it will therefore suffice to approximate L / by an operator of the form h Observe that since / has bounded support, E f reduces to a finite sum. Our objective will be accomplished by showing that there is a constant K such that Y^hj^ki'j^k itez <K||a||2-l^'l/^ But first we need to prove five lemmas, of some independent interest. We begin with Lemma 2 Let {a/,;fc e Z} and {6;,;fc e Z} be increasing sequences such that Uk < bk-i < ak+i, k G Z. Assume that fk € L^(JR) and that supp{fk) Q K,6fc], and let /==E.6zA-r/ien||/f <2EfcezllAll'■ Proof: If r < fc - 1 then br < bk-2 < o-k, whereas if r > A; + 1 then ar > ak+2 > hThis implies that if r 7^ fc - 1, A; then fr{t) = 0 on [ak, bk], and we readily see that ll/f < 25] r|A(Ol' = 25^11/,.(OIP. keZ-^"'' Q k€Z Lemma 3 Letu € L"^{IR) be a function with support in an interval [a, b] with b-a< 1. Ifj < 0, then k€Z Proof: Let j < 0 be arbitrary but fixed, and define I{k) := supp {i)j,k) n [a, b]. Then I{k) C [2-^k - 5),2'^{k + S + l)]n \a,b]. If I{k) = 0 then, either 2-^{k + 6 + 1) < a, Some properties of the perturbed Haar wavelets 365 or 2-^k -S)> 6. This implies that if I(k) j^ 0, then k € (2% - S ~ 1,2^b + S). Since the length of this interval is less than 3, we conclude that there are at most three values of k for which I{k) ^ 0. In other words, there are at most three values of k for which /ij,fc 7^ 0. Since |V'(i)| < 1) for any such A; we have: 2 2^ / u{t)ij{2H -k) Jl(k) \{u,il>j,k)? 
< {h-a)2^ f Jl{k) dt< 2^ [ |u(^)|2 dt [ \-iP{2H - k)\ dt Jl{k) Jl{k) \u{t)\'^dt<2^ D Lemma 4 Let a,(3,'y,a £ R, with a, 7 7^ 0, and define c{t) := q{at + 0), d{t) := q{'yt + a), and i<: = 2 { [25/64 + (25/192f/^] (m - 1)* + (m - 1)2/1024} . If j > 0 and i = 5,6, then (a) ^ \{d,cj,k)f < 2 (4v^Q-2 + 1/3)'2-^- (b) J2 lid^pfi:) ' < (2^2 + 1/2)%-^'. kez kez Proof: (a) Prom [3] p. 3367 (bearing in mind the slightly different definition of the Fourier transform), we have g(x) = (i/:c)e-(™-i)^'/2 ^—{m—l)xi/2 ((2/a;)sina:/2) m—1 Prom [1] p. 56 (3.2.16), iV„(:c) = e-(i/2)m«[(2/x)sina;/2] (1.1) Let s(a;):=[(2/a;)sina;/2]"-^ Then giix) = e-("-i)^^5(a;) = (i/a;)[e-2('"-i)^^ - e-^^/^^^^'-^^^'six)]. Since m-2 few = 0 E ^'""Nmix) = (1/2)' fe=0 "_ _^. JV^(x), , a straightforward computation yields ^(:c) = -i/(2cc)[e-(^/2)(m-l)xi _ g-(3/2)(m-l)«]^^^>)^ whence , q{x) = i-e~("~^'^'[cos(m-l)a;-s(a;)cos-(m-l)a; +z(2s(a;)sin-(m- l)a; - sin(m - l)i)]. (1.2) This implies that ■(a;)p <8a;-2, a; ^ 0. (1.3) A. L. Gonzdlez and R. A. Zalik 366 On the other hand, q{x) = ix-h-'-"'-'^^" [{vi + V2) + i{va + U4)], where ui :=cos(m-l)a;-cos(l/2)(m-l).'C, ^2 ~ [1 - s(2:)]cos(l/2)(m - l)x, U3 :=s(a;)[2sm(l/2)(m-l)a;-sin(m-l)x], i;4 := [s(a;) - 1] sin(m - l)x. A McLaurin expansion shows that |t;i| < (5/8)(m - l)^x'^. Since 1 - w™"^ = (1 u) Ylk=o "'^ ^^^ I sinuI < |u|, we infer that |1 - S(.T)| < (m - 1)|1 - (2/a;)sina;/2| = (m - 1)(2/.T)|X/2 - sin3:/2|. Since \u - sinu| < |u|'V6, we conclude that |1 - s{x)\ < (m - l)x2/48. Thus, |v2(x)| < (m - 1)3:748, and \v4{x)\ < (m - l)a:V48. Another McLaurin expansion yields jual < (5/24)(m - lf\xf. Clearly |u3| < 3; thus |V3| = |v3|2/3|,;3|i/3 < (25/192)i/3(m - 1) V. Since \q{x)f=^x-^[{vi+vy + iv3 + Vi)^]<2x-M + vl + vl + vl], we deduce that \q{x)f < Kx^. (1.4) Prom Plancherel's identity we have: {d,Cj,k) = V'^ f d{t)c{2H-k)dt = V'^I{2T^) f e^^%x)d{Vx)dx /•27r = 2^'/V(2T) / e'=" J3c(x + 27rr)rf(2^'(x + 27rr))d.T. This means that {2-^l'^{d,Cj^k)\k € Z} is the sequence of Fourier coefficients of the function Efcez^C^ + 27rr)d(2-'(x + 27rr)). Thus, applying Bessel's identity and then the Cauchy-Schwarz inequality twice (once for sums and once for integrals), we have: 2 27r2-^ 5^ \{d,Cj^k)f = r Y.^{x + 2-Kr)d{V{x + 2^r)) /c€Z •'° reZ < /"'[|c(x)d(2^x)| + |c(x-27r)d(2J'(x-27r))| + | Y. c(x + 27rr)d(2^'(x + 27rr)) < ( / |c(x)d(2^x) fdxj +i \c{x - 27r)rf(2^ {x 0/-2" ' 0 2TT)) \^ dxj - \ 1/2 I Yl c{x+27Tr)d{2^{x+27rr))\'^dx] ,jn _i r#0,-l Some properties of the perturbed Haar wavelets 367 Since c{x) = a-'^e'-^/"'>'='q{a-^x), (1.3) implies that |c(a; + 27rr)|2 <8|a; + 27rr|-2, x^2TTr, (1.5) whereas from (1.4) we see that \c{x + 2TTr)\^ < Ka-^\x + 2'Kr\^. : (1.6) Since d(a;)= 7-ie('"/T)^*g(7-ia;), (1.3) also imphes that \d{2^{x + 2nr))f<4-^+^2\x + 2Trr\-^, xj^2nr. (1.7) Since ^i is obtained by integrating the product of the left-side members of (1.6) and (1.7) (with r = 0) over an interval of length 2'ir, we readily see that Si< 167rKa-H-^. (1.8) 52 < 16nKa-H-^. (1.9) A similar argument yields Prom Minkowski's inequality /.27r 53< / ■'" y^ \c{x + 2Trr)f Yl T^O-l d{2^{x + 2Trr)) dx. 11 rj^O-l If a; G [0,27r] and r > 1 then from (1.5) we have: y^\c{x + 27:r)f<2^-'J2r-' = l/3, r>l r>l whereas (1.7) implies that \^ d{2^{x + 2Trr)) r>l <2 4-^7r-2^r-2=4-V3. r>l Similarly, ^|c(x + 27rr)|^<27r-2'^r-2 = l/3, r<-2 r>l and ) 1 d{V{x + 2Trr)) r<-2 <2 4-%-2S V-2=4-V3, r>l whence we conclude that S3 < (47r/9)4""^. Combining (1.8), (1.9) and the preceding inequality, the assertion follows. 
(b) Note that p{6} jg ^{5} ^i^]^ ^ = 1/2. since p{^y{x) = 2ia;-ie-(i/2)^^(l - cosfe), we see that ___ _ ' \pi^}{x + 2Trr)\'^ <A\x + 2Trr\-^, xj^2Trr. (1.10) On the other hand, the inequality |1 -cos&l < {l/2)6'^x'^ implies that |p'(^>(a;)| < S^\x\; therefore \pi^}{x + 2-Kr)f <5^\x + 2'Kr\'^. (1.11) 1 ■ ^ 1 1 1 1 A. L. Gonzalez and R. A. Zalik 368 We now repeat the argument employed in (a), using (1.10) instead of (1.5), (1.11) instead of (1.6), and bearing in mind that 5 < 1/2. □ We now find bounds for the quadratic norms of q{t) and b{t). Lemma 5 (a) \\tjj\\ < 1; (b) ||fe|| < 2(m - 1). Proof: (a) [1] Theorem 4.3 implies that the functions Nm are nonnegative. This implies that both g and h are nonnegative. In the proof of [3] Lemma 6(f) we show that / g(t)dt^ [ h{t)dt^{m-l)/2, whence / m \q(t)\dt<m-l. Moreover, \q{t)\ < 1 ([3] Lemma 6(h)). Thus, i\q{t)?dt< f \q{t)\dt<m--1. Therefore, / |p«(t)|2d< = (5/2(m-l)) / ig(i)pdt<V2, J = l,2,3,4. This implies that f \'il}{t)\^dt<A5l2+ f \p^^\t)-p^^\t)\^dt = 25 + {l-25) = \. (b) [ \b{t)fdt< I \b{t)\dt = 2 I \q{t)\dt<2im-l). 7lR ./IR JlEi, D Theorem 1 (a) Ifj < 0, <2\/6(m-l)l|a||2f^-^^/^ fcez (b) Let K he defined as in Lemma 4- If j > 0, Xl(/-V'j,fc)V'j,fc < V2 {K2^-" + 1/3) + V2 + l/sl ||a||i 2'^^^. kez Proof: Assume first that j < 0. Applying Lemma 2, Lemma 3, and Lemma 5, we have: ^{f,^j:kHj,k kez <2'£\\{f,^j,k)^^j,kf = 2Uf J2 l^/'^J.-^) fcez fcez < 2^] K/,V^,fc)f < 61|/||2 2^- < 6||6|n|a||22^-" < 24(m - l)^||af 2^" fcez Some properties of the perturbed Haar wavelets 369 Assume now that j > 0. Setting br'{t) := ar6^*^(2"i - 4(m — l)r), we see that fr{t) bi^\t) + bj.'^\t). Thus, 2 fceZ 6 N-1 i=l i=l r=0 E^^^^^^JS)v^.^ fcez Applying Lemma 2 and Lemma 5 as above, we see that 2 Y.'^W\i^i)i^,k k€7. <'^Y.W.^^ fcez Since the Fourier transforms of q{t) and X[o,2(m-i))9(^) are identical, and the functions 6r are of the form ar q{at + (3) or a^ X[o,2(m-i))(a* + P)<l{oit + (3) with Lemma 4 we have: |Q:| = 2", from E |(^^'^pjS)f < 2|a.|2 (2^/:^2-" + l/s)^-^ £ = 1,2,3,4, fcez and E I(^^'^-PS)f ^ l«^l'2 (V2 + 1/3)%-^ ^ = 5,6, fcez whence the assertion readily follows. D Bibliography 1. C. K. Chui, An Introduction to Wavelets, Academic Press, San Diego, 1992. 2. L Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992. 3. N. K. Govil and R. A. Zalik, Perturbations of the Haar Wavelet, Proc. American ' Math. Soc. 125 (1997), 3363-3370. 4. E. Hernandez and G. Weiss, A First Course on Wavelets, CRC Press, Boca Raton, FL, 1996. 5. S. Mallat,^ Wavelet Tour of Signal Processing, Academic Press, San Diego, 1997. 6. R. M. Young, An Introduction to Nonharmonic Fourier Series, Academic Press, New York, 1980. 7. R. A. Zalik, A class of quasi-orthogonal wavelet bases. Wavelets, Multiwavelets and their Applications (A. Aldroubi and E. B. Lin, eds.). Contemporary Mathematics, Vol. 216, American Mathematical Society, Providence, RL 1998, pp. 81-94. 8. R. A. Zalik, Riesz bases and multiresolution analyses, Appl. Comput. Harm. Analysis 7 {l%m), 2,1^-2,2,1. An example concerning the Lp-stability of piecewise linear B-wavelets Peeter Oja Department of Mathematics, Tartu University, Liivi 2, Tartu, Estonia. Peeter.OjaOut.ee^ Ewald Quak SINTEF Applied Mathematics, P.O. Box 124 Blindern, 0314 Oslo, Norway. Ewald.Quak@math.slntef.no ^ Abstract In this paper we consider B-wavelets of order 2, i.e. piecewise hnear spline prewavelets of smallest support, over nonuniform knot sequences. 
We discuss an example showing that for 1 < p < 00, there is no absolute Lp-stability for these B-wavelets. This means that regardless what specific scaling of the B-wavelets is chosen, the corresponding stability constants cannot be made independent of the knot sequences involved. 1 Introduction Polynomial splines are fundamental tools in numerous branches of applied mathematics, and for spline spaces defined over a given knot sequence, the basis of choice is provided by B-splines, which possess a lot of attractive properties for numerical computations. One of these important properties of B-splines is their absolute stability. Given a B-spline basis {B^}^^^ of polynomial order d over a valid knot sequence t, a classical result by de Boor [1] states that properly normalized B-splines are stable in the sense that for each set {&i}jg2: of real coefficients it holds that <l|b||. C^' l|b|L < (1.1) Here ||'|| denotes the standard integral and discrete p-norms for 1 < p < oo, respectively, and the normalizing factor 5i for each B-spline is the length of its support divided by the order d. The important point is that the positive constant Ca is dependent on the order d alone, and not in any way on the underlying knot sequence t. Since nested knot sequences give rise to nested spline spaces, spline functions have also become a focus of attention within the theory of wavelets and multiresolution analysis, ■'Research supported by the Estonian Science Foundation Grant no. 3926. ^Research supported by the EU Research and Training Network MINGLE, RTNl-1999-00117. 370 An Lp-stability example for piecewise linear B-Wavelets starting with cardinal spline wavelets on infinite equally spaced and uniformly refined knot sequences, for which Fourier transforin techniques are available, see [3] and the references therein. The study of spline wavelets on bounded intervals, for arbitrary knot sequences and nonuniform refinement began with the papers [4], [5] and [2], respectively. The construction of so-called minimally supported B-wavelets for a given spline order d and two nested knot sequences to provide a basis of the relative orthogonal complement (wavelet) space is described in detail in [6]. This means that given the coarse and fine knot sequence, there exist expUcit algorithms to determine the supports of the B-wavelet functions, the so-called minimal intervals, and also to compute the corresponding wavelet functions, though only up to a normalization constant. One open problem, however, is how to fix the normalization factor for each B-wavelet function to achieve best possible stability for the whole B-wavelet basis. We provide an example for the case of piecewise linear wavelets, i.e. polynomial order 2, that shows that for 1 < p < 00 there is no absolute stability of B-wavelets, meaning that there is no choice of normalization that provides absolute stability constants which are completely independent of the underlying knot sequences. Lp-stability estimates involving a quantity dependent on the knot sequences for 1 < p < oo and showing absolute stability for p = 1 are given in [7]. 2 Piecewise linear B-wavelets The theory of B-wavelets [6] covers general cases of knot refinement, such as'situations where several or no knots at all are inserted into an old knot interval, or where the multiplicity of an existing knot is increased. 
For our purposes, however, it is suflacient to consider what one might call the standard setting, where all knots are simple except at the interval endpoints, which we can count as double knots, and where exactly one new knot is inserted strictly between two old ones. Our notations are as follows for the closed interval [0,1]. We have a coarse knot sequence with n — 1 interior knots, namely T : 0 = To < n < • • • < r„ = 1. Strictly between each pair of coarse knots r^-i and arbitrary location, i.e. TJ we insert a new knot Si at an Ti-i < Si <Ti for each i = 1,..., n. Thus we have a sequence s of new knots s : 0 < si < ••• < s„ < 1. The fine knot sequence t = r U s, when ordered appropriately, is given as t: 0 = to < ti < • • • < *2n = 1, where the even numbered knots in t correspond to old knots in r, while the odd numbered knots represent the newly inserted knots from s. To account for the boundary, we treat the interval endpoints as double knots by setting r_i = f _i = 0 and r„+i = ^2,14-1 = 1. 371 372 P. Oja and E. Quak For our investigations it is necessary to introduce also some notation related to the knot spacings. We set di = ti+i — ti for i = 0,...,2n- 1, and St = U+i - ij-i for i = 0,..., 2n, which means 5o = do = ti- to and 62,1 = d^n-i = hn - <2n-i at the boundary. Thus 5i is the distance between two consecutive old knots if i is odd, and between two consecutive new knots if i is even (and not at the boundary). We also introduce the index sets fi = {1,3,..., 2n - 1} and fio = {3,5,..., 2n - 3} . The piecewise linear functions on the knot sequences T Ct form nested linear spaces Vo C Fi of dimensions n +1 and 2n+l, respectively. The corresponding piecewise/wear B-splines forming a basis of these spaces are simple hat functions. We denote them as ipj and 7J for r and t, respectively, where with the necessary adjustments at the endpoints, {x-Tj-i)/62j-i 'Pjix)={ (Tj+i - x)/S2J+1 0 li{^)={ {x-ti-i)/di-i {U+i-x)/di 0 ifx6[rj_iTj] iixe[Tj,Tj+i] otherwise iixe[ti-iti] a x€[titi+i] otherwise ioT j = 0,...,n, for i = 0, ...,2n. (2.1) (2.2) Using for any two functions f,g €Vi the standard inner product {f,9)= I f{t)g{t)dt, Jo we can write ^1 = ^0©!^, where W is the relative orthogonal complement of Vo in Vi, and (B denotes orthogonal summation. The dimension of W is n, so that there is a basis function rpi^ for every index k G fi, or in other words for each newly inserted knot s^. Nonzero functions ipk GW with minimal support are called B-wavelets. The general theory for B-wavelets developed in [6] establishes in this special case that there are n different piecewise linear B-wavelets which form a basis of the wavelet space W. Each such B-wavelet is uniquely determined up to a constant multiple. There are two boundary B-wavelets Vi and ■02n-i and n — 2 interior B-wavelets tpk for k € O.0, which we will consider first. Each interior B-wavelet has support [tk-3,tk+3], so that fc+2 V'fc (x) = Y^ qhi {^) for x € [0,1] i=k-2 with the coefficients determined by tpk €W, or in other words {ipk,fj) = 0 for j = 0, ...,n. An Lp-stability example for piecewise linear B-Wavelets 373 For the boundary wavelets tpi and ^^2^-1 we have to make some minor modifications Their supports are [toM] and [i2n-4,i2n], respectively, so that 3 2n ^1 i^) = J2li^i (^) and ip2n-i (x) = Yl 'li"~S (x) for a; 6 [0,1]. i=0 i=2n-3 In the paper [7] the values of all B-wavelet coefficients qf are given explicitly in terms of the knot locations for the standard setting described here. 
In the same paper estimates for the coefficients are used to derive ip-stability estimates for these B-wavelets. 3 Stability of B-wavelets Our aim in this paper is to establish Theorem 3.1 Given the B-wavelet basis {A}ken . then for 1< p < 00, there are no sets of weights ak,p,k £0,, such that Ki \\c\\p < }, CkOlk,pllJk < K2 llcl ken (3.1) holds for any wavelet coefficients (ci, C3,..., C2„_i) and with absolute constants K^ > 0 and K2 > 0, which are completely independent from the choice of knot sequences T ands. Due to the finite dimension of W, it is clear that stability constants Ki and K2 exist as any two norms on W are equivalent. The pertinent question is how the weights could be chosen to achieve that the constants are actually independent of the dimension, the pnorm and, if possible, the choice of new knots s. We will prove the assertion by assuming that the estimate (3.1) holds with constants independent of the knot sequences. Then the following special case serves as a counterexample to this assertion. The old knot sequence r consists of the equally spaced points: To = 0, n = 1/3, r2 = 2/3, rg = 1. We want to investigate what happens if two newly inserted points are positioned ever more closely, so we introduce the new knots as si = 1/3 - e, S2 = 1/3 + r?, S3 = 5/6, for 0 < e, ?? < 1/3, in order to find out what happens if both e ^ 0+ and 7/ -+ 0+. Thus the fine knot sequence i is fo = 0, ii = 1/3 - e, i2 = 1/3, is = 1/3 + r/, «4 = 2/3, % = 5/6, is = 1. The fine interval lengths are do = 1/3 - e, rfi == e, d2 = 77, cJs = 1/3 - r/, ^4 = 1/6, dg = 1/6, while ^1 = (^3 == (^5 = 1/3, and 5o = 1/3-£, ^2 =£ + »?, ^4 = 1/2 - ?7, (Je = 1/6. P. Oja and E. Quak 374 In this setting any wavelet ip = Y^qi7i&W i=0 must be orthogonal to the coarse hat functions (/^o, • • •, ^3- This actually means that the column vector q of coefficients qt must satisfy the matrix equation Aq = 0, (3-2) where the entries of A are the inner products of the coarse and fine hat functions, i.e. aj,i = {^j,li), fori = 0,...,3, i = 0,...,6. Direct computations using (2.1) and (2.2) yield as the only nonzero entries 1 1 oo,o 1 1 1 ai,o ' r^f-R' 18' «M=-6^+9' 1 o 1 1 o 1 ai,3 02,2 = 11 -77+-, -g'/-r g' 11 1 2 1 , 1 a,,i = -v "1,*2'' -i^+Ti3 1 1-1,1 L 6' 13 22 a2,4 = 4^'-^+72' 1 ^2,5 ^3,4 12' 1 72' 1 ''''' = 72' 1 "^■^=12' ''^■'^"72- We now investigate the B-wavelets i/^i and Vs in detail, corresponding to si = h and S2 = h Speciahzing the results from [7] then yields all necessary B-wavelet coefficients for this setting up to a scaling factor. Note, however, that it is straightforward to check that the corresponding coefficient vectors satisfy the matrix equation (3.2). The coefficients of the boundary wavelet tpi are 3 QI l-3e' QI 4 4 = 6- 9erj £ + r? + 6eT7 1 + 37? e + 7? + 6e77' e + 77 + 6e7j' An Lp-stability example for piecewise linear B-Wavelets while the ones for the interior B-wavelet ips are 3 9£2 4 = - e-\-r] + 6er] 3(£ + r?) 9(1 - 277) + 2(£ + ?7 + 6£?7) ' 2(5-12?7)' 9 5-1277' 3 2(5-12r;)' 3+ ll QI <ll = We first provide estimates for the p-norms of these B-wavelets. Proposition 3.2 For small enough e and r], it holds for 1 <p < 00 that llV-illp 45 V2 ie + vY .,„.> SG)""(e.,)- Honp Proof: For all 0 < e, 7? < 1/3 we find that H\ > i^ + vY = gi^ + V) inf 0<e"ri"<l/3 (l + 3?7)(£ + 7?) £ + 77 + 6£77 -1 and, similarly, \ql\>7:{e + v) -1 Note that instead of 8/9 we may write 1 — c for any cr > 0 if £ and 77 are small enough or even 1 if £ = ry. 
In the process £, 77 ^^ 0+ all other coefficients qj and qf have finite limits. This means that for small enough £ + 77 max|9,^| = l^^l > -(£ + 77) \ Q maxl^fl = \q^\ > ^ (£ + 7?)"\ Us The absolute stability of piecewise linear B-splines (1.1) yields with C2 > 5/2 (see [1]) and 82 = e + T] 3 IIV'ilIp = E QIH i=0 > 2/1^'/" l[^) \\{<ilsl^'MsY',4sl/',44'') 375 P. Oja and E. Quak 376 > 2/1 5V2 1/p-l Analogously we get \m\p = E^hi > ia)""i''i^="^SG)'"<-')-"D to complete the proof. Proof of Theorem 3.1: Let us now assume that with some scaling factor B-wa,velets are absolutely stable in p-norm for 1 < p < oo, i.e. there exist weights ak,p so that the inequalities (3.1) hold with constants independent of the specific choice of knot sequences. Choosing in the current setting all coefficients equal to zero except for ci = 1, the stability inequality (3.1) yields ||ai,p^i||p <K2 or in other words, using Proposition 3.2 "'■•'i^^^i^""^^<^+"'"" (3.3) and by a similar argument i-'^i^^^i^^""^^^^^^^'"' (3.4) On the other hand, the stability estimate (3.1) yields for arbitrary ci and C3, while setting all other c^ to zero, that Wciai^ptlh +C3a3,pV'3||p >Ki{\cif + \c3f)^^^ > ii:imax(|ci|, |c3|). Let us choose specifically ci = a3,p and C3 = -ai,p, which results in |Q;i,pa3,p| 11^1 - V'sllp >/<^imax(|Q;i,j,|, |a3,p|), leading with (3.3) and (3.4) to i/p i^ + v) 1-1/p I 01 - 1p3 lp>(^2J I6K1 45K2' (3.5) On the other hand we derive from the absolute stability of linear B-splines (1.1) that Ui-Mp = < IkoTo + {ql - qi) 71 + (92 - 5f) 72 + (93 - 93) 73 - 9^74 - qh4p (qlSl^", {ql - qf) S'/', {ql - ql) ^l^', {ql - 4) S'/', qls'j", qW)/ 11lip \ An Lp-stability example for piecewise linear B-Wavelets :< 6m^x{\ql\6l^',\ql-qf\Sl^^,\ql-ql\5l^',\ql-ql\Sl^', All the terms I llrl/p I 1_ 3|rl/p I 1_ 3|cl/p |„3|rl/p I 3lvl/p are in fact bounded from above for £ + ?7 —> 0"*", so that the expression and the other such terms tend to zero. Since \q2 — ^fl = 3 |e — 7?| / (s + ?? + &er]) we obtain for the only remaining term (r \ n\^-'^^P\n^ (e + r/) 1^2 „3| rl/p _ • 921^2 ^ |£ - ?7| -i + 6£r7/(£ + 7?)-^'^ ''I' which goes to zero as well for £ + ?? ^^ 0+. As a consequence hm (£ + 7y)^-^/niV'i-V'3||p = 0, which contradicts (3.5). □ Remark 3.3 Although we have chosen an example with one boundary and one interior B-wavelet, let us remark that the lack of absolute stability is in no way due to a boundary effect. A completely analogous reasoning is possible if one chooses knot sequences with more interior knots and studies the behaviour for two interior B-wavelets once two new knots coalesce. Similarly just two boundary B-wavelets could be used on an even shorter knot sequence, where there are no interior B-wavelets at all. Bibliography 1. C. deBoor, The quasi-interpolant as a tool in elementary polynomial spline theory, in Approximation Theory, G. G. Lorentz et.al. (eds), Academic Press, 1973, 269-276. 2. M. Buhmann and C. A. Micchelli, Spline prewavelets for non-uniform knots, Numer. Math. 61 (1992), 455-474. 3. C. K. Chui, An Introduction to Wavelets, Academic Press, 1992. 4. C. K. Chui and E. Quak, Wavelets on a bounded interval, in Numerical Methods in Approximation Theory, ISNM105, D. Braess and L. L. Schumaker (eds), Birkhauser, 1992, 53-75. 5. T. Lyche and K. M0rken, Spline-wavelets of minimal support, in Numerical Methods in Approximation Theory, ISNM 105, D. Braess and L. L. Schumaker (eds), Birkhauser, 1992, 177-194. 6. T. Lyche, K. M0rken and E. 
Quak, Theory and algorithms for nonuniform sphne wavelets, in Multivariate Approximation Theory , N. Dyn, D. Leviatan, D. Levin and A. Pinkus (eds), Cambridge University Press, 2001,152-187. 7. J. Mikkelsen, P. Oja and E. Quak, Lp-stability of piecewise linear B-wavelets, preprint. 377 How many holes can locally linearly independent refinable function vectors have? Gerlind Plonka Institute of Mathematics, University of Duisburg, Germany plonka@math.uni-duisburg.de Abstract In this paper we consider the support properties of locally linearly independent refinable function vectors $ = (</>!, •■ ■ ,'l'r)'^- We propose an algorithm for computing the global support of the components of $. Further, for $ = {^i,4)2f we investigate the supports, especially the possibility of holes of refinable function vectors if local linear independence is assumed. Finally, we give some necessary conditions for local linear independence in terms of rank conditions for special matrices given by the refinement mask. But we are not able to give a final answer to the question whether a locally linearly independent function vector can have more than one hole. 1 Introduction Let # = ((/)!,..., ^r)^) r 6 IN, be a vector of compactly supported continuous functions on H. The function vector $ is said to be refinable if it satisfies a vector refinement equation #(a;)= ^^(fc)$(2x-fc), XGIR, (1.1) where {A{k)} is a finitely supported sequence of real (r x r)-matrices. Refinable function vectors play a basic role in the theory of multiwavelets. In the last years the properties of refinable function vectors have been investigated very extensively. In fact, it is possible to characterize properties like approximation order and regularity of # and L^-stability of the basis generated by $ completely by means of the refinement mask {^(/c)} [1, 6, 7, 11]. We say that $ is L"^-stable if there are constants 0 < J4 < B < oo such that for any sequences ci,...,Cr G/^(K), 2 ' <5EEM'^)|' i/=ifee2z In some applications one needs not only t^-stability of the basis generated by $ but other stronger conditions of linear independence. We say that $ is globally linearly independent 378 Locally linearly independent function vectors _ if for any sequences ci,... ,Cr on 7L r Y^Y^Cu{k)(j)^{--k)-:=0 on IR i/=lfc€ZZ implies that Cu{k) = 0 for all i^ = 1,..., r and all fc e ZZ (see [8, 5]). The following definition is even more restrictive: A function vector # is called to be linearly independent on a nonempty open subset G of IR if for any sequences ci,..., c^ on z; r Y,Y^c,{k)M--k)=0 on G implies that c^{k) = 0 for all k G Iv{G), v = 1,... ,r, where Iv{G) contains all fc 6 Z with (/)y(- - fc) ^ 0 on G. Finally, $ is called to be locally linearly independent if it is hnearly independent on any nonempty open subset G of IR. Obviously, local, hnear independence of $ imphes global linear independence and global linear independence of $ implies L^-stability. It has been shown by Sun [12], that for compactly supported, refinable functions (r = 1) with dilation factor 2 the notions of local and global linear independence are equivalent. However, this is not longer true for function vectors [4]. For (scalar) refinable functions 4>, local linear independence implies that (j) has integer support, i.e., supp^ starts and ends with an integer, and supp(/) does not contain holes, i.e., supp(^ is an interval. Now, one can ask, 'is this also true for locally linearly independent refinable function vectors?' Unfortunately this is not the case. 
In [10] it has been shown that a component of $ can have a hole. However, it is not clear, whether a refinable, locally linearly independent function vector can also have components with finitely many or even infinitely many holes. In this paper, we want to investigate support properties of locally linearly independent function vectors and consider the 'hole problem' more closely. In the second section we briefly recall a characterization of local linear independence for function vectors in terms of the mask {A{k)}. In Section 3, we present an algorithm for computing the starting points and endpoints of the support of the components (^^ of $. In the remaining part of the paper we restrict ourselves to the special case # = {'t>ij'p2)^- We collect some observations on function vectors with holes in Section 4 and show that holes can only occur in special situations. In Section 5 we give necessary conditions for local linear independence in terms of rank conditions for matrices formed by the mask {A{k)}. In Section 6 we prove that the function vector $ given in Example 4.1 is continuous and locally linearly independent. Finally, we summarize our findings in the conclusion. However, the question put in the title of this paper cannot be answered completely. We conjecture that it is not possible to have locally linearly independent function vectors with more than one hole. 379 380 Gerlind Plonka 2 Preliminaries Let us start with some notations. For a compactly supported, continuous function (j) '■ ]R -^ ]R let supp()f) be the closed subset of IR, where cp does not vanish. Further, let the global support gsupp (j) be the smallest interval containing supp (f>. The fvmction (f) is said to have a hole if there is an interval / which is a subset of gsupp 4> of Lebesgue measure greater than zero, where ^ is identically zero. The function vector $ is said to contain a hole if one of its components has a hole. For a characterization of locally linearly independent function vectors we briefly recall the result of Goodman, Jia and Zhou [4]. Let $ satisfy the refinement equation (1.1), where the mask matrices A{k) are zero matrices for fc < 0 and for k > N. Considering the vector *(.T) = (*(.T + fc))fro^ of length rN and the (rN x r7V)-block matrices ^o = (^(2/c-0)^r=o. Ai = {A{2k-l + l))^^i-2o' (2-1) the refinement equation can equivalently be written as ^{x/2) = Ao^{x) and *((x + l)/2) = A*(x), a;e[0,l]. For ei,... ,e„ e {0, 1} it follows that * (| + --- + |^ + |r)=Ax---A„*W, a:e[0,l]. Now let VQ be a right eigenvector of ^o to the eigenvalue 1. This eigenvector is unique (up to multiplication with a constant) if # is L^-stable (see [3]). Let V be the minimal common invariant subspace of {^Oi ^i} generated by VQ. Then V contains the vectors ^{x), X E [0,1), since #(0) = CVQ with some constant c and each x £ [0,1) can be represented as a hmit of a sequence of dyadic numbers Z/2", ? e Z, n = 1,2, — Further, let M be an {rN x dim y)-matrix such that the columns of M form a basis of V. Then we have from [4] Theorem 2.1 Let ^ be a refinable vector of compactly supported, continuous functions satisfying (1.1) with A{k) = 0 for k <0 and k > N. Then we have (1) $ is linearly independent on (0,1) if and only if all nonzero rows of M are linearly ' independent. (2) # is locally linearly independent if and only if for all n with 0 < n < 2'"^ and all ei,..., e„ G {0, 1} the nonzero rows of Ae„ ■ ■ .Aa^M are linearly independent. 
Remark 2.2 A similar characterization of local linear independence is possible also for L^-solutions of vector refinement equations (1.1) and even for distributions (see [2, ISj). Some examples of locally linearly independent function vectors can be found in [4, 10]. 3 Global support of $ Now we want to give an algorithm for computing the global support of the components of refinable function vectors # from the mask. To this end let us assume that the (r x r)matrices A{k) in (1.1) are of the form A{k) = (^j,j(A;))[j-^j. We look for a„, 0^ e TR Locally linearly independent function vectors with gsupp(/.^ = [a^,/3^]. Let for all pairs (i,j), z,j = 1,...r, Si,j 9i,j := := mm{k:Aij{k)T^O}, mhyi{k:Aij{k)^Q}. Observe that Si^j, gi^j are integers. The numbers a^ can be found by the following algorithm. Algorithm 3.1 Input: Sij, i,j = l,...,r. (1) Let p := (pi,... ,p^) be a vector of length r. Fori^ from 1 tor do a^:^ Su,u,■?,.■.= ly enddo. (2) For V from 1 to r do For j from 1 to r do if s^j < 2a^ - aj then a^ := (s^j + aj)/2; p^ := j endif enddo enddo. (3) Repeat step (2) as long as the vector p = (pi,... ,pr) changes. (4) Form the (r x r)-coefficient matrix P with 1 2 Pi,j = { -1 0 if i = j and i=p{i), if i =j and ij^p{i), if i^j and j=p{i), elsewhere, and the vectors a :- (ai,... ,a,)^, s :^ equation system Pa = s. (SI,PX,-• • ,Sr,p.r and solve the linear Output: a =(ai,...,ar)'^. Analogously we obtain the algorithm for the endpoints 0^: Algorithm 3.2 Input: gij, i,j = l,...,r. i^) Let p:={pi,...,p^) be a vector of length r. For V from 1 to r do /3^ := g^^^; p^ := u enddo. (2) For V from I to r do For j from I to r do ifg.d > 2/3^ - Pj then (3^ := (g^j + (3j)/2; p^ := j endif enddo enddo. (3) Repeat step (2) as long as the vectorp = {pi,...,p^) changes. (4) Form the (r x r)-coefficient matrix P as defined in Algorithm 3.1, and the vectors '■— {pi,---,Pr) , 9 •■= {9i,pi,---,gr,pr)'^ and solve the linear equation system Pb = g. Output: b:={f3,,...,p^)T, 381 Gerlind Plonka 382 Proof: The refinement equation (1.1) implies for each component (l)„ that fceKj=i In particular, it follows from the local hnear independence, that for all k with A^j{k) i^ 0, gsupp<?ij(2--fe) Cgsuppi?^^, I/, j = l,...,r, that is [(aj +fc)/2, (/3i + fe)/2] C [a,, /3,]. Using the numbers s^j and g,-/defined above, we obtain (a,- + s.,i)/2 > a, and (/?,- + 5.j)/2 < /3., or equivalent^, 2a^ - aj < s^j and 2/?^ - /3j > g^j (3-1) for all 1/ j = 1,... ,r. In particular, for each fixed v at least one of the r inequalities in (3.1) for the starting points (and for the endpoints, respectively) must be an equality. Let us look to the first algorithm computing the starting points, the second works analogously. In the first step of the algorithm we just put a^ := s^,^. These s,.,^ are upper bounds of the true starting points of </., since, for j = u, (3.1) implies a, < s^,.. Hence it is clear that, if 2a, - aj is greater than s,j for a fixed u and some jE {1,. •., r) then a, must be reduced since a,- is already an upper bound for the starting point of </.,•. Putting now a. := (s.j + aj)/2 in step 2, we obtain again an upper bound of a. Repeating the second step of the algorithm we obtain decreasing sequences for a^ (being dyadic rationals, and) approaching the exact starting values. However, if the exact starting values are not dyadic rationals then they cannot be obtained by a finite number of repetitions of step 2. That's why we consider the vector p which stores for each u an index j = p. 
for which the inequality in (3.1) is even an equality. Then step 2 must only be repeated a few times in order to find the correct vector p. Now, we can use the r equalities 2a„ — a p^ 'F,P^ in order to compute a, directly. By a suitable rearranging of the equations one obtains an (r x r)-coefRcient matrix / Pi 0 0 Pa 0 0 0 \ (3.2) P:= PK R V 0 D j where P(, / = 1,..., K, are circulant matrices of the form / 2 0 \-l -1 ... 2-1 0 0 \ -1 2 / Locally linearly independent function vectors 383 D is a diagonal matrix with diagonal elements 1 or 2, and i? is a matrix of dimension dim D X {r — dimD), with one nonvanishing entry in each row at most. For example, in the case p = (1,2,..., r), P is just the (r x r)-identity matrix, i.e., dimD = r and the matrices Pi and R do not occur in P. For p = (2,3,..., r, 1) we find P = Pi and D as well as i? vanish. If p contains smaller 'cycles' of the form {jpm,- ■ ■ ^Pn^) withp„^. = rij+i, j = 1,..., /i -1 and p„^ = ni, then each cycle corresponds to a circulant matrix P; in P. Since the circulant matrices Pi are invertible, the equation system is uniquely solvable. D Example 3.3 Let r = A and let the values Sj,j, i, j = 1,2,3,4 be given by the matrix 1,3 > / (si,: 2 1 1 0 0\ 3 1 1/ Algorithm 3.1 gives step 1: a^ = (01,02,03,0:4) = (1,1,1,1) and p = (1,2,3,4) step 2: a^ = (oi, 02,03,04) = (1/2,3/4,3/4,3/8) and p = (4,1,1,2) step 3: one repetition of step 2: a^ = (oi, 02,03,04) = (3/16,19/32,19/32,19/64) and p = (4,1,1,2) Since p did not change no further repetition of step 2 is necessary, step 4: We obtain 0 0 -1 \ ( 2 -1 2 0 0 P= -1 0 2 0 \ 0 -1 0 2 / which can be simply changed into a matrix of the form (3.2) by rearranging the equations for the vector 0! = (01,04,02,03)-^. The system Pa = s with s = (0,1,1,0)-^ gives a = (1/7,4/7,4/7,2/7)^. Remark 3.4 In [10] it has been shown that for locally linearly independent refinable function vectors $ = {(pi,... ,(j)r)'^ the starting points and the endpoints of gsupp<j)^, u = 1,..., r, are rational numbers of the form k + Cr, where k £2Z and Cr S Jr with (2' - 1)2'--' ■ 4 ),...,(2'-l)2'-'-l|. ' = 1, ...,r, fc = 0. Function vectors with holes In contrast with the scalar case, where a locally linearly independent refinable function cannot have a hole, for function vectors this need no longer to be true. Example 4.1 Let $ = {4>i, (^2)^ satisfy 2/9 1/3 +■■" 0 0 $(2a;) + $(2a; - 7). 1/3 1 f) ^2x - 1) + /2/3 Vl/3 $(2a; - 2) Gerlind Plonka 384 FIG. 1. Locally linearly independent function vector $ = {4>i,<j>2V with a hole. Hence AQ and ^i in (2.1) are (14 x 14)-matrices. The function vector $ is uniquely determined by the refinement equation (up to multiplication by a constant). Further, gsupp(/ii = [0, 3] and gsupp02 = [0, 5], and (f)^ possesses a hole of length 1, namely (t>2{x) = 0 for a; G (5/2, 7/2) (cf. Figure 1). As we shall show in Section 6, # is continuous and locally linearly independent. Further, one can simply find function vectors # with infinitely many holes (but not being locally linearly independent). Example 4.2 Let $ = (01,^2)^ with <t>i{x) =\<}>i{2x) + <Ai(2a; - 1) + i</.i(2.T - 2), FIG. 4>2{x) = -4>2{2x) + 4>i{2x-A). 2. Function vector $ = {(f>i,^2)'^ with infinitely many holes. Here Ao, Ai in (2.1) are (8 x 8)-matrices. Observe that 0i is just the hat fimction with supp^i = [0,2] and 02 is a fractal function with gsupp(/)2 = [0, 3], formed by infinitely many 'hats' of support length 2"^, j = 0,1,..., and with infinitely many holes of the form 2~''(3/2,2), j = 0,1,... (cf. Figure 2). 
Of course, this function vector is not locally hnearly independent, since 0i is refinable by itself (see also the proof of Theorem 4.3). We want to consider the support properties of function vectors $ more closely, and investigate, in which cases the components of # can have holes. Locally linearly independent function vectors 385 In the remaining part of the paper, we only investigate the case r = 2, i.e., $ = Theorem 4.3 Let $ = (</>i,^2)^ be a refinable, locally linearly independent vector of compactly supported, continuous functions with gsupp ^i;^ = [a^,, (3i,] and let l^, — Pu — oiv, 1^ — 1,2, be the lengths of the global supports with li < l^- Suppose that $ contains holes. Then we have (1) The support lengths satisfy 1^12 <l\<l2(2) There exist compactly supported, continuous functions /i, /g such that (f>2 = /1 + /2 and the vector {4)1, fI, f2)'^ is refinable. Proof: Since $ contains holes, there exists an open interval I = (71, 72) of greatest length and a t- € {1,2} with I C gsupp (jiy, where (j)i, vanishes on I. If there are several intervals of greatest length (biggest holes) we just choose one of them. Refinability implies for a; € / (t>^{x) = Q = '^A^^i{k)(i}i{2x-k) + A^^2{k)4>2{2x-k). k Since $ is locally linearly independent, it follows that Au^i{k) Au,2{k) = = 0 0 for for supp<?ii(2--fe)n/7^0, supp<^2(2--A;)n/#0. The choice of I as the greatest interval now implies that we can replace supp ()f)y by gsupp(/>!y, such that Av,i{k) Aufilk) =0 =0 for for 271-/3i < fc < 272-ai, 2-^1- p2<k< 272 - a2. (4.1) Let now /i := <t)vX[a^,'yi] and /2 := ^i/X[72,/3„]! where X[a,b] denotes the characteristic function of the interval [a, b]. Then 4>v = /i + h and from refinability and from (4.1) it follows that /iW = Yl A^Ak)M'^x-k) + fe<27i-/3i f2{x) = Yl k>2f2—ai Yl A^Mk)M^x~k), fc<27i-/32 A^Ak)M^x-k)+ Y A.,2{k)M^^--k). fc>272 —02 If the hole / were in (j6i, then at least one of the two functions /i, /2 would have a global support length less than li/2 and hence would vanish since gsupp(;ii(2 ■ -k) and gsupp^i(2 • —k) have a length > li/2. Thus the hole must be in (/>2, i.e., <f>2 = fi + /2For li = I2 we obtain a contradiction, since, with the same argument as before, one of the two functions /i, /2 vanishes. Hence Z2 > ^i- In this case (0i, /i, /2)^ is obviously a refinable vector of continuous functions. It remains to show that I2/2 > li leads to a contradiction. For ^2/2 > ^i, <f>i must be refinableby itself, since gsupp^2(2-—fc) cannot be contained in gsupp (/)i for some fc € 7L. In particular, from local linear independence we know that then [ai,^i] is an integer interval and that 0i has no holes. Further, since at least one of the two functions /i,/2 386 Gerlind Plonka has a global support length less than I2/2, it follows that this function is representable by (?!)i(2 • -k), k e7L, only. Without loss of generality let f,{x)= Y. A2,iik)M^^-k), xeJR. (4.2) fc<27i-/3i : Considering *i = ((?l)i(- + fc))fiJ^\ local linear independence implies that the space V'l = span{*i(x) : a; 6 [0,1)} has full dimension li. Further, we consider (Here, for x € IR, [x\ denotes the greatest integer less than or equal to x and \x] denotes the smallest integer greater that or equal to x.) Now, choosing a matrix M of basis vectors of the space V = span {*(x) : x e[0,1)}, then, because of (4.2), the rows of M corresponding to /i depend on the first h rows (corresponding to 4>i). However, not all /i-rows can be zero rows since /i is not a zero function. 
But this contradicts the local linear independence condition by Theorem 2.1. □ Corollary 4.4 Let # = {(t)\,4>2Y ^^ 0, refinable, locally linearly independent vector of compactly supported, continuous functions with gsuppcf),^ = [a^, /J^,] and l^ = f3^ - a^, u = 1,1. Suppose that h < l2- Then we have: Ifh = h or li < I2/2 then (pi, 4>2 do not possess holes. Lemma 4.5 Let i> = {<t>i,(p2)'^ be a refinable, locally linearly independent vector of compactly supported, continuous functions. Then i> has no holes that start or end with an integer. Proof: Suppose, $ has a hole which ends with an integer. Choose a hole (71,72) of this type with biggest length. Without loss of generality assume that this hole is in (j)2. Then, at least in a small right neighborhood of 0, 4>2{- + 72) is representable only by (/>!(2 • +ai) and (j)2{2 ■ +a2). Recall from [10] that the supports gsupp<)!>i = [ai, /3i], gsupp02 = [02, 1^2] satisfy Q^ = fc + C2, /3^ = / + C2, fc,/GK,C2e {0,1/2,1/3,2/3}. Now, if both, ai and 02 are integers, then ^i(x + ai), (j)2{x + Oi2), <}>2{x-\-^2) are linearly dependent in some suitable interval x G [0,e), e > 0, since they can be represented by the two functions 0i(2x + ai), 1^2(2.^ + 02). This is a contradiction to the local Hnear independence. If only one a^, v e {1,2} is an integer, then (f)^{x + a^) and ^2(2; + 72) are representable only by (/>^(2x + a^) in some interval x G [0, e) as before and we again obtain a contradiction. If neither ai nor 02 are integers, then (/)2(a; + 72) cannot be represented by integer translates of ({>u{2x), u = 1,2, contradicting the refinability. Analogously, the contradiction follows for holes starting with an integer. □ Let us call a hole (71,72) in # biggest hole if there is no other hole in $ of double size of the form (271 + k, 272 + k) with some k &7L. Lemma 4.6 Let $ = (i?!>i,</>2)^ he a refinable, locally linearly independent vector of compactly supported, continuous functions. Then there is at most one biggest hole in #. Locally linearly independent function vectors 387 Proof: Assume that $ has two biggest holes. Let again h, I2 denote the lengths of the global supports of ^1, (^2 and suppose that h < l2- Then ^1 cannot have a biggest hole by Theorem 4.3. Hence the two holes must be in ^2 and we get a partition (j)2 ■— /i +/2 + /s analogously as in the proof of Theorem 4.3 such that (gsupp /i)U(gsupp /2)U(gsupp /s) C gsupp(?l)2. Further, by refinability, each function /i, /2,/3 can be represented by </>i(2 • -k), ^2(2 • —k), fc G ZZ. Moreover, at least one of the three functions /i, /2, /s must contain a translate of </>2(2-), otherwise at least two of the functions /i, /2, /a would be linearly dependent in a suitable interval inside the starting intervals, since (/>i(2 ■ —fc) either starts at Z + ai/2 or at ZZ + (ai +1)/2 (depending on whether k is even or odd). Hence gsupp(/12 > (gsupp02)/2 + 2(gsupp0i)/2. But this contradicts Corollary 4.4. □ Remark 4.7 All results in this section can be generalized tor >2 and to L^-integrable functions, if the characterization of local linear independence in [2] is used. 5 Rank conditions for matrices formed by the refinement mask We again restrict ourselves to the case that $ = {(j>i,(j)2Y' is a vector of compactly supported, continuous functions satisfying the refinement equation (1.1) with A{k) = 0 for A; < 0 and k> N. Let us consider the matrices ^0 and Ai in (2.1) and the minimal common invariant subspace V of {^0, Ai} generated by VQ as defined in Section 2. Recall that V contains *(a;), X e [0, 1). 
Let M be an {rN x dim y)-matrix such that the columns of M form a basis of V. Now delete all components in the vector * = i^{x + k))^~Q corresponding to zero rows in M in order to get *. Further, delete the corresponding rows and columns in the matrices ^0 and Ai in (2.1) in order to obtain Ao and ^1 with ^{x/2)=Ao^{x), *((a; + l)/2)=^i*(x), a;G[0,l]. (5.1) Deleting the zero rows and the corresponding columns in M we obtain M. Example 5.1 Let us consider Example 4.1. Here * is a vector of length 14 and V = span{$(a; + fc)|^Q : x e [0,1)}. Since supp^ii = [0, 3] and supp(?!)2 C [0, 5], it follows that the rows oiM corresponding to ^i{x + j), j = 3,4,5,6, and (j)2{x+j), j = 5,6 are zero rows. Indeed, these are all zero rows of A^, i.e., V has dimension 8. We delete these components of $(a;) and obtain ^{x) = (Mx), Mx), Mx + 1), Mx + 1), Mx + 2), M^ + 2), Mx + 3), <A2(a; + 4))^ as well as 9A /I 2 0 0 0 0 0 0\ /3 3 1 2 0 0 0 0\ 33000000 90330000 6 0 3 3 12 0 0 0 0 603320 3090 3 300 0 0309030 , 9ii = 00006032 0 0 0 0 0 0 0 3 000 03003 0 0000000 00000000 9 0 0 0 0 0 0 0 \0 0 9 0 0 0 0 0/ \0 0 0 0 9 0 0 0/ 388 Gerlind Plonka Let us call a row of ^o (resp. ^i) 0i-row if it corresponds to an 0i-entry in * and (/)2-row if it correspond to an 02-entry. Let n be the length of the new vector * and hence ^o, Ai are (n x n)-matrices. If $ is a locally linearly independent vector then Theorem 2.1 implies that M is an invertlble (n X n)-matrix. Deleting the first ^i-row and the first (^2-row and the corresponding columns in .4o, we obtain a new matrix B of dimension (n - 2) x (n - 2). The same matrix B is obtained, if we delete the last 0i-row and the last (?!>2-row and corresponding columns in Ai. Further, the structure of ^0) «4i implies that spec ^0 = spec Jo U spec B, spec .4i = spec Ji U spec B, where JQ (resp. Ji) is a 2 x 2-matrix containing the entries of Ao (resp. ^i) being at the same time in the first 0i- or 02-row (resp. last ^i- or (j>2-raw) and in the first 0i- or 02-column (resp. last 0i- or (/)2-column). (Here spec A denotes the set of eigenvalues of a matrix A.) Example 5.2 For $ = (0i, 02)^ in Example 5.1 we obtain the matrix B after deleting the first and second row and corresponding columns in AQ or by deleting the 5th and 8th row and corresponding columns in ^i. Hence, /3 9 0 B='0 9 0 3 0 0 0 0 1 3 6 3 0 2 3 0 0 0 0 0 3 0 0 0\ 0 2 3 0 ■^°"9 (3 3)' -^^"9 (9 0) Vg 0 0 0 0 o/ where Ji and J2 are invertible. We obtain Theorem 5.3 Let $ = (<l>i,<}>2)'^ be a refinable, locally linearly independent vector of compactly supported, continuous functions. Further, let Ao, Ai and B be given as above. Then we have (1) rank(Jo) > 1 and rank(Ji) > 1, (2),rank(S) >n-3, (3) rank(ylo) > n - 2 and rank(^i) > n - 2, (4) |rank(A) - rank(.Ai)| < 1. Proof: (1) First observe that JQ and Ji at lea-st have rank 1, otherwise a component of *(x), X G [0,1) would completely vanish, contradicting the definition of *. Let gsupp</)i = [ai, /3i] and gsupp(/>2 = [02, ft]- Then, one simple eigenvalue zero in Jo implies that QI G K, a2 G K + 1/2 or vice versa. If Jo has two eigenvalues 0 then the geometric multiplicity of 0 must be 1 and we obtain ai €71 + 1/3, Q2 S Z + 2/3 or vice versa. Analogously, a corresponding behavior of Ji implies ft G Z + 1/2, ft G K or vice versa, and /3i G Z5 + 2/3, ft G IZ + 1/3 or vice versa, respectively. Locally linearly independent function vectors 389 (2) If the matrix B possesses the eigenvalue zero, then both, Ao and Ai possess the eigenvalue zero. Hence, AQM. and AiM. 
are not invertible, while M is an invertible matrix. Thus, by Theorem 2.1, „4o and Ai have a zero row, but being not the first or last 4>i- or (;&2-row. Hence, also B has a zero row and, by construction, if ^o has the zero row in the ^th 0j-row, i G {1,2}, then Ai must have a zero row in the {I — 1)th (j)i-raw. This means by (5.1), the two zero rows imply a hole in $ containing the interval {k — 1/2, k + 112), for some k € TL. This hole must be a biggest hole. If B has the eigenvalue zero with geometric multiplicity greater than 1, then with the same arguments one obtains a second biggest hole in $. But this contradicts the local linear independence by Lemma 4.6. Hence rank(S) > n — 3.. (3) The above considerations directly imply that rank(^o) >"■-2 and rank(^i) > n-2. _ _ . (4) Now, if ^0 has rank n — 2, then B has rank n — 3 and hence Ai can have rank n — 1 at most. Analogously, rank(^i) = n — 2 implies rank(^o) <n—l. D Prom Theorem 5.3 it follows that we have to investigate the following five cases: (1) rank(.4o) = rank(.Ai) = n, (2) rank(J^o) = rank(J^i) = n - 1, (3) rank(>^o) = rank(^i) = n - 2, (4) rank(^o) = n — 1, rank(^i) = n, (5) rank(^o) = ^ ~ 1) rank(^i) := n — 2. All further cases can be reduced to one of the above. However, some of these cases may contradict the local linear independence assumption for $. Considering the first two cases, we obtain a partial answer to the question of whether the support of c^j, i = 1,2, can have holes. Moreover, we obtain sufficient conditions for the local linear independence of $ in terms of rank conditions iox AQ, AiFor the first case we obtain: Theorem 5.4 Let $ = [4)1, ^2)^ be a refinable vector of compactly supported, continuous functions. Let the space V = span{^{x) : x £ [0,1)} have full dimension, i.e. M, formed by basis vectors of V is an invertible (n x n)-matrix. Let Ao, Ai be given as above. Then rank(.4o) = rank(.Ai) = n implies that $ is locally linearly independent and has no holes. Proof: The assertion on local linear independence is already proved in [4], Theorem 3.2. Since ^O; -^i are invertible, the matrix A^^ ■ ■ -Ae^M. never has a zero row, hence from _ ^{^ + ---+'^ + §^)=-^^^---^^r.Hx), a:G[0,l), - (5.2) it follows that there is no dyadic interval where (/)i or ^2 vanishes. Thus $ has no holes. o For the second case we find Theorem 5.5 Let $ = (0i,^2)'^ be a refinable vector of compactly supported, continuous functions. Let the space V = span{^{x) : x E [0,1)} have full dimension, i.e. M, 390 Gerlind Plonka formed by basis vectors ofV is an invertible (n x n)-matrix. Let Ao, Ai and B be given as above. Further, let Ta,nk{Ao) = rank(^i) = n — 1 and each of these matrices has one zero row. Then we have (1) // rank(B) = n - 2 and the four matrices AQAO, ^o-^i, AiAo^ AiAi have rank n — 1, then $ is locally linearly independent and has no holes. (2) //rank(B) = n-3 and the four matrices AOAQ, AQAI, AIAQ, AiAi haverankn-1, then $ is locally linearly independent and has one hole of the form {k -1/2, fc+1/2) for some k £71. Proof: (1) We consider the first case. Since rank(B) = n - 2, it follows that B is invertible and the zero row of Ao must be the first ^i-row or the first ^2-row. Analogously, the zero row of Ai must be the last (f>i- or (j!)2-row. 
Since rank(^o-4o) = Ta.nk{AiAi) = n - 1, it follows that Jo and Ji only have a simple eigenvalue^zerj5^and the assumptions (1) of the theorem imply that all matrix products A^ ■■■Ac„M, n 6 IN, have rank n - 1 and one zero row, namely the same as ^o if ei = 0 and the same as ^i if ej = 1. The assumption on V in the theorem already ensures that $ is linearly independent on (0,1). Now the above observations also imply that, by Theorem 2.1, * is locally linearly independent. _ The zero row in ^o implies that the support of one component of $ starts with an integer and the support of the other with a half integer. Considering the zero row in Ai we also find that the support of one component ends with an integer and the support of the other with a half integer. In particular, from (5.2) it follows that $ cannot have holes. (2) We consider the second case. Since rank(B) = n - 3, it follows that B possesses the eigenvalue zero and the zero rows of Ao and Ai are not the first or the last ^i- or (f)2rows. Moreover, as shown in the proof of Theorem 5.3, if the l-th (f>i-TOW, i € {1,2}, of ^0 is a zero row then the {I - l)-th ^j-row of Ai is also a zero row, and this implies by (5.1) a hole of the form (fc - 1/2, k + 1/2) for some k € 7L m (/i^. Further, the rank conditions (2) of the theorem imply that all matrix products Aci • • • A^^M, n 6 IN, have rank n - 1 and either a zero row in the l-th or in the (/ - l)-th row. Thus, by Theorem 2.1, # is locally linearly independent and has only one hole. □ Remark 5.6 Example 4-1 satisfies the assumptions of Theorem 5.5 (2). An example satisfying Theorem 5.5 (1) can be found in [10]. Observe that the case (2) is not completely settled by Theorem 5.5 since for rank(Ao) = rank(^i) = n - 1 some of the four matrices ^o-^o, AQAI, AIAQ, AiAi can also have rank n - 2. Indeed, there exist locally linearly independent function vectors, where rank(.4o) = rank(^i) = n - 1 and rank(«4o^o) = Tank{AiAi) = n - 2, see [10]. The remaining cases are more complicated to handle and we cannot give a final answer to the question of whether a locally linearly independent refinable vector i> can have more than one hole. 6 Proof of the example In this section we want to verify the assertion that the function vector # given by the refinement mask in Example 4.1 is continuous and locally linearly independent. Let us Locally linearly independent function vectors first prove that $ is continuous. To this end we use the following observation by Jia, Riemenschneider and Zhou [9]: Let {-A{k)}^_Q be a real refinement mask satisfying the following properties: (1) I J2k=o -^i^) ^^ o^® eigenvalue 1 and all further eigenvalues are inside the unit circle. (2) The matrices ^o and Ai both have the simple eigenvalue 1 and there is a vector ei 6 IR^''with ef ^0 = ef ^1 = ef. (3) Considering the space U = {u e IR'"^ : efu = 0} the joint spectral radius of Ao\u and ^i|!7 satisfies p(>^o|c/^o|c/) < 1Then the subdivision scheme associated with {A{k)}^^Q converges in the maximum norm, and hence the solution vector $ of the refinement equation is continuous. Here the joint spectral radius satisfies for any matrix norm P{AQ\UAO\U)= inf(max{||^eilc/---A.|i7|| : ei e {0,1}, i = 1,... ,n})^/". TI>1 For our example we find: 7 ^^ 2Z_/^(^) ~ ( 4/Q 1 /f! ) possesses the eigenvalues 1 and -5/18. 2) The matrices ^0 and ^i both have the simple eigenvalue 1 with the left eigenvector el^ = (3,1,3,1,3,1,3,1,3,1,3,1,3,1). 
3) The space U = {u £ IR^^ : ej u = 0} has dimension 13 and we find the orthonormal basis oiU: ui U2 = =- 28-1/2(4,0,0,0,-3,-1,0,0,0,-1,0,0,0,-1)^, 110-1/2(0,0,0,0,-3,-1,0,0,0,0,0,0,0,10)^, Us U4 = = 130-1/2 (-3,0,0,0,-3,-1,-3,0,0,-1,0,0,10,-1)^, 132-1/2(0,0,0,0,-3,-1,0,0,0,11,0,0,0,-1)^,; ns U6 = = 70-1/2 (-3,0,0,0,-3,-1,7,0,0,-1,0,0,0,-1)^, 208-1/2 (-3,0,0,0,-3,-1,-3,0,0,-1,13,0,-3,-1)^, UY Us ug = = = 3540-1/2 (-3,-1,-3,0,-3,-1,-3,59,0,-1,-3,-1,-3,-1)^, 3660-1/2 (-3,-1,-3,60,-3,-1,-3,-1,0,-1,-3,-1,-3,-1)^, 2352-1/2 (-3,48,0,0,-3,-1,-3,0,0,-1,-3,0,-3,-1)^, uio = 3422-1/2 (-3,-1,-3,0,-3,-1,-3,0,0,-1,-3,58,-3,-1)^, un ui2 ui3 = = = 4270-1/2 (-9,-3,-9,-3,-9,-3,-9,-3,61,-3,-9,-3,-9,-3)^, 10-1/2(0,0,0,0,1,-3,0,0,0,0,0,0,0,0)^, 2842-1/2 (-9, -3,49,0, -9, -3, -9,0,0, -3, -9,0, -9, -3)^. The matrix representations of ^o|i7, ■Ailu under this basis are Ao\u = {{Ao Uj)'^ Uk)]\^i and Ai\u = ((-^i Wj)^Ufc)]^j._i, and a computation with Maple gives for the spectral norm (max{||Ailc;v4,,|i7A3lc/||2 : ei,e2,e3 G {0, 1}})1/^ <0.95. 391 392 Gerlind Plonka Hence $ is continuous. Let us prove the local linear independence of #. Here we use Theorem 2.1 and a procedure proposed by Goodman, Jia and Zhou [4]. The space V C IR" (as given in Section 2) is spanned by the vector VQ = (0,0,9/5,38/15,6/5,1,0,0,0,9/5,0,0,0,0)^ and by AIVQ, AQAIVO, AIAIVQ, AQAOAIVQ, AIAQAIVQ, AQAOAQAIVQ, AIAOAOAIVQHere VQ is a right eigenvector of Ao to the eigenvalue 1. Hence dim V = 8. Forming the matrix M, we observe that the 7-th, the 9-th and the last four rows of M are zero rows. Hence gsupp(?!>i = [0,3] and gsuppi^a = [0,5]. The remaining 8 rows of M are linearly independent. Thus $ is linearly independent on (0,1) by Theorem 2.1. We can restrict our considerations to the shortened matrices ^o, Ai as given in Example 5.1. Further, we can choose the matrix M as the identity matrix. The procedure proposed in [4] gives rank^o = rank^oA = rank^o^i = 7 and the 7-th rows are zero; rank^i = rank^^i = rank^i^o = 7 and the 6-th rows are zero. Hence, $ is locally linearly independent. Moreover, 02 possesses a hole of length 1, namely M^) = 0 for x G (5/2, 7/2). 7 Conclusions In Section 3 we have presented an algorithm to compute the global supports of the r components of a compactly supported refinable function vector $ from the refinement mask. The rest of the paper was restricted to f = 2. While for the scalar case local linear independence of a refinable function (f> guarantees that the support of 0 is an integer interval without holes, this is not longer the case for r > 1. As we have seen in Section 4, a function vector $ = {(pi, 4>2)'^ can only have holes , if the lengths h and h of the global supports of </>], 02 satisfy ^2/2 < ^i < h- As another property, it has been shown that the endpoints of a hole cannot be integers. Further, $ can have at most one biggest hole. In Section 5 we have investigated matrices derived from the refinement mask. In Theorem 5.3 some results on the rank of these matrices are obtained leaving five different cases to be investigated. The first case has been solved completely in Theorem 5.4. The second case has been settled partiallyjn Theorem 5.5. For the other cases we cannot give a final answer. However, if A and Ai have different rank (as in case (4) and case (5)) then one can show by Theorem 2.1 that $ must have infinitely many holes. In case (4) this can be seen as follows. Since rank(A) = n-1 it follows that rank(^5"A) -n-l for A; = 0,1,.... 
Hence, by Theorem 2.1, AIAQ has a zero row for all fc = 0,1,... implying that * contains vanishing intervals of the form [h + (2*^ - l)/2^ h + (2*^ -1/2)/2*0 with suitable integers h- Here Ik cannot be the same integer for all A; = 0,1,2,..., in particular one finds Ik i^ h+i, fc e IN. Hence * has infinitely many holes. This observation leads to the following Conjecture 7.1 Let $ = (01,02)^ be a refinable, locally linearly independent vector of compactly supported, continuous functions. Then # cannot have more than one but finitely many holes. Our numerical computations however lead to the hypothesis that the cases (3), (4) and (5) contradict the property of local linear independence. So we obtain Locally linearly independent function vectors Conjecture 7.2 Let $ = {4>i,(l>2)'^ &e a refinable, locally linearly independent vector of compactly supported, continuous functions. Then $ cannot have infinitely many holes. Acknowledgment The author thanks the referees for their valuable suggestions to improve the paper. Bibliography 1. C. de Boor, R. A. DeVore, and A. Ron, Approximation orders of FSI spaces in L2{M'^), Constr. Approx. 14 {1998), 411-427. 2. H. L. Cheung, C. Tang, and D.-X. Zhou, Supports of locally linearly independent M-refinable functions, attractors of iterated function systems and tilings, preprint, 2001. 3. W. Dahmen and C. A. Micchelli, Biorthogonal wavelet expansions, Constr. Approx. 13 (1997), 293-328. 4. T. N. T. Goodman, R. Q. Jia, and D.-X. Zhou, Local linear independence of refinable function vectors of functions, Proc. R. Soc. Edinb. 130 (2000), 813-826. 5. T. A. Hogan, Stability and independence of the shifts of finitely many refinable functions, J. Fourier Anal. Appl. 3 (1997), 757-774. 6. K. Jetter and G. Plonka, A survey on L2-Approximation order from shift-invariant spaces, in Multivariate Approximation and Applications, N. Dyn, D. Leviatan, D. Levin, and A. Pinkus (eds.), Cambridge University Press, 2001, 73-111. 7. R. Q. Jia, Shift-invariant spaces on the real line, Proc. Amer. Math. Soc. 125 (1997) 785-793. 8. R. Q. Jia and C. A. MicchelH, On linear independence of integer translates of a finite number of functions, Proc. Edinburgh Math. Soc. 36 (1992), 69-85. 9. R. Q. Jia, S. D. Riemenschneider and D.-X. Zhou, Vector subdivision schemes and multiple wavelets, Maift. Com^. 67 (1998), 1533-1563. 10. G. Plonka and D.-X. Zhou, Properties of locally linearly independent refinable function vectors, preprint, 2001. 11. A. Ron and Z. Shen, The sobolev regularity of refinable functions, J. Approx. Theory 106 (2000), 185-225. 12. Q. Y. Sun, Two-scale difference equation: local and global linear independence, manuscript, 1991. 13. J.-Z. Wang, Linear independence relations of the shifts of a vector-valued distribution, manuscript, 2001. . 393 The correlation between the convergence of subdivision processes and solvabiUty of refinement equations Vladimir Protasov Department of Mechanics and Mathematics, Moscow State University, Moscow. protasov@dionis.iasnet.ru Abstract We consider the univariate two-scale refinement equation (p{x) = J2k=o^>''P('^^ ~ '^)' where Co,-- ■ ,CN are complex values and J^cjt = 2. This paper analyses the correlation between the existence of smooth compactly supported solutions of this equation and the convergence of the corresponding cascade algorithm/subdivision scheme. In the work [11] we have introduced a criterion that expresses this correlation in terms of the mask of the equation. 
It is shown that the convergence of subdivision scheme depends on values that the mask takes at the points of its generalized cycles. In this paper we show that the criterion is sharp in the sense that an arbitrary generalized cycle causes the divergence of a suitable subdivision scheme. To do this we construct a general method to produce divergent subdivision schemes having smooth refinable functions. The criterion therefore establishes a complete classification of divergent subdivision schemes. 1 Introduction Refinement equations have been studied by many authors in great detail in connection with their role in the theory of wavelets and of subdivision schemes in approximation theory and design of curves and surfaces (see [1^14]). In this paper we study a criterion of convergence of subdivision processes having smooth refinable functions. This criterion was presented in the work [11]. In particular we show that the criterion is sharp in the sense that each if its cases is realized. To do this we provide a general procedure for constructing divergent subdivision schemes (or cascade algorithms) corresponding to smooth refinable functions. We restrict ourselves to univariate equations with a compactly supported mask. Through the paper we denote by T = R/27rZ the unit circle, by H the space of entire functions on C, by C' the space of I times continuously differentiable functions on K, by C^ = C the space of continuous functions, by CQ the space of compactly supported functions from C', and by CQ the space of compactly supported continuous functions on R. A sequence {fk} converges to zero in CQ if it converges to zero in C' and the supports of /fc, k eN are uniformly bounded. Consider a refinement equation N <p{x) = Y^Ck^{2x-k), fc=0 394 (1.1) Subdivision processes and refinement equations 395 where Ck £ C,Y.k^k = 2. The trigonometric polynomial m(^) = lJ2k=o'^ke~''''^ is the mask of this equation. It is well known that a Co-solution of this equation {refinable function), if it exists at all, is unique up to normalization and has its support on the segment [Q,N]. For a given mask m we denote by [m] the corresponding refinement equation. Let us also define the following subspaces of the space Co: In other words the Fourier transform of a function from M'' has zeros of order > / + 1 at all the points 27rfc, /c G Z. The Fourier transform of a function from £' has zero at the point ^ = 0 and has zeros of order > Z + 1 at all the points 2-?Tk, A; e Z \ {0}. Let us also denote £ = £° = M°. The cascade algorithm for refinement equations is the construction of the sequence fn = Tfn-1 for some initial function /o € Co, where Tf{x) = I]fcCfe/(2a; - k) is the subdivision operator associated to equation (1.1). This operator is defined on the space Co and preserves all the subspaces C', £'. If /„ converges in the space C^ to a function (f GCQ {l> 0), then obviously it converges in CQ and (p is the solution of (1.1). Moreover, in that case the function g = f^-ip necessarily belongs to £} (see [1], [5]). Thus we say that the cascade algorithm converges in C' liT'^g —> 0, n —» oo for any g E CK Properties of the cascade algorithms have been studied by many authors in various contexts. This algorithm gives a simple way for approximation of refinable functions and wavelets. On the other hand the convergence of the cascade algorithm is equivalent to the convergence of the corresponding subdivision scheme ([4]). 
For a given mask m(^) we say that the subdivision process {m} converges in C' if the corresponding cascade algorithm or the corresponding subdivision scheme converges in that space. It is clear that if a subdivision process converges in C', then the corresponding refinement equation has a Cg-solution. In general the converse is not true, corresponding examples are well-known (see [1], [2], [13] for general discussions of this aspect). A natural question arises; under which extra conditions the solvability of a refinement equation implies the convergence of the subdivision process? 1) A necessary condition (first introduced in [6]): // a subdivision process {m} converges in C', then its mask can be factored as MC) = (i^) a{0 (1.2) for some trigonometric polynomial a{(,). In particular the condition m{0={ 1—)a{0 ^Z1^2fe = 5^C2fc+i = l k ' (1.3) k is necessary for the convergence of the subdivision process in C. Let us remember that for the existence of smooth solutions of refinement equation this condition is not necessary (there is a weaker condition for this, see [10]). For a given mask m denote by l(m) the maximal integer I such that condition (1.2) is satisfied. So if a subdivision process {m} converges in C'^, then k < l(m). 2) A sufficient condition (introduced in [1], developed in [8],[14],[7],[9]): 396 Vladimir Protasov Suppose a mask m satisfying 1.2 for some l>0 has neither symmetric roots nor cycles; then if the equation [m] has a C'^-solution, then the process {m} converges in Ci. Let us recall the notation used in this statement. If, for a trigonometric polynomial p(0 and for some a e T, we have p{a/2) = piir + a/2) = 0, then {a/2, rr + a/2} is a pair of symmetric roots for p{^). In order to be defined we set that for any a 6 T the element a/2 G T has the corresponding real value from the half-interval [0,7r). Further, : a given set b = {/3i, ■ • •, /?„} C T, where n > 2, is called cyclic if 2b = b, i.e., 2/3j = /3j+i for j = 1,• • • ,n (we set /3„+i = /3i). We consider only irreducible cyclic sets, for which all the elements are different. Note that if two cyclic sets do not coincide, then they are disjoint. A cyclic set b is called a cycle of a trigonometric polynomial p if p(b + TT) = 0, i.e.,p(/9 + 7r) = 0fbrall/3eb. It is well known that the sufficient condition (2) for a mask m is equivalent to the stability of the corresponding refinable function (i.e., integer translates of the refinable function possess Riesz basis property in Z,2(K)). It is also equivalent to say that the mask satisfies Cohen's criterion (see for example [5, Proposition 2.4]). Actually condition (2) was formulated for the case / = 0 only, but it can be easily extended to general I. It is seen, for instance, from Theorem 2.2 of this paper. Thus we have one necessary and one sufficient condition for the convergence of subdivision processes having smooth refinable functions. It was a natural problem to fill this gap and to elaborate a criterion in terms "if and only if". In 1998 two attempts were made independently from each other and almost simultaneously. They were the work [9] by M. Neamtu and my work [11]. Those two criteria were very similar, but different. Moreover, it turned out that our results were actually incompatible. We will discuss this aspect after formulating the main result of the work [11]. 2 A criterion for convergence We give a criterion of convergence of a subdivision process under the condition that the corresponding refinement equation has a smooth solution. 
We will see that symmetric roots of mask do not influence the convergence of subdivision processes. This means in particular that the stability of solutions is not necessary for the convergence. The convergence entirely depends on values of the mask at the points of so-called generalized cycles: Everywhere below we consider trigonometric polynomials without positive powers, i.e., polynomials of the form p(0 = EfcLo afc^-''^^. As usual we set deg p = iV (assuming floajv ^ 0). To a given value a e T we assign a binary tree denoted in the sequel by %,. To every vertex of this tree we associate a value from T as follows: put a at the root, then put a/2 and TT -I- a/2 at the vertices of the first level (the level of the vertex is the distance from this vertex to the root. The root has level 0). If a value 7 is associated to a vertex on the n-th level, then the values 7/2 and TT + 7/2 are associated to its neighbors on the (n + l)-st level. Thus there are the values ^ + ^, fc = 0, • ■ •, 2" - 1 on the n-th level of the tree 7^,. A set of vertices A of the tree % is called a minimal cut set if every infinite path (all the paths are without backtracking) starting at the root includes Subdivision processes and refinement equations exactly one element of A. For instance the one-element set A set. Every minimal cut set is finite. ^,^f";*!°" 2.1 397 {root} is a minimal cut Aset{0,,...,/?„} cT is called a generuMzed cycle of a polynomial ,f ?f '' ?f'f """"^ ^"'' ""^ ■?■ = 1' • • •'" ^/^e ^^ee ^ft+T possesses a mmimaZ c«< set Aj such that p{Aj) = 0. The family {.4i, • • • ,^„} is said to be sets of zeros of the generalized cycle b Let us remark that for a given generalized cycle the set of zeros may not be defined in a unique way. Any (regular) cycle of p(0 is also a generalized cycle, in this simplest case each mmimal cut set Aj is the root of the corresponding tree T«.+^. On the other hand, not any^generahzed^ycle is a regular cycle. For example, the polynomial p{^) = {e-^i % '/l^^ Tr ^"^ ^^ ^° regular cycles, but is has a generalized cycle b = {/3i,P2} = {27r/3,47r/3}. Indeed, this polynomial has three zeros on the period: 7r/3, -7r/6 57r/6 S T. The set Ai = {-7r/6,57r/6} is a minimal cut set for the point /3i + TT, ^2 = W/3} is a minimal cut set for /3, + -K, and p{A^) =. piA^) = 0. Roughly speaking, each cyclic set iPi, ■ • •, /On I has a unique corresponding cycle (the family of zeros is {/Jj + TT /3 + TT j) and a variety of generalized cycles (all possible sets of zeros {^1,..., A}, where X is an arbitrary mimmaJ cut set of the tree T,^^^, j = l,...,n). Note, that if at least one set ^^ differs from the root f3j + n, then it necessarily contains a pair of symmetric roots of p iheretore, if the polynomial p has no symmetric roots, then all its generalized cycles It there are any, are regular cycles. ' For any trigonometric polynomial p and any finite subset Y = fa, ■■■ a \ c T we denote p,{Y) = (U^^^ b(a,)|)V^ This is a multiplicative function'on the set of trigonometric polynomials. Now we formulate the criterion of stability of subdivision process. Theorem 2.2 Suppose a refinement equation [m] has a Ci-solution for some I > Qthen the process {m} converges in C if and only if the mask m satisfies (1.2) and for any generalized cycle b of the mask m we have p„(b) < 2-'. In particular, for I = 0, this means that a subdivision process {m}, whose refinement equation has a continuous solution, converges if and only if p„(b) < 1 for every generalized cycle b of the mask. 
Another corollary is Condition (2) from the Section 1. Indeed ^a mask has neither symmetric roots nor cycles, then it has no generalized cycles either' Hence, by Theorem 2.2, the subdivision process must converge. Example 2.3 Consider a mask m(0 = (0.2 + 0.5e-'«-F0.3e-2^«)(e-'«-e-^)2(e-2i«_ef)2 (2.1) The corresponding equation [m] has a Co-solution, this is shown in Example 4 5 The polynomial m has a unique generalized cycle b = {27r/3,47r/3}, the same as in the previous example, with the same sets of zeros ^1 = {-n/6, STT/G}, A^ = {n/3}. Actually this IS not one but two coinciding generalized cycles, if we count roots with multiplicity. We have {pm{h)y = ,27r, m{~) Air (-0.2-0.lV3i).M • (-0.2 + 0.1v/30-4e^.4e- 1.12 >1. ggg Vladimir Protasov Hence the subdivision process {m} diverges. 3 : : Statement of the problem Most examples of divergent subdivision schemes (having smooth refinable functions) are constructed for some special class of masks. These are either "unload masks of the form m(£) = p(nO for some polynomial p and an odd integer n, or, at lea.st masks whose associated matrix B = {c,,-jhmo,...M have a multiple eigenvalue 1. The divergence of such schemes is well known and does not require any special criterion. A natural question arises; whether one really needs the criterion of Theorem 2.2 to determine divergent processes? Maybe the family of generalized cycles is too wide to describe unstable subdivision schemes. In general there is no evidence that the condition pM > 1 can be combined with the existence of a smooth solution for the mask m. In this paper we are going to show that Theorem 2.2 indeed characterizes the family of unstable subdivision processes properly. We show that each generalized cycle can cause the^ divergence^of a suitable scheme. On the other hand, we will see that every converging subdivision scheme can be "spoiled" by some generalized cycle. 4 Preliminary results. Reductions of masks To construct examples of divergent processes we need some auxiliary results. The first of them establishes two properties of cyclic sets. The proof of this lemma IS an easy exercise for the reader. Lemma 4.1 a) Let b be a cyclic set and a G T. Then for the polynomials pi(0 = e-a _ e-'" and P2{0 = ^-^^^ - e'*" we have Pp,(b) = PPA^).K ^ I b) Leth^ andh, be cyclic sets andp{0 = Ylpei>^i^-'^+^"^^- ^'^^" ""' ^'""' P^^^'' = ^ if hi ^ b2, and PpCbs) = 2 i/bi = baNow turn back to the subdivision schemes. For a given integer J> 0, a mask m and a function / e C\ denote n{m,f) = -"-"-^^'"^ ^/S'^^'lTrfJ - the subdivision operator associated to m (we set log^O = -oo). The value i^K™) inf f^w n{m, f) is the degree of convergence of the process [m] m the space L . _ For every mask m we have n[m) <l + l (see [3]). Furthermore, it was shown m [3] and [2] that a process {m} converges in C if and only if i^tim) > I In particular the inequality uo{m) > 0 means that {m} converges in C. Let L be the maximal mtegei such that {m} converges in C^ (if the process {m} does not converge in C, then we nevertheless set L = 0). The value u,{m) is said to be the degree of convergence of the process {m} and denoted in the sequel by u{m). If r(mi) = u{m.2), then Ui{rm) = i^iim^) °' FOT a given refinement equation [m] denote by L{m) the maximal integer L such that the corresponding refinable function ^ belongs to C^. If this eqtiation has no cxm inuous compactly-supported solution, we set L{m) = -1. The smoothness of the refinable function V. 
i/the value s{m) = L + h, where h is the Holder exponent of the Lth derivative «^(^) on E. It is well known that a refinable function belongs to C if and only it s{m) > I (the equality s{m) = I is impossible). In particular, a refinement equation has a Co-solution if and only if s{jn) > 0. Subdivision processes and refinement equations Now we can describe the procedure of reduction of subdivision schemes introduced in [11]. This reduction makes it possible to get rid of both symmetric roots and cycles. 4.1 Eliminating of symmetric roots Let p(^) be a given trigonometric polynomial (let us remember that we consider polynomials without positive powers). Assume that p possesses a pair of symmetric roots {a/2, TT + a/2}. The transfer from p{^) to the polynomial pa{^) = ''^e-2.";_e-ia'°^ is said to be a transfer to the previous level. The inverse transfer from pa to p is a transfer to the next level. So a transfer to the previous level reduces a pair of symmetric roots {a/2, n + a/2} to the one root a. Proposition 4.2 Let a mask rh be obtained from a mask m by a transfer to the previous level. Then 3(171) = s{m). Moreover, 1/(171) = ^(m), whenever l(rh) = l(m). (The constant \(m) responsible for condition 1.2 was defined in Section 1). This imphes, in particular, that the reduced equation [rh] possesses a smooth compactly supported solution if and only if the initial equation [m] does; and the same true for the convergence of the corresponding subdivision schemes. Thus, a transfer to the next (previous) level does not change the smoothness of solutions. It also respects the rate of convergence of subdivision processes, unless this transfer does not violate condition 1.2 (a transfer to the previous level may increase the value l(m)). Using this Proposition one can consequently eliminate all symmetric roots of a given mask. 4.2 Elimination of regular cycles Let a polynomial p possess a cycle b. The transfer from p(^) to the polynomial p($) = PiO/ n/36b(^~*^ + e'"''') is called an eliminating of a cycle. Proposition 4.3 Let a mask fh he obtained from a mask m by eliminating of a cycle h. Then s(m) = s(m) and ^(m) =iLaax.{i'(in),pm(h)}. Thus the equation [m] possesses a smooth compactly supported solution if and only if the equation [rh] does. Moreover, the process {m} converges in C' if and only if the process {m} does and in addition/9TO(b) < 2~'. See [11] for the proofs of Propositions 4.2 and 4.3. Now it becomes clear how to estabhsh Theorem 2.2. First we consequently eliminate all symmetric roots. By Proposition 4.2 it does not change neither the smoothness of solution nor the rate of convergence (if the initial mask satisfied condition 1.2). Moreover, by Lemma 4.1 this process respects the constants pm(t>) for all cyclic sets b. The final mask has no symmetric roots, hence it can have only regular cycles. Then we eliminate all regular cycles (refereeing to Proposition 4.2) and obtain a mask satisfying Cohen's criterion, whose subdivision process does converge. This line of reasoning also allow us to eliminate directly all generahzed cycles as follows. 4.3 Eliminating of generalized cycles Let a polynomial p possess a generalized cycle b with corresponding sets of zeros ^1,... ,An- The transfer from p(^) to the polynomial p(^) = p(^)/ IlaeAj,j=i,...,ni^''''^ - : e"*") is called an eliminating of a generalized cycle. 399 400 Vladimir Protasov Proposition 4.4 Let a mask rh be obtained from a mask m by eliminating of a generalized cycle b. Then s{m) = s{m,) and i/(m) = max{i^(m),/9„,(b)}. 
Proof: After a suitable sequence of transfers to the previous level all the sets of zeros Ai,...,An drop to the corresponding roots /3i + TT, ...,/?„ + TT, and b becomes a regular cycle. By Lemma 4.1 this does not change the value /)m(b). Now it remains to apply Proposition 4.3. O Example 4.5 Consider again the mask m(0 from Example 2.3. After eliminating the generahzed cycle b = {^, f} we obtain the mask m(0 = 0.2 + 0.5e-'« + 0.3e-2'«. Since all the coefficients of m are positive, it follows that the equation [in] has a Cosolution and, moreover, the corresponding subdivision process {m} converges (see, for instance [1]). Now applying Proposition 4.4 we see that the initial process {m} diverges, since pm{^) = \/1.12. Let us note, that the matrix B corresponding to the mask m {B = {c2i_j}i,jg{o,...,8}) has the eigenvalue 1 with multiplicity one and has no other eigenvalues on the unit circle. So the divergence of the subdivision scheme in this case does not follow from the well-known argument of multiple eigenvalues. 5 : Unimprovability of criterion. Examples of divergent schemes Now we are going to see that Theorem 2.2 gives a full description of divergent subdivision schemes having smooth refinable functions. This means that all possible cases of the criterion of convergence are realized on suitable masks. For the sake of simplicity we formulate this result for the convergence in the space C, i.e., for the case / = 0. Theorem 5.1 Let h = {/3i,... ,jS„} be a cyclic set and let Ai,...,An be arbitrary minimal cut sets of the trees 7^j+^,... ,7^„+x -respectively. Then there exists a mask m(^) such that 1) m{Aj) = 0, j = 1,... ,n, i.e., h is a generalized cycle of the mask m, and Aj are its sets of zeros; 2) the equation [m] has a Co-solution, but the subdivision process {m} does not converge inC; 3) after eliminating of the generalized cycle b this process becomes converging inC. Proof: Consider a mask p(0 = (1 -H e-'«)/2a(0 such that deg o > 2, and the subdivision process {p} converges in C. To obtain such a mask it suffices to take an arbitrary polynomial a(^) with positive coefficients such that a(0) = 1. Now we use the fact that if the process {p} converges in C, then it will still converge in this space after all sufficiently small perturbations of the coefficients of a(C) preserving the condition a(0) = 1 (see [3]). Thus, with possible perturbation of the coefficients, we assume that the trigonometric polynomial a has no real roots and that the value pa{h) is irrational. Such a perturbation exists by the mean value theorem, because pa{h) is a continuous function of the coefficients of a{^). This implies, in particular, that pa{h) > 0 and hence Pp{h) > 0. Now take the polynomial q{^) = nae.4j,j=i,...,n(«~'^ - «="*")• % Lemma 4.1 we have Ppqr{h) = 2'^pp{h) for every r > 0. Consequently there exists a nonnegative integer r such that Ppqr{h) > 1. Take the smallest such integer ro and denote a = aq^""^ and p = pq'^o-^ (if ro = 0, then we put a = a,p = p). Let us remark that the case pp{h) = 1 is impossible, because this value is not rational, therefore Pp(b) < 1. Since b is the only Subdivision processes and refinement equations generalized cycle of the polynomial p, therefore, by Proposition 4.4, the subdivision process {p} converges. Now make a small perturbation of the coefficients of the polynomial a after which the process {p} still converges, and the value Ppg(b) is still bigger than 1, but the polynomial a does not have real roots. Then denote rh = p,m = rhq. 
We see that the mask m has a unique generalized cycle b, and this cycle has sets of zeros Ai,... ,AnSince /9m(b) > 1, the process {m} diverges, however removing this generalized cycle we obtain the converging process {rh}. This proves the theorem. □ Bibliography 1. D. Cavaretta, W. Dahmen, C. Micchelli, Stationary subdivision, Mem. Amer. Math. Soc. 93 (1991), 1-186. 2. D. Collela and C. Heil, Characterization of scaling functions. I. Continuous solutions, SIAM J. Matrix Anal. Appl. 15 (1994), 496-518. 3. I. Daubechies and J. Lagarias, Two-scale difference equations. I. Global regularity of soZuiions, SIAM. J. Math. Anal. 22 (1991), 1388-1410. 4. I. Daubechies and J. Lagarias, Two-scale difference equations. 11. Local regularity, infinite products of matrices and fractals, SIAM. J. Math. Anal. 23 (1992), 1031-1079. 5. S. Durand, Convergence of the cascade algorithms introduced by I. Daubechies, Numer. Algorithms 4 (1993), 307-322 6. N. Dyn, J. A. Gregory and D. Levin, Analysis of linear binary subdivision schemes for curve design, Constr. Approx. 7 (1991), 127-147. 7. L. Herve, Regularite et conditions de bases de Riesz por les fonctions d'echelle,, C. R. Acad. Sci., Paris, Ser. I 335 (1992), 1029-1032. 8. R. Q. Jia and J. Wang, Stability and linear independence associated with wavelet decomposjiion, Proc. Amer. Math. Soc. 117 (1993), 1115-1124. 9. M. Neamtu Convergence of subdivisions versus solvability of refinement equations, East J. Approx 5, 1999, 183-210. 10. V. PrOtasov, A complete solution characterizing smooth refinable functions, SIAM J. Math. Anal. 31 (1999), 1332-1350. 11. V. Protasov, The stability of subdivision operator at its fixed point, SIAM J. Math. Anal. 33 (2001), 448-460. 12. L. Villemoes, Wavelet analysis of refinement equations, SIAM J. Math. Anal. 25 (1994), 1433-1460. 13. Y. Wang, Two-scale dilation equations and the cascade algorithm. Random Comput. Dynamic 3 (1995), 289-307. 14. D.-X. Zhou, Stability of refinable functions, multiresolution analysis, and Haar bases. SIAM J. Math. Anal. 27 (1996), 891-904. 401 Accurate approximation of functions with discontinuities, using low order Fourier coefficients R. K. Wright Department of Mathematics and Statistics, UVM, Burlington, VT, 05445 USA. wrightSemba.uvm.edu Abstract In previous work we introduced a method of using polynomial splines with appropriate discontinuities to approximate a piecewise smooth function / with jump discontinuities of / and /'. The information used is location of discontinuities, and low order, possibly noisy Fourier coefficients. The number of discontinuities was limited to two at most, and the discontinuities needed to lie at meshpoints in a uniform mesh. We showed that the linear operator corresponding to the method is L2-bounded with a modest bound, and thus that the method is L2-robust in the presence of noise. In the present paper we develop a new method of analysis which enables us to determine operator bounds that are valid for arbitrarily many discontinuities. The new analysis allows discontinuities to be placed arbitrarily. Given a placement, an initially uniform spline mesh of width h must be used such that nearest meshpoints to discontinuities are at least 4h apart (discontinuities then replace these meshpoints); the number of available Fourier coefficients must be at least three times the number of mesh intervals in a period. The previous work was restricted to quadratic splines; the present work includes cubic splines. 
Much of the analysis uses exact computations with a computer algebra system. We give an example to illustrate the accuracy of the method using noisy Fourier coefficients. 1 Introduction We consider approximating a function / when the information consists of low order, possibly noisy Fourier coefficients, and knowledge that / is smooth except for jumps of / or /' at known locations but unknown magnitudes. We will work with a method, introduced in [10], which amounts to linear least squares fitting of the available coefficients with the coefficients of splines with appropriately placed discontinuities. Since we anticipate applications to ill-posed problems where boundedness of the solution operator is crucial, we develop a method for bounding the norm of this operator. The bounding method depends heavily on exact computations in certain spline spaces. These computations are fundamentally finite dimensional linear algebra with rational integer coefficients. Their goal is to develop upper bounds for the norms of certain projector operators whose norms are naturally expressed in terms of generalized eigenvalues, and to prove by exact computation that the bounds are correct. A computer algebra system is used for the computations. The programming is detailed in [9]. 402 Accurate approximation, discontinuities In [10] we obtained bounds under much more restrictive conditions than in the present paper. In [10] the splines were quadratic only, while here results also are given for cubic splines. The analysis in [10] required all knots of the approximating splines to be uniformly spaced, and since the discontinuities are at the knots, the location of discontinuities was limited. Further, in [10] the estimation process is linear in the total number of discontinuities, and produces results unacceptably large for cases with more than one discontinuity of / and two of /'. Others ([2, 3, 4, 5]) have addressed questions of accurate approximations to functions with discontinuities given Fourier coefficients as information. In [8] we give examples which show that those methods can substantially magnify noise in the coefficients; our main concern here is to prove robustness of our method. We illustrate with an example in Section 5. 2 General linear space-theoretic results Let V be a real Hilbert space with inner product ( , ). We will denote the norm associated with { , ) by || ||. Let P and Q be closed subspaces of V; suppose P is the orthogonal projector on V. Here, as in [10], we deal with the approximation /* obtained as the solution to the constrained least squares problem min||Pr-P/||,r eQ. Assuming that P is invertible as a mapping on Q, we denote by P"*" the mapping from PiQ) to Q which inverts P. It is not hard to verify that /* = P^RPf where R is the orthogonal projector on P{Q). Let A denote the operator that takes / to /*. Theorem 2.1 Let C he a mapping from V to Q. Let e be T-periodic and in ^2(0,T). Then ||A(P/ + e)-/||<(||P+|| + l)||C/-/|| + ||P+||||e||. Proof: A{Pf + e) = Af + Ae. \\Af - f\\ < \\Af - Cf\\ + \\Cf - f\\ = ||A(/ - C/)|| + IIC/-/II < (11^11 +1)11/-C/jj. \\A\\ = \\P+RP\\ < ||P+II because P and iZ are orthogonal projections. D A main objective of the following work will be to bound ||P"'"||. This will be done by establishing upper bounds for ||/ — P|| as a mapping on Q. Prom these, bounds can easily be derived for ||P'''||. Theorem 2.2 Let r/ < 1 exist such that \\{I - P)q\\ < r]\\q\\, for all q € Q. 
Then P is injective as a mapping on Q and for all h £ P{Q), P"*", the inverse of the restriction of P to Q, satisfies \\P+hf < ^ l-r?2 403 404 R. K. Wright We will obtain bounds for ||/ - P|| by considering the projector perpendicular to a spline space Q which is more tractable than PV, and on which / - F is small. In the next section, Q is the approximating spline space, S a subspace of maximally continuous sphnes, and ^ is a space of maximally continuous splines whose knots are in a mesh refining the mesh for the members of <S. S and Q have orthogonal projectors S and G, respectively. The following estimates ||/ - P|| in terms of ||/ - G\\. Theorem 2.3 Suppose ||(7-P)g|| < r?oil5|| for allg £ Q. Suppose \\{I-G)q\\ < T]i\\q\\ for all QGQ. Then \\{I-P)q\\<{T]o+ r]i)M\ for all for all q€Q. \ Proof: For ^ e Q, 11(7 - P)9|| < 11(1 - P)Gg|| + 11(7 - P)(/- G)g||. 11(7-P)G5||<r7o||G9||<77o|MI, and 11(7-P)(7-G)g||< 11(7-G)9||<mlMI- ,: ° Theorem 2.4 enables us to bound ||7-G|| on Q by instead bounding projectors orthogonal to small subspaces of G, restricted to small subspaces of Q. Theorem 2.4 Let Q and S be closed subspaces ofV with S CQr\Q. Let Vi, V2, ■.., Vr be nonzero mutually orthogonal subspaces ofV. Let Qt Q QnVi, I <i<r be nonzero closed subspaces such that Q = <S + Qi + Q2 H \- Qr- Let Qi cgn Vi, Hi C 5-*- n Vi, l<i <r be nonzero closed subspaces with orthogonal projectors Gi,Hi. Let v be a constant such that \\{I - Gi)qi\\^ < uWHiqiW^ for all qi £ Qi,l<i<r. Then \\{I - G)q\\'^ < ^Ikll^ for all q e Q. Proof: q e Q can be written q = s + v where s G S and v = gi + 52 H h Qr, 9; e Qi, l<i<r. 11(7 - G)q\\ = ||(7 - G)v\\ since S C g. Let F = Gi + G2 + ■ ■ ■ + Gr- Since C^i + ^2+ ••• +a. C a, 11(7 - GHP < 11(7 - FH|2 = ELi lia - G09,.||', the latter : equality because of orthogonality of the Qi. \\q\\^ > ||(7 - S)v\\'^ > WYlUi^^M? = Z)I=i ll-f^iftll^! since Yd^i'^i ^ '5"^) and the Hi are orthogonal. If all Htqi = 0 the hypothesis implies all (7 - Gi)g,: = 0. The above then implies (7 - G)q — 0, and the conclusion is true. We proceed assuming Hiqi ^ 0 for some i and let Af be the set of all those I. Then \KI-G)q\\' Ei^j^\\{I~Gi)qif IkIP - EieAfWHiQi]? ■ An elementary argument shows the quotient of sums is < 1/ since for each « G TV, . \\iI-Gi)qi\\y\\Hiqi\f<u. 3 □ Bounds for restricted projectors Below, we specialize the spaces of the last section, and get our main results. Let T > 0 be a fixed period. We take V to be the space of real-valued T-periodic functions which belong 7^2(7) for some, and thus every, period interval 7. On V and its subspaces we define the inner product (/, g) = Jj f{t)g{t) dt, I a period interval. The other realizations are defined in the statements and proofs of the following results. Lemma 3.1 sets up an application of Theorem 2.4; Theorem 3.2 uses this, together with Theorem 2.2, to get our main result. Lemma 3.1 Let X be a finite set of points in [0,T). Let N > A be an integer. Let K = {iT/N, 0 < i < N}: for each x e X, let k^- be a member of K closest to x where Accurate approximation, discontinuities 0 is identified with T. Assume N large enough that between any two distinct kx are at least three other members of K. Let Kx result from substituting in K each x & X for its kx- Form = 3,4 let Q be the space ofm-th order T-periodic polynomial splines with Kx as knots and with continuity C™"^ at all knots except the x G X, where no continuity is required. 
Let Q be the space ofm-th order periodic splines with knots in [0, T) at the points {ir/(3JV),0 <i< SN}, and let G be the orthogonal projector on Q. Then I — G restricted to Q satisfies \\I - GWl < .69 ifm = S, and \\I - GUi <.9ifm — 4. Proof: Let S be the subspace of Q consisting of those splines which are C°° at the kx- Clearly S C Q. Let h = T/N. Fix x = Xi £ X = {xi, X2,---, Xr} and let yo = Xi, Va — kxi — ah, a = —2,-1,1,2. Take Vj to be the subspace of V consisting of those functions with support in [t/_2,y2] and its T-translates. For m = 3 let ji and J2 be B-splines with knots y-i,yo,yo,yo and yo,yo,yo,yi; let J3 be the difference of the B-splines with knots y-2,y-i,yo, Vi and j/_i,j/o,2/1,2/2 (see [1] for explanation of multiplicity versus degree of continuity). For m = 4 let ji and J2 be B-splines with knots 2/-i,J/o,2/o,2/o,2/o and 2/o,yo,2/o,2/o,2/i; let j3 be the difference of the B-splines with knots y_2,y-i,yo,yo,2/i and y-i,yo,yo,yi,y2', and let J4 be the B-spline with knots y-2,y-i,yo, 2/1,2/2. Since 2/2 — 2/-2 < T we may identify the j^ with their T-periodic extensions. Let Qi be the space of splines whose generic member is qi = YJ^=I '^aja for constants Ca- For each i, nonzero members of Qi have continuity from C""~^ through full discontinuity at Xi, while members of iS are C°° at xi- It follows that Sn{Qi-\-Q2-\ f-Qr) = 0 and Q = S-\-Qi + ----\-Qr. Let Qi be the subspace of G with basis the C"""^ periodic B-splines whose knots in the period containing [y_2,2/2] are length m+ 1 sublists of consecutive knots from the list {ah/3 -|- fc^, — 6 < a < 6). Let Hi be the space of those m-th order periodic splines which in [—T/2 -\- kx,T/2 + kx] have support in [j/_2,2/2], which have knots at the yi, i ^ 0 and at x, are C™"^ at y_i and j/i, which may be fully discontinuous at 2/-2,2/2, and x, and which are orthogonal to all members of <S. ||(J —Gj)gj||^/||iJjgj|p is a ratio of quadratic forms in the Ca- An upper bound i^ for it can be obtained as an upper bound for the eigenvalues of the pencil A — XB where aap = ((/ — Gi)ja, [I — Gi)jf3), ba0 = {Hija,HiJ0),l<a,P,<m. In [9] explicit bases for the spaces Qi and Hi are calculated as m-th order splines. From their definitions ([1]), B-spUnes are rational functions of the knots, and thus are also inner products of B-splines. The null-basis and orthogonal projection calculations in [9] use standard methods which involve only rational operations. Thus the (7 — Gi)ja and Hija and then the aa/s and fta/j are rational functions of the knots of qi, so long as X remains in [kx,kx + h/3]. When x crosses into [kx + h/Z,kx -\- h/2], thus crossing knots for splines in Gi, the rational functions change, so in general the matrix entries are piecewise rational functions of a;. Let 1/ be a conjectured upper bound for the maximum eigenvalue Xmax of A — \B (in [9] a floating point approximation to Xmax is plotted as a function of x; v is determined from inspecting this plot). For computational convenience in [9] we represent x as 2e/i/3-tkx, 0 < e < 1/2 for x<kx^h/3, and as {l + e)h/3-¥kx, 0 < e < 1/2 for kx + h/2, <x< 405 406 R. K. Wright kx + h/2. For further convenience we take k^ = 0, clearly losing no generality. We have represented only x>kx-, but because of symmetry, x < k-c produces the same bounds. Since /i is a linear factor in all knots in the calculation, we see that aa/) and ba/i can be written as h multiplying piecewise rational functions of e (with integer rational coefficients). 
The determinant oi A- uB is thus h"^ times a piecewise rational function of e. The MAXRAT algorithm ([9]) proves that its reciprocal is bounded as a function of e in the appropriate ranges, so the determinant itself is bounded away from 0. In [9], e is then set equal to 0 in ^ - TB, and the determinant of that matrix is then shown to have m sign changes as r decreases from v. Thus the conjectured value v bounds all eigenvalues oi A- \B for all values of x. The upper bounds thus obtained are i/ = .69 for m = 3 and i^ = .9 for m = 4. We emphasize that the B-splines, matrix entries, and determinants all are calculated exactly, using the Maple ([6, 7]) computer algebra system, so the bounding property of u is rigorously proven. Since the bounds we obtain apply to the spaces Qi and Bi associated with any one of the Xi, they satisfy the hypotheses of Theorem 2.4 which now provides our conclusions. □ Our main result now follows. Theorem 3.2 Let the hypotheses be those of Lemma 3.1. In addition, let P he the orthogonal projector onto the space of n-th order real-valued T-periodic trigonometric polynomials, where n > SN. Ifm = 3, we have ||P"*"||2 < 2.4, while if m = 4, we have ; l|-P+l|2<4.5. Proof: The space Q in Lemma 3.1 consists of periodic spHnes with uniformly spaced knots. Theorem 3.1 of [10] implies that ||/ - P\\2 < {a/{l + a)y^^ where ^2m a = 4£(l/(l + 2r))^ In [9] we use this formula to get upper bounds of .076 when m = 3 and .025 when m = 4 for ||/ - P\\2- Taking these bounds as rjo in Theorem 2.3 and taking the bounds from Lemma 3.1 as r]i in Theorem 2.3, we obtain from that theorem bounds for ||/ - P||2 of .907 for m = 3 and .974 for m = 4. Theorem 2.2 now applies to produce the present results. D Above, we required n > 3N; under this condition we can get our simplest and most comprehensive results. Since we contemplate applying our results where the number n of useful coefficients may be limited, we have tried to get versions of Theorem 3.2 where n is smaller compared with N. We have no useful versions for n < SN and m = 4 (cubic splines). The following result for quadratic splines may be useful. To formulate it, let ei = max{|a; - kx\N/T}. In the previous results, the separation of the values x from their nearest uniform mesh points kx was unrestricted, which corresponds to ci = 1/2. Here, we can get results for quadratic splines, and n > 2N, provided the x are more restricted; our methods of analysis "blow up" for n > 2A'' as ei approaches a number slightly larger than .25. Accurate approximation, discontinuities 407 Theorem 3.3 Letm = 3 (quadratic splines); letn > 2N. Otherwise, let the hypotheses be those of Theorem 3.2. Corresponding to the list 0, .1, .2, .25 for values of ei, we have the list of values 1.7,2.1,3.9,16 as bounds for ||P"^||. Proof: For each of the cases for e, an argument similar to the proof of Lemma 3.1 apphes to produce a bound rji for ||/ — G||2 where Q now is defined using the uniform knot spacing 1/{2N) rather than 1/{3N). The only difference in the argument is that here, a discontinuity location x always stays in the interval [kx, kx + eih] where h = T/N, so the matrix entries and determinants can be treated as functions of e in [0,ei]. Each bound ?7i now is used just as in the proof of Theorem 3.2, to get the present bounds for r+ii2. 4 . ° Uniform norm bounds Using representers of point evaluation, as in [8], we can get uniform norm bounds for P+, and thus for A. The arguments are similar to those in [8]. 
The main difference is that there the mesh is uniform and the order m is 3. The constructions of representers extend fairly easily to the present case: here the norms of representers are functions both of the evaluation point and the location of the discontinuity nearest to the evaluation point. One can show that for each point f e [0, T), a spline rj exists in a space U containing Q, such that {rt, q) = q{t) for each q € Q, and such that ||rt||2 < k/Vh where A; = 5, m = 3 and fc = 7,m = 4;/i = T/N as before. The computations for the construction and bound calculations are in [9]. Noting that VT/y/h = y/N, we have ||A/||oo < max|K||2M/||2 < (fc/v^)||P+||2VT||/||oo <kVN\\P+\\2 When N < 100 and the hypotheses are those of Lemma 3.1, this gives ||A/||oo < 120||/||oo for m - 3, and ||A/||oo < 315II/IU for m = 4. 5 Example FIG. 1. f - Af, no noise f - Af, 1 % noise .20 .10 -.10 -.20 •.30t -.40 -.50 ^ exact f 408 R. K. Wright We illustrate the method using an example where the function / is 27r-periodic and on [0,27r) consists of the function e~^/^ with a piecewise quadratic added, so as to produce discontinuities at 0, .5,1.5,2.5, and 4. / is a modification of an example in [2]; for convenience we have shifted that example left by 1 unit, and we have added the exponential term because our method can represent a piecewise quadratic exactly in the absence of noise. Exact (up to 17-decimal digit floating point error) Fourier coefficients are derived from /by exact integration using the Maple ([6, 7]) system. Noisy approximate coefficients are also derived by sampling / at 1024 equidistant values in [0,27r], adding uniformly distributed pseudo-random noise to the samples, and taking the discrete Fourier transform of the samples. In effect, we work with / + e where e is a perturbing function. The level of the noise is set so that the discrete L2-norm of the noise vector is 1% of the discrete i2-norm of the vector of samples oi f. N = 45 and thus n = 135 are the smallest values of n and N for which the hypotheses of the previous section are satisfied. Using these values, we proceed with m = 4 (cubic splines) for each of these cases for Fourier coefficients. Plots of / and of the error for the two cases appear in the figure. The ratio ||/-j4(/-|-e)||2/||/||2 is about .005 for the case of 1% noise. In [9] we develop a probabilistic estimate of .0037 for the ratio of ||e||2/||/||2- This estimate indicates an L2-norm noise magnification of about 1.35-fold, compared with the upper bound of 4.5 given in Theorem 3.2. The uniform error, for noise-free coefficients, is about 10"^; computational experiments show this is dominated by truncation error in approximating the exponential term. In [9] we do the corresponding calculations for m = 3, and find similar results for 1% noise, with larger, but still small, error for noise-free coefficients. In [9], we implement Eckhoff's method as described in [3], used on the above data. For noiseless data, the results are comparable to those reported by Eckhoff for similar examples. The uniform norm error seems to be about .06, with errors at jumps somewhat smaller. For 1% noise, the results of Eckhoff's method are about 750-fold in error. Bibliography 1. C. de Boor, Practical guide to splines, Springer Verlag, New York (1978). 2. K. Eckhoff, Accurate and efficient reconstruction of discontinuous functions from, truncated series expansions, Math. Comp. 61 (1993), 745-763. 3. K. 
Eckhoff, Accurate reconstructions of functions of finite regularity from truncated Fourier series expansions, Math. Comp. 64 (1995), 671-690. 4. D. Gottlieb and C.-W. Shu, On the Gibbs phenomenon and its resolution, SIAM Review 39 (1997), 644-667. 5. D. Gottfieb, C.-W. Shu, A. Solomonoff and H. Vandeven, On the Gibbs phenomenon I: Recovering exponential accuracy from the Fourier partial sum of a nonperiodic ana/yiic/MnciJon, J. Comput. Appl. Math. 43 (1992), 81-98. 6. K. M. Heal, M.L. Hansen, and K.M. Rickard, Maple V Learning Guide, SpringerVerlag New York (1998). ' 7. M. B. Monagan, K. 0. Geddes, K. M. Heal, G. Labahn and S. M. Vorkoetter, Maple V Programming Guide, Springer Verlag, New York (1998). 8. R. K. Wright, A robust method for accurately representing nonperiodic functions Accurate approximation, discontinuities given Fourier coefficient information, J. Comput. Appl. Math. 140, (2002) 837848. 9. R. K. Wright Computations and examples for spline approximation of discontinuous functions using low order Fourier coefficients, UVM Math/Stat Department Technical Report 2001.2 10. R. K. Wright, Spline fitting discontinuous functions given just a few Fourier coejQ?dente. Numerical Algorithms 9 (1995), 157-169. 409 Chapter 7 General Approximation -- 411 Preceding Page Blank '"\ Remarks on delay approximations based on feedback Alessandro Beghi and Antonio Lepschy Dipartimento di Elettronica e Informatica, Universitd di Padova, Padova, Italy. {beghi,lepschy}@dei.unipd.it Wieslaw Krajewski Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland. krajewsk@ibspan.waw.pl Umberto Viaro Dipartimento di Ingegneria Elet., Mecc. e Gest., Universita di Udine, Italy. viato@uniud.it Abstract The response of a unity-feedback system with a delay element in the forward path exhibits a periodic component that can be approximated by truncating its harmonic expansion. Rational approximants of the transfer function e of such element can simply be obtained from this closed-loop approximation. A unifying approach to recent methods bafsed on this criterion [2, 3] is presented, which allows us to point out their respective features. The standard Pade technique and a heuristic method described in [5] are also considered. 1 Introduction and problem statement In modelling dynamic systems for control purposes, it is often necessary to account for time delays due, e.g., to transport phenomena or distributed-parameter components. The response of an ideal delay element (delayor) to an input u{t), identically equal to 0 for t < 0, is y{t) = u{t — T), T > 0, where T indicates the time delay. By denoting with U{s) the Laplace transform of u{t), the Laplace transform of y{t) is Y{s) = e~'^'^U{s). Therefore the transfer function of the delayor is the transcendental function e""^". The problem of approximating e~-^^ by means of a rational function has a long history (see, e.g., [1]) but is still important from both the computational and the conceptual point of view; a few recent contributions on the subject are quoted in [2]. In many practical applications the physical realizability and the stability of the approximant limit the choice of the approximant to proper rational functions with real coefficients and a Hurwitz denominator. These requirements are satisfied by Blaschke products, i.e., functions of the form: g(.)^ffii(^-°i) lL=l(,* "I" "ij 412 . ReN>0. 
(LI) Remarks on delay approximations based on feedback This has the desirable property that |B(ja;)| = |e~^"^'^| = 1, Vw, and arg[B(ja;)] is monotonically decreasing with w hke arg[e~J'^'^] = -Tw. On the other hand, the step response of a system with transfer function B{s) starts from +1 or -1, whereas the step response of an ideal delayor obviously starts from 0. The most widely adopted method to form a rational approximant of a delay element is based on the Pade technique which does not always guarantee stability (even if biproper Pade models are necessarily stable). Since such a technique leads to the retention of the first Maclaurin expansion coefHcients of e~^*, the resulting approximation is the best in the neighbourhood of a; = 0. In different frequency bands, other types of models may be preferred. In [3] a unity-feedback system whose forward path consists of a delayor is analysed. In the case of negative feedback, the unit step response is a piecewise constant function taking on the value 0 for 2kT < t < {2k + 1)T and the value 1 for (2fc + 1)T < t < (2fc + 2)r, fc > 0, which can be decomposed into a step of amplitude |, and a square wave of amplitude | starting from —^&tt = 0. In the case of positive feedback, similar considerations allow us to decompose the unit step response into a linear ramp of slope ^, a step of amplitude — |, and a saw-tooth wave that Unearly decreases from | to — | in every period from kT to (A; -|- 1)T. In both cases, the periodic component can easily be expressed as a series of harmonic terms (for t > 0). It is therefore natural to approximate the step response of the unity-feedback system by retaining the non-periodic component together with a suitable number of the first harmonics of the periodic component. A rational approximation Wa{s) of the transcendental transfer function W{s) of the above-mentioned feedback system is obtained by dividing the Laplace transform of the approximate step response by the Laplace transform j of the step input. The rational approximant Ga{s) of the delayor transfer function is then determined as where the minus sign applies to the case of negative feedback and the plus sign to that of positive feedback. It turns out [3] that Ga{s) is a stable biproper rational function having the form of a Blaschke product; precisely, negative feedback supplies even-order approximants and positive feedback produces odd-order approximants. Obviously, the same result could be achieved by referring to different inputs (even an impulse), but the choice of the unit step is particularly convenient. According to the terminology suggested in [4], the rationale of such a procedure consists in retaining the "input component" (and the "resonant component", if any) and in truncating the periodic "system component" of the response. In [2] a feedback structure is used as well, but another approximation criterion is adopted, which leads to different models depending on the chosen input. In particular, the family of inputs considered in [2] is {u{t) = t"^ ,m £N ,t > 0}, and the procedure exploits several properties of Bernoulli numbers and polynomials. In the following, the above approaches are presented in a unified form which allows us to point out their respective features and to derive the related approximants in an 413 414 Beghi et al. easier way. 
Finally, criteria are given to choose the approximation that is most suited to the application at hand, also taking into account the standard Pade approximation and a further approximation presented in [5]. 2 Derivation of the approximant For the sake of simplicity, we shall almost exclusively refer to the case of negative feedback; only a brief mention will be made of the case of positive feedback. 2.1 Negative feedback The transfer function W{s) of the negative feedback system with forward-path transfer function G{s) = e"^" is whose singularities (poles) are the roots of e^^ = -1, i.e. W{s) can also be interpreted as the Laplace transform of the sequence of positive and negative impulses forming the derivative of the step response described in the introduction. Therefore, it is the sum of a constant equal to | (corresponding to the step component in the just-mentioned step response) and a series of "harmonic" terms associated with the above poles: 1 °° fc=l •- Tk Tk S - jPk S + jPk where the bar denotes conjugate and, using the standard formula for the residues, rfe = lim (s - jPk)W{s) = -- . It follows that oo W{s) = l-^y (2.2) In order to compare the results in [2] and [3], let us consider a canonical input of the form Mt) = jfzj^,t>0, (2.3) whose Laplace transform is s' (In [3] only the case of i = 1 is considered, whereas the inputs used in [2] differ from (2.3) by a scaling factor which is irrelevant for the following considerations.) 415 Remarks on delay approximations based on feedback On the basis of (2.2) the Laplace transform of the (forced) response to (2.3), Yi{s) = -j W{s), can be rewritten as i-l n ; ^^z^ h -r A> ^2 + (2fc _ 1)2-I' h=0 where for i even, I3ki = (-1)5 Qffci = 0 . f (2fc-l)7r (2.4i) and for i odd. (i-i) an = (-1) = ,3fci = 0. T [(2fc-l)7r_ (2.4n) Therefore, W{s) can also be presented in the alternative form ^s2 + (2fc + l)2fi ^=0 (2.5) Each term of the series in (2.5) is given by the sum of a polynomial of degree i — 1 (quotient of the division of its numerator by its denominator) and a strictly proper rational function (whose numerator is the remainder of the division). Therefore, (2.5) becomes w{s) = E <^ns-+E E ^H'--'- +,. Ill % -- ^'-'^ which can be rewritten as w^(5) - E h'^ + fe=l E ^fci.M/ ^'^+E ■ ft=0 \ fc=l*^ + (2^-l)^f2 (2.7) By comparing (2.7) with (2.2), one finds that oo ^ cp + y ^dkifl (2.8) 2' oo C/j + E'^fci.'' = 0) ^>0i 0, Vfc,i, (2.9) fc=i Iki ^fci = —Tf; Vfe,i. The procedure suggested in [3] could alternatively be presented with reference to expression (2.7) where coefficients related to the specific input appear. Precisely, the approximant Wa(s) is obtained in this case by adding to the exact value | of the first 416 Beghi et al. sum (cf. (2.8) and (2.9)) the first K (harmonic) terms of the second summation 2 T^s2 + (2fc_i)2^' which is independent of the input Ui{t). The procedure suggested in [2] refers instead to expressions (2.5) or (2.6), and the approximation consists in truncating the summation over fc, where each addendum is formed by a polynomial and a strictly proper harmonic term. Therefore the resulting Wa(s) is which does depend on i and it is not proper because the part added to the harmonic terms does not reduce to the constant ^, as is instead the case in W{s). Nevertheless, the approximant Ga{s) = Wa{s)/{1 — Wa{s)) of e~^* turns out to be biproper. 
As concerns the computation of the above approximants, the suggested approach seems to be preferable to that adopted in [2] because (i) coefficients c/,,, which correspond to the first i Maclaurin expansion coefficients of W{s) = J^ -, can be easily be evaluated using the classic Pade procedure, and (ii) formulae (2.4?) and (2.4u) immediately supply coefficients ahi,Pki2.2 Positive feedback Considerations analogous with those of Section 2.1 lead to the following transfer function in the case of positive feedback so that Yi{s) = W{s)Ui{s) can be separated into a (harmonic) series associated with the imaginary conjugate poles of W{s) and a strictly proper fraction with denominator s'+^ Using the terminology in [4], the mentioned series corresponds to the "system component" of the forced response and the fraction corresponds to its "interaction component" because the poles of the latter are common to W{s) and Ui{s) (no "input component" is present in this case since Ui{s) does not exhibit poles different from those of W{s)). As shown in [3], the truncation of the series in (2.2) results in even-order biproper approximants Ga{s), whereas the truncation of the series in (2.11) results in odd-order biproper approximants Go(s). Instead, as shown in [2], truncating the series in (2.5) leads to odd-order approximants, whereas truncating the analogous series corresponding to positive feedback leads to even-order approximants. Remarks on delay approximations based on feedback 2.3 Stability and approximation error It has been proved [3] that the even-order rational approximations Ga (s) of e"^* obtained from (2.1), as well as the odd-order ones obtained by truncating (2.11), are stable. Instead, as explicitly stated in [2] for inputs t", m > 2 (i.e., using the previous notation, Ui{t) with i > 3) the "alternating sign of the Bernoulli numbers makes the approximation in general unstable [...]. Hence, from a practical point of view, any improvement with respect to the approximants obtained in [3] is to be found with p = 1", i.e., i = 2,3. The approximation accuracy can be evaluated by referring, e.g., to the "closed-loop error" E{s) ■= W{s) - Wais). From (2.1) we get E{s) = E^{s) :=-^ ^2 ^.^^r^'+Pl whereas from (2.10) we have «—1 oo E{s) = E2{s):=^ Y. dki,hs''+ E,{s) . h=Ok=K+l Since E{s) is a complex quantity, |£'2(s)| may well be smaller than |.Ei(s)| for certain values of s (or ju). 3 Alternative approximants As already pointed out, the procedure suggested in [2] leads to approximants that depend on the chosen canonical input. To improve the approximation within suitable frequency bands not centred at the origin, it is reasonable to resort to non-canonical inputs whose spectrum has larger amplitude there. A simple choice corresponds, e.g., to 1 U{s) s s^ ' 1 + 2^-^-f^ in which a;„ is at the centre of the band and ^ is suitably small. The choice of the form of the input (as well as the order of the canonical input) is somewhat arbitrary and is influenced, in practice, by empiric considerations. Therefore, it makes sense to compare the results of the above procedures with those obtained in [5] using a heuristic procedure based on the direct approximation of the phase Bode diagram of e^^'^'^ by means of a Blaschke product Bn{juj) of order n. For n odd, the first factor of S„(s) has the form ^ , •. i — TS Gi{s) = —^, 1 + TS : ^ r>0, 417 Beghi et al. 418 and the others have the form + UJ 1-2^,:Ulr, Gi{s) = — 1 + 2^,— + ^ 1 > ^i > 0 , W„i > 0 : (3.1) whereas for n even all factors have form (3.1). 
All the considered techniques produce unit-magnitude all-pass frequency responses so that the approximation they afford can be judged with reference to the phase deviation A{jw) from -Tu only. As w -^ oo, A{jw) -* oo in all cases. Therefore, reasonable criteria for choosing the method most suited to the specific application are: (i) the bandwidth Be where |A(iw)| is less than a specified value e, or (ii) the maximum AB of |A(jtj)| in a prescribed band B. By way of example, Fig. 1 shows A{ju)) vs w for the 4-th order all-pass approximants of e~^'^, (T = 1) obtained according to (2.1) with K = 2 (curve a), to the procedure suggested in [2] for u^it) = t^ (curve b), to the standard Fade procedure (curve c), and to the heuristic method in [5] (curve d). For instance, with reference to criterion (i) above, the Fade approximant is best for e very small, the method suggested in [2] is optimal for e ~ 10°, and the heuristic method and the method suggested in [3] are preferable for e > 45°. Analogous results are obtainable for approximants of different order. 1 ■ - , ■ b c d /^>'^- ~''"\> ^ :'^ \ 1 ' / / !'■ JM. : : / / ■ \ ll 4 - »/ ■/' f ' ^ ID FIG. - II , ,,ll , , . !' 12 1. Phase deviations A(ja;) for the considered 4-th order approximants. Conclusions The approximation procedure presented in [2] and [3] have been embedded in a unified frame which points out well their respective features and allows us to determine the Remarks on delay approximations based on feedback parameters of the approximants in an easier way. Criteria have been provided for choosing the approximation method that is most suited to the specific appHcation. Bibliography 1. O. Perron, Die Lehre von den Kettenbriichen. Stuttgart: Teubner, 1913. 3rd ed. 1957. In German. 2. C. Battle and A. Miralles, "On the approximation of delay elements by feedback," Automatica, vol. 36, pp. 659-664, 2000. 3. A. Beghi, A. Lepschy, and U. Viaro, "Approximating delay elements by feedback," IEEE Trans. Circ. Sys. I, vol. 44, pp. 824-828, 1997. 4. P. Dorato, A. Lepschy, and U. Viaro, "Some comments on steady-state and asymptotic responses," IEEE Trans. Education, vol. 37, pp. 264-268, 1994. 5. A. Beghi, A. Lepschy, and U. Viaro, "On the simplification of the mathematical model of a delay element," in E. Kuljanic, ed., Advanced Manufacturing Systems and Technology: Springer Verlag, 1996, pp. 617-624. 419 Point shifts in rational interpolation with optimized denominator Jean-Paul Berrut Departement de Mathematiques, Universite de Fribourg, Switzerland j ean-paul. berrut@uiiif r. ch Hans D. Mittelmann DepartmeM of Mathematics, Arizona State University, Tempe, USA mittelmann@asu.edu Abstract In previous work we have suggested obtaining rational interpolants of a function / by attaching optimally placed poles to its interpolating polynomials. For a large number of interpolation points these polynomials are well-known to be good approximants only if the nodes tend to cluster near the endpoints of the interval, as with Cebysev or Legendre points. In practice, however, one would prefer to have them closer to equidistant. This will in particular be the case when the difficult portion of / lies well within the interior of the interval, or when approximating derivatives of /, a.s in the solution of differential equations. To address this difficulty, we use here a conformal change of variable to shift the points from the Cebysev position toward a more equidistant distribution in a way that should maintain the exponential convergence when / is analytic. 
Numerical examples demonstrate the resulting improvement in the quality of the approximation. 1 Introduction We are concerned here with rational approximation of a continuous function / on an interval [a, b], which we may take as [-1,1] =: /, after a linear change of variable when necessary. We further assume that the approximant r should interpolate / between a finite number, say A^ + 1, of distinct points (nodes) XQ, xi,..., a;Ar in /. In a similar way as in [5], r will be constructed by attaching a certain number of poles to an interpolating polynomial. In some applications, such as the numerical solution of two-point boundary value problems (see, e.g., [6]), one may choose the points more or less at will; in that case, one will place them so as to reach the best compromise between two often conflicting goals: points good for interpolation, on one side, and points favourable for the condition of the problem to be solved, on the other side. In [5], we have considered equidistant and Cebysev points, the first for their regularity, the second for the condition of the interpolation and for the fast convergence of the interpolant for very smooth functions. For the solution of two-point boundary problems in [6] we have merely used Cebysev points. 420 Point shifts in rational interpolation with optimized denominator 421 There is in general no reason besides the problem condition for accumulating the nodes toward the boundary, as with Cebysev or Legendre points. Moreover, one of the reasons for using rational instead of polynomial interpolation is its better suitability for approximating functions with large slopes. Here too, shifting the points away from the center may not be appropriate. Another odd consequence of accumulating interpolation points toward the extremities is the consequent ill-conditioning of the derivatives of the interpolating polynomials [7,1]. This worsens the stability properties of time-stepping in the solution of time evolution problems with the method of hues [13] as well as the convergence of iterative methods for solving discretized stationary problems [3]. To address these difSculties, we will take advantage here of the fact that the fast convergence of the interpolant can be maintained while shifting the points with a conformal map g (independent of A'') toward an equidistant position. This, however, requires an important change to the method in [5], because this point shift ruins the exponential convergence of the Cebysev interpolating polynomial. We therefore use here as the starting interpolant the polynomial interpolating f{g~^) in the domain of the inverse 5"-^ of the conformal map employed for the point shift, and attach poles to this polynomial. Section 2 reviews the formulae and advantages of shifting Cebysev points conformally toward the center of the interval when interpolating functions, and Section 3 briefly recalls the method of optimally attaching poles to the interpolating polynomial introduced in our earlier work. In Section 4 we describe how to take advantage of the better conditioning of derivatives induced by the conformal point shift; the corresponding practical improvements are finally documented with numerical examples. 2 Rational interpolation with a variable change for point shifts Let Vm and TZm,n, respectively, denote the linear space of all polynomials of degree at most m and the set of all rational functions with numerator in Vm and denominator in Vn, furthermore, denote by fk the interpolated values f{xk), k = 0(l)iV, of /. 
Then, the unique polynomial p &VN that interpolates / at the Xfc's, N Pix) = '^fkLk{x), Lkix):=Y[{x-Xi) /Yl{xk-Xi), can be written in its barycentric form [9] P(-) = E-^^A/E-^^' (2-1) where the so-called weight Wk corresponding to the point Xk is given by / ^ I i=0, ijtk Despite its appearance, (2.1) determines a polynomial of degree at most N: the Wk are precisely the numbers which guarantee this [4]. By choosing other Wk's, a rational 422 , . :; Jean-Paul Berrut and Hans D. Mittelmann interpolant is constructed. The barycentric formula has several advantages over other representations of the interpolating polynomial ([4] p. 357). One of them is the fact that the weights appear in both the numerator and the denominator, so that they can be divided by any common factor. For example, simphfied weights for Cebysev points of the first kind x)^' :- coscpk, where (j)k := ^^'^ and k = 0,...,N, are given by w[.^' = {-l)''sin<pk ([9] p. 249), while for the Cebysev points of the second kind x^^^ := cos fc^ - which will be used here - one simply has Salzer's formula ([9] p. 252) (21 / ,sfcs- wi^ = i-l)%, r f 1/2, -^^^ = { i: k = OoTk = N, , otherwise. These points are, together with Legendre's, the most used nodes for global polynomial interpolation and large N. They achieve exponential convergence of p toward / if the latter is analytic in an ellipse Ep with foci at ±1 and sum of its axes equal to 2p, p> 1. However, this fast convergence comes at the cost of a concentration of the nodes in the vicinity of the extremities of /. As mentioned above, this accumulation may have drawbacks, such as poor spreading of the information about / over the interval and ill-conditioning of the derivatives near the endpoints. With a suitable choice of the interpolant, one may conformally shift the nodes toward an equidistant position (though not all the way) without losing the exponential convergence. For that purpose, one considers, beside the a:-space in which / is to be approximated, another space, denoted by y, say, and the iV + 1 Cebysev points of the second kind (2) in the interval J := [-1,1] in this j/-space. Let g be a conformal map from a domain Vi containing J (in the j/-space) to a domain 1)2 containing / (in the a;-space); moreover, suppose that / is a function V2>-^ C such that the composition fog : 2?ii-> C is analytic in an ellipse Ep, as defined above. With this map we may define new interpolation points on /, Xk = givk), as well as the conformal transplantation F{y) := f{x) [10] of / into the y-space. Then, with the polynomial interpolating F{y) at the yk N N AN{y):=Y,Fiyk)Lk{y) = J^f{xk)Lk{g-Hx))=:aM{x), fc=0 (2.2) fc=o one has \aN{x)~f{x)\ = 0{p-''), x€[-l,l]. Rational interpolation with all poles prescribed is very simple in the barycentric setting [5]: the P poles zi are attached to (2.1) by replacing Wk with p bk = Wkdk, dk ■ = J\ixk - Zi). Point shifts in rational interpolation with optimized denominator If A'' > P this results in a rational interpolant in (when such an interpolant exists, see [5]). TZN^P with poles at Zi, i = 1,..., P Remark 2.1 Exponential convergence of interpolation at the shifted points is also attained with the rational function given by (2.1) with Wk = w^^' [2]. However, this is in general a rational function in 'RN,V, V > N - P: there is not enough defect in the denominator degree for the weights w^ 'dk to warrant the presence of the P poles Zi. We then use a^ as the starting interpolant to which we attach the poles Vi in the y-space. 
This yields p p N WkY[{yk-Vi) uM. S R[y) ■- ^~y-y^ _ N Wk ^-^ ^" = S ^~l-\x)-9-\xk)—_ YliVk-Vi) T-J^ fc=0 ^ WkY[{9''\xk) -g~'^{zi)) V-Vk II — 111. ^' _ ^: r{x). j^ WkY[{g \xk)-g\zi)) T ^—1 ^ g \x)-g \xk) If a rational interpolant with these poles exists, it is given in the y-space by R, and r is a rational function in the argument g^^{x). Its poles are at Zi = g{vi). 3 Construction of the optimal interpolant Our method consists in optimizing the position of the Vi's so as to minimize ||i2-i^lioo = ||r-/||co, as described in §3 of [5]. Optimal Uj's always exist, but these are not unique in general. Whether the optimal R is unique is an open question; however, for every optimized pole Vi an indicator may be calculated which, if nonzero, guarantees that Vi is indeed a pole of P. In the practical computations documented in §5 the optimization of the Vi's was performed using the same two algorithms as in [5]: for small A'' we used a discrete differential correction algorithm according to [11], while for larger N the simulated annealing method of [8] was applied. Both methods will in principle locate a desired global maximum. The first method achieves it in a systematic and guaranteed way evaluating the error not continuously but on a fine grid; the simulated annealing method cannot be guaranteed to find the global extremum but, when used for an extensive search, will produce a reasonable approximation of it. As mentioned in [5], our way of attaching poles to the interpolating polynomial has a very nice property: the approximation error can only decrease or at worst stay constant with a growing number of poles, this in sharp contrast with classical rational interpolation; when a new unknown, say Vj, is added to the set of variables, {vi,... ,Wj_i}, the optimal values of the latter are a feasible vector for the higher dimensional optimization. Let us conclude this section with a comment on the use of the nomenclature "attaching the poles". In classical rational interpolation, the poles of the interpolant are 423 Jean-Paul Berrut and Hans D. Mittelmann 424 determined by the data. There too, however, one sometimes wishes to prescribe the location of the poles (with corresponding decrease of the number of degrees of freedom): many authors then speak of "assigning", or "prescribing" the poles. In that sense one cannot "assign" poles to a polynomial, which obviously cannot have poles. We thus start with the interpolating polynomial and its poles at infinity and make it a rational interpolant by bringing the poles into an optimal position in C. We call this procedure "attaching poles", to distinguish it from the process of forcing a rational function to have a pole at a particular place. 4 Derivatives of the optimal interpolant with shifted points As mentioned in §1, one of the reasons for shifting the points from their Cebysev position toward the interior of the interval is the improvement of the condition of the derivatives resulting from such a shift. Besides r, we will evaluate also r' and r" as approximants of /', resp. /", and estimate \\r - /'||oo and \\r - /"||ooSchneider and Werner [14] have noticed that every rational interpolant R € 'JIN,N, written in its barycentric form can easily be differentiated. The formulae for the first two derivatives read ' N R'iv) = { fe=0 ' I N y-Vk k=0 y-Vk N -{ Y. UkR[yi,yk])/ui, y = yi fc=0 kjti and N Uk R"iy) = { „r _ IN _; 1 /V^ "A- 2V 2/, 2/.] // E-^^' 2/^ 2/'' ^ = 0(1)^' .^—' ^i^iib, II ^—' II — 111. y-yk y— - 1h. 
yk fc=0 fc=0 N -2( J2 ukR[yi,yi,yk])/ui, y = yi fe=0 fc/i with R[z,z,yk] = :5Mr|Md,The chain rule then yields, for r{x) = R{g-Hx)), r'ix) = R'(y).[g-\x)]' = ^, r"{x) ^ R"iy)-r^R'bj){9'iy)] WivW (4.1) Specifically, in our calculations we have used the map suggested by Kosloff and Tal-Ezer , . arcsm(aw) giy) - arcsm a 0 < a < 1. Point shifts in rational interpolation with optimized denominator a P=0 P=2 P=4 P=6 P=8 0.0 0.5 0.75 0.9 0.95 0.96 6.37e - 5 3.11e-5 8.06e - 6 1.12e - 6 2.78e - 7 1.85e - 7 1.42e - 6 6.69e - 7 1.60e - 7 1.97e - 8 4.47e - 9 2.93e - 9 5.83e - 8 2.48e - 8 5.50e - 9 5.90e - 10 1.29e - 10 8.27e-ll 9.38e - 9 4.21e-9 9.47e - 10 3.94e - 11 1.36e - 11 4.20e - 12 1.30e - 9 4.23e - 10 1.27e-10 2.05e - 11 3.82e - 12 3.88e - 12 1. Errors when approximating / with increasing P and a in Example 1 TAB. In the hmiting cases, a ^ 0 keeps the points at their Cebysev position, whereas a -^ 1 renders them equidistant. The derivatives of 5 are given by 9'{y) = arcsina^l-(Q,j/)2' 9"(y) = arcsin a {ayf) so that in (4.1) 9"iy) [9'{yW 5 ■ (arcsin^ a)y. Numerical evidence We now report on practical computations, performed on two examples, which demonstrate the efficiency of point shifts for improving the rational interpolants with optimized denommators. These examples share the property that the difficult part of / lies in the center of /, so that the shift of the points toward a more equidistant position naturally improves the quality of the information provided to the interpolation method a P-0 P=2 P=4 P=6 P=8 0.0 0.5 0.75 0.9 0.95 0.96 5.27e-3 2.67e - 3 7.47e - 4 1.14e-4 2.97e - 5 2.01e-5 1.26e - 3 5.87e-4 1.49e - 5 2.01e - 6 4.99e - 7 3.24e - 7 4.85e - 6 2.33e-6 5.16e-7 6.56e - 8 1.48e - 8 9.52e - 9 8.69e - 7 4.03e - 7 9.44e - 8 4.28e - 9 1.59e - 9 4.80e - 10 1.40e - 7 4.63e-8 1.30e-8 2.16e-9 4.52e - 10 4.70e - 10 TAB. 2. Errors when approximating /' with increasing P and a in Example 1. The sup-norm || || has thereby been estimated by considering the 1000 equally spaced pomts:r, = -| + f-if, ^ = 1(1)1000, on the interval [-5/4,5/4] and computing ttie maximal absolute value of the error at those Xi lying in [-1,1]. Example 5.1 We have first revisited Example 3 of [5], which displays in the center of 425 Jean-Paul Berrut and Hans D. Mittelmann 426 I a slope increasing with a positive parameter, here denoted by e erf((5.T) 6 — \A5e, f{x) = cos7rx + erf(5) where erf denotes the error function (see [5] for a graph). In Table 1 we give the results obtained with e = 500 and iV = 81, increasing numbers P of poles and increasing a. Tables 2 and 3 display the same information for the approximation of /' and /" with r' and r" as given by the formulae (4.1). The combination of extra poles and a point shift brings about 7 digits of accuracy, where the PO^t «hift alone makes only for 2^3. The improvement in the derivatives is especially remarkable the error in the second derivative decreases from the useless value of 9.26 to about 10 . a P=0 P=2 0.0 0.5 0.75 0.9 0.95 0.96 9.26 4.26 9.50e - 1 9.30e - 2 1.59e - 2 9.18e - 3 4.05e - 2 2.07e - 2 5.48e - 3 6.49e - 4 1.23e - 4 7.36e - 5 TAB. P=4 4.82e - 4 2.18e-4 6.25e - 5 8.86e - 6 1.88e - 6 1.29e-6 1 P=6 P=8 7.85e - 5 3.75e - 5 9.53e - 6 4.93e-7 1.75e - 7 6.00e-8 1.46e - 5 4.91e - 6 1.26e-6 2.34e-7 5.31e-8 | 5.57e -» | 3. Errors when approximating /" with increasing P and a in Example 1. Example 5.2 Example 3 in [5] has demonstrated that the attachment of poles may be very effective in improving the approximation of oscillatory functions. 
Here we change the function to 2 /i(a;) = e""'' sin6.T, „ , „ a > 0, 6 > 0, so that the most oscillatory part lies in the center of the interval. Results with a = 5, 6 = 25, AT = 31, P = 0 and P = 2 are given in Table 4. In contrast with the preceding example, here the point shift brings much more improvement than the attachment of poles, about 6^7 digits, an especially heartemng fact for the derivatives, to which the interpolants without shift are useless approximants. Acknowledgement: The authors wish to thank Peter Graves^Morris for his comments which have enhanced the present text. Bibliography 1. R. Baltensperger and J.-P. Berrut, The errors in calculating the pseudospectral differentiation matrices for Cebysev-Gauss-Lobatto points, Comvut. Math. Apphc. 37 (1999), 41-48. Errata: 38 (1999), 119. 2. R. Baltensperger, J.-P. Berrut, and B.Noel, Exponential convergence of a hn^^^^^^^^^^ tional interpolant between transformed Chebyshev points. Math. Comp. 68 (1999), 1109-1120. Point shifts in rational interpolation with optimized denominator a 0.0 0.5 0.75 0.9 0.92 0.94 0.96 TAB. h' h P=0 P=2 P=0 4.12e-2 1.66e - 2 1.97e - 3 1.91e-5 4.57e - 6 6.56e - 7 3.03e-8 2.49e - 3 8.68e - 4 7.95e - 5 4.20e - 7 7.78e - 8 7.18e-9 2.39e - 9 2.03 8.90e - 1 1.17e-l 1.09e - 3 2.48e - 4 3.26e - 5 1.81e-6 427 h" P=2 1.36e 6.08e 7.73e 4.56e 8.20e 5.69e 5.49e - 1 2 3 5 6 7 7 P=0 P=2 1.43e + 3 5.63e + 2 5.98e +1 3.97e - 1 8.24e - 2 9.56e - 3 4.71e-4 9.51e + l 3.84e +1 3.98 1.68e-2 2.75e - 3 1.66e - 4 1.62e - 4 4. Change in the errors induced by the introduction of two poles in Example 2. 3. J.-P. Berrut and R. Baltensperger, The linear rational collocation method for boundary value problems, J5/T 41 (2001), 868-879. 4. J.-P. Berrut and H. D. Mittelmann, Matrices for the direct determination of the barycentric weights of rational interpolation, J. Comput. Appl. Math. 78 (1997), 355-370. 5. J.-P. Berrut and H. D. Mittelmann, Rational interpolation through the optimal attachment of poles to the interpolating polynomial. Numerical Algorithms 23 (2000), 315-328. 6. J.-P. Berrut and H. D. Mittelmann, The linear rational collocation method with iteratively optimized poles for two-point boundary value problems, SIAM J. Scient. Comput. 23 (2001), 961-975. 7. K. S. Breuer and R. M. Everson, On the errors incurred calculating derivatives using Chebyshev polynomials, J. Comput. Phys. 99 (1992), 56-67. 8. A. Corana, M. Marchesi, C. Martini, and S. Ridella, Minimizing multimodal functions of continuous variables with the "Simulated Annealing" algorithm, ACM Trans. Math. Software 13 (1987) 262-280. 9. P. Henrici, Essentials of Numerical Analysis, Wiley, New-York, 1982. 10. P. Henrici, Applied and Computational Complex Analysis Vol. 3, Wiley, New York, 1986. 11. E. H. Kaufman Jr, D. J. Leeming, and G. D. Taylor, Uniform rational approximation by differential correction and Remes-differential correction. Int. J. Numer. Meth. Engin. 17 (1981), 1273-1278. 12. D. Kosloff and H. Tal-Ezer, A modified Chebyshev pseudospectral method with an ^(A^"^) time step restriction, J. Comput. Phys. 104 (1993), 457-469. 13. S. C. Reddy and L. N. Trefethen, Lax-stability of fully discrete spectral methods via stability regions and pseudo-eigenvalues, Comput. Methods Appl. Mech. Engrg. 80 (1990), 147-164. 14. C. Schneider and W. Werner, Some new aspects of rational interpolation, Mai/i. Comp. 47 (1986) 285-299. An application of a mathematical blood flow model Michael BreuB, Andreas Meister Department of Mathematics, Universitxj of Hamburg, Germany. 
breuss@math.uni-hamburg.de, meister@math.uni-hamburg.de Bernd Fischer Mathematical Institute, Medical University of Lilheck, Germany. fischer@math.mu-luebeck.de Abstract Mathematical models of blood flow are inevitably embedded in models of human thermoregulation because they take the role of the most significant heat distributor in models of the human thermal system [14, 6]. Models of human thermoregulation have a wide range of applications, e.g. for the prediction of the impact of accidents, diseases and clinical treatments (see [14] and the references therein). The application of our interest is the prediction of the influence of cooling on the heat distribution in premature infants, see Section 2. In Section 3 we discuss the requirements of a reliable thermoregulation model while the governing equation is described in paragraph four. The employed blood flow model is discussed within Section 5. Section 6 deals with numerical results, followed by concluding remarks in the last paragraph. 1 Motivation Lack of oxygen of the fetiis or newborn is known to be an important cause for injuries of the developing brain [9]. Experimental studies have shown that the neuronal loss evolves over several days after such an incident [8]. An important factor influencing the degree and distribution of neuronal loss is the cerebral temperature, i.e. lowering the cerebral temperature can prevent much damage [5]. The question arises, if it is possible to lower the cerebral temperature of an infant by 2 -ZK hy the manipulation of the environment inside an incubator while the rest of the body maintains a pleasant temperature. The objective of this paper is to discuss the mathematical measurements which can be used to predict an answer to that question by the use of numerical simulations. 2 Modeling the thermoregulation of premature infants The term thermoregulation stands for the measurements of the body to hold a pleasant temperature [4]. Models for thermoregulation consist of two parts: the active and the passive system [6]. The active system consists of the regulatory mechanisms shivering (heat production within the muscles attached to the skeleton), vasomotion (control over the degree of blood flow within the skin) and sweating (control over the degree of effectiveness of heat transfer between the infant and the surrounding air). The passive system 428 Blood flow model is the combination of the physical human body and the heat transfer in it and at its surface. The idea behind this distinction is that the active system has a controUing influence over the passive system. Naturally, only results obtained by the complete model can be compared with available real life data. Concerning premature infants, it is known that shivering and sweating are not of importance for the modelling process [4, 13], while vasomotion should not be of great concern for our special application [13]. The modeling of the passive system demands the discretiza,tion of the body and the modeling of metabolic heat production and blood flow. We do not consider phenomena which are related to environmental conditions, namely the response to air convection, the probability to gain or loose heat due to radiation and heat loss due to evaporation in dependence on pressure, temperature and humidity of the surrounding air, assuming that these are controllable by the use of an incubator [13]. 
In order to give an answer to the defined question by use of numerical simulations, a model needs to deliver detailed temperature profiles within the head and a detailed resolution of the heat transfer processes in the body. It should be applicable to diflFerent size neonates whereby aspects like the anatomy and the thermal maturity have to be considered. With the exception of the blood flow model, these aspects can be defined via a suitable geometry and the use of real life data for spatially dependent rates of metabolic heat production within a numerical method [7, 2]. This also incorporates that ejdsting numerical methods made for the simulation of thermoregulation of adults are of no use in the given context since studies have shown [3] that a detailed modeling of geometry and tissue composition is necessary in order to obtain relevant temperature profiles. As it can be shown experimentally [7, 2] in agreement to theoretical discussions concerning thermoregulation models of adults [6, 14], the use of a blood flow model greatly affects the computed numerical solutions. 3 Analysis of the blood flow model The bio-heat equation derived by Pennes [10] forms the basis of the majority of models for human thermoregulation in use today [14, 6]. It describes the dissipation of heat in a homogeneous, infinite tissue volume. For two spatial dimensions, it can be written in the form c(x)p(x)atT(x,i)=div[A(x)VT(x,t)]+f(x,t). (3.1) Thereby, the temperature T depends on the spatial variable x = {xi,X2)'^ as well as on time t. Furthermore, A(x), c(x) and p(x) denote the heat conductivity, specific heat capacity and density of the tissue, respectively. The term /(x) can be decomposed via /(x, i) = (5M(X) + QB(X, i) into parts corresponding to metabolic heat production (5M(X) and blood flow QB(X,i). As already indicated, the term <5M(X) can be defined by the use of real life data [7]. The formulation of the source term due to blood flow is based on variations of the following procedure [6, 14]. The idea is that the body is supplied from a central pool of blood by the major arteries. Before the tissue is perfused, the temperature of the arterial blood mixes with the temperature of venous blood flowing in adjacent veins. After that, the arterial blood exchanges heat with the tissue in the capillaries and becomes venous 429 430 M. Breuss, B. Fischer and A. Meister blood. The venous blood is collected in the major veins and its temperature mixes with the temperature of arterial blood in the adjacent arteries before it flows back into the blood pool. Since equation (3.1) deals with the change of thermal energy per unit volume, the term QB(X) takes the form QB{X, t) - CBPBCCX{X)BF{X) [TB{t) - T{x, t)], (3.2) whereby Tsit) denotes the time-dependent mean value of the temperature of the blood within the blood pool, we also assume that the specific density of the blood pg and the specific heat capacity of the blood CB are constant variables. The described modeling results in a differential equation for the temporal evolution of the temperature within the blood pool, namely in mBCBdtTB{t)= f pBCBCCX{K)BFix)dx[Tv{t)-TBit)]. (3.3) JD Thereby, the total blood mass ma, the time dependent mean value of the temperature of the venous blood Tv{t), and locally defined tissue-dependent measures for the blood perfusion BF{x.) and the counter-current heat exchange CCX{x) are introduced. 
Equation (3.3) shows that the temporal change of the blood pool temperature is proportional to the difference to the temperature of the venous blood. The outlined idea leads to the modeling of the temperature of the venous blood as r M ^ lDCCX{x)BF{x)T{x,t)d,x ^^^^^= : J^CCX{x)BF{x)dx ^ ^^■^> which is also usable when only steady states are considered [7]. The crucial terms in the order of importance are the blood perfusion BF{x) and the counter current heat exchange CCX{x). There is much debate about the choice of these functions in literature [14, 6]. This debate arises because the representation of blood circulation is substituted by a rather simple model formulation. The cure to this disadvantage is generally sought by exploring more and more detailed models of microstructure, organs, etc., or it is sought by a better modeling of control mechanisms of the actice system in the case of adults [14, 6]. The main drawback of the described blood flow model is given by the blood pool idea itself. This is up to now to our knowledge not outlined in any mathematical description of this model within the literature and can be illustrated as follows. Let a detailed geometry be given with a stationary temperature distribution together with a homogeneous neutral temperature at the whole boundary as initial state. Let us assume that we start a numerical computation where a selective cooling at the neck is employed. By heat conduction of the tissue, the effect of cooling computed with the help of the discretization of heat gradient and heat conductivity of the local tissue propagates into the inner part of the domain. Concerning the blood flow, the averaging step within (3.4) captures the local cooling eflPect which results in a slightly cooler average temperature of the venous blood within the whole domain than in the initial state. Employing this value in (3.3) results in a slight negative change of the blood pool temperature. Taking account of the Blood flow model 431 evaluation of the source term (3.2) for the control volumes located in the vicinity of the neck, we notice that a strong cooling is locally equalized by the combination of a) the source term due to blood flow which is mostly influenced by the neutral blood temperature in the rest of the body and b) of the source term due to metabolic heat production which was not influenced at all by the change in the boundary temperature. The result is that the effect of a local coohng mechanism is instantly distributed over the whole domain while a weighted mean Value of the temperature over the domain equalizes local cooling mechanisms. The validity of this reasoning is verified by numerical results [7, 2] and by an exemplary result shown in Section 6. The non-local nature of the described blood flow model can directly be seen by applying an impUcit time stepping strategy. Due to the integration over the whole computational domain in (3.4), one ends up with a fully occupied matrix after the usual linearization step which was already recognized in [7] in the context of steady state calculations. We now illuminate a further property of the bloodflow model. Therefore, let the abbreviations a = PBCB, P - JD i<'B(x)B(x) dx and 7 = pB/ruB hold. A straightforward computation gives TB{t)=Tv{t)-^fTB{t). (3.5) Note that a, (3 and 7 are positive constants. Consider a steady state situation as initial state, i.e. Tg = Ty holds. If the body is heated, the temperature within the body increases and so Ty will increase. 
This has the effect that the bloodpool temperature TB will increase in the near future, i.e. Tg {t) > 0. We now investigate the net effect of the bloodflow. Integration of the source over the computational domain D results in / QB{-x.,t)dyi = a pTeit)- f irB(x)B(x)r(x,i)dx JD I JD (3.5) ad = —~n.-'-B\t)'j dt When employing T'jg{t) > 0 we see that the total of all sources in the body is negative, i.e. while the blood in the bloodpool cools the increasingly warm body in the mean if the body is exposed to heat, it also takes over heat from it. The bloodpool and the body are to be seen as two separate systems which are connected via heat fluxes and so one can consider the bloodpool as a regulator. 4 Numerical method and experiments The following numerical approximation of the unsteady bio-heat equation (3.1) represents a convenient extension of the finite volume method developed in [7], which has been proven to be a robust, accurate and reliable algorithm in the context of steady state temperature distributions. However, finite volume schemes are categorically based on the integral form of the governing equation. In order to apply Gauss's integral theorem it is neccessary to write the equation in divergence form. Therefore, we introduce the auxihary variable A;(x) = p(x)c(x) and the auxihary temperature T{x,t) = fc(x)r(x,i) M. Breuss, B. Fischer and A. Meister 432 iiito the governing equation and consequently the bio-heat equation (3.1) writes 1 dt //""'"'''=I [i)'^*"'" »i)v.« n(x) ds + / f(x, t) dx Ja (4.1) for all control volumes a C D, see [2]. In order to solve equation (4.1) numerically, the space part D is decomposed into a finite number of sub-domains. We start from an 1. General form of a control volume of the triangulation (left) and its boundary (right). FIG. arbitrary conforming triangulation V^ of the domain D which is called the primary mesh and consisting of finitely many triangles T>i and the corresponding nodes are abbreviated by Xj G ^. Based on the triangulation a discrete control volume CT, is defined as the open set of R^ including the node x,- and bounded by straight lines which are determined by the connection of the midpoints of the edges of the corresponding triangles Vj (i.e. Xj e dVj) and their barycentre (see Figure 1). The union B'" of all boxes is called the secondary mesh. A finite volume method represents a discretizationof the evolutionary equation (4.1) for cell averages defined by (MT) {t)\^ = (1/|(T|)/^T(x,t)rfx, where \a\ denotes the volume of the box a. With respect to the secondary mesh B'' we can write the integral form (4.1) as dt (^^)(^)|^^ = ML A(x) VT{x,t) A;(x) ' *'"'"' A(x)T(x,f) VA;(x) ■ n(x) ds fc(x)2 + / (5B(x,t)dx-f / (5M(x)dxl, Moi&B^. (4.2) Corresponding to a finite element method the evaluation of the boundary integral is performed by using a piecewise constant distribution_of the heat coefficient A and a piecewise linear distribution of the auxilary temperature f. with respect to the triangles of the trianglutaion used. Note that the source term remains unchanged and the calculation is Blood flow model 433 given by / and I QM{x)dx = \ai\QM{xi) QB{yi,t) dx = \ai\cBPBCCX{xi)BF{xi) [Ts^ - T(xi,t)]. J at The computation of the blood pool temperature is directly performed by an explicit time discretization of equation (3.3). Thereby, the temperature of the venous blood is given by equation (3.4). 
It is remarkable that the method degenerates to the scheme presented in [7] in the context of a steady state solution and therefore the excellent properties like the discrete min-max principle are maintained in such a situation. Due to the space available kernel FIG. 2. Primary mesh and tissue layers in the head region. we restrict ourself to the consideration of steady state calculations using the described method. Thereby, we distinguish layers of skin, fat, bone and kernel by different rates of metabolism, specific heat capacity and blood perfusion associated with the regions depicted in Figure 2. As boundary conditions we employ a comfortable boundary temperature of 309.15 if at head, back, legs, and belly while we set 299.15 K at the neck, i.e. we selectively cool the neck. In reality, this corresponds to the situation where the infant is wearing a water-filled collar with the purpose of cooling the blood flowing into the brain through the arteries adjacent to the skin. In Figure 3 (a) we can see the temperature distribution in the two-dimensional discretized idealization of the body of a premature infant. Thereby, no blood flow and no metabolic heat production is applied, so that the depicted distribution of heat is only influenced by the heat conductivity of the employed tissues. The situation where tissue dependent metabolic heat production is taken into account is shown in Figure 3 (b). Note that the heat sources visualized within the picture not only have local effects, they also influence the mean value of the temperature of the blood pool. Within Figure 3 (c), blood flow is additionally given. M. Breuss, B. Fischer and A. Meister 434 It is evident that the blood flow has the effect outlined in Section 5. Especially, the numerical solution incorporates no hint of the fact, that in reality there is a transport of cool blood to the brain and also a transport of blood by the veins coming from the brain. 299.15 301.1 303.1 305.1 307.0 309.0 310.3 Temperature in [Ti'] FIG. 3. Comparison of steady state situations (a) only with heat conduction (b) with heat conduction and metabolic heat production and (c) with blood flow additionally taken into account (from top to bottom). 5 Concluding remarks The range of applicability of the described blood flow model is restricted to situations where it makes sense to employ a mean value of the whole blood, e.g. if the whole body is exposed for a longer time to the same temperature. For a clinical application where the effects of local cooling or heating have to be studied, caution is required when dealing with the results achieved by employing variations of the described model. Blood flow model Bibliography 1. B. Fischer, M. Breufi and A. Meister, The unsteady thermoregulation of premature infants — a model and its application, in Discrete Modelling arid Discrete Algorithms in Continuum Mechanics, Proceedings of thei GAMM Workshop, Th. Sonar and I. Thomas (eds.), 2000. 2. B. Fischer, M. Breufi and A. Meister, The numerical simulation of unsteady heat conduction in a premature infant, in Numerical Methods for Fluid Dynamics, M.J. Baines (editor), ICFD, Oxford University Computing Laboratory 7 (2001). 3. M. Buse and J. Werner, Heat balance of the human body: influence of variations of locally distributed parameters, in Journal of Theoretical Biology 114 (1985), 34-51. 4. O. 
Bufimann, A model for the thermoregulation of premature infants and neonates under consideration of the thermal maturity, PhD Thesis, Medical University of Lubeck, (2000), in German. 5. R. Busto et al.. The importance of brain temperature in cerebral ischemic injury, in Stroke 20 (1989), 1114-1134. 6. D. Fiala, K.J. Lomas and M. Stohrer, A computer model of human thermoregulation for a wide range of environmental conditions: the passive system, in Journal of Applied Physiology 87'No. 5 (1999), 1957-1972. 7. B. Fischer, M. Ludwig and A. Meister, The thermoregulation of infants: Modeling and numerical simulation, in 5/r 41 No. 5 (2001), 950-966. 8. P.D. Gluckman and C.E. Williams, When and why do brain cells die?, in Dev. Med. Child Neurol. 34 {1992), 1010-lOU. 9. E.G. Mallard et al., Neuronal damage in the developing brain following intrauterine asphyxia, in Reprod. Fertil. Dev. 7 (1995), 647-653. 10. H.H. Pennes, Analysis of Tissue and Arterial Blood Temperatures in the Resting Human Forearm, in Journal of Applied Physiology 1 (1948), 93-122. 11. G. Simbruner, Thermodynamic Models for Diagnostic Purposes in the Newborn and Fetus, Facultas Verlag, Wien, 1983. 12. T. Sonar, On the Construction of Essentially Non-Oscillatory Finite Volume Approximations to Hyperbolic Conservation Laws on General Triangulations: Polynomial Recovery, Accuracy, and Stencil Selection, in Comp. Meth. Appl. Mech. Eng. 140 (1997), 157-181. 13. K. Thomas, Back to Basics: Thermoregulation in Neonates, in Neonatal Network 13 No. 2 (1994), 15-22. 14. J. Werner, Thermoregulatory models. Recent research, current applications and future development, in Scand. J. Work Environ. Health 15 Suppl. 1 (1989), 34-46. 15. J. Werner and P. Webb, A six-cylinder model of human thermoregulation for general use on personal computers, in Ann. Physiol. Anthrop. 12 No. 3 (1993), 123-134. 16. E.H. Wissler, A mathematical model of the human thermal system, in Bulletin of the human thermal system 26 (1964), 147-166. 435 Zeros of the hypergeometric polynomial F{—n,b; c; z) K. Driver* and K. Jordaan School of Mathematics, University of the Witwatersrand, Johannesburg, South Africa. 036kad0cosmos.wits.ac.za, 036jordOcosmos.wits.ac.za Abstract Our interest lies in describing the zero behaviour of Gauss hypergeometric polynomials F{-n,b; c; z) where b and c are arbitrary parameters. In general, this problem has not been solved and even when 6 and c are both real, the only cases that have been fully analysed impose additional restrictions on b and c. We review recent results that have been proved for the zeros of several classes of hypergeometric polynomials F{-n, b; c; z) where b and c are real. We show that the number of real zeros of F{~n,b; c; z) for arbitrary real values of the parameters b and c, as well as the intervals in which these zeros (if any) lie, can be deduced from corresponding results for Jacobi polynomials. 1 Introduction The Gauss hypergeometric function, or 2F1, is defined by FO {a)k{b)k z^ where a, h and c are complex parameters and I (a)fc = a(a + l)...(a + fc-l) = r(Q + fc)/r(a) is Pochhammer's symbol. When a = -n is a negative integer, the series terminates and reduces to a polynomial of degree n, called a hypergeometric polynomial. Our focus lies in the location of the zeros F{-n, b; c; z) for real values of b and c. Hypergeometric polynomials are connected with several different types of orthogonal polynomials, notably Chebyshev, Legendre, Gegenbauer and Jacobi polynomials. 
In the cases of Chebyshev and Legendre polynomials, the connection demands fixed special values of the parameters b and c, namely, (cf. [1], p.561) -n,n;^;z)=Tn{l-2z) and Fi-n,n + l;l;z)=Pn{l-2z), 'Research of the first author is supported by the John Knopfmacher Centre for Applicable Analysis and Number Theory, University of the Witwatersrand. 436 Zeros of the hypergeometric polynomial F{—n, b; c; z) 437 respectively. However, in the cases of Gegenbauer and Jacobi polynomials, we have F(^-n,n + 2X;X + \;z^=j^C::{l-2z) \ (1.1) F{-n,a + p + l + n;a + l;z) = —^Vi"'^\l-2z), (1.2) and respectively. Since the zeros of orthogonal polynomials are well understood, we expect the connections (1.1) and (1.2) to be very useful in analysing the zeros of F{—n, b; c; z). Conversely, if the zeros of F{—n, b; c; z) are known, this leads to new information about the zero distribution of Gegenbauer or Jacobi polynomials for values of their parameters that lie outside the range of orthogonality of these polynomials. This paper is organized as follows. In Section 2 we give a self-contained review of recent results regarding the zeros of several special classes of hypergeometric polynomials. Section 3 contains results originally due to Klein [9] which detail the numbers and location of real zeros of F{—n,b; c; z) for arbitrary real values of b and c. We provide simple proofs using results proved in [13]. 2 Zeros of special classes of hypergeometric polynomials We begin with a few general remarks. Since we shall assume throughout our discussion that b and c are real parameters, we know that all zeros of F{—n, b; c; z) must occur in complex conjugate pairs. In particular, if n is odd, F must always have at least one real zero. Further, if 6 = —m where m < n,m E N, F{—n,b; c; z) reduces to a polynomial of degree m. However, since we are interested in the behaviour of the zeros of F{—n, b; c; z) as b and/or c vary through real values, we shall adopt the convention that F{—n,—m; c; z) = lirat,-,-mF{—n,b; c; z). This ensures that the zeros of F vary continuously with b and c. Note also that F{—n,b; c; z) is not defined when c = 0, —1,..., —n + 1. Regarding the multiplicity of zeros, a hypergeonietric function w = F{a, b; c; z) satisfies the differential equation z{l-z)w"+[c-{a + b + l)z]w'-abw = 0, so if w{zo) = w'{zo) =0 at some point ^o 7^ 0 or 1, it would follow that w = 0. Thus multiple zeros of F{—n, b; c; z) can only occur at ^ = 0 or 1. 2.1 Quadratic transformations The class of hypergeometric polynomials that admit a quadratic transformation is specified by a necessary and sufficient condition due to Kummer (cf. [1], p.560). There are 438 K. Driver and K. Jordan twelve polynomials in this class (cf. [14], p.l24) F(-n,6;26;2) F{-n,b;-n - b + I; z) F{-n,b; -"+''+^ z) F (-n,b; I; z) F [-n,-n + I; c; z) F {-n,b;-n + b+I; z) jF(-n, 6;|;2) F (-n,-n - 5; c; z) F {-n,b;-n + b-I; z) F {-n,b;-2n; z) F{-n,b; b + n + 1; z) F {-n,n + l; c; z) . The most important polynomial in this class is F{-n, b; 26; z) because complete analysis of its zero distribution for all real values of b (cf. [4], [5]) leads to corresponding results for the zeros of the Gegenbauer polynomials C^{z) for all real values of the parameter ;A(cf. [6]). Theorem 2.1. Let F = F{—n,b; 26; z) where b is real, (i) For b> — |, all zeros of F{—n, b; 26; z) are simple and lie on the circle \z — l\ = 1. (a) For -| - j < 6 < 5 - i, j = 1,2,... [|] - 1, (n - 2j) zeros of F lie on the circle \z - 1\ = 1. 
If j = 2k is even, there are k non-real zeros of F in each of the four regions bounded by the circle \z - 1\ = 1 and the real axis. If j = 2k + 1 is odd, there are k non-real zeros of F in each of the four regions described above and the remaining two zeros are real. (Hi) Ifn is even, for - [f ] <''<"" [f] + h "° ^^^°^ °f ^ ''^ °" k " 1| = 1- -V'^ = 4fc, all zeros of F are non-real whereas ifn = Ak-\-2, two zeros of F are real andAk are , non-real. Ifn is odd, for -1 - [|] < 6 < - [§] + |, only the fixed real zero of F at z = 2 lies on \z — l\ = 1. Ifn = 4fc + 1, n — 1 = 4fc zeros of F are non-real whereas ifn = Ak + 3, two further zeros are real and the remaining ik are non-real. (iv) For j - n < 6 < j - n + 1, j = 1,2,... [|] - 1, (n - 2j) zeros of F are real and greater than 1. If j = 2k is even, all remaining 2j zeros of F are non-real with k zeros in each of the regions described above; while if j = 2fc + 1, 4/c zeros are non-real as before and 2 are real. (v) For b < l-n, all zeros of F{—n, 6; 26; z) are real and greater than 1. Asb —* —00, all the zeros of F converge to the point z = 2. An analogous theorem which describes the behaviour of the zeros of C^{z) can be found in [6], Section 3 or [7], Theorem 1.2. For the polynomial F (-n, 6; |; 2) the following result has been proved in [7], Theorem 2.3. Theorem 2.2. Let F = F {-n,b; |; z) with b real. (i) For 6 > n - |, alln zeros of F are real and simple and lie in (0,1). (ii) For n - I - j < 6 < n + I - jf, j = 1,2,..., n - 1, (n - j) zeros of F lie in (0,1) and the remaining j zeros of F form [^] non-real complex pairs of zeros and one real zero lying in (l,oo) when j is odd. Zeros of the hypergeometric polynomial F{—n, b; c; z) 439 (Hi) For 0 < 6 < \, F has \_^'\ non-real complex conjugate pairs of zeros with one real zero in (l,oo) when n is odd. (iv) For —j <b < —j + 1, j = 1,2,... ,n—l, F has exactly j real negative zeros. There is exactly one further real zero greater than 1 only when (n — j) is odd and all the remaining zeros of F are non-real, (v) For b <l—n, all zeros of F are real and negative and converge to zero asb ^> —CXD. A very similar theorem is proved for the zeros of F (-n, h; |; 2) in [7], Theorem 2.4 with only minor differences of detail. For the hypergeometric polynomial F{—n, b; —2n; z), less complete results have been proved. We have (cf. [8] Theorem 3.1 and Corollary 3.2) the following. Theorem 2.3. Let F = F {-n, b; -2n; z) with b real. (i) For b > 0, F has n non-real zeros if n is even whereas if n is odd, F has exactly one real negative zero and the remaining (n— 1) zeros of F are all non-real. (a) For -n < b < 0, if -k < b < -k -\- 1, k = l,...,n, F has k real zeros in the interval (l,oo). In addition, if{n — k) is even, F has {n — k) non-real zeros whereas if {n — k) is odd, F has one real negative zero and \n — k — l) non-real zeros. (Hi) For -n> b> -2n, if -n-k > b> -n-k-1, k = 0,1,... ,n-l,F has (n - k) real zeros in the interval (l,oo). In addition, if k is even F has k non-real zeros while if k is odd, F has one real zero in (0,1) and {k — 1) non-real zeros. (iv) For b < —2n, all n zeros of F are non-real for n even whereas for n odd, F has exactly one real zero in the interval (0,1). The identities (cf. [7], Lemma 2.1) F{-n, b;c;l-z)= ^^ ~" ^)" F{-n, b;l-n + b-c; z) (2.1) \C)n and F{-n,b;c;z) = ^{-zYFl-n,l-c-n;l-b-n;-\ (c)„ V zj (2.2) hold for b and c real, c -^^ {0,-1,... ,-n + 1}. 
Applying (2.1) and (2.2) to each of the polynomials F{-n,b; 26; z), F [-n,b; \; z), F [-n,b; |; z) and F{-n,b; -2n; z) in turn, we obtain the remaining eight polynomials in the quadratic class. It is then an easy task to deduce analogous results for their zero distribution. A similar set of results has been proved for the sixteen hypergeometric polynomials in the cubic class. Again, this class arises from a necessary and sufficient condition (cf. [2], p.67) and details can be found in [7]. 3 The real zeros of F{—n, b; c; z) for h and c real The results proved below are due to Klein [9] who considered the zeros of more general hypergeometric functions (not necessarily polynomials). Klein's proof is geometric and 440 K. Driver and K. Jordan difficult to penetrate. A more transparent perspective in the polynomial case may be provided by the approach given here. The classical equation linking the hypergeometric polynomial F{-n, b; c; z) with Jacobi polynomials vi°''^\z) is given by (1.2). We will find an alternative expression (cf. [12], p.464, eqn. (142)) n-n,6;c;^) = g^7'^'')(l-^), (3.1) where a = -n-6 and (3 = b-c-n, more suited to our analysis. The number of real zeros of V^'^\x) in the intervals (-1,1), (-oo, 1) and (1, oo) are given by the Hilbert-Klein formulas (cf. [13], p.l45, Theorem 6.72), also known to Stieltjes. We use Klein's symbol 0 E{u) = ^ [u] if « < 0 itu>0, u^ integer M-1 if w = 1,2,3,... Noting that under the linear fractional transformation w = 1 - 2/z, the intervals 1 < 10 < oo, -00 < w < -1 and -1 < w < 1 correspond to -oo <2;<0, 0<^;<1 and 1 < z < 00 respectively, we can use equation (3.1) to rephrase the Hilbert-Klein formulas for hypergeometric polynomials. Theorem 3.1. Let 6, c € ffi with b,c,c-h^Q, -1,..., -n + 1. Let X = £|^(|l-c|-|n + 6|-|6-c-n| + l)| (3.2) Y = £;|i(-|l-c| + |n-|-6|-|6-c-n| + l)l (3.3) Z = £^-(-|l-c|-|n ' \ 9 (-|1 - cj - |n + 6| + |6-c-n| |6 - c - n| + l)^. 1) I (3.4) Then the numbers of zeros of F{—n,b; c; z) in the intervals (l,oo), (0,1) and (-oo,0) respectively are 2[(X + l)/2] i/(-l)"r„^)M>0 ■^1 = S =! 2[X/2] + l I N2 = { ,■.,,.. if{-ini')C7)<^ 2[(F+i)/2] if{-:){'-') >^ 2[y/2] + i , if{-:){'7)<^ ^^j2[(.+i)/2] ifi-:){-:)>o 12[z/2]+i if {-:){-:) <o. (^•^) (3-6) ^^^^ Zeros of the hypergeometric polynomial F{—n, b; c; z) 441 Proof: The expressions all follow immediately from the Hilbert-Klein formulas (cf. [13], p. 145, Thm. 6.72) together with equation (3.1). n Theorem 3.2. Let F = F{-n, b; c; z) where b, ceM. and c > 0. (i) For b> c + n, all zeros of F are real and lie in the interval (0,1). (a) For c<b < c + n, c + j -I <b < c + j, j = l,2,...,n; F has j real zeros in (0,1). The remaining {n — j) zeros of F are all non-real if (n — j) is even while if (n — j) is odd, F has {n — j — 1) non-real zeros and one additional real zero m (1, oo). (Hi) For 0 < b < c, all the zeros of F are non-real if n is even, while if n is odd, F has one real zero in (l,oo) and the other {n — 1) zeros are non-real, (iv) For -n <b <0, -j <b < -j -\-l, j = l,2,...,n, F has j real negative zeros. The remaining (n — j) zeros of F are all non-real if {n — j) is even, while if (n — j) is odd, F has {n — j — 1) non-real zeros and one additional real zero in (l,oo). (v) For b < —n, all zeros of F are real and negative. Proof: We use the identity (cf [1], p.559, (15.3.4)) F{-n,b;c;z) = {l-zYF(-n,c-b;c;^—\ (3.8) to show that (i) ^ (v) and (ii) => (iv) so that it will suffice to prove (i), (ii) and (iii) above. 
(i) => (v): If 6 < —n then c — b>c + n and by (i), all zeros of F{—n, c -^ b; c; w) are real and lie in the interval (0,1). Since w = z/{z — 1) maps (-oo, 0) to (0,1), (v) follows from (3.8). (ii) =^ (iv): If -j < b < -j + 1, j = 1,2,..., n, then c + j - 1 < c - 6 < c + j, ji = 1,2,... ,n. By (ii), since w = z/{z — 1) maps (—oo,0) to (0,1) and (l,oo) to (l,oo), (iv) follows again from (3.8). To prove (i), (ii) and (iii), we note that in each part, 6 > 0 (and of course c > 0 by assumption). Then (i) Suppose b> c-\-n. Then b — c> n and sign ( ~ "^ j > 0 for all n. (3.10) Considering (3.5), (3.6) and (3.7) with (3.9) and (3.10), we observe that ATi = 2[(X + l)/2], Ar3 = 2[(Z + l)/2], r 2[(y + l)/2] for n even [ 2[y/2] + l fornodd Assume now that c > 1. Then for 6 > c + n, we have from (3.2), (3.3) and (3.4) that X = Q,Y — n, Z = Q. Substituting these values into A'"i, N2 and A^3 yields the result. A similar calculation shows that the same result is obtained when 0 < c < 1. 442 K. Driver and K. Jordan (ii) For c + j - 1< 6 < c + j, i = 1,2,..., n, we find that sign (''7) = (-1)"-^'. Then from (3.5), (3.6), (3.7) we see that 2[(X + l)/2] for (n-j) even 2 [X/2] + 1 for (n - j) odd 2 [{Y + l)/2] for j even 2 [Y/2] + 1 for j odd Ni = N2 = Ns = 2[(Z + l)/2]. It follows from (3.2), (3.3) and (3.4) by an easy calculation that X = 0, Y = j, Z = 0 and we deduce that ATj = / ^ !j {'^ ~^!j I'' ^l^^ , iVa = j and N3 = 0 [ 1 it [n- J) IS odd which proves (ii). /•■•\ -n r> ^ I, ^ • /b-c\ / i\r, rp, ^r / 2[(X + l)/2] if n is even (m) For 0 < 6 < c, sign (^/) = (-1)". Then iVi = | 2 [3^/2] + 1 if n is odd ■ N2 = 2 [(y + l)/2], ATa = 2 [(Z + l)/2]. Also, we find X = 0, F = 0 and Z = 0 which completes the proof of (iii) and hence the theorem. □ For c < 0, the range of values of b and c that have to be considered can be reduced if we use the identities (2.1) and (2.2). Since the real zeros of F{-n,b; c; z) are now known for all c> 0 and 6 e K from Theorem 3.2, it follows from (2.1) that we need only consider c-b> 1-n. Similarly, from (2.2) and Theorem 3.2, we can assume 6 > 1 -n. We split the result for c < 0 into the cases where 6 > 0 and 1 - n < 6 < 0. Theorem 3.3. Let F = F(-n, 6; c; 2;). Suppose that c <0, b > 0, c-b> 1-n. Then (i) l-n<c-6<0 and 0 < 6 < n - 1 and 1 - n < c < 0. (ii) If -k < c < -k + 1, k = 1,... ,n - 1 and -j<c-b<-j + l, j = l,...,n-l, then F{-n,b; c; z) has {j-k) > 0 real zeros in (0,1). For the remaining (n-j + k) zeros of F (a) {n — j + k) are non-real if (n - j) and k are even (b) {n — j + k - 1) are non-real and one real zero lies in (1,00) if (n - j) is odd and k is even (c) (n — j + fc — 1) are non-real if {n — j) is even, k odd and one zero is real and negative (d) (n — j + fc — 2) are non-real if (n-j) is odd and k is odd with one real negative zero and one real zero in (l,oo). Proof: (i) This follows immediately from c<0,b>0, c-b>l-n. Zeros of the hypergeometric polynomial F{—n, b; c; z) 443 (ii) For c < 0, 6 > 0, c - 6 > 1 - n, we have |l-c| = l-c, \b + n\ = b + n, \b-c-n\ = c-b + n and it follows from (3.2), (3.3) and (3.4) that X-^(l-c-n), Y = E{b), Z = E{c-b). Since 1 - c - n< 0 and c - 6 < 0, X = Z = 0. Now sign {-^) = (-1)" and for fc = 1,... ,n - 1, -fc < c < -A; + 1 ^ sign (7) = (-l)"-^ while for -j < c - 6 < -j + 1, j = l,...,n- 1, sign {^-^) = (-1)"-^'. Therefore, from (3.5), (3.6) and (3.7), ^ = {? Z4^ ;(3.n) > = {; ;;t:r- .(3.13) Now for j > ft - c > j - 1 and -k < c < -k + 1, b e {j - k - 1, j - k + 1), j-fc = l,2,...,n-2. 
If $b \in (j-k-1,\ j-k)$, then $Y = E(b) = j-k-1$, whereas if $b \in (j-k,\ j-k+1)$, then $Y = E(b) = j-k$. Considering the cases $(j-k)$ even and $(j-k)$ odd, it is straightforward to check that for all $j, k \in \mathbb{N}$ with $j-k = 0,1,\dots,n-2$, we have

$$N_2 = j-k. \qquad (3.14)$$

Equations (3.11), (3.12), (3.13) and (3.14) complete the proof of (ii). □

By virtue of Theorem 3.3 and the identities (2.1), (2.2) and (3.8), it is easy to see that we only have one possibility left that has not been analysed, namely,

$$1-n < c-b < 0, \quad 1-n < b < 0, \quad 1-n < c < 0. \qquad (3.15)$$

Theorem 3.4. Let $F = F(-n,b;c;z)$ where $b$ and $c$ satisfy condition (3.15). If $-j < b < -j+1$, $j = 1,\dots,n-1$; $-k < c < -k+1$, $k = 1,\dots,n-1$; and $-\ell < c-b < -\ell+1$, $\ell = 1,\dots,n-1$, then $F$ has no real zeros if $n+j+\ell$, $k+\ell$ and $j+k$ are all even, one real zero in $(1,\infty)$ if $n+j+\ell$ is odd, one real zero in $(0,1)$ if $k+\ell$ is odd, and one real negative zero if $j+k$ is odd.

Proof: Under the restrictions (3.15), we have $|1-c| = 1-c$, $|b+n| = b+n$, $|b-c-n| = c-b+n$. Then from (3.2), (3.3) and (3.4), $X = E(1-c-n)$, $Y = E(b)$, $Z = E(c-b)$, and it follows from (3.15) that $X = Y = Z = 0$. Also, $\operatorname{sign}\binom{-b}{n} = (-1)^{n-j}$, $\operatorname{sign}\binom{-c}{n} = (-1)^{n-k}$ and $\operatorname{sign}\binom{b-c}{n} = (-1)^{n-\ell}$. The stated result then follows immediately from (3.5), (3.6) and (3.7). □

Remark 3.1. We have not considered the asymptotic zero distribution as $n \to \infty$ of $F(-n,b;c;z)$. There are recent interesting results in this regard using different approaches, namely complex analysis techniques [10], matrix theoretic tools [11], asymptotic analysis of the Euler integral representation [3] and analysis of coefficients [8].
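The case analysis above is easy to exercise numerically. The following sketch is ours and not part of the paper: assuming NumPy, it builds the coefficients of $F(-n,b;c;z)$ from the term recurrence of the series and counts real zeros for two cases of Theorem 3.2; all parameter values are our own examples.

    import numpy as np

    def hyper_coeffs(n, b, c):
        # Ascending coefficients of F(-n, b; c; z): term_k = (-n)_k (b)_k / ((c)_k k!)
        coeffs, term = [1.0], 1.0
        for k in range(n):
            term *= (-n + k) * (b + k) / ((c + k) * (k + 1))
            coeffs.append(term)
        return coeffs

    def real_zero_counts(n, b, c, tol=1e-9):
        # Counts of real zeros in (-inf,0), (0,1) and (1,inf)
        roots = np.roots(hyper_coeffs(n, b, c)[::-1])   # np.roots expects descending powers
        re = roots[np.abs(roots.imag) < tol].real
        return (int((re < 0).sum()), int(((0 < re) & (re < 1)).sum()), int((re > 1).sum()))

    # Theorem 3.2 (i): b > c + n, so all zeros should be real and lie in (0,1).
    print(real_zero_counts(4, 6.5, 1.5))    # expected (0, 4, 0)
    # Theorem 3.2 (v): b < -n, so all zeros should be real and negative.
    print(real_zero_counts(4, -5.5, 1.5))   # expected (4, 0, 0)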
Bibliography

1. M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, Dover, New York, 1965.
2. Bateman Manuscript Project, Higher Transcendental Functions, Volume I (A. Erdélyi, editor), McGraw-Hill, New York, 1953.
3. K. Driver and P. Duren, "Asymptotic zero distribution of hypergeometric polynomials", Numerical Algorithms, 21 (1999), 147-156.
4. K. Driver and P. Duren, "Zeros of the hypergeometric polynomials F(-n,b;2b;z)", Indag. Math., 11 (1) (2000), 43-51.
5. K. Driver and P. Duren, "Trajectories of the zeros of hypergeometric polynomials F(-n,b;2b;z) for b < -1/2", Constr. Approx., 17 (2001), 169-179.
6. K. Driver and P. Duren, "Zeros of ultraspherical polynomials and the Hilbert-Klein formulas", J. Comput. Appl. Math., 135 (2001), 293-301.
7. K. Driver and M. Möller, "Quadratic and cubic transformations and the zeros of hypergeometric polynomials", J. Comput. Appl. Math., to appear.
8. K. Driver and M. Möller, "Zeros of the hypergeometric polynomials F(-n,b;-2n;z)", J. Approx. Theory, 110 (2001), 74-87.
9. F. Klein, "Über die Nullstellen der hypergeometrischen Reihe", Mathematische Annalen, 37 (1890), 573-590.
10. A.B.J. Kuijlaars and W. Van Assche, "The asymptotic zero distribution of orthogonal polynomials with varying weights", J. Approx. Theory, 99 (1999), 167-197.
11. A.B.J. Kuijlaars and S. Serra Capizzano, "Asymptotic zero distribution of orthogonal polynomials with discontinuously varying recurrence coefficients", J. Approx. Theory, to appear.
12. A.P. Prudnikov, Yu.A. Brychkov and O.I. Marichev, Integrals and Series, Volume 3, "Nauka", Moscow, 1986 (in Russian); English translation, Gordon & Breach, New York, 1988; Errata in Math. Comp., 65 (1996), 1380-1384.
13. G. Szegő, Orthogonal Polynomials, American Mathematical Society, New York, 1959.
14. N. Temme, Special Functions: An Introduction to the Classical Functions of Mathematical Physics, Wiley, New York, 1996.

Approximation error maps

A. Gomide and J. Stolfi
Institute of Computing, University of Campinas, Brazil.
anamaria@ic.unicamp.br, stolfi@ic.unicamp.br

Abstract

In order to analyze the accuracy of a fixed, finite-dimensional approximation space which is not uniform over its domain $\Omega$, we define the approximation error map, a description of how the error is distributed over $\Omega$, not for a single test function but for a general class of such functions. We show how to compute such a map from the best approximations to an orthonormal basis of the target function space.

1 Introduction

The expected accuracy of a finite-dimensional approximation space (e.g. a polynomial spline space, or a finite wavelet decomposition) will often vary over its domain $\Omega$. Indeed, adaptive-resolution schemes are based on the premise that refining the element grid in a particular region of $\Omega$ will improve the approximation accuracy in that region. Knowledge of how the expected approximation error varies over the domain $\Omega$ is obviously relevant to the evaluation of an approximation space, and to the tuning of knot locations, grid geometry, refinement thresholds and other parameters. Towards that goal, we introduce the concept of the approximation error map, a description of how the error is distributed over $\Omega$, not for a single test function, but for all functions in some specified space $\mathcal{F}$. We then show how to compute such a map from the best approximations to an orthonormal basis of $\mathcal{F}$.

1.1 Notation and definitions

Let $\mathcal{F}$ and $\mathcal{A}$ be two fixed, finite-dimensional vector spaces, not necessarily disjoint, of functions defined on some domain $\Omega$ with values in $\mathbb{R}$. Let $\|\cdot\|$ be a vector semi-norm for the space $\mathcal{A} + \mathcal{F}$. For any function $f \in \mathcal{F}$, we define its best approximation as the function $f^{\mathcal{A}} \in \mathcal{A}$ that minimizes the error $\|f - f^{\mathcal{A}}\|$. We refer to $\mathcal{A}$ and $\mathcal{F}$ as the approximation and gauge spaces, respectively. We assume that the $\|\cdot\|$-balls in the subspace $\mathcal{A}$ are strictly convex, ensuring that the best approximation always exists and is unique. Since $(\alpha f)^{\mathcal{A}} = \alpha(f^{\mathcal{A}})$ and $\|\alpha f\| = |\alpha|\,\|f\|$ for any real constant $\alpha$, we can confine the analysis of approximation errors to the unit $\mathcal{F}$-sphere $\mathcal{F}_1 = \{ f \in \mathcal{F} : \|f\| = 1 \}$.

1.2 Global error measures

Usually, the effectiveness of the approximation space $\mathcal{A}$ is measured by a single number $\|f - f^{\mathcal{A}}\|$, either for the worst-case function $f \in \mathcal{F}_1$, or by the root-mean-power average over all functions $f \in \mathcal{F}_1$:

$$\sigma_{p,\mathcal{A},\mathcal{F}} = \left[ \int_{\mathcal{F}_1} \|f - f^{\mathcal{A}}\|^p \, df \Big/ \int_{\mathcal{F}_1} df \right]^{1/p}. \qquad (1.1)$$

Note that the integrals are taken over the function space $\mathcal{F}_1$, not over the domain $\Omega$. The worst-case error is the limit

$$\mu_{\mathcal{A},\mathcal{F}} = \lim_{p\to\infty} \sigma_{p,\mathcal{A},\mathcal{F}} = \sup\{ \|f - f^{\mathcal{A}}\| : f \in \mathcal{F}_1 \}. \qquad (1.2)$$

1.3 Uniform approximation spaces

A global error measure such as $\mu_{\mathcal{A},\mathcal{F}}$ or $\sigma_{p,\mathcal{A},\mathcal{F}}$ is generally sufficient when all points of $\Omega$ are equivalent with respect to the quality of approximation. More formally, we say that a normed function space $X$ is uniform over $\Omega$ if there is some family $\Phi$ of maps from $\Omega$ to $\Omega$ that preserves $X$ and its norm $\|\cdot\|$, and which can take any point of $\Omega$ to any other point. A natural example is $\mathcal{Y}^n$, the set of all harmonic functions on the sphere $S^2$ of a given maximum order $n$, with any $L_p$ norm; this space is preserved by the family of rigid rotations of $S^2$. Obviously, if both $\mathcal{A}$ and $\mathcal{F}$ are uniform under the same family $\Phi$, then $\mathcal{A}$ approximates $\mathcal{F}$ equally well at all points of $\Omega$. (Of course, for any specific function $f \in \mathcal{F}$, the error $f - f^{\mathcal{A}}$ will usually vary over $\Omega$.) There are however many important approximation spaces $\mathcal{A}$ which are not uniform.
A familiar example is the space of polynomials or trigonometric series defined on a bounded region $\Omega \subset \mathbb{R}^n$. Another example is the space of the piecewise polynomial splines of fixed order and continuity defined over a fixed grid $G$. Wavelet spaces truncated to a fixed order provide yet another example. For such spaces, the expected approximation error usually varies over $\Omega$, even when the functions to be approximated are drawn from a uniform space.

2 Approximation error map

We define the root-mean-power approximation error map of $\mathcal{F}$ by $\mathcal{A}$ as the function $\sigma_{p,\mathcal{A},\mathcal{F}}$ from $\Omega$ to $\mathbb{R}$ defined by

$$\sigma_{p,\mathcal{A},\mathcal{F}}(x) = \left[ \int_{\mathcal{F}_1} |f(x) - f^{\mathcal{A}}(x)|^p \, df \Big/ \int_{\mathcal{F}_1} df \right]^{1/p}. \qquad (2.1)$$

As before, the integrals are taken over the function space $\mathcal{F}_1$, not over the domain $\Omega$. Note that $\sigma_{p,\mathcal{A},\mathcal{F}}(x)$ is not the error for a specific function $f$, but rather the average error at the point $x$ for a generic function $f$ in $\mathcal{F}_1$. As a limiting case, we define also the worst-case approximation error map of $\mathcal{F}$ by $\mathcal{A}$ as the function

$$\mu_{\mathcal{A},\mathcal{F}}(x) = \lim_{p\to\infty} \sigma_{p,\mathcal{A},\mathcal{F}}(x) = \sup\{ |f(x) - f^{\mathcal{A}}(x)| : f \in \mathcal{F}_1 \}. \qquad (2.2)$$

Again note that the supremum is taken over $\mathcal{F}_1$, not over $\Omega$, and that $\mu_{\mathcal{A},\mathcal{F}}(x)$ is not the error at $x$ for a single function $f$, but rather the error for the function $f$ in $\mathcal{F}_1$ that is worst for that particular $x$. A plot of $\sigma_{p,\mathcal{A},\mathcal{F}}(x)$ or $\mu_{\mathcal{A},\mathcal{F}}(x)$ over $\Omega$ should show at a glance how well $\mathcal{A}$ approximates $\mathcal{F}$ in different parts of the domain, for all functions of $\mathcal{F}$ at once.

3 Computing the approximation error map

Formulas (2.1)-(2.2) become more tractable when the function metric $\|\cdot\|$ is the $L_2$ norm $\|f\| = [\int_\Omega |f(x)|^2\,dx]^{1/2}$ defined on the space $\mathcal{A}+\mathcal{F}$; in other words, when $\|f\|^2 = \langle f,f\rangle$ where $\langle f,g\rangle = \int_\Omega f(x)\,g(x)\,dx$. We make this assumption in the remainder of this section. In that case, $f^{\mathcal{A}}$ is a linear function of $f$, namely the orthogonal projection of $f$ onto the subspace $\mathcal{A}$; and $\mu_{\mathcal{A},\mathcal{F}}$ is simply $|\sin\theta|$, where $\theta$ is the angle between the two subspaces.

3.1 Explicit formula for $\sigma$

Let us suppose that $\mathcal{A}$ and $\mathcal{F}$ are disjoint, and let $\phi_1,\dots,\phi_n$ be an orthonormal basis for $\mathcal{F}$. Let $\alpha_i = \phi_i^{\mathcal{A}}$ for all $i$, and let $\varepsilon_i = \phi_i - \alpha_i$. We will call $\phi$, $\alpha$, and $\varepsilon$ the gauge, approximation, and error bases, respectively (even though the $\alpha_i$ and $\varepsilon_i$ need not be independent). The average error map $\sigma_{p,\mathcal{A},\mathcal{F}}(x)$ can be expressed in terms of the error basis:

$$\sigma_{p,\mathcal{A},\mathcal{F}}(x) = \left[ \frac{1}{A_n} \int_{S^{n-1}} \Big| \sum_i c_i\,\varepsilon_i(x) \Big|^p \, dc \right]^{1/p}, \qquad (3.1)$$

where $A_n = 2\pi^{n/2}/\Gamma(n/2)$ is the measure of $S^{n-1}$. Note that $\sum_i c_i\,\varepsilon_i(x)$ is the dot product of the unit vector $c = (c_1,c_2,\dots,c_n)$ and the vector $\varepsilon(x) = (\varepsilon_1(x),\varepsilon_2(x),\dots,\varepsilon_n(x))$; it depends only on $|\varepsilon(x)|$ and on the angle $\theta$ between those two vectors, and is constant over the slice of $S^{n-1}$ where $\theta$ is constant. The measure of that slice is $A_{n-1}\,|\sin\theta|^{n-2}\,d\theta$. Therefore,

$$\sigma_{p,\mathcal{A},\mathcal{F}}(x) = \left[ \frac{A_{n-1}}{A_n} \int_0^\pi \big(|\varepsilon(x)|\,|\cos\theta|\big)^p \,|\sin\theta|^{n-2}\,d\theta \right]^{1/p} = \left[ \frac{\Gamma(n/2)\,\Gamma((p+1)/2)}{\sqrt{\pi}\,\Gamma((n+p)/2)} \right]^{1/p} |\varepsilon(x)|. \qquad (3.2)$$

3.2 Explicit formula for $\mu$

The worst-case error map $\mu_{\mathcal{A},\mathcal{F}}$ can be obtained by taking $p$ to the limit $+\infty$ in formula (3.2), or directly, as follows. From formula (2.2),

$$\mu_{\mathcal{A},\mathcal{F}}(x) = \sup\Big\{ \Big(\sum_i c_i\phi_i\Big)(x) - \Big(\sum_i c_i\phi_i\Big)^{\!\mathcal{A}}(x) : \Big\|\sum_i c_i\phi_i\Big\| = 1 \Big\} = \sup\Big\{ \Big|\sum_i c_i\,\varepsilon_i(x)\Big| : c \in S^{n-1} \Big\}. \qquad (3.3)$$

By considering the effect of negating each $c_i$, it is easy to see that the absolute value in the last formula is superfluous, i.e.

$$\mu_{\mathcal{A},\mathcal{F}}(x) = \sup\Big\{ \sum_i c_i\,\varepsilon_i(x) : c \in S^{n-1} \Big\}. \qquad (3.4)$$
Formula (3.4) is the supremum of a linear functional with coefficients $\varepsilon_i(x)$ over the sphere $S^{n-1}$, which is achieved at the point $c^*(x)$ of $S^{n-1}$ that is collinear with the coefficient vector, namely $c^*(x) = \varepsilon(x)\big/\sqrt{\sum_j \varepsilon_j(x)^2}$, whence

$$\mu_{\mathcal{A},\mathcal{F}}(x) = \sum_i c_i^*(x)\,\varepsilon_i(x) = \sqrt{\sum_i \big(\varepsilon_i(x)\big)^2} = |\varepsilon(x)|. \qquad (3.5)$$

In summary, the error maps $\sigma_{p,\mathcal{A},\mathcal{F}}(x)$ and $\mu_{\mathcal{A},\mathcal{F}}(x)$ (which differ only by a constant factor) can be derived from the approximation errors $\varepsilon_i(x)$ for each basis function $\phi_i(x)$, combined with the norm $|\varepsilon(x)| = \sqrt{\sum_i (\varepsilon_i(x))^2}$.

4 Practical considerations

4.1 Connection between the function and point norms

The maps (2.2) and (2.1) will be more useful when there is a direct connection between the function-space norm $\|\cdot\|$ and the absolute value $|\cdot|$ used to compare function values at a given point $x$, as in formulas (2.1)-(2.2); namely, when

$$\|f\| = \left[ \int_\Omega |f(x)|^q \, dx \right]^{1/q}. \qquad (4.1)$$

More generally, the function values at $x$ could be compared with a norm which could depend on $x$, or take derivatives of the function into account. We will not pursue such extensions in this paper. Connection (4.1) is not strictly necessary, at least when $\mathcal{A}$ and $\mathcal{F}$ are finite-dimensional. However, it may not make much sense to choose the approximant $f^{\mathcal{A}}$ so as to minimize the function norm $\|\cdot\|$, and then analyze its accuracy using some other norm $|\cdot|$, if there is no connection between the two. Considering that the error map is relatively easy to compute when $\|\cdot\|$ is the $L_2$ norm (see Section 3), and probably intractable otherwise, the connection expressed by formula (4.1) will probably hold in practice (with $q = 2$).

4.2 Choice of the gauge space

The approximation error map depends not only on the space $\mathcal{A}$, but also on the gauge space $\mathcal{F}$ and the error metric $\|\cdot\|$. Therefore, the choice of $\mathcal{F}$ and $\|\cdot\|$ must be guided by the intended application. For example, suppose the domain $\Omega$ is the circle or the sphere $S^2$, and the application does not specify a preferred direction. Then we should choose $\mathcal{F}$ and $\|\cdot\|$ so that they are invariant under rotations of $\Omega$; otherwise, any inhomogeneity in them may produce irrelevant artifacts in the error map. Also, if the functions to be approximated are expected to be smooth, and/or only their low frequencies are important, then the functions in $\mathcal{F}$ should be smooth too. A natural choice for $\mathcal{F}$, in this case, are the circular or spherical harmonics up to a certain maximum order, and the metric $\|\cdot\|$ can be simply the $L_q$ norm over the sphere $S^d$.

4.3 Essential dimensions

We will argue next that, for the $L_2$ function norm, the "interesting" part of the error map is determined by two "essential" subspaces $\mathcal{F}' \subseteq \mathcal{F}$ and $\mathcal{A}' \subseteq \mathcal{A}$, which are disjoint and such that $\dim\mathcal{F}' \geq \dim\mathcal{A}'$. First, if the spaces $\mathcal{A}$ and $\mathcal{F}$ have a non-trivial intersection $\mathcal{V}$, and we split a function $f \in \mathcal{F}$ into its components $g \in \mathcal{V}$ and $h \perp \mathcal{V}$, we find that $f^{\mathcal{A}} = g + h^{\mathcal{A}}$, and that $h^{\mathcal{A}}$ is itself orthogonal to $\mathcal{V}$. Therefore, we can confine our attention to the complements $\mathcal{F}'$ and $\mathcal{A}'$ of $\mathcal{V}$ relative to $\mathcal{F}$ and $\mathcal{A}$, which are disjoint. Let us then suppose that $\mathcal{A}$ and $\mathcal{F}$ are disjoint. If $\dim\mathcal{F} < \dim\mathcal{A}$, let $\mathcal{A}' \subseteq \mathcal{A}$ be the projection of $\mathcal{F}$ onto $\mathcal{A}$, which contains all optimum approximants. Obviously, for any function $f$, we have $f^{\mathcal{A}} = f^{\mathcal{A}'}$, so we can confine our attention to the space $\mathcal{A}'$, which is still disjoint from $\mathcal{F}$ and satisfies $\dim\mathcal{F} \geq \dim\mathcal{A}'$.
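To make the recipe of Section 3 concrete, here is a small sketch of our own (not from the paper): it discretizes the $L_2$ inner product on a grid, projects an orthonormal gauge basis onto a toy approximation space, and evaluates the map $\mu_{\mathcal{A},\mathcal{F}}(x) = |\varepsilon(x)|$ of formula (3.5). The two toy spaces, the grid size and all names are our own choices.

    import numpy as np

    x = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)

    def error_basis(Phi, A_basis):
        # Rows of Phi: orthonormal gauge basis phi_i sampled on x.
        # Rows of A_basis: any basis of the approximation space A sampled on x.
        # Least squares on a uniform grid is the discretized L2 projection
        # (the constant quadrature weight cancels out of the minimization).
        C, *_ = np.linalg.lstsq(A_basis.T, Phi.T, rcond=None)
        return Phi - (A_basis.T @ C).T          # eps_i = phi_i - phi_i^A

    # Toy spaces: F = span{sin kx, cos kx : k <= 4}, orthonormal in L2(0, 2*pi);
    # A = cubic monomials (deliberately not rotation-invariant).
    Phi = np.array([f(k * x) for k in range(1, 5) for f in (np.sin, np.cos)]) / np.sqrt(np.pi)
    A_basis = np.array([x ** j for j in range(4)])

    eps = error_basis(Phi, A_basis)
    mu = np.sqrt((eps ** 2).sum(axis=0))        # formula (3.5): mu(x) = |eps(x)|

Plotting `mu` against `x` yields exactly the kind of error map discussed in the examples that follow.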
5 Examples

5.1 Trigonometric splines on the circle

Consider the approximation of a function by continuous trigonometric splines, of maximum frequency $r = 2$, defined on a partition $T$ of $S^1$ into $n = 8$ unequal intervals. This space coincides with the space $\mathcal{P}^r_2[T]$ of non-homogeneous polynomial splines of $\mathbb{R}^2$, restricted to $S^1$, with $C_0$ continuity constraints [2]. For the gauge space $\mathcal{F}$, we will use the family of trigonometric series truncated after a suitable maximum frequency $s \geq r$, which coincides with the space of general spherical polynomials (not splines) $\mathcal{P}^s$ for some $s > r$. The norm is $\|f\| = \sqrt{\langle f,f\rangle}$ where $\langle f,g\rangle = \int_{S^1} f(\varphi)\,g(\varphi)\,d\varphi$.

Specifically, $T$ consists of the intervals $I_0$ through $I_7$. [Diagram: the eight unequal intervals $I_0,\dots,I_7$ of the partition $T$ of $[0,2\pi]$ and their knots $t_0 = 0, t_1, \dots, t_7$, located at various multiples of $\pi/8$.]

Within each interval $I_j$, the generic approximant is a linear combination $g_j$ of the Fourier basis functions $\phi_i$, for $-r \leq i \leq +r$. These partial functions are constrained to be continuous across interval boundaries, i.e. $g_{j-1}(t_j) = g_j(t_j)$ for each $j$ in $\{0,\dots,n-1\}$ (where all indices are taken modulo $n$). These equations turn out to be independent, therefore the dimension of $\mathcal{A}$ is $n(2r+1) - n = 32$.

For the gauge space $\mathcal{F}$, we will use the trigonometric polynomials of some order $s \geq r$, i.e. linear combinations of the basis functions $\phi_i$ for $-s \leq i \leq +s$, where $\phi_i(\theta) = (1/\sqrt{\pi})\sin(i\theta + \pi/4)$. As observed in Section 4.3, we can ignore the subspace $\mathcal{A}' = \mathcal{F}\cap\mathcal{A}$ of $\mathcal{A}$ generated by $\phi_{-r},\dots,\phi_r$. Moreover, in order to use all of $\mathcal{A}$, we need $\dim\mathcal{F} \geq \dim\mathcal{A}$, i.e., $2s+1 \geq 32$, implying $s \geq 16$. See Figure 1. The resulting error map $\mu_{\mathcal{A},\mathcal{F}}(x)$ is shown in Figure 2.

FIG. 1. The functions $\phi_i(t)$, $\alpha_i(t)$, and $\varepsilon_i(t)$, for selected values of $i$.

FIG. 2. The error map $\mu_{\mathcal{A},\mathcal{F}}(t)$ for continuous ($C_0$) trigonometric splines on eight unequal intervals, tested with the space of trigonometric polynomials of order 16.

5.2 Spherical splines on a uniform mesh

For the examples in this section, the approximating functions are spherical polynomial splines [1, 2, 3, 4] of continuity class zero and various degrees, homogeneous and non-homogeneous, defined on some triangulation $T$ of the sphere $S^2$. Figure 3 (left) shows the approximation error map $\mu_{\mathcal{A},\mathcal{F}}(p)$ for the homogeneous spherical spline space $\mathcal{A} = \mathcal{H}_0[T]|_{S^2}$, which has dimension 252. In Figure 3 (right), $\mathcal{A}$ is the non-homogeneous spherical spline space $\mathcal{P}_0[T]|_{S^2}$, which has dimension 254. In both cases, the gauge space $\mathcal{F}$ is the family $\mathcal{Y}^{15}$ of spherical harmonics of maximum order 15, which has dimension 256. The intersection $\mathcal{F}\cap\mathcal{H}_0[T]|_{S^2}$ is the family of spherical harmonics of odd order $\leq 5$ (dimension 21), whereas $\mathcal{F}\cap\mathcal{P}_0[T]|_{S^2}$ is the full harmonic space $\mathcal{Y}^4$ (dimension 25). The level curves are logarithmically spaced, five per decade.

FIG. 3. Error maps $\mu_{\mathcal{A},\mathcal{F}}(p)$ for the approximation spaces $\mathcal{A} = \mathcal{H}_0[T]|_{S^2}$ (left) and $\mathcal{A} = \mathcal{P}_0[T]|_{S^2}$ (right). The maximum errors are 13.5 and 9.37, respectively.

5.3 Spherical splines on a variable mesh

In the following examples, the approximating functions are again spherical polynomial splines, but the vertices of the triangulation $T'$ have been displaced so as to create regions of very different sizes (still with icosahedral topology). Figure 4 (left) shows the approximation error map $\mu_{\mathcal{A},\mathcal{F}}(p)$ for the space of homogeneous spherical splines $\mathcal{A} = \mathcal{H}_0[T']|_{S^2}$, which has dimension 252. In Figure 4 (right), $\mathcal{A}$ is the space of non-homogeneous spherical splines $\mathcal{P}_0[T']|_{S^2}$, which has dimension 254. In both cases, the gauge space $\mathcal{F}$ is the family $\mathcal{Y}^{15}$ of spherical harmonics of maximum order 15, which has dimension 256, as before.
The level curves are logarithmically spaced (5 per decade).

6 Conclusion

Asymptotic error analysis is not very helpful when comparing two fixed finite-dimensional approximation spaces of similar dimensions, such as a spline space against a wavelet space, or two spline spaces with different grid geometries. Approximation errors computed for individual test functions are difficult to interpret and may not be representative of the average or worst cases. We expect that the approximation error map will be a useful analysis tool for those situations, especially for domains that admit natural uniform target spaces, such as spheres (including the circle) and tori.

Acknowledgments. This research was supported in part by CAPES, FINEP, and CNPq (PRONEX-SAI).

FIG. 4. Error maps $\mu_{\mathcal{A},\mathcal{F}}(p)$ (front and rear views) for the approximation spaces $\mathcal{A} = \mathcal{H}_0[T']|_{S^2}$ (left) and $\mathcal{A} = \mathcal{P}_0[T']|_{S^2}$ (right). The maximum errors are 17.1 and 17.9, respectively.

Bibliography

1. P. Alfeld, M. Neamtu, and L. L. Schumaker. Dimension and local bases of homogeneous spline spaces. SIAM Journal of Mathematical Analysis, 27(5):1482-1501, Sept. 1996.
2. A. Gomide and J. Stolfi. Non-homogeneous polynomial $C_k$ splines on the sphere $S^n$. Technical Report IC-00-10, Institute of Computing, Univ. of Campinas, July 2000.
3. A. Gomide and J. Stolfi. Bases for non-homogeneous polynomial $C_k$ splines on the sphere. In Lecture Notes in Computer Science 1380: Proc. LATIN'98, Latin American Theoretical Informatics Conference, pages 133-140. Springer, Apr. 1998.
4. A. Gomide. Splines Polinomiais Não Homogêneos na Esfera. PhD thesis, Institute of Computing, University of Campinas, May 1999. (In Portuguese).

Approximation by perceptron networks

Věra Kůrková
Institute of Computer Science, Academy of Sciences of the Czech Republic
Pod vodárenskou věží 2, P.O. Box 5, 182 07 Prague 8, Czechia
vera@cs.cas.cz

1 Introduction

The classical perceptron proposed by Rosenblatt [22] as a simplified model of a neuron computes a weighted sum of its inputs and, after comparing it with a threshold, applies an activation function representing a rate of neuron firing. To model this rate, Rosenblatt used the Heaviside discontinuous threshold function, which still is, together with its various continuous approximations, the most widespread type of activation used in neurocomputing. Formally, a perceptron with the Heaviside activation function computes a characteristic function of a half-space of $\mathbb{R}^d$, which is for practical reasons (all inputs are bounded) restricted to a box, usually $[0,1]^d$. Thus the theoretical study of perceptron networks leads to various questions concerning approximation of functions by a special class of plane waves formed by linear combinations of characteristic functions of half-spaces (corresponding to the simplest model of perceptron network, called the one-hidden-layer network with a linear output unit).

Although Rosenblatt's model was inspired biologically, plane waves (sometimes called ridge functions) have been studied for a long time by mathematicians motivated by various problems from physics. In contrast to integration theory, where functions are approximated by linear combinations of characteristic functions of boxes (simple functions), the theory of perceptron networks studies approximation of multivariable functions by linear combinations of characteristic functions of half-spaces. Expressions in terms of such functions exhibit the strength and weakness of plane waves methods described by Courant and Hilbert [4], page 676: "But always the use of plane waves fails to exhibit clearly the domains of dependence and the role of characteristics. This shortcoming, however, is compensated by the elegance of explicit results."

In this paper we survey our recent results on properties of approximation by linear combinations of characteristic functions of half-spaces. We focus on the existence of best approximation, the impossibility of choosing among best approximations a continuous one, estimates of rates of approximation by linear combinations of $n$ characteristic functions of half-spaces, and an integral representation as a linear combination of a continuum of half-spaces. This work was partially supported by GA ČR grants 201/99/0092 and 201/02/0428.
Expressions in terms of such functions exhibit the strength and weakness of plane waves methods described by Courant and Hilbert [4], page 676: "But always the use of plane waves fails to exhibit clearly the domains of dependence and the role of characteristics. This shortcoming, however, is compensated by the elegance of explicit results." In this paper we survey our recent results on properties of approximation by linear combinations of characteristic functions of half-spaces. We focus on existence of best approximation, impossibility of choosing among best approximations a continuous one, estimates of rates of approximation by linear combinations of n characteristic functions of half-spaces and integral representation as a linear combination of a continuum of half-spaces. This work was partially supported by GA CR 201/99/0092 and 201/02/0428. 454 Approximation by perceptron networks 2 455 Preliminaries A perceptron with aii activation function ip : TZ ^ TZ (where 72. denotes the set of real numbers) computes real-valued functions on Tl'^ x T^-''^^ of the form ■0(v • x + b), where x £ T?.'' is an input vector, V e T?.'' is an input weight vector and 6 S 7?. is a bias. The most common activation functions are sigmoidals, i.e., functions with an ess-shaped graph. Both continuous and discontinuous sigmoidals are used. Here, we study networks based on the discontinuous Heaviside function d defined by ^(t) = 0 for i < 0 and i9(t) = 1 for t > 0. Let Hd denote the set of functions on [0, l]** computable by Heaviside perceptrons, i.e., Fd = {/: [0, l]-^-^ 721/(x) = i?(v ■ X-f 6), V € 72^ 6 e 7e}. Notice that Hd is the set of characteristic functions of half-spaces of TZ'^ restricted to [0, l]'^. For all positive integers d, Hd is compact in {Cp{[0,1]**), ||.||p) withp e [l,oo) (see, e.g., [8]). This can be verified easily once the set Hd is reparameterized by elements of the unit sphere S'^ in 72.''+^. Indeed, a function 'd{v-x+b), with anon-zero vector {vi,... ,Vd,b) £ 72''+^, is equal to i9(v-x-t-6), where {vi,... ,Vd,b) € 5"* is obtained from {vi,... ,Vd,b) € 72^^+^ by normalization. The simplest type of multilayer feedforward network has one hidden layer and one linear output. Such networks with Heaviside perceptrons in the hidden layer compute functions of the form n 'Y^Wid{xi-yL-\-b), where n is the number of hidden units, Wi eTZ are output weights and Vj e 72** and 6^ € 72 are input weights and biases, respectively. The set of all such functions is the set of all linear combinations of n elements of Hd and is denoted by span^HdFor all positive integers d, UneAr+span^Hd (where Af+ denotes the set of all positive integers) is dense in (C([0,1]*^), ||.||c), the linear space of all continuous functions on [0,1]'' with the supremum norm, as well as in (£p([0, l]**), ||.||p) with p £ [1, oo] (see, e.g., [5, 9]). 3 Existence of a best approximation A subset M of a normed linear space {X, ||.||) is called proximinal if for every / € X the distance ||/-M|| = infggM ||/-ffll is achieved for some element of M, i.e., ||/-M|| = min^gM II/-5II (see, e.g., [23]). Clearly, a proximinal subset must be closed. A sufficient condition for proximinality of a subset M of a normed linear space (X, ||.||) is compactness or boimded compactness. However, by extending Hd into span^Hd for any positive integer n we lose compactness. Nevertheless compactness can be replaced by a weaker property that requires only those sequences that "minimize" a distance from M of an element of X to have convergent subsequences. 
More precisely, a subset M of a normed linear space (X, ||.||) is called approximatively compact if for each f & X and any sequence {gi : i € A/+} C M such that limi_>oo 11/ - 9i\] = 11/ - Af II) there exists g e M such that {gi : i E A/+} converges subsequentially to g (see, e.g., [23], p. 368). The following theorem is from [16]. Theorem 3.1 For all n,d positive integers, span^Hd is an approximatively compact subset of {Cp{[0,l]'^,\\.\\p) withpe[l,(x>). The proof is based on an argument showing that any sequence of elements of span^iJ^ has a 456 , yg^^ Kurkovd subsequence that either converges to an element of sp&n„ Hd or to a Dirac delta distribution, and the latter case cannot occur when such a sequence "minimizes" a distance from some function in .£,([0,1]'^). It follows directly from the definitions that each approximatively compact subset is proximinal. Corollary 3.2 For alln,d positive integers, span^iJ^ is a proximinal subset o/(£p([0,1]''), ||.||p) withp G [l,oo). Thus, for any fixed number n, a function in £p([0,1]'') has a best approximation among functions computable by a linear combination of n characteristic functions of half-spaces. 4 Uniqueness and continuity of a best approximation Let M be a subset of a normed linear space {X, ||.||) and let V{M) denote the set of all subsets of M. The set-valued mapping 'PM ■ X -^ P(M) defined by PMif) = {g E M : \\f - g\\ = \\f -M\\} is called the metric projection of X onto M and PA/(/) is called the projection of f onto M. Let F : X -^ P{M) be a set-valued mapping. A selection from F is a mapping (j): X -^ M such that for all f € X, <f){f) e F{f). A mapping </> : X —» M is called a best approximation operator from X to M if it is a selection from PMWhen M is proximinal, then Pmif) is non-empty for all / € X and so there exists a best approximation mapping from X to M. The best approximation need not be unique. When it is unique, M is called a Chebyshev set (or "unicity" set). Thus M is Chebyshev if for all / e X the projection PM(/) is a singleton. Recall that a normed linear space {X, ||.||) is called strictly convex (also called "rotund") if for all f y^ ginX with ||/|| = ||5|| = 1 we have ||(/ + ff)/2|| < 1. It is well known that for all p G (1, cxi), (£p([0, l]**), ||.lip) is strictly convex. The following theorem from [13] implies for p in the open interval (1, CXD) that if among best approximations to span„i?d (the existence of which is guaranteed by Corollary 3.2) there is a continuous one, then span„iJd must be a Chebyshev set. Theorem 4.1 In a strictly convex normed linear space, any subset with a continuous selection from its metric projection is Chebyshev. We shall combine this theorem with the following geometric characterization of Chebyshev sets with a continuous best approximation from [24]. Theorem 4.2 In a Banach space with strictly convex dual, every Chebyshev subset with continuous metric projection is convex. It is well known that £p-spaces with p G (l,oo) satisfy the assumptions of this theorem (since the dual of Cp is Cg where l/p+ l/q = 1 and q e (l,oo)) (see, e.g., [7], p. 160). Hence, to show the non-existence of a continuous selection, it is sufficient to verify that span„iJrf is not convex. Proposition 4.3 For all n, d positive integers, span„ifrf is not convex. Indeed, consider 2n parallel half-spaces with the characteristic functions 5i(x) = i9(v ■x + hi), where 0 > 6i > ... > 62n > -1 and v = (1,0, • • •, 0) G T?.**. 
Then $\frac12\sum_{i=1}^{2n} g_i$ is a convex combination of two elements of $\operatorname{span}_n H_d$, namely $\sum_{i=1}^{n} g_i$ and $\sum_{i=n+1}^{2n} g_i$, but it is not in $\operatorname{span}_n H_d$, since its restriction to the one-dimensional set $\{(t,0,\dots,0) \in \mathbb{R}^d : t \in [0,1]\}$ has $2n$ discontinuities. Summarizing the results of this section and the previous one, we get the following corollary.
However, such exponentially growing lower bounds 458 Vera Kurkovd on variation with respect to half-spaces are merely lower bounds on upper bounds on rates of approximation by span„i/d, they do not prove that such functions cannot be approximated with faster rates than \\f\\Hdl\/^- Finding whether these exponentially large upper bounds are tight seems to be a difficult task related to some open problems in the theory of complexity of Boolean circuits. Some insight into behavior of ffrf-variation gives its geometric characterization derived in [19] using the Hahn-Banach Theorem. Theorem 5.3 Let{X, ||.||) be a Hilbert space and G be its nonempty subset. Then for every f E X, II/IIG = sup,e5 snp\g-h\' ""^^'^ S = {h€X-G^: \\h\\ = 1}. geG Thus functions that are "almost orthogonal" to Hd (i.e., have small inner products with characteristic functions of half-spaces) have large /Jd-variation. 6 Integral representation The following theorem from [14] shows that a smooth real-valued function on 7?.'* with compact support can be represented as an integral combination of characteristic functions of half-spaces. By Jif";, is denoted the half-space {xeTZ'^ : e-x + b < 0}. Theorem 6.1 Let dbe a positive integer and let f : 11^ -^ TZ be compactly supported and d+2-times continuously dijferentiable. Then /(x) = / Wf{e,b)'d{e-x + b)dedb, where for d odd Wfie,b) = ad f A'=V(y)dy, kd = {d + l)/2, and aa is a constant independent of f, while for d even, Wf{e,b) = ad A'=<*/(y)a(e-y + 6)rfy, where a{t) = —tlog \t\ +1 for t ^0 and a(0) = 0, kd = (d+ 2)/2, and ad is a constant independent off. The assumption that / is compactly supported can be replaced by the weaker assumption that / vanishes sufficiently rapidly at infinity. The integral representation also applies to certain nonsmooth functions that generate tempered distributions. By an approach reminiscent of Radon transform but based directly on distributional techniques from Courant and Hilbert [4], it was shown in [11] that if / is compactly supported function on 11'' with continuous d-th order partial derivatives, where d is odd, then / can be represented as /(x) = / Vf{e,b)i9{e-x + b)dedb, Approximation by perceptron networks 459 where Vf = aa Jjj^^{Di^'^f){y)dy, aa = (-l)'=-Hl/2)(27r)i-'* for d = 2k + l, Di^'^f is the directional derivative of / in the direction e iterated d times, de is the {d — l)-dimensional volume element on S'''~^, and dy is likewise on a hyperplane. Although the coefficients Vf are obtained by integration over hyperplanes, while the Wf arise from integration over half-spaces, these coeiKcients can be shown to coincide by an appUcation of the Divergence Theorem [3] p.423 to the half-spaces H~f^. Theorem 6.1 extends the representation of [11] to even values for d and target functions / which are not compactly supported but which decrease sufficiently rapidly at infinity. FoiweCiiS'^-'^ xTZ) and f eViW^) de&ae Tff(u;)(x) = / w{e,b)'d{e-y: + b)dedb, SH{f){e,b) = Wfie,b). Theorem 6.1 shows that for each / € 'D{TZ'^), THiSnif)) = /• This theorem can be also used to estimate variation with respect to half-spaces by the £i-norm of the weighting function Wf = Vf. It is shown in [11] that for any / to which the above representation appUes, I/IIH. < / \wf{e,b)\dedb. Combining this upper bound on /fd-variation with Corollary 5.2, we get a smoothness condition that defines sets of functions that can be approximated by span„/7d with rates of the order of 1/y/n. Bibliography 1. Barren, A. R. (1992). 
Bibliography

1. Barron, A. R. (1992). Neural net approximation, in Proceedings of the 7th Yale Workshop on Adaptive and Learning Systems (pp. 69-72).
2. Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory 39, 930-945.
3. Buck, R. C. (1965). Advanced Calculus, McGraw-Hill: New York.
4. Courant, R. and Hilbert, D. (1962). Methods of Mathematical Physics, vol. 2, Wiley: New York.
5. Cybenko, G. (1989). Approximation by superpositions of a single function, Mathematics of Control, Signals and Systems 2, 303-314.
6. DeVore, R., Howard, R. and Micchelli, C. (1989). Optimal nonlinear approximation, Manuscripta Mathematica 63, 469-478.
7. Friedman, A. (1982). Foundations of Modern Analysis, Dover: New York.
8. Gurvits, L. and Koiran, P. (1997). Approximation and learning of convex superpositions, Journal of Computer and System Sciences 55, 161-170.
9. Hornik, K., Stinchcombe, M. and White, H. (1989). Multilayer feedforward networks are universal approximators, Neural Networks 2, 251-257.
10. Jones, L. K. (1992). A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training, Annals of Statistics 20, 608-613.
11. Kůrková, V., Kainen, P. C. and Kreinovich, V. (1997). Estimates of the number of hidden units and variation with respect to half-spaces, Neural Networks 10, 1061-1068.
12. Kainen, P. C., Kůrková, V. and Vogt, A. (1999). Approximation by neural networks is not continuous, Neurocomputing 29, 47-56.
13. Kainen, P. C., Kůrková, V. and Vogt, A. (2000). Geometry and topology of continuous best and near best approximations, Journal of Approximation Theory 105, 252-262.
14. Kainen, P. C., Kůrková, V. and Vogt, A. (2000). An integral formula for Heaviside neural networks, Neural Network World 10, 313-319.
15. Kainen, P. C., Kůrková, V. and Vogt, A. (2000). Best approximation by Heaviside perceptron networks, Neural Networks 13, 645-647.
16. Kainen, P. C., Kůrková, V. and Vogt, A. (2001). Best approximation by linear combinations of characteristic functions of half-spaces (submitted to J. of Approx. Theory).
17. Kůrková, V. (1997). Dimension-independent rates of approximation by neural networks, in Computer-Intensive Methods in Control and Signal Processing: Curse of Dimensionality (Eds. Warwick, K., Kárný, M.) (pp. 261-270). Birkhäuser: Boston.
18. Kůrková, V. and Sanguineti, M. (2001). Bounds on rates of variable-basis and neural network approximation, IEEE Trans. on Information Theory 47, 2659-2665.
19. Kůrková, V., Savický, P. and Hlaváčková, K. (1998). Representations and rates of approximation of real-valued Boolean functions by neural networks, Neural Networks 11, 651-659.
20. Pinkus, A. (1986). n-Widths in Approximation Theory, Springer: Berlin.
21. Pisier, G. (1981). Remarques sur un résultat non publié de B. Maurey, in Séminaire d'Analyse Fonctionnelle I, n. 12, École Polytechnique, 1980-81.
22. Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization of the brain, Psychological Review 65, 386-408.
23. Singer, I. (1970). Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces, Springer: Berlin.
24. Vlasov, L. P. (1970). Almost convex and Chebyshev sets, Math. Notes Acad. Sci. USSR 8, 776-779.
25. Zemanian, A. H. (1987). Distribution Theory and Transform Analysis, Dover: New York.
Eye-ball rebuilding using splines with a view to refractive surgery simulation

Mathieu Lamard
Laboratoire de Traitement de l'Information Médicale, École Nationale Supérieure des Télécommunications de Bretagne, F-29609 Brest Cedex, France.
Mathieu.Lamard@enst-bretagne.fr

Béatrice Cochener
CHU de Brest, Ophtalmologie, 5 avenue Foch, 29609 Brest Cedex, France.
Beatrice.Cochener-Lainard@chu-brest.fr

Alain Le Méhauté
Département de Mathématiques, UMR 6629 CNRS, Université de Nantes, BP 92208, F-44072 Nantes Cedex 3, France.
Alain.Le-Mehaute@math.univ-nantes.fr

Abstract

In this paper we present a use of splines in the biomedical field.

1 Introduction

In the surgical field of ophthalmology, refractive surgery has experienced an important expansion for about fifteen years. It allows surgeons to correct different refractive errors (myopia, hyperopia, astigmatism), aiming to decrease or eliminate the use of optical equipment such as glasses and lenses. Many surgical techniques are available to experts today, with specific indications for each of them. Development of these methods commonly takes time and requires many research studies on animals before any clinical approach. Overall, nomograms are established for all procedures; they provide the surgeon with rules for carrying out the surgery. These nomograms are usually based on statistical analysis of the first wide series of operated patients. However, up to now, no technique is able to take into account the individual variability of eyes (morphology, physiology). The purpose of the present article is to take this variability into account by building a three-dimensional numerical model of the eye and then applying to it various simulations of surgical techniques in order to measure their effects.

2 Eye and vision

2.1 The eye anatomy

Schematically, the eye-ball has a roughly spherical shape with a vertical diameter of approximately 23 mm and an antero-posterior diameter (axial length) about 2 mm longer. Its average volume is 6.5 cm³ for a weight of 7 grams.

2.2 Refractive errors

When parallel rays reach a normal eye, they are refracted and converge, without accommodation, on the retina (this is called emmetropia). Errors of refraction come from a disparity between the refractive power of the anterior segment of the eye and the length of the eye: the light rays no longer focus on the retina. This is called ametropia, and is mainly of three types: myopia, hyperopia and astigmatism.

3 Correction of ametropia

3.1 Optical equipment

Glasses or lenses represent the traditional method. Glasses are safe and reversible for correction of most refractive errors, but they can be responsible for visual field reduction and prismatic aberrations. They can also be a source of discomfort and cosmetic impairment for the wearer. Contact lenses have solved most of the problems associated with glasses, but require very strict hygiene to avoid severe complications. Refractive surgery can bring an answer to these various problems.

3.2 Refractive surgery

Many techniques are available today in refractive surgery. Most of them aim to reshape the cornea using an excimer laser (193 nm). This laser (emitting in the far UV) is used in two distinct surgeries:
— Photo Refractive Keratomileusis (PRK),
— Laser Assisted In Situ Keratomileusis (LASIK).
The PRK technique removes corneal tissue at its surface by breaking molecular bonds. The depth and size of the ablation are determined as a function of the attempted correction. In LASIK the ablation is performed after the cut of a thin corneal flap (160 μm). This flap is replaced over the area of stromal ablation. In general PRK is used for correction of low ametropia and LASIK for low and medium corrections. For high corrections other concepts have been developed (additive surgery).

4 Data acquisition

In order to reconstruct the eyeball in 3D, data from the eye under consideration are needed. Numerous modalities allow us to obtain information about the eye anatomy.

4.1 Ultrasound

Ultrasound scanning uses ultrasound waves for investigating human tissues in vivo. Nowadays in ophthalmology it is a routine examination for the posterior segment of the eye, especially for the search for foreign intra-ocular bodies. The reasons for this intense use are multiple,
In LASIK the ablation is performed after' the cut of a thin cornea flap (160 fim). This flap is replaced on the area of stromal ablation. In general PRK is used for correction of low ametropia and LASIK for low and medium corrections. For height corrections other concepts have been developed (additive surgery). 4 Data acquisition In order to reconstruct the eyeball in 3D, data from the eye under consideration are needed. Numerous modalities allow us to obtain information about the eye anatomy. 4.1 Ultrasound Ultrasound scan uses ultrasound waves for investigating human tissues in vivo. Nowadays in ophthalmology it is a routine exam for the posterior segment of the eye, especially for the research of foreign intra-ocular body. Reasons for this intense use are multiple. 463 464 Mathieu Lamard, Beatrice Cochener, and Alain Le Mehaute including non invasive procedure, speed and low cost. But problems remain, which define the current limits ultrasound. Multiple phenomena of reflection (between two internal interfaces, or between an interface and a transducer itself) create false echos. Inaccuracies quickly increase with the deepening of the investigation because of all sources of "background noise", such as diffraction, diffusion and refraction. Advantages of ultrasound allowed us to use it without constraint to obtain maximum image quality. Our first work was to set up an images acquisition protocol of quality. The protocol privileged the underwater method to obtain a good acoustic coupling between the probe and the eye. The patient is lying on his back, he is wearing on his face a submarine mask without pane. This mask is filled with physiological serum. The probe, equipped with a Ughting target, is plunged into the liquid. The patient fixes the target, in such conditions the provided images are along the optical axis. The operator turns this probe manually and regularly around the optical axis, and obtains a volume of data. A computer equipped with an image acquisition board can save all images on an hard disk. The images resolution is dependent on the probe and on the frequency of the ultrasound used. 4.2 MRI The MRI, which tries to localize hydrogen pits by measuring their magnetization, realizes a real grey scale cartography of the proton concentration of the various examined structures [1]. The resultant data volume has a dependent acquisition time resolution, which currently represents one of the main important limitations of this technique. Besides the big quality of images obtained, the MRI has probably no harmful effect because it does not use ionisants beams. 4.3 Computerized corneal topography The anterior surface of the cornea is one fundamental element of the refraction. Any modification or abnormality of this surface modifies the visual acuity. So the knowledge of this shape is extremely important. In a traditional way the Javal's keratometer is used to know punctually the refractive power of the cornea. In the last few years ophthalmologists have become used to another system, computerized corneal topography [2]. This technique, based on the reflection and the analysis of the Placido's discs deformation, allows us to obtain numerous data on the topology of the cornea. The curvature of the cornea is represented on a colored map. 4.4 Visible Human images The images of the Visible Human project (the photographic modality) have great space resolution. They allow us to make reconstruction tests without acquisition problems. 
5 Data segmentation The purpose of this section is to addign a weight to each pixel of the image. The greater the weight the greater the contribution of this pixel to the reconstruction of the edge will be. Eye-ball rebuilding using splines with a view to refractive surgery simulation 5.1 Pretreatments Little pretreatment were done on the images under various modality. The speckle filtering or the use of enhancement contrast filter have a sure visual action but the reconstruction does not seem to be affected in our specific case. The only pretreatment used is an overlooked one. The ophthalmologist places four points on each image to isolate the lens and hence helps the treatment filters. 5.2 Treatments To affect a weight to each pixel of the image, numerous edge detection filters were tested, using different methods, LOG, Canny-Deriche, Shen-Castan, and the operator based on the geometrical moments. The most convincing results were obtained with the Canny operator. It has been created as the solution of an optimization problem with constraints [3]. This filter is supposed to be an optimal compromise between the following criteria: localization, detection and unicity. We have to note that this filter is optimized for images flooded in a white, Gaussian, additive noise; and it is not the case in most of the used data. This filter is actually one of the references in the edge detection for its quality of results; it is regularly used in the literature to the evaluation of new filters. A recursive implementation of this operator was developed by [4] allowing an important performance gain. The third dimension filter is obtained by supposing the filter separable and by making a convolution product. This choice is an easy one but it introduces anisotropies. These results images are difficult to use, and as recommended by [5], we extract its local maxima. This method consists in estimating the gradient direction and only keeping its watershed. 5.3 Post treatments The previous stages can be applied to any type of images without taking into account their contents. Two post-treatments types are presented to take into account peculiarities of the eye contents. The first post-treatment consists to take into account ultrasound sound images and MRI particularities. The center of the eye have got no edges and generally the first visible edge is the good one. The "visible human" project images [10] have specifics characteristics. They are in fact photos of frozen tissues; crystals of ice are clearly visible in the vitrous, while it is uniform in the other modalities. A simple threshold is ineffective. The hysteresis threshold , introduced by [3] takes into account the edges connexity and luminance (levels of grey) and give us good results on such images. 6 Eyeball rebuilding with splines The most used techniques for edge reconstruction on medical prints are snakes (active contour models) [7, 8]. A shape approaching the organ to be reconstructed is initialized, then deformed locally to fit the data. These deformations use, generally, physical properties of elasticity materials. These various methods allow the organ edge reconstruction of varied forms as bones, heart, brain, etc.. This type of reconstruction is effective but numerous parameters must be set. We opted for a different technique. The edge to be 465 466 Mathieu Lamard, Beatrice Cochener, and Alain Le Mehaute reconstructed in our case is a quasi-spherical shape, and we reconstruct it by using BspHnes. 
Their mathematical properties allow us to reconstruct the edge in a effective and fast way, and with adjusting only few parameters. 6.1 Principle For a B-spline (ID) on R = [a, b] we have to set: — the degree k of the spline, — the position and numbers of the knots {Xi,i = 0, ...,g + 1), — the coefficients Ci of the spline representation: 9 i=—k where Ni^k+ii^:) is the B-spline basis function. We have chosen to set the degree of the spline to 3. Tests indicate this is a good compromise between computer time and result quality. The other parameter determination depends on the approximation criteria used and the position of control knots. 6.1.1 The Dierckx criteria [6] The Dierckx approximation criteria determine a spline like the solution of a constrained minimization problem: mmimize n:=j:{si'H^i+)-s('H^i-)y with the constraint S ■.= 'Y2 {"WriVr - s{Xr))f < S where {xr,yr) are the coordinates of the m data points, with Wr the associated weight. 6.1!2 Control knots number As the number of control knots becomes important, the smoothness of the curvature decreases. Using that property we set up an iterative algorithm to perform the calculation of the spline. After an initialization with few control knots (we set for example AQ = a, X2 = b and Ai = (a 4- 6)/2), the spline is computed. If the smoothness is too important (with the S estimation) we add some control knots and we start again the estimation of the smoothness. In the other case we stop the algorithm. At each iteration we can insert one or more control knots. The distribution of the control knots is recomputed for each iteration. They can be linearly distributed over R = [a,b]. This method can be generalized to surfaces without difficulty (see [6]) using spherical coordinates and periodic boundary conditions. Eye-ball rebuilding using splines with a view to refractive surgery simulation 6.2 Results Different results are presented either in 3d or 2d view. In 2d view, the spline is drawn in red, and represents the intersection of the 2d spline and the data volume. The main reconstruction errors are due to segmentation errors. But the more data the better, and the quality of the reconstruction needs to be good. The reconstruction of images issued from visible human (22 shces) is better than from the MRI (8 slices) and the ultrasound images (4 slices). FIG. 1. Reconstruction using photographic images. FIG. 2. Reconstruction using MRI. 467 Mathieu Lamard, Beatrice Cochener, and Alain Le Mehaute 468 .,«»,. lillilll<|i||l|l|.M ■. ■iiJH !' 'fH FIG. 7 3. Reconstruction using ultra sound images. Elastic modelisation of surgery 7.1 Method used The finite elements method is used to simulate surgery and solve the elasticity problem. Actually the knowledge of the comportment law of the eye ball tissues is the main limitation of this problem. Literature reports a wide range of coefficients to describe these tissues. In fact they seem to have an individual variability. So we use the approximation [9] for the elasticity coefficients which uses three parameters, internal pressure, radius of the eye and width of the edge. The use of complex models does not offer much information because of the low precision of the data that we used. 7.2 Results Numerous simulations have been done. Results seem good in spite of the comportment law and the duration of the finite element method. The result are represented with a color map of the eye representing the curvature radius like the ophthalmologist does. FIG. 4. Excimer Simulation before (left) and after (right). 
Eye-ball rebuilding using splines with a view to refractive surgery simulation 8 Conclusion This article presents a very modular path to realize modelisation of refractive surgeries. Each part of this work can be independently modified and can be adapted to an other organ. All this work has been validated by ophthalmologists. The eye ball reconstruction using sphne appears to be an efficient method with a low CPU time. The mechanical modelisation provides proper results despite several approximations. This study might be useful for the medical doctor but also for testing new surgical techniques. Bibliography 1. C. Dupas, La RMN au service de la medecine, La Recherche 81 (1977), 778-781. 2. S.D. Klyce, Computer assisted corneal topography : High resolution graphic presentation and analysis of keratoscopy, Invest. Ophtalmol. Vi SCI. 25 (1984), 1426-145. 3. J. Canny, A computational approach to edge detection. IEEE Transaction on Pattern Analysis and Machine Intelligence. 6 {1986), 679-698. 4. R. Deriche, Fast algorithms for low level vision, IEEE Transaction on Pattern Analysis and Machine Intelligence. 12 (1990), 78-87. 5. R. Deriche, Techniques d'extraction de contours, (http://www-sop.inria.fr/robotvis/personnel/der/der-eng.html) Cours de I'INRIA Sophia-Antipolis, 1998. 6. P. Dierckx, Curve and surface fitting with splines. Clarendon press, Oxford, 1995. 7. M. Klass A. Witkin D. Terzopulos, Snakes: active contour models, Int J Comput Fmon. 1 (1988), 321-331. 8. D. Metaxas, Physics-based deformable models : Application to computer vision, graphics and medical imaging. (1996) Kluwer Academic. 9. P.P. Purslow W.S. Karwatowski, Ocular Elasticity, Ophtalmology. 103 (1996), 16861692. 10. http://www.nlm.nih.gov/research/visible/visible-human.html. 469 A robust algorithm for least absolute deviations curve fitting Dongdong Lei, Iain J Anderson University of Huddersfield, Huddersfield, UK. d.lei@hud.ac.uk Maurice G Cox National Physical Laboratory, Teddington, UK. Maurice.Cox@npl.co.uk Abstract The least absolute deviations criterion, or the £i norm, is frequently used for approximation where the data may contain outliers or 'wild points'. One of the most popular methods for solving the least absolute deviations data fitting problem is the Barrodale and Roberts (BR) algorithm (1973), which is based on linear programming techniques and the use of a modified simplex method [1]. This algorithm is particularly efficient. However, since it is based upon the simplex method it can be susceptible to the accumulation of unrecoverable rounding errors caused by using an inappropriate pivot. In this paper we shall show how we can extend a numerically stable form of the simplex method to the special case of £i approximation whilst still maintaining the efficiency of the Barrodale and Roberts algorithm. This extension is achieved by using the £i characterization to rebuild the relevant parts of the simplex tableau at each iteration. The advantage of this approach is demonstrated most effectively when the observation matrix of the approximation problem is sparse, as in the case when using compactly supported basis functions such as B-splines. Under these circumstances the new method is considerably more efficient than the Barrodale and Roberts algorithm as well as being more robust. 
1 Introduction Given aset of m data points {{xi,yi)}^i, the ^i, or least absolute deviations curve-fitting problem seeks c G IR" to solve the optimization problem min||y-Ac||i = ^ Ei'-'i' i=l (1-1) where Ais anmxn observation matrix, and Vi denotes the residual of the ith point. Another way of stating the £i, or least absolute deviations curve-fitting problem, is by the characterization theory of an £i solution [8], which may be given in different forms. The following is perhaps the most commonly used. 470 A robust algorithm for least absolute deviations curve fitting 471 A vector c 6 M" solves the minimization problem (1.1) if and only if there exist A € M"" such that A^A = 0 with ('^^1-3' . ^;''lf \ Ai = sign(ri), ioTi^Z, (1.2) where Z represents the set of indices for which TJ = 0. One of the popular methods designed for solving the £i approximation problem is the Barrodale and Roberts (BR) algorithm. It replaces the unconstrained variables c and r in (1.1) by nonnegative variables c+, c~, u and v^ and considers the linear programming problem min e^u + e^v subject to Ac^-Ac~+u — v c^,c~,u,v > 0. c =y, (1.3) Much of the reason for the popularity of the BR algorithm is that it exploits the characteristics of the ii approximation in order to solve the problem in a more efficient manner than the general simplex approach. However, it is a simplex based method, and so it is susceptible to numerical instabilities caused by using inappropriate pivots. The new method presented here uses matrix factorization instead of simplex pivoting. This approach allows numerically stable updates to be made, thus avoiding the uimecessary build-up of rounding errors. This method is particularly efficient when the observation matrix is large and sparse [5]. Bartels [2] and Gill and Murray [4] presented methods that concentrate on avoiding the inherent instability of the simplex method. However, these methods are designed for a general linear programming problem and if we were to employ these techniques for the special case of the i\ problem, the storage requirements and computational workload of the method would be unnecessarily large compared to those of the highly efficient BR algorithm. The ii problem is, in essence, an interpolation problem. The aim of any iterative procedure for the £i problem is to find an optimal set of interpolation points. Indeed, this is how the BR algorithm solves the ii problem. It begins with all coefficients, c, set to zero (being non-basic variables), and during each iteration of stage one, one of the residuals, r-j, becomes non-basic by making the corresponding point an interpolation point (i.e., the coefficients are altered so that TJ = 0). At the end of stage one, the current estimate interpolates n distinct points. During stage two, the interpolation points are exchanged one at a time with a non-interpolation point imtil an optimal solution is achieved. In fact, the new algorithm is effectively identical to the BR algorithm in the sense that we use exactly the same pivoting strategy. However, we start with a predetermined set of interpolation points and do not store the simplex tableau directly. In each iteration, we only reconstruct the parts of the simplex tableau that are needed by the more stable approach employed. D. Lei, I. J. Anderson, and M. G. Cox 472 2 A more stable computational approach The linear programming presentation of a least absolute deviations curve-fitting problem is given in (1.3). It is a standard linear programming problem of dimension mx (2m+2n). 
The robust approaches of Bartels and of Gill and Murray can be applied to solve it. They involve the factorization of an $m \times m$ matrix. On the other hand, the BR algorithm only deals with an $m \times n$ matrix in each iteration; if $m \gg n$, the direct usage of these stable approaches is less efficient. We shall show next that the factorization of an $n \times n$ matrix is all that is required at each iteration.

We split the data points based on the interpolation set $Z$, and let $A_Z$, $y_Z$, $u_Z$ and $v_Z$ be the counterparts of $A$, $y$, $u$ and $v$ in (1.3) corresponding to the set $Z$. Their complementary matrix and vectors are denoted by $A_{\bar Z}$, $y_{\bar Z}$, $u_{\bar Z}$ and $v_{\bar Z}$, so that $A_Z$ and $A_{\bar Z}$ comprise $A$, etc. Problem (1.3) can then be expressed as

$$\min\; e^T (u_Z + u_{\bar Z}) + e^T (v_Z + v_{\bar Z})$$
$$\text{subject to} \quad A_Z c^+ - A_Z c^- + u_Z - v_Z = y_Z, \qquad A_{\bar Z} c^+ - A_{\bar Z} c^- + u_{\bar Z} - v_{\bar Z} = y_{\bar Z}, \qquad (2.1)$$
$$c^+,\, c^-,\, u_Z,\, u_{\bar Z},\, v_Z,\, v_{\bar Z} \ge 0.$$

Since the coefficients for $c_j^-$ are just the negatives of the coefficients for $c_j^+$, $j = 1, 2, \ldots, n$, it is possible to suppress $c_j^-$ and let $c$ represent the unconstrained variable. The initial simplex tableau associated with problem (2.1) can be constructed in matrix form by Table 1, where $e_k$, $k = m, n, m-n$, are $k \times 1$ vectors with all components equal to one.

  BV           | $c$                                    | $u_Z$ | $u_{\bar Z}$ | $v_Z$    | $v_{\bar Z}$ | r.h.s.
  $u_Z$        | $A_Z$                                  | $I$   | $0$          | $-I$     | $0$          | $y_Z$
  $u_{\bar Z}$ | $A_{\bar Z}$                           | $0$   | $I$          | $0$      | $-I$         | $y_{\bar Z}$
  objective    | $e_m^T \binom{A_Z}{A_{\bar Z}}$        | $0$   | $0$          | $-2e^T$  | $-2e^T$      | $e_m^T y$

TAB. 1. The initial simplex tableau of the $\ell_1$ fitting problem.

As we know, the simplex method is an iterative procedure in which each iteration is characterized by specifying which $m$ of the $2m + n$ variables are basic. For the $\ell_1$ approximation, we are only concerned with those vertices which are formed by a set of interpolation points. For $n$ interpolation points, the basic variables consist of the $n$ coefficient parameters $c$ and the $m - n$ parameters $u_{\bar Z}$ corresponding to the non-interpolation points. Let $B$ be the $m \times m$ basis matrix whose columns consist of the $m$ columns associated with the basic variables. Then

$$B = \begin{pmatrix} A_Z & 0 \\ A_{\bar Z} & I \end{pmatrix}. \qquad (2.2)$$

It is readily verified that the inverse of $B$ can be written in the form of (2.3) as long as $A_Z$ is invertible:

$$B^{-1} = \begin{pmatrix} A_Z^{-1} & 0 \\ -A_{\bar Z} A_Z^{-1} & I \end{pmatrix}. \qquad (2.3)$$

Equation (2.3) shows that the explicit inverse computation of an $m \times m$ matrix of the form (2.2) can be achieved by dealing with the inverse of an $n \times n$ matrix, and in general $n \ll m$. To make the $m$ non-basic variables become basic, we multiply the whole simplex tableau by $B^{-1}$, and omit the identity and zero matrices. The new simplex tableau is given in Table 2.

  BV           | $u_Z$                                        | r.h.s.
  $c$          | $A_Z^{-1}$                                   | $A_Z^{-1} y_Z$
  $u_{\bar Z}$ | $-A_{\bar Z} A_Z^{-1}$                       | $r_{\bar Z}$
  objective    | $-e_{m-n}^T (A_{\bar Z} A_Z^{-1}) - e_n^T$   | $e_{m-n}^T r_{\bar Z}$

TAB. 2. The condensed simplex tableau associated with a set of interpolation points.

An arbitrary choice of the interpolation set $Z$ may cause some of the values in the right-hand side column to become negative. Although it is permissible for the coefficient parameters $c$ to be negative, for those rows having negative residuals $r_{\bar Z}$ we restore feasibility by exchanging the corresponding $u_{\bar Z}$ for $v_{\bar Z}$. This exchange can be made by subtracting twice those rows from the objective row and changing the sign of the original rows [1]. Such an exchange process can be expressed in matrix terms by introducing a sign vector $\lambda_{\bar Z} = \operatorname{sign}(r_{\bar Z})$. Let $A_{\bar Z s}$ represent the matrix obtained by multiplying those rows of $A_{\bar Z}$ associated with negative residuals by $-1$,

$$A_{\bar Z s} = \operatorname{diag}(\lambda_{\bar Z})\, A_{\bar Z}.$$

  BV                          | $u_Z$                                                 | r.h.s.
  $c$                         | $A_Z^{-1}$                                            | $A_Z^{-1} y_Z$
  $u_{\bar Z}$/$v_{\bar Z}$   | $-A_{\bar Z s} A_Z^{-1}$                              | $|r_{\bar Z}|$
  objective                   | $-\lambda_{\bar Z}^T (A_{\bar Z} A_Z^{-1}) - e_n^T$   | $\lambda_{\bar Z}^T r_{\bar Z}$

TAB. 3. Restoration of feasibility of the simplex tableau.

The simplex tableau after restoring feasibility is shown in Table 3.
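The block form (2.3) is easy to sanity-check numerically. The snippet below is a minimal illustration added in this rewrite (random data, dense storage); it verifies that the stated block matrix is indeed the inverse of $B$, so that only an $n \times n$ inverse is ever required.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
A = rng.standard_normal((m, n))
A_Z, A_Zbar = A[:n], A[n:]                  # say, interpolation rows come first

# B as in (2.2) and its claimed inverse as in (2.3)
B = np.block([[A_Z, np.zeros((n, m - n))],
              [A_Zbar, np.eye(m - n)]])
A_Z_inv = np.linalg.inv(A_Z)
B_inv = np.block([[A_Z_inv, np.zeros((n, m - n))],
                  [-A_Zbar @ A_Z_inv, np.eye(m - n)]])

print(np.allclose(B @ B_inv, np.eye(m)))    # True
```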
The point to be removed from $Z$ is decided by the values of the objective row. Each time, the maximum value of the objective row (including the suppressed columns) is chosen; we let the index of this element be $k$. In order to choose which new point is to join the set $Z$, we compute the values of the pivotal column, the $k$th column of the simplex tableau. Since the simplex tableau is of the form

$$\begin{pmatrix} A_Z^{-1} \\ -A_{\bar Z s} A_Z^{-1} \end{pmatrix},$$

the $k$th column can be obtained by using $A_{\bar Z s}$ and the $k$th column of $A_Z^{-1}$. The BR algorithm pivoting strategy is adopted to decide which new point is to be added to the interpolation set, whereupon a new set of indices $Z$ is generated. We repeat the process in an iterative manner until the optimal solution is achieved.

Table 3 is in fact identical to the simplex tableau of the BR algorithm in stage 2. The difference here is that the BR algorithm is implemented by a simplex pivoting approach, while the transformation of the simplex tableau in the form of Table 3 can be accomplished in a numerically more stable manner.

3 The improved method

The improved method starts with a predetermined interpolation set $Z$, the minimum requirement for $Z$ being that it forms a well-behaved matrix $A_Z$. For B-spline basis functions, we can choose any set of points satisfying the Schoenberg-Whitney condition [6]. For a Chebyshev polynomial basis, points close to the $n$ Chebyshev zeros can be regarded as the initial interpolation set. In other cases, we can choose points approximating such configurations, or even uniformly distributed points. If we denote the set of $\lambda_i$, $i \in Z$, by $\lambda_Z$, we can rewrite the characterization equation (1.2) as

$$A_Z^T \lambda_Z = -A_{\bar Z}^T \lambda_{\bar Z}, \qquad (3.1)$$

and $\lambda_Z$ can be obtained mathematically from

$$\lambda_Z = -(A_Z^T)^{-1}(A_{\bar Z}^T \lambda_{\bar Z}). \qquad (3.2)$$

Table 3 shows that the objective row can be computed as

$$\text{objective row} = -(\lambda_{\bar Z}^T A_{\bar Z})\, A_Z^{-1} - e_n^T. \qquad (3.3)$$

Thus, using (3.2), we conclude that

$$\text{objective row} = \lambda_Z^T - e_n^T. \qquad (3.4)$$

We know that at the $\ell_1$ solution all the values in the objective row are in the range $[-2, 0]$, and also $|\lambda_i| \le 1$. The latter result can be explained in terms of the former by the relationship (3.4). Equation (3.4) is useful because it can be used to verify whether an interpolation set forms an optimal solution, or to compute $\lambda$ from the values of the objective row. We use it to compute the values of the objective row.

The improved method can be summarized as follows:

(1) Choose an initial set of interpolation points and form the set $Z$.
(2) Construct $A_Z$, $y_Z$ and their counterparts $A_{\bar Z}$, $y_{\bar Z}$ accordingly.
(3) Solve the equation $A_Z c = y_Z$ for $c$, and compute $r_{\bar Z} = y_{\bar Z} - A_{\bar Z} c$ and $\lambda_{\bar Z} = \operatorname{sign}(r_{\bar Z})$.
(4) Obtain the values of $\lambda_Z$ from the equation

$$A_Z^T \lambda_Z = -A_{\bar Z}^T \lambda_{\bar Z}. \qquad (3.5)$$

(5) If $|\lambda_Z| \le 1$ holds, the current solution is optimal, and the algorithm terminates. Otherwise, continue.
(6) Obtain the objective row of the BR simplex tableau from

$$\text{objective row} = \lambda_Z^T - e_n^T.$$

(7) Examine the values of the objective row; the point associated with the maximum value of the objective row is chosen to leave the set $Z$.
(8) Decide the point to add by the BR pivoting strategy. Obtain a new set of indices $Z$, and repeat from step 2.

4 Practical considerations and application to the $\ell_1$ spline approximation

The robustness of the above algorithm stems from the reliable updating of the relevant parts of the simplex tableau in each iteration.
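Before turning to those practical considerations, note that steps (2) to (5) of the summary amount to a compact optimality test for a candidate interpolation set. The following is a minimal dense-linear-algebra sketch of that test, added here as an illustration (it is not the authors' implementation, and it assumes $Z$ holds exactly $n$ indices with $A_Z$ nonsingular).

```python
import numpy as np

def l1_optimality_test(A, y, Z):
    """Steps (2)-(5): interpolate on Z, then check |lambda_Z| <= 1 via (3.5).

    Returns (optimal, c, lam_Z) for the candidate interpolation set Z.
    """
    m, n = A.shape
    Zbar = np.setdiff1d(np.arange(m), Z)
    A_Z, A_Zbar = A[Z], A[Zbar]
    c = np.linalg.solve(A_Z, y[Z])                          # step (3): A_Z c = y_Z
    r_Zbar = y[Zbar] - A_Zbar @ c
    lam_Zbar = np.sign(r_Zbar)
    lam_Z = np.linalg.solve(A_Z.T, -A_Zbar.T @ lam_Zbar)    # step (4), eq. (3.5)
    return bool(np.all(np.abs(lam_Z) <= 1.0)), c, lam_Z     # step (5)
```

If the test fails, (3.4) gives the objective row as $\lambda_Z^T - e_n^T$, whose maximal entry identifies the point chosen to leave $Z$ in step (7).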
The major computational work is obtaining (explicitly or implicitly) the inverse of the $n \times n$ matrix $A_Z$. It can be calculated and stored explicitly by using an LU or QR factorization, or preferably it can be expressed as a product of factors. Since $A_Z$ differs from its predecessor by only one row, savings can be made by reusing results from the previous step. Necessary material is available [4, 7] regarding the stable implementation of this row updating procedure.

Additional savings can be made if the observation matrix $A$ is sparse or structured; indeed, sparsity is almost always more important than matrix dimension. Approximation using a B-spline basis often occurs in practical applications. In such cases, $A$ is block banded, and $A_Z$ can be triangularized using $O(n)$ flops [3]. Similarly, the sparsity of $A$ can be exploited to compute the other relevant parts of the simplex tableau efficiently.

We have applied our method to solve least absolute deviations curve-fitting problems by B-splines using various numbers of interior knots. All software was written in MATLAB and implemented on a Sun workstation. The initial interpolation points are chosen to be those points corresponding to the maximum value in each column of the observation matrix $A$. Some of our computational results are reported in Tables 4 and 5. Each table presents the outcomes for a particular set of data points by the new method and by the BR algorithm.

  m = 512      Number of iterations      Execution time (seconds)
  q            New        BR             New        BR
  44           57         125            1.6        14.7
  49           75         111            2.2        13.4
  54           71         134            2.4        20.2
  59           83         156            3.0        26.8
  64           78         160            3.1        32.
  69           88         194            4.0        42.4
  74           75         165            3.7        36.0
  79           87         189            4.8        48.1

TAB. 4. The number of iterations and execution time taken by the algorithm of this paper and the Barrodale and Roberts algorithm for a set of 512 response data points provided by the National Physical Laboratory.

All the experimental results exhibit the effectiveness of the improved method on large, sparse systems. Although these tables show that the improved method is faster than the BR algorithm, it would be unfair to judge the convergence speed purely on the time taken, since the improved method employs some MATLAB built-in functions, while the BR algorithm uses only user-defined functions. However, on average, the new method requires far fewer iterations than the BR algorithm, and is competitive with the BR algorithm both in efficiency and accuracy for a structured system.

Further work to be addressed by the authors will involve a definitive implementation of this algorithm in Fortran, and the development of an error analysis for both the improved method and the BR algorithm.

  m = 1200     Number of iterations      Execution time (seconds)
  q            New        BR             New        BR
  50           82         143            4.0        58.7
  56           105        165            5.2        85.8
  62           113        190            6.1        110.2
  68           131        189            7.6        110.4
  74           121        223            7.8        157.9
  80           132        216            9.2        163.2
  86           155        245            11.8       209.8
  92           173        252            14.0       241.8
  98           153        272            13.6       292.6

TAB. 5. The number of iterations and execution time taken by the algorithm of this paper and the Barrodale and Roberts algorithm for a set of 1200 data points, generated by the MATLAB commands x = linspace(1,10,1200)'; y = log(x) + randn(1200,1).

Bibliography

1. I. Barrodale and F. D. K. Roberts. An improved algorithm for discrete $\ell_1$ linear approximation. SIAM Journal on Numerical Analysis 10, 839-848, 1973.
2. R. H. Bartels. A stabilization of the simplex method. Numer. Math. 16, 414-434, 1971.
3. M. G. Cox.
The least squares solution of overdetermined linear equations having band or augmented band structure. IMA Journal of Numerical Analysis 1, 3-22, 1981.
4. P. E. Gill and W. Murray. A numerically stable form of the simplex algorithm. Linear Algebra and its Applications 7, 99-138, 1973.
5. D. Lei, I. J. Anderson, and M. G. Cox. An improved algorithm for approximating data in the $\ell_1$ norm. In P. Ciarlini, M. G. Cox, E. Filipe, F. Pavese, and D. Richter, editors, Advanced Mathematical and Computational Tools in Metrology V, 247-250, Singapore, 2001. World Scientific Publishing.
6. M. J. D. Powell. Approximation Theory and Methods. Cambridge University Press, Cambridge, UK, 1981.
7. R. J. Vanderbei. Linear Programming: Foundations and Extensions. Kluwer Academic Publishers, Boston, MA, US, 1997.
8. G. A. Watson. Approximation Theory and Numerical Methods. Wiley, New York, US, 1980.

Tomographic reconstruction using Cesàro means and Newman-Shapiro operators

Ulrike Maier
Mathematisches Institut, Justus-Liebig University, 35392 Giessen, Germany. Ulrike.Maier@math.uni-giessen.de

Abstract

Tomography is well known because of its many applications. Although theoretically solved, the numerical implementation of tomographic reconstruction algorithms is still a difficult problem. In this article the numerical implementation of a reconstruction method using Cesàro means and Newman-Shapiro operators is described. The key point herein is the use of suitable quadrature formulae on the sphere. It turns out that, in the context described, product Gaussian formulae are best suited. The algorithm is tested on the so-called Shepp-Logan phantom, which is a three-dimensional model of a human head.

1 Introduction and notation

The problem in tomography is to reconstruct a function $F$ from its Radon transform sufficiently well. Since certain classes of functions can be expanded into series of orthogonal polynomials, it is essential to exploit the action of the Radon transform on orthogonal polynomials and on polynomials in general. This approach is the more interesting since the inverse of the Radon transform for polynomials is known explicitly. The convergence of orthogonal expansions to the given function is often achieved only by applying a summability method. The application of such methods can be interpreted as a kind of "filter technique" which is necessary for sufficiently good reconstructions. The combination of an expansion of the function and the application of suitable summability methods leads to promising reconstruction algorithms.

In this article two examples of a summability method and their implementation are presented: the Cesàro means and the Newman-Shapiro means. After some introductory remarks on Laplace series at the end of this section, Section 2 presents the theory of summability methods needed here. In Section 3 this theory is applied to the reconstruction of functions from their Radon transform. Section 4 describes the numerical implementation of the reconstruction formula, which is tested on the so-called Shepp-Logan phantom of a head in Section 5.

In this article the following notation is used. Let $B^r$ denote the unit ball in $\mathbb{R}^r$, $S^{r-1}$ the unit sphere, and $Z^r := [-1,1] \times S^{r-1}$; $xy$ denotes the Euclidean inner product of $x, y \in \mathbb{R}^r$.
The space C{S'^~^) of all continuously differentiable functions is provided with the inner product < F,G >:= Jgr-i F{x)G{x)dx. The surface measure of the sphere is denoted by w^-i =< 1,1>. Let C^ denote the Gegenbauer polynomials of degree /i and index A and C^ = C^/C^{1) the normalized Gegenbauer polynomials. The reproducing kernel function of * 1u -\- r 2 T—2 ^r (5r-i^ jg given by G^{xy) = — C^= [xy), the normalized reproducing kernel G^ is defined by G^ := G^/G^{1). hetY £ {C{S^-^),L^S''-'^),LP{S''-^)}.Foi f eY let oo oo „ ^(/.^) = E(M(^) = E/ mGu{xy)dy (1.1) be the Laplace-series of /, where [Ki,f){x) := /^^.i f{y)G^{x,y)dy is the orthogonal projection of / onto HI (S'""^) and the partial sums i^(/, x) = Y^'^^Q (■^•^f) (^) ^■re the orthogonal projections of / onto 1F^{S'^~^). Whereas for Y = i^(S''~-^) it is known that the partial sums L^{f, x) converge to / in norm, no convergence is obtained for Y = C{S'^~^) or Y" = LP{S'^~^) for p > 2H 2 and p <2 (see e.g. [l]p.211). Applying a summability method the situation changes. 2 Summability methods Let A = {a^v)fi,veiNo be an infinite matrix for which the elements a^^ G M fulfil the following properties. (i) a^v = 0 ioi V > ij., (ii) lim^_>oo a^,v = 1 for z/ € {0,1}, (iii) X^(0 > 0 for -1 < e < 1, where K^ := ^^^^g a^^G^. If with the aid of a summability method the kernel G^ in (1.1) is substituted by a kernel K^. = ^a^^uG^ (2.1) then the operator L"^ defined by the transformed series £^(/,x)=lim/ f{y)K^{x,y)dy (2.2) can be shown to converge pointwise to the identity provided that for the kernel K^ the properties (i)-(iii) of the matrix A are valid. 480 U. Maier Remark 2.1 The coefficients a/^u can be obtained from a^, = {L^G,{t.)){t) = f G,{tx)K^{tx)dw{x), t G 5'^^ For A being the matrix of the Cesaro-means the proof was given by Kogbetlianz [4] first. Berens et al. [1] give a proof for Cesaro-means as well as for Abel-Poisson-means. They also prove results on the order of convergence and the corresponding saturation classes. The convergence proof for Newman-Shapiro operators {Y = C{S'"'^)) can be found in Reimer [7]. 2.1 Cesaro-means For Cesaro-means the coefficients a^^ in the summability method have to be chosen as a, ■"'' (l)p (fc-I-1)^-1/ (fc + 1)^ (1)^_. .^^. where (p)g =p-{p+l)-.. .-(p+q-l) denotes the Pochammer symbol. Then the kernels Ky, in (iii) take on the form Convergence of the transformed Laplace-series (2.2) is vaHd for k > (r-2)/2; for A; > r-l the operators even are positive (see Kogbetlianz [4]). 2.2 New^man-Shapiro summability method In [8] Reimer considers kernel polynomials K2u+l{0 ■= K2u{0 ■= 9u+l "g.+i(Ol' (2.5) as used by Newman-Shapiro [5]. Here, 77^+1 is the largest root of G^+i and 5.+i = (r-2)a;._i.^^^;^^(^^_3 j =-^-^ • ^—^. (2.6) The coefficients a^^ in the Newman-Shapiro operators can be calculated to be E{X)k (A)j-fc (A);-fc (l)j+;-2fc (2A)j-n-fc , ^^^ (l),(l),_fc (!),_, (2A),+,_2. (A+ l)^+,-/'''''^'+'-''' , ^■' where 5uj+i-2k denotes the Kronecker delta and A = ^-^. The matrix A defined by the Newman-Shapiro operators fulfils the properties (i)-(iii) (see Reimer [8]). Tomographic reconstruction 481 Remark 2.2 The corresponding partial sum operators L^ are nonnegative with positive a^v For continuous and differentiable functions even more is valid (see Reimer [8]): whereas for continuous functions the approximation error is of order 0{fj,~^) , functions F eC^{S'-'^), j €{1,2}, have an error of order 0{fj.-^). 
3 Application to tomography

The Radon transform $\mathcal{R} : C(B^r) \to C(Z^r)$ is defined by

$$(\mathcal{R}F)(s,t) := \int_{v \perp t} F(st + v)\, dv, \qquad F \in C(B^r),\ (s,t) \in Z^r, \qquad (3.1)$$

which means that the Radon transform $\mathcal{R}F$ of $F$ is determined by integrating $F$ over all hyperplanes of dimension $r-1$. This map can also be defined for functions in $L^2(\mathbb{R}^r)$, $L^2(B^r)$, the Schwartz space $\mathcal{S}(\mathbb{R}^r)$ or some Sobolev spaces. $\mathcal{R}$ is continuous on all of these spaces, whereas the inverse $\mathcal{R}^{-1}$ is only continuous on $\mathcal{S}(\mathbb{R}^r)$ and on the Sobolev spaces. For polynomials it is known that

$$\big(\mathcal{R}\,\hat C_\mu^{r/2}(a\,\cdot)\big)(s,t) = \hat C_\mu^{r/2}(s)\,\hat C_\mu^{r/2}(at), \qquad a \in S^{r-1},\ (s,t) \in Z^r \qquad (3.2)$$

(see Davison, Grünbaum [2]) and, more generally,

$$(\mathcal{R}P_m)(s,t) = \hat C_\mu^{r/2}(s)\, P_m(t), \qquad (s,t) \in Z^r, \qquad (3.3)$$

where the polynomials $P_m \in \overset{*}{\mathbb{P}}{}_\mu^r(S^{r-1})$ are generated by the Gegenbauer polynomials through the expansion of $\hat C_\mu^{r/2}(ax)$ with respect to $x$. These polynomials $P_m$, $|m| = \mu$, are known to constitute a basis for $\overset{*}{\mathbb{P}}{}_\mu^r(S^{r-1})$.

Let $V_\mu^r := \operatorname{span}\{P_m : |m| = \mu\}$. Since the Gegenbauer polynomials $C_\nu$ can also be interpreted as the reproducing kernel of $\mathbb{H}_\nu^{r+2}(S^{r+1})$, the orthogonal projection $F_\nu$ of $F \in C(B^r)$ onto $V_\nu^r(B^r)$ can be identified with the orthogonal projection of $F$ onto $\mathbb{H}_\nu^{r+2}(S^{r+1})$ (see Reimer [7] for details). Thus the theory of Laplace series can be used here for the reconstruction of $F$ from its Radon transform.

Let $A$ be a matrix transformation as introduced in Section 2 and let $F_\nu$ be the orthogonal projection of $F$ onto $V_\nu^r(B^r)$. Then, according to the summability theory of Laplace series, $F = \lim_{\mu\to\infty} \sum_{\nu=0}^{\mu} a_{\mu\nu} F_\nu$. Since the Radon transform is linear and continuous, there holds $\mathcal{R}F = \lim_{\mu\to\infty} \sum_{\nu=0}^{\mu} a_{\mu\nu}\, \mathcal{R}F_\nu$. It can be shown (see Reimer [7]) that

$$F_\nu(x) = \lambda_{\nu,r} \int_{Z^r} (\mathcal{R}F)(s,t)\, \hat C_\nu^{r/2}(s)\, \hat C_\nu^{r/2}(tx)\, d(s,t), \qquad (3.4)$$

where $\lambda_{\nu,r}$ is an explicit normalization constant built from $r-1$, $\hat C_\nu^{r/2}(1)$ and the surface measures $\omega_{r-1}$, $\omega_{r-2}$ (3.5). From this, after some lengthy calculation using the adjoint operator of $\mathcal{R}$ (which essentially is the inverse operator of $\mathcal{R}$), the reconstruction formula follows:

$$F(x) = \lim_{\mu\to\infty} \sum_{\nu=0}^{\mu} a_{\mu\nu}\, \lambda_{\nu,r} \int_{Z^r} (\mathcal{R}F)(s,t)\, \hat C_\nu^{r/2}(s)\, \hat C_\nu^{r/2}(tx)\, ds\, dt. \qquad (3.6)$$

Because of the identification of the orthogonal projections of $F$ onto $V_\nu^r(B^r)$ and onto $\mathbb{H}_\nu^{r+2}(S^{r+1})$, convergence of the Cesàro means follows for $k > r/2$, and positivity of the operators is valid for $k > r+1$. For the same reason, the coefficients $a_{\mu\nu}$ in the Newman-Shapiro summability method have to be calculated for $\lambda = \frac{(r+2)-2}{2} = \frac{r}{2}$.

4 Numerical implementation

For the reconstruction of $F$, formula (3.6) was used. As soon as the Radon transform of $F$ is known, the numerical implementation in principle reduces to a stable evaluation of the Gegenbauer polynomials and a suitable approximation of the integrals in (3.6).

The Gegenbauer polynomials were evaluated by their recurrence relation (see Szegő [11]), which is known to be numerically very stable. The coefficients $a_{\mu\nu}$ for the Cesàro means and the Newman-Shapiro operators were computed with the aid of formulae (2.3) and (2.7), respectively. The factor $\lambda_{\nu,r}$ was obtained by (3.5). Since the calculation of the $a_{\mu\nu}$ for the Newman-Shapiro operators is very time consuming (more than 10 hours for $\mu > 100$), these coefficients were stored before the main computation was started.

Since the integrand in (3.6) is a polynomial of degree $\nu + 2$ with respect to $s$ (see (3.6) together with (5.1)), the integral $\int_{-1}^{1} \cdots\, ds$ was approximated by a Gauss-Legendre quadrature with $\mu/2 + 1$ nodes. This choice ensures that, for the evaluation of $\mathcal{R}F(s,t)$, enough evaluations with respect to $s$ are performed and that the integral is evaluated exactly to within numerical precision.
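The exactness claim rests on the standard degree count: an $M$-point Gauss-Legendre rule integrates polynomials of degree up to $2M-1$ exactly. A small numpy illustration, added here as an example rather than taken from the article:

```python
import numpy as np

def gauss_legendre_integrate(f, M):
    """Approximate the integral of f over [-1, 1] by an M-point Gauss-Legendre rule.

    The rule is exact for polynomials of degree <= 2M - 1, which is the basis
    for choosing mu/2 + 1 nodes for the s-integration above.
    """
    nodes, weights = np.polynomial.legendre.leggauss(M)
    return np.sum(weights * f(nodes))

# A degree-6 polynomial is integrated exactly by a 4-point rule (2*4 - 1 = 7 >= 6).
f = lambda s: s**6 - 0.5 * s**2 + 1.0
print(gauss_legendre_integrate(f, 4), 2/7 - 1/3 + 2)   # both 1.9523809...
```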
For the quadrature on $S^{r-1}$, an interpolatory quadrature as introduced in [6], p. 132, was used first. The weights of such a quadrature formula are obtained as the solution of a linear system of equations $GA = e$, where $e = (1,\ldots,1)^T \in \mathbb{R}^N$, $N = \dim \mathbb{P}_\mu^r(S^{r-1})$, $A = (A_1, \ldots, A_N)^T$ is the vector of weights, and

$$G = \tfrac{1}{2}\big(\hat G_\mu(x_j x_k) + \hat G_{\mu-1}(x_j x_k)\big)_{j,k=1}^{N}.$$

The points were chosen to be regularly distributed on latitudes of the sphere. For $\mu > 70$, computational problems occurred in the computation of the weights because of a lack of memory. Apart from this problem, several weights turned out to be negative, which led to oscillations of the reconstruction. Therefore, this interpolatory quadrature was replaced by a product Gauss formula for the sphere $S^{r-1}$, as suggested by Stroud [10], p. 41. The points and weights of the Gaussian quadrature were computed by the MATLAB program qrule.m, which is available via the internet from The MathWorks Inc. The number of points of the product Gauss formula is $N = 2M^{r-1}$, where $M = \mu/2 + 1$ is the number of points used in each direction, i.e. $N = 2M^2$ for $r = 3$.

All codes for the computation were written in MATLAB 6. The actual computation took place on a SUN Ultra10 with 256 MB main memory, 691 MB virtual memory and the SUN OS operating system, release 5.7. To increase the computational speed, all parts of the MATLAB code were written with as few for-loops as possible. This gave an improvement in speed by a factor greater than 500.

5 Computational results

The theoretical results have been applied to the so-called Shepp-Logan phantom, which is usually used as a test function for tomographic reconstruction algorithms. It is a three-dimensional model of a human head consisting of 10 ellipsoids (see Shepp [9]), which were shrunk here to fit into the unit sphere $S^2$. Figure 1 shows a cut at $x_3 = 0.2721$.

Let $a_1^{(j)}, a_2^{(j)}, a_3^{(j)}$, $j = 1, \ldots, 10$, denote the axes of the $j$th ellipsoid, $d^{(j)}$ its density value, and $s_2^{(j)} - s_1^{(j)}$ the diameter of the ellipsoid in the direction of $t \in S^2$. Since the Radon transform is linear, the Radon transform of the Shepp-Logan phantom can be calculated to be

$$\mathcal{R}F(s,t) = \sum_{j=1}^{10} \pi\, d^{(j)} a_1^{(j)} a_2^{(j)} a_3^{(j)}\, \big(s - s_1^{(j)}\big)\big(s_2^{(j)} - s\big)\, \Big[\big(a_1^{(j)} t_1^{(j)}\big)^2 + \big(a_2^{(j)} t_2^{(j)}\big)^2 + \big(a_3^{(j)} t_3^{(j)}\big)^2\Big]^{-3/2}, \qquad (5.1)$$

where $t^{(j)}$ denotes $t$ expressed in the principal-axis coordinates of the $j$th ellipsoid.

Figure 2 shows the reconstruction results according to formula (3.6) for Cesàro means of index $k = 4$ and for Newman-Shapiro operators. The values $k = 1.6$ and $k = 2$ were tested too, but for high degrees $\mu$ no convergent behaviour could be observed. For Cesàro means with $k = 4$ and for Newman-Shapiro operators, Figure 2 clearly shows an improving behaviour of the reconstructions for increasing $\mu$. The Newman-Shapiro operators show better convergence, and for $\mu > 150$ even the small structures in the original head can be detected in the reconstruction. It can be expected that for higher degrees $\mu$ this behaviour will become more evident.

FIG. 1. Shepp-Logan phantom.

Unfortunately, for $\mu > 170$ the computation of the coefficients $a_{\mu\nu}$ for the Newman-Shapiro operators caused some numerical problems, so that the calculations were stopped at $\mu = 160$. Although the numerical results look quite promising, the drawback of the reconstruction is the computational time. For $\mu = 160$ the computation took 27.5 hours for the Radon transform and 31 hours for the evaluation at the points $x \in [-1,1]^2$. The evaluation was done on an equidistant grid of $200 \times 200$ points.

FIG. 2. Reconstruction of the Shepp-Logan phantom for $\mu = 40$, $\mu = 100$ and $\mu = 160$ (Cesàro means with $k = 4$ and Newman-Shapiro operators).
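The closed form (5.1) makes the phantom's Radon data directly computable. Below is a short Python sketch, added here as an illustration (the article's actual code was MATLAB), for the single-term case of a centered, axis-aligned ellipsoid, for which $s_1 = -\rho(t)$ and $s_2 = \rho(t)$ with $\rho(t)^2 = \sum_i (a_i t_i)^2$:

```python
import numpy as np

def radon_ellipsoid(s, t, axes, density):
    """Plane integral of a centered, axis-aligned ellipsoid with semi-axes 'axes'.

    The cross-sectional area yields R F(s,t) = pi*d*a1*a2*a3*(rho^2 - s^2)/rho^3,
    the single-term, centered case of (5.1); it vanishes for |s| > rho.
    """
    a = np.asarray(axes, dtype=float)
    rho = np.sqrt(np.sum((a * np.asarray(t)) ** 2))   # half-diameter in direction t
    s = np.asarray(s, dtype=float)
    val = np.pi * density * np.prod(a) * (rho**2 - s**2) / rho**3
    return np.where(np.abs(s) <= rho, val, 0.0)

t = np.array([0.0, 0.0, 1.0])
# At s = 0 this equals the area of the ellipse with semi-axes 0.3 and 0.4.
print(radon_ellipsoid(0.0, t, (0.3, 0.4, 0.5), 1.0), np.pi * 0.3 * 0.4)
```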
In principle there is no problem in producing three-dimensional reconstructions: the evaluation points $x$ only have to be chosen from a grid in $[-1,1]^3$. Because of the time-consuming calculations this has not been done here yet.

Bibliography

1. H. Berens, P.L. Butzer, S. Pawelke, Limitierungsverfahren von Reihen mehrdimensionaler Kugelfunktionen und deren Saturationsverhalten, Publ. Res. Inst. Math. Sci. Ser. A 4 (1969), 201-268.
2. M.E. Davison, F.A. Grünbaum, Tomographic reconstruction with arbitrary directions, Comm. Pure Appl. Math. 34 (1981), 77-119.
3. S.R. Deans, The Radon Transform and Some of its Applications, Wiley & Sons, New York, Chichester, Brisbane, Toronto, Singapore, 1983.
4. E. Kogbetliantz, Recherches sur la sommabilité des séries ultrasphériques par la méthode des moyennes arithmétiques, J. de Math. pures et appl. 9(3) (1924), 107-187.
5. D.J. Newman, H.S. Shapiro, Jackson's theorem in higher dimensions, in: On Approximation Theory, eds. P.L. Butzer, J. Korevaar, Birkhäuser Verlag, Basel, 1964, pp. 208-219.
6. M. Reimer, Constructive Theory of Multivariate Functions, BI Wissenschaftsverlag, Mannheim, Wien, Zürich, 1990.
7. M. Reimer, Radon-transform, Laplace-series and matrix-transforms, Comm. Appl. Analysis 1 (1997), 337-349.
8. M. Reimer, Generalized hyperinterpolation on the sphere and the Newman-Shapiro operators, submitted.
9. L.A. Shepp, Computerized tomography and nuclear magnetic resonance, J. Comp. Ass. Tomography 4 (1980), 94-107.
10. A.H. Stroud, Approximate Calculation of Multiple Integrals, Prentice Hall, Englewood Cliffs, NJ, 1971.
11. G. Szegő, Orthogonal Polynomials, Amer. Math. Soc., Providence, 1991.

A unified approach to fast algorithms of discrete trigonometric transforms

Manfred Tasche
University of Rostock, Department of Mathematics, D-18051 Rostock, Germany. manfred.tasche@mathematik.uni-rostock.de

Hansmartin Zeuner
Medical University of Lübeck, Institute of Mathematics, Wallstraße 40, D-23560 Lübeck, Germany. zeuner@math.mu-luebeck.de

Abstract

We present a unified approach to fast algorithms for various discrete trigonometric transforms. With the help of so-called Euler formulas we describe an elegant and useful connection between Fourier matrices and trigonometric matrices. It is known that FFTs are closely related to factorizations of the unitary Fourier matrix into a product of unitary sparse matrices. Using these Euler formulas and FFTs, we obtain fast algorithms for discrete trigonometric transforms. As a further consequence of these Euler formulas and Gaussian sums, we compute all eigenvalues of some trigonometric matrices.

1 Introduction

The fast Fourier transform (FFT) and related algorithms for orthogonal trigonometric transforms are essential tools for practical computations. Special discrete trigonometric transforms are the discrete Hartley transforms (DHT), discrete cosine transforms (DCT), and discrete sine transforms (DST) of various types. These transforms have found important applications in approximation methods with Chebyshev polynomials, quadrature methods of Clenshaw-Curtis type (see [3]), signal processing, and image compression (see [4, 6, 9]).

Euler formulas describe the algebraic connection between Fourier matrices of a certain type and corresponding cosine and sine matrices. Using these formulas, FFTs can be transformed into fast and stable algorithms for the DCT and DST. Further, from these Euler formulas the orthogonality of various trigonometric matrices follows immediately. For simplicity we consider only symmetric trigonometric matrices, i.e. Fourier and Hartley matrices of type I and IV, as well as cosine and sine matrices of type I, IV, V and VIII.

This paper is organized as follows: first we introduce generalized Fourier matrices. New Euler formulas for these matrices describe a close connection with various orthogonal Hartley, cosine and sine matrices. These results simplify and extend former results of [9], pp. 83-96. Applying these Euler formulas and FFTs, we obtain fast algorithms for discrete trigonometric transforms. As a further consequence of these formulas and Gaussian sums, we can compute all eigenvalues of orthogonal symmetric trigonometric matrices.

2 Euler formulas for Fourier matrices of type I

Let $N \ge 2$ be a given integer. The Fourier matrix of type I is the classical Fourier matrix in unitary form,

$$F_N^I := \frac{1}{\sqrt N}\big(w_N^{jk}\big)_{j,k=0}^{N-1}, \qquad w_N := \exp(-2\pi i/N).$$

Note that the Gaussian sum (see [5], pp. 326-330) yields the trace of $F_N^I$:

$$\operatorname{tr} F_N^I = \frac{1}{\sqrt N}\sum_{k=0}^{N-1} w_N^{k^2} = \frac{1-i}{2}\,\big(1 + i^N\big). \qquad (2.1)$$

Closely related to type I Fourier matrices are the cosine and sine matrices of types I and V:

$$C_{N+1}^I := \sqrt{\tfrac{2}{N}}\,\Big(\epsilon_j^N\,\epsilon_k^N\,\cos\tfrac{jk\pi}{N}\Big)_{j,k=0}^{N}, \qquad S_{N-1}^I := \sqrt{\tfrac{2}{N}}\,\Big(\sin\tfrac{(j+1)(k+1)\pi}{N}\Big)_{j,k=0}^{N-2},$$

$$C_{N+1}^V := \tfrac{2}{\sqrt{2N+1}}\,\Big(\epsilon_j^{2N+1}\,\epsilon_k^{2N+1}\,\cos\tfrac{2jk\pi}{2N+1}\Big)_{j,k=0}^{N}, \qquad S_N^V := \tfrac{2}{\sqrt{2N+1}}\,\Big(\sin\tfrac{2(j+1)(k+1)\pi}{2N+1}\Big)_{j,k=0}^{N-1}.$$

Here we set $\epsilon_j^N := \sqrt2/2$ for $j \in \{0, N\}$ and $\epsilon_j^N := 1$ for $j \in \{1, \ldots, N-1\}$. In this notation a subscript of a matrix denotes its order, while a superscript signifies the type of the matrix. In the following, $I_N$ denotes the identity matrix and $J_N$ the counteridentity matrix, which has the columns of $I_N$ in reverse order. Blanks in a block matrix indicate blocks of zeros. The direct sum of matrices $A$, $B$ will be denoted by $A \oplus B$.

Defining the orthogonal matrices

$$I_{2N}^I := \frac{1}{\sqrt2}\begin{pmatrix} \sqrt2 & & & \\ & I_{N-1} & & I_{N-1} \\ & & \sqrt2 & \\ & J_{N-1} & & -J_{N-1} \end{pmatrix}, \qquad P_{2N+1}^I := \frac{1}{\sqrt2}\begin{pmatrix} \sqrt2 & & \\ & I_N & I_N \\ & J_N & -J_N \end{pmatrix},$$

we obtain for Fourier matrices of type I the following Euler formulas:

Theorem 2.1 Depending on whether the order of the Fourier matrix of type I is even or odd, we have

$$(I_{2N}^I)^T F_{2N}^I\, I_{2N}^I = C_{N+1}^I \oplus (-i)\,S_{N-1}^I, \qquad (2.2)$$

$$(P_{2N+1}^I)^T F_{2N+1}^I\, P_{2N+1}^I = C_{N+1}^V \oplus (-i)\,S_N^V. \qquad (2.3)$$

Proof: It is obvious that $(I_{2N}^I)^T I_{2N}^I = I_{2N}$. Splitting $F_{2N}^I = \frac{1}{\sqrt{2N}}\big(w_{2N}^{jk}\big)_{j,k=0}^{2N-1}$ into four blocks and using the classical Euler formula $\exp(-ix) = \cos x - i\sin x$, we obtain (2.2) by blockwise computation of $(I_{2N}^I)^T F_{2N}^I I_{2N}^I$. The proof of (2.3) is similar. □
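The Euler formula (2.2) is easy to check numerically. The following numpy snippet is an illustration added in this rewrite, using the matrix definitions exactly as stated above; it builds both sides for a given $N$ and returns the maximal deviation.

```python
import numpy as np

def euler_formula_check(N):
    """Check (2.2): (I_{2N}^I)^T F_{2N}^I I_{2N}^I = C_{N+1}^I direct-sum (-i) S_{N-1}^I."""
    F = np.exp(-2j * np.pi * np.outer(np.arange(2 * N), np.arange(2 * N)) / (2 * N)) / np.sqrt(2 * N)
    # Orthogonal matrix I_{2N}^I with blocks sqrt2, I_{N-1}, J_{N-1}, -J_{N-1}.
    T = np.zeros((2 * N, 2 * N))
    T[0, 0] = T[N, N] = 1.0
    for k in range(1, N):
        T[k, k] = T[k, N + k] = 1 / np.sqrt(2)    # I_{N-1} blocks
        T[2 * N - k, k] = 1 / np.sqrt(2)          # J_{N-1} block
        T[2 * N - k, N + k] = -1 / np.sqrt(2)     # -J_{N-1} block
    eps = np.ones(N + 1); eps[0] = eps[N] = np.sqrt(0.5)
    j = np.arange(N + 1)
    C = np.sqrt(2 / N) * np.outer(eps, eps) * np.cos(np.outer(j, j) * np.pi / N)
    j = np.arange(1, N)
    S = np.sqrt(2 / N) * np.sin(np.outer(j, j) * np.pi / N)
    rhs = np.block([[C, np.zeros((N + 1, N - 1))],
                    [np.zeros((N - 1, N + 1)), -1j * S]])
    return np.abs(T.T @ F @ T - rhs).max()

print(euler_formula_check(8))   # of the order of machine precision
```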
Further, from these Euler formulas the orthogonality of various trigonometric matrices follows immediately. For simplicity we consider only symmetric trigonometric matrices, i.e. Fourier and Hartley matrices of type I and IV as well as cosine and sine matrices of type I, IV, V and VIII. This paper is organized as follows; first we introduce generalized Fourier matrices. New Euler formulas for these matrices describe a close connection with various orthogonal Hartley, cosine and sine matrices. These results simplify and extend former results 486 Discrete trigonometric transforms 487 of [9], pp. 83-96. Applying these Euler formulas and FFTs, we obtain fast algorithms of discrete trigonometric transforms. As a further consequence of these formulas and Gaussian sums, we can compute all eigenvalues of orthogonal symmetric trigonometric matrices. 2 Euler formulas for Fourier matrices of type I Let A'' > 2 be a given integer. The Fourier matrix of type I is the classical Fourier matrix defined in unitary form with wjv := exp(-27ri/7V). Note that the Gaussian sum (see [5], pp. 326-330) yields the trace of F^: . 1 v-^ 1^ 1+1 , , Closely related with type I Fourier matrices are the cosine and sine matrices of types I andV: M\r+i • - ,/2"/ (j + l)(fc+l)7rN^-2 j,k=0 qV ^ /■^2(j + l)(fc+l)7rN^-i V2N +1V 2Af + l Jj,k=o ■ ._ Here we set ef := \/2/2 for j € {0, AT} and ef := 1 for j e {l,... ,7V - 1}. In this notation a subscript of a matrix denotes the order, while a superscript signifies the type of the matrix. In the following, Jjv denotes the identity matrix and Jjv the counteridentity matrix, which has the columns of IN in reverse order. Blanks in a block matrix indicate blocks of zeros. The direct sum of ma,trices A, B will be denoted hy A®B. Defining the orthogonal matrices / V2 pi 1_ 0 IN-1 \ IN-1 V2 \ «^Ar-i —JN-1 J we obtain for Fourier matrices of type I the following Euler formulas: Theorem 2.1 Depending on whether the order of the Fourier matrix of type I is even or odd, we have {Ifu) ^N^N = Qr+1 ® (-O-^Jv-i) (2-2) 488 M. Tasche and H. Zeuner mN+iY^N+iHw+i = Q^+1 ® (~i)^jv- (2-3) Proof: It is obvious that {J^MY^N = ^N- Splitting I^j^ into four blocks V2N \ (Wa^v jj,fc=o l'^2W jj,fc=0 / and using the classical Euler formula exp(-ia;) = cos a; - isini, we obtain (2.2) by blockwise computation of il2NY^N^N- The proof of (2.3) is similar. □ Remark 2.2 An analogous result to (2.2) can be found in [9], pp. 85-90, but with a complex matrix instead of J^^. Compare also with [1]. The Euler formula (2.3) is new. Note that the results and their proofs are simpler than in [9], pp. 85-90 and [1]. Corollary 2.3 The matrices C^_^-^,Sl,_-i^,C^^i,S^ are orthogonal. Proof: Since t^fj is unitary and i|^ is orthogonal, C^+i®(-i)S^_i is unitary by (2.2). Hence the real matrices C^+i and S]^_i are orthogonal. Other proofs can be found in [4], pp. 12-16 and [6]. The proof for the type V matrices uses (2.3) and follows similar lines. □ Remark 2.4 Results analogous to (2.2) and (2.3) are true for the Hartley matrix of type I (see [9], pp. 77-80 and [8], pp. 224-227) Hi, ~ A/JV leas-!—-] V AT /j,fc=o with cas X := cos x + sin x. Then we obtain the formulas The Euler formula (2.2) can be used for fast and numerically stable computations of DCTs and DSTs of type I: Let x G R^+^ and y e E'^"'^ with N = 2* {t > 2) and set z := [j € E^^. Since if^^z is real, we can apply Edson's algorithm for the FFT of real data (see [8], pp. 215-223 and [7]). 
The output of the conjugate even result is in the form U2NF2N{I2NZ) where U2N ■= (JW+i ® (-i)Iiv-i) {IiNy■ Therefore by t^2;vi^V^;.z = (CA+i®(-l)S;_i)^=f*^^ti=" \ we have calculated C^+ia: and Slf^^y simultaneously using 5Nt flops. If we have to use an FFT with complex data, we combine real data vectors x,x' e (X + ix' \ . , j. Then we can compute two DCTs C^+ia;,C^_,_ix' and two DSTs S^_iy,S^_iy' simultaneously via an FFT of length 2N applied to the complex input vector I2N'^'In a similar way, the Euler formula (2.3) can be used for fast computations of DCTs and DSTs of type V: For given a;, a;' 6 K^+^ and y,y' e K^ the transformed vectors Discrete trigonometric transforms 489 Cj^^iX,Cjf^ix', Sj^y, and Sj^y' can be calculated at the same time as components of {P2N+1YF2N+1P2N+1Z' = (CN+I e i-i)S^)z' where we use an FFT of length 2iV + 1 with complex data i2jv+i^'- If 2iV + 1 = 3* or more generally, if 2A'' + 1 is a product of small primes (see [8], pp. 76-101 and [7]) the FFT of length 2A'' + 1 can be computed very efficiently. 3 Euler formulas for Fourier matrices of type IV The Fourier matrix of type IV, defined by is related to the Fourier matrix of type I by the formula I^=uj,j,W^F^W^ (3.1) with W^ := diag(a;2;v)fc^o^ ^^^ is therefore unitary. If iV is a power of 2 or 3, then J^ can be factorized into a product of sparse unitary matrices. Lemma 3.1 The trace of the Fourier matrix of type TV is equal to r _ 1 V^ i2k+if _ I-'" Proof: We begin with the generalized Gaussian sum (see [5], p. 330) - 2N-1 which we split into two sums containing even and odd j respectively. Then . 2N-1 N-1 AT-l • and the results follows by (2.1). D Now we introduce cosine and sine matrices of type IV and VIII which are closely related with the Fourier matrix of type IV: C^ .- ,[Ifcos ^^^+^^^2^±^]''~' ^ rvm ^ ._ ■" ._ \ N\ m Jj,k=o' ^ r „(2j + l)(2fc + l)7rx^-i V2N + lV°^ 2{2N + 1) )j,k=o' 2 / M+i M+i ^+^ ■" ^f+T l'^-+i "'=+1 '''' (2j + l)(2fc+l)7rNiv 2{2N + 1) J,-,.=o- 490 M. Tasche and H. Zeuher As above we define orthogonal matrices piv ._ 1 / -fw IN \ pw ._ -^ I ,/9 Theorem 3.2 For the Fourier matrix of type IV and even resp. odd order, we obtain the following Euler formulas: (P,^fi^>,^ {P2N+1) ^N+i^^N+i = C^®i-i)S^, (3.3) = (3-4) Qv ®(-i)'Sjv+i- Proof: Similar to that of Theorem 2.1. □ Corollary 3.3 The matrices C^,S^,Cj^ and S}^-^ are orthogonal. Remark 3.4 An analogous result to (3.3) can be found in [9], pp. 94-96. Compare also with [1]. Formula (3.4) is new. A difi'erent proof of the orthogonality of C^ and «S^ can be found in [6]. Remark 3.5 Similar formulas as in Theorem 3.2 are true for the Hartley matrix of t2/pe IV (see [1, 2]) 1 / (2j + l)(2fc + l)7r\JV-i ^ ViVV 2N Jj,k=o Then we have (■^2^) -^AT-^iV — Cjv ® -Sjv , (3.5) The Euler formulas can be used for a fast and numerically stable computation of DCT and DST of types IV and VIII: Using (3.3) and (3.1), for arbitrary x,x',y,y' G 1^ the DCTs C^x, C^x' and DSTs S^y and S^y' can be calculated via one FFT of length 27V with complex data i^^z' and z' := (^,^^, j.Ii N = 2'^, this procedure requires about lONt operations. 
Likewise, by (3.4), for $x, x' \in \mathbb{R}^N$ and $y, y' \in \mathbb{R}^{N+1}$, the DCTs of type VIII, $C_N^{VIII}x$, $C_N^{VIII}x'$, and the DSTs $S_{N+1}^{VIII}y$, $S_{N+1}^{VIII}y'$ can be calculated via one FFT of length $2N+1$ with complex data $P_{2N+1}^{IV}z'$.

Remark 3.6 The sine, cosine, Hartley, and Fourier matrices considered above enjoy the interesting intertwining relations (see [2]):

$$C_N^{IV}\,J_N = \Sigma_N\,S_N^{IV}, \qquad H_N^{IV}\,J_N = J_N\,H_N^{IV}, \qquad F_N^{IV}\,J_N = J_N\,F_N^{IV}, \qquad (3.7)$$
$$J_N\,C_N^{IV} = S_N^{IV}\,\Sigma_N, \qquad H_N^I\,\tilde I_N = \tilde I_N\,H_N^I, \qquad F_N^I\,\tilde I_N = \tilde I_N\,F_N^I,$$

with the diagonal matrix $\Sigma_N := \operatorname{diag}\big((-1)^k\big)_{k=0}^{N-1}$ and the reflection matrix $\tilde I_N := 1 \oplus J_{N-1}$. Therefore, applying (3.7) in the above algorithm, it is also possible to compute four DCTs (or four DSTs) of type IV and order $N$ via one FFT of length $2N$ with complex data.

4 Eigenvalues of trigonometric matrices

Finally we determine the eigenvalues of the trigonometric matrices introduced above. Since the cosine and sine matrices of type I, IV, V and VIII, and the Hartley matrices of type I and IV, are real, symmetric and orthogonal, only 1 and $-1$ are possible eigenvalues. For $x \in \mathbb{R}$ we denote by $\lfloor x\rfloor$ resp. $\lceil x\rceil$ the integer $k \in \mathbb{Z}$ with $k \le x < k+1$ resp. $k-1 < x \le k$.

Theorem 4.1 The sine and cosine matrices $C_N^I$, $S_N^I$, $C_N^{IV}$, $S_N^{IV}$, $C_N^V$, $S_N^V$, $C_N^{VIII}$ and $S_N^{VIII}$ of order $N \ge 2$ possess the eigenvalues 1 and $-1$ with multiplicities

$$m(1) = \lceil N/2\rceil, \qquad m(-1) = \lfloor N/2\rfloor.$$

Proof: Since $C_N^I$ is symmetric and orthogonal, only 1 and $-1$ can be eigenvalues. Their multiplicities fulfil

$$m(1) + m(-1) = N.$$

On the other hand, since $C_N^I$ and $S_{N-2}^I$ are real, it follows from (2.2) and the trace formula (2.1) that

$$m(1) - m(-1) = \operatorname{tr} C_N^I = \operatorname{Re}\big(\operatorname{tr} F_{2N-2}^I\big) = \operatorname{Re}\,\frac{(1-i)\big(1 + i^{2N-2}\big)}{2} = \begin{cases} 1 & \text{for odd } N, \\ 0 & \text{for even } N. \end{cases}$$

From these two linear equations we obtain $m(1) = \lceil N/2\rceil$ and $m(-1) = \lfloor N/2\rfloor$. In the other cases, the proof is similar. □

From Theorem 4.1 and the Euler formulas (2.2)-(2.3) and (3.3)-(3.4) it follows immediately:

Corollary 4.2 The Fourier matrices of type I and IV have only the eigenvalues $1$, $-1$, $i$, $-i$, with multiplicities:

                     m(1)                       m(-1)                 m(i)                       m(-i)
  $F_{2N}^I$         $\lfloor N/2\rfloor + 1$   $\lceil N/2\rceil$    $\lceil N/2\rceil - 1$     $\lfloor N/2\rfloor$
  $F_{2N+1}^I$       $\lfloor N/2\rfloor + 1$   $\lceil N/2\rceil$    $\lfloor N/2\rfloor$       $\lceil N/2\rceil$
  $F_{2N}^{IV}$      $\lceil N/2\rceil$         $\lfloor N/2\rfloor$  $\lfloor N/2\rfloor$       $\lceil N/2\rceil$
  $F_{2N+1}^{IV}$    $\lceil N/2\rceil$         $\lfloor N/2\rfloor$  $\lceil N/2\rceil$         $\lfloor N/2\rfloor + 1$

From Theorem 4.1 and formulas (2.4)-(2.5) and (3.5)-(3.6) it follows:

Corollary 4.3 The Hartley matrices of type I and IV have only the eigenvalues 1 and $-1$, with the following multiplicities:

                     m(1)                          m(-1)
  $H_{2N}^I$         $2\lfloor N/2\rfloor + 1$     $2\lceil N/2\rceil - 1$
  $H_{2N+1}^I$       $N+1$                         $N$
  $H_{2N}^{IV}$      $2\lceil N/2\rceil$           $2\lfloor N/2\rfloor$
  $H_{2N+1}^{IV}$    $N+1$                         $N$
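These multiplicities are easy to confirm numerically for small orders. The snippet below is an illustration added in this rewrite (not from the paper); it counts the eigenvalues of the unitary type I Fourier matrix by rounding each computed eigenvalue to the nearest fourth root of unity.

```python
import numpy as np

def fourier_eigenvalue_counts(M):
    """Count eigenvalues of the order-M unitary Fourier matrix F_M^I among {1,-1,i,-i}."""
    F = np.exp(-2j * np.pi * np.outer(np.arange(M), np.arange(M)) / M) / np.sqrt(M)
    ev = np.linalg.eigvals(F)
    roots = np.array([1, -1, 1j, -1j])
    nearest = np.argmin(np.abs(ev[:, None] - roots[None, :]), axis=1)
    return {r: int(np.sum(nearest == k)) for k, r in enumerate(["1", "-1", "i", "-i"])}

for M in (8, 9, 10, 11):
    print(M, fourier_eigenvalue_counts(M))
# e.g. M = 2N = 8 (N = 4): m(1) = 3, m(-1) = 2, m(i) = 1, m(-i) = 2, as in the table.
```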
9. M. V. WICKERHAUSER, Adapted Wavelet Analysis from Theory to Software, A K Peters, Wellesley, 1994. This volume contains the proceedings of an International Symposium on Algorithms for Approximation Four (A4A4) held at University of Huddersfield from July 15th to 20th,' 2001, attended by 106 people from no less than 32 countries. The 54 papers submitted cover a broad range of topics in approximation theory, metrology, orthogonal polynomials, splines, wavelets, radial basis functions approximation on manifolds, and applications in medical modelling, and the solution of integral and differential equations. All papers were refereed meticulously.