B-spline data smoothing subroutine.
[yf,wk]=gcvspl(x(n,1),y(n,k), xf(nf,1)=x, wx(n,1)=[],wy(1,k)=[], m=2,md=2,val=1, ider=[0])
uses GCVSPL.FOR, 1986-05-12
from http://www.netlib.org/gcv/index.html
for B-spline data smoothing using generalized cross-validation
and mean squared prediction or explicit user smoothing
by H.J. Woltring, University of Nijmegen,
Philips Medical Systems, Eindhoven (The Netherlands)
Purpose:
Natural B-spline data smoothing subroutine, using the Generali-
zed Cross-Validation and Mean-Squared Prediction Error Criteria
of Craven & Wahba (1979). Alternatively, the amount of smoothing
can be given explicitly, or it can be based on the effective
number of degrees of freedom in the smoothing process as defined
by Wahba (1980). The model assumes uncorrelated, additive noise
and essentially smooth, underlying functions. The noise may be
non-stationary, and the independent co-ordinates may be spaced
non-equidistantly. Multiple datasets, with common independent
variables and weight factors are accomodated.
A full description of the package is provided in:
H.J. Woltring (1986), A FORTRAN package for generalized,
cross-validatory spline smoothing and differentiation.
Advances in Engineering Software 8(2):104-113
Meaning of parameters:
X(N,1) Independent variables: strictly increasing knot
sequence, with X(I-1).lt.X(I), I=2,...,N.
Y(N,K) Input data to be smoothed (or interpolated).
XF(NF,1)Points where the function should be approximated.
WX(N,1) Weight factor array; WX(I) corresponds with
the relative inverse variance of point Y(I,*).
If no relative weighting information is
available, the WX(I) should be set to ONE.
All WX(I).gt.ZERO, I=1,...,N.
WY(1,K) Weight factor array; WY(J) corresponds with
the relative inverse variance of point Y(*,J).
If no relative weighting information is
available, the WY(J) should be set to ONE.
All WY(J).gt.ZERO, J=1,...,K.
NB: The effective weight for point Y(I,J) is
equal to WX(I)*WY(J).
M Half order of the required B-splines (spline
degree 2*M-1), with M.gt.0. The values M =
1,2,3,4 correspond to linear, cubic, quintic,
and heptic splines, respectively. N.ge.2*M.
MD Optimization mode switch:
|MD| = 1: Prior given value for p in VAL
(VAL.ge.ZERO). This is the fastest
use of GCVSPL, since no iteration
is performed in p.
|MD| = 2: Generalized cross validation.
|MD| = 3: True predicted mean-squared error,
with prior given variance in VAL.
|MD| = 4: Prior given number of degrees of
freedom in VAL (ZERO.le.VAL.le.N-M).
After return from MD.ne.1, the same number of
degrees of freedom can be obtained, for identical
weight factors and knot positions, by selecting
|MD|=1, and by copying the value of p from WK(4)
into VAL. In this way, no iterative optimization
is required when processing other data in Y.
VAL Mode value, as described above under MD.
IDER Derivative order required, with 0.le.IDER
and IDER.le.2*M. If IDER.eq.0, the function
value is returned; otherwise, the IDER-th
derivative of the spline is returned.
Return values:
YF(NF,1)Approximated values at XF.
WK(IWK) On normal exit, the first 6 values of WK are
assigned as follows:
WK(1) = Generalized Cross Validation value
WK(2) = Mean Squared Residual.
WK(3) = Estimate of the number of degrees of
freedom of the residual sum of squares
per dataset, with 0.lt.WK(3).lt.N-M.
WK(4) = Smoothing parameter p, multiplicative
with the splines' derivative constraint.
WK(5) = Estimate of the true mean squared error
(different formula for |MD| = 3).
WK(6) = Gauss-Markov error variance.
If WK(4) --> 0 , WK(3) --> 0 , and an inter-
polating spline is fitted to the data (p --> 0).
A very small value > 0 is used for p, in order
to avoid division by zero in the GCV function.
If WK(4) --> inf, WK(3) --> N-M, and a least-
squares polynomial of order M (degree M-1) is
fitted to the data (p --> inf). For numerical
reasons, a very high value is used for p.
Upon return, the contents of WK can be used for
covariance propagation in terms of the matrices
B and WE: see the source listings. The variance
estimate for dataset J follows as WK(6)/WY(J).
Remarks:
(1) GCVSPL calculates a natural spline of order 2*M (degree
2*M-1) which smoothes or interpolates a given set of data
points, using statistical considerations to determine the
amount of smoothing required (Craven & Wahba, 1979). If the
error variance is a priori known, it should be supplied to
the routine in VAL, for |MD|=3. The degree of smoothing is
then determined to minimize an unbiased estimate of the true
mean squared error. On the other hand, if the error variance
is not known, one may select |MD|=2. The routine then deter-
mines the degree of smoothing to minimize the generalized
cross validation function. This is asymptotically the same
as minimizing the true predicted mean squared error (Craven &
Wahba, 1979). If the estimates from |MD|=2 or 3 do not appear
suitable to the user (as apparent from the smoothness of the
M-th derivative or from the effective number of degrees of
freedom returned in WK(3) ), the user may select an other
value for the noise variance if |MD|=3, or a reasonably large
number of degrees of freedom if |MD|=4. If |MD|=1, the proce-
dure is non-iterative, and returns a spline for the given
value of the smoothing parameter p as entered in VAL.
(2) The number of arithmetic operations and the amount of
storage required are both proportional to N, so very large
datasets may be accomodated. The data points do not have
to be equidistant in the independant variable X or uniformly
weighted in the dependant variable Y. However, the data
points in X must be strictly increasing. Multiple dataset
processing (K.gt.1) is numerically more efficient dan
separate processing of the individual datasets (K.eq.1).
(3) If |MD|=3 (a priori known noise variance), any value of
N.ge.2*M is acceptable. However, it is advisable for N-2*M
be rather large (at least 20) if |MD|=2 (GCV).
(4) For |MD| > 1, GCVSPL tries to iteratively minimize the
selected criterion function. This minimum is unique for |MD|
= 4, but not necessarily for |MD| = 2 or 3. Consequently,
local optima rather that the global optimum might be found,
and some actual findings suggest that local optima might
yield more meaningful results than the global optimum if N
is small. Therefore, the user has some control over the
search procedure. If MD > 1, the iterative search starts
from a value which yields a number of degrees of freedom
which is approximately equal to N/2, until the first (local)
minimum is found via a golden section search procedure
(Utreras, 1980). If MD < -1, the value for p contained in
WK(4) is used instead. Thus, if MD = 2 or 3 yield too noisy
an estimate, the user might try |MD| = 1 or 4, for suitably
selected values for p or for the number of degrees of
freedom, and then run GCVSPL with MD = -2 or -3. The con-
tents of N, M, K, X, WX, WY, and WK are assumed unchanged
if MD < 0.
(5) GCVSPL calculates the spline coefficient array C(N,K);
this array can be used to calculate the spline function
value and any of its derivatives up to the degree 2*M-1
at any argument T within the knot range, using subrou-
tines SPLDER and SEARCH, and the knot array X(N). Since
the splines are constrained at their Mth derivative, only
the lower spline derivatives will tend to be reliable
estimates of the underlying, true signal derivatives.
(6) GCVSPL combines elements of subroutine CRVO5 by Utre-
ras (1980), subroutine SMOOTH by Lyche et al. (1983), and
subroutine CUBGCV by Hutchinson (1985). The trace of the
influence matrix is assessed in a similar way as described
by Hutchinson & de Hoog (1985). The major difference is
that the present approach utilizes non-symmetrical B-spline
design matrices as described by Lyche et al. (1983); there-
fore, the original algorithm by Erisman & Tinney (1975) has
been used, rather than the symmetrical version adopted by
Hutchinson & de Hoog.
References:
P. Craven & G. Wahba (1979), Smoothing noisy data with
spline functions. Numerische Mathematik 31, 377-403.
A.M. Erisman & W.F. Tinney (1975), On computing certain
elements of the inverse of a sparse matrix. Communications
of the ACM 18(3), 177-179.
M.F. Hutchinson & F.R. de Hoog (1985), Smoothing noisy data
with spline functions. Numerische Mathematik 47(1), 99-106.
M.F. Hutchinson (1985), Subroutine CUBGCV. CSIRO Division of
Mathematics and Statistics, P.O. Box 1965, Canberra, ACT 2601,
Australia.
T. Lyche, L.L. Schumaker, & K. Sepehrnoori (1983), Fortran
subroutines for computing smoothing and interpolating natural
splines. Advances in Engineering Software 5(1), 2-5.
F. Utreras (1980), Un paquete de programas para ajustar curvas
mediante funciones spline. Informe Tecnico MA-80-B-209, Depar-
tamento de Matematicas, Faculdad de Ciencias Fisicas y Matema-
ticas, Universidad de Chile, Santiago.
Wahba, G. (1980). Numerical and statistical methods for mildly,
moderately and severely ill-posed problems with noisy data.
Technical report nr. 595 (February 1980). Department of Statis-
tics, University of Madison (WI), U.S.A.