Function File: [q, Asq, info] = anderson_darling_test (x, distribution)

Test the hypothesis that x is selected from the given distribution using the Anderson-Darling test. If the returned q is small, reject the hypothesis at the q*100% level.

The Anderson-Darling A^2 statistic is calculated as follows:

```
              n
A^2_n = -n - SUM (2i-1)/n log(z_i (1 - z_{n-i+1}))
             i=1
```

where z_i is the i-th smallest of the values obtained by evaluating the CDF of the distribution at the x's. Unlike the Kolmogorov-Smirnov statistic, the Anderson-Darling statistic is sensitive to the tails of the distribution.
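The sum above can be written out directly in Octave. The following is a minimal sketch for a single sample, not the package's implementation; the variable names are illustrative, and the CDF values are assumed to lie strictly between 0 and 1 so that every logarithm is finite.

```octave
% CDF values for one sample, each assumed to lie strictly in (0,1)
z = sort ([0.5; 0.1; 0.9; 0.3; 0.7]);   % ordered values z_1 <= ... <= z_n
n = numel (z);
i = (1:n)';
zrev = z(end:-1:1);                     % zrev(i) is z_{n-i+1}
% -n - SUM (2i-1)/n log(z_i (1 - z_{n-i+1}))
Asq = -n - sum ((2*i - 1) .* log (z .* (1 - zrev))) / n;
```

Reversing the sorted vector once gives z_{n-i+1} for every i, so the whole sum is a single vectorized expression.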

The distribution argument must be either "uniform", "normal", or "exponential".

For "normal" and "exponential" distributions, estimate the distribution parameters from the data, convert the values to CDF values, and compare the result to tabulated critical values. This includes a correction for small n which works well enough for n >= 8, but less so for smaller n. The returned `info.Asq_corrected` contains the adjusted statistic.
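The "estimate, then convert to CDF values" step can be sketched as follows. This is an illustration under stated assumptions, not the package's code: it assumes the usual estimators (sample mean and sample standard deviation for the normal case, sample mean for the exponential case), and the data are made up. The small-n correction and the critical-value tables are not reproduced here.

```octave
x = [2.1; 3.4; 1.7; 4.0; 2.9; 3.3; 2.5; 3.8];   % illustrative data only

% "normal": standardize with the estimated mean and standard deviation
% (assumed estimators), then apply the standard normal CDF, written with
% erfc so that only core functions are needed
zs = (x - mean (x)) / std (x);
z_normal = 0.5 * erfc (-zs / sqrt (2));

% "exponential": estimate the mean (assumed estimator), then apply the
% exponential CDF 1 - exp(-x/mean)
z_exponential = 1 - exp (-x / mean (x));
```

Both results are vectors of values in (0,1) that can be sorted and fed into the A^2 sum.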

For "uniform", assume the values are uniformly distributed in (0,1), compute A^2 and return the corresponding p-value from `1-anderson_darling_cdf(A^2,n)`.

If you are selecting from a known distribution, convert your values into CDF values for the distribution and use "uniform". Do not use "uniform" if the distribution parameters are estimated from the data itself, as this sharply biases the A^2 statistic toward smaller values.
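As a usage sketch of the known-distribution case: the parameters below are fixed in advance rather than fitted, the sample is converted to CDF values, and the result is tested as "uniform". The values of `mu` and `sigma` and the sample size are arbitrary choices for illustration, and the example assumes an installed release of the statistics package that provides `anderson_darling_test`.

```octave
pkg load statistics   % assumes a release that provides anderson_darling_test

mu = 3; sigma = 2;                                 % known in advance, not fitted
x = mu + sigma * randn (50, 1);                    % sample from the known distribution
z = 0.5 * erfc (-(x - mu) / (sigma * sqrt (2)));   % normal CDF values of x
q = anderson_darling_test (z, "uniform");          % small q: reject the hypothesis
```

If `mu` and `sigma` were instead estimated from `x`, the "normal" option would be the appropriate choice.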

[1] Stephens, M. A. (1986), "Tests based on EDF statistics", in D'Agostino, R. B. and Stephens, M. A. (eds.), Goodness-of-Fit Techniques. New York: Dekker.

## Demonstration 1

The following code

```
c = anderson_darling_test(10*rande(12,10000),'exponential');
tabulate(100*c,100*[unique(c),1]);
% The Fc column should report 100, 250, 500, 1000, 10000 more or less.
```

Produces the following output

```
bin      Fa       Fr%       Fc
  1     114     1.14%      114
2.5     189     1.89%      303
  5     290     2.90%      593
 10     577     5.77%     1170
100    8830    88.30%    10000
```

## Demonstration 2

The following code

```
c = anderson_darling_test(randn(12,10000),'normal');
tabulate(100*c,100*[unique(c),1]);
% The Fc column should report 100, 250, 500, 1000, 10000 more or less.
```

Produces the following output

```
bin      Fa       Fr%       Fc
  1      79     0.79%       79
2.5     158     1.58%      237
  5     236     2.36%      473
 10     557     5.57%     1030
100    8970    89.70%    10000
```

## Demonstration 3

The following code

```
c = anderson_darling_test(rand(12,10000),'uniform');
hist(100*c,1:2:99);
% The histogram should be flat more or less.
```

Produces the following figure

Figure 1

Package: statistics