Pushing the AR test beyond the limit

.title[
# Pushing the AR test beyond the limit
]
.subtitle[
## A novel goodness-of-fit test for high-dimensional applications
]
.author[
### Markku Kuismin
]
.institute[
### University of Oulu, Northern Finland Birth Cohorts
]
.date[
### 2026-06-05
]

---

---

# Very short author biography

---

<div class="leaflet html-widget html-fill-item" id="htmlwidget-6445a9dd62fc9ee27953" style="width:100%;height:360px;"></div>
<script type="application/json" data-for="htmlwidget-6445a9dd62fc9ee27953">{"x":{"options":{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}}},"calls":[{"method":"addTiles","args":["https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png",null,null,{"minZoom":0,"maxZoom":18,"tileSize":256,"subdomains":"abc","errorTileUrl":"","tms":false,"noWrap":false,"zoomOffset":0,"zoomReverse":false,"opacity":1,"zIndex":1,"detectRetina":false,"attribution":"© <a href=\"https://openstreetmap.org/copyright/\">OpenStreetMap<\/a>,  <a href=\"https://opendatacommons.org/licenses/odbl/\">ODbL<\/a>"}]},{"method":"addMarkers","args":[[65.00749999999999,60.1696],[25.5242,24.9495],null,null,null,{"interactive":true,"draggable":false,"keyboard":true,"title":"","alt":"","zIndexOffset":0,"opacity":1,"riseOnHover":false,"riseOffset":250},null,null,null,null,["My workplace","Nordstat 2026, University of Helsinki"],{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]}],"setView":[[62.58855,25.23685],4.5,[]],"limits":{"lat":[60.1696,65.00749999999999],"lng":[24.9495,25.5242]}},"evals":[],"jsHooks":[]}</script>

- I work as a biostatistician at the University of Oulu School of Medicine.

- My research interests are high-dimensional statistic, network estimation, non-parametric methods, (multivariate) statistical tests, cluster analysis, and **rejection sampling** just to mention a few examples.

- Here I will introduce a novel multivariate test based on rejection sampling that works in high-dimensional case.

---

# Background and motivation

---

# Background and motivation

- All parametric statistical methods are built on assumptions about the population distribution of the random variable.

- In multivariate applications dealing high-dimensional data assume the population distribution is multivariate normal.

- For example,

- Gaussian Graphical Models. If not normal `->` strict conditional-independence interpretation becomes weaker.

- High-dimensional linear discriminant analysis. If not normal `->` other methods such as RF or SVMs may be more robust alternatives.

- Gaussian Mixture Models. If not normal `->` incorrectly merged clusters

- In general: It is difficult to analyse data meaningfully without understanding the distribution from which they arise.

---

# Background and motivation

For the rest of this presentation, the null hypothesis can be defined as,

`$$H_0: f = f_0$$`

- For example, `$H_0: f\text{ is a multivariate normal density.}$`

- Or a plain-language description: "*The underlying population distribution is multivariate normal.*"

- Several test have been proposed (e.g., Kolmogorov–Smirnov, energy test by Székely and Rizzo (2005), a high-dimensional test by Chen, H & Xia (2023) and many others)

- Until we have a test that can detect the difference in 100% power in every single scenario, I consider this as an open research problem...

---

# The AR statistic

---

# The AR statistic

- Based on the Accept-Reject (AR) algorithm.

`$$\rho = \frac{1}{n}\sum_{i=1}^n \text{I}\Bigl(\frac{f_0(X_i)}{Df(X_i)} > U\Bigr).$$`

- `$\rho$` is the proportion of observations `$X_i$`, among `$n$` observations drawn from a distribution with density `$f$`, that would be accepted as samples from the distribution with density `$f_0$`.

- Note: The fundamental objective of AR algorithms is to maximize the effective sample yield while minimizing sensitivity to a poor choice of the proposal distribution.

- My objective is the complete opposite: I want that the algorithm is sensitive to a poor choice of the proposal distribution that is used to evaluate the null hypothesis.

---

# The AR statistic

- The algorithm was adapted in Kuismin (2026) for statistical inference.

- Main idea:

- Let `$f$` be the true density (population density) and `$f_0$` is the hypothesized density.

- Compute `$\rho$` under null: Assume that `$f_0$` and `$f$` are the same distribution (set the normalizing constant `$D = 1$`).

- Instead of generating pseudo random numbers from `$f$`, compute `$\rho$` with respect to the observed data.

---

# The AR statistic

- A similar test statistic has been proposed previously, without establishing a connection to rejection sampling:

- Györfi, L., & Van Der Meulen, E. C. (1991) investigated a test statistic based on **Total variation distance** and showed how to build a procedure that is consistent against a large class of alternatives.

- (I do not discuss about this test in detail and I am mentioning it to connect my test procedure in the earlier literature.)

- We will later talk a little more about total variation distance, which is the common denominator underlying both methods.

---

# The AR statistic

- Technical challenges:

- `$f$` is unknown (naturally) and we only have observations from it. How to determine `$f$`?

- Solution: Estimate `$f$`, e.g., using KDE(...)

- `$\rho$` depends on an external random variable `$U$`. How to determine a deterministic test statistic?

- Solution: Use `$E_U(\rho)$` instead.

- Denote `$\rho(\textbf{X}) = E_U(\rho) = n^{-1}\sum_{1=1}^n \min\left(1, f_0(X_i)/\widehat{f}(X_i)\right)$`.

---

# The AR statistic

- The statistic has a handful of attractive properties.

- **Direct probabilistic interpretation:**

- The AR statistic can be interpreted as an estimated acceptance probability.

- **It measures how often observations would be accepted under the hypothesized population distribution.**

- Values close to 1 indicate good agreement with the hypothesized distribution.

- Values close to 0 indicate lack of fit and evidence against the model.

- **Interval estimates for `$\rho(\textbf{X})$` can be computed using simulations or Poisson-Binomial distribution.**

- **It is a global measure of distributional agreement.**

- It can detect differences in location, covariance, tail behavior, skewness, multimodality, and other shape features.

---

# The AR statistic

- The statistic depends on a likelihood ratio `$f_0(X_i)/f(X_i)$`.

- (Isn't it just an empirical likelihood test...)

- Yes, but the random threshold step that makes a big difference:

- Likelihood ratio and empirical likelihood tests asymptotic behavior is governed by the Kullback–Leibler divergence.

- The AR statistic on the other hand is connected to *total variation distance*.

---

# Total variation distance

The total variation distance (TVD) between `$f(x)$` and `$f_0(x)$` is half the `$L_1$`-distance between densities or mass functions.

`$$\|f(x) - f_0(x)\|_{\text{TV}} = \frac{1}{2}\int_{\mathcal{X}}|f(x) - f_0(x)|\, dx.$$`

![](nordstat_2026_presentation_files/figure-html/unnamed-chunk-3-1.png)

---

# Total variation distance

In my testing framework, the similarity between the true density `$f$` and the hypothesized density `$f_0$` can be interpreted in terms of TVD.

Asymptotically, as `$n \to \infty$`,

`$$\rho(\textbf{X}) \xrightarrow{P} 1 - \|f(x) - f_0(x)\|_{\text{TV}},$$`
where `$f$` is the true density (population density) and `$f_0$` is the hypothesized density.

- See Kuismin (2026) for more detailed description and proof.

---

# Total variation distance

![](nordstat_2026_presentation_files/figure-html/unnamed-chunk-4-1.png)

- Population dist. is mv. logistic, `$p=5$`, location vector `$\mathbf{0}$`, and variance matrix `$\mathbf{I} = diag(1, \ldots, 1)_{5 \times 5}$`.

- The black horizontal line is 1 - TVD between true pop. density and mvuniform densities, approximated using Monte Carlo integration. Dashed horizontal lines illustrate the 95% interval estimates.

---

# Total variation distance

![](nordstat_2026_presentation_files/figure-html/unnamed-chunk-5-1.png)

- Population dist. is multivariate t, `$p=5$`, location vector `$\mathbf{0}$`, and variance/scale matrix `$\mathbf{I} = diag(1, \ldots, 1)_{5 \times 5}$`.

---

# Total variation distance

![](nordstat_2026_presentation_files/figure-html/unnamed-chunk-6-1.png)

- Population dist. is multivariate normal, `$p=5$`, location vector `$\mathbf{0}$`, and variance matrix `$\mathbf{I} = diag(1, \ldots, 1)_{5 \times 5}$`.

---

# Distribution of the test statistic

---

# Distribution of the test statistic

- The observed test statistic follows Poisson-Binomial distribution (see Kuismin, 2026 for more details).

- Compare the distribution of the test statistic when it is computed:

1. By running the AR algorithm multiple times (grey bars).
  
  2. Using Poisson-Binomial distribution (blue bars).

- (These distributions will overlap in the barplots presented in the next slide.)

---

# Distribution of the test statistic

- Example when the population distribution is mvuniform, `$p = 10$`.

- The black vertical line is 1 - TVD between mvnormal and mvuniform densities, approximated using Monte Carlo integration.

---

# Challenges

- The statistic depends on a density estimate `$f$`.

`->` Estimating `$f$` is more or less hopeless in high-dimensional setting.

- However, let's test how well `$\rho(\textbf{X})$` works when making decision whether to reject `$H_0$` or not.

---

# Going beyond the limit

---

# Going beyond the limit

- Although density estimation becomes unfeasible when `$p \gg n$` and `$p$` increases, can we still get something useful?

- How does the AR statistic hold up?

- For comparison, the energy test statistic cannot be computed when `$p > n$`,

``` r
n = 40
p = 50
mu = rep(0, p)
I = diag(1, p)
x = mvtnorm::rmvnorm(n = n,
                     mean = mu,
                     sigma = I)
energy::mvnorm.test(x, R = 500)
```

```
## 
## 	Energy test of multivariate normality: estimated parameters
## 
## data:  x, sample size 40, dimension 50, replicates 500
## E-statistic = NA, p-value = NA
```

---

# Going beyond the limit

## Interval estimates

---

## TVD - multivariate logistic

- When there are "enough" samples and the population distribution is not too "challenging", the test works reasonably well.

![](nordstat_2026_presentation_files/figure-html/unnamed-chunk-9-1.png)

---

## Interval estimates - multivariate logistic

![](nordstat_2026_presentation_files/figure-html/unnamed-chunk-10-1.png)

---

## TVD - multivariate logistic

- When `$p \gg n$`, interpretation of point and interval estimates becomes weaker/unreliable.

![](nordstat_2026_presentation_files/figure-html/unnamed-chunk-11-1.png)

---

## Interval estimates - multivariate logistic

- However, the test statistic still seems to be able to distinguish the population density from the hypothesized `$N_p\!\left(\boldsymbol{0}, \boldsymbol{I}\right)$` density.

![](nordstat_2026_presentation_files/figure-html/unnamed-chunk-12-1.png)

---

## TVD - multivariate normal

![](nordstat_2026_presentation_files/figure-html/unnamed-chunk-13-1.png)

---

## Interval estimates - multivariate normal

![](nordstat_2026_presentation_files/figure-html/unnamed-chunk-14-1.png)

---

# Going beyond the limit

## Power simulations

---

## Power simulations

`$$H_0:
\mathbf{X}_1, \ldots, \mathbf{X}_n
\overset{\mathrm{i.i.d.}}{\sim}
N_p\!\left(\boldsymbol{0}, \boldsymbol{I}\right).$$`

- Consider five different multivariate population distributions:

- Multivariate t, `$MVT(\mathbf{0}, \mathbf{I}, 5)$`.
  - Multivariate normal, `$MVN(\mathbf{0}, \mathbf{I})$`.
  - Multivariate logistic, `$MVLOGIS(\mathbf{0}, \mathbf{1})$`.
  - Multivariate Burr, `$MVBURR(1, \mathbf{1}, \mathbf{1})$`.
  - Multivariate uniform, `$MVUNIF(1)$`.

- `$n = 50$`, `$p \in \{40, 50, 70\}$`.

- Estimate the power using `$p-$`values and Monte Carlo simulations.

---

## Power simulations

- Compare the AR test with three other tests for multivariate normality:

- Energy test (Székely & Rizzo, 2005).
  - NEW test (Chen & Xia, 2023).
  - Zhou-Shao's test ("Tn test") (Zhou & Shao, 2014).

- Power evaluation: estimate how often the test rejects `$H_0$` under different alternatives at a pre-specified Type I error rate `$\alpha = 0.05$`.

---

## Power simulations

<div class="datatables html-widget html-fill-item" id="htmlwidget-73d93a407f54b716c936" style="width:100%;height:auto;"></div>
<script type="application/json" data-for="htmlwidget-73d93a407f54b716c936">{"x":{"filter":"none","vertical":false,"fillContainer":false,"data":[["1","2","3","4","6","7","8","9","11","12","13","14","16","17","18","19","21","22","23","24","26","27","28","29","31","32","33","34","36","37","38","39","41","42","43","44","46","47","48","49","51","52","53","54","56","57","58","59","61","62","63","64","66","67","68","69","71","72","73","74"],["AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn","AR","NEW","E","Tn"],["1","0.91","0.99","0.99","1","0.94","0.05","1","1","1",null,"1","0.05","0.06","0.05","0.2","0.03","0.05","0.05","1","0.02","0.06",null,"1","1","0.09","0.24","0.49","1","0.05","0.04","1","1","0.06",null,"1","1","0.57","1","1","1","0.5","0.26","1","1","0.41",null,"1","1","0.12","0.64","0.8","1","0.05","0.05","1","1","0.01",null,"1"],["MVT(0, I, 5)","MVT(0, I, 5)","MVT(0, I, 5)","MVT(0, I, 5)","MVT(0, I, 5)","MVT(0, I, 5)","MVT(0, I, 5)","MVT(0, I, 5)","MVT(0, I, 5)","MVT(0, I, 5)","MVT(0, I, 5)","MVT(0, I, 5)","MVN(0, I)","MVN(0, I)","MVN(0, I)","MVN(0, I)","MVN(0, I)","MVN(0, I)","MVN(0, I)","MVN(0, I)","MVN(0, I)","MVN(0, I)","MVN(0, I)","MVN(0, I)","MVLOGIS(0, 1)","MVLOGIS(0, 1)","MVLOGIS(0, 1)","MVLOGIS(0, 1)","MVLOGIS(0, 1)","MVLOGIS(0, 1)","MVLOGIS(0, 1)","MVLOGIS(0, 1)","MVLOGIS(0, 1)","MVLOGIS(0, 1)","MVLOGIS(0, 1)","MVLOGIS(0, 1)","MVBURR(1, 1, 1)","MVBURR(1, 1, 1)","MVBURR(1, 1, 1)","MVBURR(1, 1, 1)","MVBURR(1, 1, 1)","MVBURR(1, 1, 1)","MVBURR(1, 1, 1)","MVBURR(1, 1, 1)","MVBURR(1, 1, 1)","MVBURR(1, 1, 1)","MVBURR(1, 1, 1)","MVBURR(1, 1, 1)","MVUNIF(1)","MVUNIF(1)","MVUNIF(1)","MVUNIF(1)","MVUNIF(1)","MVUNIF(1)","MVUNIF(1)","MVUNIF(1)","MVUNIF(1)","MVUNIF(1)","MVUNIF(1)","MVUNIF(1)"],["50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50","50"],["40","40","40","40","50","50","50","50","70","70","70","70","40","40","40","40","50","50","50","50","70","70","70","70","40","40","40","40","50","50","50","50","70","70","70","70","40","40","40","40","50","50","50","50","70","70","70","70","40","40","40","40","50","50","50","50","70","70","70","70"]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>Method<\/th>\n      <th>est. Power<\/th>\n      <th>Distribution<\/th>\n      <th>n<\/th>\n      <th>p<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"pageLength":4,"columnDefs":[{"orderable":false,"targets":0},{"name":" ","targets":0},{"name":"Method","targets":1},{"name":"est. Power","targets":2},{"name":"Distribution","targets":3},{"name":"n","targets":4},{"name":"p","targets":5}],"order":[],"autoWidth":false,"orderClasses":false,"lengthMenu":[4,10,25,50,100]}},"evals":[],"jsHooks":[]}</script>

---

# References

Chen, H. & Xia, Y. (2023). A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach, Journal of the American Statistical Association,
118, 719-731, https://doi.org/10.1080/01621459.2021.1953507

Györfi, L. & Van Der Meulen, E. C. (1991). A consistent goodness of fit test based on the total variation distance. In Nonparametric Functional Estimation and Related Topics (pp. 631-645). Dordrecht: Springer Netherlands.

Kuismin, M. (2025). Using the rejection sampling for finding tests. arXiv preprint arXiv:2509.10325, 
https://doi.org/10.48550/arXiv.2509.10325

Székely, G. J. & Rizzo, M. L. (2005). A new test for multivariate normality. Journal of Multivariate Analysis 93, 58-80, https://doi.org/10.1016/j.jmva.2003.12.002

Zhou, M. & Shao, Y. (2014). A powerful test for multivariate normality. Journal of applied statistics, 41, 351-363, https://doi.org/10.1080/02664763.2013.839637

---

[<svg aria-hidden="true" role="img" viewBox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M579.8 267.7c56.5-56.5 56.5-148 0-204.5c-50-50-128.8-56.5-186.3-15.4l-1.6 1.1c-14.4 10.3-17.7 30.3-7.4 44.6s30.3 17.7 44.6 7.4l1.6-1.1c32.1-22.9 76-19.3 103.8 8.6c31.5 31.5 31.5 82.5 0 114L422.3 334.8c-31.5 31.5-82.5 31.5-114 0c-27.9-27.9-31.5-71.8-8.6-103.8l1.1-1.6c10.3-14.4 6.9-34.4-7.4-44.6s-34.4-6.9-44.6 7.4l-1.1 1.6C206.5 251.2 213 330 263 380c56.5 56.5 148 56.5 204.5 0L579.8 267.7zM60.2 244.3c-56.5 56.5-56.5 148 0 204.5c50 50 128.8 56.5 186.3 15.4l1.6-1.1c14.4-10.3 17.7-30.3 7.4-44.6s-30.3-17.7-44.6-7.4l-1.6 1.1c-32.1 22.9-76 19.3-103.8-8.6C74 372 74 321 105.5 289.5L217.7 177.2c31.5-31.5 82.5-31.5 114 0c27.9 27.9 31.5 71.8 8.6 103.9l-1.1 1.6c-10.3 14.4-6.9 34.4 7.4 44.6s34.4 6.9 44.6-7.4l1.1-1.6C433.5 260.8 427 182 377 132c-56.5-56.5-148-56.5-204.5 0L60.2 244.3z"/></svg> github.io](https://markkukuismin.github.io/)<br/>
[<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M16.1 260.2c-22.6 12.9-20.5 47.3 3.6 57.3L160 376V479.3c0 18.1 14.6 32.7 32.7 32.7c9.7 0 18.9-4.3 25.1-11.8l62-74.3 123.9 51.6c18.9 7.9 40.8-4.5 43.9-24.7l64-416c1.9-12.1-3.4-24.3-13.5-31.2s-23.3-7.5-34-1.4l-448 256zm52.1 25.5L409.7 90.6 190.1 336l1.2 1L68.2 285.7zM403.3 425.4L236.7 355.9 450.8 116.6 403.3 425.4z"/></svg> markku.kuismin@oulu.fi](markku.kuismin@oulu.fi)<br>
]]