# Description

Regression problems occur in many metrological applications, e.g. in everyday calibration tasks (as illustrated in Annex H.3 of the GUM), in the evaluation of interlaboratory comparisons, in the characterization of sensors [Matthews et al., 2014], in the determination of fundamental constants [Bodnar et al., 2014], in interpolation or prediction tasks [Wübbeler et al., 2012], and in many more. Such problems arise when the quantity of interest cannot be measured directly but has to be inferred from measurement data (and their uncertainties) using a mathematical model that relates the quantity of interest to the data. For example, a regression may serve to establish the functional relationship between variables.

#### Definition and Examples

Regression problems often take the form

$$
y_i = f_{\boldsymbol{\theta}}(x_i) + \varepsilon_i , \quad i=1, \ldots, n \,,
$$

where the measurements $\boldsymbol{y}=(y_1, \ldots, y_n)^\top$ are explained by a function $f_{\boldsymbol{\theta}}$ evaluated at values $\boldsymbol{x}=(x_1, \ldots, x_n)^\top$ and depending on unknown parameters $\boldsymbol{\theta}=(\theta_1, \ldots, \theta_p)^\top$. The measurement errors $\boldsymbol{\varepsilon}=(\varepsilon_1, \ldots, \varepsilon_n)^\top$ follow a specified distribution $p(\boldsymbol{\varepsilon} \mid \boldsymbol{\theta}, \boldsymbol{\sigma})$, which may involve further unknown parameters $\boldsymbol{\sigma}$.

Regressions may be used, for instance, to describe the relationship between a traceable, highly accurate reference device with values denoted by $x$ and a device to be calibrated with values denoted by $y$. The pairs $(x_i,y_i)$ then denote simultaneous measurements of the same measurand, such as temperature, made by the two devices.

A simple example is the Normal straight line regression model (as illustrated in Figure 1):

$$
y_i = \theta_1 + \theta_2 x_i + \varepsilon_i , \quad \varepsilon_i \stackrel{iid}{\sim} \text{N}(0, \sigma^2), \quad i=1, \ldots, n \,. \tag{1}
$$

The basic goal of regression tasks is to estimate the unknown parameters $\boldsymbol{\theta}$ of the regression function and possibly also the unknown parameters $\boldsymbol{\sigma}$ of the error distribution. The estimated regression model may then be used to evaluate the shape of the regression function, to predict responses at intermediate (interpolated) or extrapolated $x$-values, or to invert the regression function in order to infer $x$-values from new measurements.
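As a minimal sketch of these tasks for the straight-line model (1), the following Python code computes the ordinary least-squares estimates of $\theta_1$ and $\theta_2$, the usual residual estimate of $\sigma$, a prediction at a new $x$-value, and the inverse prediction of $x$ from a new measurement $y$. It is a purely classical illustration without any uncertainty evaluation; the function names and the example data are hypothetical.

```python
import math


def fit_straight_line(x, y):
    """Least-squares estimates for model (1): y_i = theta1 + theta2 * x_i + eps_i."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 (x, y) pairs of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    theta2 = sxy / sxx                # slope
    theta1 = y_bar - theta2 * x_bar   # intercept
    # Residual standard deviation with n - 2 degrees of freedom
    rss = sum((yi - theta1 - theta2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(rss / (n - 2))
    return theta1, theta2, sigma


def predict_y(theta1, theta2, x_new):
    """Evaluate the fitted line (interpolation or extrapolation)."""
    return theta1 + theta2 * x_new


def invert_line(theta1, theta2, y_new):
    """Inverse prediction: the x-value at which the fitted line equals y_new."""
    if theta2 == 0:
        raise ValueError("a line with zero slope cannot be inverted")
    return (y_new - theta1) / theta2


# Hypothetical calibration data: reference values x, device readings y
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.1, 4.9, 7.2, 8.8]
t1, t2, s = fit_straight_line(x, y)
print(f"theta1 = {t1:.3f}, theta2 = {t2:.3f}, sigma = {s:.3f}")
print(f"x for a new reading y = 6.0: {invert_line(t1, t2, 6.0):.3f}")
```

The inverse step simply solves the fitted line for $x$; a proper uncertainty statement for that $x$ needs one of the approaches discussed below.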

# Research

Decisions based on regression analyses require a reliable evaluation of measurement uncertainty. However, the current state of the art in uncertainty evaluation in metrology (i.e. the GUM and its supplements) provides little guidance for regression. One reason is that the GUM guidelines are based on a measurement function that relates the quantity of interest (the measurand) to the input quantities, whereas regression models generally cannot be formulated uniquely as such a measurement function. Annex H.3 of the GUM nevertheless illustrates one way of analyzing a regression problem. This analysis, however, mixes elements of classical (least-squares) and Bayesian statistics, so that its results are not derived from state-of-knowledge distributions and usually differ from those of a purely classical or a purely Bayesian approach, as shown in [Elster et al., 2011].

Consequently, there is a need for guidance and research on uncertainty evaluation for regression problems in metrology. The Joint Committee for Guides in Metrology (JCGM) has recognized this need. PTB Working Group 8.42 led the development of guidance for the Bayesian inference of regression problems within the EMRP project NEW04, which is summarized in a Guide [Elster et al., 2015]. This Guide also contains template solutions for specific regression problems with known values $\boldsymbol{x}$ and is available free of charge at the NEW04 project web page. For regression problems with Gaussian measurement errors and linear regression functions (such as in formula (1)), [Klauenberg et al., 2015_2] provide guidance for Bayesian inference that avoids extensive numerical calculations such as Markov chain Monte Carlo methods.

In addition, PTB Working Group 8.42 carries out research emerging from metrological applications involving regression. For example,

- for the analysis of magnetic field fluctuation thermometry, [Wübbeler et al., 2012] propose and validate a Bayesian approach, and [Wübbeler et al., 2013] a simplified approach, for performing interpolations or predictions based on regression results,
- for the determination of fundamental constants, [Bodnar et al., 2014] provide an objective Bayesian inference and compare it to the Birge ratio method,
- for the analysis of immunological tests called ELISA, [Klauenberg et al., 2015] have developed informative prior distributions which are widely applicable,
- for the calibration of flow meters, [Kok et al., 2015] provide a Bayesian analysis which accounts for constraints on the values of the regression curve.
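The Birge ratio method mentioned above is a classical way of handling results that are more dispersed than their stated uncertainties suggest. As a minimal sketch (not the objective Bayesian analysis of [Bodnar et al., 2014]), the following Python code computes the weighted mean $\bar{x}_w$ of $n$ results $x_i$ with standard uncertainties $u_i$, the Birge ratio $R_B = \sqrt{\sum_i (x_i-\bar{x}_w)^2/u_i^2 \,/\,(n-1)}$, and the common adjustment that scales the uncertainty of the weighted mean by $R_B$ when $R_B > 1$. The function name and the example values are hypothetical.

```python
import math


def birge_ratio_mean(values, uncertainties):
    """Weighted mean, Birge ratio and Birge-adjusted standard uncertainty."""
    n = len(values)
    if n < 2 or n != len(uncertainties):
        raise ValueError("need at least two results with matching uncertainties")
    weights = [1.0 / u ** 2 for u in uncertainties]
    w_sum = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / w_sum
    # Weighted sum of squared deviations (chi-squared statistic)
    chi2 = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    r_b = math.sqrt(chi2 / (n - 1))
    u_mean = 1.0 / math.sqrt(w_sum)
    # Inflate the uncertainty only if the data are more dispersed than expected
    u_adjusted = u_mean * max(1.0, r_b)
    return mean, r_b, u_adjusted


# Hypothetical results of three laboratories (arbitrary units)
print(birge_ratio_mean([10.2, 9.6, 10.9], [0.2, 0.3, 0.25]))
```

The weighted mean is the simplest regression model, with an intercept only; the Birge ratio merely rescales its uncertainty and does not change the estimate itself.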

# Software

In order to facilitate the application of the methods developed in the working group, the following software implementations are made available free of charge.

**MCMC implementation for the analysis of magnetic field fluctuation thermometry**

Bayesian approaches to regression often require numerical methods such as Markov chain Monte Carlo (MCMC) sampling. For the analysis of magnetic field fluctuation thermometry, PTB Working Group 8.42 has developed a MATLAB software package that performs MCMC sampling from the posterior distribution of the calibration parameters and subsequently estimates temperatures.

This software is available in the electronic supplement to the related publication.

Related publication:

- G. Wübbeler, F. Schmähling, J. Beyer, J. Engert, and C. Elster (2012). *Analysis of magnetic field fluctuation thermometry using Bayesian inference.* **Meas. Sci. Technol.** 23, 125004 (9pp), [DOI: 10.1088/0957-0233/23/12/125004].
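The two-stage idea described above (MCMC sampling of calibration parameters, then estimation of a new value) can be illustrated with the following Python sketch. It assumes the straight-line model (1) and the noninformative prior $p(\theta_1, \theta_2, \sigma^2) \propto 1/\sigma^2$, which is not the model used in magnetic field fluctuation thermometry, and it uses a Gibbs sampler for brevity, which is not necessarily the MCMC method of the PTB package. The posterior samples are then propagated through the inverted line to estimate the reference value $x$ for a new reading; this plug-in step is a simplification, not a full Bayesian treatment of the unknown $x$. All names and data are hypothetical.

```python
import math
import random
import statistics


def gibbs_straight_line(x, y, n_samples=5000, burn_in=500, seed=1):
    """Gibbs sampler for model (1) under the prior p(theta1, theta2, sigma^2) ~ 1/sigma^2."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 (x, y) pairs of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    theta2_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    rng = random.Random(seed)
    sigma2 = 1.0  # arbitrary starting value
    samples = []
    for k in range(burn_in + n_samples):
        # theta | sigma^2, y is bivariate Normal: draw the slope first,
        # then the intercept conditionally on the slope
        theta2 = rng.gauss(theta2_hat, math.sqrt(sigma2 / sxx))
        theta1 = rng.gauss(y_bar - theta2 * x_bar, math.sqrt(sigma2 / n))
        # sigma^2 | theta, y is inverse-gamma with shape n/2 and scale S/2
        s = sum((yi - theta1 - theta2 * xi) ** 2 for xi, yi in zip(x, y))
        sigma2 = s / (2.0 * rng.gammavariate(n / 2.0, 1.0))
        if k >= burn_in:
            samples.append((theta1, theta2, sigma2))
    return samples


def estimate_x(samples, y_new, seed=2):
    """Propagate posterior samples through the inverted line for a new reading y_new."""
    rng = random.Random(seed)
    xs = []
    for theta1, theta2, sigma2 in samples:
        eps = rng.gauss(0.0, math.sqrt(sigma2))  # noise of the new reading
        xs.append((y_new - eps - theta1) / theta2)
    return statistics.mean(xs), statistics.stdev(xs)


# Hypothetical calibration data
x = [float(i) for i in range(10)]
y = [2.1, 4.9, 8.05, 11.0, 13.9, 17.1, 20.0, 22.95, 26.1, 28.9]
draws = gibbs_straight_line(x, y)
print(estimate_x(draws, 17.0))
```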

**WinBUGS software for the analysis of immunoassay data**

The Bayesian approach enables the inclusion of additional prior knowledge in regression problems, but often requires numerical methods such as Markov chain Monte Carlo (MCMC) sampling. For the analysis of immunoassay data, PTB Working Group 8.42 has developed WinBUGS software code to perform MCMC sampling from the posterior distribution of the calibration parameters and the unknown concentration.

This software is available in *A Guide to Bayesian Inference for Regression Problems*.

Related publications:

- K. Klauenberg, M. Walzel, B. Ebert, and C. Elster (2015). *Informative prior distributions for ELISA analyses.* **Biostatistics** 16, 454–464, [DOI: 10.1093/biostatistics/kxu057].
- C. Elster, K. Klauenberg, M. Walzel, G. Wübbeler, P. Harris, M. Cox, C. Matthews, I. Smith, L. Wright, A. Allard, N. Fischer, S. Cowen, S. Ellison, P. Wilson, F. Pennecchi, G. Kok, A. van der Veen, and L. Pendrill (2015). *A Guide to Bayesian Inference for Regression Problems.* Deliverable of EMRP project NEW04 “Novel mathematical and statistical approaches to uncertainty evaluation”, [download (pdf)].
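Immunoassay calibration curves are commonly modelled by a four-parameter logistic (4PL) function; that form is assumed here purely for illustration and is not taken from the WinBUGS code. Once posterior samples of the curve parameters are available, for example from MCMC, a simple plug-in estimate of the unknown concentration can be obtained by inverting the curve for each sample. The following Python sketch shows the 4PL function in the assumed parameterization $y = d + (a-d)/(1+(x/c)^b)$ and its inverse; the function names and parameter values are hypothetical.

```python
def four_pl(x, a, b, c, d):
    """4PL response at concentration x >= 0: equals a at x = 0 and tends to d as x grows (b > 0)."""
    return d + (a - d) / (1.0 + (x / c) ** b)


def four_pl_inverse(y, a, b, c, d):
    """Concentration at which the 4PL curve equals the response y."""
    if not min(a, d) < y < max(a, d):
        raise ValueError("response lies outside the range of the calibration curve")
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)


# Hypothetical parameters: a (zero-dose response), b (slope), c (midpoint), d (upper asymptote)
params = dict(a=0.05, b=1.2, c=10.0, d=2.0)
response = four_pl(7.5, **params)
print(response, four_pl_inverse(response, **params))
```

Responses outside the open interval between $a$ and $d$ have no finite inverse, which is why the sketch raises an error for them.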

**Rejection sampling for the flow meter calibration problem**

Bayesian approaches to Normal linear regression problems yield analytical solutions under certain circumstances. Nevertheless, accounting for constraints on the values of the regression curve when calibrating flow meters requires a Monte Carlo procedure combined with an accept/reject algorithm to obtain samples from the posterior distribution.

MATLAB source code implementing this algorithm is available in *A Guide to Bayesian Inference for Regression Problems*.

Related publications:

- G. J. P. Kok, A. M. H. van der Veen, P. M. Harris, I. M. Smith, and C. Elster (2015). *Bayesian analysis of a flow meter calibration problem.* **Metrologia** 52, 392–399, [DOI: 10.1088/0026-1394/52/2/392].
- C. Elster, K. Klauenberg, M. Walzel, G. Wübbeler, P. Harris, M. Cox, C. Matthews, I. Smith, L. Wright, A. Allard, N. Fischer, S. Cowen, S. Ellison, P. Wilson, F. Pennecchi, G. Kok, A. van der Veen, and L. Pendrill (2015). *A Guide to Bayesian Inference for Regression Problems.* Deliverable of EMRP project NEW04 “Novel mathematical and statistical approaches to uncertainty evaluation”, [download (pdf)].
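The accept/reject idea can be sketched in Python for the straight-line model (1), assuming the noninformative prior $p(\theta_1, \theta_2, \sigma^2) \propto 1/\sigma^2$: samples are drawn directly from the unconstrained analytical posterior, and only those whose regression line satisfies the constraint are kept, so that the accepted samples follow the posterior truncated to the constraint region. The constraint used here, that the line stays within hypothetical bounds `lower` and `upper` over the calibration range, is only an illustration; the actual constraints of the flow meter problem are described in [Kok et al., 2015]. The function name and data are also hypothetical.

```python
import math
import random


def constrained_posterior(x, y, lower, upper, n_accept=2000, seed=0, max_draws=10**6):
    """Accept/reject sampling for model (1) with the regression line restricted to [lower, upper]."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 (x, y) pairs of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    theta2_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    theta1_hat = y_bar - theta2_hat * x_bar
    rss = sum((yi - theta1_hat - theta2_hat * xi) ** 2 for xi, yi in zip(x, y))
    x_lo, x_hi = min(x), max(x)
    rng = random.Random(seed)
    accepted, draws = [], 0
    while len(accepted) < n_accept:
        draws += 1
        if draws > max_draws:
            raise RuntimeError("acceptance rate too low")
        # Exact draw from the unconstrained posterior: sigma^2 first, then theta
        sigma2 = rss / (2.0 * rng.gammavariate((n - 2) / 2.0, 1.0))
        theta2 = rng.gauss(theta2_hat, math.sqrt(sigma2 / sxx))
        theta1 = rng.gauss(y_bar - theta2 * x_bar, math.sqrt(sigma2 / n))
        # A straight line is monotone, so checking the end points of the range suffices
        ends = (theta1 + theta2 * x_lo, theta1 + theta2 * x_hi)
        if all(lower <= v <= upper for v in ends):
            accepted.append((theta1, theta2, sigma2))
    return accepted, len(accepted) / draws


# Hypothetical calibration errors of a flow meter at five flow rates
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.02, -0.01, 0.03, 0.0, -0.02]
samples, rate = constrained_posterior(x, y, lower=-0.03, upper=0.03)
print(f"acceptance rate: {rate:.2f}")
```

The acceptance rate indicates how strongly the constraint cuts into the unconstrained posterior; if it becomes very small, plain rejection sampling becomes inefficient.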

**Software for Bayesian Normal linear regression**

Under certain circumstances, Bayesian approaches to Normal linear regression problems yield analytical solutions. In connection with a tutorial, PTB Working Group 8.42 provides software that calculates the posterior distributions of all regression parameters, of the regression curve and of predictions, together with estimates, uncertainties and credible intervals, and that also represents these quantities graphically.

MATLAB and R source code for these calculations is available at

Related publications:

- K. Klauenberg, G. Wübbeler, B. Mickan, P. M. Harris, and C. Elster (2015). *A Tutorial on Bayesian Normal Linear Regression.* **Metrologia** 52, 878–892, [DOI: 10.1088/0026-1394/52/6/878].
- C. Elster, K. Klauenberg, M. Walzel, G. Wübbeler, P. Harris, M. Cox, C. Matthews, I. Smith, L. Wright, A. Allard, N. Fischer, S. Cowen, S. Ellison, P. Wilson, F. Pennecchi, G. Kok, A. van der Veen, and L. Pendrill (2015). *A Guide to Bayesian Inference for Regression Problems.* Deliverable of EMRP project NEW04 “Novel mathematical and statistical approaches to uncertainty evaluation”, [download (pdf)].
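To show what such an analytical solution can look like, the following Python sketch assumes the straight-line model (1) and the noninformative prior $p(\theta_1, \theta_2, \sigma^2) \propto 1/\sigma^2$; informative priors, as allowed in the tutorial, lead to different formulas. Under this assumption the marginal posteriors of $\theta_1$, $\theta_2$, of the regression curve at a point $x_0$, and of a new observation at $x_0$ are scaled and shifted Student $t$ distributions with $\nu = n-2$ degrees of freedom, centred on the least-squares values. For each quantity the sketch returns the posterior mean, the posterior standard deviation ($\sqrt{\nu/(\nu-2)}$ times the $t$ scale) and the $t$ scale itself; a 95 % credible interval is the mean ± $t_{\nu,0.975}$ × scale, where the quantile can be obtained, for example, from `scipy.stats.t.ppf(0.975, nu)`. Function and key names are hypothetical.

```python
import math


def bayes_straight_line(x, y, x0):
    """Posterior summaries for model (1) under the prior p(theta1, theta2, sigma^2) ~ 1/sigma^2."""
    n = len(x)
    nu = n - 2  # degrees of freedom of the Student t posteriors
    if nu <= 2 or n != len(y):
        raise ValueError("need n > 4 paired observations for finite posterior variances")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    theta2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    theta1 = y_bar - theta2 * x_bar
    s = math.sqrt(sum((yi - theta1 - theta2 * xi) ** 2 for xi, yi in zip(x, y)) / nu)
    sd_factor = math.sqrt(nu / (nu - 2))  # std. deviation of t_nu relative to its scale

    def summary(location, scale):
        return location, sd_factor * scale, scale

    curve = theta1 + theta2 * x0
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    return {
        "theta1": summary(theta1, s * math.sqrt(1.0 / n + x_bar ** 2 / sxx)),
        "theta2": summary(theta2, s / math.sqrt(sxx)),
        "curve": summary(curve, s * math.sqrt(leverage)),             # f(x0)
        "prediction": summary(curve, s * math.sqrt(1.0 + leverage)),  # new y at x0
    }


# Hypothetical calibration data
x = [float(i) for i in range(10)]
y = [2.1, 4.9, 8.05, 11.0, 13.9, 17.1, 20.0, 22.95, 26.1, 28.9]
for name, (mean, sd, scale) in bayes_straight_line(x, y, x0=4.5).items():
    print(f"{name}: mean = {mean:.4f}, sd = {sd:.4f}, t scale = {scale:.4f}")
```

The prediction for a new observation is always wider than the posterior of the curve, because it adds the measurement noise of the new reading.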

**An introductory example for Markov chain Monte Carlo (MCMC)**

When the Guide to the Expression of Uncertainty in Measurement (GUM) and the methods from its supplements are not applicable, the Bayesian approach may be a valid and welcome alternative. Evaluating the posterior distribution, estimates or uncertainties in a Bayesian inference often requires numerical methods to avoid high-dimensional integrations. Markov chain Monte Carlo (MCMC) sampling is such a method: powerful, flexible and widely applied. PTB Working Group 8.42 has developed a concise introduction to MCMC, illustrated by a simple, typical example from metrology. It is accompanied by a few lines of code implementing the most basic and yet flexible MCMC method, so that interested readers can get started. MATLAB as well as R source code is available in the related publication.

Related publication:

- K. Klauenberg and C. Elster (2016). *Markov chain Monte Carlo methods: an introductory example.* **Metrologia** 53(1), S32, [DOI: 10.1088/0026-1394/53/1/S32].
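A random-walk Metropolis algorithm is a basic MCMC method, and it is flexible because it needs only the logarithm of the unnormalized posterior density. The following Python sketch is a generic version and is not claimed to reproduce the code of the related publication. The example target, a Gaussian with mean 10 and standard deviation 2, is a hypothetical stand-in for a posterior, and all names are hypothetical.

```python
import math
import random
import statistics


def random_walk_metropolis(log_density, start, step, n_samples, seed=0):
    """Random-walk Metropolis with independent Gaussian proposals of widths `step`."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_density(current)
    if current_lp == -math.inf:
        raise ValueError("starting point must have positive density")
    chain, n_accepted = [], 0
    for _ in range(n_samples):
        proposal = [c + rng.gauss(0.0, s) for c, s in zip(current, step)]
        proposal_lp = log_density(proposal)
        # Symmetric proposal: accept with probability min(1, p(proposal) / p(current))
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            n_accepted += 1
        chain.append(current)
    return chain, n_accepted / n_samples


# Hypothetical target: Gaussian "posterior" with mean 10 and standard deviation 2
def log_target(theta):
    return -0.5 * ((theta[0] - 10.0) / 2.0) ** 2


chain, rate = random_walk_metropolis(log_target, start=[0.0], step=[5.0], n_samples=20000)
kept = [t[0] for t in chain[2000:]]  # discard burn-in
print(f"acceptance rate {rate:.2f}, mean {statistics.mean(kept):.2f}, sd {statistics.stdev(kept):.2f}")
```

The proposal width `step` controls the trade-off between acceptance rate and step size; the burn-in length used here is a rough choice, and in practice convergence should be checked.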

# Publications

### Article

| Field | Value |
|---|---|
| Title | Robust Bayesian linear regression with application to an analysis of the CODATA values for the Planck constant |
| Author(s) | G. Wübbeler, O. Bodnar and C. Elster |
| Journal | Metrologia |
| Year | 2018 |
| Volume | 55 |
| Issue | 1 |
| Pages | 20 |
| DOI | 10.1088/1681-7575/aa98aa |
| Tags | 8.4, 8.42, Uncertainty, Regression |