# Regression

Working Group 8.42

# Description

Regression problems occur in many metrological applications, e.g. in everyday calibration tasks (as illustrated in Annex H.3 of the GUM), in the evaluation of interlaboratory comparisons, the characterization of sensors [Matthews et al., 2014], determination of fundamental constants [Bodnar et al., 2014], interpolation or prediction tasks [Wübbeler et al., 2012], and many more. Such problems arise when the quantity of interest cannot be measured directly, but has to be inferred from measurement data (and their uncertainties) using a mathematical model that relates the quantity of interest to the data. For example, regressions may serve to evaluate the functional relation between variables.

#### Definition and Examples

Regression problems often take the form
$$\begin{equation*} y_i = f_{\boldsymbol{\theta}}(x_i) + \varepsilon_i , \quad i=1, \ldots, n \,, \end{equation*}$$
where the measurements $\boldsymbol{y}=(y_1, \ldots, y_n)^\top$ are explained by a function $f_{\boldsymbol{\theta}}$ evaluated at values $\boldsymbol{x}=(x_1, \ldots, x_n)^\top$ and depending on unknown parameters $\boldsymbol{\theta}=(\theta_1, \ldots, \theta_p)^\top$. The measurement error $\boldsymbol{\varepsilon}=(\varepsilon_1, \ldots, \varepsilon_n)^\top$ follows a specified distribution $p(\boldsymbol{\varepsilon} | \boldsymbol{\theta}, \boldsymbol{\sigma})$, where $\boldsymbol{\sigma}$ denotes the (possibly unknown) parameters of the error distribution.
Regressions may be used to describe the relationship between a traceable, highly accurate reference device with values denoted by $x$ and a device to be calibrated with values denoted by $y$. The pairs $(x_i,y_i)$ then denote simultaneous measurements made by the two devices of the same measurand such as, for example, temperature.
A simple example is the Normal straight line regression model (as illustrated in Figure 1):
$$y_i = \theta_1 + \theta_2 x_i + \varepsilon_i , \quad \varepsilon_i \stackrel{iid}{\sim} \text{N}(0, \sigma^2), \quad i=1, \ldots, n \,. \tag{1}$$
The basic goal of regression tasks is to estimate the unknown parameters $\boldsymbol{\theta}$ of the regression function and possibly also the unknown parameters $\boldsymbol{\sigma}$ of the error distribution. The estimated regression model may then be used to assess the shape of the regression function, to predict or interpolate at intermediate or extrapolated $x$-values, or to invert the regression function in order to predict $x$-values for new measurements.
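
The estimation, prediction and inversion steps described above can be sketched for the Normal straight-line model. The following is a minimal illustration using ordinary least squares on simulated data, not the Bayesian treatment discussed further below; the function names (`fit_straight_line`, `predict`, `invert`) and the simulated values are assumptions made purely for this example.

```python
import numpy as np

def fit_straight_line(x, y):
    """Ordinary least-squares fit of y = theta1 + theta2 * x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])        # design matrix with columns [1, x]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares estimate of theta
    residuals = y - X @ theta
    dof = len(y) - X.shape[1]                        # degrees of freedom n - p
    sigma2 = residuals @ residuals / dof             # unbiased estimate of sigma^2
    cov_theta = sigma2 * np.linalg.inv(X.T @ X)      # estimated covariance of theta
    return theta, sigma2, cov_theta

def predict(theta, x_new):
    """Evaluate the fitted line at new x-values."""
    return theta[0] + theta[1] * np.asarray(x_new, dtype=float)

def invert(theta, y_new):
    """Invert the fitted line to obtain x-values for new measurements y."""
    return (np.asarray(y_new, dtype=float) - theta[0]) / theta[1]

# Simulated data (assumed values): theta = (1, 2), sigma = 0.1
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 11)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=x.size)

theta, sigma2, cov_theta = fit_straight_line(x, y)
print("estimate of theta:", theta)
print("standard uncertainties:", np.sqrt(np.diag(cov_theta)))
print("x predicted for y = 7:", invert(theta, 7.0))
```

Note that `invert` returns only a point estimate; a full uncertainty statement for the inverted value requires propagating the uncertainty of both the new measurement and the estimated parameters.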

# Research

Decisions based on regression analyses require a reliable evaluation of measurement uncertainty. However, the current state of the art in uncertainty evaluation in metrology (i.e. the GUM and its supplements) provides little guidance for regression. One reason is that the GUM guidelines are based on a model that relates the quantity of interest (the measurand) to the input quantities, whereas regression models cannot be uniquely formulated as such a measurement function. Annex H.3 of the GUM nevertheless suggests, by way of example, one possibility for analyzing regression problems. This analysis, however, combines elements from both classical (least squares) and Bayesian statistics. As a result, its results are not deduced from state-of-knowledge distributions and usually differ from those of a purely classical or purely Bayesian approach, as shown in [Elster et al., 2011].

Consequently, there is a need for guidance and research in metrology for uncertainty evaluation in regression problems. The Joint Committee for Guides in Metrology (JCGM) has recognized this need. PTB Working Group 8.42 led the development of guidance for Bayesian inference in regression problems within the EMRP project NEW04, which is summarized in a Guide [Elster et al., 2015]. This Guide also contains template solutions for specific regression problems with known values $\boldsymbol{x}$ and is available free of charge at the NEW04 project web page. For regression problems with Gaussian measurement errors and linear regression functions (such as in formula (1)), [Klauenberg et al., 2015_2] provide guidance for Bayesian inference when extensive numerical calculations (such as Markov Chain Monte Carlo methods) are to be avoided.
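
To illustrate how a Bayesian analysis of a Normal linear regression can avoid Markov Chain Monte Carlo, the sketch below uses a Normal inverse Gamma prior, a standard conjugate choice that yields closed-form posterior results. It is assumed here only as an example of such a prior class and is not a reproduction of the cited guidance; the function name `nig_posterior`, the prior values and the simulated data are hypothetical.

```python
import numpy as np

def nig_posterior(X, y, theta0, V0, a0, b0):
    """Closed-form posterior for y = X theta + eps, eps ~ N(0, sigma^2 I), under the
    conjugate prior theta | sigma^2 ~ N(theta0, sigma^2 V0), sigma^2 ~ IG(a0, b0)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta0 = np.asarray(theta0, dtype=float)
    V0_inv = np.linalg.inv(V0)
    Vn_inv = V0_inv + X.T @ X                         # posterior precision (up to sigma^2)
    Vn = np.linalg.inv(Vn_inv)
    theta_n = Vn @ (V0_inv @ theta0 + X.T @ y)        # posterior location of theta
    a_n = a0 + 0.5 * len(y)                           # updated shape of inverse Gamma
    b_n = b0 + 0.5 * (y @ y + theta0 @ V0_inv @ theta0
                      - theta_n @ Vn_inv @ theta_n)   # updated scale of inverse Gamma
    return theta_n, Vn, a_n, b_n

# Simulated straight-line data (assumed values): theta = (1, 2), sigma = 0.1
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 11)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=x.size)

# Hypothetical, deliberately vague prior
theta_n, Vn, a_n, b_n = nig_posterior(
    X, y, theta0=[0.0, 0.0], V0=1e6 * np.eye(2), a0=2.0, b0=0.02
)

# Marginally, theta follows a multivariate t distribution with 2*a_n degrees of
# freedom, location theta_n and scale matrix (b_n / a_n) * Vn.
cov_theta = b_n / (a_n - 1.0) * Vn                    # posterior covariance (a_n > 1)
print("posterior mean of theta:", theta_n)
print("posterior standard uncertainties:", np.sqrt(np.diag(cov_theta)))
print("posterior mean of sigma^2:", b_n / (a_n - 1.0))
```

With a prior this vague, the posterior mean of $\boldsymbol{\theta}$ is practically the least-squares estimate; an informative prior, for instance one derived from previous calibrations, would pull the estimate towards the prior location and typically reduce the posterior uncertainty.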

Regression problems often involve uncertainty in the $x$-values as well. Within the EMPIR project 17NRM05 EMUE, three adaptable examples were developed, which illustrate different aspects of fitting a straight line:

• For calibrating a sonic nozzle in line with the GUM, [Martens et al., 2020a] demonstrates how all uncertainties involved can be quantified and emphasizes the importance of accounting for correlation.
• For two methods measuring haemoglobin, [Martens et al., 2020b] quantifies the uncertainty when comparing measurement methods. In particular, the example demonstrates how correlations can be accounted for and shows their impact on regression estimates and uncertainties.
• For calibrating a torque measuring system with known $x$-values, [Martens et al., 2020c] compares the approaches according to the GUM and Bayes. The Bayesian approach is recommended because it accounts for limited and differing knowledge about the variability of each observation. Analytic expressions are supplied.

In addition, PTB Working Group 8.42 carries out research emerging from metrological applications involving regression. For example,

• for the analysis of magnetic field fluctuation thermometry, [Wübbeler et al., 2012] propose and validate a Bayesian approach, and [Wübbeler et al., 2013] a simplified approach, to perform interpolations or predictions based on regression results,
• for the determination of fundamental constants, [Bodnar et al., 2014] provide an objective Bayesian inference and compare it to the Birge ratio method,
• for the analysis of immunological tests called ELISA, [Klauenberg et al., 2015] have developed informative prior distributions which are widely applicable,
• for the calibration of flow meters, [Kok et al., 2015] provide a Bayesian analysis which accounts for constraints on the values of the regression curve.
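
For readers unfamiliar with the Birge ratio method mentioned above, the following sketch shows its simplest form: a weighted mean of several results for the same quantity whose uncertainty is inflated when the data scatter more than their stated uncertainties suggest. This illustrates only the classical method in the common-mean case, not the objective Bayesian inference of [Bodnar et al., 2014]; the function name `birge_ratio` and the input values are invented for the example.

```python
import numpy as np

def birge_ratio(values, uncertainties):
    """Weighted mean, its standard uncertainty and the Birge ratio (illustrative helper)."""
    x = np.asarray(values, dtype=float)
    u = np.asarray(uncertainties, dtype=float)
    w = 1.0 / u**2                                  # weights from stated uncertainties
    mean = np.sum(w * x) / np.sum(w)                # weighted mean
    u_mean = np.sqrt(1.0 / np.sum(w))               # uncertainty of the weighted mean
    chi2 = np.sum(w * (x - mean) ** 2)              # observed chi-squared value
    r_b = np.sqrt(chi2 / (len(x) - 1))              # Birge ratio
    return mean, u_mean, r_b

# Invented example: three results with equal stated uncertainties
mean, u_mean, r_b = birge_ratio([10.0, 10.2, 9.9], [0.1, 0.1, 0.1])

# A Birge ratio above 1 indicates more scatter than the stated uncertainties
# explain; the method then scales the uncertainty by this factor.
u_adjusted = u_mean * max(r_b, 1.0)
print("weighted mean:", mean)
print("Birge ratio:", r_b)
print("adjusted uncertainty:", u_adjusted)
```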

# Publications

## Publication single view

### Article

Title: A tutorial on Bayesian Normal linear regression

Authors: K. Klauenberg, G. Wübbeler, B. Mickan, P. Harris, C. Elster

Journal: Metrologia

Year: 2015

Volume: 52

Issue: 6

Pages: 878--892

DOI: 10.1088/0026-1394/52/6/878

Keywords: 8.42, Regression, Uncertainty

Abstract: Regression is a common task in metrology and often applied to calibrate instruments, evaluate inter-laboratory comparisons or determine fundamental constants, for example. Yet, a regression model cannot be uniquely formulated as a measurement function, and consequently the Guide to the Expression of Uncertainty in Measurement (GUM) and its supplements are not applicable directly. Bayesian inference, however, is well suited to regression tasks, and has the advantage of accounting for additional a priori information, which typically robustifies analyses. Furthermore, it is anticipated that future revisions of the GUM shall also embrace the Bayesian view.

Guidance on Bayesian inference for regression tasks is largely lacking in metrology. For linear regression models with Gaussian measurement errors this tutorial gives explicit guidance. Divided into three steps, the tutorial first illustrates how a priori knowledge, which is available from previous experiments, can be translated into prior distributions from a specific class. These prior distributions have the advantage of yielding analytical, closed form results, thus avoiding the need to apply numerical methods such as Markov Chain Monte Carlo. Secondly, formulas for the posterior results are given, explained and illustrated, and software implementations are provided. In the third step, Bayesian tools are used to assess the assumptions behind the suggested approach.

These three steps (prior elicitation, posterior calculation, and robustness to prior uncertainty and model adequacy) are critical to Bayesian inference. The general guidance given here for Normal linear regression tasks is accompanied by a simple, but real-world, metrological example. The calibration of a flow device serves as a running example and illustrates the three steps. It is shown that prior knowledge from previous calibrations of the same sonic nozzle enables robust predictions even for extrapolations.
