# Regression

Working Group 8.42

# Description

Regression problems occur in many metrological applications, e.g. in everyday calibration tasks (as illustrated in Annex H.3 of the GUM), in the evaluation of interlaboratory comparisons, the characterization of sensors [Matthews et al., 2014], determination of fundamental constants [Bodnar et al., 2014], interpolation or prediction tasks [Wübbeler et al., 2012], and many more. Such problems arise when the quantity of interest cannot be measured directly, but has to be inferred from measurement data (and their uncertainties) using a mathematical model that relates the quantity of interest to the data. For example, regressions may serve to evaluate the functional relation between variables.

#### Definition and Examples

Regression problems often take the form
$$\begin{equation*} y_i = f_{\boldsymbol{\theta}}(x_i) + \varepsilon_i , \quad i=1, \ldots, n \,, \end{equation*}$$
where the measurements $\boldsymbol{y}=(y_1, \ldots, y_n)^\top$ are explained by a function $f_{\boldsymbol{\theta}}$ evaluated at values $\boldsymbol{x}=(x_1, \ldots, x_n)^\top$ and depending on unknown parameters $\boldsymbol{\theta}=(\theta_1, \ldots, \theta_p)^\top$. The measurement error $\pmb{\varepsilon}=(\varepsilon_1, \ldots, \varepsilon_n)^\top$ follows a specified distribution $p(\pmb{\varepsilon} | \boldsymbol{\theta}, \boldsymbol{\sigma}).$
Regressions may be used to describe the relationship between a traceable, highly accurate reference device with values denoted by $x$ and a device to be calibrated with values denoted by $y$. The pairs $(x_i,y_i)$ then denote simultaneous measurements made by the two devices of the same measurand such as, for example, temperature.
A simple example is the Normal straight line regression model (as illustrated in Figure 1):
$$\label{int_reg_eq1} y_i = \theta_1 + \theta_2 x_i + \varepsilon_i , \quad \varepsilon_i \stackrel{iid}{\sim} \text{N}(0, \sigma^2), \quad i=1, \ldots, n \,.$$
The basic goal of regression tasks is to estimate the unknown parameters $\pmb{\theta}$ of the regression function and possibly also the unknown parameters of the error distribution $\pmb{\sigma}$. The estimated regression model may then be used to evaluate the shape of the regression function, predictions or interpolations of intermediate or extrapolated $x$-values, or to invert the regression function to predict $x$-values for new measurements.

# Research

Decisions based on regression analyses require a reliable evaluation of measurement uncertainty. The current state of the art in uncertainty evaluation in metrology (i.e. the GUM and its supplements) provides little guidance for regression, however. One reason is that the GUM guidelines are based on a model that relates the quantity of interest (the measurand) to the input quantities. Yet, regression models cannot be uniquely formulated as such a measurement function. By way of example, Annex H.3 of the GUM nevertheless suggests a possibility for analyzing regression problems. However, this analysis contains elements from both classical (least squares) and Bayesian statistics such that the results are not deduced from state-of-knowledge distributions and usually differ from a purely classical or Bayesian approach which was shown in [Elster et al., 2011].

Consequently, there is a need for guidance and research in metrology for uncertainty evaluation in regression problems. The Joint Committee for Guides in Metrology (JCGM) has recognized this need. PTB Working Group 8.42 lead the development of guidance for Bayesian inference of regression problems within the EMRP project NEW04, which is summarized in a Guide [Elster et al., 2015]. This Guide also contains template solutions for specific regression problems with known values $\boldsymbol{x}$ and is available free of charge at the NEW04 project web page. For regression problems with Gaussian measurement errors and linear regression functions (such as in formula (1)), [Klauenberg et al., 2015_2] provide guidance when extensive numerical calculations (such as Markov Chain Monte Carlo methods) are to be avoided in a Bayesian inference.

Regression problems often involve uncertainty in the x-values as well. Within the EMPIR project 17NRM05 EMUE three adaptable examples were developed, which illustrate different aspects of fitting a straight-line:

• For calibrating a sonic nozzle in line with the GUM, [Martens et.al., 2020a] demonstrates how all uncertainties involved can be quantified and emphasizes the importance of accounting for correlation.
• For two methods measuring haemoglobin, [Martens et.al., 2020b] quantifies the uncertainty when comparing measurement methods. In particular, the example demonstrates how correlations can be accounted for and shows their impact on regression estimates and uncertainties.
• For calibrating a torque measuring system and known x-values, [Martens et.al., 2020c] compares the approaches according to GUM and Bayes. The Bayesian approach is recommended because it accounts for little and different knowledge on the variability of each observation. Analytic expressions are supplied

In addition, PTB Working Group 8.42 carries out research emerging from metrological applications involving regression. For example,

• for the analysis of magnetic field fluctuation thermometry, [Wübbeler et al., 2012] propose and validate a Bayesian and [Wübbeler et al., 2013] a simplified approach  to perform interpolations or predictions based on regression results,
• for the determination of fundamental constants, [Bodnar et al., 2014] provide an objective Bayesian inference and compare it to the Birge ratio method,
• for the analysis of immunological tests called ELISA, [Klauenberg et al., 2015] have developed informative prior distributions which are widely applicable,
• for the calibration of flow meters, [Kok et al., 2015] provide a Bayesian analysis which accounts for constraints on the values of the regression curve.

# Publications

## Publication single view

### Article

Title: Compressive nano-FTIR chemical mapping G. Wübbeler, M. Marschall, E. Rühl, B. Kästner;C. Elster Measurement Science and Technology 2021 33 035402 accepted 10.1088/1361-6501/ac407a 8.4,8.42,LargeScaleDataAna,Regression

Back to the list view