# Regression

Working Group 8.42

# Description

Regression problems occur in many metrological applications, for example in everyday calibration tasks (as illustrated in Annex H.3 of the GUM), the evaluation of interlaboratory comparisons, the characterization of sensors [Matthews et al., 2014], the determination of fundamental constants [Bodnar et al., 2014], and interpolation or prediction tasks [Wübbeler et al., 2012]. Such problems arise when the quantity of interest cannot be measured directly but has to be inferred from measurement data (and their uncertainties) using a mathematical model that relates the quantity of interest to the data. Regression can, for instance, be used to determine the functional relationship between two variables.

#### Definition and Examples

Regression problems often take the form
$$ y_i = f_{\boldsymbol{\theta}}(x_i) + \varepsilon_i , \quad i=1, \ldots, n \,, $$
where the measurements $\boldsymbol{y}=(y_1, \ldots, y_n)^\top$ are explained by a function $f_{\boldsymbol{\theta}}$ evaluated at values $\boldsymbol{x}=(x_1, \ldots, x_n)^\top$ and depending on unknown parameters $\boldsymbol{\theta}=(\theta_1, \ldots, \theta_p)^\top$. The measurement errors $\boldsymbol{\varepsilon}=(\varepsilon_1, \ldots, \varepsilon_n)^\top$ follow a specified distribution $p(\boldsymbol{\varepsilon} \mid \boldsymbol{\theta}, \boldsymbol{\sigma})$, where $\boldsymbol{\sigma}$ denotes the (possibly unknown) parameters of the error distribution.
Regression may, for instance, describe the relationship between a traceable, highly accurate reference device, with values denoted by $x$, and a device to be calibrated, with values denoted by $y$. The pairs $(x_i,y_i)$ are then simultaneous measurements of the same measurand (for example, a temperature) made by the two devices.
A simple example is the Normal straight-line regression model (illustrated in Figure 1):
$$ y_i = \theta_1 + \theta_2 x_i + \varepsilon_i , \quad \varepsilon_i \stackrel{\text{iid}}{\sim} \mathrm{N}(0, \sigma^2), \quad i=1, \ldots, n \,. \tag{1} $$
The basic goal of regression is to estimate the unknown parameters $\boldsymbol{\theta}$ of the regression function and, possibly, the unknown parameters $\boldsymbol{\sigma}$ of the error distribution. The estimated regression model may then be used to evaluate the shape of the regression function, to predict values of the regression function at intermediate (interpolation) or new (extrapolation) $x$-values, or to invert the regression function to predict $x$-values for new measurements $y$.
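As an illustration of the estimation and inversion steps just described, the following sketch fits the straight-line model (1) by ordinary least squares and then inverts the fitted line for a new measurement. It is a minimal classical (non-Bayesian) sketch, assuming NumPy is available; the function names `fit_straight_line` and `invert_line` and the data values are hypothetical. The uncertainty of the inverted value uses first-order (GUM-style) propagation and assumes that the new measurement is independent of the calibration data.

```python
import numpy as np


def fit_straight_line(x, y):
    """Ordinary least-squares fit of y_i = theta1 + theta2 * x_i + eps_i,
    eps_i iid N(0, sigma^2). Returns the estimate of (theta1, theta2),
    its covariance matrix, and the estimate of sigma."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # design matrix
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimate
    n, p = X.shape
    residuals = y - X @ theta
    s2 = residuals @ residuals / (n - p)           # unbiased estimate of sigma^2
    cov = s2 * np.linalg.inv(X.T @ X)              # covariance of the estimate
    return theta, cov, np.sqrt(s2)


def invert_line(y0, u_y0, theta, cov):
    """Hypothetical helper: predict x for a new measurement y0 with standard
    uncertainty u_y0 by inverting the fitted line, using first-order
    propagation (assumes y0 is independent of the fitted parameters)."""
    t1, t2 = theta
    x0 = (y0 - t1) / t2
    J = np.array([-1.0 / t2, -(y0 - t1) / t2**2])  # d x0 / d(theta1, theta2)
    u_x0 = np.sqrt(J @ cov @ J + (u_y0 / t2) ** 2)
    return x0, u_x0


# Hypothetical calibration data (reference values x, device readings y)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.0, 8.8]
theta, cov, s = fit_straight_line(x, y)
# Assume the new reading has the same noise level as the calibration data
x0, u_x0 = invert_line(5.0, s, theta, cov)
```

For these data the estimates are $\hat\theta_1 = 1.1$ and $\hat\theta_2 = 1.95$, and the reading $y_0 = 5.0$ maps back to $x_0 = 2.0$. A Bayesian treatment would instead derive a posterior distribution for $x_0$.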

# Research

Decisions based on regression analyses require a reliable evaluation of measurement uncertainty. However, the current state of the art in uncertainty evaluation in metrology (i.e. the GUM and its supplements) provides little guidance for regression. One reason is that the GUM is based on a measurement function that relates the quantity of interest (the measurand) to the input quantities, and regression models generally cannot be uniquely formulated as such a measurement function. Annex H.3 of the GUM nevertheless presents an example analysis of a regression problem. This analysis, however, mixes elements of classical (least-squares) and Bayesian statistics, so that its results are not derived from state-of-knowledge distributions and usually differ from those of a purely classical or a purely Bayesian approach, as shown in [Elster et al., 2011].

Consequently, there is a need for guidance and research on uncertainty evaluation for regression problems in metrology. The Joint Committee for Guides in Metrology (JCGM) has recognized this need. Within the EMRP project NEW04, PTB Working Group 8.42 led the development of guidance on Bayesian inference for regression problems, which is summarized in a Guide [Elster et al., 2015]. This Guide also contains template solutions for specific regression problems with known values $\boldsymbol{x}$ and is available free of charge on the NEW04 project web page. For regression problems with Gaussian measurement errors and linear regression functions (such as in formula (1)), [Klauenberg et al., 2015_2] provide guidance for Bayesian inference when extensive numerical calculations (such as Markov chain Monte Carlo methods) are to be avoided.
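To illustrate why Gaussian linear regression admits Bayesian inference without Markov chain Monte Carlo, the sketch below implements the standard conjugate normal-inverse-gamma update for $\boldsymbol{y} = X\boldsymbol{\theta} + \boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon} \sim \mathrm{N}(\boldsymbol{0}, \sigma^2 I)$. It is a textbook result shown under stated assumptions, not a reproduction of the procedure in [Klauenberg et al., 2015_2]. The function names and the prior hyperparameters `theta0`, `V0`, `a0`, `b0` are hypothetical and would have to be chosen for the application at hand.

```python
import numpy as np


def nig_posterior(X, y, theta0, V0, a0, b0):
    """Conjugate update for y = X theta + eps, eps ~ N(0, sigma^2 I), with the
    (assumed) prior theta | sigma^2 ~ N(theta0, sigma^2 V0), sigma^2 ~ InvGamma(a0, b0).
    Returns the posterior hyperparameters (theta_n, V_n, a_n, b_n)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta0 = np.asarray(theta0, dtype=float)
    V0_inv = np.linalg.inv(V0)
    Vn_inv = V0_inv + X.T @ X
    Vn = np.linalg.inv(Vn_inv)
    theta_n = Vn @ (V0_inv @ theta0 + X.T @ y)
    an = a0 + y.size / 2.0
    bn = b0 + 0.5 * (y @ y + theta0 @ V0_inv @ theta0 - theta_n @ Vn_inv @ theta_n)
    return theta_n, Vn, an, bn


def posterior_summary(theta_n, Vn, an, bn):
    """The marginal posterior of theta is multivariate t with 2*an degrees of
    freedom, location theta_n and scale (bn/an) * Vn; its covariance
    (bn/(an - 1)) * Vn exists for an > 1, as does the posterior mean of sigma^2."""
    cov_theta = bn / (an - 1.0) * Vn
    mean_sigma2 = bn / (an - 1.0)
    return theta_n, cov_theta, mean_sigma2


# Straight-line model (1) with hypothetical data and a deliberately vague prior
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.0, 8.8])
X = np.column_stack([np.ones_like(x), x])
post = nig_posterior(X, y, theta0=np.zeros(2), V0=1e6 * np.eye(2), a0=1e-3, b0=1e-3)
mean_theta, cov_theta, mean_sigma2 = posterior_summary(*post)
```

With such a vague prior the posterior mean of $\boldsymbol{\theta}$ essentially coincides with the least-squares estimate; an informative prior would pull it towards `theta0`. All posterior quantities are available in closed form, which is what makes sampling unnecessary here.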

Regression problems often involve uncertainty in the x-values as well. Within the EMPIR project 17NRM05 EMUE, three adaptable examples were developed that illustrate different aspects of fitting a straight line:

• For the calibration of a sonic nozzle in line with the GUM, [Martens et al., 2020a] demonstrate how all uncertainties involved can be quantified and emphasize the importance of accounting for correlations.
• For two methods of measuring haemoglobin, [Martens et al., 2020b] quantify the uncertainty when comparing measurement methods. In particular, the example demonstrates how correlations can be accounted for and shows their impact on regression estimates and uncertainties.
• For the calibration of a torque measuring system with known x-values, [Martens et al., 2020c] compare the GUM and Bayesian approaches. The Bayesian approach is recommended because it can account for limited and differing knowledge about the variability of each observation. Analytic expressions are supplied.
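When the x-values are also uncertain, ordinary least squares no longer treats the data correctly. One standard errors-in-variables technique for a straight line is Deming regression, sketched below under the assumptions that the errors in $x$ and $y$ are independent and normally distributed with a known, constant variance ratio `delta`. It only illustrates the idea: it ignores correlations, which the EMUE examples above show can matter, and it is not necessarily the method used in those examples. The function name `deming_fit` is hypothetical.

```python
import numpy as np


def deming_fit(x, y, delta=1.0):
    """Deming regression for y = theta1 + theta2 * x when both x and y carry errors.
    delta = (error variance of y) / (error variance of x), assumed known.
    Assumes the sample covariance of x and y is non-zero."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    sxx = np.mean((x - xm) ** 2)
    syy = np.mean((y - ym) ** 2)
    sxy = np.mean((x - xm) * (y - ym))
    d = syy - delta * sxx
    slope = (d + np.sqrt(d**2 + 4.0 * delta * sxy**2)) / (2.0 * sxy)
    intercept = ym - slope * xm
    return intercept, slope
```

As `delta` grows (errors in $x$ negligible), the slope tends to the ordinary least-squares slope; for `delta = 1` the method reduces to orthogonal regression. Uncertainties for the estimates would require an additional step, e.g. propagation or a Bayesian treatment.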

In addition, PTB Working Group 8.42 carries out research emerging from metrological applications involving regression. For example,

• for the analysis of magnetic field fluctuation thermometry, [Wübbeler et al., 2012] propose and validate a Bayesian approach, and [Wübbeler et al., 2013] a simplified approach, for performing interpolations or predictions based on regression results,
• for the determination of fundamental constants, [Bodnar et al., 2014] provide an objective Bayesian inference and compare it to the Birge ratio method,
• for the analysis of immunological tests called ELISA, [Klauenberg et al., 2015] developed informative prior distributions which are widely applicable,
• for the calibration of flow meters, [Kok et al., 2015] provide a Bayesian analysis which accounts for constraints on the values of the regression curve.
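For reference, the Birge ratio method mentioned above is commonly applied to a weighted mean of measurements $v_i$ with standard uncertainties $u_i$: the Birge ratio is $R_B = \sqrt{\chi^2/(n-1)}$ with $\chi^2 = \sum_i (v_i - \bar v)^2/u_i^2$, and if $R_B > 1$ the uncertainty of the weighted mean $\bar v$ is inflated by $R_B$. The sketch below shows only this conventional method in its simplest form, not the objective Bayesian inference of [Bodnar et al., 2014]; the function name is hypothetical.

```python
import numpy as np


def weighted_mean_birge(values, uncertainties):
    """Weighted mean with Birge-ratio inflation of its standard uncertainty.
    Returns (mean, u_mean, birge_ratio, u_mean_inflated)."""
    v = np.asarray(values, dtype=float)
    u = np.asarray(uncertainties, dtype=float)
    w = 1.0 / u**2                                 # inverse-variance weights
    mean = np.sum(w * v) / np.sum(w)
    u_mean = 1.0 / np.sqrt(np.sum(w))              # uncertainty from stated u_i only
    chi2 = np.sum(w * (v - mean) ** 2)
    birge = np.sqrt(chi2 / (v.size - 1))
    u_inflated = u_mean * max(1.0, birge)          # inflate only if data look inconsistent
    return mean, u_mean, birge, u_inflated
```

For example, two measurements 1.0 and 3.0, each with standard uncertainty 1.0, give a weighted mean of 2.0, a Birge ratio of $\sqrt{2}$, and an inflated uncertainty of 1.0 instead of $1/\sqrt{2}$.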

# Publications

### Article

Title: Informative prior distributions for ELISA analyses

Authors: K. Klauenberg, M. Walzel, B. Ebert, C. Elster

Journal: Biostatistics, vol. 16, no. 3, pp. 454-464, 2015

DOI: 10.1093/biostatistics/kxu057

ISSN: 1468-4357

URL: http://biostatistics.oxfordjournals.org/content/16/3/454

Keywords: Regression, 8.42, ELISA

Abstract: Immunoassays are capable of measuring very small concentrations of substances in solutions and have an immense range of application. Enzyme-linked immunosorbent assay (ELISA) tests in particular can detect the presence of an infection, of drugs, or hormones (as in the home pregnancy test). Inference of an unknown concentration via ELISA usually involves a non-linear heteroscedastic regression and subsequent prediction, which can be carried out in a Bayesian framework. For such a Bayesian inference, we are developing informative prior distributions based on extensive historical ELISA tests as well as theoretical considerations. One consideration regards the quality of the immunoassay leading to two practical requirements for the applicability of the priors. Simulations show that the additional prior information can lead to inferences which are robust to reasonable perturbations of the model and changes in the design of the data. On real data, the applicability is demonstrated across different laboratories, for different analytes and laboratory equipment as well as for previous and current ELISAs with sigmoid regression function. Consistency checks on real data (similar to cross-validation) underpin the adequacy of the suggested priors. Altogether, the new priors may improve concentration estimation for ELISAs that fulfill certain design conditions, by extending the range of the analyses, decreasing the uncertainty, or giving more robust estimates. Future use of these priors is straightforward because explicit, closed-form expressions are provided. This work encourages development and application of informative, yet general, prior distributions for other types of immunoassays.