Case Study 3 - The impact of traceability and uncertainty propagation in the assessment of ocean noise

The data recorded by a sensor operating in the field, possibly as part of a deployed sensor network, are used by different end-user communities for different purposes. For example, in the case of underwater acoustic measurement, the data might be used for event detection and attribution to derive temporal, spatial and amplitude information about the event, which can be anthropogenic or natural. Another use might be for ocean noise monitoring in which the data are used to derive metrics for ambient noise maps for a given spatial region and time period, for example, the percentage of time that an exposure threshold for sound pressure level is exceeded. Yet another use might be for environmental monitoring in which the data are used to derive long-term (e.g., decadal) trends in sound pressure level and to correlate characteristics of the recorded sound with natural and anthropogenic sound sources.

The Infra-AUV project has delivered the first traceable calibration methods supporting acoustic measurements in the ocean at low frequencies, and has collected knowledge of the in-situ performance of hydrophones. The project has also investigated methods for the propagation of uncertainty, including that associated with the calibration of a sensor, through models for high-level derived parameters related to various end-user applications. Reliably quantifying the uncertainty for estimates of these high-level parameters is essential when those estimates are used for decision-making or to inform policy, as well as for understanding the comparability and consistency of estimates relating to different locations and times.

Stages in data processing chain: raw digital data collected over four days (top), sound power spectral density level (middle), and percentile values of sound power spectral density level representing summary statistics extracted from the distributions of values taken over each day (bottom). (The dotted lines joining the daily values are included for purposes of visualisation only.)

The models involved in such applications are often complicated and computationally intensive. An example related to underwater acoustic measurement is illustrated in the Figure above, which shows the data at various stages of processing. The top graph shows the raw digital data (in counts) recorded over four days by a hydrophone in the International Monitoring System of the CTBT. The middle graph shows values of sound power spectral density level (SPSDL) derived from the raw data using information about the measuring system: a scaling factor for converting counts to values of sound pressure in pascals, and its calibration, provided as a frequency response. Finally, the bottom graph shows percentile values of SPSDL representing summary statistics extracted from the distributions of values taken over each day. These statistics are examples of the high-level derived parameters of interest, providing information about the overall soundscape that allows end-users to estimate the respective contributions of natural and anthropogenic sound sources.
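
As an illustration of this processing chain, the sketch below (in Python, using NumPy and SciPy) converts raw counts to sound pressure, estimates a power spectral density for each short snapshot, corrects it for the calibration frequency response, and extracts daily percentile statistics. The sampling rate, scaling factor, snapshot length and synthetic data are hypothetical placeholders, not the parameters or records of the actual measuring system.

    import numpy as np
    from scipy.signal import welch

    FS = 250.0                 # sampling rate in Hz (assumed value)
    COUNTS_TO_PA = 1.0e-5      # scaling factor, counts -> pascals (assumed value)
    P_REF = 1.0e-6             # reference pressure for underwater sound, 1 uPa

    def snapshot_spsdl(counts, response=None, nperseg=1024):
        """SPSDL spectrum, in dB re 1 uPa^2/Hz, for one snapshot of raw data."""
        f, psd = welch(counts * COUNTS_TO_PA, fs=FS, nperseg=nperseg)  # Pa^2/Hz
        if response is not None:
            # correct for the hydrophone's calibration frequency response
            psd = psd / np.abs(response(f)) ** 2
        return f, 10.0 * np.log10(psd / P_REF ** 2)

    def daily_summary(day_counts, snapshot_len, q=(1, 25, 50, 75, 99)):
        """Percentile summary statistics of SPSDL over one aggregation period."""
        n = len(day_counts) // snapshot_len
        segments = day_counts[: n * snapshot_len].reshape(n, snapshot_len)
        spectra = [snapshot_spsdl(seg) for seg in segments]
        f = spectra[0][0]
        levels = np.array([L for _, L in spectra])
        # distribution of SPSDL values over the day; percentiles per frequency bin
        return f, np.percentile(levels, q, axis=0)

    # Synthetic stand-in for one hour of raw data (the real record is read from file):
    rng = np.random.default_rng(0)
    f, daily_stats = daily_summary(rng.normal(size=250 * 3600), snapshot_len=25_000)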

The complexity of the models means that numerical methods, such as a Monte Carlo method [GUMS1, GUMS2], provide a practical approach to the propagation of uncertainty. The advantages of the approach are that it is generally applicable, it does not require the explicit calculation of sensitivity coefficients, and it makes no assumptions about the nature of the model or of the probability distributions for the derived parameters. The disadvantage of the approach, however, is that it can be computationally expensive. To illustrate the output of the Monte Carlo calculation, the next Figure shows the approximations to the probability density functions for the five statistical percentiles for the last aggregation period (day 4) for the data shown above. The distributions are well separated, and closer inspection shows that they appear approximately Gaussian in form.
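
The sketch below illustrates the Monte Carlo calculation for a single aggregation period, under the simplifying assumption that the calibration uncertainty acts as a common gain offset in decibels and that the remaining effects are independent; all uncertainty values and the stand-in data are hypothetical. Each trial draws the inputs from their assigned distributions and recomputes the percentile statistics, so the collection of trials approximates the joint probability distribution of the outputs.

    import numpy as np

    rng = np.random.default_rng(1)
    M = 10_000                                 # number of Monte Carlo trials
    Q = (1, 25, 50, 75, 99)                    # percentiles of interest

    levels = rng.normal(95.0, 6.0, size=8640)  # stand-in SPSDL values for one day, dB
    u_gain = 0.2                               # systematic (calibration) std. uncertainty, dB (assumed)
    u_rand = 0.1                               # per-value random std. uncertainty, dB (assumed)

    samples = np.empty((M, len(Q)))
    for m in range(M):
        g = rng.normal(0.0, u_gain)                    # one systematic draw per trial
        e = rng.normal(0.0, u_rand, size=levels.size)  # independent random effects
        samples[m] = np.percentile(levels + g + e, Q)

    estimates = samples.mean(axis=0)            # best estimates of the five percentiles
    covariance = np.cov(samples, rowvar=False)  # their 5 x 5 covariance matrix
    print(estimates, np.sqrt(np.diag(covariance)))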

Combination of influences: approximations to the probability density functions for the five statistical percentiles for the last aggregation period obtained from a Monte Carlo calculation.

In this example, for which there are only twenty high-level derived parameters (five percentiles for each of four days), it is straightforward to store and work with the covariance matrix for those outputs. A challenge arises, however, when the observation period lasts for many years and the covariance matrix becomes too large to treat in full. For example, an aggregation interval of one day over an observation period of 15 years, with all five summary statistics, would generate 27,375 high-level derived parameters and a covariance matrix of 749,390,625 elements. However, it is possible to exploit structure in the covariance matrix to provide a compact, albeit approximate, representation of the matrix. The structure arises from the fact that some contributions to the uncertainty propagate through the measurement model as essentially uncorrelated, random effects, whereas others are realised as essentially correlated, systematic effects on the derived parameters, and information about the two types of contribution is stored separately and in compact forms.
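
One plausible form for such a representation, sketched below under the assumption that the random effects are uncorrelated between parameters and that the systematic effects enter through a small number r of common influences, stores the covariance matrix implicitly as Cov = D + S S^T, with D diagonal of size n and S of size n-by-r, so that any element of the covariance or correlation matrix can be computed on demand.

    import numpy as np

    class CompactCovariance:
        """Covariance stored implicitly as Cov = D + S S^T: D holds the variances
        from the uncorrelated random effects, S the sensitivities to r common
        systematic effects; storage is n * (r + 1) numbers instead of n^2."""

        def __init__(self, d, S):
            self.d = np.asarray(d)   # length n
            self.S = np.asarray(S)   # shape (n, r)

        def cov(self, i, j):
            c = self.S[i] @ self.S[j]
            return c + self.d[i] if i == j else c

        def corr(self, i, j):
            return self.cov(i, j) / np.sqrt(self.cov(i, i) * self.cov(j, j))

    # n = 27,375 parameters with r = 2 systematic influences: about 8.2e4 stored
    # values rather than the 7.5e8 elements of the full covariance matrix.
    rng = np.random.default_rng(2)
    compact = CompactCovariance(d=rng.uniform(0.01, 0.04, 27_375),
                                S=rng.normal(0.0, 0.1, (27_375, 2)))
    print(compact.corr(0, 1))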

Correlation matrices obtained from, respectively, full and compact representations of the covariance matrix.

The final Figure above shows the correlation matrices obtained from, respectively, the full and compact representations of the covariance matrix; the two matrices have very similar structure.


References

[GUMS1] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML. Supplement 1 to the ‘Guide to the Expression of Uncertainty in Measurement’ – Propagation of distributions using a Monte Carlo method, JCGM 101:2008. BIPM, 2008.

[GUMS2] BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML. Supplement 2 to the ‘Guide to the Expression of Uncertainty in Measurement’ – Extension to any number of output quantities, JCGM 102:2011. BIPM, 2011.