# Numerical Methods

Working Group 8.43

*PyThia* is an open-source software package designed to generate polynomial chaos surrogates for high-dimensional functions in a non-intrusive fashion. The package has been developed in the context of the ZIM-ZF4014017RR7 project to determine the uncertainties associated with the reconstructed values of shape parameters for nanostructures such as photolithography masks, but the generality of the underlying mathematical concepts makes *PyThia* applicable to many other problems without any adaptations. For more information on the *PyThia* software package, please visit the official homepage.

*PyThia* is publicly available and free to use.

The git repository also describes how to contribute to *PyThia*, which might come in handy if one wants to add some key features for an application.

With an internet connection, the easiest way to install *PyThia* is via `pip`:

```
pip install pythia-uq
```

As an alternative, if for example no internet connection is available or one wants to test the latest development branch of *PyThia*, installation from source is possible with three simple commands as well:

```
git clone gitlab1.ptb.de/pythia/pythia.git
cd pythia
pip install .
```

The polynomial chaos method uses a weighted sum of orthonormal polynomials to approximate some target quantity that depends on multiple stochastic parameters.

Given a set of parameters $y=(y_1,\dots,y_M)\in\Gamma\subseteq\mathbb{R}^M$, assuming each parameter $y_m$ is distributed according to some measure $\pi_m$, it is possible to expand any square integrable function $f\in L^2(\Gamma,\pi;\mathbb{R}^J)$ into

$$f(y) = \sum_{\mu\in\mathbb{N}_0^M}f_\mu P_\mu(y),\qquad\text{for}\qquad f_\mu = \int_\Gamma f(y) P_\mu(y) \,\mathrm{d}\pi(y)\in\mathbb{R}^J.$$

Here we assume that the multivariate parameter distribution is given by the product measure $\pi=\prod_{m=1}^M\pi_m$ and that $\{P_\mu\}_{\mu\in\mathbb{N}_0^M}$ are orthonormal polynomials in $L^2(\Gamma,\pi;\mathbb{R})$. Expansions of this type are called *(generalized) polynomial chaos expansions* (PCE).
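
For a uniform measure on $[-1,1]$, for example, the orthonormal basis consists of rescaled Legendre polynomials. The following sketch (plain NumPy, independent of *PyThia*; the helper `basis` is ours) verifies the orthonormality numerically via Gauss-Legendre quadrature:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

# Gauss-Legendre nodes/weights on [-1, 1]; the weights sum to 2, so we
# divide by 2 to integrate against the uniform probability measure dy/2.
nodes, weights = leggauss(20)

def basis(k, y):
    # Legendre polynomial of degree k, normalized in L^2([-1, 1], dy/2)
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt(2 * k + 1) * Legendre(c)(y)

# Gram matrix of the first five basis polynomials: should be the identity
gram = np.array([[np.sum(weights / 2 * basis(j, nodes) * basis(k, nodes))
                  for k in range(5)] for j in range(5)])
```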

An approximation (surrogate) of $f$ can be easily obtained by choosing some $\Lambda\subseteq\mathbb{N}_0^M$ and truncating the PCE

$$f(y) \approx f^{\mathrm{PC}}(y) = \sum_{\mu\in\Lambda}f_\mu P_\mu(y).$$

This surrogate has the advantage that evaluations $f^{\mathrm{PC}}(y^{(i)})$ in points $y^{(i)}\in\Gamma$ require only evaluations of the polynomial basis functions $P_\mu$, thus circumventing possibly expensive evaluations of the original function $f$. Moreover, derivatives and antiderivatives of the surrogate $f^{\mathrm{PC}}$ are available analytically due to its polynomial structure.
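
As a minimal illustration of a truncated PCE, independent of *PyThia*, the following sketch approximates $f(y)=e^y$ on $[-1,1]$ with a uniform parameter by computing the coefficients $f_\mu$ via Gauss quadrature (the helpers `basis` and `surrogate` are ours):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

f = np.exp  # target function on Gamma = [-1, 1] with uniform measure dy/2

nodes, weights = leggauss(30)

def basis(k, y):
    # Legendre polynomial of degree k, orthonormal w.r.t. dy/2 on [-1, 1]
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt(2 * k + 1) * Legendre(c)(y)

# PCE coefficients f_mu = int f(y) P_mu(y) dpi(y), truncated at degree 8
coeffs = [np.sum(weights / 2 * f(nodes) * basis(k, nodes)) for k in range(9)]

def surrogate(y):
    # truncated PCE: f^PC(y) = sum_mu f_mu P_mu(y)
    return sum(c * basis(k, y) for k, c in enumerate(coeffs))

y_test = np.linspace(-1.0, 1.0, 101)
err = np.max(np.abs(surrogate(y_test) - f(y_test)))
```

Since $e^y$ is analytic, the coefficients decay super-exponentially and already degree 8 yields a highly accurate surrogate.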

A reasonable method to approximate the set of coefficients $\{f_\mu\}_{\mu\in\Lambda}$ simultaneously is linear regression.

If we set

$$

\mathcal{V}_\Lambda = \{ v_N = \sum_{\mu\in\Lambda} v_\mu P_\mu \,\vert\, v_\mu\in\mathbb{R}^J \text{ for all }\mu\in\Lambda\}

$$

as the approximation space we can define the residuum $\mathcal{R}(v_N) = f - v_N$ for all $v_N\in\mathcal{V}_\Lambda$.

If we assume $0\in\Lambda$, $f_0 = \mathbb{E}_\pi[f]$ implies that $\mathcal{R}(v_N)$ is a zero mean random variable.

Minimizing the variance of $\mathcal{R}$ thus leads to the least-squares problem of finding $u_N\in\mathcal{V}_\Lambda$ such that

$$

u_N = \operatorname*{arg\, min}_{v_N\in\mathcal{V}_\Lambda} \Vert \mathcal{R}(v_N)\Vert_{L^2(\Gamma,\pi;\mathbb{R}^J)}^2.

$$

Computing the norm $\Vert \mathcal{R}(v_N)\Vert_{L^2(\Gamma,\pi;\mathbb{R}^J)}$ is typically impossible as no antiderivative of $f$ is known.

Using deterministic quadrature formulas to approximate the integral is possible for a small number of parameters ($M \leq 3$) but becomes infeasible for moderate or large $M$, as the number of quadrature points depends exponentially on the parameter dimension.

To avoid the high-dimensional deterministic quadrature, we approximate the integral in a Monte Carlo sense by an empirical semi-norm

$$

\Vert \mathcal{R}(v_N)\Vert_{L^2(\Gamma,\pi;\mathbb{R}^J)}^2

\approx \Vert \mathcal{R}(v_N)\Vert_n^2

:= \sum_{i=1}^n w^{(i)} \Vert \mathcal{R}(v_N)(y^{(i)}) \Vert_2^2,

$$

where $y^{(1)},\dots,y^{(n)}$ are random realizations of the parameters $y\sim\pi$ and $w^{(i)}>0$ are appropriate weights for the empirical quadrature.
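
A quick sanity check of this empirical semi-norm, assuming plain Monte Carlo samples with uniform weights $w^{(i)}=1/n$ and the toy function $g(y)=y^2$ in place of the residuum (its exact squared $L^2$ norm under the uniform measure on $[-1,1]$ is $\int_{-1}^1 y^4\,\mathrm{d}y/2 = 1/5$):

```python
import numpy as np

rng = np.random.default_rng(0)
g = lambda y: y**2  # stand-in for the residuum on Gamma = [-1, 1]

# Monte Carlo approximation of the squared L^2([-1,1], dy/2) norm of g
n = 200_000
y = rng.uniform(-1.0, 1.0, n)
w = np.full(n, 1.0 / n)  # uniform Monte Carlo weights
empirical = np.sum(w * g(y) ** 2)  # should be close to 1/5
```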

Using the first order optimality criterion for the empirical minimization problem

$$

u_N = \operatorname*{arg\, min}_{v_N\in\mathcal{V}_\Lambda} \Vert \mathcal{R}(v_N)\Vert_{n}^2

$$

yields a system of equations $\mathbf{G}_N \mathbf{u}_N = \mathbf{F}_N$, which we can solve for the PC coefficients $\mathbf{u}_N$ of $u_N$.

The matrix $\mathbf{G}_N\in\mathbb{R}^{\vert\Lambda\vert\times\vert\Lambda\vert}$ is the empirical Gram matrix and $\mathbf{F}_N\in\mathbb{R}^{\vert\Lambda\vert\times J}$ is the data array given respectively by

$$

\mathbf{G}_N[k,\ell] = \sum_{i=1}^n w^{(i)} P_{\iota(k)}(y^{(i)}) P_{\iota(\ell)}(y^{(i)}) \\

\qquad\text{and}\qquad \\

\mathbf{F}_N[\ell,j] = \sum_{i=1}^n w^{(i)} f_j(y^{(i)}) P_{\iota(\ell)}(y^{(i)}),

$$

where $\iota\colon\{0,\dots,\vert\Lambda\vert-1\}\to\Lambda$ is an arbitrary enumeration of the set $\Lambda$.

As a result, we only rely on training data pairs $\{y^{(i)}, f(y^{(i)})\}_{i=1}^n$ and according weights $\{w^{(i)}\}$ to compute the coefficients $\{u_\mu\}$ of the polynomial chaos surrogate.
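
The whole regression pipeline can be sketched in a few lines of NumPy, here for a single uniform parameter on $[-1,1]$, plain Monte Carlo weights, and the hypothetical target $f(y)=e^y$ (all helper names are ours, not part of *PyThia*):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

rng = np.random.default_rng(1)

def basis(k, y):
    # Legendre polynomial of degree k, orthonormal w.r.t. dy/2 on [-1, 1]
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt(2 * k + 1) * Legendre(c)(y)

# training data: n Monte Carlo samples of y ~ U([-1, 1]) with weights 1/n
n, degs = 1000, range(6)
y_train = rng.uniform(-1.0, 1.0, n)
w_train = np.full(n, 1.0 / n)
f_train = np.exp(y_train)  # scalar-valued target, i.e. J = 1

P = np.stack([basis(k, y_train) for k in degs], axis=1)  # (n, |Lambda|)
G = P.T @ (w_train[:, None] * P)  # empirical Gram matrix G_N
F = P.T @ (w_train * f_train)     # data vector F_N
u = np.linalg.solve(G, F)         # PC coefficients of u_N

# compare the resulting surrogate with the truth on a test grid
y_test = np.linspace(-1.0, 1.0, 50)
P_test = np.stack([basis(k, y_test) for k in degs], axis=1)
err = np.max(np.abs(P_test @ u - np.exp(y_test)))
```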

The image below shows, by way of example, the approximation of a function $f=(f_1,f_2)\colon[-7,7]\to\mathbb{R}^2$ for varying polynomial degrees, computed with the linear regression described above using *PyThia*.

If we assume the training data are already given via `y_train`, `w_train` and `f_train`, respectively, the approximation can be generated using the following lines of code:

```python
import numpy as np
import pythia as pt

# define stochastic parameter
param = pt.parameter.RandomParameter(name="y", domain=[-7, 7], distribution="uniform")

# define multiindex set Lambda = {0, ..., deg}
deg = 6  # maximal polynomial degree (example value)
Lambda = pt.index.PCMultiIndex(1)
Lambda.mdx = Lambda.mdxFromShape([deg])
Lambda.refresh()

# generate PC surrogate from the training data y_train, w_train, f_train
surrogate = pt.chaos.PolynomialChaos([param], Lambda, y_train, w_train, f_train)

# evaluate surrogate in 100 random realizations
y = np.random.uniform(-7, 7, (100, 1))
val = surrogate.approximation(y)
```

In many applications we have access to point evaluations of the target function $f$; however, evaluating $f$ might be computationally expensive, for example if $f$ is the solution of a differential equation.

In such cases it is desirable to keep the number of training data points $n$ as small as possible to circumvent extensive offline computation times to generate the function evaluations.

On the other hand, however, we need to ensure that the training sample points $y^{(i)}$ yield an empirical Gram matrix $\mathbf{G}_N$ which is invertible and well-conditioned to guarantee numerical stability.

In practice, as we use orthonormal polynomials as basis functions, the empirical Gram matrix converges to the identity as $n\to\infty$, hence we require that $\Vert \mathbf{G}_N-I\Vert_2$ is small.

This forces us to choose the training points $\{y^{(i)}\}$ in a clever way so that a minimal number of points yields a well-conditioned $\mathbf{G}_N$.

There exist many random and pseudo-random sampling schemes, which perform with varying success depending on the function $f$ we want to approximate.

As we do not want to assume any additional knowledge about $f$, we want to rely purely on random sampling according to the parameter distribution.

The obvious choice for the training data would be to take $y^{(i)}$ as samples of $\pi$ with uniform weights $w^{(i)}=1/n$.

This, however, requires a large number of samples $n\sim\vert\Lambda\vert^2\log(\vert\Lambda\vert)$ to ensure that $\Vert \mathbf{G}_N-I\Vert_2$ is small as $\pi$ is not necessarily designed to approximate the inner products of the basis polynomials efficiently.
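
This effect is easy to observe numerically. The following sketch (our own NumPy code, using a normalized Legendre basis for the uniform measure on $[-1,1]$) compares $\Vert\mathbf{G}_N-I\Vert_2$ for a small and a large number of plain Monte Carlo samples:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

rng = np.random.default_rng(2)

def basis(k, y):
    # Legendre polynomial of degree k, orthonormal w.r.t. dy/2 on [-1, 1]
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt(2 * k + 1) * Legendre(c)(y)

degs = range(10)  # |Lambda| = 10
devs = []
for n in [100, 10_000]:
    y = rng.uniform(-1.0, 1.0, n)
    P = np.stack([basis(k, y) for k in degs], axis=1)
    G = P.T @ P / n  # empirical Gram matrix with uniform weights 1/n
    devs.append(np.linalg.norm(G - np.eye(len(degs)), 2))
# devs[0] (n = 100) is typically much larger than devs[1] (n = 10_000)
```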

As an alternative, Cohen & Migliorati (2017) suggest drawing samples from an adapted distribution.

The adapted measure includes information about the basis polynomials and is given by

$$

\mathrm{d}\rho_\Lambda = w_\Lambda^{-1} \mathrm{d}\pi

\qquad\text{for weights }\qquad

w_\Lambda^{-1}(y) = \frac{1}{\vert\Lambda\vert}\sum_{k=0}^{\vert\Lambda\vert-1} \vert P_{\iota(k)}(y)\vert^2.

$$

Using samples $y^{(i)}\sim\rho_\Lambda$ and corresponding weights $w^{(i)}=w_\Lambda(y^{(i)})$ as training data then implies that only $n\sim\vert\Lambda\vert\log(\vert\Lambda\vert)$ samples are required to guarantee that $\mathbf{G}_N$ is close to the identity with high probability.

As we require at least $n\geq\vert\Lambda\vert$ sample points to guarantee that $\mathbf{G}_N$ is invertible (with high probability), it can also be shown that this upper bound on $n$ is optimal.
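
To illustrate the mechanics, and independently of the sampler shipped with *PyThia*, the adapted density can be sampled in 1D by simple rejection sampling, since $w_\Lambda^{-1}$ is bounded by $\vert\Lambda\vert$ for a full 1D Legendre basis of degree $d$. All names below are ours, and we normalize the weights by $1/n$ so that the empirical Gram matrix has unit diagonal in expectation:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

rng = np.random.default_rng(3)
d = 9  # maximal polynomial degree, i.e. |Lambda| = d + 1 = 10

def basis(k, y):
    # Legendre polynomial of degree k, orthonormal w.r.t. dy/2 on [-1, 1]
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt(2 * k + 1) * Legendre(c)(y)

def inv_w(y):
    # w_Lambda^{-1}(y) = (1/|Lambda|) sum_k P_k(y)^2, bounded by |Lambda|
    return np.mean([basis(k, y) ** 2 for k in range(d + 1)], axis=0)

def sample(n):
    # rejection sampling from d(rho) = w_Lambda^{-1} d(pi), proposal pi
    out = []
    while len(out) < n:
        y = rng.uniform(-1.0, 1.0, 4 * n)
        accept = rng.uniform(0.0, d + 1.0, 4 * n) < inv_w(y)
        out.extend(y[accept])
    return np.array(out[:n])

n = 200
y = sample(n)
w = 1.0 / (inv_w(y) * n)  # weights w^(i) = w_Lambda(y^(i)) / n
P = np.stack([basis(k, y) for k in range(d + 1)], axis=1)
G = (P * w[:, None]).T @ P  # empirical Gram matrix
dev = np.linalg.norm(G - np.eye(d + 1), 2)
```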

The following figure shows the optimal weighted least-squares density $\rho_\Lambda$ for different standard distributions and sets $\Lambda$.

Generating samples $y^{(i)}\sim\rho_\Lambda$ with *PyThia* can be done in a few lines of code and is implemented for a wide range of different distributions $\pi$.

The code below shows, as an example, how the optimal samples for $\pi=\mathcal{U}([0,1]\times[-1,5])$ can be obtained:

```python
import pythia as pt

# define stochastic parameters
param1 = pt.parameter.RandomParameter(name="y1", domain=[0, 1], distribution="uniform")
param2 = pt.parameter.RandomParameter(name="y2", domain=[-1, 5], distribution="uniform")
params = [param1, param2]

# define optimal weighted least-squares sampler on Lambda = {0,...,3} x {0,...,5}
deg1 = 3  # maximal polynomial degree for first parameter
deg2 = 5  # maximal polynomial degree for second parameter
sampler = pt.sampler.WLSTensorSampler(params, [deg1, deg2])

# use the sampler
y_train = sampler.sample(100_000)  # draw samples
w_train = sampler.weight(y_train)  # compute corresponding weights
val = sampler.pdf(y_train)         # evaluate PDF
```

Often it is not clear how sensitive the target function $f$ is to changes in the individual parameters in any given application.

Thus it can be necessary to conduct a sensitivity analysis to see how much changes in the individual parameters change the target function.

Unfortunately, computing local or global parameter sensitivities is often very time consuming, as either derivatives of $f$ have to be approximated or high-dimensional integrals need to be computed.

Assuming we have a polynomial chaos surrogate $f^{\mathrm{PC}}$ of $f$, we can use the results of Sudret (2008) to compute the so-called *Sobol indices* without any computational overhead.

The Sobol indices $\operatorname{Sob}_{j_1,\dots,j_s}$ describe the share a change of the combination of parameters $j_1,\dots,j_s\in\{1,\dots,M\}$ has on the total variance of $f$.

If we consider the subsets of $\mathbb{N}_0^M$ for which exactly the components $j_1,\dots,j_s$ are larger than zero, i.e.

$$

\mathcal{F}_{j_1,\dots,j_s} = \{ \mu\in\mathbb{N}_0^M \,\vert\, \mu_j \neq 0 \Leftrightarrow j\in\{j_1,\dots,j_s\}\},

$$

we can express the Sobol indices in terms of the polynomial chaos coefficients $\{f_\mu\}$ by

$$

\operatorname{Sob}_{j_1,\dots,j_s}

= \frac{\sum_{\mu\in\mathcal{F}_{j_1,\dots,j_s}} f_\mu^2}{\sum_{\mu\in\mathbb{N}_{0}^M\setminus\{0\}} f_\mu^2}

\approx \frac{\sum_{\mu\in\mathcal{F}_{j_1,\dots,j_s}\cap\Lambda} f_\mu^2}{\sum_{\mu\in\Lambda\setminus\{0\}} f_\mu^2}

=: \operatorname{Sob}_{j_1,\dots,j_s}^\Lambda.

$$

It is thus very easy to approximate the Sobol indices by squaring and summing the corresponding coefficients of the polynomial chaos expansion, which can be done without any significant computational overhead.

It is even possible to bound the error of the approximation

$$

\vert\operatorname{Sob}_{j_1,\dots,j_s} - \operatorname{Sob}_{j_1,\dots,j_s}^\Lambda\vert

\leq \frac{\Vert \mathcal{R}(u_N)\Vert_2^2}{\operatorname{Var}[u_N]}

$$

in terms of the approximate variance $\operatorname{Var}[u_N] = \sum_{\mu\in\Lambda\setminus\{0\}} f_\mu^2$ and the residuum, which we already minimized during the approximation of the coefficients $\{f_\mu\}_{\mu\in\Lambda}$.
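
The formula above is straightforward to evaluate. The following plain-Python sketch, independent of *PyThia*, computes the approximate Sobol indices from a small, made-up set of PC coefficients for $M=2$ parameters (all coefficient values are purely illustrative):

```python
# multiindex set Lambda for M = 2 parameters, total degree <= 2
Lambda = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
# hypothetical scalar PC coefficients f_mu (J = 1), made up for illustration
f = {(0, 0): 1.0, (1, 0): 0.6, (0, 1): 0.3, (2, 0): 0.2, (1, 1): 0.1, (0, 2): 0.05}

def sobol(active):
    """Approximate Sobol index Sob^Lambda for parameter subset `active` (1-based)."""
    # numerator: coefficients whose nonzero components match `active` exactly
    num = sum(f[mu] ** 2 for mu in Lambda
              if {j + 1 for j, m in enumerate(mu) if m != 0} == set(active))
    # denominator: approximate variance, all coefficients except mu = 0
    den = sum(f[mu] ** 2 for mu in Lambda if any(mu))
    return num / den

# the Sobol indices of all parameter subsets sum to one
total = sobol([1]) + sobol([2]) + sobol([1, 2])
```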

The Sobol indices are automatically computed in *PyThia* upon the initialization of the polynomial chaos object.

Accessing the Sobol indices can be easily done using the following lines of code:

```python
import pythia as pt

# define stochastic parameters
param1 = pt.parameter.RandomParameter(name="y1", domain=[0, 1], distribution="uniform")
param2 = pt.parameter.RandomParameter(name="y2", domain=[-1, 5], distribution="uniform")
param3 = pt.parameter.RandomParameter(name="y3", domain=[-1, 5], distribution="uniform")
params = [param1, param2, param3]

# define multiindex set Lambda
Lambda = pt.index.PCMultiIndex(len(params))
Lambda.mdx = Lambda.mdxFromShape([5, 5, 5])  # Lambda = {0, ..., 5}^3
Lambda.refresh()

# generate PC surrogate from the training data y_train, w_train, f_train
surrogate = pt.chaos.PolynomialChaos(params, Lambda, y_train, w_train, f_train)

# access Sobol indices
# Note: The Sobol indices are ordered according to the subscript tuples given in Lambda.sdx
surrogate.sobolCoeff[0]  # Sob_{1}
surrogate.sobolCoeff[2]  # Sob_{3}
surrogate.sobolCoeff[4]  # Sob_{1, 3}
surrogate.sobolCoeff[6]  # Sob_{1, 2, 3}
```

As a simple benchmark for the performance of *PyThia* we try to approximate the *PyThia* logo itself.

This is a hard task, as approximating piecewise constant functions with many small domains, obtained here by considering the logo as an indicator function on the unit square, requires polynomials of high degree for a global approximation.

However, using the optimal weighted least-squares sampling scheme together with the regression approach implemented in *PyThia* makes it possible to generate surrogates up to total polynomial degree $60$ in only a few minutes.

The following animation shows the $n$ optimally chosen sample points, where we choose $n=2\vert\Lambda\vert\log(\vert\Lambda\vert)$, used for the training of the surrogate, as well as the resulting approximation for the respective total polynomial degree.

Besides this artificial example, however, we note that *PyThia* has been used to generate surrogate models for many different real applications in metrology including the following.

Scatterometry is an indirect and non-destructive optical measurement technique used to determine the shape of nanostructures and can be used even if the structure is smaller than the employed wavelength.

In our application we modelled the relative intensities of the reflected wave for different angles of incidence, azimuthal orientations as well as s- and p-polarization of the outgoing wave.

To determine the shape of the cross-section of the grating lines, the surface has been characterized using 6 different parameters.

A schematic for the experimental setup looks like this:

Using *PyThia* we generated a polynomial surrogate for the forward model, which takes a long time to evaluate as it involves approximating the solution of the time-harmonic Maxwell's equations with a finite element method.

This surrogate was used to conduct a sensitivity analysis, confirming the assumed dependence of the forward model on the 6 parameters.

The inverse problem to determine the most likely values of the shape parameters as well as their associated uncertainties has been solved using a Markov-Chain Monte Carlo sampler for the Bayesian posterior relying on the PC surrogate as well.

The following image shows the approximation result of the PC surrogate in the mean values of the parameter distribution compared to the measurements of the relative intensities of the grating.

Grazing incidence X-ray fluorescence (GIXRF) is a measurement technique used to determine the atomic composition of nanostructures.

This is done by exciting electrons from the inner shells of atoms through an X-ray beam.

This causes electrons from outer shells to change to energetically more efficient states, which releases element characteristic fluorescence radiation that can be detected.

Using different grazing incidence angles even allows one to characterize the position of the material of interest, enabling a reconstruction of the material shape. *PyThia* has been employed to generate a surrogate depending on 7 different shape parameters for the time-consuming forward model, which relies on finite element approximations of the time-harmonic Maxwell's equations.

The surrogate was used for a sensitivity analysis to adapt the parametrization of the grating structure.

Moreover, using this surrogate, it became possible to use a Markov-Chain Monte Carlo sampling scheme to approximate the Bayesian posterior, which gave a complete characterization of the parameter uncertainties.

| Title: | Efficient Bayesian inversion for shape reconstruction of lithography masks |
|---|---|
| Author(s): | N. Farchmin, M. Hammerschmidt, P. I. Schneider, M. Wurm, B. Bodermann, M. Bär, S. Heidenreich |
| Journal: | Journal of Micro/Nanolithography, MEMS, and MOEMS |
| Year: | 2020 |
| Volume: | 19 |
| Issue: | 2 |
| Pages: | 1--11 |
| DOI: | 10.1117/1.JMM.19.2.024001 |
| Tags: | 8.4, 8.41, 8.43, UQ, Scatter-Inv |