Tutorial 04 - Automatic choice of expansion terms
This tutorial covers the automatic generation of sparse PC expansion indices based on a crude approximation of the Sobol indices. To verify the results we compute with PyThia, we use the Sobol function as the object of interest, as the Sobol indices are explicitly known for this function. The Sobol function is given by
Note
The larger \(a_j\), the less is the influence, i.e., the sensitivity, of parameter \(x_j\).
The mean of the Sobol function is \(\mathbb{E}[f]=1\) and the variance reads
With this, we can easily compute the Sobol indices by
An implementation of the Sobol function method sobol_function()
and the analytical Sobol indices method sobol_sc()
is included in the complete script at the end of this tutorial.
First, we choose some values for the coefficients \(a_1,\dots,a_M\) and with this the target function.
import numpy as np
import pythia as pt
a = np.array([1, 2, 4, 8])
def target_function(x: np.ndarray) -> np.ndarray:
return sobol_function(x, a=a)
Additionally, we compute the analytical Sobol indices.
sobol_dict = sobol_sc(a=a, dim=len(a))[0]
sobol_coefficients = np.array(list(sobol_dict.values())).reshape(-1, 1)
Next we define the necessary quantities for the PC expansion. For more information see Tutorial 01 - Approximation of Functions with Polynomial Chaos. As stochastic parameters we need to choose uniformly distributed variables \(x_j\) on \([0,1]\) according to the number of coefficients \(a_1,\dots,a_M\), i.e.,
params = [
pt.parameter.Parameter(name=f"x_{j+1}", domain=(0, 1), distribution="uniform")
for j in range(a.size)
]
We want to compare the approximation results of the PC expansion using automatically generated expansion indices with a choice done by hand. For this we compute a reference PC expansion using multivariate Legendre polynomials of total degree less then \(7\).
dim = 7
indices = pt.index.simplex_set(len(params), dim - 1)
index_set = pt.index.IndexSet(indices)
The last thing we need are training data. We generate the training pairs again by an optimal weighted distribution, see Tutorial 02 - Approximation of n-D functions with Polynomial Chaos for more detail.
N = 10_000
s = pt.least_squares_sampler.OWLSTensorSampler(params, [dim] * len(params))
x_train = s.sample(N)
w_train = s.weight(x_train)
y_train = target_function(x_train)
With this, we can compute a reference PC expansion for our forward model with
surrogate_ref = pt.chaos.PolynomialChaos(params, index_set, x_train, w_train, y_train)
With this out of the way we now focus on choosing a more sparse set of PC expansion indices automatically.
To fit our needs, we specify the maximal number of expansion terms we want the PC expansion to have as well as the truncation threshold.
Computing the “optimal” multiindices can then be done using pt.chaos.find_optimal_indices
.
To show the efficiency of the automatic multiindex computation, we choose an expansion with half the number of terms (\(\sim 100\)) as the reference PC (\(\sim 200\)) and set the truncation threshold to \(10^{-3}\).
max_terms = index_set.shape[0] // 2
threshold = 1.0e-03
indices, sC = pt.chaos.find_optimal_indices(
params, x_train, w_train, y_train, max_terms=max_terms, threshold=threshold
)
index_set = pt.index.IndexSet(indices)
For a detailed explanation on the workings of pt.chaos.find_optimal_indices
we refer to the module documentation.
However, we want to provide a small explanation of the threshold
input.
In principle find_optimal_indices
computes a very inaccurate PC expansion with far too many terms for the given amount of training data to obtain a very crude approximation of as many Sobol indices as possible.
The expansion is chosen so that at least one expansion term is computed for each Sobol index.
Depending on the threshold
the function decides which Sobol indices, i.e., parameter combinations are most relevant.
This means, that the function will exclude all multiindices associated to Sobol indices that have a (combined) contribution of less then threshold
.
The multiindices are then distributed according to the magnitude of the crude Sobol index computation.
We use this option here only for showcasing the results later on.
Computing the PC expansion with the automatically chosen multiindices can now be done as before.
surrogate = pt.chaos.PolynomialChaos(params, index_set, x_train, w_train, y_train)
What remains is to check if the approximation of the Sobol function using only half the expansion terms is comparable in both the approximation error as well as the approximate Sobol indices. If everything went right you should see that the approximation error of both PC expansions is of the same order of magnitude and that the Sobol indices coincide where they are greater then \(10^{-3}\).
error_L2_ref = np.sqrt(np.sum((f_test - f_approx_ref) ** 2) / x_test.shape[0])
error_L2 = np.sqrt(np.sum((f_test - f_approx) ** 2) / x_test.shape[0])
print(f"test error L2 auto: {error_L2:4.2e}")
print(f"test error L2 ref: {error_L2_ref:4.2e}")
# print Sobol coefficients
print("Comparison of Sobol indices")
print(
f" {'sobol_tuples':<13} {'exact':<8} {'pc-auto':<8} {'(crude)':<8} {'pc-ref':<8}"
)
print("-" * 54)
for j, sdx in enumerate(index_set.sobol_tuples):
print(
f" {str(sdx):<12} ", # Sobol index subscripts
f"{sobol_coefficients[j, 0]:<4.2e} ",
f"{surrogate.sobol[j, 0]:<4.2e} ",
f"{sC[j][0]:<4.2e} ",
f"{surrogate_ref.sobol[j, 0]:<4.2e} ",
)
This concludes this tutorial.
Complete Script
The complete source code listed below is located in the tutorials/
subdirectory of the repository root.
1"""Automatic choice of expansion terms."""
2
3from typing import Any
4
5import numpy as np
6import pythia as pt
7
8
9def sobol_function(
10 x: np.ndarray, a: np.ndarray | None = None, **kwargs: Any
11) -> np.ndarray:
12 """Sobol function.
13
14 Parameters
15 ----------
16 x : np.ndarray
17 Input values.
18 a : np.ndarray | None, optional
19 Coefficients, by default None
20
21 Returns
22 -------
23 :
24 Sobol function values.
25
26 Raises
27 ------
28 ValueError
29 Wrong dimension for input `x`.
30 ValueError
31 Wrong shape of Coefficients `a`.
32 """
33 if not 0 < x.ndim < 3:
34 raise ValueError("Wrong ndim: {}".format(x.ndim))
35 if x.ndim == 1:
36 x.shape = 1, -1
37 if a is None:
38 a = np.zeros(x.shape[1])
39 elif not a.shape == (x.shape[1],):
40 raise ValueError("Wrong shape: {}".format(a.shape))
41 return np.prod((abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=1).reshape(-1, 1)
42
43
44def sobol_sc(a: np.ndarray, dim: int = 1, **kwargs: Any) -> dict | np.ndarray:
45 """Sobol function Sobol indices.
46
47 Parameters
48 ----------
49 a : np.ndarray
50 Coefficients.
51 dim : int, optional
52 Parameter dimension, by default 1.
53
54 Returns
55 -------
56 :
57 Sobol indices of Sobol function.
58 """
59 sobol = {}
60 beta = (1.0 + a) ** (-2) / 3
61 var = np.prod(1.0 + beta) - 1.0
62 sobol_tuples = pt.index.IndexSet(pt.index.tensor_set([1] * len(a))).sobol_tuples
63 for sdx in sobol_tuples:
64 sobol[sdx] = 1.0 / var
65 for k in sdx:
66 sobol[sdx] *= beta[k - 1]
67 if dim > 1:
68 return np.array([sobol for _ in range(dim)])
69 else:
70 return sobol
71
72
73print("Tutorial 04 - Automatic choice of expansion terms")
74
75# function definitions
76a = np.array([1, 2, 4, 8])
77
78
79def target_function(x: np.ndarray) -> np.ndarray:
80 """Target function.
81
82 Parameters
83 ----------
84 x : np.ndarray
85 """
86 return sobol_function(x, a=a)
87
88
89# analytical Sobol coefficients
90sobol_dict = sobol_sc(a=a, dim=len(a))[0]
91sobol_coefficients = np.array(list(sobol_dict.values())).reshape(-1, 1)
92
93# setup reference PC surrogate
94params = [
95 pt.parameter.Parameter(name=f"x_{j+1}", domain=(0, 1), distribution="uniform")
96 for j in range(a.size)
97]
98
99dim = 7
100indices = pt.index.simplex_set(len(params), dim - 1)
101index_set = pt.index.IndexSet(indices)
102print("multiindex information:")
103print(f" number of indices: {index_set.shape[0]}")
104print(f" dimension: {index_set.shape[1]}")
105print(f" number of sobol indices: {len(index_set.sobol_tuples)}")
106
107N = 10_000
108print(f"generate training data ({N})")
109s = pt.least_squares_sampler.OWLSTensorSampler(params, [dim] * len(params))
110x_train = s.sample(N)
111w_train = s.weight(x_train)
112y_train = target_function(x_train)
113
114print("compute pc expansion")
115surrogate_ref = pt.chaos.PolynomialChaos(params, index_set, x_train, w_train, y_train)
116
117# auto-generate PC multiindices
118max_terms = index_set.shape[0] // 2
119threshold = 1.0e-03
120print("compute optimal mdx")
121indices, sC = pt.chaos.find_optimal_indices(
122 params, x_train, w_train, y_train, max_terms=max_terms, threshold=threshold
123)
124index_set = pt.index.IndexSet(indices)
125print("automatic multiindex information:")
126print(f" number of indices: {index_set.shape[0]}")
127print(f" dimension: {index_set.shape[1]}")
128print(f" number of sobol indices: {len(index_set.sobol_tuples)}")
129
130print("compute optimal pc expansion")
131surrogate = pt.chaos.PolynomialChaos(params, index_set, x_train, w_train, y_train)
132
133# test PC approximation
134N = 1000
135print(f"generate test data ({N})")
136s_test = pt.sampler.ParameterSampler(params)
137x_test = s_test.sample(N)
138f_test = target_function(x_test)
139f_approx = surrogate.eval(x_test)
140f_approx_ref = surrogate_ref.eval(x_test)
141
142error_L2_ref = np.sqrt(np.sum((f_test - f_approx_ref) ** 2) / x_test.shape[0])
143error_L2 = np.sqrt(np.sum((f_test - f_approx) ** 2) / x_test.shape[0])
144print(f"test error L2 auto: {error_L2:4.2e}")
145print(f"test error L2 ref: {error_L2_ref:4.2e}")
146
147# print Sobol coefficients
148print("Comparison of Sobol indices")
149print(
150 f" {'sobol_tuples':<13} {'exact':<8} {'pc-auto':<8} {'(crude)':<8} {'pc-ref':<8}"
151)
152print("-" * 54)
153for j, sdx in enumerate(index_set.sobol_tuples):
154 print(
155 f" {str(sdx):<12} ", # Sobol index subscripts
156 f"{sobol_coefficients[j, 0]:<4.2e} ",
157 f"{surrogate.sobol[j, 0]:<4.2e} ",
158 f"{sC[j][0]:<4.2e} ",
159 f"{surrogate_ref.sobol[j, 0]:<4.2e} ",
160 )