Uncertainty representations in KIDA

How are rate coefficient uncertainties defined in KIDA?

Type (default=logn)
describes the statistical distribution representing the uncertainty.
The implemented distributions are Normal (norm), Uniform (unif), Lognormal (logn) and Loguniform (logu);
F0 (no default)
is an uncertainty parameter, whose meaning and units depend on the Type (see Table 1);
g (default=0; units: Kelvin)
is used to parametrize a possible temperature-dependence of the uncertainty.
Type | Distribution | F0 meaning | F0 unit
norm | Normal | stdev*: Pr(k0 - F0 ≤ k ≤ k0 + F0) ≅ 68 % | as rate coefficient
unif | Uniform | half range: Pr(k0 - F0 ≤ k ≤ k0 + F0) = 100 % | as rate coefficient
logn | Lognormal | geometric stdev*: Pr(k0/F0 ≤ k ≤ k0*F0) ≅ 68 % | no units
logu | Loguniform | geometric half range: Pr(k0/F0 ≤ k ≤ k0*F0) = 100 % | no units
* stdev = standard deviation
Table 1: The types of distributions and uncertainty factors used in KIDA.

How to choose a Type of distribution and the value of the uncertainty factor when submitting data to KIDA?

When you submit data to KIDA as an experimentalist or an expert, you should provide relevant information to fill the uncertainty-related fields.

The appropriate uncertainty representation depends on the available knowledge. For more details on these distributions, see the Appendix.

Also keep in mind that "The assignment of these uncertainties is a subjective assessment of the evaluators. They are not determined by a rigorous, statistical analysis of the database, which is generally too limited to permit such an analysis. Rather, the uncertainties are based on a knowledge of the techniques, the difficulties of the experimental measurements, the potential for systematic errors, and the number of studies conducted and their agreement or lack thereof." [IUPAC01]

Temperature-dependence of the uncertainty factor F(T)


The temperature-dependence of the uncertainty factor is described by the following function
F(T) = F0*exp(g | 1/T - 1/T0|),

where g is a positive parameter (in Kelvin), and T0 is a reference temperature (in KIDA the value of T0 is fixed at 300 K).
This expression models a monotonic increase of the uncertainty as the temperature moves away from the reference temperature T0. This is commonly the case when rate constants measured at room temperature and above are extrapolated to low temperatures [Hébrard09].
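As an illustration, the expression F(T) = F0*exp(g | 1/T - 1/T0|) can be evaluated with a few lines of Python. This is only a minimal sketch: the function name uncertainty_factor and the numerical values of F0 and g are assumptions chosen for the example, not KIDA data.

```python
import math

T0 = 300.0  # reference temperature in K, fixed in KIDA


def uncertainty_factor(T, F0, g=0.0, T0=T0):
    """Return F(T) = F0 * exp(g * |1/T - 1/T0|)."""
    return F0 * math.exp(g * abs(1.0 / T - 1.0 / T0))


# Illustrative values only: F0 = 1.5 and g = 100 K are not KIDA data.
for T in (10.0, 100.0, 300.0):
    print(T, uncertainty_factor(T, F0=1.5, g=100.0))
```

At T = T0 the function returns F0, and with g > 0 the factor grows as T moves away from 300 K in either direction.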

How to estimate g for submission to KIDA?


The simplest method is to use two estimates of the uncertainty factor at two different temperatures, F0 ≡ F(T0) and F1 ≡ F(T1). Inversion of the equation
F1 = F0*exp(g*| 1/T1 - 1/T0|)
leads to
g = ln(F1/F0) * | 1/T1 - 1/T0|^(-1).
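The two-point estimate can be sketched as follows, solving F1 = F0*exp(g*| 1/T1 - 1/T0|) for g. The function name estimate_g and the example values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def estimate_g(F0, F1, T1, T0=300.0):
    """Solve F1 = F0 * exp(g * |1/T1 - 1/T0|) for g (in K)."""
    if T1 == T0:
        raise ValueError("T1 must differ from T0")
    return math.log(F1 / F0) / abs(1.0 / T1 - 1.0 / T0)


# Hypothetical estimates: F0 = 1.5 at 300 K and F1 = 3.0 at 10 K.
g = estimate_g(F0=1.5, F1=3.0, T1=10.0)
print(g)
```

With these example values g is about 7.2 K. Note that g is positive only if F1 > F0, i.e. if the uncertainty is larger away from the reference temperature.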

A more accurate but more complex method is to use the variance/covariance matrix of the parameters of the Kooij/Arrhenius expression, from which standard uncertainty propagation by combination of variances gives the temperature-dependent uncertainty u_{ln k}(T) ≡ ln F(T). This curve can then be fitted with the expression F0*exp(g | 1/T - 1/T0|) to estimate F0 and g [Hébrard09, Nagy11]. Please contact the KIDA team if you need help implementing this procedure.
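The sketch below shows one possible way to carry out this procedure. It rests on several assumptions that are not stated in the text above: the Kooij form is taken as ln k = ln(alpha) + beta*ln(T/300) - gamma/T, the covariance matrix is given for the parameters (ln alpha, beta, gamma), and the fit is an ordinary least-squares fit of ln F against | 1/T - 1/T0|. The covariance values are invented for the example.

```python
import math

T0 = 300.0


def u_lnk(T, cov):
    """Standard uncertainty of ln k(T) by first-order propagation of variances.

    Assumes ln k = ln(alpha) + beta*ln(T/300) - gamma/T and a 3x3 covariance
    matrix `cov` for the parameters (ln alpha, beta, gamma).
    """
    J = (1.0, math.log(T / 300.0), -1.0 / T)  # d(ln k)/d(parameter)
    var = sum(J[i] * cov[i][j] * J[j] for i in range(3) for j in range(3))
    return math.sqrt(var)


def fit_F0_g(temps, lnF):
    """Least-squares fit of ln F(T) = ln F0 + g*|1/T - 1/T0|."""
    x = [abs(1.0 / T - 1.0 / T0) for T in temps]
    n = len(x)
    xm, ym = sum(x) / n, sum(lnF) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, lnF))
    g = sxy / sxx
    return math.exp(ym - g * xm), g


# Hypothetical covariance matrix (not KIDA data), parameters uncorrelated.
cov = [[0.04, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 25.0]]
temps = [10.0, 20.0, 50.0, 100.0, 200.0, 300.0]
F0, g = fit_F0_g(temps, [u_lnk(T, cov) for T in temps])
print(F0, g)
```

The fitted curve is only an approximation of the propagated uncertainty, so it is worth comparing both curves over the temperature range of interest before submitting F0 and g.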

How to deal with branching ratios?

The case of partial rate constants for multi-pathway reactions is intricate because, to define a correct uncertainty representation, one has to link the reported values to their experimental origin. Branching ratios present a challenge to both experimentalists and modelers. They can be difficult to measure and, if known at all, can be affected by large uncertainty factors. The partial rate constants are also correlated, notably through the intrinsic correlation of branching ratios due to their sum-to-one constraint. For modelers, the issue is the correct treatment of branching ratios as correlated parameters: neglecting correlations between branching ratios is a source of spurious output uncertainty, which can considerably distort the uncertainty budget of chemical model predictions [Carrasco07a, Carrasco08, Plessis10].

The accurate method

For an accurate and reliable uncertainty propagation, the best solution is to generate Monte Carlo samples of correlated branching ratios to be used for uncertainty propagation. It is therefore necessary to design an unbiased probability density function that accounts for the available data and for the correlation pattern of branching ratios [Carrasco07a,Plessis10,Pernot11].
Please contact the KIDA team if you wish to implement this procedure.
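To give an idea of what sampling correlated branching ratios can look like, the sketch below draws them from a Dirichlet distribution, a common choice for quantities constrained to sum to one. The choice of this distribution, the function name and all numerical values are assumptions made for illustration; the text above does not prescribe a specific probability density function.

```python
import random


def sample_branching_ratios(alphas, rng):
    """Draw one set of branching ratios from a Dirichlet(alphas) distribution.

    Normalizing independent Gamma(alpha_i, 1) draws gives a Dirichlet sample,
    so the ratios are positive and sum to one by construction.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
# Hypothetical 3-channel reaction with mean ratios 0.6 / 0.3 / 0.1;
# the concentration factor 50 (an assumption) sets how tight the spread is.
alphas = [50 * b for b in (0.6, 0.3, 0.1)]
samples = [sample_branching_ratios(alphas, rng) for _ in range(1000)]
```

Each sample can then be multiplied by a random draw of the global rate constant to obtain a consistent set of partial rate constants for one Monte Carlo run.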

The "least worst" method

Here we explain how to calculate the uncertainty factor for a lognormal representation of the partial rate constants, using the uncertainties on a global rate constant k and a set of branching ratios bi.
The uncertainty on the product ki = k*bi is obtained by standard propagation of variances
σ²(ki) / ki² = σ²(k) / k² + σ²(bi) / bi²,

from which one derives the uncertainty factor Fi attached to the partial rate constant ki as Fi = exp(σ(ki) / ki).
Again, remain aware that this method ignores the statistical correlations between the partial rate constants.
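As a worked illustration, the relative variances of k and bi are added and the result is converted to an uncertainty factor through Fi = exp(σ(ki) / ki). The function name and the input values below are hypothetical.

```python
import math


def partial_uncertainty_factor(rel_sigma_k, rel_sigma_b):
    """Return Fi = exp(sigma(ki)/ki) for ki = k*bi, ignoring correlations.

    rel_sigma_k is sigma(k)/k and rel_sigma_b is sigma(bi)/bi.
    """
    rel_sigma_ki = math.sqrt(rel_sigma_k ** 2 + rel_sigma_b ** 2)
    return math.exp(rel_sigma_ki)


# Hypothetical inputs: 30 % relative uncertainty on k, 40 % on bi.
Fi = partial_uncertainty_factor(0.3, 0.4)
print(Fi)
```

With these inputs the combined relative uncertainty is 0.5 and Fi is about 1.65.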

Why and how to use rate coefficient uncertainties in models?


Using uncertainty information in chemical modeling is vital, notably for extreme environments targeted by KIDA, where most reaction rate coefficients are poorly known, either estimated or extrapolated. Uncertainty management has two goals:
  1. Uncertainty Propagation (UP): to estimate the precision of the model outputs; and
  2. Sensitivity Analysis (SA): to identify key reactions, i.e. those contributing notably to the uncertainty of model outputs and for which better experimental or theoretical estimations are needed [Dobrijevic10].
Improvement of data in KIDA is based on an iterative strategy involving UP and SA [Wakelam10].
The simplest way to implement these methods is through Monte Carlo sampling [Thompson91, Dobrijevic98, Wakelam05, Carrasco07, Wakelam10]. Random draws for a rate coefficient at any temperature are generated using the formulae in Table 2.

Distribution | Formula
Normal | k(T) = k0(T) + F(T)*N(0,1)
Uniform | k(T) = k0(T) + F(T)*(U(0,1)-0.5)*2
Lognormal | k(T) = exp( ln k0(T) + ln F(T)*N(0,1) )
Loguniform | k(T) = exp( ln k0(T) + ln F(T)*(U(0,1)-0.5)*2 )
* U(0,1) is a standard uniform random number generator (between 0 and 1)
* N(0,1) is a standard normal/Gaussian random number generator (centered at 0; variance 1)
Table 2: Generating random samples.
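The formulae of Table 2 translate almost directly into code. The sketch below uses Python's standard random module; the function name draw_rate and its signature are assumptions made for this example.

```python
import math
import random


def draw_rate(k0, F, dist="logn", rng=random):
    """Draw one random rate coefficient following Table 2.

    k0 and F are the reference value k0(T) and uncertainty factor F(T),
    both already evaluated at the temperature of interest.
    """
    if dist == "norm":
        return k0 + F * rng.gauss(0.0, 1.0)
    if dist == "unif":
        return k0 + F * (rng.random() - 0.5) * 2.0
    if dist == "logn":
        return math.exp(math.log(k0) + math.log(F) * rng.gauss(0.0, 1.0))
    if dist == "logu":
        return math.exp(math.log(k0) + math.log(F) * (rng.random() - 0.5) * 2.0)
    raise ValueError(f"unknown distribution type: {dist!r}")


# Hypothetical reaction: k0 = 1e-10 with a lognormal factor F = 2.
rng = random.Random(1)
print([draw_rate(1e-10, 2.0, "logn", rng) for _ in range(3)])
```

Passing an explicitly seeded random.Random instance makes the draws reproducible, which helps when comparing model runs.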

For UP, the code is run for N random draws of the m rate constants of the chemical scheme, { ki(j); i = 1, ..., m; j = 1, ..., N } (all parameters vary simultaneously), and the N sets of outputs are stored for statistical analysis (mean value, uncertainty factor, input/output correlation...).
A convenient way to perform SA is to calculate the correlation coefficients between the inputs (ki) and outputs of the model. Large correlation coefficients reveal strong influences of inputs on outputs [Dobrijevic10]. Variations of this method include using rank correlation coefficients, or the logarithm of inputs and/or outputs [Helton06,Saltelli04].
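The following toy example sketches both steps on a stand-in model. Everything specific here is hypothetical: the "model" is a simple product and ratio of three rate coefficients rather than a chemical network, and the reference values and uncertainty factors are invented. The sensitivity analysis uses correlation coefficients between the logarithms of inputs and outputs, one of the variants mentioned above.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def toy_model(k):
    """Stand-in for a chemical model: a hypothetical abundance k1*k2/k3."""
    return k[0] * k[1] / k[2]


rng = random.Random(42)
k0 = [1e-10, 2e-9, 5e-11]   # hypothetical reference rate coefficients
F = [1.25, 2.0, 3.0]        # hypothetical lognormal uncertainty factors
N = 2000

# UP: all m rate coefficients vary simultaneously in each of the N runs.
ln_k = [[math.log(k0[i]) + math.log(F[i]) * rng.gauss(0.0, 1.0)
         for i in range(len(k0))] for _ in range(N)]
ln_y = [math.log(toy_model([math.exp(v) for v in row])) for row in ln_k]
F_out = math.exp(statistics.stdev(ln_y))   # output uncertainty factor

# SA: correlation between the log of each input and the log of the output.
corr = [pearson([row[i] for row in ln_k], ln_y) for i in range(len(k0))]
print(F_out, corr)
```

In this toy case the third rate coefficient, which has the largest uncertainty factor, shows the strongest (negative) correlation with the output and would be flagged as the key reaction.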

Important


In reaction networks, where multiple uncertain rate coefficients are managed simultaneously, care has to be taken that the same random number is used for the whole temperature range of a single reaction.
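One way to respect this constraint is to draw a single standard deviate per reaction and per Monte Carlo run, and to reuse it at every temperature, so that the whole k(T) curve is shifted coherently. The sketch below does this for the lognormal case; the function name, the constant reference rate and the values of F0 and g are hypothetical.

```python
import math
import random

T0 = 300.0


def sample_rate_curve(k0_of_T, F0, g, temps, z):
    """Lognormal draw of k(T) over `temps` using one shared deviate z."""
    curve = []
    for T in temps:
        F = F0 * math.exp(g * abs(1.0 / T - 1.0 / T0))
        curve.append(math.exp(math.log(k0_of_T(T)) + math.log(F) * z))
    return curve


rng = random.Random(7)
temps = [10.0, 50.0, 100.0, 300.0]
# One deviate per reaction per Monte Carlo run, reused at every temperature.
z = rng.gauss(0.0, 1.0)
# Hypothetical temperature-independent reference rate and uncertainty values.
curve = sample_rate_curve(lambda T: 1e-10, F0=2.0, g=10.0, temps=temps, z=z)
print(curve)
```

Because z is shared, the sampled curve lies entirely above or entirely below the reference curve, instead of jumping from one side to the other between temperatures.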

Appendix

Although the lognormal distribution is the preferred uncertainty representation in KIDA, provision has been set for alternative representations. These are briefly presented here.

The Lognormal distribution (default)

The default approach to specify uncertainty for reaction rate coefficients is to use a Lognormal (logn) distribution characterized by a multiplicative uncertainty factor F0 defining a "1σ" confidence interval around the reference value k0 [JPL06], i.e.
Pr(k0/F0 ≤ k ≤ k0*F0) ≅ 68 %
Pr(k0/F0^2 ≤ k ≤ k0*F0^2) ≅ 95 %
Pr(k0/F0^3 ≤ k ≤ k0*F0^3) ≅ 99 %
...
This choice enforces the positivity of the rate coefficient, even for large uncertainty factors (F0 ≥ 2) [Thompson91, Hébrard06]. This is equivalent to stating that the value of ln k follows a Normal distribution with mean value ln k0 and standard deviation ln F0:
ln k = ln k0 ± ln F0.

Notes

The Normal distribution

The uncertainty factor has the meaning of a standard deviation, and the uncertainty model is normal additive, i.e.
k = k0 ± F0.

The normal distribution has to be used with care for positive variables such as rate constants. For small relative uncertainties (F0/k0 << 0.2), there is generally no problem, but the probability of getting negative values of k with a normal uncertainty distribution is Pr(k≤0) ≅ 0.5*erfc(0.7*k0/F0). This probability increases with F0/k0 (between 0 and 50%) as shown in Table 3.

F0/k0 | 0.2 | 0.5 | 1 | 2 | 5 | 10
Pr(k≤0) | 2E-7 | 0.02 | 0.16 | 0.31 | 0.42 | 0.46
Table 3: Probability to get negative values of rate constants when using a normal distribution, as a function of relative uncertainty.
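The probabilities of Table 3 can be recomputed with the complementary error function from Python's math module. This sketch uses the exact factor 1/sqrt(2) where the text uses the rounded value 0.7; the function name prob_negative is an assumption.

```python
import math


def prob_negative(rel_unc):
    """Pr(k <= 0) for a normal distribution with F0/k0 = rel_unc."""
    return 0.5 * math.erfc(1.0 / (rel_unc * math.sqrt(2.0)))


for rel_unc in (0.2, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(rel_unc, prob_negative(rel_unc))
```

A quick check of this kind before choosing Type norm shows whether a non-negligible fraction of Monte Carlo draws would be negative, in which case a lognormal representation is safer.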

The Uniform distribution

This representation is used when the rate coefficient is defined by extreme values, k1 < k2, with no recommended value in the interval. It can be used when the geometric range of the interval covers less than one order of magnitude (k2/k1 ≤ 10). The limits are related to the KIDA parameters by k1 = k0 - F0 and k2 = k0 + F0.

The Loguniform distribution


This representation is used when the rate coefficient is defined by extreme values, k1 < k2, with no recommended value in the interval. The loguniform distribution is to be preferred to the uniform distribution when the range of the interval covers more than one order of magnitude (k2/k1 ≥ 10). It avoids overweighting the larger values. The mean rate is the geometric mean of the limits, k0 = (k1*k2)^(1/2), and the uncertainty factor is designed to cover the whole interval: F0 = (k2/k1)^(1/2). The limits are recovered from the KIDA parameters by k1 = k0/F0 and k2 = k0*F0.
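The conversion from interval limits to KIDA parameters can be sketched for both the uniform and loguniform cases. The uniform formulas follow from k1 = k0 - F0 and k2 = k0 + F0, and the loguniform ones from k1 = k0/F0 and k2 = k0*F0. The function names are hypothetical, and assigning the boundary case k2/k1 = 10 to the loguniform type is an arbitrary choice made for this example.

```python
import math


def uniform_params(k1, k2):
    """(k0, F0) such that k1 = k0 - F0 and k2 = k0 + F0."""
    return 0.5 * (k1 + k2), 0.5 * (k2 - k1)


def loguniform_params(k1, k2):
    """(k0, F0) such that k1 = k0 / F0 and k2 = k0 * F0."""
    return math.sqrt(k1 * k2), math.sqrt(k2 / k1)


def params_from_limits(k1, k2):
    """Pick unif or logu following the one-order-of-magnitude rule."""
    if not 0.0 < k1 < k2:
        raise ValueError("expected 0 < k1 < k2")
    if k2 / k1 >= 10.0:
        return ("logu",) + loguniform_params(k1, k2)
    return ("unif",) + uniform_params(k1, k2)


# Hypothetical limits spanning two orders of magnitude.
print(params_from_limits(1e-11, 1e-9))
```

For the example limits the interval spans two orders of magnitude, so the loguniform type is selected with k0 close to 1e-10 and F0 close to 10.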

Bibliographic References