Uncertainty representations in KIDA

How are rate coefficient uncertainties defined in KIDA?

Type (default=logn): the statistical distribution representing the uncertainty.
The implemented distributions are Normal (norm), Uniform (unif), Lognormal (logn) and Loguniform (logu);
F₀ (no default): an uncertainty parameter whose meaning and units depend on the Type (see Table 1);
g (default=0; units: Kelvin): parametrizes a possible temperature dependence of the uncertainty.

Type	Distribution	F₀ meaning	F₀ unit
norm	Normal	stdev*: Pr(k₀ - F₀ ≤ k ≤ k₀ + F₀) ≅ 68 %	same as rate coefficient
unif	Uniform	half range: Pr(k₀ - F₀ ≤ k ≤ k₀ + F₀) = 100 %	same as rate coefficient
logn	Lognormal	geometric stdev*: Pr(k₀ ⁄ F₀ ≤ k ≤ k₀*F₀) ≅ 68 %	no units
logu	Loguniform	geometric half range: Pr(k₀ ⁄ F₀ ≤ k ≤ k₀*F₀) = 100 %	no units
* stdev = standard deviation
Table 1: The types of distributions and uncertainty factors used in KIDA.

How to choose a Type of distribution and the value of the uncertainty factor when submitting data to KIDA?

When you submit data to KIDA as an experimentalist or an expert, you should provide the information needed to fill the uncertainty-related fields.

The appropriate uncertainty representation depends on the available knowledge:

a rate coefficient k₀ and its standard (1σ) uncertainty Δk:
- if you suspect that the relative uncertainty might exceed 20% at some temperature,
  particularly for low-T extrapolation, then use a Lognormal distribution:
  Type=logn, F₀ = exp(Δk ⁄ k₀) (≅ 1 + Δk ⁄ k₀); this is the preferred method.
- if the relative uncertainty is small (Δk ⁄ k₀ ≤ 0.2) over the whole temperature range
  (including a possible temperature extrapolation), you might use a Normal distribution:
  Type=norm, F₀ = Δk
  (but you might as well conform to the preferred Lognormal model, as described above!);
a rate coefficient k₀ known "up to a multiplicative factor X":
- use a Lognormal distribution:
  Type=logn, F₀ = X.
the lower (k₁) and upper (k₂) limits of the rate coefficient:
- if k₂ ⁄ k₁ ≤ 10 (over the whole range of temperature), you can use a Uniform distribution:
  Type=unif, k₀ = (k₂ + k₁)/2, F₀ = (k₂ - k₁)/2.
- otherwise, you should use a Loguniform distribution:
  Type=logu, k₀ = (k₁*k₂)^(1⁄2), F₀ = (k₂ ⁄ k₁)^(1⁄2).
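The selection rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and not part of any KIDA tool. F₀ for the Loguniform case is taken as (k₂ ⁄ k₁)^(1⁄2), so that k₁ = k₀ ⁄ F₀ and k₂ = k₀*F₀.

```python
import math

def from_standard_uncertainty(k0, dk):
    """Preferred Lognormal model from a value k0 and its 1-sigma uncertainty dk."""
    return "logn", k0, math.exp(dk / k0)

def from_limits(k1, k2):
    """Return (Type, k0, F0) for a rate coefficient known only by its limits k1 < k2."""
    if not 0 < k1 < k2:
        raise ValueError("expected 0 < k1 < k2")
    if k2 / k1 <= 10:
        # Narrow interval: Uniform, arithmetic centre and half range.
        return "unif", (k1 + k2) / 2, (k2 - k1) / 2
    # Wide interval: Loguniform, geometric centre and geometric half range.
    return "logu", math.sqrt(k1 * k2), math.sqrt(k2 / k1)

print(from_standard_uncertainty(1e-10, 3e-11))
print(from_limits(1e-10, 3e-10))
print(from_limits(1e-11, 1e-9))
```

The thresholds and formulas are those listed above; only the packaging into functions is an assumption.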

For more details on these distributions, see the Appendix. Also keep in mind that "The assignment of these uncertainties is a subjective assessment of the evaluators. They are not determined by a rigorous, statistical analysis of the database, which is generally too limited to permit such an analysis. Rather, the uncertainties are based on a knowledge of the techniques, the difficulties of the experimental measurements, the potential for systematic errors, and the number of studies conducted and their agreement or lack thereof." [IUPAC01]

Temperature-dependence of the uncertainty factor F(T)

The temperature dependence of the uncertainty factor is described by the following function:

F(T) = F₀*exp(g*|1/T - 1/T₀|),

where g is a positive parameter (in Kelvin), and T₀ is a reference temperature (in KIDA the value of T₀ is fixed at 300 K).
This expression models a monotonic increase of the uncertainty as the temperature moves away from the reference temperature T₀. This is commonly the case when rate constants measured at room temperature and above are extrapolated to low temperature [Hébrard09].
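As a minimal sketch, F(T) can be evaluated as follows. The function name and the example values of F₀ and g are illustrative assumptions; only the formula and T₀ = 300 K come from the text.

```python
import math

T0 = 300.0  # reference temperature fixed by KIDA, in K

def uncertainty_factor(T, F0, g=0.0):
    """F(T) = F0 * exp(g * |1/T - 1/T0|), with T and g in Kelvin."""
    return F0 * math.exp(g * abs(1.0 / T - 1.0 / T0))

# Illustrative values: F0 = 1.5 at 300 K and g = 10 K.
for T in (300.0, 100.0, 10.0):
    print(T, uncertainty_factor(T, F0=1.5, g=10.0))
```

With g = 0 (the default) the uncertainty factor is independent of temperature and equal to F₀.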

How to estimate g for submission to KIDA?

The simplest method is to use two estimates of the uncertainty factor at two different temperatures, F₀ ≡ F(T₀) and F₁ ≡ F(T₁). Inversion of the equation

F₁ = F₀*exp(g*|1/T₁ - 1/T₀|)

leads to

g = ln(F₁ ⁄ F₀) ⁄ |1/T₁ - 1/T₀|.
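A short sketch of this two-point estimate is given below. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
import math

def estimate_g(F0, F1, T1, T0=300.0):
    """Invert F1 = F0 * exp(g * |1/T1 - 1/T0|) for g (in Kelvin)."""
    if T1 == T0:
        raise ValueError("T1 must differ from the reference temperature T0")
    return math.log(F1 / F0) / abs(1.0 / T1 - 1.0 / T0)

# Illustrative case: F0 = 1.5 at 300 K and F1 = 3.0 at 10 K.
g = estimate_g(F0=1.5, F1=3.0, T1=10.0)
print(g)  # about 7.2 K
```

Since g is expected to be positive, F₁ should be larger than F₀; a negative result signals that the two estimates are not compatible with this model.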

A more accurate but more complex method is to use the variance/covariance matrix of the parameters of the Kooij/Arrhenius expression. Standard uncertainty propagation by combination of variances then gives the temperature-dependent uncertainty u_ln k(T) ≡ ln F(T). This curve can be fitted by the F₀*exp(g*|1/T - 1/T₀|) expression to estimate F₀ and g [Hébrard09, Nagy11]. Please contact the KIDA team if you need help to implement this procedure.
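The final fitting step can be sketched as follows, assuming that ln F(T) has already been computed on a grid of temperatures. Because ln F(T) = ln F₀ + g*|1/T - 1/T₀| is linear in |1/T - 1/T₀|, the sketch uses an unweighted linear least-squares fit. That choice, and the function name, are assumptions and not necessarily the procedure used in the cited references.

```python
import math

def fit_F0_g(temperatures, ln_F_values, T0=300.0):
    """Least-squares fit of ln F(T) = ln F0 + g * |1/T - 1/T0|; returns (F0, g)."""
    x = [abs(1.0 / T - 1.0 / T0) for T in temperatures]
    y = list(ln_F_values)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    g = sxy / sxx                # slope
    ln_F0 = mean_y - g * mean_x  # intercept
    return math.exp(ln_F0), g

# Synthetic check: a curve generated with F0 = 1.4 and g = 8 K is recovered.
temps = [10.0, 20.0, 50.0, 100.0, 200.0, 300.0]
ln_F = [math.log(1.4) + 8.0 * abs(1.0 / T - 1.0 / 300.0) for T in temps]
print(fit_F0_g(temps, ln_F))
```

The fit does not enforce g > 0, so the result should be checked before submission.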

How to deal with branching ratios?

The case of partial rate constants for multi-pathway reactions is intricate because, in order to define a correct uncertainty representation, one has to link the reported values to their experimental origin:

when measured directly, the partial rate constants can be treated as independent variables and their respective uncertainties can be represented by one of the implemented distributions, as described above;
when they are obtained as the product of a global reaction rate coefficient k and a set of branching ratios b_i, the partial rate constants k_i can no longer be considered as independent variables. At the moment, KIDA does not manage correlations between partial rate constants, but we discuss below the best methods to handle the situation.

Branching ratios present a challenge to both experimentalists and modelers: they can be difficult to measure and, if known at all, they can be affected by large uncertainty factors. Another source of correlation for the partial rate constants is the intrinsic correlation of branching ratios due to their sum-to-one constraint. An issue for modelers is the correct treatment of branching ratios as correlated parameters. One has to be aware that neglecting correlations between branching ratios is a source of spurious output uncertainty, which can considerably distort the uncertainty budget of chemistry model predictions [Carrasco07a,Carrasco08,Plessis10].

The accurate method

For an accurate and reliable uncertainty propagation, the best solution is to generate Monte Carlo samples of correlated branching ratios to be used for uncertainty propagation. It is therefore necessary to design an unbiased probability density function that accounts for the available data and for the correlation pattern of branching ratios [Carrasco07a,Plessis10,Pernot11].
Please contact the KIDA team if you wish to implement this procedure.
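To illustrate what sampling correlated branching ratios can look like, the sketch below draws them from a Dirichlet distribution, which satisfies the sum-to-one constraint by construction. The Dirichlet choice, the concentration parameters alpha and the function names are assumptions made for this example. Designing a probability density that properly reflects the available data is the harder part and is not addressed here.

```python
import random

def sample_branching_ratios(alpha, rng):
    """One Dirichlet(alpha) draw, built from normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_partial_rates(k0, F0, alpha, rng):
    """Partial rates k_i = k * b_i with a lognormal global rate k and Dirichlet b_i."""
    k = k0 * F0 ** rng.gauss(0.0, 1.0)
    return [k * b for b in sample_branching_ratios(alpha, rng)]

rng = random.Random(1)
b = sample_branching_ratios([2.0, 5.0, 3.0], rng)  # hypothetical alpha values
print(b, sum(b))
print(sample_partial_rates(1e-10, 1.5, [2.0, 5.0, 3.0], rng))
```

Each draw of the partial rates sums to the sampled global rate, so the correlation induced by the sum-to-one constraint is preserved in the Monte Carlo sample.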

The "least bad" method

Here we explain how to calculate the uncertainty factor for a lognormal representation of the partial rate constants, using the uncertainties on a global rate constant k and a set of branching ratios b_i.
The uncertainty on the product k_i = k*b_i is obtained by standard propagation of variances:

σ²(k_i) / k_i² = σ²(k) / k² + σ²(b_i) / b_i²,

from which one derives the uncertainty factor F_i attached to the partial rate constant k_i as F_i = exp(σ(k_i) / k_i).
Again, remain aware that this method ignores the statistical correlations between the partial rate constants.
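A minimal sketch of this calculation, taking the relative standard uncertainties σ(k)/k and σ(b_i)/b_i as inputs, is shown below. The function name and the example values are illustrative assumptions.

```python
import math

def partial_uncertainty_factor(rel_u_k, rel_u_b):
    """F_i = exp(sigma(k_i)/k_i), with relative variances of k and b_i added."""
    rel_u_ki = math.sqrt(rel_u_k ** 2 + rel_u_b ** 2)
    return math.exp(rel_u_ki)

# Illustrative case: 30 % on the global rate and 40 % on the branching ratio
# combine into a 50 % relative uncertainty on k_i.
print(partial_uncertainty_factor(0.3, 0.4))  # about 1.65
```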

Why and how to use rate coefficient uncertainties in models?

Using uncertainty information in chemical modeling is vital, notably for extreme environments targeted by KIDA, where most reaction rate coefficients are poorly known, either estimated or extrapolated. Uncertainty management has two goals:

Uncertainty Propagation (UP): to estimate the precision of the model outputs; and
Sensitivity Analysis (SA): to identify key reactions, i.e. those contributing notably to the uncertainty of model outputs and for which better experimental or theoretical estimations are needed [Dobrijevic10].

Improvement of data in KIDA is based on an iterative strategy involving UP and SA [Wakelam10].
The simplest way to implement these methods is through Monte Carlo sampling [Thompson91, Dobrijevic98, Wakelam05, Carrasco07, Wakelam10]. Random draws for a rate coefficient at any temperature are generated using the formulae in Table 2.

Distribution	Formula
Normal	k(T) = k₀(T) + F(T)*N(0,1)
Uniform	k(T) = k₀(T) + 2*F(T)*(U(0,1) - 0.5)
Lognormal	k(T) = exp( ln k₀(T) + ln F(T)*N(0,1) )
Loguniform	k(T) = exp( ln k₀(T) + 2*ln F(T)*(U(0,1) - 0.5) )
* U(0,1) is a standard uniform random number generator (between 0 and 1)
* N(0,1) is a standard normal/Gaussian random number generator (centered at 0; variance 1)
Table 2: Generating random samples.
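The formulae of Table 2 translate directly into code. The sketch below uses Python's standard random module; the function name is hypothetical, and k0 and F are assumed to be already evaluated at the temperature of interest.

```python
import math
import random

def draw_rate(kind, k0, F, rng):
    """One random draw of k(T) for a KIDA distribution Type (see Table 2)."""
    if kind == "norm":
        return k0 + F * rng.gauss(0.0, 1.0)
    if kind == "unif":
        return k0 + 2.0 * F * (rng.random() - 0.5)
    if kind == "logn":
        return math.exp(math.log(k0) + math.log(F) * rng.gauss(0.0, 1.0))
    if kind == "logu":
        return math.exp(math.log(k0) + 2.0 * math.log(F) * (rng.random() - 0.5))
    raise ValueError("unknown distribution type: " + kind)

rng = random.Random(42)  # seeded only to make the example reproducible
print(draw_rate("logn", 1e-10, 2.0, rng))
print(draw_rate("logu", 1e-10, 3.0, rng))
```

Note that a "norm" draw can be negative when F is not small compared to k0; such draws are not filtered here.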

For UP, the code is run for N random draws of the m rate constants of the chemical scheme {k_i^(j); i = 1, ..., m; j = 1, ..., N} (all parameters vary simultaneously), and the N sets of outputs are stored for statistical analysis (mean value, uncertainty factor, input/output correlation, etc.).
A convenient way to perform SA is to calculate the correlation coefficients between the inputs (k_i) and outputs of the model. Large correlation coefficients reveal strong influences of inputs on outputs [Dobrijevic10]. Variations of this method include using rank correlation coefficients, or the logarithm of inputs and/or outputs [Helton06,Saltelli04].
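The UP and SA steps can be sketched together as follows. The two-reaction toy_model is a hypothetical stand-in for a real chemical code, all rate constants are assumed lognormal, and the correlation is computed on logarithms of inputs and outputs (one of the variations mentioned above).

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def toy_model(k):
    """Hypothetical model output computed from m = 2 rate constants."""
    return k[0] / k[1]

def propagate(k0, F, n_draws, rng):
    """Run the model for n_draws samples, all rate constants varied at once."""
    inputs, outputs = [], []
    for _ in range(n_draws):
        k = [k0_i * F_i ** rng.gauss(0.0, 1.0) for k0_i, F_i in zip(k0, F)]
        inputs.append(k)
        outputs.append(toy_model(k))
    return inputs, outputs

rng = random.Random(0)
inputs, outputs = propagate(k0=[1e-10, 5e-11], F=[2.0, 1.2], n_draws=2000, rng=rng)

# UP: summarize the output by its geometric mean and uncertainty factor.
ln_out = [math.log(y) for y in outputs]
mean_ln = sum(ln_out) / len(ln_out)
std_ln = math.sqrt(sum((v - mean_ln) ** 2 for v in ln_out) / (len(ln_out) - 1))
print("geometric mean:", math.exp(mean_ln), "uncertainty factor:", math.exp(std_ln))

# SA: correlate the logarithm of each input with the logarithm of the output.
for i in range(2):
    ln_ki = [math.log(k[i]) for k in inputs]
    print("reaction", i + 1, "correlation:", pearson(ln_ki, ln_out))
```

In this toy case the first reaction, which has the larger uncertainty factor, dominates the output uncertainty and shows the larger correlation coefficient in absolute value.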

Important

In reaction networks, where multiple uncertain rate coefficients are managed simultaneously, care has to be taken that the same random number is used for the whole temperature range of a single reaction.
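The sketch below illustrates this point for a lognormal rate coefficient: a single standard normal number z is drawn per reaction and reused at every temperature, so the sampled curve k(T) is a consistent deformation of k₀(T). The function name and the example values are hypothetical.

```python
import math
import random

T0 = 300.0  # KIDA reference temperature, in K

def lognormal_rate_curve(k0_of_T, F0, g, temperatures, rng):
    """Sample k(T) over a temperature grid with ONE random number per reaction."""
    z = rng.gauss(0.0, 1.0)  # drawn once, shared by all temperatures
    curve = []
    for T in temperatures:
        F = F0 * math.exp(g * abs(1.0 / T - 1.0 / T0))
        # Same as exp(ln k0(T) + ln F(T) * z).
        curve.append(k0_of_T(T) * F ** z)
    return curve

rng = random.Random(3)
temps = [10.0, 50.0, 300.0]
# Illustrative temperature-independent k0 of 1e-10, F0 = 2 and g = 10 K.
print(lognormal_rate_curve(lambda T: 1e-10, 2.0, 10.0, temps, rng))
```

Drawing a new random number at each temperature would instead produce an erratic k(T) curve that jumps above and below k₀(T) from one temperature to the next.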

Appendix

Although the lognormal distribution is the preferred uncertainty representation in KIDA, provision has been made for alternative representations. These are briefly presented here.

The Lognormal distribution (default)

The default approach to specify uncertainty for reaction rate coefficients is to use a Lognormal (logn) distribution characterized by a multiplicative uncertainty factor F₀ defining a "1σ" confidence interval around the reference value k₀ [JPL06], i.e.

Pr(k₀ ⁄ F₀ ≤ k ≤ k₀*F₀) ≅ 68 %

Pr(k₀ ⁄ F₀² ≤ k ≤ k₀*F₀²) ≅ 95 %

Pr(k₀ ⁄ F₀³ ≤ k ≤ k₀*F₀³) ≅ 99 %

...

This choice enforces the positivity of the rate coefficient, even for large uncertainty factors (F₀ ≥ 2) [Thompson91, Hébrard06]. This is equivalent to stating that the value of ln k follows a Normal distribution with mean value ln k₀ and standard deviation ln F₀:

ln k = ln k₀ ± ln F₀.

Notes

In the KIDA data sheets, uncertainty might appear as Δlog k (the decimal logarithm is used), from which F₀ is readily obtained as F₀ = 10^(Δlog k).
When a relative uncertainty Δk ⁄ k₀ is available, one can get a quick estimate of the uncertainty factor as F₀ ≅ 1 + Δk ⁄ k₀, but it is more accurate to use F₀ = exp(Δk ⁄ k₀), which derives from the relations (valid when the relative uncertainty is not too large) ln F₀ = Δ(ln k) = Δk ⁄ k₀.
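Both conversions can be written as one-line helpers. The sketch below, with hypothetical function names, also compares the quick estimate 1 + Δk ⁄ k₀ with the exponential form for a few relative uncertainties.

```python
import math

def F0_from_dlog10(dlog_k):
    """Uncertainty factor from an uncertainty given on the decimal log of k."""
    return 10.0 ** dlog_k

def F0_from_relative(rel_u):
    """Uncertainty factor from a relative uncertainty dk/k0."""
    return math.exp(rel_u)

print(F0_from_dlog10(0.3))
for rel_u in (0.1, 0.2, 0.5, 1.0):
    # Quick estimate versus exponential form.
    print(rel_u, 1.0 + rel_u, F0_from_relative(rel_u))
```

The two forms agree closely for small relative uncertainties and diverge as Δk ⁄ k₀ approaches 1.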

The Normal distribution

The uncertainty factor has the meaning of a standard deviation, and the uncertainty model is normal additive, i.e.

k = k₀ ± F₀.

The normal distribution has to be used with care for positive variables such as rate constants. For small relative uncertainties (F₀ ⁄ k₀ << 0.2), there is generally no problem, but the probability of getting negative values of k with a normal uncertainty distribution is Pr(k≤0) ≅ 0.5*erfc(0.7*k₀ ⁄ F₀). This probability increases with F₀ ⁄ k₀ (between 0 and 50%) as shown in Table 3.

Table 3: Probability of getting negative values of rate constants when using a normal distribution, as a function of relative uncertainty.
F₀ ⁄ k₀	0.2	0.5	1	2	5	10
Pr(k≤0)	3E-7	0.02	0.16	0.31	0.42	0.46
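The probability of negative draws can be recomputed for any relative uncertainty with the complementary error function of Python's math module. The sketch below uses the exact factor 1/√2, of which 0.7 is a rounded value; the function name is hypothetical.

```python
import math

def prob_negative(rel_u):
    """Pr(k <= 0) for a normal distribution with F0/k0 = rel_u."""
    return 0.5 * math.erfc(1.0 / (math.sqrt(2.0) * rel_u))

for rel_u in (0.2, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(rel_u, prob_negative(rel_u))
```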

The Uniform distribution

This representation is used when the rate coefficient is defined by extreme values, k₁ < k₂, with no recommended value in the interval. It can be used when the geometric range of the interval covers less than one order of magnitude (k₂ ⁄ k₁≤10). The limits are related to the KIDA parameters by k₁ = k₀ - F₀ and k₂ = k₀ + F₀.

The Loguniform distribution

This representation is used when the rate coefficient is defined by extreme values, k₁ < k₂, with no recommended value in the interval. The loguniform distribution is to be preferred to the uniform distribution when the range of the interval covers more than one order of magnitude (k₂ ⁄ k₁ ≥ 10). It avoids overweighting the larger values. The central value is the geometric mean of the limits, k₀ = (k₁*k₂)^(1⁄2), and the uncertainty factor is designed to cover the whole interval: F₀ = (k₂ ⁄ k₁)^(1⁄2). The limits are recovered from the KIDA parameters by k₁ = k₀ ⁄ F₀ and k₂ = k₀*F₀.

Bibliographic References

[Carrasco07] Carrasco, N. et al. (2007) Planet. Space Sci. 55:141-157. doi:10.1016/j.pss.2006.06.004
[Carrasco07a] Carrasco, N. & Pernot, P. (2007) J. Phys. Chem. A 111:3507-3512. doi:10.1021/jp067306y
[Carrasco08] Carrasco, N. et al. (2008) Planet. Space Sci. 56:1644-1657. doi:10.1016/j.pss.2008.04.007
[Dobrijevic98] Dobrijevic, M. & Parisot, J. (1998) Planet. Space Sci. 46:491-505.
[Dobrijevic10] Dobrijevic, M. et al. (2010) Adv. Space Res. 45:77-91. doi:10.1016/j.asr.2009.06.005
[Hébrard06] Hébrard, E. et al. (2006) J. Photochem. Photobiol. A 7:211-230.
[Hébrard09] Hébrard, E. et al. (2009) J. Phys. Chem. A 113:11227-11237. doi:10.1021/jp905524e
[Helton06] Helton, J.C. et al. (2006) Rel. Eng. Sys. Safety 91:1175-1209. doi:10.1016/j.ress.2005.11.017
[IUPAC01] IUPAC: Subcommittee for Gas Kinetic Data Evaluation (2001) Guide to the datasheets.
[JPL06] Sander, S.P. et al. (2006) JPL Publication 06-2.
[Nagy11] Nagy, T. & Turányi, T. (2011) Int. J. Chem. Kin. 43:359-378. doi:10.1002/kin.20551
[Pernot11] Pernot, P. et al. (2011) J. Phys.: Conf. Ser. 300:012027. doi:10.1088/1742-6596/300/1/012027
[Plessis10] Plessis, S. et al. (2010) J. Chem. Phys. 133:134411. doi:10.1063/1.3479907
[Saltelli04] Saltelli, A. et al. (2004) Chem. Rev. 105:2811-2828. doi:10.1021/cr040659d
[Thompson91] Thompson, A. & Stewart, R. (1991) J. Geophys. Res. 96:13089-13108; Stewart, R. & Thompson, A. (1996) J. Geophys. Res. 101:20935-20964.
[Wakelam05] Wakelam, V. et al. (2005) A&A 444:883-891.
[Wakelam10] Wakelam, V. et al. (2010) Space Sci. Rev. 156:13-72. doi:10.1007/s11214-010-9712-5

Developed at
Laboratoire d’Astrophysique de Bordeaux,
Observatoire Aquitain des Sciences de l'Univers and
University of Virginia.

Copyright 2009-2024 KIDA
