Introduction

The aim of this vignette is to provide a quick introduction to the package farr. The package provides a set of functions to estimate the \(far\), the fraction of attributable risk for record events. The \(far\) was introduced in P. Naveau et al. (2018). The original code was provided by Philippe Naveau and I simply put it in package format; be aware that errors could have been introduced in the process. The R package farr is available on GitHub. It can be installed from R using the package devtools with the following command:

library(devtools)
devtools::install_github("thaos/farr")

This vignette is divided into three parts:

  1. A brief presentation of what the \(far\) is and the context it is used in.

  2. Toy examples to present how to use the farr package.

  3. An example close to an actual climate application to give an idea of how this method can be applied in climate sciences. We will show how to reproduce some results of P. Naveau et al. (2018).

Remark: This vignette was originally written as a reminder for myself of what I learned about the \(far\), but I hope it can also be useful to others. It reflects my level of understanding of the topic and thus could contain mistakes or erroneous information (I hope not too big nor too many). Constructive comments or suggestions on possible ways to improve this document are welcome, for instance about better ways to explain or illustrate certain ideas, or tips about more efficient ways to program in R. Corrections related to the English language are also welcome.

Event attribution, the FAR and the far

Attribution of climate change

The \(far\) was proposed by P. Naveau et al. (2018) as one possible tool to tackle the question of event attribution. Event attribution is a sub-field of the attribution of climate change. In the following, we provide a brief idea of what attribution of climate change is; for more detailed explanations see for instance Bindoff et al. (2013) or P. A. Stott et al. (2016). The field of attribution of climate change aims to explain observed (or more precisely detected) changes in the climate system and to provide the causes of those changes. One particular focus of attribution of climate change is the effect of human activity on the climate. One way of assessing whether human activity has a significant causal impact on the climate is through the concept of counterfactuals (A. Hannart et al. 2016). For instance, how different would the climate be today if human activity had stayed at its pre-industrial level? By comparing the factual world (the world as it is) with the counterfactual world (a world that could have been), one can assess the causal role of a given factor.

In climate sciences, we often make the analogy that climate is a probability distribution and weather is one realization of this distribution.

Following this analogy, the attribution of climate change from a causal counterfactual standpoint consists in comparing two probability distributions representing two climates, one for the factual world and the other for the counterfactual world. The first studies of climate change attribution aimed at explaining long-term changes in the mean of the distribution.

(Extreme) event attribution

With the same analogy, event attribution compares the probability of having one event (defined as a set of possible outcomes of the distribution) occurring in the factual and in the counterfactual world. When talking about extreme event attribution, one focuses on the tail of the distribution. In this case, the class of events most usually studied in the literature is threshold exceedances, i.e. an event is said to have occurred when a univariate variable exceeds a given threshold.

In the figure above, let us consider that \(Z\) is the climate variable of interest in the factual world and \(X\) is the corresponding variable in the counterfactual world. The event is defined as the variable of interest exceeding the threshold \(RL\) (standing for Return Level), which gives \(\{Z > RL\}\) for the factual world and \(\{X > RL\}\) for the counterfactual world. Then, we want to compare the probability of having this event in both worlds:

\[\begin{align} p_0 &= P( X > RL) & \text{for the counterfactual world} \\ p_1 &= P( Z > RL) & \text{for the factual world} \end{align}\]

The \(FAR\) (Fraction of Attributable Risk)

In climate sciences, the \(FAR\) (Fraction of Attributable Risk) was introduced by P. A. Stott, Stone, and Allen (2004) to provide a synthetic number for this comparison:

\[ FAR = \dfrac{p_1 - p_0}{p_1} = 1 - \dfrac{p_0}{p_1} \] where \(p_1\) corresponds to the probability of having the event in the factual world and \(p_0\) to the probability of having the same event in the counterfactual world. When the \(FAR\) is significantly different from \(0\), it means that the conjunction of factors that were “added” in the factual world with respect to the counterfactual world has a causal effect on the event of interest.
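As a tiny numerical illustration (with made-up probabilities, not taken from any study): if the event has probability \(1/100\) in the counterfactual world and \(1/25\) in the factual world, then:

```r
# hypothetical probabilities, purely illustrative
p0 <- 1 / 100  # counterfactual probability of the event
p1 <- 1 / 25   # factual probability of the event
FAR <- 1 - p0 / p1
FAR  # 0.75: three quarters of the risk is attributable
```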

For instance, suppose we were interested in the event that the annual average temperature over Europe exceeds \(40\) degrees and we want to assess the causal role of anthropogenic forcings. We would compare the probability of having this event in our factual world with the probability of having the same event in an idealized, counterfactual world where the anthropogenic forcings remained null. If the \(FAR\) were significantly different from 0, we would deduce that anthropogenic forcings have a causal effect on having annual average temperatures above 40 degrees in Europe.

A. Hannart et al. (2016) also note that \(\max(FAR, 0)\) can also be interpreted as the probability of necessary causation in Pearl’s probabilistic causal theory (J. Pearl 1999).

The \(far\) (fraction of attributable risk for records)

P. Naveau et al. (2018) present statistical methods to estimate the FAR for a different class of events: records.

Records

P. Naveau et al. (2018) follow the definition of records provided in Resnick (1987). For a time series \(\mathbf{Y} = (Y_1, Y_2, \ldots, Y_T)\), a new record is broken at time \(r\) if the observed value \(Y_r\) is strictly higher than all previously observed values: \[ Y_r > \max(Y_1, \ldots, Y_{r-1})\]

In particular, when the random variables \(Y_t\) are independently and identically distributed (\(i.i.d.\)), the probability of observing a record at time \(r\) is: \[ P \big[Y_r > \max(Y_1, \ldots, Y_{r-1})\big] = \dfrac{1}{r} \] For \(i.i.d.\) variables, as we could have intuitively guessed, the longer the time-series, the harder (less likely) it is to break a new record.
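This \(1/r\) result is easy to check by simulation; the following sketch (plain base R, independent of the farr package) estimates the record probability for \(i.i.d.\) Gaussian draws:

```r
# For i.i.d. draws, the empirical frequency of a record at time r
# should be close to 1/r, whatever the underlying distribution.
set.seed(1)
nsim <- 1e5
r <- 10
# one simulated series of length r per row
y <- matrix(rnorm(nsim * r), nrow = nsim)
is_record <- y[, r] > apply(y[, -r, drop = FALSE], 1, max)
mean(is_record)  # close to 1/r = 0.1
```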

Record events in the factual and counterfactual worlds

One key idea of the article is to take advantage of the former result for the attribution of events that are records. Indeed, as mentioned earlier, many attribution studies aim at assessing the causal effect of human activity on a given event. Thus, the factual world is defined as the climate response when taking into account both anthropogenic and natural forcings in the climate system, whereas the counterfactual world corresponds to the climate response that we would have if only natural forcings were present. Let us denote by \(\mathbf{Z} = (Z_1, Z_2, \ldots, Z_T)\) and \(\mathbf{X} = (X_1, X_2, \ldots, X_T)\) the time series of our variable of interest in the factual world and the counterfactual world respectively. The event we are looking at is defined as having a record at time \(r\):

\[\begin{align} X_r &> \max(X_1, \ldots, X_{r-1}) & \text{for the counterfactual world} \\ Z_r &> \max(X_1, \ldots, X_{r-1}) & \text{for the factual world} \end{align}\]

When the climate is only driven by natural forcings, it is sometimes reasonable to assume that the climate in the counterfactual world is stationary and that the time series \(X\) is also stationary. If we furthermore assume that the random variables \(X_t\) are \(i.i.d.\), then we directly have the probability of having a record after a sequence of length \(r-1\) in the counterfactual world: \[ p_{0,r} = P\big[X_r > \max(X_1, \ldots, X_{r-1})\big] = \dfrac{1}{r} \] Then, to obtain the \(FAR\) for records, we just need to estimate \(p_{1,r}\): \[ p_{1,r} = P\big[Z_r > \max(X_1, \ldots, X_{r-1})\big] \] which corresponds to the probability that a realization of \(Z_r\) in the factual world would have been a record in the counterfactual world if \((r-1)\) values of \(X\) were already observed. By comparing \(p_{1,r}\) and \(p_{0,r}\), one can assess whether human activity has a causal effect on having a record after a sequence of length \((r-1)\).

Finally, the fraction of attributable risk for a record over a sequence of length \((r-1)\) is: \[ far(r) = 1 - \dfrac{p_{0,r}} {p_{1,r}} \]
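Before turning to the estimation procedure, here is a brute-force Monte Carlo sketch of \(p_{0,r}\), \(p_{1,r}\) and \(far(r)\) (plain base R, independent of the farr package; the `rgumbel` helper is defined here for illustration) for a factual world obtained by shifting a standard Gumbel variable, the first toy case used later in this vignette:

```r
# Monte Carlo estimates of p_{0,r}, p_{1,r} and far(r) for Z = mu + X
# with X standard Gumbel (illustrative sketch only)
set.seed(1)
nsim <- 1e5
r <- 5
mu <- 0.5
rgumbel <- function(n) -log(-log(runif(n)))  # inverse-transform Gumbel draws
# the maximum of (r - 1) counterfactual values, for each simulated sequence
max_x <- apply(matrix(rgumbel(nsim * (r - 1)), nrow = nsim), 1, max)
p0r <- mean(rgumbel(nsim) > max_x)       # counterfactual record, close to 1/r
p1r <- mean(mu + rgumbel(nsim) > max_x)  # factual record probability
far_r <- 1 - p0r / p1r
```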

Statistical inference for \(p_{1,r}\) and the far

P. Naveau et al. (2018) provide a statistical procedure to estimate \(p_{1,r}\), which then leads to an estimation of the \(far\) for records and its confidence interval. The different elements of this procedure are described in the table below. We only want to provide a general idea of how the method is built. For the complete explanation and all the proofs, please refer to P. Naveau et al. (2018).

From the assumptions that the random variables \(X_i\) are \(i.i.d.\) and that the random variables \(Z_i\) are also \(i.i.d.\), one can notice that \(p_{0,r} = \frac{1}{r}\) and that \(p_{1,r}\) can be written as an expectation (first column of the table). In particular: \[ p_{1,2} = \mathbb{E}\big(G(Z)\big)\] where \(G\) is the cumulative distribution function (CDF) of the variables \(X_i\).

Moreover, if the random variable \(W = -\log\big(G(Z)\big)\) follows an exponential distribution with mean \(\theta\), then \(p_{1,r}\) and \(far(r)\) only depend on the parameter \(\theta\). Hence an estimation of \(\theta\) directly provides a way to estimate \(p_{1,r}\) and \(far(r)\).
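To make this dependence explicit (a short derivation, to be checked against P. Naveau et al. (2018)): conditioning on \(Z_r\), which is independent of \(X_1, \ldots, X_{r-1}\), and using the Laplace transform of the exponential distribution, \[ p_{1,r} = \mathbb{E}\big(G(Z)^{r-1}\big) = \mathbb{E}\big(e^{-(r-1)W}\big) = \dfrac{1}{1 + \theta(r-1)} \] so that \[ far(r) = 1 - \dfrac{p_{0,r}}{p_{1,r}} = 1 - \dfrac{1 + \theta(r-1)}{r} \] For \(r = 2\), the first formula reduces to \(p_{1,2} = \mathbb{E}\big(G(Z)\big) = 1/(1+\theta)\), consistent with the expectation above.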

To estimate \(\theta\), we use the fact that \(\theta\) can be expressed as a function of \(p_{1,2}\) and that \(p_{1,2}= \mathbb{E}\big(G(Z)\big)\) can easily be estimated by its empirical counterpart: \[ \hat{p}_{1,2} = \dfrac{1}{n}\sum_{i = 1}^{n}\mathbb{G}_m(Z_i)\] P. Naveau et al. (2018) show that this estimator is asymptotically normal. Then, \(\hat{p}_{1,2}\) can be used to estimate \(\theta\). The asymptotic gaussianity of \(\hat\theta\) can be derived from the asymptotic properties of \(\hat{p}_{1,2}\) and the delta method. In the same way, one can build an estimator of \(far(r)\) from \(\hat\theta\) and deduce its asymptotic gaussianity with the delta method.
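The following sketch illustrates this estimator on simulated data from the shifted Gumbel toy case, using the plain empirical CDF in place of \(\mathbb{G}_m\) and the relation \(\theta = 1/\hat{p}_{1,2} - 1\), which holds when \(W \sim Exp(1/\theta)\) since then \(p_{1,2} = \mathbb{E}(e^{-W}) = 1/(1+\theta)\). This is a simplified sketch; the package's estim_theta.wexp performs the full procedure with a smoothed CDF estimate.

```r
# Simplified sketch of the estimation of theta via p_{1,2}
# (plain ecdf here; the farr package uses a kernel-smoothed CDF)
set.seed(1)
mu <- 0.5
x <- -log(-log(runif(500)))       # counterfactual: standard Gumbel draws
z <- mu - log(-log(runif(500)))   # factual: shifted Gumbel draws
G_m <- ecdf(x)                    # empirical CDF of X
p12_hat <- mean(G_m(z))           # \hat{p}_{1,2}
theta_hat <- 1 / p12_hat - 1      # theta as a function of p_{1,2}
# for this toy case the true theta is exp(-mu), about 0.61
```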

In practice, one first needs to check whether the assumption that \(W = -\log\big(G(Z)\big)\) follows an exponential distribution is reasonable. Since \(W\) is not directly observed, an empirical estimate of \(W\) is used: \[ \hat{W_i} = - \log \hat{G}_m(Z_i)\] where \(\hat{G}_m\) is a kernel-based empirical estimate of the CDF of \(X\). Then the Cox and Oakes test (Cox and Oakes 1984) is used to assess whether the \(\hat{W_i}\) are distributed according to an exponential distribution.
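As a rough, simplified version of this diagnostic (plain ecdf instead of the kernel estimate used by the package, and a Kolmogorov-Smirnov test instead of the Cox and Oakes test):

```r
# Plug-in estimate of W and a simple exponentiality check
# (simplified: ecdf instead of a kernel CDF, KS test instead of Cox-Oakes)
set.seed(1)
x <- -log(-log(runif(500)))        # counterfactual: standard Gumbel draws
z <- 0.5 - log(-log(runif(500)))   # factual: shifted Gumbel draws
n <- length(x)
# clamp the empirical CDF away from 0 and 1 so that log() stays finite
G_hat <- pmin(pmax(ecdf(x)(z), 1 / (n + 1)), n / (n + 1))
W_hat <- -log(G_hat)
ks.test(W_hat, "pexp", rate = 1 / mean(W_hat))
```

A large p-value here would be consistent with the exponential hypothesis; note that ties in W_hat, caused by the step-function CDF, make ks.test emit a warning.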

The farr package implements this procedure for the estimation of the \(far\), the fraction of attributable risk for records. In the next part, we will show how to use the farr package on idealized examples.

Toy examples

In this part, we will use the package to compute the fraction of attributable risk for records in three idealized cases described in P. Naveau et al. (2018). For these three idealized cases, the variable \(W\) follows an exponential distribution. Additionally, the theoretical values of the \(far\) are known for those three cases.

Those three cases are:

  • the shifted Gumbel distribution.
  • the inflated Frechet distribution.
  • the Weibull-EVT type distribution.

We will provide more details on those distributions in their respective examples.

First, let us load the packages that are used in this vignette.

R packages

# far for records
library(farr)

# extreme value distribution
library(evd)

# plotting and mapping
library(fields)
library(maps)

Just in case, we provide the session information for the sake of reproducibility.

Session information

print(sessionInfo(), locale = FALSE)
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.1 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] fields_9.6      maps_3.3.0      spam_2.2-0      dotCall64_1.0-0
## [5] evd_2.3-3       farr_0.0.0.9000
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.18       knitr_1.20         magrittr_1.5      
##  [4] cubature_1.4       lattice_0.20-35    quadprog_1.5-5    
##  [7] stringr_1.3.1      tools_3.4.4        quantreg_5.36     
## [10] htmltools_0.3.6    MatrixModels_0.4-1 yaml_2.2.0        
## [13] rprojroot_1.3-2    digest_0.6.15      Matrix_1.2-14     
## [16] evaluate_0.11      rmarkdown_1.10     np_0.60-8         
## [19] stringi_1.2.4      compiler_3.4.4     backports_1.1.2   
## [22] boot_1.3-20        SparseM_1.77

Shifted Gumbel distribution

The counterfactual variable \(X\) follows a Gumbel distribution with CDF: \[G(x) = \exp(-\exp(-x))\] and the factual variable \(Z\) is defined as: \[ Z = \mu + X \] with \(x\) real, \(\mu \geq 0\).

This implies that:

  • \(X\) follows a Gumbel distribution with a location parameter equal to \(0\) and a scale parameter equal to \(1\).
  • \(Z\) follows a Gumbel distribution with a location parameter equal to \(\mu\) and a scale parameter equal to \(1\).

Simulation of \(X\) and \(Z\)

# RNG
set.seed(1)

# sample size
size <- 500


# Gumbel 
# Simulations
mu = 0.5
x = rgev(size, loc = 0, scale = 1, shape = 0)
z = rgev(size, loc = mu, scale = 1, shape = 0)

Fit of \(\theta\)

The function estim_theta.wexp estimates the parameter \(\theta\) under the hypothesis that \(W \sim Exp(1/\theta)\). It takes as arguments a vector of realizations of \(X\) and a vector of realizations of \(Z\). It returns a list with the following elements:

  • theta_hat the estimate of theta.
  • sigma_theta_hat the standard deviation of the estimator of theta.
  • W the estimate of W.
  • co_test the result of the Cox and Oakes test to assess whether W follows an exponential distribution.
theta_fit <- estim_theta.wexp(x = x, z = z) 
# Check null hypothesis that W ~ Exp(1/theta)
print(theta_fit$co_test)
## $statistic
## [1] -27.34511
## 
## $p.value
## [1] 0.340337
par(mfrow = c(1, 3))
hist(theta_fit)
ecdf(theta_fit)
qqplot(theta_fit)

The results of the Cox and Oakes test as well as the graphical diagnostics suggest that it is reasonable to assume that \(W\) follows an exponential distribution. Hence, we continue with the estimation of the \(far\).

Estimation of far(r)

From the estimate of \(\theta\), we estimate \(far(r)\) for different values of \(r\) with the function estim_farr.wexp. estim_farr.wexp takes as arguments:

  • theta_hat the value of the estimated theta.
  • sigma_theta_hat the value of the estimated standard deviation of \(\hat\theta\).
  • rp a vector of return periods for which the \(far\) is to be computed.

The function returns an object of class ("farrfit_exp", "farrfit"). It is a list containing the following elements:

  • rp the return periods for which the \(far\) is estimated.
  • farr_hat the estimate of the \(far\) for each return period rp.
  • sigma_farr_hat the standard deviation of the estimator of the \(far\).
rp = c(seq(from = 2, to = 5, length = 10), seq(from = 6, to = 20, length = 5))

# Theoretical results
# asymptotic limit for the far and the FAR in this case with a Gumbel distribution
glim <- gumble_lim(mu  = mu)
# theoretical far 
gfarr <- gumble_farr(r = rp, mu = mu)

farr_fit.exp <- estim_farr.wexp(theta_hat = theta_fit$theta_hat, sigma_theta_hat = theta_fit$sigma_theta_hat, rp = rp) 
print(farr_fit.exp)
## [1] "return periods, estimated far and estimated standard deviation for each return period:"
##           rp  farr_hat sigma_farr_hat
## 1   2.000000 0.1643882     0.02488993
## 2   2.333333 0.1878723     0.02844564
## 3   2.666667 0.2054853     0.03111242
## 4   3.000000 0.2191843     0.03318658
## 5   3.333333 0.2301435     0.03484591
## 6   3.666667 0.2391102     0.03620354
## 7   4.000000 0.2465823     0.03733490
## 8   4.333333 0.2529050     0.03829221
## 9   4.666667 0.2583244     0.03911275
## 10  5.000000 0.2630212     0.03982389
## 11  6.000000 0.2739804     0.04148322
## 12  9.500000 0.2941684     0.04453988
## 13 13.000000 0.3034860     0.04595065
## 14 16.500000 0.3088506     0.04676291
## 15 20.000000 0.3123376     0.04729087
plot(farr_fit.exp, lwd = 2, main = "far for the shifted Gumbel")
lines(rp, gfarr, col = "red", lty = 1, lwd = 2)
abline(h = glim, col = "red", lty = 2, lwd = 2)
legend("bottomright",
       legend = c("estimated far", "theoretical far", "theoretical asymptotic limit"),
       col = c("black", "red", "red"),
       lwd = 2,
       lty = c(1, 1, 2))

The graph presents the estimated \(far\) for different return periods \(r\). The shaded area represents the 90%-confidence-interval band for the estimated \(far\). The true theoretical value of the \(far\) is within the confidence intervals, which indicates that the estimate in this case is rather good.

Remark: When the Cox and Oakes test rejects the null hypothesis that \(W\) follows an exponential distribution, one can use the non-parametric estimation of the \(far\) described in the annexes of P. Naveau et al. (2018).

This package provides this method of estimation through the function estim_farr.np. It takes as arguments:

  • x a vector of realizations of the variable of interest in the counterfactual world.
  • z a vector of realizations of the variable of interest in the factual world.
  • rp the return periods for which the \(far\) is to be estimated.

and it returns an object of class ("farrfit_np", "farrfit"). It is a list containing the following elements:

  • rp the return periods for which the \(far\) is estimated.
  • farr_hat the estimate of the \(far\) for each return period rp.
  • sigma_farr_hat the standard deviation of the estimator of the \(far\) assuming asymptotic gaussianity.
farr_fit.np <- estim_farr.np(x = x, z = z, rp = rp) 
print(farr_fit.np)
## [1] "return periods, estimated far and estimated standard deviation for each return period:"
##           rp  farr_hat sigma_farr_hat
## 1   2.000000 0.1643882     0.02485809
## 2   2.333333 0.1898060     0.02947633
## 3   2.666667 0.2093837     0.03339624
## 4   3.000000 0.2250111     0.03682268
## 5   3.333333 0.2378540     0.03987786
## 6   3.666667 0.2486691     0.04264054
## 7   4.000000 0.2579671     0.04516456
## 8   4.333333 0.2661033     0.04748852
## 9   4.666667 0.2733318     0.04964117
## 10  5.000000 0.2798381     0.05164462
## 11  6.000000 0.2962462     0.05691805
## 12  9.500000 0.3354125     0.06956478
## 13 13.000000 0.3624683     0.07653943
## 14 16.500000 0.3837086     0.07958484
## 15 20.000000 0.4012196     0.07925654
plot(farr_fit.np, lwd = 2, main = "non-parametric far for the shifted Gumbel")
lines(rp, gfarr, col = "red", lty = 1, lwd = 2)
abline(h = glim, col = "red", lty = 2, lwd = 2)
legend("bottomright",
       legend = c("estimated far", "theoretical far", "theoretical asymptotic limit"),
       col = c("black", "red", "red"),
       lwd = 2,
       lty = c(1, 1, 2))

Inflated Frechet distribution

The counterfactual variable \(X\) follows a Frechet distribution with CDF: \[G(x) = \exp(-x^{-\frac{1}{\xi}})\] and the factual variable \(Z\) is defined as: \[ Z = \sigma X\] with \(x > 0\), \(\xi > 0\) and \(\sigma > 0\).

This implies that:

  • \(X\) follows a Frechet distribution with a location parameter equal to \(1\) and with scale and shape parameters both equal to \(\xi\).
  • \(Z\) follows a Frechet distribution with a location parameter equal to \(\sigma\), a scale parameter equal to \(\sigma \xi\), and a shape parameter equal to \(\xi\).

Simulation of \(X\) and \(Z\)

# RNG
set.seed(1)

# sample size
size <- 500


# Frechet
# Simulations
sigma = 1.412538
xi <- .15
x = rgev(size, loc = 1, scale = xi, shape = xi)
z = rgev(size, loc = sigma , scale = sigma * xi, shape = xi)

Fit of \(\theta\)

theta_fit <- estim_theta.wexp(x = x, z = z) 
# Check null hypothesis that W ~ Exp(1/theta)
print(theta_fit$co_test)
## $statistic
## [1] 24.75705
## 
## $p.value
## [1] 0.3846487
par(mfrow = c(1, 3))
hist(theta_fit)
ecdf(theta_fit)
qqplot(theta_fit)

Estimation of far(r)

rp = c(seq(from = 2, to = 5, length = 10), seq(from = 6, to = 20, length = 5))

# Theoretical results
# asymptotic limit for the far and the FAR in this case with a Frechet distribution
flim <- frechet_lim(sigma  = sigma, xi = xi)
# theoretical far 
ffarr <- frechet_farr(r = rp, sigma = sigma, xi = xi)



# Parametric estimation
farr_fit.exp <- estim_farr.wexp(theta_hat = theta_fit$theta_hat, sigma_theta_hat = theta_fit$sigma_theta_hat, rp = rp) 
print(farr_fit.exp)
## [1] "return periods, estimated far and estimated standard deviation for each return period:"
##           rp  farr_hat sigma_farr_hat
## 1   2.000000 0.4492373    0.005874243
## 2   2.333333 0.5134141    0.006713420
## 3   2.666667 0.5615466    0.007342803
## 4   3.000000 0.5989831    0.007832324
## 5   3.333333 0.6289322    0.008223940
## 6   3.666667 0.6534361    0.008544353
## 7   4.000000 0.6738560    0.008811364
## 8   4.333333 0.6911343    0.009037296
## 9   4.666667 0.7059443    0.009230953
## 10  5.000000 0.7187797    0.009398788
## 11  6.000000 0.7487288    0.009790405
## 12  9.500000 0.8038983    0.010511803
## 13 13.000000 0.8293612    0.010844756
## 14 16.500000 0.8440216    0.011036456
## 15 20.000000 0.8535509    0.011161061
# Non-parametric estimation
farr_fit.np <- estim_farr.np(x = x, z = z, rp = rp) 
print(farr_fit.np)
## [1] "return periods, estimated far and estimated standard deviation for each return period:"
##           rp  farr_hat sigma_farr_hat
## 1   2.000000 0.4492373    0.005921643
## 2   2.333333 0.5133924    0.006859384
## 3   2.666667 0.5614971    0.007602178
## 4   3.000000 0.5989003    0.008214602
## 5   3.333333 0.6288114    0.008735374
## 6   3.666667 0.6532728    0.009189106
## 7   4.000000 0.6736460    0.009592185
## 8   4.333333 0.6908739    0.009955939
## 9   4.666667 0.7056299    0.010288443
## 10  5.000000 0.7184079    0.010595607
## 11  6.000000 0.7481676    0.011404336
## 12  9.500000 0.8025564    0.013512414
## 13 13.000000 0.8272305    0.015055783
## 14 16.500000 0.8412285    0.016258665
## 15 20.000000 0.8502698    0.017193387
par(mfrow = c(1, 2 ))
plot(farr_fit.exp, lwd = 2, main = "parametric far \n for the inflated Frechet")
lines(rp, ffarr, col = "red", lty = 1, lwd = 2)
abline(h = flim, col = "red", lty = 2, lwd = 2)
legend("bottomright",
       legend = c("estimated far", "theoretical far", "asymptotic limit"),
       col = c("black", "red", "red"),
       lwd = 2,
       lty = c(1, 1, 2))
plot(farr_fit.np, lwd = 2, main = "non-parametric far \n for the inflated Frechet")
lines(rp, ffarr, col = "red", lty = 1, lwd = 2)
abline(h = flim, col = "red", lty = 2, lwd = 2)
legend("bottomright",
       legend = c("estimated far", "theoretical far", "asymptotic limit"),
       col = c("black", "red", "red"),
       lwd = 2,
       lty = c(1, 1, 2))

Weibull-EVT type distribution

The counterfactual variable \(X\) follows a Weibull distribution with CDF: \[G(x) = \exp(-(1 + \xi x)_+^{-1 / \xi})\] where \((x)_+\) corresponds to \(\max(0, x)\). The factual variable \(Z\) is defined as: \[ Z = \mu + \sigma X \] with \(\xi < 0\), \(x < - 1/ \xi\), \(\sigma > 0\) and \(\mu = (1 - \sigma)/(-\xi)\).

This implies that:

  • \(X\) follows a Weibull distribution with a location parameter equal to \(0\), a scale parameter equal to \(1\) and a shape parameter equal to \(\xi\).
  • \(Z\) follows a Weibull distribution with a location parameter equal to \(\mu\), a scale parameter equal to \(\sigma\) and a shape parameter equal to \(\xi\).

Simulation of \(X\) and \(Z\)

# RNG
set.seed(1)

# sample size
size <- 500


# Weibull 
# Simulations
xi <- -0.05
sigma <-  1.1
mu <- (1 - sigma) / (-xi)
x = rgev(size, loc = 0, scale = 1, shape = xi)
z = rgev(size, loc = mu, scale =  sigma, shape = xi)

Fit of \(\theta\)

theta_fit <- estim_theta.wexp(x = x, z = z) 
# Check null hypothesis that W ~ Exp(1/theta)
print(theta_fit$co_test)
## $statistic
## [1] -42.97697
## 
## $p.value
## [1] 0.1304728
par(mfrow = c(1, 3))
hist(theta_fit)
ecdf(theta_fit)
qqplot(theta_fit)

Remark: For the Weibull-EVT type distribution, the QQ-plot shows that values in the upper tail of \(\hat{W}\) are higher than the values expected if \(\hat{W}\) followed an exponential distribution. This result is a bit unexpected since, in theory, \(W\) follows an exponential distribution for the Weibull-EVT distribution. In this case, it is related to the fact that we do not perform the test directly on \(W\) but on its estimate \(\hat{W}\). Indeed, our empirical estimates of \(G\), the CDF of \(X\), and of \(W\) exhibit biases. If we replace our empirical estimator of \(G\) by the true \(G\), the p-value of the Cox and Oakes test is bigger and it is harder to reject with confidence the null hypothesis that \(W\) follows an exponential distribution.

# True value of W
W = -log(pgev(z, shape = xi))
# Kolmogorov-Smirnov test for an exponential distribution
ks.test(W, "pexp", rate = 1/mean(W))
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  W
## D = 0.035828, p-value = 0.5423
## alternative hypothesis: two-sided
# Cox and Oakes test of exponentiality
CoxOakes(W)
## $statistic
## [1] 9.15193
## 
## $p.value
## [1] 0.7496358
# plot of true value of W vs estimates of W
xylim <- range(W, theta_fit$W[is.finite(theta_fit$W)])
plot(W, theta_fit$W,
     main = "W_hat bias", xlab = "true", ylab = "estimate",
     xlim = xylim, ylim = xylim)
abline(a = 0, b = 1, col= "red")

Here, we see that in this case the empirical estimator of \(W\) has a bias: it overestimates the higher values of \(W\).

Since we know theoretically that \(W\) follows an exponential distribution, we can continue with the estimation of the \(far\). Indeed, the estimate of \(W\) is not used when constructing the estimator of the \(far\).

Estimation of far(r)

rp = c(seq(from = 2, to = 5, length = 10), seq(from = 6, to = 20, length = 5))

# Theoretical results
# asymptotic limit for the far and the FAR in this case with a Weibull distribution
wlim <- weibull_lim(sigma  = sigma, xi = xi)
# theoretical far 
wfarr <- weibull_farr(r = rp, sigma = sigma, xi = xi)



# Parametric estimation
farr_fit.exp <- estim_farr.wexp(theta_hat = theta_fit$theta_hat, sigma_theta_hat = theta_fit$sigma_theta_hat, rp = rp) 
print(farr_fit.exp)
## [1] "return periods, estimated far and estimated standard deviation for each return period:"
##           rp  farr_hat sigma_farr_hat
## 1   2.000000 -3.327356      0.4052740
## 2   2.333333 -3.802692      0.4631703
## 3   2.666667 -4.159195      0.5065925
## 4   3.000000 -4.436474      0.5403653
## 5   3.333333 -4.658298      0.5673836
## 6   3.666667 -4.839790      0.5894894
## 7   4.000000 -4.991034      0.6079110
## 8   4.333333 -5.119009      0.6234984
## 9   4.666667 -5.228702      0.6368591
## 10  5.000000 -5.323769      0.6484384
## 11  6.000000 -5.545593      0.6754566
## 12  9.500000 -5.954216      0.7252271
## 13 13.000000 -6.142811      0.7481981
## 14 16.500000 -6.251396      0.7614239
## 15 20.000000 -6.321976      0.7700206
# Non-parametric estimation
farr_fit.np <- estim_farr.np(x = x, z = z, rp = rp) 
print(farr_fit.np)
## [1] "return periods, estimated far and estimated standard deviation for each return period:"
##           rp  farr_hat sigma_farr_hat
## 1   2.000000 -3.327356      0.4206988
## 2   2.333333 -3.595443      0.5033723
## 3   2.666667 -3.745376      0.5691041
## 4   3.000000 -3.831236      0.6229105
## 5   3.333333 -3.880482      0.6682637
## 6   3.666667 -3.907921      0.7074596
## 7   4.000000 -3.922057      0.7420601
## 8   4.333333 -3.928013      0.7731599
## 9   4.666667 -3.928969      0.8015395
## 10  5.000000 -3.926934      0.8277604
## 11  6.000000 -3.913463      0.8969762
## 12  9.500000 -3.867490      1.0692250
## 13 13.000000 -3.829211      1.1488383
## 14 16.500000 -3.788691      1.1297925
## 15 20.000000 -3.750584      0.9940890
par(mfrow = c(1, 2 ))
plot(farr_fit.exp, lwd = 2, main = "parametric far \n for Weibull-EVT type distribution")
lines(rp, wfarr, col = "red", lty = 1, lwd = 2)
abline(h = wlim, col = "red", lty = 2, lwd = 2)
legend("topright",
       legend = c("estimated far", "theoretical far", "asymptotic limit"),
       col = c("black", "red", "red"),
       lwd = 2,
       lty = c(1, 1, 2))
plot(farr_fit.np, lwd = 2, main = "non-parametric far \n for Weibull-EVT type distribution")
lines(rp, wfarr, col = "red", lty = 1, lwd = 2)
abline(h = wlim, col = "red", lty = 2, lwd = 2)
legend("topright",
       legend = c("estimated far", "theoretical far", "asymptotic limit"),
       col = c("black", "red", "red"),
       lwd = 2,
       lty = c(1, 1, 2))

The parametric estimation gives satisfying results with 500 points for each of \(X\) and \(Z\). The non-parametric estimate of the \(far\) does not perform as well as the parametric estimation: the theoretical \(far\), for a return period \(r = 20\), does not fall within the 90%-confidence interval. The non-parametric estimation improves when we increase the number of points, for instance to 1000 points for both \(X\) and \(Z\).

A climate case study

In this part, we will try to reproduce results presented in P. Naveau et al. (2018) on CMIP5 data.

Data

As in P. Naveau et al. (2018), we select annual maxima of daily maximum temperature (\(tasmax\)) simulated by the Meteo-France CNRM-CM5 model (Voldoire et al. 2013). For the factual world, we select the r1i1p1 run from the historical experiment where all available forcings are taken into account. From this run, we only take the last 30 years of the simulation (1975-2005) and we assume that the climate system is approximately stationary during that period.

For the counterfactual world, we use the corresponding r1i1p1 run from the historical natural experiment (historicalNAT), in which the anthropogenic contributions to the climate forcings have been removed. With only natural forcings, we assume that the climate system is stationary for the full length of the simulation (1850-2012). Hence, all available data are used in this case.

Contrary to P. Naveau et al. (2018), we only perform the analysis over the Mediterranean region, simply to keep the necessary computing time reasonable.

All CMIP5 data are publicly available, for instance through the ESGF-COG nodes.

To facilitate their use in this vignette, the data from the CNRM-CM5 model have been prepared and compiled in a .rds file farr_cmip5_ex.rds available here.

First, let us load the file farr_cmip5_ex.rds into a variable named cmip5_ex in R.

cmip5_ex <- readRDS(file = "farr_cmip5_ex.rds")
print(str(cmip5_ex))
## List of 2
##  $ tasmax_all: num [1:36, 1:15, 1:31] 310 323 321 320 320 ...
##   ..- attr(*, "dimnames")=List of 3
##   .. ..$ lon : chr [1:36(1d)] "-9.8" "-8.4" "-7.0" "-5.6" ...
##   .. ..$ lat : chr [1:15(1d)] "30.1" "31.5" "32.9" "34.3" ...
##   .. ..$ year: chr [1:31] "1975" "1976" "1977" "1978" ...
##  $ tasmax_nat: num [1:36, 1:15, 1:158] 310 319 318 318 319 ...
##   ..- attr(*, "dimnames")=List of 3
##   .. ..$ lon : chr [1:36(1d)] "-9.8" "-8.4" "-7.0" "-5.6" ...
##   .. ..$ lat : chr [1:15(1d)] "30.1" "31.5" "32.9" "34.3" ...
##   .. ..$ year: chr [1:158] "1850" "1851" "1852" "1853" ...
## NULL

The variable cmip5_ex is a list that contains the historical and the historicalNAT datasets stored as arrays:

  • tasmax_all is an array of dimensions (36 x 15 x 31) corresponding to lon, lat and year. It contains the yearly maxima of daily temperature maxima simulated by the CNRM-CM5 model where all forcings were considered. These data will serve as our factual world.

  • tasmax_nat is an array of dimensions (36 x 15 x 158) corresponding to lon, lat and year. It contains the yearly maxima of daily temperature maxima simulated by the CNRM-CM5 model where only natural forcings were considered. These data will serve as our counterfactual world.
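Because the arrays carry lon/lat/year dimnames, a single grid-point series can be extracted either by position or by coordinate label. A toy stand-in with the same layout (the array below is simulated, not the CNRM-CM5 data; only the indexing is of interest) illustrates this:

```r
# Toy array with the same (lon, lat, year) layout as tasmax_all,
# filled with simulated values.
toy <- array(rnorm(2 * 2 * 31, mean = 310, sd = 3),
             dim = c(2, 2, 31),
             dimnames = list(lon  = c("-9.8", "-8.4"),
                             lat  = c("30.1", "31.5"),
                             year = as.character(1975:2005)))
z_by_index <- toy[1, 1, ]            # first grid point, by position
z_by_label <- toy["-9.8", "30.1", ]  # same grid point, by coordinate label
identical(z_by_index, z_by_label)
```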

To make them easier to use, we attach cmip5_ex. The arrays tasmax_all and tasmax_nat can then be accessed directly by their names.

attach(cmip5_ex)
str(tasmax_all)
##  num [1:36, 1:15, 1:31] 310 323 321 320 320 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ lon : chr [1:36(1d)] "-9.8" "-8.4" "-7.0" "-5.6" ...
##   ..$ lat : chr [1:15(1d)] "30.1" "31.5" "32.9" "34.3" ...
##   ..$ year: chr [1:31] "1975" "1976" "1977" "1978" ...
str(tasmax_nat)
##  num [1:36, 1:15, 1:158] 310 319 318 318 319 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ lon : chr [1:36(1d)] "-9.8" "-8.4" "-7.0" "-5.6" ...
##   ..$ lat : chr [1:15(1d)] "30.1" "31.5" "32.9" "34.3" ...
##   ..$ year: chr [1:158] "1850" "1851" "1852" "1853" ...

To get a first idea of the data, we plot maps of the maxima of \(tasmax\) in the factual and counterfactual worlds.

First visualisation

# compute maps of maxima
library(fields) # provides image.plot()
maxtasmax_all <- apply(tasmax_all, c(1, 2), max, na.rm = TRUE)
maxtasmax_nat <- apply(tasmax_nat, c(1, 2), max, na.rm = TRUE)
worldmap <- maps::map("world", plot = FALSE)

lon <- as.numeric(rownames(maxtasmax_all))
lat <- as.numeric(colnames(maxtasmax_all))
zlim <- range(maxtasmax_all, maxtasmax_nat) 
# from http://colorbrewer2.org
pal_tasmax <- c('#ffffcc','#ffeda0','#fed976','#feb24c','#fd8d3c',
                '#fc4e2a','#e31a1c','#bd0026','#800026')
par(mfrow = c(1, 3))
image.plot(x = lon, y = lat, 
           z = maxtasmax_all, 
           zlim = zlim, 
           legend.lab = "tasmax(K)", horizontal = TRUE,
           col = pal_tasmax,
           main  = "Factual world: \n max of tasmax" )
lines(worldmap)

image.plot(x = lon, y = lat, 
           z = maxtasmax_nat, 
           zlim = zlim, 
           legend.lab = "tasmax(K)", horizontal = TRUE,  
           col = pal_tasmax, nlevel = length(pal_tasmax),
           main  = "Counterfactual world: \n max of tasmax" )
lines(worldmap)

maxtasmax_diff <- maxtasmax_all - maxtasmax_nat
zlim_diff <- max(abs(maxtasmax_diff)) *1.1 * c(-1, 1)
# from http://colorbrewer2.org
pal_difftasmax <- c('#8c510a','#bf812d','#dfc27d','#f6e8c3','#f5f5f5',
                    '#c7eae5','#80cdc1','#35978f','#01665e')
image.plot(x = lon, y = lat, 
           z = maxtasmax_diff, 
           zlim = zlim_diff, 
           col = pal_difftasmax, nlevel = length(pal_difftasmax),
           legend.lab = "tasmax(K)", horizontal = TRUE,  
           main  = "Factual vs Counterfactual: \n diff of tasmax max" )
lines(worldmap)

In terms of maxima of \(tasmax\), the factual and counterfactual worlds do not appear to be very different from each other. Maxima of \(tasmax\) tend to be higher over land than over the ocean. Additionally, higher maxima tend to be reached closer to the tropics. Quantitatively, the maximum of the difference in the maxima of \(tasmax\) is about 4 K.

Study of records with the \(far\).

Let us now see whether the two worlds differ in terms of records.

Estimation of \(\theta\)

For each grid point, we estimate the parameter \(\theta\).

ilonlat <- expand.grid(lon = seq_along(lon), lat = seq_along(lat))

# looping over all grid points
theta_fits <- mapply(function(ilon, ilat){
  estim_theta.wexp(x = tasmax_nat[ilon, ilat, ], z = tasmax_all[ilon, ilat, ]) 
}, ilon = ilonlat$lon, ilat = ilonlat$lat, 
SIMPLIFY = FALSE)
str(theta_fits[[1]])
## List of 4
##  $ theta_hat      : num 0.646
##  $ sigma_theta_hat: num 0.193
##  $ W              : num [1:31] 1.0512 0.0164 1.5804 0.0731 0.1089 ...
##  $ co_test        :List of 2
##   ..$ statistic: num -3.37
##   ..$ p.value  : num 0.637
##  - attr(*, "class")= chr [1:2] "thetafit_wexp" "thetafit"
# Extracting theta_hat and pvalues of Cox and Oakes test
theta_mat <- vapply(theta_fits,
                      FUN = function(x) x$theta_hat,
                      FUN.VALUE = 1)
theta_mat <- matrix(theta_mat, nrow = length(lon), ncol = length(lat))

pvalues_mat <- vapply(theta_fits,
                      FUN = function(x) x$co_test$p.value,
                      FUN.VALUE = 1)
pvalues_mat <- matrix(pvalues_mat, nrow = length(lon), ncol = length(lat))

theta_mat_na <- theta_mat
theta_mat_na[pvalues_mat < 0.05] <- NA

zlim_pvalues <- c(0, 1)
# from http://colorbrewer2.org
pal_pvalues <- rev(c('#ffffff','#f0f0f0','#d9d9d9','#bdbdbd','#969696','#737373','#525252','#252525','#000000'))
breaks_pvalues <- c(seq(0, 0.1, length.out = 8), 0.2, 1)
par(mfrow = c(1, 3))
image.plot(x = lon, y = lat, 
           z = pvalues_mat, 
           zlim = zlim_pvalues, 
           legend.lab = "p-value", horizontal = TRUE,
           col = pal_pvalues, breaks = breaks_pvalues,
           main  = "Cox and Oakes test: \n p-value" )
lines(worldmap)

zlim_theta <- range(theta_mat)
# from http://colorbrewer2.org
pal_theta <- c('#f7fcfd','#e0ecf4','#bfd3e6','#9ebcda','#8c96c6','#8c6bb1','#88419d','#810f7c','#4d004b')
image.plot(x = lon, y = lat, 
           z = theta_mat, 
           zlim = zlim_theta, 
           legend.lab = "theta", horizontal = TRUE,  
           col = pal_theta, nlevel = length(pal_theta),
           main  = "theta estimate" )
lines(worldmap)

zlim_theta <- range(theta_mat)
# from http://colorbrewer2.org
pal_theta <- c('#f7fcfd','#e0ecf4','#bfd3e6','#9ebcda','#8c96c6','#8c6bb1','#88419d','#810f7c','#4d004b')
image.plot(x = lon, y = lat, 
           z = theta_mat_na, 
           zlim = zlim_theta, 
           legend.lab = "theta", horizontal = TRUE,  
           col = pal_theta, nlevel = length(pal_theta),
           main  = "theta estimate with \n Cox and Oakes p-value > 0.05" )
lines(worldmap)

# percentage of grid points where the null hypothesis
# that W follows an exponential distribution is rejected
rejection_rate <- mean(pvalues_mat < 0.05) * 100

The Cox and Oakes test does not reject the null hypothesis that \(W\) follows an exponential distribution in a majority of cases. At a significance level of 5%, the null hypothesis was rejected for 20% of the grid points. For the remaining 80%, it is reasonable to estimate the \(far\) with the parametric method by assuming that \(W\) follows an exponential distribution.

Estimation of the \(far\).

In the following, we will estimate the value of the \(far\) for the return period \(r_\theta = 1 + 1 / \theta\). In P. Naveau et al. (2018), this specific return period was chosen because, in theory, under the null hypothesis that \(W\) follows an exponential distribution, it is the return period that maximizes the probability of necessary and sufficient causation (A. Hannart et al. 2016).
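As a quick sanity check in plain R (no package functions involved), evaluating this formula for a few values of \(\theta\) shows that the \(\hat\theta \approx 0.646\) obtained at the first grid point above yields the return period of about 2.55 printed further down:

```r
# r_theta = 1 + 1 / theta, evaluated for a few values of theta
theta    <- c(0.25, 0.5, 0.646, 1)
rp_theta <- 1 + 1 / theta
round(rbind(theta = theta, rp_theta = rp_theta), 3)
# the third column, theta = 0.646, gives rp_theta of about 2.548
```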

farr_estimates <- lapply(theta_fits, function(object){
  theta_hat <- object$theta_hat
  rp_theta <-  1 + 1/theta_hat
  estim_farr.wexp(theta_hat = theta_hat, sigma_theta_hat = object$sigma_theta_hat, rp = rp_theta)
})
str(farr_estimates[[1]])
## List of 3
##  $ rp            : num 2.55
##  $ farr_hat      : Named num 0.215
##   ..- attr(*, "names")= chr "farr_2.5470618824753"
##  $ sigma_farr_hat: num 0.117
##  - attr(*, "class")= chr [1:2] "farrfit_wexp" "farrfit"
# Extracting the return periods, the far estimates and their significance
rp_mat <- vapply(farr_estimates,
                      FUN = function(x) x$rp,
                      FUN.VALUE = 1)
rp_mat <- matrix(rp_mat, nrow = length(lon), ncol = length(lat))

far_mat <- vapply(farr_estimates,
                      FUN = function(x) x$farr_hat,
                      FUN.VALUE = 1)
far_mat <- matrix(far_mat, nrow = length(lon), ncol = length(lat))

ifar_signif_mat <- vapply(farr_estimates,
                      FUN = function(x){
                        cf <- confint(x)
                        !(0 >= cf[, 1] & 0 <= cf[, 2])
                      }, FUN.VALUE = 1)
ifar_signif_mat <- matrix(ifar_signif_mat, nrow = length(lon), ncol = length(lat))

rp_mat_na <- rp_mat
rp_mat_na[pvalues_mat < 0.05] <- NA
far_mat_na <- far_mat
far_mat_na[pvalues_mat < 0.05] <- NA
# adding 0 * far_mat_na propagates its NAs to the significance matrix
ifar_signif_mat <- ifar_signif_mat + 0 * far_mat_na


zlim_rp <- range(rp_mat)
pal_rp <- c('#fcfbfd','#efedf5','#dadaeb','#bcbddc','#9e9ac8','#807dba','#6a51a3','#54278f','#3f007d')
# from http://colorbrewer2.org
par(mfrow = c(1, 3))
image.plot(x = lon, y = lat, 
           z = rp_mat_na, 
           zlim = zlim_rp, 
           legend.lab = "r_theta", horizontal = TRUE,
           col = pal_rp,
           main  = "Return period r_theta \n for the far" )
lines(worldmap)

zlim_far <- max(abs(far_mat)) * c(-1, 1) * 1.2
# from http://colorbrewer2.org
pal_far <- c('#762a83','#9970ab','#c2a5cf','#e7d4e8','#f7f7f7','#d9f0d3','#a6dba0','#5aae61','#1b7837')
image.plot(x = lon, y = lat, 
           z = far_mat_na, 
           zlim = zlim_far, 
           legend.lab = "far(r_theta)", horizontal = TRUE,  
           col = pal_far, nlevel = length(pal_far),
           main  = "far(r_theta) with \n Cox and Oakes p-value > 0.05" )
lines(worldmap)

image.plot(x = lon, y = lat, 
           z = ifar_signif_mat, 
           legend.lab = "0 = false, 1 = true", horizontal = TRUE,  
           nlevel = 2, breaks = c(0, 0.5, 1),
           col = c('#eeeeee', '#000000'),
           main  = "Is far(r_theta) significant \n at significance level 5% ?" )
lines(worldmap)

From the plots, one can see that the estimates of \(far(r_\theta)\) range from about \(-0.2\) to \(0.5\). Most estimated \(far\) are positive, meaning that yearly maxima of \(tasmax\) tend to be higher in the factual world than in the counterfactual world. On the contrary, a negative \(far\) suggests that yearly maxima of \(tasmax\) tend to be higher in the counterfactual world. However, none of the estimated \(far\) are statistically different from \(0\) according to the 95% asymptotic confidence intervals. Thus, it is not possible to conclude that, in terms of yearly \(tasmax\) records, the climates of the factual and counterfactual worlds are statistically different. Consequently, the causal effect of anthropogenic forcings on records of annual maximum temperature could not be shown to be significant.
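The significance check above relies on confint(); a plausible reading (an assumption on my part, not something documented here) is that it builds a Gaussian asymptotic interval from the estimate and its standard error, \(\widehat{far} \pm z_{0.975}\,\hat\sigma_{far}\). Reproducing such an interval by hand for the values printed above for the first grid point:

```r
# Sketch (assumption): 95% asymptotic CI as farr_hat +/- qnorm(0.975) * sigma
farr_hat       <- 0.215   # farr_estimates[[1]]$farr_hat, printed above
sigma_farr_hat <- 0.117   # farr_estimates[[1]]$sigma_farr_hat
ci <- farr_hat + c(-1, 1) * qnorm(0.975) * sigma_farr_hat
round(ci, 3)  # the interval contains 0, hence not significant at the 5% level
```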

Estimation of the \(far\) with non-parametric method.

To finish, let us note that the \(far\) can also be estimated non-parametrically. For this method, the return period \(r\) has to be greater than or equal to \(2\). In order to compare with the parametric estimation of the \(far\), we will use the same \(r_\theta\); however, when \(r_\theta\) is lower than \(2\), it is set to \(2\).

nonpara_farr_estimates <- mapply(function(ilon, ilat){
                                 estim_farr.np(x = tasmax_nat[ilon, ilat, ],
                                                z = tasmax_all[ilon, ilat, ],
                                                rp = max(rp_mat[ilon, ilat], 2))
                                 }, ilon = ilonlat$lon, ilat = ilonlat$lat, 
                                 SIMPLIFY = FALSE)
str(nonpara_farr_estimates[[1]])
## List of 3
##  $ rp            : num 2.55
##  $ farr_hat      : Named num 0.224
##   ..- attr(*, "names")= chr "farr_2.5470618824753"
##  $ sigma_farr_hat: num 0.117
##  - attr(*, "class")= chr [1:2] "farrfit_np" "farrfit"
# Extracting the far estimates and their significance
nonpara_rp_mat <- pmax(rp_mat, 2)

nonpara_far_mat <- vapply(nonpara_farr_estimates,
                      FUN = function(x) x$farr_hat,
                      FUN.VALUE = 1)
nonpara_far_mat <- matrix(nonpara_far_mat, nrow = length(lon), ncol = length(lat))

inonparafar_signif_mat <- vapply(nonpara_farr_estimates,
                      FUN = function(x){
                        cf <- confint(x)
                        !(0 >= cf[, 1] & 0 <= cf[, 2])
                      }, FUN.VALUE = 1)
inonparafar_signif_mat <- matrix(inonparafar_signif_mat, nrow = length(lon), ncol = length(lat))

zlim_rp <- range(rp_mat)
pal_rp <- c('#fcfbfd','#efedf5','#dadaeb','#bcbddc','#9e9ac8','#807dba','#6a51a3','#54278f','#3f007d')
# from http://colorbrewer2.org
par(mfrow = c(1, 3))
image.plot(x = lon, y = lat, 
           z = nonpara_rp_mat, 
           zlim = zlim_rp, 
           legend.lab = "r_theta", horizontal = TRUE,
           col = pal_rp,
           main  = "Return period r_theta \n for the far" )
lines(worldmap)

# from http://colorbrewer2.org
pal_far <- c('#762a83','#9970ab','#c2a5cf','#e7d4e8','#f7f7f7','#d9f0d3','#a6dba0','#5aae61','#1b7837')
image.plot(x = lon, y = lat, 
           z = nonpara_far_mat, 
           zlim = zlim_far, 
           legend.lab = "far(r_theta)", horizontal = TRUE,  
           col = pal_far, nlevel = length(pal_far),
           main  = "far(r_theta) with \n non-parametric estimation") 
lines(worldmap)

image.plot(x = lon, y = lat, 
           z = inonparafar_signif_mat, 
           legend.lab = "0 = false, 1 = true", horizontal = TRUE,  
           nlevel = 2, breaks = c(0, 0.5, 1),
           col = c('#eeeeee', '#000000'),
           main  = "Is far(r_theta) significant \n at significance level 5% ?" )
lines(worldmap)

The results of the non-parametric estimation of the \(far\) are similar to those obtained with the parametric method under the hypothesis that \(W\) follows an exponential distribution. The spatial distribution of the \(far\) is similar and, in both cases, none of the estimated \(far\) is shown to be significantly different from \(0\), meaning that there is no clear difference in terms of records of yearly \(tasmax\) between the factual and counterfactual worlds.

Conclusion

In this vignette, we wanted to provide a quick presentation of the \(far\), the fraction of attributable risk for records, and to show how studies of the \(far\) can be performed with the R package farr. We hope that we managed to give a feel for how the package works and how it can be applied in the context of climate sciences.

Any questions, comments or suggestions are welcome and can be sent to sthao@lsce.ipsl.fr.

References

Bindoff, N.L., P.A. Stott, K.M. AchutaRao, M.R. Allen, N. Gillett, D. Gutzler, K. Hansingo, et al. 2013. “Detection and Attribution of Climate Change: From Global to Regional.” In Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change, edited by T.F. Stocker, D. Qin, G.-K. Plattner, M. Tignor, S.K. Allen, J. Boschung, A. Nauels, Y. Xia, V. Bex, and P.M. Midgley, 867–952. Cambridge, United Kingdom; New York, NY, USA: Cambridge University Press. doi:10.1017/CBO9781107415324.022.

Cox, D. R., and D. Oakes. 1984. Analysis of Survival Data. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. New York: Chapman & Hall.

Hannart, A., J. Pearl, F. E. L. Otto, P. Naveau, and M. Ghil. 2016. “Causal Counterfactual Theory for the Attribution of Weather and Climate-Related Events.” Bulletin of the American Meteorological Society 97 (1): 99–110. doi:10.1175/BAMS-D-14-00034.1.

Naveau, Philippe, Aurélien Ribes, Francis Zwiers, Alexis Hannart, Alexandre Tuel, and Pascal Yiou. 2018. “Revising Return Periods for Record Events in a Climate Event Attribution Context.” Journal of Climate 31 (9): 3411–22. doi:10.1175/JCLI-D-16-0752.1.

Pearl, Judea. 1999. “Probabilities of Causation: Three Counterfactual Interpretations and Their Identification.” Synthese 121 (1): 93–149. doi:10.1023/A:1005233831499.

Resnick, S. I. 1987. Extreme Values, Regular Variation, and Point Processes. Springer-Verlag.

Stott, Peter A., Nikolaos Christidis, Friederike E. L. Otto, Ying Sun, Jean-Paul Vanderlinden, Geert Jan van Oldenborgh, Robert Vautard, et al. 2016. “Attribution of Extreme Weather and Climate-Related Events: Attribution of Extreme Weather and Climate-Related Events.” Wiley Interdisciplinary Reviews: Climate Change 7 (1): 23–41. doi:10.1002/wcc.380.

Stott, Peter A., D. A. Stone, and M. R. Allen. 2004. “Human Contribution to the European Heatwave of 2003.” Nature 432 (7017): 610–14. doi:10.1038/nature03089.

Voldoire, A., E. Sanchez-Gomez, D. Salas y Mélia, B. Decharme, C. Cassou, S. Sénési, S. Valcke, et al. 2013. “The CNRM-CM5.1 Global Climate Model: Description and Basic Evaluation.” Climate Dynamics 40 (9-10): 2091–2121. doi:10.1007/s00382-011-1259-y.