Title: | Smooth Additive Quantile Regression Models |
---|---|
Description: | Smooth additive quantile regression models, fitted using the methods of Fasiolo et al. (2020) <doi:10.1080/01621459.2020.1725521>. See Fasiolo at al. (2021) <doi:10.18637/jss.v100.i09> for an introduction to the package. Differently from 'quantreg', the smoothing parameters are estimated automatically by marginal loss minimization, while the regression coefficients are estimated using either PIRLS or Newton algorithm. The learning rate is determined so that the Bayesian credible intervals of the estimated effects have approximately the correct coverage. The main function is qgam() which is similar to gam() in 'mgcv', but fits non-parametric quantile regression models. |
Authors: | Matteo Fasiolo [aut, cre], Ben Griffiths [aut], Simon N. Wood [ctb], Margaux Zaffran [ctb], Yannig Goude [ctb], Raphael Nedellec [ctb] |
Maintainer: | Matteo Fasiolo <[email protected]> |
License: | GPL (>=2) |
Version: | 2.0.0 |
Built: | 2025-02-03 05:11:01 UTC |
Source: | https://github.com/mfasiolo/qgam |
Data set on electricity demand from Sidney, Australia. The data has been downloaded from https://www.ausgrid.com.au, and it originally contained electricity demand from 300 customers, at 30min resolution. We discarded 53 customers because their demand was too irregular, and we integrated the demand data with temperature data from the National Climatic Data Center, covering the same period.
data(AUDem)
data(AUDem)
AUDem
is a list, where AUDem$meanDem
is a data.frame
containing the following variables:
the day of the year, from 1 to 365;
the time of day, ranging from 18 to 22, where 18 indicates the period from 17:00 to 17:30, 18.5 the period from 17:30 to 18:00 and so on;
the demand (in KW) during a 30min period, averaged over the 247 households;
factor variable indicating the day of the week;
the external temperature at Sidney airport, in degrees Celsius;
local date and time;
the lagged mean demand, that is the average demand (dem) during the same 30min period of the previous day;
The second element is AUDem$qDem48
which is a matrix with as many rows as AUDem$meanDem
. Each rows contains 20 equally spaced empirical quantiles of the lagged individual electricity demand of the 247 customers.
A list where AUDem$meanDem
is a data.frame and AUDem$qDem48
a matrix.
library(qgam) data(AUDem) # Mean demand over the period plot(AUDem$meanDem$dem, type = 'l') # 20 quantiles of individual demand over 5 days matplot(seq(0.01, 0.99, length.out = 20), t(AUDem$qDem48[c(1, 50, 75, 100, 250), ]), type = 'l', ylab = "Electricity demand (KW)", xlab = expression("Probability level " * "(p)"), lty = 1)
library(qgam) data(AUDem) # Mean demand over the period plot(AUDem$meanDem$dem, type = 'l') # 20 quantiles of individual demand over 5 days matplot(seq(0.01, 0.99, length.out = 20), t(AUDem$qDem48[c(1, 50, 75, 100, 250), ]), type = 'l', ylab = "Electricity demand (KW)", xlab = expression("Probability level " * "(p)"), lty = 1)
Generic function for checking R objects which produces, for instance, convergence tests or diagnostic plots.
For qgam
objects check.qgam()
will be used.
check(obj, ...)
check(obj, ...)
obj |
the object to be checked. |
... |
extra arguments, mainly used by graphic functions. |
Reports the results of convergence tests and/or produces diagnostic plots.
Matteo Fasiolo <[email protected]>.
####### # Using check.qgam ####### library(qgam) set.seed(0) dat <- gamSim(1, n=200) b<-qgam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = 0.5) plot(b, pages=1) check(b, pch=19, cex=.3)
####### # Using check.qgam ####### library(qgam) set.seed(0) dat <- gamSim(1, n=200) b<-qgam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = 0.5) plot(b, pages=1) check(b, pch=19, cex=.3)
Provides some visual plots showing how the calibration criterion and the effective degrees of freedom of each smooth component vary with the learning rate.
## S3 method for class 'learn' check(obj, sel = 1:2, ...)
## S3 method for class 'learn' check(obj, sel = 1:2, ...)
obj |
the output of a call to |
sel |
this function produces two plots, set this parameter to 1 to plot only the first, to 2 to plot only the second or leave it to 1:2 to plot both. |
... |
currently not used, here only for compatibility reasons. |
The first plot shows how the calibrations loss, which we are trying to minimize, varies with the
log learning rate. This function should look quite smooth, if it doesn't then try to increase
err
or control$K
(the number of bootstrap samples) in the original call to
tuneLearn
. The second plot shows how the effective degrees of freedom of each smooth term
vary with log(sigma). Generally as log(sigma) increases the complexity of the fit decreases, hence
the slope is negative.
It produces several plots.
Matteo Fasiolo <[email protected]>.
Fasiolo, M., Wood, S.N., Zaffran, M., Nedellec, R. and Goude, Y., 2020. Fast calibrated additive quantile regression. Journal of the American Statistical Association (to appear). https://www.tandfonline.com/doi/full/10.1080/01621459.2020.1725521.
library(qgam) set.seed(525) dat <- gamSim(1, n=200) b <- tuneLearn(lsig = seq(-0.5, 1, length.out = 10), y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = 0.5) check(b)
library(qgam) set.seed(525) dat <- gamSim(1, n=200) b <- tuneLearn(lsig = seq(-0.5, 1, length.out = 10), y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = 0.5) check(b)
Provides some visual checks to verify whether the Brent optimizer used by tuneLearnFast()
worked correctly.
## S3 method for class 'learnFast' check(obj, sel = NULL, ...)
## S3 method for class 'learnFast' check(obj, sel = NULL, ...)
obj |
the output of a call to |
sel |
integer vector determining which of the plots will be produced. For instance if |
... |
currently not used, here only for compatibility reasons. |
The top plot in the first page shows the bracket used to estimate log(sigma) for each quantile.
The brackets are delimited by the crosses and the red dots are the estimates. If a dot falls very close to one of the crosses,
that might indicate problems. The bottom plot shows, for each quantile, the value of parameter err
used. Sometimes the algorithm
needs to increase err
above its user-defined value to achieve convergence. Subsequent plots show, for each quantile, the value
of the loss function corresponding to each value of log(sigma) explored by Brent algorithm.
It produces several plots.
Matteo Fasiolo <[email protected]>.
Fasiolo, M., Wood, S.N., Zaffran, M., Nedellec, R. and Goude, Y., 2020. Fast calibrated additive quantile regression. Journal of the American Statistical Association (to appear). https://www.tandfonline.com/doi/full/10.1080/01621459.2020.1725521.
library(qgam) set.seed(525) dat <- gamSim(1, n=200) b <- tuneLearnFast(y ~ s(x0)+s(x1)+s(x2)+s(x3), data = dat, qu = c(0.4, 0.5), control = list("tol" = 0.05)) # <- sloppy tolerance to speed-up calibration check(b) check(b, 3) # Produces only third plot
library(qgam) set.seed(525) dat <- gamSim(1, n=200) b <- tuneLearnFast(y ~ s(x0)+s(x1)+s(x2)+s(x3), data = dat, qu = c(0.4, 0.5), control = list("tol" = 0.05)) # <- sloppy tolerance to speed-up calibration check(b) check(b, 3) # Produces only third plot
Takes a fitted gam object produced by qgam()
and produces some diagnostic information
about the fitting procedure and results. It is partially based on mgcv::gam.check
.
## S3 method for class 'qgam' check(obj, nbin = 10, lev = 0.05, ...)
## S3 method for class 'qgam' check(obj, nbin = 10, lev = 0.05, ...)
obj |
the output of a |
nbin |
number of bins used in the internal call to |
lev |
the significance levels used by |
... |
extra arguments to be passed to |
This function provides two plots. The first shows how the number of responses falling below the fitted
quantile (y-axis) changes with the fitted quantile (x-axis). To be clear: if the quantile is fixed to, say, 0.5
we expect 50% of the responses to fall below the fit. See ?cqcheck()
for details. The second plot related
to |F(hat(mu)) - F(mu0)|
, which is the absolute bias attributable to the fact that qgam is using
a smoothed version of the pinball-loss. The absolute bias is evaluated at each observation, and an histogram
is produced. See Fasiolo et al. (2017) for details. The function also prints out the integrated absolute bias,
and the proportion of observations lying below the regression line. It also provides some convergence
diagnostics (regarding the optimization), which are the same as in mgcv::gam.check
.
It reports also the maximum (k') and the selected degrees of freedom of each smooth term.
Simply produces some plots and prints out some diagnostics.
Matteo Fasiolo <[email protected]>, Simon N. Wood.
Fasiolo, M., Wood, S.N., Zaffran, M., Nedellec, R. and Goude, Y., 2020. Fast calibrated additive quantile regression. Journal of the American Statistical Association (to appear). https://www.tandfonline.com/doi/full/10.1080/01621459.2020.1725521.
library(qgam) set.seed(0) dat <- gamSim(1, n=200) b<-qgam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = 0.5) plot(b, pages=1) check.qgam(b, pch=19, cex=.3)
library(qgam) set.seed(0) dat <- gamSim(1, n=200) b<-qgam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = 0.5) plot(b, pages=1) check.qgam(b, pch=19, cex=.3)
Given an additive quantile model, fitted using qgam
, cqcheck
provides some plots
that allow to check what proportion of responses, y
, falls below the fitted quantile.
cqcheck( obj, v, X = NULL, y = NULL, nbin = c(10, 10), bound = NULL, lev = 0.05, scatter = FALSE, ... )
cqcheck( obj, v, X = NULL, y = NULL, nbin = c(10, 10), bound = NULL, lev = 0.05, scatter = FALSE, ... )
obj |
the output of a |
v |
if a 1D plot is required, |
X |
a dataframe containing the data used to obtain the conditional quantiles. By default it is NULL, in which
case predictions are made using the model matrix in |
y |
vector of responses. Its i-th entry corresponds to the i-th row of X. By default it is NULL, in which
case it is internally set to |
nbin |
a vector of integers of length one (1D case) or two (2D case) indicating the number of bins to be used
in each direction. Used only if |
bound |
in the 1D case it is a numeric vector whose increasing entries represent the bounds of each bin.
In the 2D case a list of two vectors should be provided. |
lev |
the significance levels used in the plots, this determines the width of the confidence intervals. Default is 0.05. |
scatter |
if TRUE a scatterplot is added (using the |
... |
extra graphical parameters to be passed to |
Having fitted an additive model for, say, quantile qu=0.4
one would expect that about 40
responses fall below the fitted quantile. This function allows to visually compare the empirical number
of responses (qu_hat
) falling below the fit with its theoretical value (qu
). In particular,
the responses are binned, which the bins being constructed along one or two variables (given be arguments
v
). Let (qu_hat[i]
) be the proportion of responses below the fitted quantile in the ith bin.
This should be approximately equal to qu
, for every i. In the 1D case, when v
is a single
character or a numeric vector, cqcheck
provides a plot where: the horizontal line is qu
,
the dots correspond to qu_hat[i]
and the grey lines are confidence intervals for qu
. The
confidence intervals are based on qbinom(lev/2, siz, qu)
, if the dots fall outside them, then
qu_hat[i]
might be deviating too much from qu
. In the 2D case, when v
is a vector of two
characters or a matrix with two columns, we plot a grid of bins. The responses are divided between the bins
as before, but now don't plot the confidence intervals. Instead we report the empirical proportions qu_hat[i]
for the non-empty bin, and with colour the bins in red if qu_hat[i]<qu
and in green otherwise. If
qu_hat[i]
falls outside the confidence intervals we put an * next to the numeric qu_hat[i]
and
we use more intense colours.
Simply produces a plot.
Matteo Fasiolo <[email protected]>.
####### # Bivariate additive model y~1+x+x^2+z+x*z/2+e, e~N(0, 1) ####### ## Not run: library(qgam) set.seed(15560) n <- 500 x <- rnorm(n, 0, 1); z <- rnorm(n) X <- cbind(1, x, x^2, z, x*z) beta <- c(0, 1, 1, 1, 0.5) y <- drop(X %*% beta) + rnorm(n) dataf <- data.frame(cbind(y, x, z)) names(dataf) <- c("y", "x", "z") #### Fit a constant model for median qu <- 0.5 fit <- qgam(y~1, qu = qu, data = dataf) # Look at what happens along x: clearly there is non linear pattern here cqcheck(obj = fit, v = c("x"), X = dataf, y = y) #### Add a smooth for x fit <- qgam(y~s(x), qu = qu, data = dataf) cqcheck(obj = fit, v = c("x"), X = dataf, y = y) # Better! # Lets look across x and z. As we move along z (x2 in the plot) # the colour changes from green to red cqcheck(obj = fit, v = c("x", "z"), X = dataf, y = y, nbin = c(5, 5)) # The effect look pretty linear cqcheck(obj = fit, v = c("z"), X = dataf, y = y, nbin = c(10)) #### Lets add a linear effect for z fit <- qgam(y~s(x)+z, qu = qu, data = dataf) # Looks better! cqcheck(obj = fit, v = c("z")) # Lets look across x and y again: green prevails on the top-left to bottom-right # diagonal, while the other diagonal is mainly red. cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5)) ### Maybe adding an interaction would help? fit <- qgam(y~s(x)+z+I(x*z), qu = qu, data = dataf) # It does! The real model is: y ~ 1 + x + x^2 + z + x*z/2 + e, e ~ N(0, 1) cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5)) ## End(Not run)
####### # Bivariate additive model y~1+x+x^2+z+x*z/2+e, e~N(0, 1) ####### ## Not run: library(qgam) set.seed(15560) n <- 500 x <- rnorm(n, 0, 1); z <- rnorm(n) X <- cbind(1, x, x^2, z, x*z) beta <- c(0, 1, 1, 1, 0.5) y <- drop(X %*% beta) + rnorm(n) dataf <- data.frame(cbind(y, x, z)) names(dataf) <- c("y", "x", "z") #### Fit a constant model for median qu <- 0.5 fit <- qgam(y~1, qu = qu, data = dataf) # Look at what happens along x: clearly there is non linear pattern here cqcheck(obj = fit, v = c("x"), X = dataf, y = y) #### Add a smooth for x fit <- qgam(y~s(x), qu = qu, data = dataf) cqcheck(obj = fit, v = c("x"), X = dataf, y = y) # Better! # Lets look across x and z. As we move along z (x2 in the plot) # the colour changes from green to red cqcheck(obj = fit, v = c("x", "z"), X = dataf, y = y, nbin = c(5, 5)) # The effect look pretty linear cqcheck(obj = fit, v = c("z"), X = dataf, y = y, nbin = c(10)) #### Lets add a linear effect for z fit <- qgam(y~s(x)+z, qu = qu, data = dataf) # Looks better! cqcheck(obj = fit, v = c("z")) # Lets look across x and y again: green prevails on the top-left to bottom-right # diagonal, while the other diagonal is mainly red. cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5)) ### Maybe adding an interaction would help? fit <- qgam(y~s(x)+z+I(x*z), qu = qu, data = dataf) # It does! The real model is: y ~ 1 + x + x^2 + z + x*z/2 + e, e ~ N(0, 1) cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5)) ## End(Not run)
Given an additive quantile model, fitted using qgam
, cqcheck2DI
provides some interactive
2D plots that allow to check what proportion of responses, y
, falls below the fitted quantile.
This is an interactive version of the cqcheck
function.
cqcheckI( obj, v, X = NULL, y = NULL, run = TRUE, width = "100%", height = "680px" )
cqcheckI( obj, v, X = NULL, y = NULL, run = TRUE, width = "100%", height = "680px" )
obj |
the output of a |
v |
if a 1D plot is required, |
X |
a dataframe containing the data used to obtain the conditional quantiles. By default it is NULL, in which
case predictions are made using the model matrix in |
y |
vector of responses. Its i-th entry corresponds to the i-th row of X. By default it is NULL, in which
case it is internally set to |
run |
if TRUE (default) the function produces an interactive plot, otherwise it returns the corresponding shiny app. |
width |
the width of the main plot. Default is "100%". |
height |
width the width of the main plot. Default is "680px". |
This is an interactive version of the cqcheck
, see ?cqcheck
for details. The main interactive
feature is that one can select an area by brushing, and then double-click to zoom in. In the 1D case the vertical
part of the selected area is not use: we zoom only along the x axis. Double-clicking without brushing zooms out.
Simply produces an interactive plot.
Matteo Fasiolo <[email protected]>.
## Not run: ####### # Example 1: Bivariate additive model y~1+x+x^2+z+x*z/2+e, e~N(0, 1) ####### library(qgam) set.seed(15560) n <- 1000 x <- rnorm(n, 0, 1); z <- rnorm(n) X <- cbind(1, x, x^2, z, x*z) beta <- c(0, 1, 1, 1, 0.5) y <- drop(X %*% beta) + rnorm(n) dataf <- data.frame(cbind(y, x, z)) names(dataf) <- c("y", "x", "z") #### Fit a constant model for median qu <- 0.5 fit <- qgam(y~1, qu = qu, data = dataf) # Look at what happens along x: clearly there is non linear pattern here cqcheckI(obj = fit, v = c("x"), X = dataf, y = y) #### Add a smooth for x fit <- qgam(y~s(x), qu = qu, data = dataf) cqcheckI(obj = fit, v = c("x"), X = dataf, y = y) # Better! # Lets look across across x and z. As we move along z (x2 in the plot) # the colour changes from green to red cqcheckI(obj = fit, v = c("x", "z"), X = dataf, y = y) # The effect look pretty linear cqcheckI(obj = fit, v = c("z"), X = dataf, y = y) #### Lets add a linear effect for z fit <- qgam(y~s(x)+z, qu = qu, data = dataf) # Looks better! cqcheckI(obj = fit, v = c("z")) # Lets look across x and y again: green prevails on the top-left to bottom-right # diagonal, while the other diagonal is mainly red. cqcheckI(obj = fit, v = c("x", "z")) ### Maybe adding an interaction would help? fit <- qgam(y~s(x)+z+I(x*z), qu = qu, data = dataf) # It does! The real model is: y ~ 1 + x + x^2 + z + x*z/2 + e, e ~ N(0, 1) cqcheckI(obj = fit, v = c("x", "z")) ## End(Not run)
## Not run: ####### # Example 1: Bivariate additive model y~1+x+x^2+z+x*z/2+e, e~N(0, 1) ####### library(qgam) set.seed(15560) n <- 1000 x <- rnorm(n, 0, 1); z <- rnorm(n) X <- cbind(1, x, x^2, z, x*z) beta <- c(0, 1, 1, 1, 0.5) y <- drop(X %*% beta) + rnorm(n) dataf <- data.frame(cbind(y, x, z)) names(dataf) <- c("y", "x", "z") #### Fit a constant model for median qu <- 0.5 fit <- qgam(y~1, qu = qu, data = dataf) # Look at what happens along x: clearly there is non linear pattern here cqcheckI(obj = fit, v = c("x"), X = dataf, y = y) #### Add a smooth for x fit <- qgam(y~s(x), qu = qu, data = dataf) cqcheckI(obj = fit, v = c("x"), X = dataf, y = y) # Better! # Lets look across across x and z. As we move along z (x2 in the plot) # the colour changes from green to red cqcheckI(obj = fit, v = c("x", "z"), X = dataf, y = y) # The effect look pretty linear cqcheckI(obj = fit, v = c("z"), X = dataf, y = y) #### Lets add a linear effect for z fit <- qgam(y~s(x)+z, qu = qu, data = dataf) # Looks better! cqcheckI(obj = fit, v = c("z")) # Lets look across x and y again: green prevails on the top-left to bottom-right # diagonal, while the other diagonal is mainly red. cqcheckI(obj = fit, v = c("x", "z")) ### Maybe adding an interaction would help? fit <- qgam(y~s(x)+z+I(x*z), qu = qu, data = dataf) # It does! The real model is: y ~ 1 + x + x^2 + z + x*z/2 + e, e ~ N(0, 1) cqcheckI(obj = fit, v = c("x", "z")) ## End(Not run)
The elf
family implements the Extended log-F density of Fasiolo et al. (2017) and it is supposed
to work in conjuction with the extended GAM methods of Wood et al. (2017), implemented by
mgcv
. It differs from the elflss
family, because here the scale of the density (sigma, aka the learning rate) is a single scalar,
while in elflss
it can depend on the covariates. At the moment the family is mainly intended for internal use,
use the qgam
function to fit quantile GAMs based on ELF.
elf(theta = NULL, link = "identity", qu, co)
elf(theta = NULL, link = "identity", qu, co)
theta |
a scalar representing the log-scale log(sigma). |
link |
the link function between the linear predictor and the quantile location. |
qu |
parameter in (0, 1) representing the chosen quantile. For instance, to fit the median choose |
co |
positive constant used to determine parameter lambda of the ELF density (lambda = co / sigma). Can be vector valued. |
This function is meant for internal use only.
An object inheriting from mgcv's class extended.family
.
Matteo Fasiolo <[email protected]> and Simon N. Wood.
Fasiolo, M., Wood, S.N., Zaffran, M., Nedellec, R. and Goude, Y., 2020. Fast calibrated additive quantile regression. Journal of the American Statistical Association (to appear). https://www.tandfonline.com/doi/full/10.1080/01621459.2020.1725521.
Wood, Simon N., Pya, N. and Safken, B. (2017). Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association.
library(qgam) set.seed(2) dat <- gamSim(1,n=400,dist="normal",scale=2) # Fit median using elf directly: FAST BUT NOT RECOMMENDED fit <- gam(y~s(x0)+s(x1)+s(x2)+s(x3), family = elf(co = 0.1, qu = 0.5), data = dat) plot(fit, scale = FALSE, pages = 1) # Using qgam: RECOMMENDED fit <- qgam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = 0.8) plot(fit, scale = FALSE, pages = 1)
library(qgam) set.seed(2) dat <- gamSim(1,n=400,dist="normal",scale=2) # Fit median using elf directly: FAST BUT NOT RECOMMENDED fit <- gam(y~s(x0)+s(x1)+s(x2)+s(x3), family = elf(co = 0.1, qu = 0.5), data = dat) plot(fit, scale = FALSE, pages = 1) # Using qgam: RECOMMENDED fit <- qgam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = 0.8) plot(fit, scale = FALSE, pages = 1)
The elflss
family implements the Extended log-F (ELF) density of Fasiolo et al. (2017) and it is supposed
to work in conjuction with the general GAM fitting methods of Wood et al. (2017), implemented by
mgcv
. It differs from the elf
family, because here the scale of the density
(sigma, aka the learning rate) can depend of the covariates, while in
while in elf
it is a single scalar. NB this function was use within the qgam
function, but
since qgam
version 1.3 quantile models with varying learning rate are fitted using different methods
(a parametric location-scale model, see Fasiolo et al. (2017) for details.).
elflss(link = list("identity", "log"), qu, co, theta, remInter = TRUE)
elflss(link = list("identity", "log"), qu, co, theta, remInter = TRUE)
link |
vector of two characters indicating the link function for the quantile location and for the log-scale. |
qu |
parameter in (0, 1) representing the chosen quantile. For instance, to fit the median choose |
co |
positive vector of constants used to determine parameter lambda of the ELF density (lambda = co / sigma). |
theta |
a scalar representing the intercept of the model for the log-scale log(sigma). |
remInter |
if TRUE the intercept of the log-scale model is removed. |
This function is meant for internal use only.
An object inheriting from mgcv's class general.family
.
Matteo Fasiolo <[email protected]> and Simon N. Wood.
Fasiolo, M., Wood, S.N., Zaffran, M., Nedellec, R. and Goude, Y., 2020. Fast calibrated additive quantile regression. Journal of the American Statistical Association (to appear). https://www.tandfonline.com/doi/full/10.1080/01621459.2020.1725521.
Wood, Simon N., Pya, N. and Safken, B. (2017). Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association.
## Not run: set.seed(651) n <- 1000 x <- seq(-4, 3, length.out = n) X <- cbind(1, x, x^2) beta <- c(0, 1, 1) sigma = 1.2 + sin(2*x) f <- drop(X %*% beta) dat <- f + rnorm(n, 0, sigma) dataf <- data.frame(cbind(dat, x)) names(dataf) <- c("y", "x") # Fit median using elflss directly: NOT RECOMMENDED fit <- gam(list(y~s(x, bs = "cr"), ~ s(x, bs = "cr")), family = elflss(theta = 0, co = rep(0.2, n), qu = 0.5), data = dataf) plot(x, dat, col = "grey", ylab = "y") tmp <- predict(fit, se = TRUE) lines(x, tmp$fit[ , 1]) lines(x, tmp$fit[ , 1] + 3 * tmp$se.fit[ , 1], col = 2) lines(x, tmp$fit[ , 1] - 3 * tmp$se.fit[ , 1], col = 2) ## End(Not run)
## Not run: set.seed(651) n <- 1000 x <- seq(-4, 3, length.out = n) X <- cbind(1, x, x^2) beta <- c(0, 1, 1) sigma = 1.2 + sin(2*x) f <- drop(X %*% beta) dat <- f + rnorm(n, 0, sigma) dataf <- data.frame(cbind(dat, x)) names(dataf) <- c("y", "x") # Fit median using elflss directly: NOT RECOMMENDED fit <- gam(list(y~s(x, bs = "cr"), ~ s(x, bs = "cr")), family = elflss(theta = 0, co = rep(0.2, n), qu = 0.5), data = dataf) plot(x, dat, col = "grey", ylab = "y") tmp <- predict(fit, se = TRUE) lines(x, tmp$fit[ , 1]) lines(x, tmp$fit[ , 1] + 3 * tmp$se.fit[ , 1], col = 2) lines(x, tmp$fit[ , 1] - 3 * tmp$se.fit[ , 1], col = 2) ## End(Not run)
Calculates log(1+exp(x))
in a numerically stable fashion.
log1pexp(x)
log1pexp(x)
x |
a numeric vector. |
We follow the recipe of Machler (2012), that is formula (10) page 7.
A numeric vector where the i-th entry is equal to log(1+exp(x[i]))
, but computed more stably.
Matteo Fasiolo <[email protected]>.
Machler, M. (2012). Accurately computing log(1-exp(-|a|)). URL: https://cran.r-project.org/package=Rmpfr/vignettes/log1mexp-note.pdf.
set.seed(141) library(qgam); x <- rnorm(100, 0, 100) log1pexp(x) - log1p(exp(x))
set.seed(141) library(qgam); x <- rnorm(100, 0, 100) log1pexp(x) - log1p(exp(x))
This function fits a smooth additive regression model to several quantiles.
mqgam( form, data, qu, discrete = FALSE, lsig = NULL, err = NULL, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, paropts = list(), control = list(), argGam = NULL )
mqgam( form, data, qu, discrete = FALSE, lsig = NULL, err = NULL, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, paropts = list(), control = list(), argGam = NULL )
form |
A GAM formula, or a list of formulae. See ?mgcv::gam details. |
data |
A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called. |
qu |
A vectors of quantiles of interest. Each entry should be in (0, 1). |
discrete |
If TRUE then covariate discretisation is used for faster model fitting. See |
lsig |
The value of the log learning rate used to create the Gibbs posterior. By defauls |
err |
An upper bound on the error of the estimated quantile curve. Should be in (0, 1). If it is a vector, it should be of the
same length of |
multicore |
If TRUE the calibration will happen in parallel. |
cluster |
An object of class |
ncores |
Number of cores used. Relevant if |
paropts |
a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing. |
control |
A list of control parameters. The only one relevant here is |
argGam |
A list of parameters to be passed to |
A list with entries:
fit
= a gamObject
, one for each entry of qu
. Notice that the
slots model
and smooth
of each object has been removed to save memory.
See ?gamObject
.
model
= the model
slot of the gamObject
s in the fit
slot. This is the same for every
fit, hence only one copy is stored.
smooth
= the smooth
slot of the gamObject
s in the fit
slot. This is the same for every
fit, hence only one copy is stored.
calibr
= a list which is the output of an internal call to tuneLearnFast
, which is used for calibrating
the learning rate. See ?tuneLearnFast
for details.
Matteo Fasiolo <[email protected]>.
Fasiolo, M., Wood, S.N., Zaffran, M., Nedellec, R. and Goude, Y., 2020. Fast calibrated additive quantile regression. Journal of the American Statistical Association (to appear). https://www.tandfonline.com/doi/full/10.1080/01621459.2020.1725521.
Fasiolo, M., Wood, S.N., Zaffran, M., Nedellec, R. and Goude, Y., 2021. qgam: Bayesian Nonparametric Quantile Regression Modeling in R. Journal of Statistical Software, 100(9), 1-31, doi:10.18637/jss.v100.i09.
##### # Multivariate Gaussian example #### library(qgam) set.seed(2) dat <- gamSim(1, n=300, dist="normal", scale=2) fit <- mqgam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = c(0.2, 0.8)) invisible( qdo(fit, 0.2, plot, pages = 1) ) ##### # Univariate "car" example #### library(qgam); library(MASS) # Fit for quantile 0.8 using the best sigma quSeq <- c(0.2, 0.4, 0.6, 0.8) set.seed(6436) fit <- mqgam(accel~s(times, k=20, bs="ad"), data = mcycle, qu = quSeq) # Plot the fit xSeq <- data.frame(cbind("accel" = rep(0, 1e3), "times" = seq(2, 58, length.out = 1e3))) plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) for(iq in quSeq){ pred <- qdo(fit, iq, predict, newdata = xSeq) lines(xSeq$times, pred, col = 2) }
##### # Multivariate Gaussian example #### library(qgam) set.seed(2) dat <- gamSim(1, n=300, dist="normal", scale=2) fit <- mqgam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = c(0.2, 0.8)) invisible( qdo(fit, 0.2, plot, pages = 1) ) ##### # Univariate "car" example #### library(qgam); library(MASS) # Fit for quantile 0.8 using the best sigma quSeq <- c(0.2, 0.4, 0.6, 0.8) set.seed(6436) fit <- mqgam(accel~s(times, k=20, bs="ad"), data = mcycle, qu = quSeq) # Plot the fit xSeq <- data.frame(cbind("accel" = rep(0, 1e3), "times" = seq(2, 58, length.out = 1e3))) plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) for(iq in quSeq){ pred <- qdo(fit, iq, predict, newdata = xSeq) lines(xSeq$times, pred, col = 2) }
Evaluates the pinball loss.
pinLoss(y, mu, qu, add = TRUE)
pinLoss(y, mu, qu, add = TRUE)
y |
points at which the loss is evaluated. |
mu |
location parameter of the pinball loss. |
qu |
quantile level of the loss. |
add |
if TRUE the losses at which quantile level will be added up. |
A numeric vector or matrix of evaluate losses.
Matteo Fasiolo <[email protected]>.
n <- 1000 x <- seq(0, 4, length.out = n) plot(x, pinLoss(x, rep(2, n), qu = 0.9, add = FALSE), type = 'l', ylab = "loss")
n <- 1000 x <- seq(0, 4, length.out = n) plot(x, pinLoss(x, rep(2, n), qu = 0.9, add = FALSE), type = 'l', ylab = "loss")
mqgam
Contrary to qgam
, mqgam
does not output a standard gamObject
, hence
methods such as predict.gam
or plot.gam
cannot be used directly. qdo
provides a simple wrapper for such methods.
qdo(obj, qu = NULL, fun = I, ...)
qdo(obj, qu = NULL, fun = I, ...)
obj |
the output of a |
qu |
A vector whose elements must be in (0, 1). Each element indicates a quantile of interest,
which should be an element of |
fun |
The method or function that we want to use on the |
... |
Additional arguments to be passed to |
A list where the i-th entry is the output of fun
(whatever that is) corresponding to quantile qu[i]
.
Matteo Fasiolo <[email protected]>.
library(qgam); library(MASS) quSeq <- c(0.4, 0.6) set.seed(737) fit <- mqgam(accel~s(times, k=20, bs="ad"), data = mcycle, qu = quSeq) qdo(fit, 0.4, summary) invisible(qdo(fit, 0.4, plot, pages = 1)) # Return the object for qu = 0.6 and then plot it tmp <- qdo(fit, 0.6) plot(tmp)
library(qgam); library(MASS) quSeq <- c(0.4, 0.6) set.seed(737) fit <- mqgam(accel~s(times, k=20, bs="ad"), data = mcycle, qu = quSeq) qdo(fit, 0.4, summary) invisible(qdo(fit, 0.4, plot, pages = 1)) # Return the object for qu = 0.6 and then plot it tmp <- qdo(fit, 0.6) plot(tmp)
This function fits a smooth additive regression model for a single quantile.
qgam( form, data, qu, discrete = FALSE, lsig = NULL, err = NULL, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, paropts = list(), control = list(), argGam = NULL )
qgam( form, data, qu, discrete = FALSE, lsig = NULL, err = NULL, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, paropts = list(), control = list(), argGam = NULL )
form |
A GAM formula, or a list of formulae. See ?mgcv::gam details. |
data |
A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called. |
qu |
The quantile of interest. Should be in (0, 1). |
discrete |
If TRUE then covariate discretisation is used for faster model fitting. See |
lsig |
The value of the log learning rate used to create the Gibbs posterior. By defauls |
err |
An upper bound on the error of the estimated quantile curve. Should be in (0, 1).
Since qgam v1.3 it is selected automatically, using the methods of Fasiolo et al. (2017).
The old default was |
multicore |
If TRUE the calibration will happen in parallel. |
cluster |
An object of class |
ncores |
Number of cores used. Relevant if |
paropts |
a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing. |
control |
A list of control parameters. The only one relevant here is |
argGam |
A list of parameters to be passed to |
A gamObject
. See ?gamObject
.
Matteo Fasiolo <[email protected]>.
Fasiolo, M., Wood, S.N., Zaffran, M., Nedellec, R. and Goude, Y., 2020. Fast calibrated additive quantile regression. Journal of the American Statistical Association (to appear). https://www.tandfonline.com/doi/full/10.1080/01621459.2020.1725521.
Fasiolo, M., Wood, S.N., Zaffran, M., Nedellec, R. and Goude, Y., 2021. qgam: Bayesian Nonparametric Quantile Regression Modeling in R. Journal of Statistical Software, 100(9), 1-31, doi:10.18637/jss.v100.i09.
##### # Univariate "car" example #### library(qgam); library(MASS) # Fit for quantile 0.5 using the best sigma set.seed(6436) fit <- qgam(accel~s(times, k=20, bs="ad"), data = mcycle, qu = 0.5) # Plot the fit xSeq <- data.frame(cbind("accel" = rep(0, 1e3), "times" = seq(2, 58, length.out = 1e3))) pred <- predict(fit, newdata = xSeq, se=TRUE) plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) lines(xSeq$times, pred$fit, lwd = 1) lines(xSeq$times, pred$fit + 2*pred$se.fit, lwd = 1, col = 2) lines(xSeq$times, pred$fit - 2*pred$se.fit, lwd = 1, col = 2) ## Not run: # You can get a better fit by letting the learning rate change with "accel" # For instance fit <- qgam(list(accel ~ s(times, k=20, bs="ad"), ~ s(times)), data = mcycle, qu = 0.8) pred <- predict(fit, newdata = xSeq, se=TRUE) plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) lines(xSeq$times, pred$fit, lwd = 1) lines(xSeq$times, pred$fit + 2*pred$se.fit, lwd = 1, col = 2) lines(xSeq$times, pred$fit - 2*pred$se.fit, lwd = 1, col = 2) ## End(Not run) ##### # Multivariate Gaussian example #### library(qgam) set.seed(2) dat <- gamSim(1,n=400,dist="normal",scale=2) fit <- qgam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = 0.5) plot(fit, scale = FALSE, pages = 1) ###### # Heteroscedastic example ###### ## Not run: set.seed(651) n <- 2000 x <- seq(-4, 3, length.out = n) X <- cbind(1, x, x^2) beta <- c(0, 1, 1) sigma = 1.2 + sin(2*x) f <- drop(X %*% beta) dat <- f + rnorm(n, 0, sigma) dataf <- data.frame(cbind(dat, x)) names(dataf) <- c("y", "x") fit <- qgam(list(y~s(x, k = 30, bs = "cr"), ~ s(x, k = 30, bs = "cr")), data = dataf, qu = 0.95) plot(x, dat, col = "grey", ylab = "y") tmp <- predict(fit, se = TRUE) lines(x, tmp$fit) lines(x, tmp$fit + 2 * tmp$se.fit, col = 2) lines(x, tmp$fit - 2 * tmp$se.fit, col = 2) ## End(Not run)
##### # Univariate "car" example #### library(qgam); library(MASS) # Fit for quantile 0.5 using the best sigma set.seed(6436) fit <- qgam(accel~s(times, k=20, bs="ad"), data = mcycle, qu = 0.5) # Plot the fit xSeq <- data.frame(cbind("accel" = rep(0, 1e3), "times" = seq(2, 58, length.out = 1e3))) pred <- predict(fit, newdata = xSeq, se=TRUE) plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) lines(xSeq$times, pred$fit, lwd = 1) lines(xSeq$times, pred$fit + 2*pred$se.fit, lwd = 1, col = 2) lines(xSeq$times, pred$fit - 2*pred$se.fit, lwd = 1, col = 2) ## Not run: # You can get a better fit by letting the learning rate change with "accel" # For instance fit <- qgam(list(accel ~ s(times, k=20, bs="ad"), ~ s(times)), data = mcycle, qu = 0.8) pred <- predict(fit, newdata = xSeq, se=TRUE) plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) lines(xSeq$times, pred$fit, lwd = 1) lines(xSeq$times, pred$fit + 2*pred$se.fit, lwd = 1, col = 2) lines(xSeq$times, pred$fit - 2*pred$se.fit, lwd = 1, col = 2) ## End(Not run) ##### # Multivariate Gaussian example #### library(qgam) set.seed(2) dat <- gamSim(1,n=400,dist="normal",scale=2) fit <- qgam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat, qu = 0.5) plot(fit, scale = FALSE, pages = 1) ###### # Heteroscedastic example ###### ## Not run: set.seed(651) n <- 2000 x <- seq(-4, 3, length.out = n) X <- cbind(1, x, x^2) beta <- c(0, 1, 1) sigma = 1.2 + sin(2*x) f <- drop(X %*% beta) dat <- f + rnorm(n, 0, sigma) dataf <- data.frame(cbind(dat, x)) names(dataf) <- c("y", "x") fit <- qgam(list(y~s(x, k = 30, bs = "cr"), ~ s(x, k = 30, bs = "cr")), data = dataf, qu = 0.95) plot(x, dat, col = "grey", ylab = "y") tmp <- predict(fit, se = TRUE) lines(x, tmp$fit) lines(x, tmp$fit + 2 * tmp$se.fit, col = 2) lines(x, tmp$fit - 2 * tmp$se.fit, col = 2) ## End(Not run)
Calculates the sigmoid function and its derivatives.
sigmoid(y, deriv = FALSE)
sigmoid(y, deriv = FALSE)
y |
a numeric vector. |
deriv |
if |
If deriv==FALSE
, it returns a numeric vector equal to 1/(1+exp(-x))
. If
deriv==TRUE
it returns a list where the slot $D0
contains 1/(1+exp(-x))
,
while $D1
, $D2
and $D3
contain its first three derivatives.
Matteo Fasiolo <[email protected]>.
library(qgam) set.seed(90) h <- 1e-6 p <- rnorm(1e4, 0, 1e6) sigmoid(p[1:50]) - 1/(1+exp(-p[1:50])) ##### Testing sigmoid derivatives e1 <- abs((sigmoid(p+h) - sigmoid(p-h)) / (2*h) - sigmoid(p, TRUE)[["D1"]]) / (2*h) e2 <- abs((sigmoid(p+h, TRUE)$D1 - sigmoid(p-h, TRUE)$D1) / (2*h) - sigmoid(p, TRUE)[["D2"]]) / (2*h) e3 <- abs((sigmoid(p+h, TRUE)$D2 - sigmoid(p-h, TRUE)$D2) / (2*h) - sigmoid(p, TRUE)[["D3"]]) / (2*h) if( any(c(e1, e2, e3) > 1) ) stop("Sigmoid derivatives are not estimated accurately")
library(qgam) set.seed(90) h <- 1e-6 p <- rnorm(1e4, 0, 1e6) sigmoid(p[1:50]) - 1/(1+exp(-p[1:50])) ##### Testing sigmoid derivatives e1 <- abs((sigmoid(p+h) - sigmoid(p-h)) / (2*h) - sigmoid(p, TRUE)[["D1"]]) / (2*h) e2 <- abs((sigmoid(p+h, TRUE)$D1 - sigmoid(p-h, TRUE)$D1) / (2*h) - sigmoid(p, TRUE)[["D2"]]) / (2*h) e3 <- abs((sigmoid(p+h, TRUE)$D2 - sigmoid(p-h, TRUE)$D2) / (2*h) - sigmoid(p, TRUE)[["D3"]]) / (2*h) if( any(c(e1, e2, e3) > 1) ) stop("Sigmoid derivatives are not estimated accurately")
The learning rate (sigma) of the Gibbs posterior is tuned either by calibrating the credible intervals for the fitted curve, or by minimizing the pinball loss on out-of-sample data. This is done by bootrapping or by k-fold cross-validation. Here the calibration loss function is evaluated on a grid of values provided by the user.
tuneLearn( form, data, lsig, qu, discrete = FALSE, err = NULL, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, paropts = list(), control = list(), argGam = NULL )
tuneLearn( form, data, lsig, qu, discrete = FALSE, err = NULL, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, paropts = list(), control = list(), argGam = NULL )
form |
A GAM formula, or a list of formulae. See ?mgcv::gam details. |
data |
A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called. |
lsig |
A vector of value of the log learning rate (log(sigma)) over which the calibration loss function is evaluated. |
qu |
The quantile of interest. Should be in (0, 1). |
discrete |
If TRUE then covariate discretisation is used for faster model fitting. See |
err |
An upper bound on the error of the estimated quantile curve. Should be in (0, 1).
Since qgam v1.3 it is selected automatically, using the methods of Fasiolo et al. (2017).
The old default was |
multicore |
If TRUE the calibration will happen in parallel. |
cluster |
An object of class |
ncores |
Number of cores used. Relevant if |
paropts |
a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing. |
control |
A list of control parameters for
|
argGam |
A list of parameters to be passed to |
A list with entries:
lsig
= the value of log(sigma) resulting in the lowest loss.
loss
= vector containing the value of the calibration loss function corresponding
to each value of log(sigma).
edf
= a matrix where the first colums contain the log(sigma) sequence, and the remaining
columns contain the corresponding effective degrees of freedom of each smooth.
convProb
= a logical vector indicating, for each value of log(sigma), whether the outer
optimization which estimates the smoothing parameters has encountered convergence issues.
FALSE
means no problem.
Matteo Fasiolo <[email protected]>.
Fasiolo, M., Wood, S.N., Zaffran, M., Nedellec, R. and Goude, Y., 2020. Fast calibrated additive quantile regression. Journal of the American Statistical Association (to appear). https://www.tandfonline.com/doi/full/10.1080/01621459.2020.1725521.
library(qgam); library(MASS) # Calibrate learning rate on a grid set.seed(41444) sigSeq <- seq(1.5, 5, length.out = 10) closs <- tuneLearn(form = accel~s(times,k=20,bs="ad"), data = mcycle, lsig = sigSeq, qu = 0.5) plot(sigSeq, closs$loss, type = "b", ylab = "Calibration Loss", xlab = "log(sigma)") # Pick best log-sigma best <- sigSeq[ which.min(closs$loss) ] abline(v = best, lty = 2) # Fit using the best sigma fit <- qgam(accel~s(times,k=20,bs="ad"), data = mcycle, qu = 0.5, lsig = best) summary(fit) pred <- predict(fit, se=TRUE) plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) lines(mcycle$times, pred$fit, lwd = 1) lines(mcycle$times, pred$fit + 2*pred$se.fit, lwd = 1, col = 2) lines(mcycle$times, pred$fit - 2*pred$se.fit, lwd = 1, col = 2)
library(qgam); library(MASS) # Calibrate learning rate on a grid set.seed(41444) sigSeq <- seq(1.5, 5, length.out = 10) closs <- tuneLearn(form = accel~s(times,k=20,bs="ad"), data = mcycle, lsig = sigSeq, qu = 0.5) plot(sigSeq, closs$loss, type = "b", ylab = "Calibration Loss", xlab = "log(sigma)") # Pick best log-sigma best <- sigSeq[ which.min(closs$loss) ] abline(v = best, lty = 2) # Fit using the best sigma fit <- qgam(accel~s(times,k=20,bs="ad"), data = mcycle, qu = 0.5, lsig = best) summary(fit) pred <- predict(fit, se=TRUE) plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) lines(mcycle$times, pred$fit, lwd = 1) lines(mcycle$times, pred$fit + 2*pred$se.fit, lwd = 1, col = 2) lines(mcycle$times, pred$fit - 2*pred$se.fit, lwd = 1, col = 2)
The learning rate (sigma) of the Gibbs posterior is tuned either by calibrating the credible intervals for the fitted curve, or by minimizing the pinball loss on out-of-sample data. This is done by bootrapping or by k-fold cross-validation. Here the loss function is minimized, for each quantile, using a Brent search.
tuneLearnFast( form, data, qu, discrete = FALSE, err = NULL, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, paropts = list(), control = list(), argGam = NULL )
tuneLearnFast( form, data, qu, discrete = FALSE, err = NULL, multicore = !is.null(cluster), cluster = NULL, ncores = detectCores() - 1, paropts = list(), control = list(), argGam = NULL )
form |
A GAM formula, or a list of formulae. See ?mgcv::gam details. |
data |
A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which gam is called. |
qu |
The quantile of interest. Should be in (0, 1). |
discrete |
If TRUE then covariate discretisation is used for faster model fitting. See |
err |
An upper bound on the error of the estimated quantile curve. Should be in (0, 1).
Since qgam v1.3 it is selected automatically, using the methods of Fasiolo et al. (2017).
The old default was |
multicore |
If TRUE the calibration will happen in parallel. |
cluster |
An object of class |
ncores |
Number of cores used. Relevant if |
paropts |
a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing. |
control |
A list of control parameters for
|
argGam |
A list of parameters to be passed to |
A list with entries:
lsig
= a vector containing the values of log(sigma) that minimize the loss function,
for each quantile.
err
= the error bound used for each quantile. Generally each entry is identical to the
argument err
, but in some cases the function increases it to enhance stability.
ranges
= the search ranges by the Brent algorithm to find log-sigma, for each quantile.
store
= a list, where the i-th entry is a matrix containing all the locations (1st row) at which
the loss function has been evaluated and its value (2nd row), for the i-th quantile.
final_fit
= a list, where the i-th entry is a list with the estimated conditional quantile
(mustart
), smoothing parameters (in.out$sp
) and scale parameter
(in.out$scale
, should always be equal to 1) at the optimal value of log(sigma).
Matteo Fasiolo <[email protected]>.
Fasiolo, M., Wood, S.N., Zaffran, M., Nedellec, R. and Goude, Y., 2020. Fast calibrated additive quantile regression. Journal of the American Statistical Association (to appear). https://www.tandfonline.com/doi/full/10.1080/01621459.2020.1725521.
library(qgam); library(MASS) ### # Single quantile fit ### # Calibrate learning rate on a grid set.seed(5235) tun <- tuneLearnFast(form = accel~s(times,k=20,bs="ad"), data = mcycle, qu = 0.2) # Fit for quantile 0.2 using the best sigma fit <- qgam(accel~s(times, k=20, bs="ad"), data = mcycle, qu = 0.2, lsig = tun$lsig) pred <- predict(fit, se=TRUE) plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) lines(mcycle$times, pred$fit, lwd = 1) lines(mcycle$times, pred$fit + 2*pred$se.fit, lwd = 1, col = 2) lines(mcycle$times, pred$fit - 2*pred$se.fit, lwd = 1, col = 2) ### # Multiple quantile fits ### # Calibrate learning rate on a grid quSeq <- c(0.25, 0.5, 0.75) set.seed(5235) tun <- tuneLearnFast(form = accel~s(times, k=20, bs="ad"), data = mcycle, qu = quSeq) # Fit using estimated sigmas fit <- mqgam(accel~s(times, k=20, bs="ad"), data = mcycle, qu = quSeq, lsig = tun$lsig) # Plot fitted quantiles plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) for(iq in quSeq){ pred <- qdo(fit, iq, predict) lines(mcycle$times, pred, col = 2) } ## Not run: # You can get a better fit by letting the learning rate change with "accel" # For instance tun <- tuneLearnFast(form = list(accel ~ s(times, k=20, bs="ad"), ~ s(times)), data = mcycle, qu = quSeq) fit <- mqgam(list(accel ~ s(times, k=20, bs="ad"), ~ s(times)), data = mcycle, qu = quSeq, lsig = tun$lsig) # Plot fitted quantiles plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) for(iq in quSeq){ pred <- qdo(fit, iq, predict) lines(mcycle$times, pred, col = 2) } ## End(Not run)
library(qgam); library(MASS) ### # Single quantile fit ### # Calibrate learning rate on a grid set.seed(5235) tun <- tuneLearnFast(form = accel~s(times,k=20,bs="ad"), data = mcycle, qu = 0.2) # Fit for quantile 0.2 using the best sigma fit <- qgam(accel~s(times, k=20, bs="ad"), data = mcycle, qu = 0.2, lsig = tun$lsig) pred <- predict(fit, se=TRUE) plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) lines(mcycle$times, pred$fit, lwd = 1) lines(mcycle$times, pred$fit + 2*pred$se.fit, lwd = 1, col = 2) lines(mcycle$times, pred$fit - 2*pred$se.fit, lwd = 1, col = 2) ### # Multiple quantile fits ### # Calibrate learning rate on a grid quSeq <- c(0.25, 0.5, 0.75) set.seed(5235) tun <- tuneLearnFast(form = accel~s(times, k=20, bs="ad"), data = mcycle, qu = quSeq) # Fit using estimated sigmas fit <- mqgam(accel~s(times, k=20, bs="ad"), data = mcycle, qu = quSeq, lsig = tun$lsig) # Plot fitted quantiles plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) for(iq in quSeq){ pred <- qdo(fit, iq, predict) lines(mcycle$times, pred, col = 2) } ## Not run: # You can get a better fit by letting the learning rate change with "accel" # For instance tun <- tuneLearnFast(form = list(accel ~ s(times, k=20, bs="ad"), ~ s(times)), data = mcycle, qu = quSeq) fit <- mqgam(list(accel ~ s(times, k=20, bs="ad"), ~ s(times)), data = mcycle, qu = quSeq, lsig = tun$lsig) # Plot fitted quantiles plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80)) for(iq in quSeq){ pred <- qdo(fit, iq, predict) lines(mcycle$times, pred, col = 2) } ## End(Not run)
Dataset on UK electricity demand, taken from the national grid (https://www.nationalgrid.com/).
data(UKload)
data(UKload)
UKload
contains the following variables:
net electricity demand between 11:30am and 12am.
instantaneous temperature, averaged over several English cities.
exponential smooth of wM
, that is wM_s95[i] = a*wM_s95[i-1] + (1-a)*wM[i]
with a=0.95
.
periodic index in [0, 1]
indicating the position along the year.
factor variable indicating the day of the week.
progressive counter, useful for defining the long term trend.
lagged version of NetDemand
, that is NetDemand.48[i] = NetDemand[i-2]
.
binary variable indicating holidays.
should be obvious.
should be obvious.
See Fasiolo et al. (2017) for details.
matrix of replicate data series
Fasiolo, M., Goude, Y., Nedellec, R. and Wood, S. N. (2017). Fast calibrated additive quantile regression. Available at https://arxiv.org/abs/1707.03307.
library(qgam) data(UKload) plot(UKload$NetDemand, type = 'l')
library(qgam) data(UKload) plot(UKload$NetDemand, type = 'l')