| negbin {mgcv} | R Documentation |
The gam modelling function is designed to be able to use
the negbin family (a modification of MASS library negative.binomial family
by Venables and Ripley), or the nb function designed for integrated estimation of
parameter theta. θ is the parameter such that var(y) = μ + μ^2/θ, where μ = E(y).
Two approaches to estimating theta are available (with gam only):
With negbin then if ‘performance iteration’ is used for smoothing parameter estimation
(see gam), then smoothing parameters are chosen by GCV and
theta is chosen in order to ensure that the Pearson estimate of the scale
parameter is as close as possible to 1, the value that the scale parameter should have.
If ‘outer iteration’ is used for smoothing parameter selection with the nb family then
theta is estimated alongside the smoothing parameters by ML or REML.
To use the first option, set the optimizer argument of gam to "perf" (it can sometimes fail to converge).
negbin(theta = stop("'theta' must be specified"), link = "log")
nb(theta = NULL, link = "log")
theta |
Either i) a single value known value of theta or ii) two values of theta specifying the
endpoints of an interval over which to search for theta (this is an option only for |
link |
The link function: one of |
nb allows estimation of the theta parameter alongside the model smoothing parameters, but is only useable with gam (not bam or gam).
For negbin, if a single value of theta is supplied then it is always taken as the known fixed value and this is useable with bam and gamm. If theta is two
numbers (theta[2]>theta[1]) then they are taken as specifying the range of values over which to search for
the optimal theta. This option should only be used with performance iteration estimation (see gam argument optimizer), in which case the method
of estimation is to choose theta so that the GCV (Pearson) estimate
of the scale parameter is one (since the scale parameter
is one for the negative binomial). In this case theta estimation is nested within the IRLS loop
used for GAM fitting. After each call to fit an iteratively weighted additive model to the IRLS pseudodata,
the theta estimate is updated. This is done by conditioning on all components of the current GCV/Pearson
estimator of the scale parameter except theta and then searching for the
theta which equates this conditional estimator to one. The search is
a simple bisection search after an initial crude line search to bracket one. The search will
terminate at the upper boundary of the search region is a Poisson fit would have yielded an estimated
scale parameter <1.
The following negbin based approaches are now deprecated:
If outer iteration is used then theta is estimated by searching for the value yielding the lowest AIC. The search is either over the supplied array of values, or is a grid search over the supplied range, followed by a golden section search. A full fit is required for each trial theta, so the process is slow, but speed is enhanced by making the changes in theta as small as possible, from one step to the next, and using the previous smothing parameter and fitted values to start the new fit.
In a simulation test based on 800 replicates of
the first example data, given below, the GCV based (performance iteration) method yielded models with,
on avergage 6% better MSE performance than the AIC based (outer iteration) method.
theta had a 0.86 correlation coefficient between the two methods.
theta estimates averaged 3.36 with a standard deviation of 0.44 for the AIC based method and
3.22 with a standard deviation of 0.43 for the GCV based method. However the GCV based method is
less computationally reliable, failing in around 4% of replicates.
For negbin an object inheriting from class family, with additional elements
dvar |
the function giving the first derivative of the variance function w.r.t. |
d2var |
the function giving the second derivative of the variance function w.r.t. |
getTheta |
A function for retrieving the value(s) of theta. This also useful for retriving the
estimate of |
For nb an object inheriting from class extended.family.
gamm and bam do not support theta estimation
The negative binomial functions from the MASS library are no longer supported.
Simon N. Wood simon.wood@r-project.org
modified from Venables and Ripley's negative.binomial family.
Venables, B. and B.R. Ripley (2002) Modern Applied Statistics in S, Springer.
library(mgcv)
set.seed(3)
n<-400
dat <- gamSim(1,n=n)
g <- exp(dat$f/5)
## negative binomial data...
dat$y <- rnbinom(g,size=3,mu=g)
## known theta fit ...
b0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(3),data=dat)
plot(b0,pages=1)
print(b0)
## same with theta estimation...
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(),data=dat)
plot(b,pages=1)
print(b)
b$family$getTheta(TRUE) ## extract final theta estimate
## unknown theta via performance iteration...
b1 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(c(1,10)),
optimizer="perf",data=dat)
plot(b1,pages=1)
print(b1)
## another example...
set.seed(1)
f <- dat$f
f <- f - min(f)+5;g <- f^2/10
dat$y <- rnbinom(g,size=3,mu=g)
b2 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(link="sqrt"),
data=dat,method="REML")
plot(b2,pages=1)
print(b2)
rm(dat)