| Title: | Data and Functions Used in Linear Models and Regression with R: An Integrated Approach |
|---|---|
| Description: | Data files and a few functions used in the book 'Linear Models and Regression with R: An Integrated Approach' by Debasis Sengupta and Sreenivas Rao Jammalamadaka (2019). |
| Authors: | Debasis Sengupta [aut], S. Rao Jammalamadaka [aut], Jinwen Qiu [aut], Kaushik Jana [cre] |
| Maintainer: | Kaushik Jana <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.3 |
| Built: | 2026-05-12 08:31:41 UTC |
| Source: | https://github.com/janakaushik/lmreg |
Air speed data, which is part of a larger data set from a designed experiment (Wilkie, 1962).
data(airspeed)
A data frame with 18 observations on the following 3 variables.
Posmaxspeed |
The position of highest speed of air blown down the space between a roughened rod and a smoothed pipe surrounding it. The position is defined as the distance (in inches) from the center of the rod, in excess of 1.4 inches |
Reynolds |
Reynolds number of air flow (dimensionless) |
Ribht |
Height of ribs on the roughened rod (in inches) |
Wilkie, D. (1962) A method of analysis of mixed level factorial experiments. Applied Statistics, pp.184-195.
data(airspeed)
head(airspeed)
Six synthetic data sets with similar regression summary, for illustrating the importance of regression diagnostics.
data(anscombeplus)
A data frame with 20 observations on 8 synthetic real-valued variables, labelled as x1, y1, y2, y3, y4, y5, x2, y6.
x1 |
Explanatory variable of first five data sets |
y1 |
Response variable of first data set |
y2 |
Response variable of second data set |
y3 |
Response variable of third data set |
y4 |
Response variable of fourth data set |
y5 |
Response variable of fifth data set |
x2 |
Explanatory variable of sixth data set |
y6 |
Response variable of sixth data set |
This data set is presented by Sengupta and Jammalamadaka (2019), after expanding on the ideas of Anscombe (1973).
Anscombe, F.J. (1973), Graphs in statistical analysis, American Statistician, vol.27, pp.17-21.
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach, World Scientific Publishing Co., Table 5.1.
data(anscombeplus)
head(anscombeplus)
Apple crop volume under various ground covers underneath the tree (Pearce, 1983).
data(appletree)
A data frame with 24 observations on the following 4 variables.
Weight |
Total weight (in pounds) of apples produced in a plot in four years, post-treatment |
Treatment |
Five types of permanent cropping under the apple tree (coded as 1 to 5), or no cropping at all (0) |
Block |
Blocks coded as 1 to 4 |
Volume |
Total crop volume (in bushels) in four years, pre-treatment |
Pearce, S.C. (1983) The Agricultural Field Experiment, Wiley, Chichester, p.284.
data(appletree)
head(appletree)
Computes an orthonormal basis of the column space of a given matrix.
basis(M, tol=sqrt(.Machine$double.eps))
M |
Matrix for which basis of the column space is needed. |
tol |
A relative tolerance to determine rank through QR decomposition (default = sqrt(.Machine$double.eps)). |
Returns a semi-orthogonal matrix with columns forming an orthonormal basis of the column space of M.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
basis(matrix(c(2,1,3,4,2,3,2,6,4,2,6,8),4,3))
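The construction can be illustrated without the package. The following is a minimal sketch in base R, assuming that the basis is read off a QR decomposition with the same kind of relative tolerance; `basis_sketch` is a hypothetical helper and is not claimed to reproduce the internals of basis().

```r
# Hypothetical helper (an assumption, not the package's code): orthonormal
# basis of the column space of M from base R's QR decomposition.
basis_sketch <- function(M, tol = sqrt(.Machine$double.eps)) {
  qrM <- qr(M, tol = tol)                 # qrM$rank is the numerical rank
  Q <- qr.Q(qrM)                          # matrix with orthonormal columns
  Q[, seq_len(qrM$rank), drop = FALSE]    # keep only the first 'rank' columns
}

# Same matrix as in the example above; its third column is twice the first.
M <- matrix(c(2,1,3,4,2,3,2,6,4,2,6,8), 4, 3)
Q <- basis_sketch(M)
ncol(Q)                      # 2, the rank of M
round(crossprod(Q), 10)      # identity matrix: the columns are orthonormal
max(abs(Q %*% crossprod(Q, M) - M))   # near zero: the columns of Q span those of M
```

The signs of the basis vectors are not unique, so a different routine may return columns that differ from these by a factor of -1 or by a rotation within the same space.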
Stacks up in columns the values of all the binary variables that can be associated with different levels of a categorical variable.
binaries(x)
x |
A categorical variable (either numeric or character). |
The name of each new variable is of the type v.x, where x is the level of the categorical variable for which this binary variable is equal to 1.
A set of binary vectors, each having the value 1 for a unique level of x.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
x <- c(1,2,2,3,1,1,2,3,3,2,1)
binaries(x)
binaries(as.factor(x))
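For readers without the package at hand, the idea can be sketched in base R. `binaries_sketch` is a hypothetical helper; the `v.` prefix follows the naming described above, while the exact class of the returned object (matrix or data frame) is an assumption.

```r
# Hypothetical helper (an assumption, not the package's code): one 0/1 column
# per level of x, named "v.<level>".
binaries_sketch <- function(x) {
  f <- as.factor(x)
  B <- outer(as.character(f), levels(f), "==") * 1L   # TRUE/FALSE -> 1/0
  colnames(B) <- paste0("v.", levels(f))
  B
}

x <- c(1,2,2,3,1,1,2,3,3,2,1)
B <- binaries_sketch(x)
B
colSums(B)    # number of observations at each level: 4, 4, 3
rowSums(B)    # every row contains exactly one 1
```

Because the columns add up to a column of ones, placing all of them next to an intercept column gives a rank-deficient model matrix, which is why several functions in this package accept a tolerance for generalized inverses.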
Produces two-sided Bonferroni and Scheffe simultaneous confidence intervals, together with corresponding single confidence intervals, for any vector of estimable functions A.beta in a linear model.
cisimult(y, X, A, alpha, tol=sqrt(.Machine$double.eps))
y |
Response vector in linear model. |
X |
Design/model matrix or matrix containing values of explanatory variables (generally including intercept). |
A |
Coefficient matrix (A.beta is the vector for which confidence interval is needed). |
alpha |
Collective non-coverage probability of confidence intervals. |
tol |
A relative tolerance to detect zero singular values while computing generalized inverse, in case X is rank deficient (default = sqrt(.Machine$double.eps)). |
Normal distribution of response (given explanatory variables and/or factors) is assumed.
Returns the following three sets of confidence intervals:
BFCB |
Two-sided Bonferroni simultaneous confidence intervals. |
SFCB |
Two-sided Scheffe simultaneous confidence intervals. |
SNCB |
The single confidence intervals. |
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
data(denim)
attach(denim)
X <- cbind(1, binaries(Denim), binaries(Laundry))
A <- rbind(c(0,1,-1,0,0,0,0), c(0,1,0,-1,0,0,0), c(0,0,1,-1,0,0,0))
cisimult(Abrasion, X, A, 0.05, tol = 1e-10)
detach(denim)
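The three kinds of interval differ only in the multiplier applied to the standard error. The sketch below uses base R, the built-in PlantGrowth data and a full-rank lm fit; these choices, and the standard textbook multipliers shown, are assumptions made for illustration and are not taken from the source of cisimult().

```r
# Illustration under stated assumptions: full-rank model, normal errors.
fit <- lm(weight ~ group, data = PlantGrowth)   # built-in data set
A <- rbind(c(0, 1,  0),    # trt1 - ctrl
           c(0, 0,  1),    # trt2 - ctrl
           c(0, 1, -1))    # trt1 - trt2
alpha <- 0.05

est <- drop(A %*% coef(fit))                    # estimates of A.beta
se  <- sqrt(diag(A %*% vcov(fit) %*% t(A)))     # their standard errors
df  <- df.residual(fit)
q   <- nrow(A)                                  # number of intervals
r   <- qr(A)$rank                               # number of independent functions

mult <- c(single     = qt(1 - alpha / 2, df),
          bonferroni = qt(1 - alpha / (2 * q), df),
          scheffe    = sqrt(r * qf(1 - alpha, r, df)))
mult
lapply(mult, function(m) cbind(lower = est - m * se, upper = est + m * se))
```

Both simultaneous multipliers exceed the single-interval multiplier, so the simultaneous intervals are wider; which of the two is narrower depends on the number of intervals relative to the rank of A.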
Computes point estimate and confidence interval for a single linear parametric function in a linear model.
cisngl(y, X, p, alpha, type, tol=sqrt(.Machine$double.eps))
y |
Response vector in linear model. |
X |
Design/model matrix or matrix containing values of explanatory variables (generally including intercept). |
p |
Coefficient vector of linear parametric function for which confidence interval is needed. |
alpha |
Non-coverage probability of confidence interval. |
type |
Type of confidence interval ("lower", "upper", "both"). |
tol |
A relative tolerance to detect zero singular values while computing generalized inverse, in case X is rank deficient (default = sqrt(.Machine$double.eps)). |
Normal distribution of response (given explanatory variables and/or factors) is assumed.
Returns a list of two objects:
estimate |
Point estimate. |
ci |
Confidence interval. |
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
library(MASS)
data(birthwt)
attach(birthwt)
X <- cbind(1, smoke, binaries(race))
p <- c(0,1,0,0,0)
cisngl(bwt, X, p, 0.05, type = "upper", tol = 1e-10)
cisngl(bwt, X, p, 0.05, type = "both", tol = 1e-10)
detach(birthwt)
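The computation behind a single interval can be sketched in base R for a full-rank model. The built-in mtcars data and the variable names below are assumptions chosen for illustration; how the labels "lower" and "upper" of `type` map to one-sided limits in cisngl() is not confirmed here.

```r
# Illustration under stated assumptions: full-rank model, normal errors.
fit <- lm(mpg ~ wt + hp, data = mtcars)     # built-in data set
p <- c(0, 1, 0)                             # picks out the coefficient of wt
alpha <- 0.05

est <- sum(p * coef(fit))                        # point estimate of p'beta
se  <- sqrt(drop(t(p) %*% vcov(fit) %*% p))      # its standard error
df  <- df.residual(fit)

two_sided <- est + c(-1, 1) * qt(1 - alpha / 2, df) * se
one_sided_upper_limit <- est + qt(1 - alpha, df) * se
two_sided
confint(fit, "wt", level = 1 - alpha)       # base R gives the same two-sided interval
```

A one-sided limit uses the full non-coverage probability alpha in a single tail, so it lies closer to the estimate than the corresponding end of the two-sided interval.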
Computes the table of condition indices and model matrix singular vectors for a linear model.
cisv(lmobj)
lmobj |
An object produced by lm fitting. |
Columns containing different elements of a singular vector are labelled either as (Intercept) or by the variable name.
Returns the table of condition indices and model matrix right singular vectors for the chosen model, with singular vectors appearing as rows next to the corresponding condition index.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
data(imf2015)
lmimf <- lm(UNMP~CAB+DEBT+EXP+GDP+INFL+INV, data = imf2015)
cisv(lmimf)
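A table of this shape can be sketched in base R. The sketch assumes that each column of the model matrix (intercept included) is scaled to unit length before the singular value decomposition, which is a common convention for condition indices; whether cisv() scales in exactly this way is not confirmed. The built-in mtcars data stand in for imf2015.

```r
# Illustration under stated assumptions (unit-length column scaling).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # built-in data set
X   <- model.matrix(fit)
Xs  <- sweep(X, 2, sqrt(colSums(X^2)), "/")      # ASSUMPTION: unit-length columns
s   <- svd(Xs)
ci  <- max(s$d) / s$d                            # condition indices, smallest is 1

# Row k holds the k-th condition index followed by the k-th right singular vector.
tab <- cbind(cond.index = ci, t(s$v))
colnames(tab)[-1] <- colnames(X)
round(tab, 3)
```

A large condition index flags a near-linear dependence among the columns, and the large entries in the same row indicate which columns take part in it.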
Computes an orthonormal basis of the orthogonal complement of the column space of a given matrix.
compbasis(M, tol=sqrt(.Machine$double.eps))
M |
Matrix for which basis of the orthogonal complement of the column space is needed. |
tol |
A relative tolerance to determine rank through QR decomposition (default = sqrt(.Machine$double.eps)). |
Returns a semi-orthogonal matrix with columns forming an orthonormal basis of the orthogonal complement of the column space of M.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
compbasis(matrix(c(3,3,3,3),2,2))
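A base R sketch of the same idea follows, assuming that the complement is taken from the trailing columns of the full orthogonal factor of a QR decomposition. `compbasis_sketch` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper (an assumption, not the package's code).
compbasis_sketch <- function(M, tol = sqrt(.Machine$double.eps)) {
  qrM <- qr(M, tol = tol)
  Qfull <- qr.Q(qrM, complete = TRUE)       # n x n orthogonal matrix
  # Columns after the first 'rank' ones are orthogonal to the column space of M.
  Qfull[, seq_len(nrow(M)) > qrM$rank, drop = FALSE]
}

M <- matrix(c(3,3,3,3), 2, 2)    # same matrix as in the example above, rank 1
C <- compbasis_sketch(M)
C                                # one column, proportional to (1, -1)
crossprod(C, M)                  # zero up to rounding
```

The logical column index is used instead of a negative index so that a matrix of rank zero still returns the whole space rather than an empty result.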
Computes confidence ellipsoid for a vector of estimable functions in a linear model.
confelps(y, X, A, alpha, tol=sqrt(.Machine$double.eps))
y |
Response vector in linear model. |
X |
Design/model matrix or matrix containing values of explanatory variables (generally including intercept). |
A |
Coefficient matrix (A.beta is the vector for which confidence ellipsoid is needed). |
alpha |
The non-coverage probability of confidence ellipsoid. |
tol |
A relative tolerance to detect zero singular values while computing generalized inverse, in case X is rank deficient (default = sqrt(.Machine$double.eps)). |
Normal distribution of response (given explanatory variables and/or factors) is assumed.
Returns a list of three objects:
CenterOfEllipse |
Center of ellipsoid. |
MatrixOfEllipse |
Matrix of ellipsoid, for describing quadratic form in terms of the vector of deviations from center of ellipsoid. |
threshold |
Upper limit of quadratic form that completes specification of ellipsoid. |
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
data(denim)
attach(denim)
X <- cbind(1,binaries(Denim),binaries(Laundry))
A <- rbind(c(0,1,0,-1,0,0,0),c(0,0,1,-1,0,0,0))
confelps(Abrasion, X, A, 0.05,tol=1e-12)
detach(denim)
Effects of laundering cycles and denim treatment on edge abrasion of denim jeans (Card et al., 2006). Data simulated to match means/SDs.
data(denim)
A data frame with 90 observations on the following 3 variables.
Laundry |
Three levels of laundry cycles (1 = 0 cycles, 2 = 5 cycles, 3 = 25 cycles) |
Denim |
Three types of denim treatments (1 = pre-washed, 2 = stone-washed, 3 = enzyme-washed) |
Abrasion |
Abrasion score (lower score means higher damage) |
Card, A., Moore, M.A. and Ankeny, M. (2006) Garment washed jeans: Impact of launderings on physical properties. Int. J. Clothing Sc. Tech., 18, pp.43-52.
data(denim)
head(denim)
Across-countries median of median price ratio (MPR) of some medicines available in the private market under the generic name and the brand name of the originator (Gelders et al., 2005).
data(drugprice)
A data frame with 13 observations on the following 4 variables.
Drug |
Generic name of drug, a character vector |
Quantity |
Unit for price computation, a character vector |
OriginatorMPR |
Originator median price ratio, a numeric vector |
GenericMPR |
Generic median price ratio, a numeric vector |
The data comes from a World Health Organization (WHO) commissioned study on variation of drug prices over a number of developing countries. For comparability, the price in a particular region is expressed as a ratio (called median price ratio or MPR) with respect to the organization's drug price indicator median values. The data reflect the across-country median of these ratios in respect of 13 medicines, most of which are in the WHO list of essential medicines.
Gelders, S., Ewen, M., Noguchi, N. and Laing, R. (2005). Price, Availability and Affordability: An International Comparison of Chronic Disease Medicines, Background report prepared for the WHO Planning Meeting on the Global Initiative for Treatment of Chronic Diseases, Cairo, December 2005.
data(drugprice)
head(drugprice)
Computes the Frobenius norm of a given matrix.
frob(M)
M |
Matrix whose Frobenius norm is to be computed. |
A scalar value, describing the Frobenius norm (positive square root of sum of squared elements) of M.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
frob(matrix(2,3,2))
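The definition can be checked directly in base R. The lines below are a small illustration of the formula stated above and do not call the package.

```r
M <- matrix(2, 3, 2)            # six entries, each equal to 2
by_formula <- sqrt(sum(M^2))    # positive square root of the sum of squared elements
by_formula                      # sqrt(6 * 4) = sqrt(24)
norm(M, type = "F")             # base R's Frobenius norm returns the same value
```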
Prepares Analysis of Variance table for testing a general linear hypothesis in a linear model.
ganova(y, X, A, xi, tol=sqrt(.Machine$double.eps))
y |
Response vector in linear model. |
X |
Design matrix or matrix containing values of explanatory variables (generally including intercept). |
A |
Coefficient matrix (A.beta = xi is the null hypothesis to be tested). |
xi |
A vector (A.beta = xi is the null hypothesis to be tested). |
tol |
A relative tolerance to detect zero singular values while computing generalized inverse, in case the model matrix is rank deficient (default = sqrt(.Machine$double.eps)). |
Returns analysis of variance table for testing A.beta = xi in the linear model with response vector y and matrix of explanatory variables/factors X.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
data(denim)
attach(denim)
X <- cbind(1,binaries(Denim), binaries(Laundry))
A <- rbind(c(0,1,-1,0,0,0,0), c(0,1,0,-1,0,0,0))
xi <- c(0, 0)
ganova(Abrasion, X, A, xi)
detach(denim)
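The F statistic behind such a table can be sketched in base R for a full-rank model. The built-in mtcars data, the chosen hypothesis and the variable names are assumptions for illustration; the layout of the table returned by ganova() is not reproduced here.

```r
# Illustration under stated assumptions: full-rank model, normal errors.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)    # built-in data set
A  <- rbind(c(0, 0, 1, 0),     # coefficient of hp
            c(0, 0, 0, 1))     # coefficient of qsec
xi <- c(0, 0)                  # hypothesis: A.beta = xi

d   <- drop(A %*% coef(fit)) - xi                 # estimated departure from the hypothesis
V   <- A %*% summary(fit)$cov.unscaled %*% t(A)   # A (X'X)^{-1} A'
ssh <- drop(t(d) %*% solve(V, d))                 # sum of squares due to the hypothesis
r   <- nrow(A)                                    # its degrees of freedom (A has full row rank)
sse <- deviance(fit)                              # residual sum of squares
dfe <- df.residual(fit)

Fstat <- (ssh / r) / (sse / dfe)
pval  <- pf(Fstat, r, dfe, lower.tail = FALSE)
c(F = Fstat, p.value = pval)

# The same hypothesis expressed as a comparison of nested models.
nested <- anova(lm(mpg ~ wt, data = mtcars), fit)
nested
```

When the hypothesis sets some coefficients to zero, as here, the statistic agrees with the nested-model comparison; the matrix form also covers hypotheses, such as equality of two coefficients, that are awkward to state as a pair of formulas.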
Heights of some adolescent girls, aged 7 to 12, in the southern part of Kolkata, India around the year 2008.
data(girlgrowth)
A data frame with 905 observations on the following 2 variables.
Age |
Age of girls (in years) |
Height |
Height of girls (in cm) |
Dasgupta (2015), Physical Growth, Body Composition and Nutritional Status of Bengali School aged Children, Adolescents and Young adults of Calcutta, India: Effects of Socioeconomic Factors on Secular Trends, Report 158, Ney-van Hoogstraten Foundation, The Netherlands.
data(girlgrowth)
head(girlgrowth)
Prepares the Analysis of Variance table for testing adequacy of a subset model within a linear model.
hanova(lm1, lm2)
lm1 |
An lm object describing full model. |
lm2 |
An lm object describing subset model. |
Normal distribution of response (given explanatory variables and/or factors) is assumed. The program simply reformats the output of the anova function.
Returns analysis of variance table for testing adequacy of lm2 within lm1.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
library(MASS)
data(birthwt)
lmbw <- lm(bwt ~ smoke+factor(race), data = birthwt)
lm1 <- lm(bwt ~ smoke, data = birthwt)
hanova(lm1,lmbw)
Light absorbance for positive control samples in an ELISA test for HIV (Hoaglin et al., 1991).
data(hiv)
A data frame with 75 observations on the following 3 variables.
Absorbance |
Measurement of absorbance of light (dimensionless) |
Lot |
Five levels of lot |
Run |
Five levels of run |
Hoaglin, D.C., Mosteller, F. and Tukey, J.W. (1991) Fundamentals of Exploratory Analysis of Variance, Wiley, New York, p.107.
data(hiv)
head(hiv)
Compressive strength and moisture content of wood in hoop trees (Williams, 1959).
data(hoop)
A data frame with 50 observations on the following 4 variables.
Temp |
Temperature (in Celsius) |
Tree |
Hoop tree number |
Strength |
Maximum compressive strength parallel to the grain (in MPa) |
Moisture |
Moisture content (100 times water mass/dry wood mass) |
Williams, E.J. (1959) Regression Analysis, Wiley, New York.
data(hoop)
head(hoop)
Reduces a general hypothesis in a linear model into a pair of completely testable and completely untestable hypotheses.
hypsplit(X, A, xi, tol=sqrt(.Machine$double.eps))
X |
Design/model matrix or matrix containing values of explanatory variables (generally including intercept). |
A |
Coefficient matrix (A.beta = xi is the null hypothesis to be split). |
xi |
A vector (A.beta = xi is the null hypothesis to be tested). |
tol |
A relative tolerance to detect zero singular values while computing generalized inverse, in case X is rank deficient (default = sqrt(.Machine$double.eps)). |
A list of two objects:
testable |
Coefficient matrix and constant vector for testable part of hypotheses. |
untestable |
Coefficient matrix and constant vector for untestable part of hypotheses. |
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
data(denim)
attach(denim)
X <- cbind(1, binaries(Denim), binaries(Laundry))
A <- rbind(c(0,1,0,0,0,0,0), c(0,0,1,0,0,0,0), c(0,0,0,1,0,0,0))
xi <- c(0,0,0)
hypotheses <- hypsplit(X, A, xi, tol=1e-13)
hypotheses[[1]] # testable
hypotheses[[2]] # untestable
detach(denim)
Carries out test of a single linear hypothesis in a linear model.
hyptest(lmobj, p, xi = 0, type = "both")
lmobj |
An object produced by lm fitting. |
p |
A numeric vector containing coefficients of the linear combination of model parameters. |
xi |
A numeric variable containing hypothesized value of the linear combination of model parameters (default = 0). |
type |
A character variable indicating the type of alternative: "upper" (one-sided), "lower" (one-sided) or "both" (default, two-sided). |
It is assumed that all the model parameters are estimable and the linear model is homoscedastic and normal.
Returns the estimated value of the linear combination of model parameters, its standard error, the t-statistic, the degrees of freedom and the p-value.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
data(lifelength)
lmlife <- lm(Lifelength~factor(Category), data = lifelength)
p <- c(0,0,0,1,-1,0,0,0)
hyptest(lmlife, p, xi = 1, type = "upper")
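The quantities listed under the return value can be sketched in base R. The built-in mtcars data are an assumption for illustration, and so is the reading of `type`: "upper" is taken here to mean that the alternative places the linear combination above xi, which is not confirmed against the package.

```r
# Illustration under stated assumptions: homoscedastic normal model, all parameters estimable.
fit <- lm(mpg ~ wt + hp, data = mtcars)    # built-in data set
p  <- c(0, 1, 0)                           # linear combination: coefficient of wt
xi <- 0                                    # hypothesized value

est   <- sum(p * coef(fit))                        # estimated linear combination
se    <- sqrt(drop(t(p) %*% vcov(fit) %*% p))      # its standard error
tstat <- (est - xi) / se
df    <- df.residual(fit)

# ASSUMPTION: meaning of the three alternatives.
pvals <- c(upper = pt(tstat, df, lower.tail = FALSE),
           lower = pt(tstat, df),
           both  = 2 * pt(abs(tstat), df, lower.tail = FALSE))
c(estimate = est, se = se, t = tstat, df = df)
pvals
summary(fit)$coefficients["wt", ]          # base R reports the same t and two-sided p-value
```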
The estimated or reported figures of a number of economic variables for a few countries in the year 2015, extracted from IMF World Economic Outlook (2017).
data(imf2015)
A data frame with 33 observations on the following 8 variables.
Country |
Country name, a character vector |
CAB |
Current account balance as % of GDP, a numeric vector |
DEBT |
Government gross debt as % of GDP, a numeric vector |
EXP |
Government total expenditure as % of GDP, a numeric vector |
GDP |
GDP per capita, current prices in '000 US$, a numeric vector |
INFL |
Inflation, average consumer prices in %, a numeric vector |
INV |
Total investment as % of GDP, a numeric vector |
UNMP |
Unemployment as % of labor force, a numeric vector |
http://www.imf.org/external/pubs/ft/weo/2017/01/weodata/weoselgr.aspx.
data(imf2015)
head(imf2015)
Computes an orthonormal basis of the intersection of column spaces of two given matrices.
intsectbasis(A, B, tol1=sqrt(.Machine$double.eps), tol2=sqrt(.Machine$double.eps))
A |
First matrix. |
B |
Second matrix with identical number of rows. |
tol1 |
A relative tolerance to detect zero singular values while computing generalized inverse, in case the matrix concerned is rank deficient (default = sqrt(.Machine$double.eps)). |
tol2 |
A tolerance to detect if there is any non-zero singular value of a 'parallel sum' matrix, without which the intersection space is null (default = sqrt(.Machine$double.eps)). |
Returns a semi-orthogonal matrix with columns forming an orthonormal basis of the intersection of the column spaces of A and B.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
A <- matrix(2,3,5)
B <- matrix(3,3,2)
intsectbasis(A, B, tol1=sqrt(.Machine$double.eps), tol2=1e-14)
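The argument descriptions indicate that the package works with a 'parallel sum' matrix. The sketch below deliberately uses a different and simpler technique, principal angles between the two column spaces, to show what kind of result to expect. `orth` and `intsect_sketch` are hypothetical helpers, and the single tolerance `tol` is an assumption that does not correspond to tol1 or tol2.

```r
# Hypothetical helpers (assumptions, not the package's code).
orth <- function(M, tol = sqrt(.Machine$double.eps)) {
  q <- qr(M, tol = tol)
  qr.Q(q)[, seq_len(q$rank), drop = FALSE]    # orthonormal basis of col(M)
}

intsect_sketch <- function(A, B, tol = 1e-8) {
  QA <- orth(A)
  QB <- orth(B)
  s <- svd(crossprod(QA, QB))    # singular values are cosines of principal angles
  keep <- s$d > 1 - tol          # cosine 1: a direction shared by both spaces
  QA %*% s$u[, keep, drop = FALSE]
}

A <- matrix(2, 3, 5)             # column space spanned by (1, 1, 1)
B <- matrix(3, 3, 2)             # the same column space
V <- intsect_sketch(A, B)
V                                # one column, proportional to (1, 1, 1)
```

The sketch assumes that both matrices have at least one non-zero column. When the two spaces share no direction, no singular value passes the threshold and the result has zero columns.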
Measurements of four dimensions of flowers of three species of the plant Iris (Iris setosa, Iris versicolor, and Iris virginica).
data(Iris)
A data frame with 150 observations on the following 6 variables.
Species_No |
Species number |
Petal_width |
Petal width (in cm) |
Petal_length |
Petal length (in cm) |
Sepal_width |
Sepal width (in cm) |
Sepal_length |
Sepal length (in cm) |
Species_name |
Species names: Setosa, Virginica or Versicolor, a character vector |
Fisher, R.A. (1936) The use of multiple measurements in taxonomic problems. Ann. Eugenics, 7, pp.179-188.
data(Iris)
head(Iris)
Checks whether column space of one matrix is a subset of the column space of another matrix.
is.included(B, A, tol1=sqrt(.Machine$double.eps), tol2=sqrt(.Machine$double.eps))
B |
The matrix whose column space is to be checked for being a subset. |
A |
The matrix whose column space is to be checked for being a superset. |
tol1 |
A relative tolerance to detect zero singular values while computing generalized inverse, in case A is rank deficient (default = sqrt(.Machine$double.eps)). |
tol2 |
A relative tolerance to detect whether there is sufficient closeness between B and A.ginv(A).B (default = sqrt(.Machine$double.eps)). |
A logical value (TRUE if the column space of B is contained in the column space of A).
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
A <- cbind(c(2,1,-2),c(3,1,-1))
I <- diag(1,3)
is.included(A, I, tol1=sqrt(.Machine$double.eps), tol2=1e-15)
is.included(I, A, tol1=1e-14, tol2=sqrt(.Machine$double.eps))
is.included(projector(A), A, tol1=1e-15, tol2=1e-14)
is.included(A, projector(A))
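The check described by tol2, closeness of B to its projection on the column space of A, can be sketched in base R. The sketch replaces the generalized inverse with a QR-based projection and uses one tolerance for both steps; `is_included_sketch` is a hypothetical helper and both simplifications are assumptions.

```r
# Hypothetical helper (an assumption, not the package's code).
is_included_sketch <- function(B, A, tol = sqrt(.Machine$double.eps)) {
  qa <- qr(A, tol = tol)
  QA <- qr.Q(qa)[, seq_len(qa$rank), drop = FALSE]   # orthonormal basis of col(A)
  resid <- B - QA %*% crossprod(QA, B)               # part of B outside col(A)
  sqrt(sum(resid^2)) <= tol * sqrt(sum(B^2))         # relative Frobenius-norm test
}

A <- cbind(c(2,1,-2), c(3,1,-1))
I <- diag(1, 3)
is_included_sketch(A, I)    # TRUE: every 3-vector lies in the column space of I
is_included_sketch(I, A)    # FALSE: A spans only a two-dimensional subspace
```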
Computes the intercept augmented variance inflation factors for a linear model.
ivif(lmobj)
lmobj |
An object produced by lm fitting. |
Returns the intercept augmented variance inflation factors for the model, with each VIF labelled either as (Intercept) or by the variable name.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
data(imf2015)
lmimf <- lm(UNMP~CAB+DEBT+EXP+GDP+INFL+INV, data = imf2015)
ivif(lmimf)
Measurements of an angular dimension (beta angle) found in kink bands of Daling phyllite in the Darjeeling-Sikkim Himalayas.
data(kinks)
A data frame with 100 observations on the following 3 variables.
beta |
Beta angle in kink bands (in degrees) |
order |
Fold order (1 = main fold, 2 = sub-fold, 3,4 = sub-folds of successively higher order) |
type |
Type of kink band (1 = conjugate, 2 = dextral, 3 = sinistral) |
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach, World Scientific Publishing Co., Table 6.8.
data(kinks)
head(kinks)
Monthly total counts of homicides and rapes in the city of Los Angeles from January 1975 to December 1993.
data(LAcrime)
A data frame with 228 observations on the following 7 variables.
Year |
Year of record |
Month |
Month of record |
Population |
Population of the city in the year of record |
TempCelsius |
Monthly average temperature recorded at the Los Angeles International Airport (in Celsius) |
Fahrenheit |
Monthly average temperature recorded at the Los Angeles International Airport (in Fahrenheit) |
Homicide |
Total count of homicides in the month and year of record |
Rape |
Total count of rapes in the month and year of record |
The crime data: Carlson, S.M. (1998), Uniform Crime Reports: Monthly Weapon-Specific Crime and Arrest Time Series, 1975-1993, ICPSR06792-v1, Interuniversity Consortium for Political and Social Research, Ann Arbor, MI (https://www.icpsr.umich.edu/icpsrweb/NACJD/studies/6792). Temperature data for LAX (WMO ID 72295): National Oceanic and Atmospheric Administration, USA (http://www.ncdc.noaa.gov/ghcnm/v2.php)
data(LAcrime)
head(LAcrime)
Pre- and post-treatment scores on abundance of leprosy for patients receiving different treatments (Snedecor and Cochran, 1967).
data(leprosy)
A data frame with 30 observations on the following 3 variables.
treatment |
Treatment type: A, D or F (placebo), a character vector |
pre |
Pre-treatment score, a numerical vector |
post |
Post-treatment score, a numerical vector |
Snedecor, G.W. and Cochran, W.G. (1967) Statistical Methods, Iowa State University, Ames, p.421.
data(leprosy)
head(leprosy)
William Guy's nineteenth century data on the age at death of persons belonging to different professions.
data(lifelength)
A data frame with 690 observations on the following 2 variables.
Category |
Code for profession: 1 = historian, 2 = poet, 3 = painter, 4 = musician, 5 = mathematician or astronomer, 6 = chemist or natural philosopher, 7 = naturalist, 8 = engineer, architect or surveyor |
Lifelength |
Age (in years) of deceased |
Guy, W. (1859) On the duration of life as affected by the pursuits of literature, science and art. J. Statist. Soc. London, 22.
data(lifelength)
head(lifelength)
Produces p-values of Bonferroni and Scheffe multiple comparison tests of several testable linear hypotheses.
multcomp(y, X, A, xi, tol=sqrt(.Machine$double.eps))
y |
Response vector in linear model. |
X |
Design/model matrix or matrix containing values of explanatory variables (generally including intercept). |
A |
Coefficient matrix (A.beta=xi is the set of multiple hypotheses that has to be tested). |
xi |
A vector of values (A.beta=xi is the set of multiple hypotheses that has to be tested). |
tol |
A relative tolerance to detect zero singular values while computing generalized inverse, in case X is rank deficient (default = sqrt(.Machine$double.eps)). |
Normal distribution of response (given explanatory variables and/or factors) is assumed.
Returns F statistics and p-values of Bonferroni and Scheffe multiple comparison tests of the set of linear hypotheses. A set of five vectors:
A |
Specified coefficient matrix. |
xi |
Specified values of A.beta. |
Fstat |
Set of F-ratios for each hypothesis. |
Bonferroni.p |
Set of Bonferroni p-values for different hypotheses. |
Scheffe.p |
Set of Scheffe p-values for different hypotheses. |
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
data(denim)
attach(denim)
X <- cbind(1,binaries(Denim),binaries(Laundry))
A <- rbind(c(0,1,-1,0,0,0,0),c(0,1,0,-1,0,0,0),c(0,0,1,-1,0,0,0))
xi <- c(0,0,0)
multcomp(Abrasion, X, A, xi, tol=1e-14)
detach(denim)
Times recorded by winners of men's Olympic sprint finals in different categories from 1900 to 1988 (Lunn and McNeil, 1991).
data(olympic)
A data frame with 20 observations on the following 6 variables.
Year |
Olympic year |
X100m |
Winner's time (in seconds) for 100 meters sprint |
X200m |
Winner's time (in seconds) for 200 meters sprint |
X400m |
Winner's time (in seconds) for 400 meters sprint |
X800m |
Winner's time (in seconds) for 800 meters sprint |
X1500m |
Winner's time (in seconds) for 1500 meters sprint |
There are three missing years in the data: 1916, 1940 and 1944, when world wars prevented the Olympic Games from being held.
Lunn, A.D. and McNeil, D.R. (1991) Computer-Interactive Data Analysis, Wiley, Chichester.
data(olympic)
head(olympic)
Survival times of animals exposed to poison and treatment (Box and Cox, 1964).
data(poison)
A data frame with 48 observations on the following 3 variables.
Survtime |
Survival time (in 10 hour units) |
Treatment |
Treatment type: 1 = treatment A, 2 = treatment B, 3 = treatment C, 4 = treatment D |
Poison |
Poison type: 1 = Poison I, 2 = Poison II, 3 = Poison III |
Box, G.E.P. and Cox, D.R. (1964) An analysis of transformations. J. Roy. Statist. Soc. Ser. B, 26, pp.211-252.
data(poison)
head(poison)
Computes the orthogonal projection matrix for the column space of a given matrix.
projector(M, tol=sqrt(.Machine$double.eps))
M |
A matrix for which the orthogonal projection matrix is to be computed. |
tol |
A relative tolerance to detect zero singular values while computing generalized inverse, in case M is rank deficient (default = sqrt(.Machine$double.eps)). |
Returns the orthogonal projection matrix for the column space of M.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
projector(matrix(3,3,3))
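The defining properties of an orthogonal projection matrix can be verified in base R. The sketch below builds the projector from an orthonormal basis obtained by QR decomposition rather than from a generalized inverse; `projector_sketch` is a hypothetical helper and this route is an assumption, chosen because it needs no additional packages.

```r
# Hypothetical helper (an assumption, not the package's code).
projector_sketch <- function(M, tol = sqrt(.Machine$double.eps)) {
  qrM <- qr(M, tol = tol)
  Q <- qr.Q(qrM)[, seq_len(qrM$rank), drop = FALSE]   # orthonormal basis of col(M)
  tcrossprod(Q)                                       # Q Q'
}

M <- matrix(3, 3, 3)             # same matrix as in the example above, rank 1
P <- projector_sketch(M)
P                                # every entry equals 1/3
max(abs(P %*% P - P))            # near zero: P is idempotent
max(abs(P - t(P)))               # near zero: P is symmetric
max(abs(P %*% M - M))            # near zero: P leaves the columns of M unchanged
```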
Measurements of male Egyptian skulls from time periods ranging from 4000 BC to 150 AD.
data(skulls)
A data frame with 150 observations on the following 5 variables.
MB |
Maximal breadth (in mm) |
BH |
Basibregmatic height (in mm) |
BL |
Basialveolar length (in mm) |
NH |
Nasal height (in mm) |
Year |
Approximate Year of Skull Formation (negative = B.C., positive = A.D.) |
Thomson, A. and Randall-Maciver, R. (1905) Ancient Races of the Thebaid, Oxford University Press, Oxford.
data(skulls)
head(skulls)
Energy absorbed by four machines for Charpy V-notch testing.
data(splett2)
A data frame with 99 observations on the following 2 variables.
Energy |
Energy absorbed by machine (in foot-pounds) |
Machine |
Machine type (1 = Tinius1, 2 = Tinius2, 3 = Satec, 4 = Tokyo) |
Dataplot webpage of the National Institute of Standards and Technology (NIST), USA (https://www.itl.nist.gov/div898/software/dataplot/data/SPLETT2.DAT).
data(splett2)
head(splett2)
Distance of galactic objects from Earth and their velocities (Hubble, 1929).
data(stars1)
A data frame with 24 observations on the following 2 variables.
Distance |
Distance from Earth (in million parsec; 1 parsec = 3.26 light years) |
Velocity |
Velocity of galaxy (in km/s) |
Hubble, E. (1929) A relation between distance and radial velocity among extra-galactic nebulae. Proc. Nat. Acad. Sc. 15, pp.168-173.
data(stars1)
head(stars1)
Distance of additional galactic objects from Earth and their velocities (Humason, 1936).
data(stars2)
A data frame with 21 observations on the following 2 variables.
Distance: Distance from Earth (in million parsec; 1 parsec = 3.26 light years)
Velocity: Velocity of galaxy (in km/s)
The galactic objects in this data set are much farther from Earth than those in the data set stars1. These data became available within a few years of the publication of Hubble's original work, through rapid advances in technology. Although the new data cemented Hubble's hypothesis that distant objects have proportionately higher velocity (as they should in a universe expanding with constant acceleration), the constant of proportionality turned out to be somewhat different from Hubble's original estimate.
Humason, M.L. (1936) The apparent radial velocities of 100 extra-galactic nebulae. Astrophys. J. 83, pp.10-22.
data(stars2)
head(stars2)
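Proportionality between velocity and distance corresponds to a regression line through the origin, so the two constants of proportionality can be compared by fitting such a line to each data set. The following is a minimal sketch, assuming the package is installed under the name lmreg (as the source repository suggests) and that the columns are named as documented above. `hubble_slope` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: least squares slope of Velocity on Distance for a line
# through the origin (Velocity = slope * Distance).
hubble_slope <- function(d) {
  fit <- lm(Velocity ~ Distance - 1, data = d)
  unname(coef(fit)["Distance"])
}

# Run the comparison only if the package (assumed to be named lmreg) is available
if (requireNamespace("lmreg", quietly = TRUE)) {
  data(stars1, package = "lmreg")
  data(stars2, package = "lmreg")
  print(c(stars1 = hubble_slope(stars1), stars2 = hubble_slope(stars2)))
}
```

The slopes are in km/s per million parsec, the units in which the two variables are recorded.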
Computes a basis which, together with a basis of some columns of a matrix, constitutes a basis of the column space of the entire matrix.
supplbasis(A, B, tol=sqrt(.Machine$double.eps))
| A | Sub-matrix containing some columns of a matrix. |
|---|---|
| B | Sub-matrix containing the remaining columns of the same matrix. |
| tol | A relative tolerance to detect rank deficiency during QR decomposition (default = sqrt(.Machine$double.eps)). |
Returns a semi-orthogonal matrix whose columns, together with a basis of the column space of A, constitute a basis of the column space of the entire matrix (A:B).
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
A <- cbind(c(2,1,-2),c(3,1,-1))
B <- diag(c(1,1,0))
supplbasis(A,B)
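One way to obtain such a supplementary basis is to remove from B its component in the column space of A and orthonormalize what remains. The base R sketch below follows that route using singular value decompositions. It is not the package's QR-based code, `suppl_sketch` is a hypothetical name, and the basis it returns may differ from the output of supplbasis in sign or rotation.

```r
# Illustrative sketch only; `suppl_sketch` is hypothetical, not the package's code.
suppl_sketch <- function(A, B, tol = sqrt(.Machine$double.eps)) {
  # Orthonormal basis of the column space of A
  sa <- svd(A)
  Ua <- sa$u[, sa$d > tol * max(sa$d), drop = FALSE]
  # Part of B that lies outside the column space of A
  R <- B - Ua %*% crossprod(Ua, B)
  sr <- svd(R)
  # Absolute threshold scaled by the size of (A:B), so that a residual that is
  # pure rounding noise yields no columns
  keep <- sr$d > tol * norm(cbind(A, B), "F")
  sr$u[, keep, drop = FALSE]
}

A <- cbind(c(2, 1, -2), c(3, 1, -1))
B <- diag(c(1, 1, 0))
S <- suppl_sketch(A, B)
ncol(S)                      # number of supplementary basis vectors
max(abs(crossprod(A, S)))    # near zero: S is orthogonal to the columns of A
qr(cbind(A, S))$rank == qr(cbind(A, B))$rank
```

In this example A has rank 2 and (A:B) has rank 3, so a single supplementary vector is needed.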
Computes the trace of a given matrix.
tr(M)
| M | A matrix whose trace is to be computed. |
|---|---|
A scalar value: the trace of M.
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
tr(matrix(2,2,2))
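The trace is the sum of the diagonal entries, and for an orthogonal projection matrix it equals the dimension of the space projected onto. A short base R sketch of both facts follows. `tr_sketch` is a hypothetical stand-in and does not reproduce the package's source.

```r
# Hypothetical one-line equivalent for a square matrix
tr_sketch <- function(M) sum(diag(M))
tr_sketch(matrix(2, 2, 2))   # diagonal entries are 2 and 2

# Trace of an orthogonal projector equals the rank of the projected-onto space
X <- cbind(1, c(1, 2, 3, 4))                 # full column rank, 2 columns
P <- X %*% solve(crossprod(X)) %*% t(X)      # projector onto the column space of X
tr_sketch(P)
```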
The measured hemoglobin content in the blood of brown trout that were randomly allocated to four troughs, where different concentrations of sulfamerazine in food were administered 35 days prior to measurement (Gutsell, 1951).
data(trout)
A data frame with 40 observations on the following 2 variables.
Sulfamerazine: Concentration of sulfamerazine (in grams per 100 pounds of fish)
Hemoglobin: Hemoglobin content (in grams per 100 ml of blood)
Gutsell, James S. (1951) The effect of sulfamerazine on the erythrocyte and hemoglobin content of trout blood, Biometrics 7(2), pp.171-179.
data(trout)
head(trout)
Waist circumference and adipose tissue data (Daniel and Cross, 2013).
data(waist)
A data frame with 109 observations on the following 2 variables.
Waist: Waist circumference (in centimeters)
AT: Area of lower abdominal adipose tissue (in square centimeters)
Daniel, W.W. and Cross, C.L. (2013) Biostatistics: A Foundation for Analysis in the Health Sciences, tenth edition, Wiley, New York, Table 9.3.1.
data(waist)
head(waist)
The midyear population of the world for the years 1981-2000.
data(worldpop)
A data frame with 20 observations on the following 2 variables.
Year: Calendar year
Pop.billion: Population (in billions)
U.S. Census Bureau, International Data Base (http://www.census.gov/ipc/www/idbnew.html)
data(worldpop)
head(worldpop)
Men's and women's world record times for various outdoor running distances, recognized by the International Association of Athletics Federations (IAAF) as of 17 November 2017.
data(worldrecord)
A data frame with 10 observations on the following 3 variables.
Distance: Running distance (in meters)
MenRecord: Men's record time (in seconds)
WomenRecord: Women's record time (in seconds)
International Association of Athletics Federations (https://www.iaaf.org/records/by-category/world-records).
data(worldrecord)
head(worldrecord)
Wright brothers' 1901 wind tunnel data on pressure over different types of wings at different angles.
data(Wright)
A data frame with 222 observations on the following 3 variables.
Pressure: Air pressure (in psi)
Angle: Angle of wing (in degrees)
Wing: Wing type
Dataplot webpage of the National Institute of Standards and Technology (NIST),
USA (https://www.itl.nist.gov/div898/software/dataplot/data/WRIGHT11.DAT)
data(Wright)
head(Wright)
Prepares the design matrix for two-way classified data with a single observation per cell, along with the response vector in corresponding order.
yX(response, treatments, blocks)
| response | Response vector as provided (numeric). |
|---|---|
| treatments | Vector of treatment levels as provided (either numeric or character). |
| blocks | Vector of block levels as provided (either numeric or character). |
Returns a list with the following components.
| X | A binary matrix with number of rows equal to length of response and number of columns equal to the total number of levels of treatments and blocks plus one. Each row has exactly three 1s: in the first position and in the two positions representing the treatment and block levels. |
|---|---|
| y | Numeric vector of response values, permuted to correspond with the rows of X. |
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
data(airspeed)
yX(airspeed$Posmaxspeed,airspeed$Reynolds,airspeed$Ribht)
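The structure of the returned X can be illustrated in base R on made-up toy data. This is a sketch under stated assumptions, not the package's code: `indicator` is a hypothetical helper, and sorting the rows by treatment and then by block is an assumed ordering that yX may not share.

```r
# Hypothetical helper: 0/1 indicator columns, one per level of f
indicator <- function(f) {
  f <- factor(f)
  Z <- outer(as.character(f), levels(f), "==") * 1
  colnames(Z) <- levels(f)
  Z
}

# Made-up toy data: 3 treatments x 2 blocks, one observation per cell
response   <- c(5.1, 4.8, 6.0, 5.5, 6.2, 5.9)
treatments <- c("a", "b", "c", "a", "b", "c")
blocks     <- c(1, 1, 1, 2, 2, 2)

# Assumed ordering: sort by treatment, then by block
ord <- order(treatments, blocks)
X <- cbind(Intercept = 1, indicator(treatments[ord]), indicator(blocks[ord]))
y <- response[ord]

dim(X)        # 6 rows and 1 + 3 + 2 = 6 columns
rowSums(X)    # three 1s in every row
qr(X)$rank    # smaller than ncol(X): the matrix is rank deficient
```

The rank deficiency arises because the treatment columns sum to the intercept column, as do the block columns.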
Prepares the design matrix for balanced two-way classified data, along with the response vector in corresponding order.
yXm(response, treatments, blocks)
| response | Response vector as provided (numeric). |
|---|---|
| treatments | Vector of treatment levels as provided (either numeric or character). |
| blocks | Vector of block levels as provided (either numeric or character). |
Returns a list with the following components.
| X | A binary matrix with number of rows equal to length of response and number of columns equal to the total number of levels of treatments and blocks plus one. Each row has exactly three 1s: in the first position and in the two positions representing the treatment and block levels. |
|---|---|
| y | Numeric vector of response values, permuted to correspond with the rows of X. |
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
data(poison)
yXm(poison$Survtime,poison$Treatment,poison$Poison)
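A design matrix of this over-parametrized form is rank deficient, but the fitted values are still well defined as the orthogonal projection of the response on its column space. The base R sketch below uses made-up balanced data with two observations per cell and checks the projection against the usual full-rank additive fit from lm. The data values and variable names are illustrative assumptions, not taken from the package.

```r
# Made-up balanced toy data: 2 treatments x 2 blocks, 2 observations per cell
y          <- c(3.1, 2.9, 4.2, 4.0, 5.0, 5.4, 6.1, 6.3)
treatments <- rep(c("A", "B"), each = 4)
blocks     <- rep(rep(c("p", "q"), each = 2), times = 2)

# Over-parametrized design matrix: intercept, treatment and block indicators
X <- cbind(1,
           outer(treatments, unique(treatments), "==") * 1,
           outer(blocks, unique(blocks), "==") * 1)

# Fitted values are the projection of y on the column space of X;
# qr.fitted() uses only the first qr(X)$rank pivoted columns
fit_X  <- qr.fitted(qr(X), y)
fit_lm <- fitted(lm(y ~ factor(treatments) + factor(blocks)))
all.equal(as.numeric(fit_X), as.numeric(fit_lm))
```

The two sets of fitted values agree because both parametrizations span the same column space.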
Prepares the design matrix for a nested model with groups and subgroups, along with the response vector in corresponding order.
yXn(response, group, subgroup)
| response | Response vector as provided (numeric). |
|---|---|
| group | Vector of group labels as provided (either numeric or character). |
| subgroup | Vector of subgroup labels as provided (either numeric or character). |
Returns a list with the following components.
| X | A binary matrix with number of rows equal to length of response and number of columns equal to the total number of groups and subgroups plus one. Each row has exactly three 1s: in the first position and in the two positions representing the group and the subgroup. |
|---|---|
| y | Numeric vector of response values, permuted to correspond with the rows of X. |
Debasis Sengupta <[email protected]>, Jinwen Qiu <[email protected]>
Sengupta and Jammalamadaka (2019), Linear Models and Regression with R: An Integrated Approach.
data(kinks)
yXn(kinks$beta,kinks$type,kinks$order)
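The nested layout can be sketched in base R on made-up labels. The sketch assumes that a subgroup is identified by its (group, subgroup) pair, so that a label reused in two groups counts as two distinct subgroups. Whether yXn makes the same assumption is not stated here, and `indicator` is a hypothetical helper.

```r
# Hypothetical helper: 0/1 indicator columns, one per level of f
indicator <- function(f) {
  f <- factor(f)
  Z <- outer(as.character(f), levels(f), "==") * 1
  colnames(Z) <- levels(f)
  Z
}

# Made-up labels: subgroup labels "a" and "b" are reused inside each group
group    <- rep(c("g1", "g2"), each = 4)
subgroup <- rep(rep(c("a", "b"), each = 2), times = 2)

# Assumption: a subgroup is identified by the (group, subgroup) pair
nested <- interaction(group, subgroup, drop = TRUE)
X <- cbind(Intercept = 1, indicator(group), indicator(nested))

ncol(X)       # 1 + 2 groups + 4 subgroups = 7 columns
rowSums(X)    # three 1s in every row
qr(X)$rank    # equals the number of distinct subgroups
```

The rank equals the number of subgroups because the intercept and group columns are sums of subgroup columns.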