Calculates predictions, and the uncertainty of those predictions, from the estimates of a multivariate regression analysis of survey data collected with the item count technique (list experiment).
```r
# S3 method for ictreg
predict(object, newdata, newdata.diff, direct.glm, se.fit = FALSE,
        interval = c("none", "confidence"), level = 0.95, avg = FALSE,
        sensitive.item, ...)
```
Argument | Description |
---|---|
object | Object of class inheriting from "ictreg". |
newdata | An optional data frame containing the data used to make predictions. If omitted, the data used to fit the regression are used. |
newdata.diff | An optional data frame whose predictions are compared with the predictions from the data in `newdata`. |
direct.glm | A `glm` object from a binomial logistic regression predicting responses to a direct survey question about the sensitive item. The predictions from the `ictreg` object are compared with the predictions based on this `glm` object. |
se.fit | Logical; if `TRUE`, standard errors of the predictions are returned. |
interval | Type of interval calculation: `"none"` (the default) or `"confidence"`. |
level | Confidence level for the confidence intervals. |
avg | Logical; if `TRUE`, the mean prediction and associated statistics across all observations in the data frame are returned instead of predictions for each observation. |
sensitive.item | For list experiments with multiple sensitive items, specifies which sensitive item's fit to use for predictions. Default is the first sensitive item. |
... | Further arguments passed to or from other methods. |
`predict.ictreg` produces a vector of predictions or, if `interval` is set, a matrix of predictions and bounds with column names `fit`, `lwr`, and `upr`. If `se.fit` is `TRUE`, a list with the following components is returned:

Component | Description |
---|---|
fit | Vector or matrix as above. |
se.fit | Standard error of prediction. |
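The `fit`/`lwr`/`upr` and `fit`/`se.fit` layout described above is the same one used by base R's `stats::predict.lm`. As an illustration only, the sketch below inspects a `predict.lm` result on the built-in `mtcars` data; it is not an `ictreg` fit, and it assumes nothing about `predict.ictreg` beyond the layout stated above.

```r
# Illustration using base R's predict.lm, whose fit/se.fit layout matches
# the one described above; this is NOT an ictreg fit.
fit.lm <- lm(mpg ~ wt, data = mtcars)
new.cars <- data.frame(wt = c(2.5, 3.5))

# Without se.fit: a matrix with columns fit, lwr, upr (because interval is set)
pred <- predict(fit.lm, newdata = new.cars, interval = "confidence", level = 0.95)
print(colnames(pred))  # "fit" "lwr" "upr"

# With se.fit = TRUE: a list whose $fit is the matrix above and whose
# $se.fit holds one standard error per row of newdata
pred.se <- predict(fit.lm, newdata = new.cars, se.fit = TRUE,
                   interval = "confidence", level = 0.95)
print(pred.se$fit)
print(pred.se$se.fit)
```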
`predict.ictreg` produces predicted values, obtained by evaluating the regression function in the frame `newdata` (which defaults to `model.frame(object)`). If the logical `se.fit` is `TRUE`, standard errors of the predictions are calculated. Setting `interval = "confidence"` requests confidence intervals at the specified `level`; the default, `"none"`, computes no intervals.
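One common way to obtain standard errors and confidence intervals for nonlinear predictions is to simulate coefficient vectors from their estimated sampling distribution. The sketch below assumes the sensitive-item model is a logistic regression; the coefficient estimates `beta.hat`, the covariance `vcov.hat`, and the helper `draw.mvn` are hypothetical placeholders, not output of `ictreg`, and the simulation approach is an assumption rather than a description of how `predict.ictreg` is implemented.

```r
set.seed(1)

# Hypothetical coefficient estimates and covariance for a logistic
# sensitive-item model: intercept, age, male (placeholders, not real output)
beta.hat <- c("(Intercept)" = -1.0, age = 0.02, male = 0.3)
vcov.hat <- diag(c(0.04, 0.0001, 0.01))

# Hypothetical helper: multivariate normal draws via a Cholesky factor
draw.mvn <- function(n, mu, Sigma) {
  z <- matrix(rnorm(n * length(mu)), nrow = n)
  sweep(z %*% chol(Sigma), 2, mu, "+")
}

# Covariate rows to predict for (plays the role of newdata)
newdata <- data.frame(age = c(30, 60), male = c(0, 1))
X <- model.matrix(~ age + male, data = newdata)

# Point predictions: inverse logit of the linear predictor
fit <- plogis(drop(X %*% beta.hat))

# Simulated predictions: one row per draw, one column per row of newdata
beta.draws <- draw.mvn(5000, beta.hat, vcov.hat)
pred.draws <- plogis(beta.draws %*% t(X))

# Standard errors and percentile confidence bounds at the chosen level
level <- 0.95
alpha <- 1 - level
se.fit <- apply(pred.draws, 2, sd)
lwr <- apply(pred.draws, 2, quantile, probs = alpha / 2)
upr <- apply(pred.draws, 2, quantile, probs = 1 - alpha / 2)
cbind(fit = fit, lwr = lwr, upr = upr)
```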
If `avg` is `TRUE`, the mean prediction across all observations in the dataset is calculated, and if `se.fit` is also `TRUE`, a standard error for this mean estimate is provided. Setting `interval = "confidence"` outputs a confidence interval for the mean in addition to the point estimate.
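A minimal sketch of a mean prediction with uncertainty, under the same hypothetical assumptions as a logistic sensitive-item model with placeholder `beta.hat` and `vcov.hat` (not `ictreg` output) and simulation-based uncertainty (an assumed technique, not necessarily what the package does):

```r
set.seed(2)

# Hypothetical logistic coefficients and covariance (placeholders only)
beta.hat <- c(-1.0, 0.02, 0.3)
vcov.hat <- diag(c(0.04, 0.0001, 0.01))

# Hypothetical sample of respondents (plays the role of newdata)
newdata <- data.frame(age = c(25, 40, 55, 70), male = c(1, 0, 1, 0))
X <- model.matrix(~ age + male, data = newdata)

# Mean prediction: average the predicted probabilities over all rows
avg.fit <- mean(plogis(X %*% beta.hat))

# Uncertainty: recompute the average for each simulated coefficient vector
n.draws <- 5000
z <- matrix(rnorm(n.draws * length(beta.hat)), nrow = n.draws)
beta.draws <- sweep(z %*% chol(vcov.hat), 2, beta.hat, "+")
avg.draws <- rowMeans(plogis(beta.draws %*% t(X)))

avg.se <- sd(avg.draws)
avg.ci <- quantile(avg.draws, c(0.025, 0.975))
c(fit = avg.fit, se = avg.se, avg.ci)
```

Averaging inside each draw, rather than combining per-row standard errors, keeps the correlation between the observations' predictions in the standard error of the mean.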
Two additional types of mean prediction are also available. The first, triggered when the user provides a `newdata.diff` data frame, calculates the mean predicted value for each of the two datasets (`newdata` and `newdata.diff`), as well as the mean difference in predicted values between them. Standard errors and confidence intervals can also be added. For difference prediction, `avg` must be set to `TRUE`.
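A minimal sketch of a difference in mean predictions between two data frames, again assuming a logistic sensitive-item model with placeholder coefficients (not `ictreg` output) and simulation-based uncertainty. The two hypothetical frames differ only in `south`, mirroring how `race.south` and `race.nonsouth` are built in the examples.

```r
set.seed(3)

# Hypothetical logistic coefficients: intercept, south, age (placeholders)
beta.hat <- c(-1.2, 0.8, 0.01)
vcov.hat <- diag(c(0.05, 0.02, 0.0001))

# Hypothetical respondents; the two frames differ only in 'south'
base <- data.frame(age = c(25, 40, 55, 70))
south <- transform(base, south = 1)
nonsouth <- transform(base, south = 0)
X1 <- model.matrix(~ south + age, data = south)
X2 <- model.matrix(~ south + age, data = nonsouth)

n.draws <- 5000
z <- matrix(rnorm(n.draws * length(beta.hat)), nrow = n.draws)
beta.draws <- sweep(z %*% chol(vcov.hat), 2, beta.hat, "+")

# Mean prediction in each frame for every draw; the same draws are used
# for both frames, so the difference reflects their correlation
mean1 <- rowMeans(plogis(beta.draws %*% t(X1)))
mean2 <- rowMeans(plogis(beta.draws %*% t(X2)))
diff.draws <- mean1 - mean2

diff.fit <- mean(plogis(X1 %*% beta.hat)) - mean(plogis(X2 %*% beta.hat))
c(diff = diff.fit, se = sd(diff.draws), quantile(diff.draws, c(0.025, 0.975)))
```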
The second type, triggered when the user provides a `direct.glm` object, calculates the mean difference between predictions based on the `ictreg` fit and predictions based on a `glm` fit to a direct survey question on the sensitive item. This quantity is defined as the revealed social desirability bias in Blair and Imai (2012).
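The revealed social desirability bias compares average predictions from the list-experiment model and from the direct-question model over the same respondents. A minimal sketch follows, assuming logistic models for both, simulated direct-question data, placeholder list-model coefficients (not `ictreg` output), and a list-minus-direct sign convention. It also treats the two models' estimates as independent, which is only reasonable when they are fit on separate samples (as `mis.list` and `mis.sens` are in the examples).

```r
set.seed(4)

# Hypothetical direct-question data, simulated purely for illustration
n <- 500
direct <- data.frame(age = runif(n, 20, 80), male = rbinom(n, 1, 0.5))
direct$sensitive <- rbinom(n, 1, plogis(-2 + 0.01 * direct$age + 0.2 * direct$male))
fit.direct <- glm(sensitive ~ age + male, data = direct,
                  family = binomial("logit"))

# Hypothetical list-experiment coefficients for the same covariates
# (placeholders, not output of ictreg)
beta.list <- c(-1.2, 0.01, 0.2)
vcov.list <- diag(c(0.09, 0.0001, 0.04))

# Predict both models over the same respondents
X <- model.matrix(~ age + male, data = direct)
n.draws <- 5000
draw <- function(mu, Sigma) {
  z <- matrix(rnorm(n.draws * length(mu)), nrow = n.draws)
  sweep(z %*% chol(Sigma), 2, mu, "+")
}
mean.list <- rowMeans(plogis(draw(beta.list, vcov.list) %*% t(X)))
mean.direct <- rowMeans(plogis(draw(coef(fit.direct), vcov(fit.direct)) %*% t(X)))

# Sign convention (list minus direct) is an assumption of this sketch
bias.draws <- mean.list - mean.direct
bias.fit <- mean(plogis(X %*% beta.list)) -
  mean(predict(fit.direct, newdata = direct, type = "response"))
c(bias = bias.fit, se = sd(bias.draws), quantile(bias.draws, c(0.025, 0.975)))
```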
Blair, Graeme and Kosuke Imai. (2012) "Statistical Analysis of List Experiments." Political Analysis, Vol. 20, No. 1 (Winter). Available at http://imai.princeton.edu/research/listP.html

Imai, Kosuke. (2011) "Multivariate Regression Analysis for the Item Count Technique." Journal of the American Statistical Association, Vol. 106, No. 494 (June), pp. 407-416. Available at http://imai.princeton.edu/research/list.html
See `ictreg` for model fitting.
```r
library(list)

data(race)
race.south <- race.nonsouth <- race
race.south[, "south"] <- 1
race.nonsouth[, "south"] <- 0

# Fit EM algorithm ML model with constraint with no covariates
ml.results.south.nocov <- ictreg(y ~ 1, data = race[race$south == 1, ],
                                 method = "ml", treat = "treat", J = 3,
                                 overdispersed = FALSE, constrained = TRUE)
ml.results.nonsouth.nocov <- ictreg(y ~ 1, data = race[race$south == 0, ],
                                    method = "ml", treat = "treat", J = 3,
                                    overdispersed = FALSE, constrained = TRUE)

# Calculate average predictions for respondents in the South
# and the North of the US for the MLE no-covariates model,
# replicating the estimates presented in Figure 1, Imai (2011)
avg.pred.south.nocov <- predict(ml.results.south.nocov,
                                newdata = as.data.frame(matrix(1, 1, 1)),
                                se.fit = TRUE, avg = TRUE)
avg.pred.nonsouth.nocov <- predict(ml.results.nonsouth.nocov,
                                   newdata = as.data.frame(matrix(1, 1, 1)),
                                   se.fit = TRUE, avg = TRUE)

# Fit linear regression
lm.results <- ictreg(y ~ south + age + male + college, data = race,
                     treat = "treat", J = 3, method = "lm")

# Calculate average predictions for respondents in the South
# and the North of the US for the lm model, replicating the
# estimates presented in Figure 1, Imai (2011)
avg.pred.south.lm <- predict(lm.results, newdata = race.south,
                             se.fit = TRUE, avg = TRUE)
avg.pred.nonsouth.lm <- predict(lm.results, newdata = race.nonsouth,
                                se.fit = TRUE, avg = TRUE)

# Fit two-step non-linear least squares regression
nls.results <- ictreg(y ~ south + age + male + college, data = race,
                      treat = "treat", J = 3, method = "nls")

# Calculate average predictions for respondents in the South
# and the North of the US for the NLS model, replicating the
# estimates presented in Figure 1, Imai (2011)
avg.pred.nls <- predict(nls.results, newdata = race.south,
                        newdata.diff = race.nonsouth,
                        se.fit = TRUE, avg = TRUE)

# Fit EM algorithm ML model with constraint
ml.constrained.results <- ictreg(y ~ south + age + male + college,
                                 data = race, treat = "treat", J = 3,
                                 method = "ml", overdispersed = FALSE,
                                 constrained = TRUE)

# Calculate average predictions for respondents in the South
# and the North of the US for the MLE model, replicating the
# estimates presented in Figure 1, Imai (2011)
avg.pred.diff.mle <- predict(ml.constrained.results, newdata = race.south,
                             newdata.diff = race.nonsouth,
                             se.fit = TRUE, avg = TRUE)

# Calculate average predictions from the item count technique
# regression and from a direct sensitive item modeled with a logit.

# Estimate logit for direct sensitive question
data(mis)
mis.list <- subset(mis, list.data == 1)
mis.sens <- subset(mis, sens.data == 1)

# Fit EM algorithm ML model
fit.list <- ictreg(y ~ age + college + male + south, J = 4,
                   data = mis.list, method = "ml")

# Fit logistic regression with directly-asked sensitive question
fit.sens <- glm(sensitive ~ age + college + male + south,
                data = mis.sens, family = binomial("logit"))

# Predict difference between response to sensitive item under the
# direct and indirect questions (the list experiment). This is an
# estimate of the revealed social desirability bias of respondents.
# See Blair and Imai (2012).
avg.pred.social.desirability <- predict(fit.list, direct.glm = fit.sens,
                                        se.fit = TRUE)
```