This function performs K-fold cross-validation for an HBAM or FBAM model in order to estimate the expected log pointwise predictive density for a new dataset (ELPD). Multiple chains for one or more folds can be run in parallel using the future package.
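K-fold cross-validation estimates the ELPD by refitting the model K times. Each fit leaves out one fold, and the held-out data are then scored under the posterior draws from that fit. The base-R sketch below covers only the scoring step, under stated assumptions. The helpers `make_folds()` and `pointwise_elpd_kfold()` are hypothetical and not part of hbamr. We also assume the log-likelihood draws for each fold's held-out points are already available as an S × N_k matrix. Each pointwise contribution is the log of the posterior-mean predictive density, computed as a log-mean-exp.

```r
# Numerically stable log(mean(exp(x))) for a vector of log-densities.
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}

# Hypothetical helper (not part of hbamr): randomly assign N data points
# to K folds of (nearly) equal size, seeding the RNG for reproducibility.
make_folds <- function(N, K, seed = 1) {
  set.seed(seed)
  sample(rep_len(seq_len(K), N))
}

# Hypothetical helper (not part of hbamr): loglik_by_fold[[k]] is assumed
# to be an S x N_k matrix of log-likelihood draws for the points in fold k,
# computed from a fit that excluded fold k, with columns in the order
# given by which(folds == k). Returns one elpd contribution per data point.
pointwise_elpd_kfold <- function(folds, loglik_by_fold) {
  elpd <- numeric(length(folds))
  for (k in seq_along(loglik_by_fold)) {
    idx <- which(folds == k)
    elpd[idx] <- apply(loglik_by_fold[[k]], 2, log_mean_exp)
  }
  elpd
}

# Toy usage with made-up draws: every held-out density equals 0.5.
folds <- make_folds(N = 10, K = 5)
loglik_by_fold <- lapply(seq_len(5), function(k) {
  matrix(log(0.5), nrow = 4, ncol = sum(folds == k))
})
elpd_i <- pointwise_elpd_kfold(folds, loglik_by_fold)
```

Subtracting the maximum before exponentiating keeps very negative log-densities from underflowing to zero. The fitted model is the only thing that differs between folds.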
Usage
hbam_cv(
self = NULL,
stimuli = NULL,
model = "HBAM",
allow_miss = 0,
req_valid = NA,
req_unique = 2,
prefs = NULL,
group_id = NULL,
prep_data = TRUE,
data = NULL,
K = 10,
chains = 2,
warmup = 1000,
iter = 3000,
seed = 1,
sigma_alpha = NULL,
sigma_beta = 0.35,
sigma_mu_alpha = NULL,
sigma_mu_beta = 0.3,
...
)
Arguments
- self
A numerical vector of N ideological self-placements. Any missing data must be coded as NA. This argument will not be used if the data have been prepared in advance via the prep_data() function.
- stimuli
An N × J matrix of numerical stimulus placements, where J is the number of stimuli. Any missing data must be coded as NA. This argument will not be used if the data have been prepared in advance via the prep_data() function.
- model
Character: Name of the model to be used. Defaults to "HBAM".
- allow_miss
Integer specifying how many missing stimulus positions to accept while still including an individual in the analysis. This argument will not be used if the data have been prepared in advance via the prep_data() function. Defaults to 0.
- req_valid
Integer specifying how many valid observations to require for a respondent to be included in the analysis. The default is req_valid = J - allow_miss, but if req_valid is specified, it takes precedence over allow_miss. This argument will not be used if the data have been prepared in advance via the prep_data() function.
- req_unique
Integer specifying how many unique positions on the ideological scale each respondent is required to have used when placing the stimuli in order to be included in the analysis. The default is req_unique = 2. This argument will not be used if the data have been prepared in advance via the prep_data() function.
- prefs
An N × J matrix of numerical stimulus ratings or preference scores. These data are only required by the HBAM_R and HBAM_R_MINI models and will be ignored when fitting other models.
- group_id
Integer vector of length N identifying which group each respondent belongs to. The supplied vector should range from 1 to the total number of groups in the data, and all integers between these numbers should be represented in the supplied data. These data are only required by models with "MULTI" in their name and will be ignored when fitting other models.
- prep_data
Logical: Should the data be prepared before fitting the model? If the data have been prepared in advance by first running the prep_data() and prep_data_cv() functions, set prep_data = FALSE. Defaults to prep_data = TRUE.
- data
A list of data produced by prep_data() followed by prep_data_cv().
- K
An integer above 2, specifying the number of folds to use in the analysis. Defaults to 10.
- chains
A positive integer specifying the number of Markov chains to use per fold. Defaults to 2.
- warmup
A positive integer specifying the number of warmup (aka burn-in) iterations per chain. It defaults to 1000. The number of warmup iterations should be smaller than iter.
- iter
A positive integer specifying the number of iterations for each chain (including warmup). It defaults to 3000, as running fewer chains for longer is a more efficient way to obtain a given number of draws (each chain has to complete its own warmup, and cross-validation can be computationally expensive).
- seed
An integer passed on to set.seed before creating the folds, to increase reproducibility and comparability. Defaults to 1 and only applies to fold creation when the argument prep_data is TRUE. The supplied seed argument is also used to generate seeds for the sampling algorithm.
- sigma_alpha
A positive numeric value specifying the standard deviation of the prior on the shift parameters in the FBAM model, or the standard deviation of the parameters' deviation from the group-means in FBAM_MULTI models. (This argument will be ignored by HBAM models.) Defaults to B / 4, where B measures the length of the survey scale as the number of possible placements on one side of the center.
- sigma_beta
A positive numeric value specifying the standard deviation of the prior on the logged stretch parameters in the FBAM model, or the standard deviation of the logged parameters' deviation from the group-means in FBAM_MULTI models. (This argument will be ignored by HBAM models.) Defaults to .35.
- sigma_mu_alpha
A positive numeric value specifying the standard deviation of the prior on the group-means of the shift parameters in MULTI-type models. Defaults to B / 5.
- sigma_mu_beta
A positive numeric value specifying the standard deviation of the prior on the group-means of the logged stretch parameters in MULTI-type models. Defaults to .3.
- ...
Arguments passed to rstan::sampling().
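The inclusion rules described for allow_miss, req_valid, and req_unique can be sketched in base R as follows. The helper `keep_respondents()` is hypothetical and written for illustration only. It is not hbamr's internal code. It assumes the rules apply to the stimulus placements alone (ignoring the self-placement) and that a respondent must satisfy both conditions.

```r
# Hypothetical helper (not hbamr's internal code). `stimuli` is an N x J
# matrix with NA for missing placements; returns TRUE for kept respondents.
keep_respondents <- function(stimuli, allow_miss = 0, req_valid = NA,
                             req_unique = 2) {
  J <- ncol(stimuli)
  # Documented default: req_valid = J - allow_miss; a supplied req_valid wins.
  if (is.na(req_valid)) req_valid <- J - allow_miss
  # Count non-missing stimulus placements per respondent (row).
  n_valid <- rowSums(!is.na(stimuli))
  # Count distinct scale positions each respondent used across the stimuli.
  n_unique <- apply(stimuli, 1, function(r) length(unique(r[!is.na(r)])))
  n_valid >= req_valid & n_unique >= req_unique
}

# Toy data: 3 respondents placing J = 3 stimuli on a 1-7 scale.
m <- rbind(c(1, 2, 3),   # complete, three distinct positions
           c(4, 4, 4),   # complete, but only one distinct position
           c(1, NA, 7))  # one missing placement
keep_default <- keep_respondents(m)
keep_allow1  <- keep_respondents(m, allow_miss = 1)
```

With the default allow_miss = 0, only the first respondent is kept. With allow_miss = 1, the third respondent is also kept, while the second is still dropped for using a single scale position.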
Value
A list of classes kfold and loo, which contains the following named elements:
- "estimates": A 1 × 2 matrix containing the ELPD estimate and its standard error. The columns have names "Estimate" and "SE".
- "pointwise": An N × 1 matrix with column name "elpd_kfold" containing the pointwise contributions for each data point.
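The two elements are linked. In the loo package's convention, the ELPD estimate is the sum of the pointwise contributions, and its standard error is sqrt(N × var(pointwise)). The hypothetical helper `elpd_estimates()` below rebuilds an "estimates" matrix from a "pointwise" matrix. It assumes hbam_cv() follows this loo convention, and the row name it sets is likewise an assumption.

```r
# Hypothetical helper: rebuild a 1 x 2 "estimates" matrix from an N x 1
# "pointwise" matrix, assuming the loo package's convention
# (Estimate = sum, SE = sqrt(N * variance of the pointwise values)).
elpd_estimates <- function(pointwise) {
  x <- pointwise[, "elpd_kfold"]
  matrix(c(sum(x), sqrt(length(x) * var(x))), nrow = 1,
         dimnames = list("elpd_kfold", c("Estimate", "SE")))  # row name assumed
}

# Toy pointwise matrix with three data points (made-up values).
pw <- matrix(c(-1, -2, -3), ncol = 1, dimnames = list(NULL, "elpd_kfold"))
est <- elpd_estimates(pw)
```

If the assumption holds, applying the helper to the "pointwise" element of an hbam_cv() result should reproduce its "estimates" element.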
Examples
# \donttest{
# Loading and re-coding ANES 1980 data:
data(LC1980)
LC1980[LC1980 == 0 | LC1980 == 8 | LC1980 == 9] <- NA
# Making a small subset of the data for illustration:
self <- LC1980[1:50, 1]
stimuli <- LC1980[1:50, -1]
# Preparing to run chains in parallel using 2 cores via the future package:
# Note: You would normally want to use all physical cores for this.
future::plan(future::multisession, workers = 2)
# Performing 10-fold cross-validation for the HBAM_MINI model:
# Note: You would typically want to run the chains for more iterations.
cv_hbam_mini <- hbam_cv(self, stimuli, model = "HBAM_MINI",
                        chains = 1, warmup = 500, iter = 1000)
#> Summary of prepared data (values for supplied data in paretheses)
#> - Number of respondents: 42 (50)
#> - Number of stimuli: 6 (6)
#> - Number of stimuli obs.: 252 (289)
#> - Range of observations: [-3, 3] ([1, 7])
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
# Performing 10-fold cross-validation for the FBAM model:
cv_FBAM <- hbam_cv(self, stimuli, model = "FBAM",
                   chains = 1, warmup = 500, iter = 1000)
#> Summary of prepared data (values for supplied data in paretheses)
#> - Number of respondents: 42 (50)
#> - Number of stimuli: 6 (6)
#> - Number of stimuli obs.: 252 (289)
#> - Range of observations: [-3, 3] ([1, 7])
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
# Comparing the results using the loo package:
loo::loo_compare(list(HBAM_MINI = cv_hbam_mini,
                      FBAM = cv_FBAM))
#> elpd_diff se_diff
#> FBAM 0.0 0.0
#> HBAM_MINI -8.3 7.1
# Stop the cluster of parallel sessions:
future::plan(future::sequential)
# }
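Both columns printed by loo::loo_compare() can be reproduced from the pointwise contributions. A model's elpd_diff is the sum of its pointwise differences from the best-ranked model, and se_diff is sqrt(N) times the standard deviation of those differences. The hypothetical helper `compare_elpd()` below sketches this for two models. It assumes both models were evaluated on the same N data points in the same order, which the shared seed (and hence identical folds) is meant to support.

```r
# Hypothetical helper: elpd difference of model A relative to model B and
# its standard error, computed from paired pointwise elpd contributions.
compare_elpd <- function(pw_a, pw_b) {
  stopifnot(length(pw_a) == length(pw_b))
  d <- pw_a - pw_b                      # paired pointwise differences
  c(elpd_diff = sum(d),
    se_diff = sqrt(length(d) * var(d)))
}

# Toy values (made up): model A relative to model B.
res <- compare_elpd(c(-2.0, -1.5, -3.0), c(-1.5, -1.5, -2.0))
```

With the objects from the example, compare_elpd(cv_hbam_mini$pointwise[, 1], cv_FBAM$pointwise[, 1]) should correspond to the HBAM_MINI row of the table above, provided the loo convention applies.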