Model fitting

Lennart Oelschläger

2021-11-12

RprobitB estimates a (latent class) (mixed) (multinomial) probit model in a Bayesian framework via Gibbs sampling.

To fit a model to choice data, apply the function

model = mcmc(data = data)

where data must be the output of either prepare() or simulate(), see the vignette about data management.

The function mcmc() has several optional arguments, which are discussed in the following sections:

Prior settings

Bayesian analysis makes it possible to impose prior beliefs on the model parameters: strong prior knowledge can be expressed through informative prior distributions, vague knowledge through diffuse ones. RprobitB applies the following conjugate priors:

- a Dirichlet prior \((s_1,\dots,s_C) \sim D(\delta_1,\dots,\delta_C)\) on the class weights,
- a normal prior \(\alpha \sim N(\psi, \Psi)\) on the fixed coefficients,
- normal priors \(b_c \sim N(\xi, \Xi)\) on the class means, \(c = 1,\dots,C\),
- inverse Wishart priors \(\Omega_c \sim IW(\nu, \Theta)\) on the class covariance matrices, \(c = 1,\dots,C\),
- an inverse Wishart prior \(\Sigma \sim IW(\kappa, \Lambda)\) on the error-term covariance matrix.

Per default, RprobitB applies the diffuse prior approach, setting \(\delta_1=\dots=\delta_C=1\); \(\psi\) and \(\xi\) equal to the zero vector; \(\Psi\) and \(\Xi\) equal to the identity matrix; \(\nu\) and \(\kappa\) equal to \(P_r+2\) and \(J+1\), respectively (to obtain proper priors); \(\Theta\) and \(\Lambda\) equal to the identity matrix.

Alternatively, the parameters can be chosen based on estimation results of similar choice settings, resulting in informative priors.
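For instance, an informative prior on the class means \(b_c\) could be centered at estimates from a previous study. The following sketch assumes that mcmc() accepts the prior parameters as a named list via a prior argument; the argument name, the element names, and the values are assumptions for illustration, see the package documentation:

model = mcmc(data = data,
             prior = list("xi" = c(1, -0.5),      # hypothetical prior mean from a previous study
                          "Xi" = 0.1 * diag(2)))  # small prior variance, i.e. an informative prior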

Bayes estimation of the probit model via Gibbs sampling

The Bayesian analysis of the (latent class) (mixed) (multinomial) probit model builds upon the work of McCulloch and Rossi (1994), Nobile (1998), Allenby and Rossi (1998), and Imai and van Dyk (2005). A key ingredient is the concept of data augmentation, cf. Albert and Chib (1993): the latent utilities are treated as parameters themselves. Conditional on the latent utilities, the multinomial probit model constitutes a standard Bayesian linear regression set-up, which renders drawing from the posterior distribution feasible without the need to evaluate any likelihood.

Gibbs sampling from the joint posterior distribution of a latent class mixed multinomial probit model proceeds by iteratively drawing and updating each model parameter conditional on the other parameters.
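To illustrate these two ingredients, below is a minimal, self-contained R sketch of a Gibbs sampler for the binary probit model via data augmentation in the spirit of Albert and Chib (1993). It shows the principle only and is not RprobitB's internal implementation; all names are local to the sketch:

set.seed(1)
N = 200
X = cbind(1, rnorm(N))                  # design matrix with intercept
beta_true = c(-0.5, 1)
y = as.numeric(X %*% beta_true + rnorm(N) > 0)
R = 5000
beta = c(0, 0)                          # starting value
V = solve(diag(2) + crossprod(X))       # posterior covariance under the prior beta ~ N(0, I)
draws = matrix(0, R, 2)
for (i in 1:R) {
  # 1. draw the latent utilities from truncated normals, given beta and the choices
  mu = as.vector(X %*% beta)
  p0 = pnorm(0, mean = mu)              # probability of a negative utility
  u = runif(N)
  U = mu + qnorm(ifelse(y == 1, p0 + u * (1 - p0), u * p0))
  # 2. draw beta from its conjugate normal posterior, given the utilities
  beta = as.vector(V %*% crossprod(X, U) + t(chol(V)) %*% rnorm(2))
  draws[i, ] = beta
}
colMeans(draws[-(1:2500), ])            # posterior means roughly recover beta_true

Conditional on the utilities U, step 2 is an ordinary Bayesian linear regression update, which is exactly why no likelihood evaluation is needed.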

Parameter normalization

Samples obtained from the scheme described above still lack identification (see the introductory vignette). Therefore, subsequent to the sampling, the normalizations

\[\alpha^{(i)} := \frac{\alpha^{(i)}}{\sqrt{(\Sigma^{(i)})_{11}}}, \quad b_c^{(i)} := \frac{b_c^{(i)}}{\sqrt{(\Sigma^{(i)})_{11}}}, \quad \Omega_c^{(i)} := \frac{\Omega_c^{(i)}}{(\Sigma^{(i)})_{11}}, \quad \Sigma^{(i)} := \frac{\Sigma^{(i)}}{(\Sigma^{(i)})_{11}}\]

are required for the updates in each iteration \(i\), cf. Imai and van Dyk (2005), where \((\Sigma^{(i)})_{11}\) denotes the top-left element of \(\Sigma^{(i)}\).

The draws for \(s\) and \(z\) do not need to be normalized. The draws for \(U\) and \(\beta\) can be normalized analogously if they are of interest for the analysis.
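In code, the normalization of a single iteration \(i\) amounts to the following sketch, where alpha_i, b_i, Omega_i, and Sigma_i are hypothetical names for the unnormalized draws:

scale_factor = Sigma_i[1, 1]            # top-left element of Sigma
alpha_i = alpha_i / sqrt(scale_factor)
b_i     = b_i / sqrt(scale_factor)      # applied for each class c
Omega_i = Omega_i / scale_factor        # applied for each class c
Sigma_i = Sigma_i / scale_factor        # afterwards, the top-left element equals 1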

Alternatively, the samples can be normalized such that any diagonal element (variance) of \(\Sigma\) or any element of \(\alpha\) equals some fixed positive value.

The normalization of a fitted model can be changed afterwards via

model = transform(model = model, scale = scale)

where model is the output of mcmc() and scale is a named list of three elements that determines the parameter normalization, as described above.1
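For example, the default normalization, which fixes the first error-term variance to 1 (see footnote 1), corresponds to the call

model = transform(model = model, scale = list("parameter" = "s", "index" = 1, "value" = 1))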

Burning and thinning

The theory behind Gibbs sampling establishes that the sequence of samples produced by the updating scheme forms a Markov chain whose stationary distribution equals the desired joint posterior distribution. It takes a certain number of iterations before that stationary distribution is approximated reasonably well. Therefore, it is common practice to discard the first \(B\) out of \(R\) samples (the so-called burn-in period). Furthermore, correlation between nearby samples is to be expected. In order to obtain (approximately) independent samples, only every \(Q\)th sample is considered when averaging values to compute parameter statistics like the expectation and the standard deviation.

Adequate values for \(R\), \(B\) and \(Q\) depend on the complexity of the considered Bayesian framework. Per default, RprobitB sets R = 1e4, B = R/2 and Q = 10.

The independence of the samples can be verified by computing their serial correlation, and the convergence of the Gibbs sampler can be checked by inspecting trace plots.
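The following sketch illustrates the default values and both diagnostics; draws is a placeholder for the Gibbs output of a single parameter, and the indexing is one common convention that may differ from RprobitB's internal one:

R = 1e4
B = R / 2
Q = 10
kept = seq(B + 1, R, by = Q)     # 500 iterations remain after burning and thinning
draws = rnorm(R)                 # placeholder for actual Gibbs samples
acf(draws[kept])                 # serial correlation should be negligible
plot(draws[kept], type = "l")    # trace plot should fluctuate around a stable level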

Updating the number of latent classes

Updating the number \(C\) of latent classes is done within the Gibbs sampler by executing the following weight-based updating scheme within the second half of the burn-in period2:

- a class \(c\) is removed if its weight \(s_c\) falls below a threshold \(\varepsilon_{\text{min}}\),
- a class \(c\) is split into two classes if its weight \(s_c\) exceeds a threshold \(\varepsilon_{\text{max}}\),
- two classes \(c_1\) and \(c_2\) are joined if the distance between their means \(b_{c_1}\) and \(b_{c_2}\) falls below a threshold \(\varepsilon_{\text{distmin}}\).

These rules require choosing values for \(\varepsilon_{\text{min}}\), \(\varepsilon_{\text{max}}\) and \(\varepsilon_{\text{distmin}}\). The adequate value for \(\varepsilon_{\text{distmin}}\) in particular depends on the scale of the parameters. Per default, RprobitB sets threshold values that suit the default parameter normalization; they can be overridden via the latent_classes argument of mcmc().
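For example, class updating can be activated as follows, where the buffer element (cf. footnote 2) sets the number of iterations between two consecutive updates; the value 100 is hypothetical:

model = mcmc(data = data, latent_classes = list("update" = TRUE, "buffer" = 100))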

Examples

### probit model
p = simulate(form = choice ~ var | 0, N = 100, T = 10, J = 2)
m1 = mcmc(data = p)
### multinomial probit model
mnp = simulate(form = choice ~ var | 0, N = 100, T = 10, J = 3)
m2 = mcmc(data = mnp)
### mixed multinomial probit model
mmnp = simulate(form = choice ~ 0 | var, N = 100, T = 10, J = 3, re = "var")
m3 = mcmc(data = mmnp)
### latent class mixed multinomial probit model
lcmmnp = simulate(form = choice ~ 0 | var, N = 100, T = 10, J = 3, re = "var",
                  parm = list("C" = 2))
m4 = mcmc(data = lcmmnp, latent_classes = list("C" = 2))
### update of latent classes
m5 = mcmc(data = lcmmnp, latent_classes = list("update" = TRUE))

References

Albert, James H., and Siddhartha Chib. 1993. “Bayesian Analysis of Binary and Polychotomous Response Data.” Journal of the American Statistical Association 88.
Allenby, Greg M., and Peter Rossi. 1998. “Marketing Models of Consumer Heterogeneity.” Journal of Econometrics 89.
Imai, Kosuke, and David A. van Dyk. 2005. “A Bayesian Analysis of the Multinomial Probit Model Using Marginal Data Augmentation.” Journal of Econometrics 124.
McCulloch, Robert, and Peter Rossi. 1994. “An Exact Likelihood Analysis of the Multinomial Probit Model.” Journal of Econometrics 64.
Nobile, Agostino. 1998. “A Hybrid Markov Chain for the Bayesian Analysis of the Multinomial Probit Model.” Statistics and Computing 8.

  1. Per default, the first error-term variance is fixed to 1, i.e. scale = list("parameter" = "s", "index" = 1, "value" = 1). Note that "parameter" = "a" can be set only if the model has non-random (fixed) coefficients, i.e. \(P_f > 0\).↩︎

  2. It is reasonable to wait a certain number of iterations before the next update to allow for readjustments, which is implemented via the latent_classes$buffer argument.↩︎