Overview

The purpose of this document is to help researchers run the two FORTRAN programs written in support of our paper, "Simulation of Multinomial Probit Probabilities and Imputation of Missing Data," published in the 1998 volume of Advances in Econometrics, edited by Thomas B. Fomby and R. Carter Hill.

In addition to this help document, you must obtain the following files from Steven Stern's web page at the University of Virginia:

1. "monte.dat"
2. "monte.do"
3. "monte.log"
4. "m-estim.f"
5. "m-simul.f"
6. "monte.parms"
7. "makeo"

We have run these programs on an IBM RS/6000 (running AIX) at the University of Virginia. With some effort on the researcher's part, we think these FORTRAN programs can be adapted for use with other data and run on other computer platforms.

Note: Running these programs will not reproduce the results reported in our Advances in Econometrics paper. Those results come from applications based on data from the World Bank's Jamaican Survey of Living Conditions. Because those data are not available to the public, we have created a new dataset, "monte.dat", and modified our programs accordingly so that other researchers can use them.

I. Introduction

The file "monte.dat" is meant to represent a database to which our imputation algorithm can suitably be applied. In principle, "monte.dat" contains 1,000 observations on 15 variables (denoted x1 through x15). In practice, however, some variables are missing for some observations; in "monte.dat", a value of "-9" should be interpreted as "missing". The Stata log file, "monte.log", documents the patterns of missing values in the dataset. For example, of the 1,000 observations in the sample, 54 are missing x1; 105 are missing x2; ...; 156 are missing x15.

In our paper, we suggest how missing values can be imputed by simulating values based on the estimated joint density function and all observed variables.
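As a quick check before running the FORTRAN programs, the missing-value counts reported in "monte.log" can be reproduced outside Stata. The Python sketch below is an illustration only and is not one of the files listed above. It assumes that "monte.dat" is plain text with one observation per line and 15 whitespace-separated values per line, and that "-9" is the only missing-value code; check the actual file layout before relying on it. The function names are hypothetical.

```python
from pathlib import Path

MISSING = -9.0  # missing-value code used in monte.dat
NVAR = 15       # variables x1 through x15


def read_rows(path):
    """Read numeric rows from a text file.

    Assumed layout (not confirmed by the documentation): one observation
    per line, values separated by whitespace. Blank lines are skipped.
    """
    rows = []
    for line in Path(path).read_text().splitlines():
        if line.strip():
            rows.append([float(tok) for tok in line.split()])
    return rows


def count_missing(rows, nvar=NVAR):
    """Return a list whose j-th entry counts the rows missing variable x(j+1)."""
    counts = [0] * nvar
    for row in rows:
        if len(row) != nvar:
            raise ValueError(f"expected {nvar} values, got {len(row)}")
        for j, value in enumerate(row):
            if value == MISSING:
                counts[j] += 1
    return counts
```

Calling `count_missing(read_rows("monte.dat"))` should give one count per variable; if the layout assumption holds, the first and last entries should match the 54 and 156 reported for x1 and x15 in "monte.log".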
The two FORTRAN programs, "m-estim.f" and "m-simul.f", apply our algorithm to impute missing values in the data in "monte.dat". The variables in "monte.dat" are organized as follows: variables x1-x5 are continuously distributed; variables x6-x10 are observed as binary (they take only the values "0" or "1"); and variables x11-x15 are observed as ordered discrete. Variables x11 and x14 take values "1" through "4"; variables x12 and x15 take values "1" through "5"; and variable x13 takes values "1" through "6".

The first FORTRAN program, "m-estim.f", reads the available data from "monte.dat" and the requisite parameters from "monte.parms", then estimates the mean and covariance matrix for all 15 variables in the database. The important output files are:

1. "m-estim.out"
2. "xmean.dat", "xcut.dat", "xcov.dat" and "xcov.datfirst"

"xmean.dat" contains the estimated means for the fifteen variables; "xcut.dat" contains the estimated cut-off points for the ordered discrete variables; and "xcov.dat" contains the estimated covariance matrix for all fifteen variables. "xcov.datfirst" is a first-stage estimate of the covariance matrix. In practice, the covariance matrix estimated at the first stage might not be positive definite, so "xcov.dat" is a positive definite transformation of the first-stage estimate stored in "xcov.datfirst". In the "monte.dat" application, "xcov.dat" and "xcov.datfirst" are identical because the first-stage covariance matrix estimate is already positive definite.

The second FORTRAN program, "m-simul.f", takes as input "monte.parms", "xmean.dat", "xcut.dat" and "xcov.dat". Based on that information, it simulates random variables from a joint normal density function and uses the simulated draws to impute missing values in the data. The important output files are:

1. "m-simul.out"
2. "m-simul.dat"

Note: "m-simul.dat" is the final database, including all imputed values.
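Whether a covariance estimate such as the one in "xcov.datfirst" is positive definite can be checked with a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite. The same factor is the standard tool for simulating from a joint normal density: if L L' = S and e is a vector of independent standard normal draws, then mu + L e is a draw from N(mu, S). The Python sketch below illustrates both steps under stated assumptions; it is not a transcription of "m-estim.f" or "m-simul.f". In particular, the repair shown (adding a small multiple of the identity until the factorization succeeds) is a common generic fix, not the positive definite transformation our program applies to "xcov.datfirst". The sketch also draws unconditionally, whereas imputation conditions on all observed variables.

```python
import math
import random


def cholesky(s):
    """Return lower-triangular L with L L^T = s, or None if s is not positive definite."""
    n = len(s)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            total = s[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if total <= 0.0:
                    return None  # non-positive pivot: not positive definite
                lower[i][i] = math.sqrt(total)
            else:
                lower[i][j] = total / lower[j][j]
    return lower


def make_positive_definite(s, start=1e-8, max_tries=60):
    """Generic ridge repair (an assumption, not the authors' transformation):
    add eps * I, doubling eps, until the Cholesky factorization succeeds."""
    if cholesky(s) is not None:
        return [row[:] for row in s]  # already positive definite: unchanged copy
    n = len(s)
    eps = start
    for _ in range(max_tries):
        trial = [[s[i][j] + (eps if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
        if cholesky(trial) is not None:
            return trial
        eps *= 2.0
    raise ValueError("could not repair covariance matrix")


def draw_joint_normal(mean, cov, rng):
    """One unconditional draw from N(mean, cov), computed as mean + L e with e ~ N(0, I)."""
    lower = cholesky(cov)
    if lower is None:
        raise ValueError("covariance matrix is not positive definite")
    e = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(lower[i][k] * e[k] for k in range(i + 1))
            for i, m in enumerate(mean)]
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, which is convenient when comparing repeated runs.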
Each original observation in "monte.dat" contributes 10 observations to "m-simul.dat" because we impute missing values using 10 different random draws for each missing variable. Observations in "monte.dat" that are not missing any variables are simply written to "m-simul.dat" 10 times, so the 1,000 observations in "monte.dat" yield 10,000 observations in "m-simul.dat".

Researchers should first examine three output files from our runs of these programs: "monte.log", "m-estim.out" and "m-simul.out".

II. How We Ran Our Jobs

A. We placed all of the following files in a subdirectory on a Unix-based workstation with a Fortran-77 compiler:

1. "monte.dat"
2. "monte.parms"
3. "m-estim.f"
4. "m-simul.f"
5. "makeo"

B. We ran "makeo" to compile the Fortran programs and create the executable file "m-estim" by issuing the following command:

   % make -f makeo

C. We ran the executable as follows:

   % nohup m-estim &

III. How to Adapt Our Programs for Another Application

A. Create your own parameter file by modifying "monte.parms"

The parameter file "monte.parms" describes the dataset "monte.dat" and is required by both FORTRAN programs. Thus, the first thing you must do is adapt "monte.parms" to your database. The first 8 lines of "monte.parms" contain the requisite information for the FORTRAN programs; the text at the bottom of the file simply documents what those parameters mean. You will need to adapt the following parameters for your dataset:

nfac - the number of observations in the dataset (ours is "1000")
nvar - the number of variables in the dataset (ours is "15")
nxkind - the "type" of each variable (continuously distributed == "3"; binary == "2"; ordered discrete == "1")
nxcut - the number of cut-off points for each variable ("0" for continuously distributed variables; "3" for binary variables; "number of outcomes + 1" for ordered discrete variables)
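The nxkind and nxcut rules can be made concrete for "monte.dat". The Python sketch below derives both vectors from the variable layout described in Section I. It only illustrates the coding rules: it does not read or write "monte.parms", whose exact line layout is documented in the file itself, and the helper name `nxcut_for` is hypothetical. The printed values should be compared with the first lines of "monte.parms" rather than copied into it blindly.

```python
# nxkind codes, as described for monte.parms
ORDERED, BINARY, CONTINUOUS = 1, 2, 3


def nxcut_for(kind, n_outcomes=None):
    """Number of cut-off points implied by a variable's nxkind code."""
    if kind == CONTINUOUS:
        return 0
    if kind == BINARY:
        return 3
    if kind == ORDERED:
        if n_outcomes is None or n_outcomes < 2:
            raise ValueError("ordered variables need the number of outcomes")
        return n_outcomes + 1
    raise ValueError(f"unknown nxkind code {kind}")


# Layout of monte.dat: x1-x5 continuous, x6-x10 binary,
# x11 and x14 take 4 values, x12 and x15 take 5, x13 takes 6.
monte_layout = ([(CONTINUOUS, None)] * 5
                + [(BINARY, None)] * 5
                + [(ORDERED, 4), (ORDERED, 5), (ORDERED, 6),
                   (ORDERED, 4), (ORDERED, 5)])

nfac = 1000
nvar = len(monte_layout)
nxkind = [kind for kind, _ in monte_layout]
nxcut = [nxcut_for(kind, n) for kind, n in monte_layout]

print("nfac =", nfac, " nvar =", nvar)
print("nxkind =", nxkind)
print("nxcut  =", nxcut)
```

For your own data, only `nfac` and the layout list need to change; `nvar`, `nxkind` and `nxcut` follow from them.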
Note: "m-estim.f" always sets the smallest cut-off point to "-20.00", the second smallest cut-off point to "0.00" and the largest cut-off point to "+20.00". For an ordered discrete variable with m cut-off points (m = nxcut), "m-estim.f" therefore estimates only cut-off points 3 through m-1. A binary variable has exactly three cut-off points, all of which are fixed.

B. Create your own estimation program by modifying "m-estim.f"

C. Create your own simulation program by modifying "m-simul.f"
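To make the cut-off convention concrete: with m = nxcut cut-off points, an ordered discrete variable has m - 1 outcomes, corresponding to m - 1 intervals of a latent normal variable. The Python sketch below assembles the full cut-off vector from the three fixed values and the m - 3 estimated ones, then maps a latent value z to an outcome. The interval convention (outcome k when cut-off k < z <= cut-off k+1, with outcomes numbered 1 through m - 1), the clamping at the ends, and the function names are assumptions for illustration; check them against "m-simul.f" before relying on them. Under this convention, outcomes 1 and 2 of a binary variable would correspond to recorded values "0" and "1".

```python
from bisect import bisect_left

LOW, ZERO, HIGH = -20.0, 0.0, 20.0  # cut-off points fixed by m-estim.f


def full_cutoffs(estimated):
    """Cut-off points 1..m: fixed -20, fixed 0, estimated points 3..m-1, fixed +20."""
    cuts = [LOW, ZERO, *estimated, HIGH]
    if any(a >= b for a, b in zip(cuts, cuts[1:])):
        raise ValueError("cut-off points must be strictly increasing")
    return cuts


def outcome(z, cuts):
    """Outcome k (1-based) with cut-off k < z <= cut-off k+1 (assumed convention).

    bisect_left counts the cut-off points strictly below z; values at or
    below -20 or above +20 are clamped to the first or last outcome.
    """
    k = bisect_left(cuts, z)
    return min(max(k, 1), len(cuts) - 1)
```

For example, a four-outcome variable (nxcut = 5) with hypothetical estimated cut-off points 0.7 and 1.5 gives `full_cutoffs([0.7, 1.5])`, and latent values -0.3, 0.5, 1.0 and 2.0 then map to outcomes 1, 2, 3 and 4.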