$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

MARGINAL BOX-COX TRANSFORMATIONS OF MULTIVARIATE DATA
TO "NEAR" NORMALITY USING UTRANS

HAMPARSUM BOZDOGAN AND DONALD E. RAMIREZ
DEPARTMENT OF MATHEMATICS
MATH-ASTRO BUILDING
UNIVERSITY OF VIRGINIA
CHARLOTTESVILLE, VIRGINIA 22903

TECHNICAL REPORT 14 (OCTOBER 7, 1986)

INTRODUCTION

THIS TECHNICAL REPORT PRESENTS THE PROGRAM UTRANS, WHICH IS DEVELOPED IN
OUR PAPER:

   TESTING OF MODEL FIT: ASSESSING AND BOX-COX TRANSFORMATIONS OF
   MULTIVARIATE DATA TO "NEAR" NORMALITY,
   COMPUTATIONAL STATISTICS QUARTERLY 3, 127-150 (1987)

UTRANS IS A FORTRAN 77 PROGRAM WHICH COMPUTES THE MULTIVARIATE SKEWNESS
AND KURTOSIS OF A SAMPLE MATRIX. MARDIA'S TEST STATISTIC FOR MULTIVARIATE
NORMALITY IS COMPUTED FOR THE ORIGINAL DATA. EACH COLUMN OF THE DATA IS
THEN STANDARDIZED TO HAVE MEAN 10, STANDARD DEVIATION 1.5, AND POSITIVE
SKEWNESS (A COLUMN WITH NEGATIVE SKEWNESS IS REFLECTED ABOUT ITS MEAN).
THE STANDARDIZED DATA ARE TRANSFORMED BY MARGINAL BOX-COX TRANSFORMATIONS
TO "NEARLY" NORMAL DATA. THE SKEWNESS AND KURTOSIS ARE RECOMPUTED ON THE
TRANSFORMED DATA FOR COMPARISON. THE PROGRAM UTRANS REQUIRES THAT THE
IMSL LIBRARY BE AVAILABLE.
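THE TESTS NAMED ABOVE CAN BE CHECKED BY HAND. THE FOLLOWING SKETCH, WHICH
IS NOT PART OF UTRANS, RECOMPUTES MARDIA'S CHI-SQUARE STATISTIC FOR
SKEWNESS AND Z STATISTIC FOR KURTOSIS FROM THE SAMPLE VALUES PRINTED IN
APPENDIX B FOR THE ORIGINAL DATA (N=50, P=4, SKEWNESS 5.41186, KURTOSIS
25.9684). IT ASSUMES AN EVEN NUMBER OF DEGREES OF FREEDOM, SO THAT THE
CHI-SQUARE TAIL HAS A CLOSED FORM, AND IT USES THE FORTRAN 2008 INTRINSIC
ERFC IN PLACE OF THE IMSL ROUTINES MDCH AND MDNORD THAT UTRANS CALLS. THE
PROGRAM NAME MARDIA AND ITS VARIABLE NAMES ARE ILLUSTRATIVE ONLY.

```fortran
      PROGRAM MARDIA
C     SKETCH ONLY: RECOMPUTES THE MARDIA TEST STATISTICS AND P-VALUES
C     FROM THE SAMPLE VALUES PRINTED IN APPENDIX B (ORIGINAL DATA).
C     NO IMSL CALLS ARE USED; SEE THE LEAD-IN FOR THE SUBSTITUTIONS.
      INTEGER N,P,M,K
      DOUBLE PRECISION BS,BK,CS,DF,Z,P1,P2,H,TERM,TOT
C     N, P, BS AND BK ARE COPIED FROM THE PRINTED OUTPUT
      PARAMETER(N=50,P=4,BS=5.41186D0,BK=25.9684D0)
C     SKEWNESS: N*BS/6 IS REFERRED TO CHI-SQUARE ON P(P+1)(P+2)/6 DF
      CS=N*BS/6.0D0
      DF=P*(P+1)*(P+2)/6.0D0
C     ASSUMPTION: DF IS EVEN (DF=2M).  THE UPPER TAIL IS THEN
C     EXP(-H)*SUM(H**K/K!, K=0..M-1) WITH H=CS/2
      M=NINT(DF)/2
      H=CS/2.0D0
      TERM=1.0D0
      TOT=1.0D0
      DO 10 K=1,M-1
         TERM=TERM*H/K
         TOT=TOT+TERM
   10 CONTINUE
      P1=EXP(-H)*TOT
C     KURTOSIS: (BK-P(P+2))/SQRT(8P(P+2)/N) IS REFERRED TO N(0,1)
      Z=(BK-P*(P+2))/SQRT(8.0D0*P*(P+2)/N)
C     TWO-SIDED NORMAL TAIL; ERFC IS A FORTRAN 2008 INTRINSIC
      P2=ERFC(ABS(Z)/SQRT(2.0D0))
      PRINT 700,'CHI-SQUARE =',CS,' DF =',DF,' P =',P1
      PRINT 700,'Z =',Z,' P =',P2
  700 FORMAT(1X,A,G14.6,A,G14.6,A,G14.6)
      STOP
      END
```

BECAUSE THE INPUTS ARE THE ROUNDED PRINTED VALUES, THE RESULTS SHOULD
AGREE WITH THE APPENDIX B FIGURES (CHI-SQUARE = 45.098869 WITH
P = 0.10699034E-02, AND Z = 1.0044789 WITH P = 0.31514784) ONLY TO ABOUT
FOUR SIGNIFICANT DIGITS.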

INDEX

APPENDIX A CONTAINS THE RAW DATA SET WITH N=50 AND P=4
APPENDIX B CONTAINS THE TERMINAL OUTPUT FROM UTRANS
APPENDIX C CONTAINS THE FULL OUTPUT FROM THE PROGRAM UTRANS
APPENDIX D CONTAINS THE FORTRAN PROGRAM UTRANS
APPENDIX E CONTAINS THE CALL STATEMENTS FROM UTRANS

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

APPENDIX A: THE ORIGINAL DATA

   6.277   5.340   3.106   4.596
   5.804   5.543   3.500   4.065
   5.200   5.543   3.657   3.143
   6.111   5.194   3.139   5.806
   7.045   5.716   3.507   6.045
   6.871   5.774   3.419   4.839
   6.611   5.778   3.500   4.500
   6.571   6.429   3.286   4.429
   6.700   5.150   4.200   4.700
   7.179   5.893   4.250   5.893
   6.889   5.306   3.806   4.194
   5.813   4.667   3.042   5.688
   6.000   5.606   2.848   4.545
   6.731   5.962   3.385   4.808
   5.935   4.457   3.065   5.696
   7.708   5.944   3.306   5.875
   5.886   6.029   3.343   5.057
   6.750   4.750   3.500   5.125
   4.903   3.968   4.226   5.032
   6.603   4.444   4.270   4.746
   4.231   4.231   4.846   5.000
   4.500   6.400   4.100   4.700
   5.973   4.459   3.946   4.378
   4.556   5.056   3.333   4.722
   5.644   4.667   3.511   5.333
   4.857   4.857   5.762   4.143
   6.125   5.125   3.625   3.625
   5.032   4.097   3.742   4.548
   4.233   5.100   3.933   3.700
   6.433   4.299   4.299   4.791
   5.815   4.815   4.815   5.333
   5.455   4.818   3.545   4.455
   6.617   4.483   4.083   4.517
   7.708   4.369   5.062   4.538
   5.333   4.667   5.067   4.600
   4.688   4.917   4.417   5.250
   5.375   6.250   4.750   4.500
   7.194   6.389   4.139   6.111
   8.000   6.583   4.875   5.208
   8.087   7.348   6.174   5.043
   8.192   7.615   5.654   5.769
   9.839   7.774   6.484   7.419
   5.714   7.286   7.143   6.286
   7.125   7.625   4.500   5.750
   6.000   5.467   6.600   4.800
   5.583   5.667   4.833   6.583
   5.667   5.667   6.000   7.000
   6.778   6.222   5.556   5.000
   5.529   5.765   5.235   4.529
  11.36    7.929   7.857   8.070

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

APPENDIX B: TERMINAL OUTPUT FROM UTRANS

OK, seg utrans

ENTER THE NAME OF THE OUTPUT FILE
atmos4.out

OUTPUT FILE IS atmos4.out

PROGRAM NORMALIZES MULTIVARIATE DATA

VERSION = OCTOBER 7,1986

WHAT FILE IS YOUR DATA IN?
atmos4

DATA IS FROM FILE = atmos4

ENTER THE NUMBER OF VARIABLES
4

NUMBER OF VARIABLES = 4

ENTER THE NUMBER OF CASES
50

NUMBER OF CASES = 50

ENTER THE FORTRAN FORMAT (E.G., (4(1X,G18.8)) )
OR RETURN FOR FREEFIELD

READING DATA...

FORMAT IS FREEFIELD

DATA READ

COMPUTING MEAN VECTOR
    1   6.3045400
    2   5.5488000
    3   4.3648200
    4   5.0896600

*****DATA MATRIX NOW HAS MEAN ZERO*****

COMPUTING MAXIMUM LIKELIHOOD SIGMA MATRIX

COMPUTING CORRELATION MATRIX

COMPUTING INVERSE OF COVARIANCE MATRIX

COMPUTING SKEWNESS AND KURTOSIS

            THEORETICAL        SAMPLE
 SKEWNESS      0.0000          5.41186
 KURTOSIS       24.00          25.9684

TEST OF SKEWNESS
     OBSERVED CHI-SQUARE = 45.098869
     DF = 20.000000
     LEVEL OF SIGNIFICANCE FOR CHI-SQUARE = 0.10699034E-02

TEST OF KURTOSIS
     Z = 1.0044789
     LEVEL OF SIGNIFICANCE FOR TWO-SIDED TEST = 0.31514784

CORRELATION BETWEEN DATA AND NORMAL SCORES
    1 : 0.95025115
    2 : 0.97373591
    3 : 0.95144780
    4 : 0.96397745

COMPUTING SKEWNESS AND KURTOSIS
COLUMN = 1   STANDARD DEVIATION = 1.3310393    SKEWNESS = 1.2900473    KURTOSIS = 5.9159360
COLUMN = 2   STANDARD DEVIATION = 1.0154826    SKEWNESS = 0.66678475   KURTOSIS = 2.6971699
COLUMN = 3   STANDARD DEVIATION = 1.1521073    SKEWNESS = 1.0468105    KURTOSIS = 3.4937931
COLUMN = 4   STANDARD DEVIATION = 0.93353492   SKEWNESS = 0.90852162   KURTOSIS = 4.1901167

*****DATA NOW HAS MEAN = 10.000000 *****
STANDARD DEVIATION = 1.5000000
AND POSITIVE SKEWNESS

COMPUTING BOX-COX TRANSFORM FOR VARIABLE 1
   THE VALUE OF L(LAMBDA) = -19.768188 AT LAMBDA = 1.0000000
   NUMBER OF ITERATIONS = 0
   MAXIMUM VALUE OF L(LAMBDA) = -14.691226 AT LAMBDA = -1.3717306
   NUMBER OF ITERATIONS = 6
   THE P-VALUE FOR THE BOX-COX TRANSFORMATION = 0.14399451E-02

COMPUTING BOX-COX TRANSFORM FOR VARIABLE 2
   THE VALUE OF L(LAMBDA) = -19.768188 AT LAMBDA = 1.0000000
   NUMBER OF ITERATIONS = 0
   MAXIMUM VALUE OF L(LAMBDA) = -16.818024 AT LAMBDA = -1.2822683
   NUMBER OF ITERATIONS = 5
   THE P-VALUE FOR THE BOX-COX TRANSFORMATION = 0.15138070E-01

COMPUTING BOX-COX TRANSFORM FOR VARIABLE 3
   THE VALUE OF L(LAMBDA) = -19.768188 AT LAMBDA = 1.0000000
   NUMBER OF ITERATIONS = 0
   MAXIMUM VALUE OF L(LAMBDA) = -12.732495 AT LAMBDA = -2.7609076
   NUMBER OF ITERATIONS = 5
   THE P-VALUE FOR THE BOX-COX TRANSFORMATION = 0.17600102E-03

COMPUTING BOX-COX TRANSFORM FOR VARIABLE 4
   THE VALUE OF L(LAMBDA) = -19.768188 AT LAMBDA = 1.0000000
   NUMBER OF ITERATIONS = 0
   MAXIMUM VALUE OF L(LAMBDA) = -16.867214 AT LAMBDA = -0.74980676
   NUMBER OF ITERATIONS = 4
   THE P-VALUE FOR THE BOX-COX TRANSFORMATION = 0.16008435E-01

REDOING SKEWNESS AND KURTOSIS ON NORMALIZED DATA

COMPUTING MEAN VECTOR
    1   0.69703793
    2   0.73790545
    3   0.36150860
    4   1.0931779

*****DATA MATRIX NOW HAS MEAN ZERO*****

COMPUTING MAXIMUM LIKELIHOOD SIGMA MATRIX

COMPUTING CORRELATION MATRIX

COMPUTING INVERSE OF COVARIANCE MATRIX

COMPUTING SKEWNESS AND KURTOSIS

            THEORETICAL        SAMPLE
 SKEWNESS      0.0000          2.92903
 KURTOSIS       24.00          21.4391

TEST OF SKEWNESS
     OBSERVED CHI-SQUARE = 24.408596
     DF = 20.000000
     LEVEL OF SIGNIFICANCE FOR CHI-SQUARE = 0.22500002

TEST OF KURTOSIS
     Z = -1.3068548
     LEVEL OF SIGNIFICANCE FOR TWO-SIDED TEST = 0.19126205

CORRELATION BETWEEN DATA AND NORMAL SCORES
    1 : 0.99191560
    2 : 0.99297306
    3 : 0.99092794
    4 : 0.98335311

**** STOP

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

APPENDIX C: FULL OUTPUT FROM UTRANS

OUTPUT FILE IS atmos4.out

PROGRAM NORMALIZES MULTIVARIATE DATA

VERSION = AUGUST 14,1985

DATA IS FROM FILE = atmos4

NUMBER OF VARIABLES = 4

NUMBER OF CASES = 50

FORMAT IS FREEFIELD

DATA READ

COMPUTING MEAN VECTOR
    1   6.3045400
    2   5.5488000
    3   4.3648200
    4   5.0896600

*****DATA MATRIX NOW HAS MEAN ZERO*****

COMPUTING MAXIMUM LIKELIHOOD SIGMA MATRIX

SIGMA MATRIX
   1.7362324
   0.81250847      1.0105808
   0.57882362      0.55523224      1.3008043
   0.68060426      0.45248597      0.49810442      0.85405770

COMPUTING CORRELATION MATRIX

CORRELATION MATRIX WITH I-TH DIAG ELEMENT THE STANDARD DEVIATION
   1.3176617
   0.61339248      1.0052765
   0.38515571      0.48426511      1.1405281
   0.55891675      0.48705274      0.47257507      0.92415242

COMPUTING INVERSE OF COVARIANCE MATRIX

SIGMA INVERSE
   1.0766867
  -0.62620892      1.8201491
  -0.13256943E-01 -0.41212926      1.1110351
  -0.51851706     -0.22493571     -0.41906544      1.9476711

COMPUTING SKEWNESS AND KURTOSIS

            THEORETICAL        SAMPLE
 SKEWNESS      0.0000          5.41186
 KURTOSIS       24.00          25.9684

TEST OF SKEWNESS
     OBSERVED CHI-SQUARE = 45.098869
     DF = 20.000000
     LEVEL OF SIGNIFICANCE FOR CHI-SQUARE = 0.10699034E-02

TEST OF KURTOSIS
     Z = 1.0044789
     LEVEL OF SIGNIFICANCE FOR TWO-SIDED TEST = 0.31514784

Q-Q PLOT OF VARIABLE 1
[LINE-PRINTER PLOT NOT REPRODUCED. HORIZONTAL AXIS (NORMAL SCORES)
-0.24D+01 TO 0.24D+01; VERTICAL AXIS (CENTERED DATA) -0.24D+01 TO 0.56D+01]

CORRELATION BETWEEN DATA AND NORMAL SCORES
    1 : 0.95025115

Q-Q PLOT OF VARIABLE 2
[LINE-PRINTER PLOT NOT REPRODUCED. HORIZONTAL AXIS (NORMAL SCORES)
-0.24D+01 TO 0.24D+01; VERTICAL AXIS (CENTERED DATA) -0.16D+01 TO 0.24D+01]

CORRELATION BETWEEN DATA AND NORMAL SCORES
    2 : 0.97373591

Q-Q PLOT OF VARIABLE 3
[LINE-PRINTER PLOT NOT REPRODUCED. HORIZONTAL AXIS (NORMAL SCORES)
-0.24D+01 TO 0.24D+01; VERTICAL AXIS (CENTERED DATA) -0.16D+01 TO 0.64D+01]

CORRELATION BETWEEN DATA AND NORMAL SCORES
    3 : 0.95144780

Q-Q PLOT OF VARIABLE 4
[LINE-PRINTER PLOT NOT REPRODUCED. HORIZONTAL AXIS (NORMAL SCORES)
-0.24D+01 TO 0.24D+01; VERTICAL AXIS (CENTERED DATA) -0.20D+01 TO 0.30D+01]

CORRELATION BETWEEN DATA AND NORMAL SCORES
    4 : 0.96397745

COMPUTING SKEWNESS AND KURTOSIS
COLUMN = 1   STANDARD DEVIATION = 1.3310393    SKEWNESS = 1.2900473    KURTOSIS = 5.9159360
COLUMN = 2   STANDARD DEVIATION = 1.0154826    SKEWNESS = 0.66678475   KURTOSIS = 2.6971699
COLUMN = 3   STANDARD DEVIATION = 1.1521073    SKEWNESS = 1.0468105    KURTOSIS = 3.4937931
COLUMN = 4   STANDARD DEVIATION = 0.93353492   SKEWNESS = 0.90852162   KURTOSIS = 4.1901167

*****DATA NOW HAS MEAN = 10.000000 *****
STANDARD DEVIATION = 1.5000000
AND POSITIVE SKEWNESS

STANDARDIZED DATA WITH MEAN = 10.000000 STANDARD
DEVIATION = 1.5000000

   9.9689641   9.6915752   8.3610642   9.2067892
   9.4359220   9.9914326   8.8740372   8.3535806
   8.7552509   9.9914326   9.0784452   6.8721149
   9.7818922   9.4759142   8.4040289   11.151012
   10.834453   10.246976   8.8831509   11.535036
   10.638366   10.332650   8.7685783   9.5972406
   10.345362   10.338558   8.8740372   9.0525368
   10.300284   11.300170   8.5954173   8.9384543
   10.445659   9.4109205   9.7854106   9.3738959
   10.985463   10.508428   9.8505087   11.290803
   10.658651   9.6413528   9.2724376   8.5608573
   9.4460645   8.6974666   8.2777386   10.961410
   9.6568020   10.084492   8.0251579   9.1248426
   10.480594   10.610350   8.7243116   9.5474299
   9.5835510   8.3872693   8.3076837   10.974265
   11.581614   10.583762   8.6214566   11.261881
   9.5283310   10.709318   8.6696292   9.9475220
   10.502006   8.8200684   8.8740372   10.056784
   8.4205501   7.6649526   9.8192616   9.9073522
   10.336346   8.3680666   9.8765480   9.4478086
   7.6632471   8.0534378   10.626478   9.8559347
   7.9663937   11.257333   9.6552144   9.3738959
   9.6263747   8.3902235   9.4547123   8.8565077
   8.0295023   9.2720702   8.6566095   9.4092455
   9.2556118   8.6974666   8.8883588   10.390998
   8.3687109   8.9781213   11.819075   8.4789107
   9.7976694   9.3739922   9.0367824   7.6465905
   8.5659252   7.8555024   9.1891120   9.1296630
   7.6655010   9.3370640   9.4377868   7.7671001
   10.144767   8.1538827   9.9143049   9.5201144
   9.4483183   8.9160819   10.586117   10.390998
   9.0426203   8.9205133   8.9326255   8.9802310
   10.352123   8.4256746   9.6330811   9.0798523
   11.581614   8.2572818   10.907702   9.1135950
   8.9051338   8.6974666   10.914212   9.2132164
   8.1782582   9.0667491   10.067936   10.257634
   8.9524652   11.035764   10.501490   9.0525368
   11.002367   11.241085   9.7059909   11.641085
   11.910680   11.527648   10.664235   10.190148
   12.008723   12.657653   12.355484   9.9250269
   12.127052   13.052046   11.678463   11.091561
   13.983120   13.286910   12.759092   13.742774
   9.3344975   12.566071   13.617085   11.922274
   10.924608   13.066818   10.175999   11.061032
   9.6568020   9.8791708   12.910120   9.5345755
   9.1868685   10.174597   10.609553   12.399492
   9.2815314   10.174597   12.128942   13.069526
   10.533560   10.994404   11.550871   9.8559347
   9.1260138   10.319356   11.132941   9.0991339
   15.693814   13.515865   14.546686   14.788798

COMPUTING BOX-COX TRANSFORM FOR VARIABLE 1
   THE VALUE OF L(LAMBDA) = -19.768188 AT LAMBDA = 1.0000000
   NUMBER OF ITERATIONS = 0
   MAXIMUM VALUE OF L(LAMBDA) = -14.691226 AT LAMBDA = -1.3717306
   NUMBER OF ITERATIONS = 6
   THE P-VALUE FOR THE BOX-COX TRANSFORMATION = 0.14399451E-02

COMPUTING BOX-COX TRANSFORM FOR VARIABLE 2
   THE VALUE OF L(LAMBDA) = -19.768188 AT LAMBDA = 1.0000000
   NUMBER OF ITERATIONS = 0
   MAXIMUM VALUE OF L(LAMBDA) = -16.818024 AT LAMBDA = -1.2822683
   NUMBER OF ITERATIONS = 5
   THE P-VALUE FOR THE BOX-COX TRANSFORMATION = 0.15138070E-01

COMPUTING BOX-COX TRANSFORM FOR VARIABLE 3
   THE VALUE OF L(LAMBDA) = -19.768188 AT LAMBDA = 1.0000000
   NUMBER OF ITERATIONS = 0
   MAXIMUM VALUE OF L(LAMBDA) = -12.732495 AT LAMBDA = -2.7609076
   NUMBER OF ITERATIONS = 5
   THE P-VALUE FOR THE BOX-COX TRANSFORMATION = 0.17600102E-03

COMPUTING BOX-COX TRANSFORM FOR VARIABLE 4
   THE VALUE OF L(LAMBDA) = -19.768188 AT LAMBDA = 1.0000000
   NUMBER OF ITERATIONS = 0
   MAXIMUM VALUE OF L(LAMBDA) = -16.867214 AT LAMBDA = -0.74980676
   NUMBER OF ITERATIONS = 4
   THE P-VALUE FOR THE BOX-COX TRANSFORMATION = 0.16008435E-01

NORMALIZED DATA IN FORMAT(4(1X,G18.8)) IS

   0.69789955   0.73748427   0.36117013   1.0812383
   0.69546408   0.73910835   0.36132621   1.0621429
   0.69183614   0.73910835   0.36137944   1.0193406
   0.69708063   0.73624343   0.36118459   1.1150180
   0.70125666   0.74040713   0.36132868   1.1204993
   0.70055265   0.74082619   0.36129689   1.0889788
   0.69944144   0.74085480   0.36132621   1.0780199
   0.69926381   0.74505960   0.36124578   1.0755772
   0.69983014   0.73585674   0.36153284   1.0846202
   0.70177857   0.74166161   0.36154494   1.1170510
   0.70062691   0.73720096   0.36142595   1.0670875
   0.69551347   0.73117486   0.36114126   1.1121882
   0.69651199   0.73959002   0.36104671   1.0795404
   0.69996346   0.74213157   0.36128419   1.0880222
   0.69617081   0.72885369   0.36115176   1.1123827
   0.70368247   0.74200997   0.36125372   1.1166340
   0.69590950   0.74257816   0.36126816   1.0954685
   0.70004466   0.73204105   0.36132621   1.0974116
   0.68979465   0.72260912   0.36153917   1.0947446
   0.69940606   0.72870354   0.36154970   1.0860826
   0.68438352   0.72612644   0.36166862   1.0938106
   0.68669615   0.74488967   0.36150772   1.0846202
   0.69637102   0.72887673   0.36146644   1.0737886
   0.68715164   0.73500985   0.36126429   1.0853221
   0.69456451   0.73117486   0.36133009   1.1031328
   0.68946108   0.73311797   0.36180377   1.0651580
   0.69715113   0.73563429   0.36136896   1.0435283
   0.69070460   0.72438395   0.36140643   1.0796410
   0.68440152   0.73540984   0.36146281   1.0469104
   0.69863660   0.72697385   0.36155651   1.0874939
   0.69552443   0.73270045   0.36166301   1.1031328
   0.69344685   0.73273049   0.36134194   1.0764780
   0.69946792   0.72915167   0.36150332   1.0785968
   0.70368247   0.72782166   0.36170557   1.0793053
   0.69269162   0.73117486   0.36170639   1.0813704
   0.68819239   0.73370313   0.36158324   1.1008889
   0.69295472   0.74398663   0.36165099   1.0780199
   0.70183594   0.74482482   0.36151767   1.1219571
   0.70463723   0.74593790   0.36167380   1.0997339
   0.70490973   0.74977198   0.36184944   1.0950637
   0.70523166   0.75093308   0.36179047   1.1141398
   0.70945054   0.75158727   0.36187919   1.1467316
   0.69496314   0.74949044   0.36193191   1.1257124
   0.70157031   0.75097501   0.36160115   1.1136856
   0.69651199   0.73851349   0.36188944   1.0877739
   0.69421049   0.74004682   0.36166628   1.1317432
   0.69469637   0.74004682   0.36183108   1.1395564
   0.70016360   0.74381345   0.36177787   1.0938106
   0.69389182   0.74076168   0.36173269   1.0790022
   0.71231393   0.75220008   0.36197654   1.1567365

REDOING SKEWNESS AND KURTOSIS ON NORMALIZED DATA

COMPUTING MEAN VECTOR
    1   0.69703793
    2   0.73790545
    3   0.36150860
    4   1.0931779

*****DATA MATRIX NOW HAS MEAN ZERO*****

COMPUTING MAXIMUM LIKELIHOOD SIGMA MATRIX

SIGMA MATRIX
   0.34090695E-04
   0.22545239E-04  0.56050850E-04
   0.24452897E-06  0.54158642E-06  0.54037785E-07
   0.67451236E-04  0.70680385E-04  0.18850133E-05  0.64458710E-03

COMPUTING CORRELATION MATRIX

CORRELATION MATRIX WITH I-TH DIAG ELEMENT THE STANDARD DEVIATION
   0.58387238E-02
   0.51575780      0.74867116E-02
   0.18016218      0.31119162      0.23246029E-03
   0.45502073      0.37184933      0.31939260      0.25388720E-01

COMPUTING INVERSE OF COVARIANCE MATRIX

SIGMA INVERSE
   45027.099
  -14512.636       26421.698
   56288.719      -168206.54       21713820.
  -3285.0172      -886.65900      -50945.295       2141.3405

COMPUTING SKEWNESS AND KURTOSIS

            THEORETICAL        SAMPLE
 SKEWNESS      0.0000          2.92903
 KURTOSIS       24.00          21.4391

TEST OF SKEWNESS
     OBSERVED CHI-SQUARE = 24.408596
     DF = 20.000000
     LEVEL OF SIGNIFICANCE FOR CHI-SQUARE = 0.22500002

TEST OF KURTOSIS
     Z = -1.3068548
     LEVEL OF SIGNIFICANCE FOR TWO-SIDED TEST = 0.19126205

Q-Q PLOT OF VARIABLE 1
[LINE-PRINTER PLOT NOT REPRODUCED. HORIZONTAL AXIS (NORMAL SCORES)
-0.24D+01 TO 0.24D+01; VERTICAL AXIS (CENTERED DATA) -0.16D-01 TO 0.24D-01]

CORRELATION BETWEEN DATA AND NORMAL SCORES
    1 : 0.99191560

Q-Q PLOT OF VARIABLE 2
[LINE-PRINTER PLOT NOT REPRODUCED. HORIZONTAL AXIS (NORMAL SCORES)
-0.24D+01 TO 0.24D+01; VERTICAL AXIS (CENTERED DATA) -0.16D-01 TO 0.24D-01]

CORRELATION BETWEEN DATA AND NORMAL SCORES
    2 : 0.99297306

Q-Q PLOT OF VARIABLE 3
[LINE-PRINTER PLOT NOT REPRODUCED. HORIZONTAL AXIS (NORMAL SCORES)
-0.24D+01 TO 0.24D+01; VERTICAL AXIS (CENTERED DATA) -0.50D-03 TO 0.50D-03]

CORRELATION BETWEEN DATA AND NORMAL SCORES
    3 : 0.99092794

Q-Q PLOT OF VARIABLE 4
[LINE-PRINTER PLOT NOT REPRODUCED. HORIZONTAL AXIS (NORMAL SCORES)
-0.24D+01 TO 0.24D+01; VERTICAL AXIS (CENTERED DATA) -0.80D-01 TO 0.12D+00]

CORRELATION BETWEEN DATA AND NORMAL SCORES
    4 : 0.98335311

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

APPENDIX D: THE FORTRAN PROGRAM UTRANS

      PROGRAM UTRANS
C     THIS PROGRAM CALCULATES SAMPLE SKEWNESS AND KURTOSIS TO TEST
C     THE NORMALITY OF MULTIVARIATE DATA BEFORE AND AFTER A BOX-COX
C     TRANSFORMATION. THE TRANSFORMATION MAKES EACH MARGINAL
C     DISTRIBUTION APPROXIMATELY NORMAL.
C     DEVELOPED BY HAMPARSUM BOZDOGAN AND DONALD E. RAMIREZ
C     REVISED BY MARYANN DUCHNA MARCH 23, 1988
C     MATHEMATICS DEPARTMENT
C     MATHEMATICS-ASTRONOMY BUILDING
C     UNIVERSITY OF VIRGINIA
C     CHARLOTTESVILLE, VIRGINIA 22901
C     WK IS A WORK VECTOR FOR IMSL'S LINV3F
      DOUBLE PRECISION X,SIGMA,B,WK,COLX,MEAN
      INTEGER NMAX,PMAX,LENX
C     NMAX IS THE MAXIMUM NUMBER OF INPUT OBSERVATIONS
C     PMAX IS THE MAXIMUM NUMBER OF INPUT VARIABLES
      PARAMETER(NMAX=500,
     +          PMAX=30)
C     NOTE: IF PARAMETER STATEMENT ABOVE IS CHANGED IT MUST ALSO
C     BE CHANGED IN SUBROUTINES SKKUR AND DOSTD AND FUNCTION F
C     AND DOBXCX AND RESULT AND NEWDAT
      COMMON /BLK/ X(NMAX,PMAX),SIGMA(PMAX,PMAX),B(NMAX),WK(2*NMAX),
     +             MEAN(PMAX)
      COMMON /BC/ COLX(NMAX),LENX
      CHARACTER VERSON*70
      LOGICAL STD
      PARAMETER(VERSON='COPYRIGHT @ MARCH 23, 1988')
      INTEGER P,N,IOUT
      EXTERNAL SKKUR,RIPGDF,BOXCOX,DOSTD,FLOPEN,NEWDAT
      CALL FLOPEN(IOUT)
      PRINT 700
      WRITE(IOUT,700)
  700 FORMAT(/,1X,'PROGRAM NORMALIZES MULTIVARIATE DATA')
      PRINT 710 , VERSON
      WRITE(IOUT,710) VERSON
  710 FORMAT(/,1X,'VERSION = ',A)
      CALL RIPGDF(P,PMAX,N,NMAX,X,NMAX,PMAX,IOUT,STD)
      CALL SKKUR(N,P,IOUT)
      IF (STD) CALL DOSTD(N,P,X,NMAX,PMAX,IOUT,MEAN)
      CALL BOXCOX(N,P,X,NMAX,PMAX,IOUT)
      CALL NEWDAT(N,P,IOUT)
      PRINT 720
      WRITE(IOUT,720)
  720 FORMAT(/,1X,'REDOING SKEWNESS AND KURTOSIS ON NORMALIZED DATA',/)
      CALL SKKUR(N,P,IOUT)
      CLOSE(IOUT)
      STOP
      END
***********************************************************************
      SUBROUTINE SKKUR(N,P,IOUT)
C     SKKUR COMPUTES SKEWNESS AND KURTOSIS OF X
C     WRITTEN BY L. BOBBITT 5/20/85
      DOUBLE PRECISION X,SIGMA,B,WK,MEAN
      INTEGER N,P,IOUT
      INTEGER NMAX,PMAX
      PARAMETER(NMAX=500,
     +          PMAX=30)
      COMMON /BLK/ X(NMAX,PMAX),SIGMA(PMAX,PMAX),B(NMAX),WK(2*NMAX),
     +             MEAN(PMAX)
      EXTERNAL LINV3F,MDCH,MDNORD,QQPLOT
      INTRINSIC DBLE,SQRT
C     LOCAL VARS
      INTEGER I,J,L,IJOB,IER,M,II,LL,PTR(PMAX)
      REAL CS,DF,P1
      DOUBLE PRECISION DN,BS,BK,SUM,D1,D2,P2,Z,AVG,
     +                 CORR(((PMAX+1)*PMAX)/2)
      DN=DBLE(N)
C     COMPUTE MEAN
      PRINT 700
      WRITE(IOUT,700)
  700 FORMAT(/,1X,'COMPUTING MEAN VECTOR')
      DO 30 I=1,P
        AVG=0.D0
        DO 10 L=1,N
          AVG=AVG+X(L,I)
   10   CONTINUE
        AVG=AVG/DN
        PRINT 710,I,AVG
        WRITE(IOUT,710) I,AVG
  710   FORMAT(5X,I5,G18.8)
        MEAN(I)=AVG
        DO 20 L=1,N
          X(L,I)=X(L,I)-MEAN(I)
   20   CONTINUE
   30 CONTINUE
C     X NOW HAS MEAN ZERO
C     COMPUTE COVARIANCE
      PRINT 730
      WRITE(IOUT,730)
  730 FORMAT(/,1X,'COMPUTING MAXIMUM LIKELIHOOD SIGMA MATRIX')
      DO 60 I=1,P
        DO 50 J=1,I
          SUM=0.0D0
          DO 40 L=1,N
            SUM=SUM+X(L,I)*X(L,J)
   40     CONTINUE
          SUM=SUM/DN
          SIGMA(I,J)=SUM
          SIGMA(J,I)=SUM
   50   CONTINUE
   60 CONTINUE
      WRITE(IOUT,740)
  740 FORMAT(/,1X,'SIGMA MATRIX ')
      DO 70 I=1,P
        WRITE(IOUT,750) (SIGMA(I,J),J=1,I)
  750   FORMAT(/(4(1X,G18.8)))
   70 CONTINUE
C     COMPUTE CORRELATION
      PRINT 760
      WRITE(IOUT,760)
  760 FORMAT(/,1X,'COMPUTING CORRELATION MATRIX')
      WRITE(IOUT,770)
  770 FORMAT(/,1X,'CORRELATION MATRIX WITH I-TH DIAG ELEMENT THE',
     +       ' STANDARD DEVIATION')
      L=0
      II=0
      DO 90 ,I=1,P
        II=II+I
        PTR(I)=II
        CORR(II)=SQRT(SIGMA(I,I))
        DO 80 ,J=1,I-1
          L=L+1
          CORR(L)=SIGMA(I,J)/( CORR(II) * CORR( PTR(J) ) )
   80   CONTINUE
        L=L+1
   90 CONTINUE
      L=0
      DO 100,I=1,P
        WRITE(IOUT,780) (CORR(LL),LL=L+1,L+I)
  780   FORMAT(/(4(1X,G18.8)))
        L=L+I
  100 CONTINUE
C     COMPUTE INVERSE OF COVARIANCE
      PRINT 790
      WRITE(IOUT,790)
  790 FORMAT(/,1X,'COMPUTING INVERSE OF COVARIANCE MATRIX')
      IJOB=1
      D1=1.0D0
      CALL LINV3F(SIGMA,B,IJOB,P,PMAX,D1,D2,WK,IER)
      IF(IER.NE.0) THEN
        PRINT 800, IER
        WRITE(IOUT,800) IER
  800   FORMAT(/,1X,'SIGMA MATRIX IS POSSIBLY SINGULAR, IER= ',I5)
      ENDIF
      WRITE(IOUT,810)
  810 FORMAT(/,1X,'SIGMA INVERSE ')
      DO 110, I=1,P
        WRITE(IOUT,820) (SIGMA(I,J),J=1,I)
  820   FORMAT(/,(4(1X,G18.8)))
  110 CONTINUE
C     COMPUTE SKEWNESS IN BS
      PRINT 830
      WRITE(IOUT,830)
  830 FORMAT(/,1X,'COMPUTING SKEWNESS AND KURTOSIS')
      BS=0.0D0
      BK=0.0D0
      DO 170, I=1,N
        DO 140, J=1,N
          SUM=0.0D0
          DO 130, M=1,P
            DO 120, L=1,P
              SUM = SUM + X(I,L)*SIGMA(L,M)*X(J,M)
  120       CONTINUE
  130     CONTINUE
          BS=BS+SUM**3
  140   CONTINUE
C     COMPUTE KURTOSIS IN BK
        SUM=0.0D0
        DO 160, M=1,P
          DO 150, L=1,P
            SUM=SUM+X(I,L)*SIGMA(L,M)*X(I,M)
  150     CONTINUE
  160   CONTINUE
        BK=BK+SUM*SUM
  170 CONTINUE
      BS=BS/(DN*DN)
      BK=BK/DN
C     PRINT OUTPUT
      PRINT 840
      WRITE(IOUT,840)
  840 FORMAT(/,1X,8X,2X,'THEORETICAL',11X,'SAMPLE')
      PRINT 850,'SKEWNESS',0.0D0,BS
      WRITE(IOUT,850) 'SKEWNESS',0.0D0,BS
  850 FORMAT(/,1X,A,2X,G11.4,2X,G20.6)
      PRINT 860,'KURTOSIS',DBLE(P*(P+2)),BK
      WRITE(IOUT,860) 'KURTOSIS',DBLE(P*(P+2)),BK
  860 FORMAT(1X,A,2X,G11.4,2X,G20.6)
C     SIGNIFICANCE FOR SKEWNESS
      CS=N*BS/6.0
      DF=P*(P+1)*(P+2)/6.0
      CALL MDCH(CS,DF,P1,IER)
      P1=1.0 - P1
      PRINT 870, CS
      WRITE(IOUT,870) CS
  870 FORMAT(/,1X,'TEST OF SKEWNESS',/,1X,5X,'OBSERVED CHI-SQUARE = ',
     +       G18.8)
      PRINT 880, DF
      WRITE(IOUT,880) DF
  880 FORMAT(1X,5X,'DF = ',G18.8)
      PRINT 890, P1
      WRITE(IOUT,890) P1
  890 FORMAT(1X,5X,'LEVEL OF SIGNIFICANCE FOR CHI-SQUARE = ',G18.8)
C     SIGNIFICANCE FOR KURTOSIS
      Z=(BK-P*(P+2))/((8.0*P*(P+2)/N)**0.5)
      CALL MDNORD(Z,P2)
      PRINT 900, Z
      WRITE(IOUT,900) Z
  900 FORMAT(/,1X,'TEST OF KURTOSIS',/,1X,5X,'Z = ',G18.8)
      IF(Z.GT.0.0D0) THEN
        P2=(1.0D0-P2)*2.0D0
      ELSE
        P2=2.0D0*P2
      ENDIF
      PRINT 910, P2
      WRITE(IOUT,910) P2
  910 FORMAT(1X,5X,'LEVEL OF SIGNIFICANCE FOR TWO-SIDED TEST = ',
     +       G18.8,///)
      CALL QQPLOT(X,NMAX,N,P,WK,IOUT)
      DO 180,I=1,P
        DO 190,L=1,N
          X(L,I)=X(L,I)+MEAN(I)
  190   CONTINUE
  180 CONTINUE
C     ORIGINAL X
      RETURN
      END
***********************************************************************
      SUBROUTINE QQPLOT(X,IN,N,P,WORK,IOUT)
C     WRITTEN BY LARRY BOBBITT
C     REVISED BY JIM SYTA 7/16/85
C     PLOTS A QQ-PLOT OF THE DATA IN THE ARRAY X
      INTEGER IN,N,P,IOUT
      DOUBLE PRECISION X(IN,P),WORK(N,2)
C     X (INPUT) ARRAY CONTAINING THE DATA
C     IN (INPUT) ROW DIM OF X
C     N (INPUT) SAMPLE SIZE
C     P (INPUT) NUMBER OF VARIABLES
C     WORK IS A WORK ARRAY OF SIZE N BY 2 (COLUMN 1 HOLDS THE NORMAL
C     SCORES, COLUMN 2 THE SORTED DATA)
C     IOUT (INPUT) UNIT TO WRITE PLOTS TO
C     REQUIRES IMSL ROUTINES VSRTAD,USPLOD,UGETIO,MDNRIS
      EXTERNAL VSRTAD,USPLOD,UGETIO,MDNRIS,CORR
      INTRINSIC DBLE
C     LOCAL VARIABLES
      INTEGER NTITLE,NXLABL,NYLABL
      INTEGER I,II,IOPT,IER,NIN
      REAL UNIFP,QUANTL
      DOUBLE PRECISION RANGE(4),CORVAL
      CHARACTER ICHAR*1,ITITLE*72,IXLABL*36,IYLABL*36
      CALL UGETIO(3,NIN,IOUT)
      DO 10,II=1,N
C     CONTINUITY CORRECTION USED BY MINITAB
        UNIFP = (II - 0.375) / (N + 0.25)
        CALL MDNRIS(UNIFP,QUANTL,IER)
        WORK(II,1) = DBLE(QUANTL)
   10 CONTINUE
      PRINT 700
  700 FORMAT(/,1X,'CORRELATION BETWEEN DATA AND NORMAL SCORES')
      DO 40,I=1,P
        DO 20,II=1,N
          WORK(II,2)=X(II,I)
   20   CONTINUE
        CALL VSRTAD(WORK(1,2),N)
        DO 30,II=1,4
   30   RANGE(II)=0.D0
        ICHAR='+'
        IOPT=0
        ITITLE=' '
        WRITE(ITITLE,710) I
  710   FORMAT('Q-Q PLOT OF VARIABLE ',I3)
        NTITLE=25
        NYLABL=0
        NXLABL=0
        CALL USPLOD(WORK(1,1),WORK(1,2),IN,N,1,1,ITITLE,NTITLE,IXLABL,
     +              NXLABL,IYLABL,NYLABL,RANGE,ICHAR,IOPT,IER)
        CALL CORR(WORK(1,2),WORK(1,1),CORVAL,N)
        WRITE(IOUT,700)
C700    FORMAT(/,1X,'CORRELATION BETWEEN DATA AND NORMAL SCORES')
        PRINT 720, I,CORVAL
  720   FORMAT(/,1X,I5,' : ',G18.8)
        WRITE(IOUT,730) I,CORVAL
  730   FORMAT(/,1X,I5,' : ',G18.8,///)
   40 CONTINUE
      RETURN
      END
************************************************************************
      SUBROUTINE RIPGDF(NVAR,MVARS,N,MOBS,X,IX,IXCOL,IOUT,STD)
C     READ_INTERACTIVELY_THE_PARAMETERS_AND_GET_DATA_FROM_A_FILE
C     WRITTEN BY L. BOBBITT
      INTEGER NVAR,MVARS,N,MOBS,IX,IXCOL,IOUT
      LOGICAL STD
      EXTERNAL QUERY
      DOUBLE PRECISION X(IX,IXCOL)
C     LOCAL VARIABLES
      INTEGER I,J
      CHARACTER NAME*120,TITLE*50
   10 CONTINUE
      PRINT*
      PRINT*,'WHAT FILE IS YOUR DATA IN?'
      READ 700,NAME
  700 FORMAT(A)
      OPEN(UNIT=5,FILE=NAME,STATUS='OLD',ERR=40)
      PRINT 710, NAME
      WRITE(IOUT,710) NAME
  710 FORMAT(/,1X,'DATA IS FROM FILE = ',A)
      PRINT*
      PRINT*,'ENTER THE NUMBER OF VARIABLES'
      READ*,NVAR
      IF(NVAR.GT.MVARS) THEN
        PRINT*,'MAXIMUM NO. OF VARIABLES IS ',MVARS
        STOP
      ENDIF
      PRINT 720, NVAR
      WRITE(IOUT,720) NVAR
  720 FORMAT(/,1X,'NUMBER OF VARIABLES = ',I5)
      PRINT*
      PRINT*,'ENTER THE NUMBER OF CASES'
      READ*,N
      IF(N.GT.MOBS) THEN
        PRINT*,'MAXIMUM NO. OF CASES IS ',MOBS
        STOP
      ENDIF
      PRINT 730, N
      WRITE(IOUT,730) N
  730 FORMAT(/,1X,'NUMBER OF CASES = ',I5)
      PRINT*
      PRINT*,'ENTER THE FORTRAN FORMAT (E.G., (4(1X,G18.8)) )'
      PRINT*,'OR RETURN FOR FREEFIELD'
      READ 700,NAME
C700  FORMAT(A)
      PRINT*,'READING DATA...'
      IF(NAME.EQ.' ') THEN
        PRINT 740
        WRITE(IOUT,740)
  740   FORMAT(/,1X,'FORMAT IS FREEFIELD')
        DO 20 I=1,N
          READ(5,*)(X(I,J),J=1,NVAR)
   20   CONTINUE
      ELSE
        PRINT 750,NAME
        WRITE(IOUT,750) NAME
  750   FORMAT(/,1X,'FORMAT IS ',A)
        DO 30 I=1,N
          READ(5,NAME)(X(I,J),J=1,NVAR)
   30   CONTINUE
      ENDIF
      PRINT 760
      WRITE(IOUT,760)
  760 FORMAT(/,1X,'DATA READ')
      CLOSE(5)
      TITLE='DO YOU WANT THE DATA STANDARDIZED?'
      CALL QUERY(TITLE,STD)
      IF(STD) GOTO 50
      PRINT 770
      WRITE(IOUT,770)
  770 FORMAT(/,1X,'DATA WILL NOT BE STANDARDIZED')
      RETURN
   50 PRINT 780
      WRITE(IOUT,780)
  780 FORMAT(/,1X,'DATA WILL BE STANDARDIZED')
      RETURN
   40 CONTINUE
      PRINT*
      PRINT*,'ERROR IN OPENING FILE ',NAME
      PRINT*,'TRY AGAIN'
      GOTO 10
      END
************************************************************************
      SUBROUTINE QUERY(TITLE,RESPON)
      LOGICAL RESPON
      CHARACTER*50 TITLE
C     LOCAL VARIABLES
      CHARACTER*1 ANS
      PRINT 710,TITLE
  710 FORMAT(/,1X,A)
   10 CONTINUE
      PRINT 720
  720 FORMAT(' ENTER Y OR N *****CAPS ONLY*****')
      READ 730,ANS
  730 FORMAT(A)
      IF((ANS.NE.'Y').AND.(ANS.NE.'N')) THEN
        GOTO 10
      ENDIF
      RESPON=(ANS.EQ.'Y')
      RETURN
      END
************************************************************************
      SUBROUTINE FLOPEN(IOUT)
C     FLOPEN READS THE NAME OF THE OUTPUT FILE AND OPENS IT.
      INTEGER IOUT
C     LOCAL VARIABLES
      CHARACTER FN*40
      LOGICAL EX
   10 CONTINUE
      PRINT*
      PRINT*,'ENTER THE NAME OF THE OUTPUT FILE'
      READ 700,FN
  700 FORMAT(A)
      INQUIRE(FILE=FN,EXIST=EX)
      IF (EX) THEN
        PRINT*
        PRINT*,'FILE = ',FN,' EXISTS. TRY AGAIN.'
0000401 GOTO 10 0000402 ENDIF 0000403 IOUT=6 0000404 OPEN(FILE=FN,UNIT=IOUT,ERR=20) 0000405 PRINT 710,FN 0000406 WRITE(IOUT,710) FN 0000407 710 FORMAT(/,1X,'OUTPUT FILE IS ',A) 0000408 RETURN 0000409 20 CONTINUE 0000410 PRINT* 0000411 PRINT*,'ERROR IN OPENING FILE = ',FN 0000412 PRINT*,'TRY AGAIN' 0000413 GOTO 10 0000414 END 0000415 ************************************************************************0000416 SUBROUTINE CORR(X,Y,CORVAL,N) 0000417 C CORR COMMPUTES THE CORRELATION BETWEEN THE DATA IN X AND THE 0000418 C NORMAL SCORES IN Y. SUBROUTINE QQPLOT CALLS CORR. 0000419 DOUBLE PRECISION X,Y,CORVAL 0000420 INTEGER N 0000421 DIMENSION X(N),Y(N) 0000422 C LOCAL VARIABLES 0000423 INTEGER I 0000424 DOUBLE PRECISION MX,MY,SSXY,SSXX,SSYY 0000425 INTRINSIC SQRT 0000426 MX = 0.0D0 0000427 MY = 0.0D0 0000428 DO 10, I = 1,N 0000429 MX = MX + X(I) 0000430 MY = MY + Y(I) 0000431 10 CONTINUE 0000432 MX = MX/N 0000433 MY = MY/N 0000434 SSXY = 0.0D0 0000435 SSXX = 0.0D0 0000436 SSYY = 0.0D0 0000437 DO 20, I = 1,N 0000438 SSXY = SSXY + (X(I) - MX) * (Y(I) - MY) 0000439 SSXX = SSXX + (X(I) - MX)**2 0000440 SSYY = SSYY + (Y(I) - MY)**2 0000441 20 CONTINUE 0000442 CORVAL = SSXY/SQRT(SSXX*SSYY) 0000443 RETURN 0000444 END 0000445 ************************************************************************0000446 SUBROUTINE DOSTD(N,P,X,NMAX,PMAX,IOUT,MEAN) 0000447 C DOSTD COMPUTES THE STANDARD DEVIATION, SKEWNESS, AND KURTOSIS FOR 0000448 C EACH COLUMN OF X. THE DATA IN X IS THEN STANDARDIZED TO HAVE 0000449 C MEAN MU, STANDARD DEVIATION NEWSTD, AND POSITIVE SKEWNESS. 
0000450 C WRITTEN BY JIM SYTA 0000451 INTEGER N,P,NMAX,PMAX,IOUT 0000452 DOUBLE PRECISION X(NMAX,PMAX),MEAN(PMAX) 0000453 C LOCAL VARIABLES 0000454 INTEGER I,J 0000455 DOUBLE PRECISION MU,NEWSTD,SK,XKUR,STD,Z 0000456 PARAMETER (MU=10.0D0, 0000457 + NEWSTD=1.5D0) 0000458 INTRINSIC SQRT 0000459 PRINT 700 0000460 WRITE(IOUT,700) 0000461 700 FORMAT(/,1X,'COMPUTING SKEWNESS AND KURTOSIS') 0000462 DO 60,I=1,N 0000463 DO 70,J=1,P 0000464 X(I,J)=X(I,J)-MEAN(J) 0000465 70 CONTINUE 0000466 60 CONTINUE 0000467 C X NOW HAS MEAN ZERO 0000468 C COMPUTE STANDARD DEVIATION IN STD, SKEWNESS IN SK, AND 0000469 C KURTOSIS IN XKUR 0000470 DO 40,J=1,P 0000471 STD = 0.0D0 0000472 SK = 0.0D0 0000473 XKUR = 0.0D0 0000474 DO 10,I=1,N 0000475 10 CONTINUE 0000476 DO 20 I = 1,N 0000477 STD = STD+(X(I,J))**2 0000478 SK = SK + (X(I,J))**3 0000479 XKUR =XKUR+(X(I,J))**4 0000480 20 CONTINUE 0000481 STD = SQRT(STD/(N-1)) 0000482 SK = SK/N/STD**3 0000483 XKUR =XKUR/N/STD**4 0000484 C COMPUTE STANDARDIZED DATA AND PUT IN X 0000485 DO 30,I=1,N 0000486 Z = (X(I,J))/STD 0000487 IF (SK.LT.0.0) THEN 0000488 Z = MU - NEWSTD*Z 0000489 ELSE 0000490 Z = MU + NEWSTD*Z 0000491 ENDIF 0000492 X(I,J) = Z 0000493 IF (Z .LT. 
0.0) THEN 0000494 PRINT 730, I,J 0000495 WRITE(IOUT,730) I,J 0000496 730 FORMAT(1X,'WARNING: DATA IS NEGATIVE AT I,J = ',I5,1X,I5)0000497 ENDIF 0000498 30 CONTINUE 0000499 PRINT 740, J 0000500 WRITE(IOUT,740) J 0000501 740 FORMAT(/,1X,'COLUMN = ',I5) 0000502 PRINT 750, STD 0000503 WRITE(IOUT,750) STD 0000504 750 FORMAT(1X,5X,'STANDARD DEVIATION = ',G18.8) 0000505 PRINT 760, SK 0000506 WRITE(IOUT,760) SK 0000507 760 FORMAT(1X,5X,'SKEWNESS = ',G18.8) 0000508 PRINT 770, XKUR 0000509 WRITE(IOUT,770) XKUR 0000510 770 FORMAT(1X,5X,'KURTOSIS = ',G18.8) 0000511 40 CONTINUE 0000512 PRINT 710, MU 0000513 WRITE(IOUT,710) MU 0000514 710 FORMAT(/,1X,'*****DATA NOW HAS MEAN = ',G18.8,'*****') 0000515 PRINT 720, NEWSTD 0000516 WRITE(IOUT,720) NEWSTD 0000517 720 FORMAT(1X,5X,'STANDARD DEVIATION = ',G18.8, 0000518 + /,1X,5X,'AND POSITIVE SKEWNESS') 0000519 WRITE(IOUT,780) MU,NEWSTD 0000520 780 FORMAT(/,1X,'STANDARDIZED DATA WITH MEAN = ',G18.8, 0000521 + /,1X,9X,'STANDARD DEVIATION = ',G18.8,/) 0000522 DO 50,I=1,N 0000523 WRITE(IOUT,790) (X(I,J),J=1,P) 0000524 790 FORMAT(4(1X,G18.8)) 0000525 50 CONTINUE 0000526 RETURN 0000527 END 0000528 *********************************************************************** 0000529 SUBROUTINE BOXCOX(N,P,X,NMAX,PMAX,IOUT) 0000530 C BOXCOX CALLS DOBXCX FOR EACH COLUMN OF X. 0000531 INTEGER N,P,NMAX,PMAX,IOUT 0000532 DOUBLE PRECISION X(NMAX,PMAX) 0000533 C LOCAL VARIABLE 0000534 INTEGER J 0000535 EXTERNAL DOBXCX 0000536 DO 10,J=1,P 0000537 PRINT 700, J 0000538 WRITE(IOUT,700) J 0000539 700 FORMAT(/,1X,'COMPUTING BOX-COX TRANSFORM FOR VARIABLE ',I5) 0000540 CALL DOBXCX(N,X(1,J),IOUT,J,P) 0000541 10 CONTINUE 0000542 RETURN 0000543 END 0000544 *********************************************************************** 0000545 SUBROUTINE DOBXCX(N,X,IOUT,COL,P) 0000546 C DOBXCX FINDS THE LAMBDA VALUE WHICH MAXIMIZES THE LOG LIKELIHOOD 0000547 C FUNCTION FOR EACH COLUMN OF X. 
0000548 C WRITTEN BY JOCK BLACK --APRIL 1, 1983 0000549 C REVISED BY JIM SYTA --JULY 17, 1985 0000550 INTEGER N,NMAX,LENX,IOUT,COL,P 0000551 DOUBLE PRECISION XL,XR,X(N),COLX 0000552 PARAMETER (NMAX = 500) 0000553 COMMON /BC/ COLX(NMAX),LENX 0000554 C LOCAL VARIABLES 0000555 INTEGER NSIG,ITMAX,IER,ZERO,I 0000556 DOUBLE PRECISION F,EPS,XAPP,LOGLK1,LOGLK0,CS,PROB,ONE 0000557 LOGICAL STORE 0000558 EXTERNAL F,ZFALSE,RESULT,MDNORD 0000559 INTRINSIC SQRT 0000560 LENX=N 0000561 DO 10, I = 1,N 0000562 COLX(I) = X(I) 0000563 10 CONTINUE 0000564 C XL AND XR ARE THE BOUNDS BETWEEN WHICH ZFALSE SEARCHES FOR XAPP, 0000565 C THE LAMBDA VALUE FOR THE TRANSFORMATION 0000566 XL = -6.0D0 0000567 XR = 4.0D0 0000568 20 CONTINUE 0000569 EPS = 1.0 D-8 0000570 NSIG = 8 0000571 ITMAX = 100 0000572 C FIND LAMBDA AND RETURN AS XAPP 0000573 CALL ZFALSE(F,EPS,NSIG,XL,XR,XAPP,ITMAX,IER) 0000574 IF (IER.EQ.129) THEN 0000575 PRINT*,' FUNCTION HAS SAME SIGN AT LEFT AND RIGHT ENDPOINTS' 0000576 PRINT*,'CURRENT VALUES ARE ',XL,XR 0000577 PRINT*,'ENTER ANOTHER PAIR OF VALUES' 0000578 READ*,XL,XR 0000579 PRINT 690,XL,XR 0000580 WRITE(IOUT,690) XL,XR 0000581 690 FORMAT(/,1X,'REDOING ZERO SEARCH BETWEEN ',G18.8,G18.8) 0000582 GOTO 20 0000583 ENDIF 0000584 STORE = .FALSE. 0000585 ONE = 1.0D0 0000586 ZERO = 0 0000587 CALL RESULT(X,N,ONE,ZERO,LOGLK0,STORE,IOUT,COL) 0000588 STORE = .TRUE. 0000589 CALL RESULT(X,N,XAPP,ITMAX,LOGLK1,STORE,IOUT,COL) 0000590 CS = -2.0D0*(LOGLK0 - LOGLK1) 0000591 CS = -SQRT(CS) 0000592 CALL MDNORD(CS,PROB) 0000593 PRINT 700, 2.0D0 * PROB 0000594 WRITE(IOUT,700) 2.0D0 * PROB 0000595 700 FORMAT(/,1X,5X,'THE P-VALUE FOR THE BOX-COX TRANSFORMATION = ', 0000596 + G18.8,/) 0000597 RETURN 0000598 END 0000599 *********************************************************************** 0000600 DOUBLE PRECISION FUNCTION F(Y) 0000601 C F FINDS THE DERIVATIVE OF THE LOG LIKELIHOOD FUNCTION. ZFALSE IN 0000602 C SUBROUTINE DOBXCX DETERMINES LAMBDA BY FINDING THE ROOT OF F. 
0000603 INTEGER NMAX,N 0000604 PARAMETER (NMAX = 500) 0000605 DOUBLE PRECISION Y,X(NMAX) 0000606 COMMON /BC/ X,N 0000607 C LOCAL VARIABLES 0000608 INTEGER I 0000609 DOUBLE PRECISION SUMA,SUMB,SUMC,SUMD,SUME,TEMP 0000610 INTRINSIC LOG 0000611 SUMA = 0.0D0 0000612 SUMB = 0.0D0 0000613 SUMC = 0.0D0 0000614 SUMD = 0.0D0 0000615 SUME = 0.0D0 0000616 F = 0.0D0 0000617 DO 10,I=1,N 0000618 TEMP = X(I)**Y 0000619 SUMA = SUMA + LOG(X(I)) 0000620 SUMB = SUMB + TEMP*TEMP * LOG(X(I)) 0000621 SUMC = SUMC + TEMP 0000622 SUMD = SUMD + LOG(X(I)) * (TEMP) 0000623 SUME = SUME + TEMP*TEMP 0000624 10 CONTINUE 0000625 F = N/Y + SUMA - ((N * SUMB - SUMC * SUMD) 0000626 + /(SUME - (1.0D0/N) * ((SUMC)**2))) 0000627 RETURN 0000628 END 0000629 *********************************************************************** 0000630 SUBROUTINE RESULT(X,N,LAMBDA,ITMAX,L,LEQMLE,IOUT,COL) 0000631 C RESULT PERFORMS THE BOX-COX TRANSFORMATION. 0000632 LOGICAL LEQMLE 0000633 INTEGER NMAX,PMAX,N,ITMAX,IOUT,COL 0000634 PARAMETER(NMAX=500, 0000635 + PMAX=30) 0000636 DOUBLE PRECISION X(NMAX),LAMBDA,L,OLDX,SIGMA,B,WK 0000637 COMMON /BLK/ OLDX(NMAX,PMAX),SIGMA(PMAX,PMAX),B(NMAX),WK(2*NMAX) 0000638 C LOCAL VARIABLES 0000639 INTEGER I 0000640 DOUBLE PRECISION SUMA,SUMB,SUMC,TEMP 0000641 INTRINSIC LOG 0000642 SUMA = 0.0D0 0000643 SUMB = 0.0D0 0000644 SUMC = 0.0D0 0000645 DO 10, I = 1,N 0000646 TEMP = X(I) ** LAMBDA 0000647 SUMA = SUMA + TEMP 0000648 SUMB = SUMB + TEMP * TEMP 0000649 SUMC = SUMC + LOG(X(I)) 0000650 10 CONTINUE 0000651 L = -0.5D0 * N * LOG((SUMB - SUMA * SUMA/N)/(N * LAMBDA * LAMBDA))0000652 + +((LAMBDA - 1.0D0) * SUMC) 0000653 PRINT* 0000654 IF (ITMAX .NE. 
0) THEN 0000655 PRINT 700, L 0000656 WRITE(IOUT,700) L 0000657 700 FORMAT(/,1X,5X,'MAXIMUM VALUE OF L(LAMBDA) = ',G18.8) 0000658 ELSE 0000659 PRINT 710, L 0000660 WRITE(IOUT,710) L 0000661 710 FORMAT(/,1X,5X,'THE VALUE OF L(LAMBDA) = ',G18.8) 0000662 ENDIF 0000663 PRINT 720, LAMBDA 0000664 WRITE(IOUT,720) LAMBDA 0000665 720 FORMAT(1X,5X,'AT LAMBDA = ',G18.8) 0000666 PRINT 730, ITMAX 0000667 WRITE(IOUT,730) ITMAX 0000668 730 FORMAT(1X,5X,'NUMBER OF ITERATIONS = ',I5) 0000669 IF (LEQMLE) THEN 0000670 DO 20, I = 1,N 0000671 TEMP = X(I) ** LAMBDA 0000672 OLDX(I,COL) = (TEMP - 1.0D0)/LAMBDA 0000673 20 CONTINUE 0000674 ENDIF 0000675 RETURN 0000676 END 0000677 *********************************************************************** 0000678 SUBROUTINE NEWDAT(N,P,IOUT) 0000679 C NEWDAT WRITES THE NORMALIZED DATA TO THE OUTPUT FILE 0000680 INTEGER NMAX,PMAX,N,P,IOUT 0000681 PARAMETER(NMAX=500, 0000682 + PMAX=30) 0000683 DOUBLE PRECISION OLDX,SIGMA,B,WK 0000684 COMMON /BLK/ OLDX(NMAX,PMAX),SIGMA(PMAX,PMAX),B(NMAX),WK(2*NMAX) 0000685 C LOCAL VARIABLES 0000686 INTEGER I,J 0000687 WRITE(IOUT,700) 0000688 700 FORMAT(/,1X,'NORMALIZED DATA IN FORMAT(4(1X,G18.8)) IS',/) 0000689 DO 10,I=1,N 0000690 WRITE(IOUT,710) (OLDX(I,J),J=1,P) 0000691 710 FORMAT(4(1X,G18.8)) 0000692 10 CONTINUE 0000693 RETURN 0000694 *********************************************************************** 0000695 C SUBROUTINE LINV3F(A,B,IJOB,N,IA,D1,D2,WKAREA,IER) 0000696 C IMSL ROUTINE FOR COMPUTING DETERMINANTS 0000697 C END 0000698 *********************************************************************** 0000699 C SUBROUTINE MDCH(CS,DF,P,IER) 0000700 C IMSL ROUTINE TO DETERMINE P VALUE FOR CHI-SQUARED PROBABILITY 0000701 C DISTRIBUTION FUNCTION 0000702 C END 0000703 *********************************************************************** 0000704 C SUBROUTINE MDNORD(Y,P) 0000705 C IMSL ROUTINE TO DETERMINE NORMAL PROBABILITY DISTRIBUTION 0000706 C FUNCTION OF A DOUBLE PRECISION ARGUMENT 0000707 C END 0000708 
*********************************************************************** 0000709
C     SUBROUTINE MDNRIS(P,Y,IER)                                        0000710
C     IMSL ROUTINE TO COMPUTE INVERSE STANDARD NORMAL PROBABILITY       0000711
C     DISTRIBUTION FUNCTION                                             0000712
C     END                                                               0000713
*********************************************************************** 0000714
C     SUBROUTINE UGETIO(IOPT,NIN,NOUT)                                  0000715
C     IMSL ROUTINE TO RETRIEVE CURRENT VALUES AND TO SET NEW VALUES     0000716
C     FOR INPUT AND OUTPUT UNIT IDENTIFIERS                             0000717
C     END                                                               0000718
*********************************************************************** 0000719
C     SUBROUTINE USPLOD(X,Y,IY,N,M,INC,ITITLE,NTITLE,IXLABL,NXLABL,     0000720
C       IYLABL,NYLABL,RANGE,ICHAR)                                      0000721
C     IMSL ROUTINE TO PRODUCE A PRINTER PLOT (DOUBLE PRECISION)         0000722
C     END                                                               0000723
*********************************************************************** 0000724
C     SUBROUTINE VSRTAD(A,LA)                                           0000725
C     IMSL ROUTINE TO SORT DOUBLE PRECISION ARRAYS BY ALGEBRAIC VALUE   0000726
C     END                                                               0000727
*********************************************************************** 0000728
C     SUBROUTINE ZFALSE(F,EPS,NSIG,XL,XR,XAPP,ITMAX,IER)                0000729
C     IMSL ROUTINE TO OBTAIN ZERO OF A FUNCTION                         0000730
*********************************************************************** 0000731
      END                                                               0000732
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
APPENDIX E: THE CALL STATEMENTS FROM UTRANS

  1: PROGRAM UTRANS
 30: CALL FLOPEN(IOUT)
 37: CALL RIPGDF(P,PMAX,N,NMAX,X,NMAX,PMAX,IOUT,STD)
 38: CALL SKKUR(N,P,IOUT)
 39: IF (STD) CALL DOSTD(N,P,X,NMAX,PMAX,IOUT,MEAN)
 40: CALL BOXCOX(N,P,X,NMAX,PMAX,IOUT)
 41: CALL NEWDAT(N,P,IOUT)
 45: CALL SKKUR(N,P,IOUT)
 50: SUBROUTINE SKKUR(N,P,IOUT)
139: CALL LINV3F(SIGMA,B,IJOB,P,PMAX,D1,D2,WK,IER)
191: CALL MDCH(CS,DF,P1,IER)
205: CALL MDNORD(Z,P2)
218: CALL QQPLOT(X,NMAX,N,P,WK,IOUT)
228: SUBROUTINE QQPLOT(X,IN,N,P,WORK,IOUT)
249: CALL UGETIO(3,NIN,IOUT)
253: CALL MDNRIS(UNIFP,QUANTL,IER)
262: CALL VSRTAD(WORK(1,2),N)
273: CALL USPLOD(WORK(1,1),WORK(1,2),IN,N,1,1,ITITLE,NTITLE,IXLABL,
274: + NXLABL,IYLABL,NYLABL,RANGE,ICHAR,IOPT,IER)
275: CALL CORR(WORK(1,2),WORK(1,1),CORVAL,N)
286: SUBROUTINE RIPGDF(NVAR,MVARS,N,MOBS,X,IX,IXCOL,IOUT,STD)
351: CALL QUERY(TITLE,STD)
368: SUBROUTINE QUERY(TITLE,RESPON)
387: SUBROUTINE FLOPEN(IOUT)
417: SUBROUTINE CORR(X,Y,CORVAL,N)
447: SUBROUTINE DOSTD(N,P,X,NMAX,PMAX,IOUT,MEAN)
530: SUBROUTINE BOXCOX(N,P,X,NMAX,PMAX,IOUT)
541: CALL DOBXCX(N,X(1,J),IOUT,J,P)
546: SUBROUTINE DOBXCX(N,X,IOUT,COL,P)
574: CALL ZFALSE(F,EPS,NSIG,XL,XR,XAPP,ITMAX,IER)
588: CALL RESULT(X,N,ONE,ZERO,LOGLK0,STORE,IOUT,COL)
590: CALL RESULT(X,N,XAPP,ITMAX,LOGLK1,STORE,IOUT,COL)
593: CALL MDNORD(CS,PROB)
631: SUBROUTINE RESULT(X,N,LAMBDA,ITMAX,L,LEQMLE,IOUT,COL)
679: SUBROUTINE NEWDAT(N,P,IOUT)

The following subroutines are from IMSL:

     LINV3F MDCH MDNORD UGETIO MDNRIS VSRTAD USPLOD ZFALSE
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
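
For readers without access to IMSL, the following is a minimal sketch of the search performed at lines 574-590, not part of UTRANS itself. It evaluates the same log likelihood expression that SUBROUTINE RESULT stores in L, but replaces the ZFALSE root search on F with a crude grid search over the same interval (XL = -6, XR = 4). The names BCGRID and BCLOGL, the grid step of 0.01, and the choice of data (the first eight values of column 1 of Appendix A, unstandardized) are assumptions made only for illustration. The p-value step is omitted because it needs MDNORD.

```fortran
      PROGRAM BCGRID
C     SKETCH ONLY, NOT PART OF UTRANS. A GRID SEARCH STANDS IN FOR THE
C     IMSL ROOT FINDER ZFALSE. BCGRID AND BCLOGL ARE HYPOTHETICAL NAMES.
      INTEGER N,K
      PARAMETER (N=8)
      DOUBLE PRECISION X(N),XLAM,BEST,VAL,VBEST,L0,BCLOGL
      EXTERNAL BCLOGL
C     FIRST EIGHT VALUES OF COLUMN 1 OF APPENDIX A (ILLUSTRATION ONLY)
      DATA X /6.277D0,5.804D0,5.200D0,6.111D0,
     +        7.045D0,6.871D0,6.611D0,6.571D0/
C     GRID POINTS ARE -5.995, -5.985, ..., 3.995. THE HALF-STEP OFFSET
C     MEANS LAMBDA = 0 IS NEVER EVALUATED (L DIVIDES BY LAMBDA**2).
      BEST = -5.995D0
      VBEST = BCLOGL(X,N,BEST)
      DO 10 K = 2,1000
         XLAM = -6.0D0 + 0.01D0*(K - 0.5D0)
         VAL = BCLOGL(X,N,XLAM)
         IF (VAL.GT.VBEST) THEN
            VBEST = VAL
            BEST = XLAM
         ENDIF
   10 CONTINUE
C     L(1) IS THE LOG LIKELIHOOD WITH NO TRANSFORMATION
      L0 = BCLOGL(X,N,1.0D0)
      PRINT *,'GRID MAXIMUM OF L(LAMBDA) = ',VBEST
      PRINT *,'AT LAMBDA                 = ',BEST
      PRINT *,'L(1), NO TRANSFORMATION   = ',L0
      PRINT *,'2*(L(LAMBDA) - L(1))      = ',2.0D0*(VBEST - L0)
      END

      DOUBLE PRECISION FUNCTION BCLOGL(X,N,XLAM)
C     SAME EXPRESSION AS THE VARIABLE L IN SUBROUTINE RESULT.
C     REQUIRES X(I) .GT. 0 AND XLAM .NE. 0.
      INTEGER N,I
      DOUBLE PRECISION X(N),XLAM,SUMA,SUMB,SUMC,TEMP
      INTRINSIC LOG
      SUMA = 0.0D0
      SUMB = 0.0D0
      SUMC = 0.0D0
      DO 10 I = 1,N
         TEMP = X(I)**XLAM
         SUMA = SUMA + TEMP
         SUMB = SUMB + TEMP*TEMP
         SUMC = SUMC + LOG(X(I))
   10 CONTINUE
      BCLOGL = -0.5D0*N*LOG((SUMB - SUMA*SUMA/N)/(N*XLAM*XLAM))
     +         + (XLAM - 1.0D0)*SUMC
      RETURN
      END
```

The sketch is written in fixed-form Fortran 77 and uses no library calls, so it should compile unchanged with a standard Fortran 77 compiler. If the reported maximum sits at either end of the grid, the interval is probably too narrow for the data, which is the grid-search counterpart of the IER = 129 branch in DOBXCX where new values of XL and XR are requested.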