TRANSFORMATION OF DATA BY POWER TRANSFORMATION
TO NEARLY QUADRATIC DISTRIBUTIONS

CLARK GAYLORD AND DONALD E. RAMIREZ
DEPARTMENT OF MATHEMATICS
MATH-ASTRO BUILDING
UNIVERSITY OF VIRGINIA
CHARLOTTESVILLE, VIRGINIA 22903

TECHNICAL REPORT (JUNE 28, 1989)

1 INTRODUCTION

THIS TECHNICAL REPORT PRESENTS THE PROGRAM QPOWER, WHICH IS DEVELOPED IN
OUR PAPER "MONOTONE REGRESSION SPLINES FOR SMOOTHED BOOTSTRAPPING."
QPOWER IS A FORTRAN 77 PROGRAM WHICH FINDS THE OPTIMAL POWER
TRANSFORMATION TO TRANSFORM DATA TO A NEARLY QUADRATIC DISTRIBUTION.
THE PROGRAM SMOOTH REQUIRES THAT THE IMSL LIBRARY BE AVAILABLE. WE USE
QPOWER TO PREPROCESS DATA BEFORE RUNNING OUR PROGRAM SMOOTH.

INDEX

APPENDIX A CONTAINS THE RAW DATA SET WITH N=100
APPENDIX B CONTAINS THE TERMINAL OUTPUT FROM QPOWER
APPENDIX C CONTAINS THE FULL OUTPUT FROM THE PROGRAM QPOWER
APPENDIX D CONTAINS THE FORTRAN PROGRAM QPOWER
APPENDIX E CONTAINS THE CALL STATEMENTS FROM QPOWER

APPENDIX A: THE ORIGINAL DATA

  28.67  63.29  15.68  10.27   3.99  29.80  15.27  33.65  44.46   4.89
  40.29  32.77   5.48   3.04   9.74   5.89  29.64  38.57   0.02  97.50
  10.61  32.28  36.41  12.35   6.85  13.56  10.03   8.66  38.12   2.10
  33.85  11.90  20.33  32.78   0.16  50.55   2.01   1.40  18.51  26.19
  36.19  54.16  44.58  44.23   9.26   0.51  13.89  23.97 101.75  10.11
  20.63   4.73  57.23  31.14   7.72   0.19  20.83  15.11  34.16   8.39
   9.64  24.67  65.89   6.03  34.42   7.19  27.49  63.32  27.99  25.83
  15.26  17.66  57.91  27.90  32.77   5.94  14.46   7.76   5.22   1.05
  15.53  25.84   2.39  28.73   6.80  11.24   8.22   1.58   1.82  25.63
  73.62  22.89   9.15  42.09  10.45  32.32  27.81  48.66   8.22  18.35

APPENDIX B: TERMINAL OUTPUT FROM QPOWER

ernie> seg qpower
ENTER NAME OF DATA FILE
exp25.data

WHAT FILE WOULD YOU LIKE YOUR TRANSFORMED DATA IN?
exp25.trans

MIN IS 0.02000

MAX IS 101.75000

THE ROOT IS SEARCHED FOR BETWEEN 0.1000000000000 AND 10.00000000000
DO YOU WISH TO CHANGE THESE BOUNDS? (Y OR N)
n
CHECKING: LAMBDA, DLOGLK =    0.10000  1528.03961
CHECKING: LAMBDA, DLOGLK =   10.00000  -370.54489
CHECKING: LAMBDA, DLOGLK =    8.06783  -368.04989
CHECKING: LAMBDA, DLOGLK =    6.52119  -364.93670
CHECKING: LAMBDA, DLOGLK =    4.44553  -357.03955
CHECKING: LAMBDA, DLOGLK =    2.34618  -331.85839
CHECKING: LAMBDA, DLOGLK =    0.92054  -226.20964
CHECKING: LAMBDA, DLOGLK =    0.34358   119.58660
CHECKING: LAMBDA, DLOGLK =    0.54311   -87.05443
CHECKING: LAMBDA, DLOGLK =    0.45905   -22.58939
CHECKING: LAMBDA, DLOGLK =    0.42739     8.54126
CHECKING: LAMBDA, DLOGLK =    0.43608    -0.46508
CHECKING: LAMBDA, DLOGLK =    0.43563    -0.00908
CHECKING: LAMBDA, DLOGLK =    0.43561     0.00870
CHECKING: LAMBDA, DLOGLK =    0.43562     0.00000

MAXIMUM VALUE OF L(LAMBDA) = 41.69529
AT LAMBDA = 0.43562
NUMBER OF ITERATIONS = 12.00000
VALUE OF L(ONE) = -45.93663
THE P-VALUE FOR THE BOX-COX TRANSFORMATION = 0.00000
**** STOP

APPENDIX C: FULL OUTPUT FROM QPOWER

MIN IS 0.02000

MAX IS 101.75000

SHIFTED DATA

0.28453  0.62032  0.15854  0.10606  0.04515  0.29549  0.15456  0.33283  0.43768  0.05388
0.39724  0.32430  0.05960  0.03594  0.10092  0.06358  0.29394  0.38055  0.00664  0.95213
0.10936  0.31954  0.35960  0.12624  0.07289  0.13797  0.10373  0.09045  0.37619  0.02682
0.33477  0.12187  0.20364  0.32439  0.00800  0.49675  0.02595  0.02003  0.18598  0.26048
0.35747  0.53177  0.43885  0.43545  0.09627  0.01140  0.14117  0.23894  0.99336  0.10451
0.20655  0.05233  0.56154  0.30849  0.08133  0.00829  0.20849  0.15301  0.33778  0.08783
0.09995  0.24573  0.64554  0.06494  0.34030  0.07619  0.27308  0.62061  0.27793  0.25698
0.15446  0.17774  0.56814  0.27706  0.32430  0.06406  0.14670  0.08172  0.05708  0.01663
0.15708  0.25708  0.02963  0.28511  0.07241  0.11547  0.08618  0.02178  0.02410  0.25504
0.72051  0.22847  0.09520  0.41469  0.10781  0.31993  0.27619  0.47842  0.08618  0.18443

SEARCH WILL BE BETWEEN 0.10000 10.00000

TRANSFORMED DATA

0.57837  0.81219  0.44829  0.37628  0.25938  0.58797  0.44336  0.61926  0.69772  0.28014
0.66886  0.61229  0.29274  0.23483  0.36823  0.30109  0.58663  0.65648  0.11257  0.97886
0.38133  0.60836  0.64048  0.40594  0.31956  0.42196  0.37266  0.35106  0.65319  0.20673
0.62083  0.39976  0.49995  0.61237  0.12207  0.73728  0.20377  0.18204  0.48058  0.55654
0.63882  0.75948  0.69853  0.69617  0.36073  0.14240  0.42620  0.53601  0.99710  0.37387
0.50305  0.27660  0.77772  0.59910  0.33518  0.12398  0.50510  0.44141  0.62325  0.34660
0.36668  0.54259  0.82641  0.30388  0.62527  0.32578  0.56812  0.81236  0.57249  0.55328
0.44323  0.47118  0.78169  0.57171  0.61229  0.30209  0.43339  0.33588  0.28728  0.16789
0.44649  0.55337  0.21591  0.57889  0.31863  0.39047  0.34375  0.18879  0.19733  0.55145
0.86693  0.52564  0.35898  0.68151  0.37897  0.60868  0.57092  0.72530  0.34375  0.47883

MAXIMUM VALUE OF L(LAMBDA) = 41.69529
AT LAMBDA = 0.43562
NUMBER OF ITERATIONS = 12.00000
VALUE OF L(ONE) = -45.93663
THE P-VALUE FOR THE BOX-COX TRANSFORMATION = 0.00000

APPENDIX D: THE FORTRAN PROGRAM QPOWER

      PROGRAM MAIN
c VERSION -- JUNE 1989
      INTEGER NSIG, ITMAX, IER, NX, DIMX, NY
      PARAMETER (DIMX = 205)
      DOUBLE PRECISION DLOGLK, EPS, XL, XR, XAPP, X(DIMX),
     +                 XMIN, XMAX, Y(DIMX)
      COMMON /XDATA/ X, NX, XMIN, XMAX
     +       /YDATA/ Y, NY
      EXTERNAL DLOGLK, READVL, ZFALSE, RESULT, BOUNDS, SHIFTX
      CALL READVL
      NY = NX
      CALL SHIFTX
      XL = 0.1 D0
      XR = 10.0 D0
      CALL BOUNDS (XL, XR)
      EPS = 1.0 D-6
      NSIG = 6
      ITMAX = 100
      CALL ZFALSE (DLOGLK, EPS, NSIG, XL, XR, XAPP, ITMAX, IER)
      CALL RESULT (XAPP, ITMAX)
      CLOSE (UNIT = 6)
      END
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
      SUBROUTINE READVL
      INTEGER NX, I, DIMX
      PARAMETER (DIMX = 205)
      DOUBLE PRECISION X(DIMX), XMIN, XMAX
      CHARACTER*72 INFILE, OUTFIL
      INTRINSIC MIN, MAX
      COMMON /XDATA/ X, NX, XMIN, XMAX
      WRITE (*,*) 'ENTER NAME OF DATA FILE'
      READ (*,'(A)') INFILE
      OPEN (UNIT = 5, FILE = INFILE)
      REWIND (UNIT = 5)
      NX = 0
      XMIN = 1.0D6
      XMAX = -1.0D6
      DO 10, I = 1, 205
         READ (5, *, END=15) X(I)
         NX = NX + 1
         XMIN = MIN (XMIN, X(I))
         XMAX = MAX (XMAX, X(I))
   10 CONTINUE
   15 CONTINUE
      WRITE (*,*) ' '
      WRITE (*,*) 'WHAT FILE WOULD YOU LIKE YOUR TRANSFORMED DATA IN?'
      READ (*,'(A)') OUTFIL
      OPEN (UNIT = 6, FILE = OUTFIL)
      WRITE (6,81) 'MIN IS ', XMIN
      WRITE (*,81) 'MIN IS ', XMIN
      WRITE (6,83) 'MAX IS ', XMAX
      WRITE (*,83) 'MAX IS ', XMAX
   81 FORMAT (/1X, A, F10.5)
   83 FORMAT (/1X, A, F10.5, /)
      RETURN
      END
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
      SUBROUTINE SHIFTX
      INTEGER NX, I, DIMX, NY
      PARAMETER (DIMX = 205)
      DOUBLE PRECISION X(DIMX), XMIN, XMAX, Y(DIMX), RANGE, TWTHRD,
     +                 THIRD
      INTRINSIC MIN, MAX
      COMMON /XDATA/ X, NX, XMIN, XMAX
     +       /YDATA/ Y, NY
      RANGE = XMAX - XMIN
      THIRD = 1.0D0 / 3.0D0
      TWTHRD = 2.0D0 / 3.0D0
      WRITE (6,81) 'SHIFTED DATA'
      DO 100, I = 1, NX
         Y(I) = ((X(I) -XMIN) * (NX - 1) / RANGE + TWTHRD)/(NX + THIRD)
         WRITE (6,83) Y(I)
  100 CONTINUE
   81 FORMAT (/1X, A, /)
   83 FORMAT (1X, F10.5)
      RETURN
      END
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
      SUBROUTINE BOUNDS (XL, XR)
      DOUBLE PRECISION XL, XR
      CHARACTER*1 ANSWER
      WRITE (*,*) ' '
      WRITE (*,*) 'THE ROOT IS SEARCHED FOR BETWEEN ', XL, ' AND ', XR
      WRITE (*,*) 'DO YOU WISH TO CHANGE THESE BOUNDS? (Y OR N)'
   10 CONTINUE
      READ '(A1)', ANSWER
      IF ((ANSWER.EQ.'Y').OR.(ANSWER.EQ.'y')) THEN
         WRITE (*,*) ' '
         WRITE (*,*) 'ENTER LEFT AND RIGHT BOUNDS'
         READ*, XL, XR
      ELSE IF ((ANSWER.NE.'N').AND.(ANSWER.NE.'n')) THEN
         WRITE (*,*) '***ENTRY ERROR*** PLEASE TYPE Y OR N'
         GOTO 10
      ENDIF
      WRITE (6,81) 'SEARCH WILL BE BETWEEN ', XL, XR
   81 FORMAT (/1X, A, F10.5, 2X, F10.5)
      RETURN
      END
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
      FUNCTION DLOGLK(LAMBDA)
      INTEGER DIMX, NY, I
      PARAMETER (DIMX = 205)
      DOUBLE PRECISION Y(DIMX), LAMBDA, SUMA, SUMB, SUMC, ZERO,
     +                 TEMPE, TEMPL, DLOGLK, ONE, TEMP
      COMMON /YDATA/ Y, NY
      INTRINSIC LOG, DBLE
      ZERO = 0.0 D0
      ONE = 1.0 D0
      SUMA = ZERO
      SUMB = ZERO
      SUMC = ZERO
      DO 20, I= 1, NY
         TEMPE = Y(I)**LAMBDA
         TEMPL = LOG(Y(I))
         TEMP = TEMPL*TEMPE
         SUMA = SUMA + TEMPL
         SUMB = SUMB + TEMP / TEMPE
         SUMC = SUMC + TEMP / (ONE - TEMPE)
   20 CONTINUE
      DLOGLK = SUMB - SUMC + SUMA + DBLE(NY)/LAMBDA
      WRITE (*, 81) 'CHECKING: LAMBDA, DLOGLK = ', LAMBDA, DLOGLK
   81 FORMAT (1X, A, F10.5, 2X, F10.5)
      RETURN
      END
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
      FUNCTION LOGLK (LAMBDA)
      INTEGER NY, I, DIMX
      PARAMETER (DIMX = 205)
      DOUBLE PRECISION Y(DIMX), LOGLK, LAMBDA, SUMA, SUMB, SUMC,
     +                 ZERO, TEMPL, TEMPE, ONE, TEMPL1, TEMPL2
      INTRINSIC LOG, DBLE
      COMMON /YDATA/ Y, NY
      ZERO = 0.0 D0
      ONE = 1.0 D0
      SUMA = ZERO
      SUMB = ZERO
      SUMC = ZERO
      DO 10, I = 1, NY
         TEMPE = Y(I) ** LAMBDA
         TEMPL = LOG(Y(I))
         TEMPL1 = LOG(TEMPE)
         TEMPL2 = LOG(ONE - TEMPE)
         SUMA = SUMA + TEMPL
         SUMB = SUMB + TEMPL1
         SUMC = SUMC + TEMPL2
   10 CONTINUE
      LOGLK = DBLE(NY)*LOG(6.0D0) + SUMB + SUMC + (LAMBDA-ONE)*SUMA +
     +        DBLE(NY)*LOG(LAMBDA)
      RETURN
      END
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
      SUBROUTINE RESULT (LAMBDA, ITMAX)
      INTEGER NY, I, ITMAX, DIMX
      PARAMETER (DIMX = 205)
      DOUBLE PRECISION Y(DIMX), LAMBDA, ONE, BCY, LOGLK, LOGLK0, LOGLK1,
     +                 CS, TWO, P
      INTRINSIC LOG, SQRT
      EXTERNAL LOGLK, MDNORD
      COMMON /YDATA/ Y, NY
      ONE = 1.0D0
      TWO = 2.0D0
      WRITE (6, 81) 'TRANSFORMED DATA'
      DO 10, I = 1, NY
         BCY = Y(I)**LAMBDA
         WRITE(6,83) BCY
   10 CONTINUE
      LOGLK1 = LOGLK (LAMBDA)
      WRITE (6, 85) 'MAXIMUM VALUE OF L(LAMBDA) = ',LOGLK1
      WRITE (*, 85) 'MAXIMUM VALUE OF L(LAMBDA) = ',LOGLK1
      WRITE (6, 85) 'AT LAMBDA = ', LAMBDA
      WRITE (*, 85) 'AT LAMBDA = ', LAMBDA
      WRITE (6, 85) 'NUMBER OF ITERATIONS = ', ITMAX
      WRITE (*, 85) 'NUMBER OF ITERATIONS = ', ITMAX
      LOGLK0 = LOGLK (ONE)
      WRITE (6, 85) 'VALUE OF L(ONE) = ',LOGLK0
      WRITE (*, 85) 'VALUE OF L(ONE) = ',LOGLK0
      CS = -TWO * (LOGLK0 - LOGLK1)
      CS = -SQRT (CS)
      CALL MDNORD (CS, P)
      WRITE (6, 85) 'THE P-VALUE FOR THE BOX-COX TRANSFORMATION = ',
     +              TWO * P
      WRITE (*, 85) 'THE P-VALUE FOR THE BOX-COX TRANSFORMATION = ',
     +              TWO * P
   81 FORMAT (/1X, A, /)
   83 FORMAT (1X, F10.5)
   85 FORMAT (/1X, A, F10.5)
      RETURN
      END

APPENDIX E: THE CALL STATEMENTS FROM qpower

   1: PROGRAM MAIN
  10: CALL READVL
  12: CALL SHIFTX
  15: CALL BOUNDS (XL, XR)
  19: CALL ZFALSE (DLOGLK, EPS, NSIG, XL, XR, XAPP, ITMAX, IER)
  20: CALL RESULT (XAPP, ITMAX)
  24: SUBROUTINE READVL
  58: SUBROUTINE SHIFTX
  79: SUBROUTINE BOUNDS (XL, XR)
 152: SUBROUTINE RESULT (LAMBDA, ITMAX)
 179: CALL MDNORD (CS, P)

The following subroutines are from IMSL:
ZFALSE
MDNORD
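
For readers without access to IMSL, the computation in Appendix D can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (shift, loglk, dloglk, find_lambda, p_value) and the list RAW are hypothetical, plain bisection is swapped in for the IMSL routine ZFALSE, and math.erfc stands in for MDNORD. The formulas themselves are transcribed from SHIFTX, LOGLK, DLOGLK and RESULT.

```python
import math

# Raw data as printed in Appendix A (N = 100, two decimals).
RAW = [
    28.67, 63.29, 15.68, 10.27, 3.99, 29.80, 15.27, 33.65, 44.46, 4.89,
    40.29, 32.77, 5.48, 3.04, 9.74, 5.89, 29.64, 38.57, 0.02, 97.50,
    10.61, 32.28, 36.41, 12.35, 6.85, 13.56, 10.03, 8.66, 38.12, 2.10,
    33.85, 11.90, 20.33, 32.78, 0.16, 50.55, 2.01, 1.40, 18.51, 26.19,
    36.19, 54.16, 44.58, 44.23, 9.26, 0.51, 13.89, 23.97, 101.75, 10.11,
    20.63, 4.73, 57.23, 31.14, 7.72, 0.19, 20.83, 15.11, 34.16, 8.39,
    9.64, 24.67, 65.89, 6.03, 34.42, 7.19, 27.49, 63.32, 27.99, 25.83,
    15.26, 17.66, 57.91, 27.90, 32.77, 5.94, 14.46, 7.76, 5.22, 1.05,
    15.53, 25.84, 2.39, 28.73, 6.80, 11.24, 8.22, 1.58, 1.82, 25.63,
    73.62, 22.89, 9.15, 42.09, 10.45, 32.32, 27.81, 48.66, 8.22, 18.35,
]


def shift(x):
    """Map the data strictly inside (0, 1), following SHIFTX."""
    n = len(x)
    lo, hi = min(x), max(x)
    return [((v - lo) * (n - 1) / (hi - lo) + 2.0 / 3.0) / (n + 1.0 / 3.0)
            for v in x]


def loglk(lam, y):
    """Log-likelihood of y when y**lam has density 6u(1-u), as in LOGLK."""
    n = len(y)
    total = n * math.log(6.0) + n * math.log(lam)
    for v in y:
        u = v ** lam
        total += math.log(u) + math.log(1.0 - u) + (lam - 1.0) * math.log(v)
    return total


def dloglk(lam, y):
    """Derivative of loglk with respect to lam, as in DLOGLK."""
    total = len(y) / lam
    for v in y:
        u = v ** lam
        total += math.log(v) * (2.0 - u / (1.0 - u))
    return total


def find_lambda(y, lo=0.1, hi=10.0, tol=1e-8):
    """Root of dloglk on [lo, hi] by bisection (stand-in for ZFALSE)."""
    f_lo = dloglk(lo, y)
    if f_lo * dloglk(hi, y) > 0:
        raise ValueError("DLOGLK does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = dloglk(mid, y)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def p_value(lam, y):
    """Two-sided normal p-value for 2*(L(lam) - L(1)), as in RESULT."""
    cs = 2.0 * (loglk(lam, y) - loglk(1.0, y))
    # 2 * Phi(-sqrt(cs)) equals erfc(sqrt(cs / 2)).
    return math.erfc(math.sqrt(max(cs, 0.0) / 2.0))


if __name__ == "__main__":
    y = shift(RAW)
    lam = find_lambda(y)
    print("LAMBDA =", round(lam, 5))
    print("L(LAMBDA) =", round(loglk(lam, y), 5))
    print("L(ONE) =", round(loglk(1.0, y), 5))
    print("P-VALUE =", p_value(lam, y))
```

Run on the Appendix A data, the sketch should land close to the LAMBDA = 0.43562 reported in Appendix B, provided the two-decimal values printed there match the original data file. Bisection needs more function evaluations than the false-position search in the report, but it only requires the sign change that the CHECKING lines show between 0.1 and 10.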