Data that relate growth to age have the obvious property that chronological age is continuous and linearly increasing. This fact can be exploited to approximate the growth curve of any individual with line segments connecting the measured points. Although each segment is only a linear approximation between a pair of successive data points, the approach makes minimal assumptions about the aggregate growth curve.
The first step is to select longitudinal pairs of successive observations from the combined longitudinal and cross-sectional data set. The next step is to group these longitudinal pairs as $(x_1, y_1, x_2, y_2)$ and save each 4-tuple to a separate line in an ASCII data file as shown below.

\[
\begin{array}{cccc}
x_1 & y_1 & x_2 & y_2 \\
x_2 & y_2 & x_3 & y_3 \\
x_3 & y_3 & x_4 & y_4
\end{array}
\]
An individual who has been measured at four time points would generate three line segments to contribute to the analysis as shown in the example above.
A concrete version of the same example shows three 4-tuples of data from two subjects.

    10 30 15 35
    15 35 20 40
    40 70 55 60
The first two lines of this example represent data from one subject who was measured on three different occasions: at age 10 the subject's score was 30, at age 15 the score was 35, and at age 20 the score was 40. These three data points are used to construct the two 4-tuples which comprise the first two lines of the example. The third line of the example comes from a second subject who was measured at age 40 with a score of 70 and again at age 55 with a score of 60.
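The pairing step described above can be sketched in a few lines of Python. This is only an illustration: the function name `successive_pairs` and the representation of a subject as a list of (age, score) tuples are assumptions made for the example, not part of svf.

```python
from typing import List, Tuple

Observation = Tuple[float, float]          # (x, y), e.g. (age, score)
Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def successive_pairs(observations: List[Observation]) -> List[Segment]:
    """Turn one subject's observations into 4-tuples of successive points."""
    ordered = sorted(observations)  # order by x (age)
    return [
        (x1, y1, x2, y2)
        for (x1, y1), (x2, y2) in zip(ordered, ordered[1:])
    ]


# The two subjects from the concrete example in the text.
subject_a = [(10, 30), (15, 35), (20, 40)]
subject_b = [(40, 70), (55, 60)]
segments = successive_pairs(subject_a) + successive_pairs(subject_b)
```

A subject with a single (purely cross-sectional) observation yields an empty list under this sketch, since no line segment can be formed from one point.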
We have written a program called svf which reads this data file, accumulates the data into a vector field, computes summary statistics on the vector field, and outputs the vector field statistics as an svf plot. The logic of the svf program is described in technical detail below. Readers mainly interested in applications may skip to the results section that follows.
The svf program reads an ASCII data file in which each record occupies one text line and is formatted as $x_1\ y_1\ x_2\ y_2$. The values may optionally be separated by commas.
Of course, $x_2$ must be greater than $x_1$ for svf to calculate a positive change in $x$. Since $x$ need not necessarily represent time, one could easily envision applications of the svf plot for which the change in $x$ is negative. If $x_1 = x_2$, however, the slope is infinite and a division by zero would follow. Therefore, cases where $x_1 = x_2$ are trapped and excluded from the svf plot. The $x$ values in the svf data file may be expressed as floating point numbers if a more exact measurement of $x$ has been collected.
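The record format and the exclusion rule can be illustrated with a small reader. This is a sketch under stated assumptions, not the svf source: the name `read_records` is hypothetical, and silently skipping lines that do not have exactly four fields is a choice made for the example.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (x1, y1, x2, y2) records, excluding those with x1 == x2."""
    for line in lines:
        # Commas are optional separators, so treat them as whitespace.
        fields = line.replace(",", " ").split()
        if len(fields) != 4:
            continue  # assumption: skip blank or malformed lines
        x1, y1, x2, y2 = (float(field) for field in fields)
        if x1 == x2:
            continue  # no change in x: slope undefined, so exclude
        yield (x1, y1, x2, y2)


sample_lines = ["10 30 15 35", "15,35,20,40", "40 70 55 60", "12 50 12 55"]
records = list(read_records(sample_lines))
```

With a real file, the same function could be applied to an open file object, since iterating over a text file yields its lines.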
A vector field matrix is created with the number of rows and columns specified on the command line, or with the default of 25 rows and 25 columns if none are specified, so that by default the x and y axes are each partitioned into 25 divisions. This matrix holds the mean angle, sample size, and standard deviation of the angle for each cell. Each cell in the matrix is treated as a "bucket" into which summary statistics are accumulated. Each data pair will belong to one and only one cell in the vector field matrix. There is a tradeoff here: fewer buckets give a better estimate of the data within each bucket, whereas a greater number of buckets gives a better estimate of changes in the field.
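As an illustration of the bucketing, the following sketch maps a coordinate to one of the divisions of an axis. The function name `cell_index`, the explicit axis limits `lo` and `hi`, and the treatment of the upper boundary are all assumptions made for the example; the text does not say how svf determines the axis ranges.

```python
def cell_index(value: float, lo: float, hi: float, divisions: int = 25) -> int:
    """Return the 0-based division of the axis range [lo, hi] containing value."""
    if not lo <= value <= hi:
        raise ValueError("value lies outside the axis range")
    index = int((value - lo) * divisions / (hi - lo))
    # Assumption: a value exactly at hi belongs to the last division.
    return min(index, divisions - 1)


# Locate the point (age 10, score 30) on axes that both run from 0 to 100.
column = cell_index(10.0, lo=0.0, hi=100.0)  # x axis
row = cell_index(30.0, lo=0.0, hi=100.0)     # y axis
```

Because every value maps to exactly one index per axis, each point falls into one and only one cell, as the text requires.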
A pair of data points $(x_1, y_1)$, $(x_2, y_2)$ is read and the slope of the line segment connecting these two points is calculated as follows.

\[
\text{slope} = \frac{y_2 - y_1}{x_2 - x_1}
\]
Using this slope, intermediate data points are calculated for each integer value of $x$ between $x_1$ and $x_2$. Thus we now have $p$ data points as $(x, y)$ coordinates which are intermediate values on a line segment between $(x_1, y_1)$ and $(x_2, y_2)$. These $p$ data points are now accumulated into the vector field matrix.
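The interpolation step might be sketched as follows. The function name `interpolate` is hypothetical, and including the endpoints when they fall on integer values of x is an assumption; the text does not state whether the endpoints are counted among the p points.

```python
import math
from typing import List, Tuple


def interpolate(x1: float, y1: float, x2: float, y2: float) -> List[Tuple[int, float]]:
    """Return (x, y) points at each integer x on the segment from (x1, y1) to (x2, y2)."""
    if x1 == x2:
        raise ValueError("x1 and x2 must differ")
    slope = (y2 - y1) / (x2 - x1)
    lo, hi = sorted((x1, x2))
    # Assumption: endpoints are included when they are integers.
    xs = range(math.ceil(lo), math.floor(hi) + 1)
    return [(x, y1 + slope * (x - x1)) for x in xs]


points = interpolate(10, 30, 15, 35)
```

Under the inclusive-endpoint assumption, a point shared by two successive segments of the same subject would be generated twice, so an implementation may prefer to drop one endpoint.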
In order to perform our summary statistics in a linear metric, we convert the slope to an angle. This conversion has the additional advantage that it can represent directional component for negative as well as positive change in x . The angle, , which represents the slope for each of the p data points is calculated as
But, the distribution of varies proportionally to . Therefore, in order to accumulate statistics over these angles, we weight the angle by value of
This will result in our summary mean angle being a weighted mean of the angles, , within a vector field cell.
This transformation has application to regression estimation of angular expressions of change scores by removing an inherent bias due to the length of measurement lags (see Appendix).
For each of the p data points we find the vector field cell to which it belongs and accumulate the data point into that vector field cell. This accumulation is composed of three steps:
These accumulation operations will allow us to calculate the mean angle and standard deviation of the angle for each cell once all of the data records have been read.
As soon as all of the data records have been read and processed, summary statistics are calculated for each cell in the vector field matrix.
The mean angle of the cell is simply calculated from the sum of the angles
and the standard deviation, , of the angle is calculated using a standard computational formula for the weighted standard deviation
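The accumulation and summary steps can be sketched with running sums, from which a weighted mean and a weighted standard deviation follow directly. Everything named here is an assumption for illustration: the `Cell` class, its fields, and in particular the `weight` argument, which is left as a caller-supplied value because this sketch does not attempt to reproduce the weighting used by svf. The angle in the usage lines is obtained with a four-quadrant arctangent, which is likewise an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    """Running sums for one vector field cell (hypothetical layout)."""
    n: int = 0              # number of data points accumulated
    sum_w: float = 0.0      # sum of weights
    sum_wa: float = 0.0     # sum of weight * angle
    sum_waa: float = 0.0    # sum of weight * angle squared

    def add(self, angle: float, weight: float = 1.0) -> None:
        """Accumulate one data point's angle with the given weight."""
        self.n += 1
        self.sum_w += weight
        self.sum_wa += weight * angle
        self.sum_waa += weight * angle * angle

    def mean(self) -> float:
        """Weighted mean angle of the cell."""
        return self.sum_wa / self.sum_w

    def sd(self) -> float:
        """Weighted standard deviation (population form) of the angle."""
        variance = self.sum_waa / self.sum_w - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))  # guard against rounding below zero


cell = Cell()
cell.add(math.atan2(0.0, 5.0))   # flat segment: angle 0
cell.add(math.atan2(5.0, 0.0))   # vertical direction: angle pi/2
```

Keeping only sums in each cell means the data file can be processed in a single pass, with the mean and standard deviation computed once all records have been read.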
A PostScript file is generated from the vector field matrix to plot a vector field graph. In our example, this graph plots age on the x axis and score on the y axis. The sample size, mean angle, and standard deviation of the angle are plotted for each cell. The mean angle of the cell is represented by an arrow with an identical angle. The sample size of the cell, n, is represented by the length of the arrow. The arrow lengths are normalized so that the cell with the largest n has an arrow length which fully fills a single cell. The 95% confidence interval around the mean angle of the cell is represented by a gray circular error sector as shown in Figure 4.
Figure 4. Components of the statistical vector field plot.
Figure 4 illustrates how the vector field graph represents the summary statistics of the vector field matrix. A small portion of the vector field graph has been enlarged and labeled with the components of the plot. The boundary of a single vector field cell is illustrated here by a dashed box near the center of the figure, but this box is not drawn in the final svf plot. The direction of each arrow is a graphical representation of the mean angle of the data aggregated within that field cell. The length of each arrow shows the number of samples that fall within that field cell. The gray circular error sector around each arrow represents one standard deviation around the mean angle for that field cell. It should be noted that a cell with a small sample size may also have a small standard deviation. One is encouraged not to regard the direction of the very short arrows (the cells with small sample sizes) as particularly informative. As the arrow becomes longer, the direction of the arrow can be taken more seriously.
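The arrow geometry can be sketched as follows. The function name `arrow_vector` and the `cell_size` parameter are hypothetical, and the sketch assumes the mean angle is already expressed in plot units, with no rescaling between the x and y axes.

```python
import math
from typing import Tuple


def arrow_vector(mean_angle: float, n: int, n_max: int,
                 cell_size: float = 1.0) -> Tuple[float, float]:
    """Return the (dx, dy) extent of a cell's arrow.

    The length is proportional to n, scaled so that the cell with the
    largest sample size (n_max) has an arrow that spans one full cell.
    """
    length = cell_size * n / n_max
    return (length * math.cos(mean_angle), length * math.sin(mean_angle))


full = arrow_vector(0.0, n=10, n_max=10)           # largest cell, pointing right
half = arrow_vector(math.pi / 2, n=5, n_max=10)    # half as many samples, pointing up
```

Because length encodes only sample size, a short arrow signals that its direction rests on few observations, which is why the text cautions against reading much into it.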