The statistical vector field (svf) method provides a robust look at combined cross-sectional and longitudinal data. The advantages of the method lie in its simplicity and in the relatively few distributional assumptions that need to be made. The svf plot provides an intuitive graphical summary of unbalanced repeated measures data even when the data may be censored, selected or non-randomly missing with respect to one or more variables. For longitudinal data, the svf plot performs a graphical function similar to that performed by a density contour plot for cross-sectional data alone.
Several problems with svf would need to be addressed before it could be used as a predictive tool. The first problem arises when a longitudinal line segment is shorter than a vector field cell. Currently the method effectively rounds all age lags up to an integral multiple of the vector field cell size as measured on the age axis. This approximation is not optimal and should be replaced with a more accurate estimate. One approach might be to construct a grid of radial basis weighting functions covering the field, with one basis function centered on each rectangular partition and with the sum of the basis function weights uniform across the field.
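The radial basis weighting idea can be sketched as follows. This is only an illustration under stated assumptions, not part of the svf software: the Gaussian form of the basis function, the `width` parameter, and the function names `cell_centers` and `rbf_weights` are all hypothetical. Dividing each raw weight by the total is one simple way to make the weights sum to the same value (here 1) at every point of the field.

```python
import math

def cell_centers(x_min, x_max, nx, y_min, y_max, ny):
    """Return the centers of an nx-by-ny rectangular partition and the cell sizes."""
    dx = (x_max - x_min) / nx
    dy = (y_max - y_min) / ny
    centers = [(x_min + (i + 0.5) * dx, y_min + (j + 0.5) * dy)
               for i in range(nx) for j in range(ny)]
    return centers, dx, dy

def rbf_weights(point, centers, dx, dy, width=1.0):
    """Weight of each cell's basis function at `point`.

    Assumption: Gaussian basis functions scaled by the cell size.
    The raw weights are normalized so that they sum to 1 everywhere.
    """
    px, py = point
    raw = [math.exp(-0.5 * (((px - cx) / (width * dx)) ** 2
                            + ((py - cy) / (width * dy)) ** 2))
           for cx, cy in centers]
    total = sum(raw)
    return [r / total for r in raw]
```

With weights of this kind, a short line segment would contribute to every nearby cell in proportion to its weight instead of being rounded up to a whole cell. The choice of `width` controls how far that contribution spreads and would need to be tuned.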
The second problem is that svf currently preserves no information linking multiple line segments from the same individual. Better estimates could be obtained by using the multiple data points to produce a smoothed interpolation curve for each individual. For individuals with only two measurements this method would reduce to the simple line segment case. Individuals with more than two measurements could be represented by a curve with a flexibility parameter. If the flexibility were set to 0, the curve would reduce to a least squares regression line through the points. As the flexibility increased, the curve would approach the data points until, at infinite flexibility, it would reduce to the multiple line segments presently used by the svf software. One method for producing such a smoothed curve is locally weighted regression estimation [Cleveland 1979].
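The limiting behavior of such a flexibility parameter can be illustrated with a deliberately simple stand-in. The sketch below is not locally weighted regression; it blends the least squares fitted values with the observed values at each measurement time, which is enough to reproduce the two limits described above. The function names and the mapping from flexibility to the blending weight `alpha` are assumptions made for illustration.

```python
import math

def ols_line(times, values):
    """Least squares slope and intercept through one individual's points."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean

def flexible_knots(times, values, flexibility):
    """Curve heights at the measurement times for a given flexibility.

    flexibility = 0        -> points on the least squares line
    flexibility = math.inf -> the observed points (joined, they give
                              the original line segments)
    Assumption: alpha = f / (1 + f) maps [0, inf) onto [0, 1).
    """
    slope, intercept = ols_line(times, values)
    if math.isinf(flexibility):
        alpha = 1.0
    else:
        alpha = flexibility / (1.0 + flexibility)
    return [(1.0 - alpha) * (intercept + slope * t) + alpha * v
            for t, v in zip(times, values)]
```

Joining the returned heights with straight lines gives the individual's curve. With only two measurements the least squares line already passes through both points, so every flexibility value returns the same single segment.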
The third problem is that the svf software does not attempt to preserve coherency between rectangular partitions in the vector field. This is both an advantage and a disadvantage. The advantage is that no assumptions need be made about the continuity (or even the existence) of an underlying growth curve. However, if we are willing to assume the existence of continuously differentiable underlying growth curves, the vector field could be estimated as a growth surface using finite element mesh techniques, and could then be tested for goodness of fit against a second set of data.
Other statistical quantities which could be displayed by the svf algorithm include the median and quartiles. These would show the skew of the distribution within each vector field cell in more detail.
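A per-cell summary of this kind might be computed as in the following sketch. It assumes, for illustration only, that each line segment is given as a pair of (age, measurement) points, that a segment is assigned to the cell containing its midpoint, and that the quantity summarized is the segment slope. The function name `cell_quartiles` and the inclusive quantile method are likewise assumptions, not features of the svf software.

```python
import statistics
from collections import defaultdict

def cell_quartiles(segments, age_width, meas_width):
    """Lower quartile, median and upper quartile of segment slopes per cell.

    segments: iterable of ((age0, meas0), (age1, meas1)) pairs.
    Assumption: a segment belongs to the cell containing its midpoint.
    """
    slopes = defaultdict(list)
    for (a0, m0), (a1, m1) in segments:
        if a1 == a0:
            continue  # zero age lag: slope undefined, skip the segment
        cell = (int(((a0 + a1) / 2) // age_width),
                int(((m0 + m1) / 2) // meas_width))
        slopes[cell].append((m1 - m0) / (a1 - a0))

    summary = {}
    for cell, vals in slopes.items():
        if len(vals) < 2:
            continue  # too few segments in this cell to estimate quartiles
        q1, med, q3 = statistics.quantiles(vals, n=4, method="inclusive")
        summary[cell] = (q1, med, q3)
    return summary
```

Comparing the distances from the median to each quartile within a cell gives a rough indication of skew in that cell.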
These results offer a brief glimpse of some possibilities for further visual methods and visual explorations. By far the most exciting result of the svf technique is the simplicity of the output in Figures 5, 6 and 7. We hope that this exploratory technique can be improved to provide clearer growth hypotheses which can be checked with more rigorous confirmatory techniques. If so, the statistical vector field will have served its intended purpose.