Psyc344/506
Exploratory and Graphical
Data Analysis

Professor Steven M. Boker

The process by which psychological knowledge advances involves a cycle of theory development, experimental design and hypothesis testing. But after the hypothesis test either does or doesn't reject a null hypothesis, where does the idea for the next experiment come from?

Exploratory data analysis completes this research cycle by helping to form new theories and revise existing ones. After the planned hypothesis testing for an experiment is finished, exploratory data analysis can look for patterns in the data that the original hypothesis tests may have missed. Successful exploratory analyses help the researcher modify theories and modify or design novel experiments with focussed hypothesis tests.

A second use of exploratory data analysis is in diagnostics for hypothesis tests. There are many reasons why a hypothesis test might fail. There are even times when a hypothesis test will reject the null for an unexpected reason. By becoming familiar with data through exploratory methods, the informed researcher can understand what went wrong (or what went right for the wrong reason).
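As an illustration of a test rejecting the null "for the wrong reason", the sketch below computes a Pearson correlation with and without a single extreme case. It is written in Python with only the standard library rather than in the Splus or R used in the course, and the data and the helper name pearson_r are hypothetical, chosen only to make the point visible.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: eight cases with no linear trend, plus one extreme case.
x = [1, 2, 3, 4, 5, 6, 7, 8, 20]
y = [5, 3, 6, 4, 5, 3, 6, 4, 20]

r_all = pearson_r(x, y)                # all nine cases
r_trimmed = pearson_r(x[:-1], y[:-1])  # extreme case set aside

print("r with the extreme case:   ", round(r_all, 2))
print("r without the extreme case:", round(r_trimmed, 2))
```

With these made-up numbers the correlation is strong when the extreme case is included and vanishes when it is set aside, so a significant test on the full data would reflect one observation rather than a trend across cases. A scatterplot of x against y would reveal the same thing at a glance, which is the kind of diagnostic this course is about.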

The initial part of the course will introduce the rationale and scope of exploratory data analysis. Next, we will examine how perceptual and cognitive illusions can affect our judgement with respect to exploratory and graphical techniques. We will then dive in and try a variety of techniques for the presentation and graphical exploration of univariate, bivariate and multivariate data, and use these graphical techniques in the service of other exploratory methods such as data screening, outlier analysis, residual analysis, transformations, and time series analysis. The remainder of the course will be devoted to an integration of these techniques into projects of interest to the students.
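To give a flavor of data screening and outlier analysis, here is a minimal sketch of one common screening rule, Tukey's fences (flag values more than 1.5 interquartile ranges beyond the quartiles). It is in Python with the standard library only, not the course's Splus or R, and the reaction-time values and the function name tukey_fences are hypothetical.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: quartiles extended by k times the IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical reaction times in milliseconds, with one suspicious value.
reaction_times = [412, 398, 430, 405, 421, 390, 415, 408, 1250, 402]

lower, upper = tukey_fences(reaction_times)
outliers = [v for v in reaction_times if v < lower or v > upper]

print("fences:", lower, upper)
print("flagged values:", outliers)
print("mean:", statistics.mean(reaction_times))
print("median:", statistics.median(reaction_times))
```

A flagged value is a prompt to look more closely, not an instruction to delete it. Comparing the mean with the median shows how much a single case can pull a summary statistic, which is the same information a boxplot conveys graphically.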

Computer work associated with the course will primarily involve the Splus software. Additional assignments may introduce the use of Mathematica for visualization of multivariate data. Students are expected to become familiar enough with Splus to access available routines and perform interactive exploratory analyses. Students will also acquire sufficient skill in writing Splus scripts to perform the data manipulations necessary to use exploratory analysis in practical applications to their own research problems.

R Example Files.

  • Univariate 1 -- Download the R script Univariate1.R and run each of the sections. You will also need the dataset galaxy.dat.

  • Univariate 2 -- Download the R script Univariate2.S and run each of the sections. You will also need the dataset iris.dat.

  • Univariate 3 -- Download the R script Univariate3.S and run each of the sections.

  • Univariate 4 -- Download the R script Univariate4.S and run each of the sections.

  • Transformations 1. Download the R script Transformations1.R for use in today's in-class exercise.

  • Transformations 2. Download the R script Transformations2.R for use in today's in-class exercise.

  • Bivariate 1. Download the R script Bivariate1.R for use in today's in-class exercise.

  • Bivariate 2. Download the R script Bivariate2.R for use in today's in-class exercise.

  • Outliers and CIs 1. Download the R script OutliersCIs1.R for use in today's in-class exercise.

  • Smoothing 1. Download the R script Smoothing1.R for use in today's in-class exercise.

  • Smoothing 2. Download the R script Smoothing2.R for use in today's in-class exercise.

  • Three-D 1. Download the R script ThreeD1.R for use in today's in-class exercise.

  • Three-D 2. Download the R script ThreeD2.R for use in today's in-class exercise.

  • Time Series 1. Download the R script TimeSeries1.R for use in today's in-class exercise.

  • Time Series 2. Download the R script TimeSeries2.R for use in today's in-class exercise.

  • Time Series 3. Download the R script TimeSeries3.R for use in today's in-class exercise.

  • Vector Fields 1. Download the R script VectorFields1.R for use in today's in-class exercise.

  • Vector Fields 2. Download the R script VectorFields2.R for use in today's in-class exercise.

  • Vector Fields 3. Download the R script VectorFields3.R for use in today's in-class exercise.

Handouts.

  • Splus manuals can be downloaded from Insightful Corporation.

  • CRAN, the Comprehensive R Archive Network, has free copies of the R software and extensive documentation.

  • The Trellis Graphics User's Manual and A Tour of Trellis Graphics can be found at the Bell Labs Trellis Graphics site.

Data.

  • The iris sepal and petal dimensions data that R. A. Fisher used as an example are provided in Splus sdd format and R read.table format. These are the data that are shown in the matrix scatterplot at the top of this page.

Steven M. Boker
Department of Psychology
University of Virginia
Gilmer Hall Room 102
Charlottesville, VA 22903
Office: 434-243-7275, FAX: 434-982-4766
e-mail: boker@virginia.edu