Quantitative Analysis II
Anthropology 4841/7841
University of Virginia
Spring 2017
Fraser D. Neiman
Dell 2, Room 102
Tuesday: 3:30-6:00

This is a second course in statistical methods useful in many disciplines, including archaeology, anthropology, and environmental sciences. The course aims to equip students with the skills required to describe and build statistical models of variation in data, link statistical models to the terms of scientific process models, and use the results to evaluate hypotheses about the processes responsible for the variation. The course emphasizes practical data analysis using R. Topics covered include distance matrices and Mantel regression, cluster analysis, principal components analysis, multidimensional scaling, correspondence analysis, linear discriminant analysis, classical linear models for continuous variables with Gaussian errors, generalized linear models for variables with non-Gaussian errors (logistic and Poisson regression), nonparametric regression, and the fundamentals of Bayesian estimation.

Learning how to use statistical models to analyze data is like learning to ride a bicycle. Simply reading about how it's done will not get you very far. Hence this course emphasizes using the statistical methods we cover to analyze real data in the context of real scientific problems.

Prerequisites: Quantitative Analysis I (ANTH 4840/7840) or an introductory statistics course, plus a basic knowledge of R. Students should be familiar with probability distributions (binomial, Gaussian, t, F, and chi-square), confidence intervals, null hypothesis significance tests, regression, ANOVA, and ANCOVA.

There is no required text for the course. All readings will be posted on Collab. However, you may want to purchase the following:

Gelman, Andrew and Jennifer Hill
2007 Data Analysis Using Regression and Hierarchical/Multilevel Models. Cambridge University Press, New York.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani
2013 An Introduction to Statistical Learning: with Applications in R. Springer, New York.

Course Schedule and Reading List
For a weekly breakdown of topics and readings, consult the Class Schedule and Reading List.

Course requirements include weekly problem sets, which I will post on Collab, and an in-class presentation on a research project of your choosing. Graduate students will also complete a final paper on their project.

Problem sets will be assigned every week. Completed problem sets are due at the beginning of the class in which we will discuss the results. You should come to class prepared to contribute your insights and questions to this discussion. Late problem sets will not be accepted.

I encourage you to work with your classmates on the problem sets. However, I expect you to write your own code and your own description and interpretation of the results. Your write-up should include your numerical results, illustrated with appropriate graphics; what you think they mean, in both statistical and substantive terms; and the code you used to produce them.

The Final Project is your opportunity to use the analytical methods we have learned to evaluate theoretically informed hypotheses about the dynamics responsible for patterns in real data. There are two options. One is to attempt to reproduce the results of a published research paper for which the raw data are available. The other is to develop your own research project. In either case, you will need to:
  • Clearly identify the research questions.
  • Describe the theoretical or process model(s) that guide the investigation.
  • Trace how the terms of the models map onto variables you can measure.
  • Explain the statistical models you will use to identify patterns in the data.
  • Assess the agreement of your results with theoretical expectations.
I encourage you to choose issues and data in which you have a personal research interest and, therefore, a basic familiarity with the current background literature. The data should contain information on several different variables that are of potential relevance to your problem and your analysis should feature methods covered in the course.

A one-page prospectus for the Final Project is due on March 14. Presentations should be no longer than 20 minutes. Graduate student papers should not exceed 15 pages (double-spaced), exclusive of graphics.

For undergraduates, course grades are computed as PS*.7 + CD*.1 + P*.2. For graduate students, the formula is PS*.7 + CD*.1 + P*.1 + FP*.1, where PS = mean score for all problem sets, CD = mean score for contributions to class discussion, P = score for the presentation, and FP = score for the final project.
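As a quick check on the grading arithmetic, here is a minimal sketch in Python (the course itself uses R). The function name course_grade and the sample scores are hypothetical. It assumes all components are scored on a common 0-100 scale and that the graduate presentation weight is .1, so that each set of weights sums to 1.

```python
from statistics import mean


def course_grade(ps, cd, p, fp=None):
    """Weighted course grade on the same scale as the component scores.

    Undergraduates (fp omitted): PS*.7 + CD*.1 + P*.2
    Graduates (fp given):        PS*.7 + CD*.1 + P*.1 + FP*.1
    (The graduate presentation weight of .1 is an assumption.)
    """
    if fp is None:
        weights = {"ps": 0.7, "cd": 0.1, "p": 0.2}
        scores = {"ps": ps, "cd": cd, "p": p}
    else:
        weights = {"ps": 0.7, "cd": 0.1, "p": 0.1, "fp": 0.1}
        scores = {"ps": ps, "cd": cd, "p": p, "fp": fp}
    # Each weight set sums to 1, so the grade stays on the 0-100 scale.
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[k] * scores[k] for k in weights)


if __name__ == "__main__":
    # Hypothetical scores: PS is the mean over all problem sets.
    ps = mean([88, 92, 85, 95])
    print(round(course_grade(ps, cd=80, p=90), 1))         # undergraduate
    print(round(course_grade(ps, cd=80, p=90, fp=70), 1))  # graduate
```

Because the problem sets carry 70% of the weight in both formulas, steady work on them matters far more than any single component.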

Office Hours, etc.
My office is in the Monticello Archaeology Lab. It is a bit of a hike, but you are welcome to come up for a chat. Official office hours are Monday mornings from 8:00 to 10:00, or you can email me for an appointment. In addition, I'll be hanging out at the Scholars' Lab in Alderman Library on Fridays from 3:00 to 4:30 to chat about geeky stuff and answer questions.

Required Software: R and RStudio
You will need to install both R and RStudio on your laptop, in that order. Plan on bringing your laptop, with both programs installed, to every class, including the first one. Here are links for the downloads:
If you need help with the installations, try the kind folks at the StatLab, which is also a useful resource more generally.

Helpful R Resources

Some Cool Stats Blogs
The stats blogosphere is a lively place and well worth exploring. Here are some good starting places...