Our department held a great Town Hall Meeting earlier this week, focused on issues around reproducibility, which
Mike Frank has already discussed on his blog. A number of the questions raised by both faculty and graduate students centered on training, and this has gotten many of us thinking about how we should update our quantitative training to address these concerns. Currently the graduate statistics course is fairly standard, covering basic topics in probability and statistics: probability theory, sampling distributions, null hypothesis testing, general(ized) linear models (regression, ANOVA), and mixed models, with exercises done primarily in R. While many of these topics remain essential for psychologists and neuroscientists, it's equally clear that there are a number of other topics, highly relevant to issues of reproducibility, that we might want to cover:
- the statistics of reproducibility (e.g., implications of power for predictive validity; Ioannidis, 2005)
- Bayesian estimation and inference
- bias/variance tradeoffs and regularization
- generalization and cross-validation
- model-fitting and model comparison
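To make the first point concrete, the core result from Ioannidis (2005) can be stated in a few lines of code: the positive predictive value (PPV) of a significant finding depends not just on alpha but on power and on the prior probability that a tested hypothesis is true. The sketch below is mine, not course material, and the alpha and prior values are illustrative assumptions (in Python here, though course exercises would presumably be in R):

```python
# Positive predictive value (PPV) of a "significant" result as a function
# of statistical power, following the logic of Ioannidis (2005).
# alpha = 0.05 and prior = 0.25 are illustrative assumptions.

def ppv(power, alpha=0.05, prior=0.25):
    """Probability that a significant result reflects a true effect."""
    true_positives = power * prior          # P(significant & effect real)
    false_positives = alpha * (1 - prior)   # P(significant & effect null)
    return true_positives / (true_positives + false_positives)

for power in (0.2, 0.5, 0.8):
    print(f"power={power:.1f}: PPV={ppv(power):.2f}")
```

Even at 80% power, under these assumptions a nontrivial fraction of significant results are false positives, and the situation degrades sharply at the low power levels typical of many subfields.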
There are also a number of topics that are clearly related to reproducibility but fall more squarely under the heading of "software hygiene":
- data management
- code validation and testing
- version control
- reproducible workflows (e.g., virtualization/containerization)
- literate programming
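As a small illustration of what "code validation and testing" might look like in practice, here is a minimal sketch of a unit test for an analysis function. Both the function and the test are hypothetical examples of mine (written in Python for illustration, though course exercises would presumably be in R):

```python
# A minimal example of testing analysis code: check that a z-scoring
# function produces output with mean ~0 and SD ~1, as it should.
# The function and test are illustrative, not part of any existing course.

def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((x - mean) ** 2 for x in values) / (n - 1)) ** 0.5
    return [(x - mean) / sd for x in values]

def test_standardize():
    z = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
    mean_z = sum(z) / len(z)
    sd_z = (sum((x - mean_z) ** 2 for x in z) / (len(z) - 1)) ** 0.5
    assert abs(mean_z) < 1e-12        # standardized mean should be ~0
    assert abs(sd_z - 1.0) < 1e-12    # standardized SD should be ~1

test_standardize()
```

The point is less this particular test than the habit: analysis code that is never tested is a common, and largely invisible, source of irreproducible results.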
I would love to hear your thoughts about what a 21st-century graduate statistics course in psychology/neuroscience should cover. Please leave comments below!