Wednesday, December 9, 2015

Reproducible analysis in the MyConnectome project

Today our paper describing the MyConnectome project was published in Nature Communications.  This paper is unlike any that I have ever worked on before (and probably ever will again), as it reflects analyses of data collected on myself over the course of 18 months from 2012-2014.  A lot has been said already about what the results might or might not mean.  What I want to discuss here is the journey that ultimately led me to develop a reproducible shared analysis platform for the study.

Data collection was completed in April 2014, shortly before I moved to the Bay Area, and much of that summer was spent analyzing the data.  As I got deeper into the analyses, it became clear that we needed a way to efficiently and automatically reproduce the entire set of analyses.  For example, there were a couple of times during the data analysis process when my colleagues at Wash U updated their preprocessing strategy, which meant that I had to rerun all of the statistical analyses that relied upon those preprocessed data. This ultimately led me to develop a python package (https://github.com/poldrack/myconnectome) that implements all of the statistical analyses (which use a mixture of python, R, and **cough** MATLAB) and provides a set of wrapper scripts to run them.  This package made it fairly easy for me to rerun the entire set of statistical analyses on my machine by executing a single script, and provided me with confidence that I could reproduce any of the results that went into the paper.  
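To give a flavor of the wrapper approach (this is a simplified sketch rather than the actual myconnectome code, and the step scripts named here are hypothetical), the single entry point just runs each analysis step in order and stops if any step fails:

    # run_everything.py -- simplified sketch of the wrapper pattern, NOT the
    # actual myconnectome code; the step scripts listed here are hypothetical.
    import subprocess

    # each analysis step is its own script; the real package mixes python,
    # R, and MATLAB, so steps are launched through their own interpreters
    ANALYSIS_STEPS = [
        ["python", "timeseries_analyses.py"],
        ["Rscript", "wgcna_analyses.R"],
        ["python", "make_html_reports.py"],
    ]

    def run_all():
        for cmd in ANALYSIS_STEPS:
            print("running:", " ".join(cmd))
            subprocess.run(cmd, check=True)  # fail loudly if any step breaks

    if __name__ == "__main__":
        run_all()

The value here is less in the specific tooling than in the discipline it enforces: every result that goes into the paper has to be regenerable by running a single script.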

The next question was: Can anyone else (including myself at some later date) reproduce the results?  I had performed the analyses on my Mac laptop using a fairly complex software stack involving many different R and python packages, applied to a similarly complex set of imaging, genomic, metabolomic, and behavioral data.  (The imaging and -omics data had been preprocessed on large clusters at the Texas Advanced Computing Center (TACC) and Washington University; I didn’t attempt to generalize this part of the workflow.)  I started by trying to replicate the analyses on a Linux system; identifying all of the necessary dependencies was an exercise in patience, as the workflow would break at increasingly later points in the process.  Once I had the workflow running, the first analyses showed very different results between the platforms; after the panic subsided (fortunately this happened before the paper was submitted!), I tracked the problem down to differences in the behavior of the R forecast package between Linux and Mac (code to replicate the issue is available here).  It turned out that the auto.arima() function (which is the workhorse of our time series analyses) returned substantially different results on Linux and Mac if the Y variable was not scaled (apparently due to a bug on the Linux side), but very close results when the Y variable was scaled. Fortunately, the latest version of the forecast package (6.2) gives identical results across Linux and Mac regardless of scaling, but the experience showed just how fragile our results can be when we rely upon complex black-box analysis software, and how we shouldn't take cross-platform reproducibility for granted (see here for more on this issue in the context of MRI analysis).
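For those who want to see the shape of that comparison without digging into the replication code, here is a rough sketch. Note that the actual analyses use R's forecast::auto.arima, and the discrepancy itself only shows up when the same R code is run on both platforms; the Python version below (using the pmdarima package, which the project does not use) just illustrates fitting the same series with and without scaling and comparing the automatically selected models.

    # Illustrative only: the project's analyses use R's forecast::auto.arima,
    # and the platform discrepancy appears only when the same R code is run
    # on Linux and Mac. This sketch uses pmdarima as a stand-in to show the
    # raw-versus-scaled comparison in question.
    import numpy as np
    import pmdarima as pm

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=104))        # stand-in weekly time series

    y_scaled = (y - y.mean()) / y.std()        # z-score the outcome variable

    fit_raw = pm.auto_arima(y, seasonal=False)
    fit_scaled = pm.auto_arima(y_scaled, seasonal=False)

    # compare the automatically selected (p, d, q) orders for the two fits
    print("raw:   ", fit_raw.order)
    print("scaled:", fit_scaled.order)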

Having generalized the analyses to a second platform, the next logical step was to generalize them to any machine.  After discussing the options with a number of people in the open science community, the two most popular candidates were provisioning a virtual machine (VM) using Vagrant or creating a Docker container.  I ultimately chose to go with the Vagrant solution, primarily because it was substantially easier; in principle you simply set up a Vagrantfile that describes all of the dependencies and type “vagrant up”.  Of course, this “easy” solution took many hours to actually implement successfully because it required reconstructing all of the dependencies that I had taken for granted on the other systems, but once it was done we had a system that allows anyone to recreate the full set of statistical analyses exactly on their own machine; it is available at https://github.com/poldrack/myconnectome-vm.

A final step was to provide a straightforward way for people to view the complex set of results.  Our visualization guru, Vanessa Sochat, developed a Flask application (https://github.com/vsoch/myconnectome-explore) that provides a front end to all of the HTML reports generated by the various analyses, as well as a results browser that allows one to browse the 38,363 statistical tests that were computed for the project.  This browser is available locally if one installs and runs the VM, and is also publicly accessible at http://results.myconnectome.org.
[Figure: Dashboard for analyses]

[Figure: Browser for time series analysis results]
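To make the idea concrete, here is a toy sketch of the report-browser pattern: a small Flask app that lists and serves the HTML reports generated by the pipeline from a local directory. This is not the actual myconnectome-explore code; the results directory and routes are invented for illustration.

    # Toy sketch of a Flask front end for pipeline-generated HTML reports.
    # NOT the myconnectome-explore code; the results directory and routes
    # here are hypothetical.
    import os
    from flask import Flask, send_from_directory

    RESULTS_DIR = "/path/to/myconnectome/results"   # hypothetical location

    app = Flask(__name__)

    @app.route("/")
    def index():
        # list whatever HTML reports the analysis pipeline has generated
        reports = sorted(f for f in os.listdir(RESULTS_DIR) if f.endswith(".html"))
        items = "".join('<li><a href="/reports/%s">%s</a></li>' % (r, r) for r in reports)
        return "<h1>Analysis reports</h1><ul>%s</ul>" % items

    @app.route("/reports/<path:filename>")
    def report(filename):
        # serve an individual report file from the results directory
        return send_from_directory(RESULTS_DIR, filename)

    if __name__ == "__main__":
        app.run(port=5000)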

We have released code and data with papers in the past, but this is the first paper I have ever published that attempts to include a fully reproducible snapshot of the statistical analyses.  I learned a number of lessons in the process of doing this:
  1. The development of a reproducible workflow saved me from publishing a paper with demonstrably irreproducible results, due to the OS-specific software bug mentioned above.  This in itself makes the entire process worthwhile from my standpoint.
  2. Converting a standard workflow to a fully reproducible workflow is difficult. It took many hours of work beyond the standard analyses to develop a working VM with all of the analyses automatically run; that doesn’t even count the time that went into developing the browser. Had I started the work within a virtual machine from the beginning, it would have been much easier, but it still would have required extra work beyond that needed for the basic analyses.
  3. Ensuring the longevity of a working pipeline is even harder.  The week before the paper was set to be published, I tried a fresh install of the VM to make sure it was still working.  It wasn’t.  The problem was simple (miniconda had changed the name of its installation directory), but it highlighted a significant flaw in our strategy: we had not specified software versions in our VM provisioning.  I hope that we can add that in the future, but for now we have to keep our eyes out for the disruptive effects of software updates; one small step in that direction is sketched below.
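As an example of what that might look like on the Python side (this is a sketch, not something the current provisioning does, and the output file name is arbitrary), one could snapshot the exact versions of the installed packages so that they can later be pinned in the provisioning scripts:

    # Sketch: record the exact versions of installed Python packages so they
    # could be pinned in the VM provisioning. Not part of the current setup;
    # the output file name is arbitrary.
    import pkg_resources

    with open("python_package_versions.txt", "w") as f:
        for dist in sorted(pkg_resources.working_set,
                           key=lambda d: d.project_name.lower()):
            f.write("%s==%s\n" % (dist.project_name, dist.version))

Of course, this only covers the Python packages; the R libraries and system-level dependencies would need the same treatment.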
I look forward to your comments and suggestions about how to better implement reproducible workflows in the future, as this is one of the major interests of our Center for Reproducible Neuroscience.

Sunday, November 1, 2015

Are good science and great storytelling compatible?

Chris Chambers has a piece in the Guardian ("Are we finally getting serious about fixing science?") discussing a recent report about reproducibility from the UK Academy of Medical Sciences, based on a meeting held earlier this year in London. A main theme of the piece is that scientists need to focus more on doing good science and less on "storytelling":
Some time in 1999, as a 22 year-old fresh into an Australian PhD programme, I had my first academic paper rejected. “The results are only moderately interesting”, chided an anonymous reviewer. “The methods are solid but the findings are not very important”, said another. “We can only publish the most novel studies”, declared the editor as he frogmarched me and my boring paper to the door.
I immediately asked my supervisor where I’d gone wrong. Experiment conducted carefully? Tick. No major flaws? Tick. Filled a gap in the specialist literature? Tick. Surely it should be published even if the results were a bit dull? His answer taught me a lesson that is (sadly) important for all life scientists. “You have to build a narrative out of your results”, he said. “You’ve got to give them a story”. It was a bombshell. “But the results are the results!” I shouted over my coffee. “Shouldn’t we just let the data tell their own story?” A patient smile. “That’s just not how science works, Chris.”
He was right, of course, but perhaps it’s the way science should work. 

None of us in the reproducibility community would dispute that the overselling of results in service of high-profile publications is problematic, and I doubt that Chambers really believes that our papers should just be data dumps presented without context or explanation.  But by likening the creation of a compelling narrative about one's results to "selling cheap cars", this piece goes too far.  Great science is not just about generating reproducible results and "letting the data tell their own story"; it should also give us deeper insights into how the world works, and those insights are fundamentally built around and expressed through narratives, because humans are story-telling animals.    We have all had the experience of sitting through a research talk that involved lots of data and no story, and it's a painful experience; this speaks to the importance of solid narrative in our communication of scientific ideas.

Narrative becomes even more important when we think about conveying our science to the public. Non-scientists are not in a position to "let the data speak to them" because most of them don't speak the language of data; instead, they speak the language of human narrative. It is only by abstracting away from the data to come up with narratives such as "memory is not like a videotape recorder" or "self-control relies on the prefrontal cortex" that we can bring science to the public in a way that can actually have impact on behavior and policy.

I think it would be useful to stop conflating scientific storytelling with "embellishing and cherry-picking".   Great storytelling (be it spoken or written) is just as important to the scientific enterprise as great methods, and we shouldn't let our zeal for the latter eclipse the importance of the former.

Wednesday, August 26, 2015

New course on decision making: Seeking feedback

I am currently developing a new course on the psychology of decision making that I will teach at Stanford in the Spring Quarter of 2016. I've looked at the various textbooks on this topic and I'm not particularly happy with any of them, so I am rolling my own syllabus and will use readings from the primary literature.  I have developed a draft syllabus and would love to get feedback: Are there important topics that I am missing?  Different readings that I should consider?  Topics I should consider dropping?  Please leave comments with your suggestions, or email me at poldrack@gmail.com!

Part 1: What is a decision? 

1. Varieties of decision making (overview of course)


Part 2: Normative decision theory: How an optimal system should make decisions

2. The axiomatic approach from economics
- TBD reading on expected utility theory


3. Bayesian decision theory
Körding, K. P. (2007). Decision Theory: What “Should” the Nervous System Do? Science, 318(5850), 606–610. http://doi.org/10.1126/science.1142998

4. Information accumulation
Smith & Ratcliff (2004). Psychology and neurobiology of simple decisions. Trends in Neurosciences.


Part 3: Psychology: How humans make decisions

5. Anomalies: the ascendance of psychology and behavioral economics
Kahneman, D. (2003). A perspective on judgment and choice. American Psychologist, 58, 697-720.

6. Judgment: Anchoring and adjustment
Chapman, G.B. & Johnson, E.J. (2002). Incorporating the irrelevant: Anchors in judgment of belief and value.

7. Heuristics: availability, representativeness
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.

8. Risk and uncertainty: Risk perception, risk attitudes
Slovic, P. (1987). Perception of risk. Science, 236, 280-285

9. Prospect theory 
Kahneman, D. & Tversky, A. (1984). Choices, values, and frames. American Psychologist, 39, 341–350.

10. Framing, endowment effects, and applications of prospect theory
Kahneman, D., Knetsch, J.L., & Thaler, R.H. (1991). The endowment effect, loss aversion, and status quo bias. Journal of Economic Perspectives, 5, 193-206.

11. Varieties of utility
Kahneman, Wakker, & Sarin (1997). Back to Bentham: Explorations of experienced utility.  Quarterly Journal of Economics.

12. Intertemporal choice and self-control
Mischel, W., Shoda, Y., & Rodriguez, M.L. (1989). Delay of gratification in children. Science, 244, pp. 933-938.

13. Emotion and decision making
Rottenstreich, Y. & Hsee, C.K. (2001). Money, kisses and electric shocks: On the affective psychology of risk. Psychological Science, 12, 185-190.

14. Social decision making and game theory
TBD

Part 4: Neuroscience of decision making

15. Neuroscience of simple decisions
Sugrue, Corrado, & Newsome (2005). Choosing the greater of two goods: neural currencies for valuation and decision making. Nature Reviews Neuroscience.

16. Neuroscience of Value-based decision making
Rangel et al. (2008). A framework for studying the neurobiology of value-based decision making. Nature Reviews Neuroscience.

17. Reinforcement learning and dopamine, wanting/liking
Schultz, W., Dayan, P., & Montague, P.R. (1997). A neural substrate of prediction and reward. Science, 275, 1593-1599.

18. Decision making in simple organisms
Reading TBD; possibilities include C. elegans, snails, slime mold, etc.


Part 5: Ethical issues

19. Free will
Roskies (2006) Neuroscientific challenges to free will and responsibility.
OR:
Shadlen & Roskies (2012). The neurobiology of decision-making and responsibility: reconciling mechanism and mindedness.