Friday, July 22, 2016

Having my cake and eating it too?

Several years ago I blogged about some of the challenges around doing science in a field with emerging methodological standards. Today, a person going by the handle "Student" posted a set of pointed questions on that post, which I am choosing to answer here in a new post rather than burying my response in the comments on the previous one. Here are the comments:

Dr. Poldrack has been at the forefront of advocating for increased rigor and reproducibility in neuroimaging and cognitive neuroscience. This paper provides many useful pieces of advice concerning the reporting of fMRI studies, and my comments are related to this paper and to other papers published by Dr. Poldrack. One of the sections in this paper deals specifically with the reporting of methods and associated parameters related to the control of type I error across multiple tests. In this section, Dr. Poldrack and colleagues write that "When cluster-based inference is used, this should be clearly noted and both the threshold used to create the clusters and the threshold for cluster size should be reported". I strongly agree with this sentiment, but find it frustrating that in later papers, Dr. Poldrack seemingly disregards his own advice with regard to the reporting of extent thresholds, opting to report only that data were cluster-corrected at P<0.05 (e.g. http://cercor.oxfordjournals.org/content/20/3/524.long, http://cercor.oxfordjournals.org/cgi/content/abstract/18/8/1923, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2876211/). In another paper (http://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19915091/), the methods report that "Z (Gaussianised T ) statistic images were thresholded using cluster-corrected statistics with a height threshold of Z > 2.3 (unless otherwise noted) and a cluster probability threshold of P < 0.05, whole- brain corrected using the theory of Gaussian random fields", although every figure presented in the paper notes that the statistical maps shown were thresholded at Z>1.96, P<0.05, corrected. This last instance is particularly confusing, and borders on being misleading. While these are arguably minor omissions, I find it odd that I am thus far unable to find a paper where Dr. Poldrack actually follows his own advice here.  
In another opinion paper regarding fMRI analyses and reporting (http://www.ncbi.nlm.nih.gov/pubmed/21856431), Dr. Poldrack states “Some simple methodological improvements could make a big difference. First, the field needs to agree that inference based on uncorrected statistical results is not acceptable (cf. Bennett et al., 2009). Many researchers have digested this important fact, but it is still common to see results presented at thresholds such as uncorrected p<.005. Because such uncorrected thresholds do not adapt to the data (e.g., the number of voxels tests or their spatial smoothness), they are certain to be invalid in almost every situation (potentially being either overly liberal or overly conservative).” This is a good point, but given the fact that Dr. Poldrack has published papers in high impact journals that rely heavily on inferences from data using uncorrected thresholds (e.g. http://www.ncbi.nlm.nih.gov/pubmed/16157284), and does not appear to have issued any statements to the journals regarding their validity, one wonders whether Dr. Poldrack wants to have his cake and eat it too, so to say. A similar point can be made regarding Dr. Poldrack’s attitude regarding the use of small volume correction. In this paper, he states “Second, I have become increasingly concerned about the use of “small volume corrections” to address the multiple testing problem. The use of a priori masks to constrain statistical testing is perfectly legitimate, but one often gets the feeling that the masks used for small volume correction were chosen after seeing the initial results (perhaps after a whole-brain corrected analysis was not significant). In such a case, any inferences based on these corrections are circular and the statistics are useless”. While this is also true, one wonders whether Dr. Poldrack only trusts his group to use this tool correctly, since it is frequently employed in his papers. 
In a third opinion paper (http://www.ncbi.nlm.nih.gov/pubmed/20571517), Dr. Poldrack discusses the problem of circularity in fMRI analyses. While this is also an important topic, Dr. Poldrack’s group has also published papers using circular analyses (e.g. http://www.jneurosci.org/content/27/14/3743.full.pdf, http://www.jneurosci.org/content/26/9/2424, http://www.ncbi.nlm.nih.gov/pubmed/17255512). 
I would like to note that the reason for this comment is not to malign Dr. Poldrack or his research, but rather to attempt to clarify Dr. Poldrack’s opinion of how others should view his previous research when it fails to meet the rigorous standards that he persistently endorses. I am very much in agreement with Dr. Poldrack that rigorous methodology and transparency are important foundations for building a strong science. As a graduate student, it is frustrating to see high-profile scientists such as Dr. Poldrack call for increased methodological rigor by new researchers (typically while, rightfully, labeling work that does not meet methodological standards as being unreliable) when they (1) have benefited (and arguably continue to benefit) from the relatively lower barriers to entry that come from having entered a research field before the emergence of a rigid methodological framework (i.e. in having Neuron/PNAS/Science papers on their CV that would not be published in a low-tier journal today due to their methodological problems) , and (2) not applying the same level of criticism or skepticism to their own previous work as they do to emerging work when it does not meet current standards of rigor or transparency. I would like to know what Dr. Poldrack’s opinions are on these issues. I greatly appreciate any time and/or effort spent reading and/or replying to this comment. 

I appreciate these comments, and in fact I have been struggling with exactly these same issues myself, and my realizations about the shortcomings of our past approaches to fMRI analysis have shaken me deeply. Student is exactly right that I have been a coauthor on papers using methods or reporting standards that I now publicly claim to be inappropriate. S/he is also right that my career has benefited substantially from papers published in high profile journals using these methods that I now claim to be inappropriate. I'm not going to either defend or denounce the specific papers that the commentator mentions. I am in agreement that some of my papers in the past used methods or standards that we would now find problematic, but I am actually heartened by that: If we were still satisfied with the same methods that we had been using 15 years ago, then that would suggest that our science had not progressed very far. Some of those results have been replicated (at least conceptually), which is also heartening, but that's not really a defense.

I also appreciate Student's frustration with the fact that someone like myself can become prominent doing studies that are seemingly lacking according to today's standards, but then criticize the field for doing the same thing. But at the same time I would ask: Is there a better alternative? Would you rather that I defended those older techniques just because they were the basis for my career? Should I lose my position in the field because I followed what we thought were best practices at the time but which turned out to be flawed? Alternatively, should I spend my entire career re-analyzing my old datasets to make sure that my previous claims withstand every new methodological development? My answer to these questions has been to try to use the best methods I can, and to be as open and transparent as possible. Here I'd like to outline a few of the ways in which we have tried to do better.

First, I would note that if someone wishes to look back at the data from our previous studies and reanalyze them, almost all of them are available openly through openfmri.org, and in fact some of them have been the basis for previous analyses of reproducibility. My lab and I have also spent a good deal of time and effort advocating for and supporting data sharing by other labs, because we think that ultimately this is one of the best ways to address questions about reproducibility (as I discussed in the recent piece by Greg Miller in Science).

Second, we have done our best to weed out questionable research practices and p-hacking.  I have become increasingly convinced regarding the utility of pre-registration, and I am now committed to pre-registering every new study that our lab does (starting with our first registration committed this week).  We are also moving towards the standard use of discovery and validation samples for all of our future studies, to ensure that any results we report are replicable. This is challenging due to the cost of fMRI studies, and it means that we will probably do less science, but that's part of the bargain.

Third, we have done our best to share everything. For example, in the MyConnectome study, we shared the entire raw dataset, as well as putting an immense amount of work into sharing a reproducible analysis workflow. Similarly, we now put all of our analysis code online upon publication, if not earlier.

None of this is a guarantee, and I'm almost certain that in 20 years, either a very gray (and probably much more crotchety) version of myself or someone else will come along and tell us why the analyses we were doing in 2016 were wrong in some way that seems completely obvious in hindsight. That's not something that I will get defensive about because it means that we are progressing as a science. But it also doesn't mean that we weren't justified to do what we are doing now, trying to follow the best practices that we know of.