Friday, December 2, 2016

The NIH should stop penalizing collaborative research

The National Institutes of Health (NIH) just put out its most recent strategic plan for research in behavioral and social sciences, which outlines four directions for behavioral/social research in the future (integrating neuroscience, better measurement, digital interventions, and large-scale data-intensive science).  All of these require collaboration between researchers across multiple domains, and indeed Collins and Riley point out the need for more "transdisciplinary" research in the behavioral and social sciences.  Given the strong trend towards transdisciplinary work over the last couple of decades, one would think that the NIH would do whatever it can to help remove barriers to the kinds of collaborations that are often necessary to make transdisciplinary science work.  Instead, collaborative work across institutions is actively penalized by the way that grants are awarded and administered.  A simple change could make it much easier for researchers at different institutions to collaborate, which is often necessary in order to bring together the best people from different scientific disciplines.

To explain the situation, first let's think about how one would administer a collaborative grant in the ideal world.  Let's say Professor Smith is a biologist at University X studying cancer, and Professor Jones is a computer scientist at University Y who has a new method for statistical analysis of cancer cells.  They decide to write a grant proposal together, and each of them develops a budget to pay for the people or materials necessary to do the research (let's say $150,000/year for Smith and $100,000/year for Jones).  The grant gets a very good priority score from the reviewers, and the agency decides to fund it.  In an ideal world, the agency would then send $150,000 to University X and $100,000 to University Y, and each award would be treated as a separate account from the standpoint of financial administration, even if their scientific progress would be judged as a whole.

At some agencies (for example, for collaborative grants from the National Science Foundation), this is how it works. However, for nearly all regular grants at the NIH, the entire grant gets awarded to the lead institution, and then this institution must dole out the money to the collaborators via subawards.  This might sound like no big deal, but it causes significant problems in two different ways:

The first problem has to do with "indirect costs" (also known as "overhead"), which are the funds that universities receive for hosting the grant; they are meant to pay for all of the administrative and physical overhead related to a research project.  The overhead rates for federal grants are negotiated between each institution and the federal government; for example, at Stanford the negotiated rate is 57%.  This means that if the grant were awarded by NIH to Dr. Smith at a university where the rate was 50%, then NIH would send the entire $250,000 in "direct costs" plus $125,000 in "indirect costs" to University X. In the situation above, University X would then create a subaward to University Y, and send them the $100,000 for Dr. Jones's part of the research.  But what about the indirect costs?  In the best-of-all-worlds model, each institution would take its proportion of the indirect costs directly. In the NIH model, what happens is that the subaward must include both the direct and indirect costs for University Y, both of which must come out of the direct costs given to University X; that is, the subaward amount would be $150,000 ($100,000 in direct costs plus $50,000 in indirect costs).  This penalizes researchers because it means that they will generally get about 1/3 less direct funds for work done on a subaward than for work done directly from the primary grant, since the indirect costs (usually around 50%) for the subrecipient have to come out of the direct costs of the main grant.  If grant funds were unlimited then this wouldn't be a problem, but many grant mechanisms have explicit caps on the amount that can be requested.
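To put rough numbers on this penalty, here is a minimal sketch of the arithmetic, using only the hypothetical figures from the example above (a 50% rate at both institutions and a $100,000 budget for Dr. Jones); it is illustrative, not a real budget calculation:

rate = 0.50                      # assumed indirect-cost rate at both institutions
jones_direct = 100000            # what Dr. Jones actually needs to do the work

# NSF-style model: University Y receives its direct costs, and its indirect costs
# are added on top, outside of University X's budget
nsf_award_to_Y = jones_direct * (1 + rate)       # $150,000, none of it counted against X

# NIH-style model: the subaward (direct + indirect) is a line item in X's direct-cost budget
subaward_line_item = jones_direct * (1 + rate)   # $150,000 of X's "direct" costs
penalty = 1 - jones_direct / subaward_line_item
print("Fraction of the subaward budget line lost to indirects: %.0f%%" % (100 * penalty))
# prints 33%, i.e. about 1/3 less research funding per budgeted dollar under a direct-cost cap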

In addition to the reduced budget that comes from treating subaward indirect costs as direct costs in the main budget, there is also an extra expense due to "double dipping" of indirect costs.  When the primary institution computes its indirect costs, it is allowed to charge indirect costs on the first $25K of the subaward; this means that NIH ends up spending an extra ~$12.5K in indirect costs on each subaward (at a 50% rate).  This is presumably meant to cover the administrative cost of managing the subcontract, but it is yet another cost that arises for collaborative grants under the NIH system.

There is a second way that the NIH model makes collaboration harder, which is the greatly increased administrative burden of subaward management for grants lasting more than a year (as they almost always do).  When an investigator receives an NIH grant directly, the university treats the grant as lasting the entire period; that is, the researcher can spend the money continuously over the grant period.  If they don't spend the entire budget they can automatically carry over the leftover funds to the next year (as long as this amount isn't too much), and the university will also usually allow them to spend a bit of the next year's money before it arrives, since it's guaranteed to show up.  For subawards, the accounting works differently. Every year the primary recipient generates a new subaward, which can't happen until after the primary award for that year has been received and processed.  Then this new subaward has to be processed and given a new account number by the recipient's university. In addition, it is common for the lead school to not allow automatic carry-forward of unspent funds between years, and sometimes they require any unused funds to be relinquished and then re-awarded as part of the new year's funds.  All of these processes take time, which means that the subaward recipient is often left hanging without funding for periods of time, particularly at the end of the yearly grant period.  This is a fairly minor cost compared to the financial penalty described above, but it ends up taking a substantial amount of time away from doing research.

Why can't the NIH adopt a process like the one used for collaborative grants at NSF, in which the money goes directly to each institution separately and indirect costs are split proportionately?  This would be a way in which NIH could really put its money where its mouth is regarding collaborative transdisciplinary research.  

UPDATE: Vince Calhoun pointed out to me that the indirect costs in the subcontract do not actually count against the modular budget cap.  According to the NIH Guide on budget development: "Consortium F&A costs are NOT included as part of the direct cost base when determining whether the application can use the modular format (direct costs < $250,000 per year), or determining whether prior approval is needed to submit an application (direct costs $500,000 or more for any year)...NOTE: This policy does not apply to applications submitted in response to RFAs or in response to other funding opportunity announcements including specific budgetary limits." Thus, while this addresses the specific issue of modular budgets, it doesn't really help with the many funding opportunities that include specific budget caps, which cover nearly all of the grants that my lab applies for.


Thursday, September 1, 2016

Why preregistration no longer makes me nervous

In a recent presidential column in the APS Observer, Susan Goldin-Meadow lays out her concerns about preregistration.  She has two main concerns:

  • The first is the fear that preregistration will stifle discovery. Science isn’t just about testing hypotheses — it’s also about discovering hypotheses grounded in phenomena that are worthy of study. Aren’t we supposed to let the data guide us in our exploration? How can we make new discoveries if our studies need to be catalogued before they are run?
  • The second concern is that preregistration seems like it applies only to certain types of studies — experimental studies done in the lab under controlled conditions. What about observational research, field research, and research with uncommon participants, to name just a few that might not fit neatly into the preregistration script?

She makes the argument that there are two stages of scientific practice, and that pre-registration is only appropriate for one of them:
The first stage is devoted to discovering phenomena, describing them appropriately (i.e., figuring out which aspects of the phenomenon define it and are essential to it), and exploring the robustness and generality of the phenomenon. Only after this step has been taken (and it is not a trivial one) should we move on to exploring causal factors — mechanisms that precede the phenomenon and are involved in bringing it about, and functions that follow the phenomenon and lead to its recurrence….Preregistration is appropriate for Stage 2 hypothesis-testing studies, but it is hard to reconcile with Stage 1 discovery studies.

I must admit that I started out with exactly the same concerns about pre-registration.  I was worried that it would stifle discovery, and lead to turnkey science that would never tell us anything new. However, I no longer believe that.  It’s become clear to me that pre-registration is just as useful at the discovery phase as at the hypothesis-testing phase, because it helps keep us from fooling ourselves.  For discovery studies, we have adopted a strategy of pre-registering whatever details we can; in some cases this might just be the sample size, sampling strategy, and the main outcome of interest.  In these cases we will almost certainly do analyses beyond these, but having pre-registered these details gives us and others more faith in the results from the planned analyses; it also helps us more clearly distinguish between a priori and ad hoc analysis decisions (i.e., we can’t tell ourselves “we would have planned to do that analysis”); if it’s not pre-registered, then it’s treated through the lens of discovery, and thus not really believed until it’s replicated or otherwise validated.  In the future, in our publications we will be very clear about which results arose from pre-registered analyses and which were unplanned discovery analyses; I am hopeful that by helping more clearly distinguish between these two kinds of analyses, the move to pre-registration will make all of our science better.

I would also argue that the phase of "exploring the robustness and generality of the phenomenon”, which Goldin-Meadow assigns to the unregistered discovery phase, is exactly the phase in which pre-registration is most important. Imagine how many hours of graduate student time and gallons of tears could have been saved if this strategy had been used in the initial studies of ego depletion or facial feedback.  In our lab, it is now standard to perform a pre-registered replication before we believe any new behavioral phenomenon; it’s been interesting to see how many of them fall by the wayside.  In some cases we simply can’t do a replication due to the size or nature of the study; in these cases, we register whatever we can up front, and we try to reserve a separate validation dataset for testing of whatever results come from our initial discovery set.  You can see an example of this in our recent online study of self-regulation.

I’m glad that this discussion is going on in the open, because I think a lot of my colleagues in the field share concerns similar to those expressed by Goldin-Meadow.  I hope that the examples of many successful labs now using pre-registration will help convince them that it really is a road to better science.




Wednesday, August 24, 2016

Interested in the Poldrack Lab for graduate school?

Updates
  • The Poldrack Lab will be accepting new graduate students for 2022.
  • I have instituted a policy that I will no longer meet one-on-one with potential graduate students prior to the application process to discuss potential admission into my lab, as this has the potential to exacerbate existing disparities in graduate school admissions.  I am willing to meet with individuals (particularly those from underrepresented groups) to discuss the graduate admissions process and other academic issues more generally, as time permits.

This is the time of year when I start getting lots of emails asking whether I am accepting new grad students for next year.  The answer is almost always going to be yes (unless I am moving, and I don’t plan on doing that again for a long time!), because I am always on the lookout for new superstars to join the lab.  If you are interested, here are some thoughts and tips that I hope will help make you more informed about the process.  These are completely my own opinions, and some of them may be totally inaccurate regarding other PIs or graduate programs, so please take them for what they are worth and no more.

Which program should I apply to? I am affiliated with three graduate programs at Stanford: Psychology, Neuroscience, and Biomedical Informatics. In choosing a program, there are several important differences:

  • Research: While most of these programs are fairly flexible, there are generally some expectations regarding the kind of research you will do, depending on the specific program.  For example, if you are joining the BMI program then your work is expected to have at least some focus on novel data analysis or informatics methods, whereas if you are joining Psychology your work is expected to make some contact with psychological function. Having said that, most of what we do in our lab could be done by a student in any of these programs.
  • Coursework: Perhaps the biggest difference between programs is the kind of courses you are required to take. Each program has a set of core requirements.  In psychology, you will take a number of core courses in different areas of psychology (cognitive, neuroscience, social, affective, developmental).  In the neuroscience program you will take a set of core modules spanning different areas of neuroscience (including one on cognitive neuroscience that Justin Gardner and I teach), whereas in BMI you take core courses around informatics-related topics.  In each program you will also take elective courses (often outside the department) that establish complementary core knowledge that is important for your particular research; for example, you can take courses in our world-class statistics department regardless of which program you enroll in. One way to think about this is:  What do I want to learn about that is outside of my specific content area? Take a look at the core courses in each program and see which ones interest you the most.
  • First-year experience: In Psychology, students generally jump straight into a specific lab (or a collaboration between labs), and spend their first year doing a first-year project that they present to their area meeting at the end of the year. In Neuroscience and BMI, students do rotations in multiple labs in their first year, and are expected to pick a lab by the end of their first year. 
  • Admissions: All of these programs are highly selective, but each differs in the nature of its admissions process.  At one end of the spectrum is the Psychology admissions process, where initial decisions for who to interview are made by the combined faculty within each area of the department.  At the other end is the Neuroscience program, where initial decisions are made by an admissions committee.  As a generalization, I would say that the Psychology process is better for candidates whose interests and experience fit very closely with a specific PI or set of PIs, whereas the committee process caters towards candidates who may not have settled on a specific topic or PI.
  • Career positioning: I think that the specific department that one graduates from matters a lot less than people think it does.  For example, I have been in psychology departments that have hired people with PhDs in physics, applied mathematics, and computer science. I think that the work that you do and the skills that you acquire ultimately matter a lot more than the name of the program that is listed on your diploma.  

What does it take to get accepted? There are always more qualified applicants than there are spots in our graduate programs, and there is no way to guarantee admission to any particular program.  On the flipside, there are also no absolute requirements: A perfect GRE score and a 4.0 GPA are great, but we look at the whole picture, and other factors can sometimes outweigh a weak GRE score or GPA.  There are a few factors that are particularly important for admission to my lab:

  • Research experience: It is very rare for someone to be accepted into any of the programs I am affiliated with at Stanford without significant research experience.  Sometimes this can be obtained as an undergraduate, but more often successful applicants to our program have spent at least a year working as a research assistant in an active research laboratory.  There are a couple of important reasons for this.  First, we want you to understand what you are getting into; many people have rosy ideas of what it’s like to be a scientist, which can fall away pretty quickly in light of the actual experience of doing science.  Spending some time in a lab helps you make sure that this is how you want to spend your life. In addition, it provides you with someone who can write a recommendation letter that speaks very directly to your potential as a researcher.  Letters are a very important part of the admissions process, and the most effective letters are those that go into specific detail about your abilities, aptitude, and motivation.
  • Technical skills: The research that we do in my lab is highly technical, requiring knowledge of computing systems, programming, and math/statistics.  I would say that decent programming ability is a pretty firm prerequisite for entering my lab; once you enter the lab I want you to be able to jump directly into doing science, and this just can’t happen if you have to spend a year teaching yourself how to program from scratch. More generally, we expect you to be able to pick up new technical topics easily; I don’t expect students to necessarily show up knowing how a reinforcement learning model works, but I expect them to be able to go and figure it out on their own by reading the relevant papers and then implement it themselves. The best way to demonstrate programming ability is to show a specific project that you have worked on. This could be an open source project that you have contributed to, or a project that you did on the side for fun (for example, mine your own social media feed, or program a cognitive task and measure how your own behavior changes from day to day). If you don’t currently know how to program, see my post on learning to program from scratch, and get going!
  • Risk taking and resilience: If we are doing interesting science then things are going to fail, and we have to learn from those failures and move on.  I want to know that you are someone who is willing to go out on a limb to try something risky, and can handle the inevitable failures gracefully.  Rather than seeing a statement of purpose that only lists all of your successes, I find it very useful to also know about risks you have taken (be they physical, social, or emotional), challenges you have faced, failures you have experienced, and most importantly what you learned from all of these experiences.
What is your lab working on? The ongoing work in my lab is particularly broad, so if you want to be in a lab that is deeply focused on one specific question then my lab is probably not the right place for you.  There are a few broad questions that encompass much of the work that we are doing:
  • How can neuroimaging inform the structure of the mind?  My general approach to this question is outlined in my Annual Review chapter with Tal Yarkoni.  Our ongoing work on this topic is using large-scale behavioral studies (both in-lab and online) and imaging studies to characterize the underlying structure of the concept of “self-regulation” as it is used across multiple areas of psychology.  This work also ties into the Cognitive Atlas project, which aims to formally characterize the ontology of psychological functions and their relation to cognitive tasks. Much of the work in this domain is discovery-based and data-driven, in the sense that we aim to discover structure using multivariate analysis techniques rather than testing specific existing theories.
  • How do brains and behavior change over time?  We are examining this at several different timescales. First, we are interested in how experience affects value-based choices, and particularly how the exertion of cognitive control or response inhibition can affect representations of value (Schonberg et al., 2014). Second, we are studying dynamic changes in both resting state and task-related functional connectivity over the seconds/minutes timescale (Shine et al, 2016), in order to relate network-level brain function to cognition.  Third, we are mining the MyConnectome data and other large datasets to better understand how brain function changes over the weeks/months timescale (Shine et al, 2016, Poldrack et al., 2015).  
  • How can we make science better?  Much of our current effort is centered on developing frameworks for improving the reproducibility and transparency of science.  We have developed the OpenfMRI and Neurovault projects to help researchers share data, and our Center for Reproducible Neuroscience is currently developing a next-generation platform for analysis and sharing of neuroimaging data.  We have also developed the Experiment Factory infrastructure for performing large-scale online behavioral testing.  We are also trying to do our best to make our own science as reproducible as possible; for example, we now pre-register all of our studies, and for discovery studies we try when possible to validate the results using a held-out validation sample.

These aren’t the only topics we study, and we are always looking for new and interesting extensions to our ongoing work, so if you are interested in other topics then it’s worth inquiring to see if they would fit with the lab’s interests.   At present, roughly half of the lab is engaged in basic cognitive neuroscience questions, and the other half is engaged in questions related to data analysis/sharing and open science.  This can make for some interesting lab meetings, to say the least. 

What kind of adviser am I? Different advisers have different philosophies, and it’s important to be sure that you pick an adviser whose style is right for you.  I would say that the most important characteristic of my style is that I aim to foster independent thinking in my trainees.  Publishing papers is important, but not as important as developing one’s ability to conceive novel and interesting questions and ask them in a rigorous way. This means that beyond the first year project, I don’t generally hand my students problems to work on; rather, I expect them to come up with their own questions, and then we work together to devise the right experiments to test them.  Another important thing to know is that I try to motivate by example, rather than by command.  I rarely breathe down my trainees' necks about getting their work done, because I work on the assumption that they will work at least as hard as I work without prodding.  On the other hand, I’m fairly hands-on in the sense that I still love to get deep in the weeds of experimental design and analysis code.  I would also add that I am highly amenable to joint mentorship with other faculty.

If you have further questions about our lab, please don’t hesitate to contact me by email.  As noted above, I have a policy not to meet with potential graduate applicants one-on-one, but I try to do my best to answer specific questions by email about our lab’s current and future research interests. 

Sunday, August 21, 2016

The principle of assumed error

I’m going to be talking at the Neurohackweek meeting in a few weeks, giving an overview of issues around reproducibility in neuroimaging research.  In putting together my talk, I have been thinking about what general principles I want to convey, and I keep coming back to the quote from Richard Feynman in his 1974 Caltech commencement address: “The first principle is that you must not fool yourself and you are the easiest person to fool.”  In thinking about how we can keep from fooling ourselves, I have settled on a general principle, which I am calling the “principle of assumed error” (I doubt this is an original idea, and I would be interested to hear about relevant prior expressions of it).  The principle is that whenever one finds something using a computational analysis that fits with one’s predictions or seems like a “cool” finding, they should assume that it’s due to an error in the code rather than reflecting reality.  Having made this assumption, one should then do everything they can to find out what kind of error could have resulted in the effect.  This is really no different from the strategy that experimental scientists use (in theory), in which upon finding an effect they test every conceivable confound in order to rule them out as a cause of the effect.  However, I find that this kind of thinking is much less common in computational analyses. Instead, when something “works” (i.e. gives us an answer we like) we run with it, whereas when the code doesn’t give us a good answer then we dig around for different ways to do the analysis that give a more satisfying answer.  Because we will be more likely to accept errors that fit our hypotheses than those that do not due to confirmation bias, this procedure is guaranteed to increase the overall error rate of our research.  If this sounds a lot like p-hacking, that’s because it is; as Gelman & Loken pointed out in their Garden of Forking Paths paper, one doesn't have to be on an explicit fishing expedition in order to engage in practices that inflate error due to data-dependent analysis choices and confirmation bias.  Ultimately I think that the best solution to this problem is to always reserve a validation dataset to confirm the results of any discovery analyses, but before one burns their only chance at such a validation, it’s important to make sure that the analysis has been thoroughly vetted.

Having made the assumption that there is an error, how does one go about finding it?  I think that standard software testing approaches offer a bit of help here, but in general it’s going to be very difficult to find complex algorithmic errors using basic unit tests.  Instead, there are a couple of strategies that I have found useful for diagnosing errors.

Parameter recovery
If your model involves estimating parameters from data, it can be very useful to generate data with known values of those parameters and test whether the estimates match the known values.  For example, I recently wrote a python implementation of the EZ-diffusion model, which is a simple model for estimating diffusion model parameters from behavioral data.  In order to make sure that the model is correctly estimating these parameters, I generated simulated data using parameters randomly sampled from a reasonable range (using the rdiffusion function from the rtdists R package), and then estimated the correlation between the parameters used to generate the data and the model estimates. I set an arbitrary threshold of 0.9 for the correlation between the estimated and actual parameters; since there will be some noise in the data, we can't expect them to match exactly, but this seems close enough to consider successful.  I set up a test using pytest, and then added CircleCI automated testing for my Github repo (which automatically runs the software tests any time a new commit is pushed to the repo)1. This shows how we can take advantage of software testing tools to do parameter recovery tests to make sure that our code is operating properly.  I would argue that whenever one implements a new model fitting routine, this is the first thing that should be done.
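To make the idea concrete, here is a minimal, self-contained parameter-recovery test in the pytest style described above.  It is only a sketch: the simple linear model, the simulation settings, and the 0.9 threshold are illustrative assumptions rather than the actual EZ-diffusion code.

import numpy

def fit_slope_intercept(x, y):
    # least-squares fit; this stands in for whatever model-fitting routine is being tested
    slope, intercept = numpy.polyfit(x, y, 1)
    return slope, intercept

def test_parameter_recovery():
    # generate data from known parameters and check that the fit recovers them
    rng = numpy.random.default_rng(0)
    n_sims, n_obs = 100, 200
    true_params = rng.uniform([-2.0, -1.0], [2.0, 1.0], size=(n_sims, 2))  # (slope, intercept)
    estimates = numpy.zeros_like(true_params)
    for i, (slope, intercept) in enumerate(true_params):
        x = rng.uniform(0, 1, n_obs)
        y = slope * x + intercept + rng.normal(0, 0.5, n_obs)  # known parameters plus noise
        estimates[i] = fit_slope_intercept(x, y)
    # require a high correlation between the generating and recovered values of each parameter
    for j in range(2):
        r = numpy.corrcoef(true_params[:, j], estimates[:, j])[0, 1]
        assert r > 0.9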

Imposing the null hypothesis
Another approach is to generate data for which the null hypothesis is true, and make sure that the results come out as expected under the null.  This is a good way to protect one from cases where the error results in an overly optimistic result (e.g. as I discussed here previously). One place I have found this particularly useful is in checking to make sure that there is no data peeking when doing classification analysis.  In this example (Github repo here), I show how one can use random shuffling of labels to test whether a classification procedure is illegally peeking at test data during classifier training. In the following function, there is an error in which the classifier is trained on all of the data, rather than just the training data in each fold:

import numpy
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

def cheating_classifier(X, y):
    skf = StratifiedKFold(n_splits=4)
    pred = numpy.zeros(len(y))
    knn = KNeighborsClassifier()
    for train, test in skf.split(X, y):
        knn.fit(X, y)  # this is training on the entire dataset!
        pred[test] = knn.predict(X[test, :])
    return numpy.mean(pred == y)

Fit to a dataset with a true relation between the features and the outcome variable, this classifier predicts the outcome with about 80% accuracy.  In comparison, the correct procedure (separating training and test data):

def crossvalidated_classifier(X, y):
    skf = StratifiedKFold(n_splits=4)
    pred = numpy.zeros(len(y))
    knn = KNeighborsClassifier()
    for train, test in skf.split(X, y):
        knn.fit(X[train, :], y[train])  # fit only on the training fold
        pred[test] = knn.predict(X[test, :])
    return numpy.mean(pred == y)

predicts the outcome with about 68% accuracy.  How would we know that the former is incorrect?  What we can do is to perform the classification repeatedly, each time shuffling the labels.  This is basically making the null hypothesis true, and thus accuracy should be at chance (which in this case is 50% because there are two outcomes with equal frequency).  We can assess this using the following:

def shuffle_test(X, y, clf, nperms=10000):
    # repeatedly shuffle the labels (imposing the null hypothesis) and record accuracy
    acc = []
    y_shuf = y.copy()

    for i in range(nperms):
        numpy.random.shuffle(y_shuf)
        acc.append(clf(X, y_shuf))
    return acc

This shuffles the data 10,000 times and assesses classifier accuracy.  When we do this with the crossvalidated classifier, we see that accuracy is now about 51% - close enough to chance that we can feel comfortable that our procedure is not biased.  However, when we submit the cheating classifier to this procedure, we see mean accuracy of about 69%; thus, our classifier will exhibit substantial classification accuracy even when there is no true relation between the labels and the features, due to overfitting of noise in the test data.
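For completeness, here is a hypothetical usage sketch showing how these pieces fit together; the synthetic dataset generated with scikit-learn's make_classification is just a stand-in for real data:

from sklearn.datasets import make_classification

# synthetic two-class data with a few informative features (purely illustrative)
X, y = make_classification(n_samples=100, n_features=20, n_informative=5, random_state=1)

observed = crossvalidated_classifier(X, y)
null_acc = shuffle_test(X, y, crossvalidated_classifier, nperms=1000)
# empirical p-value: how often shuffled-label accuracy meets or exceeds the observed accuracy
p_value = numpy.mean(numpy.array(null_acc) >= observed)
print("observed accuracy: %.2f, permutation p = %.3f" % (observed, p_value))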

Randomization is not perfect; in particular, one needs to make sure that the samples are exchangeable under the null hypothesis.  This will generally be true when the samples were acquired through random sampling, but can fail when there is structure in the data (e.g. when the samples are individual subjects, but some sets of subjects are related). However, it’s often a very useful strategy when this assumption holds.

I’d love to hear other ideas about how to implement the principle of assumed error for computational analyses.  Please leave your comments below!

1 This should have been simple, but I hit some snags that point to just how difficult it can be to build truly reproducible analysis workflows. Running the code on my Mac, I found that my tests passed (i.e. the correlation between the estimated parameters using EZ-diffusion and the actual parameters used to generate the data was > 0.9), confirming that my implementation seemed to be accurate. However, when I ran it on CircleCI (which implements the code within an Ubuntu Linux virtual machine), the tests failed, showing much lower correlations between estimated and actual values. Many things differed between the two systems, but my hunch was that it was due to the R code that was used to generate the simulated data (since the EZ diffusion model code is quite simple). I found that when I updated my Mac to the latest version of the rtdists package used to generate the data, I reproduced the poor results that I had seen on the CircleCI test. (It turns out that the parameterization of the function that I was using had changed, leading to bad results with the previous function call.) My interim solution was to simply install the older version of the package as part of my CircleCI setup; having done this, the CircleCI tests now pass as well.

Friday, July 22, 2016

Having my cake and eating it too?

Several years ago I blogged about some of the challenges around doing science in a field with emerging methodological standards.  Today, a person going by the handle "Student" posted a set of pointed questions to this post, which I am choosing to respond to here as a new post rather than burying them in the comments on the previous post. Here are the comments:

Dr. Poldrack has been at the forefront of advocating for increased rigor and reproducibility in neuroimaging and cognitive neuroscience. This paper provides many useful pieces of advice concerning the reporting of fMRI studies, and my comments are related to this paper and to other papers published by Dr. Poldrack. One of the sections in this paper deals specifically with the reporting of methods and associated parameters related to the control of type I error across multiple tests. In this section, Dr. Poldrack and colleagues write that "When cluster-based inference is used, this should be clearly noted and both the threshold used to create the clusters and the threshold for cluster size should be reported". I strongly agree with this sentiment, but find it frustrating that in later papers, Dr. Poldrack seemingly disregards his own advice with regard to the reporting of extent thresholds, opting to report only that data were cluster-corrected at P<0.05 (e.g. http://cercor.oxfordjournals.org/content/20/3/524.long, http://cercor.oxfordjournals.org/cgi/content/abstract/18/8/1923, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2876211/). In another paper (http://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19915091/), the methods report that "Z (Gaussianised T ) statistic images were thresholded using cluster-corrected statistics with a height threshold of Z > 2.3 (unless otherwise noted) and a cluster probability threshold of P < 0.05, whole- brain corrected using the theory of Gaussian random fields", although every figure presented in the paper notes that the statistical maps shown were thresholded at Z>1.96, P<0.05, corrected. This last instance is particularly confusing, and borders on being misleading. While these are arguably minor omissions, I find it odd that I am thus far unable to find a paper where Dr. Poldrack actually follows his own advice here.  
In another opinion paper regarding fMRI analyses and reporting (http://www.ncbi.nlm.nih.gov/pubmed/21856431), Dr. Poldrack states “Some simple methodological improvements could make a big difference. First, the field needs to agree that inference based on uncorrected statistical results is not acceptable (cf. Bennett et al., 2009). Many researchers have digested this important fact, but it is still common to see results presented at thresholds such as uncorrected p<.005. Because such uncorrected thresholds do not adapt to the data (e.g., the number of voxels tests or their spatial smoothness), they are certain to be invalid in almost every situation (potentially being either overly liberal or overly conservative).” This is a good point, but given the fact that Dr. Poldrack has published papers in high impact journals that rely heavily on inferences from data using uncorrected thresholds (e.g. http://www.ncbi.nlm.nih.gov/pubmed/16157284), and does not appear to have issued any statements to the journals regarding their validity, one wonders whether Dr. Poldrack wants to have his cake and eat it too, so to say. A similar point can be made regarding Dr. Poldrack’s attitude regarding the use of small volume correction. In this paper, he states “Second, I have become increasingly concerned about the use of “small volume corrections” to address the multiple testing problem. The use of a priori masks to constrain statistical testing is perfectly legitimate, but one often gets the feeling that the masks used for small volume correction were chosen after seeing the initial results (perhaps after a whole-brain corrected analysis was not significant). In such a case, any inferences based on these corrections are circular and the statistics are useless”. While this is also true, one wonders whether Dr. Poldrack only trusts his group to use this tool correctly, since it is frequently employed in his papers. 
In a third opinion paper (http://www.ncbi.nlm.nih.gov/pubmed/20571517), Dr. Poldrack discusses the problem of circularity in fMRI analyses. While this is also an important topic, Dr. Poldrack’s group has also published papers using circular analyses (e.g. http://www.jneurosci.org/content/27/14/3743.full.pdf, http://www.jneurosci.org/content/26/9/2424, http://www.ncbi.nlm.nih.gov/pubmed/17255512). 
I would like to note that the reason for this comment is not to malign Dr. Poldrack or his research, but rather to attempt to clarify Dr. Poldrack’s opinion of how others should view his previous research when it fails to meet the rigorous standards that he persistently endorses. I am very much in agreement with Dr. Poldrack that rigorous methodology and transparency are important foundations for building a strong science. As a graduate student, it is frustrating to see high-profile scientists such as Dr. Poldrack call for increased methodological rigor by new researchers (typically while, rightfully, labeling work that does not meet methodological standards as being unreliable) when they (1) have benefited (and arguably continue to benefit) from the relatively lower barriers to entry that come from having entered a research field before the emergence of a rigid methodological framework (i.e. in having Neuron/PNAS/Science papers on their CV that would not be published in a low-tier journal today due to their methodological problems) , and (2) not applying the same level of criticism or skepticism to their own previous work as they do to emerging work when it does not meet current standards of rigor or transparency. I would like to know what Dr. Poldrack’s opinions are on these issues. I greatly appreciate any time and/or effort spent reading and/or replying to this comment. 

I appreciate these comments, and in fact I have been struggling with exactly these same issues myself, and my realizations about the shortcomings of our past approaches to fMRI analysis have shaken me deeply. Student is exactly right that I have been a coauthor on papers using methods or reporting standards that I now publicly claim to be inappropriate. S/he is also right that my career has benefited substantially from papers published in high-profile journals using methods that I now claim to be inappropriate.  I'm not going to either defend or denounce the specific papers that the commentator mentions.  I am in agreement that some of my papers in the past used methods or standards that we would now find problematic, but I am actually heartened by that: If we were still satisfied with the same methods that we had been using 15 years ago, then that would suggest that our science had not progressed very far.  Some of those results have been replicated (at least conceptually), which is also heartening, but that's not really a defense.

I also appreciate Student's frustration with the fact that someone like myself can become prominent doing studies that are seemingly lacking according to today's standards, but then criticize the field for doing the same thing.  But at the same time I would ask: Is there a better alternative?  Would you rather that I defended those older techniques just because they were the basis for my career?  Should I lose my position in the field because I followed what we thought were best practices at the time but which turned out to be flawed? Alternatively, should I spend my entire career re-analyzing my old datasets to make sure that my previous claims withstand every new methodological development?  My answer to these questions has been to try to use the best methods I can, and to be as open and transparent as possible.  Here I'd like to outline a few of the ways in which we have tried to do better.

First, I would note that if someone wishes to look back at the data from our previous studies and reanalyze them, almost all of them are available openly through openfmri.org, and in fact some of them have been the basis for previous analyses of reproducibility.  My lab and I have also spent a good deal of time and effort advocating for and supporting data sharing by other labs, because we think that ultimately this is one of the best ways to address questions about reproducibility (as I discussed in the recent piece by Greg Miller in Science).

Second, we have done our best to weed out questionable research practices and p-hacking.  I have become increasingly convinced regarding the utility of pre-registration, and I am now committed to pre-registering every new study that our lab does (starting with our first registration committed this week).  We are also moving towards the standard use of discovery and validation samples for all of our future studies, to ensure that any results we report are replicable. This is challenging due to the cost of fMRI studies, and it means that we will probably do less science, but that's part of the bargain.

Third, we have done our best to share everything.  For example, in the MyConnectome study, we shared the entire raw dataset, as well as putting an immense amount of work into sharing a reproducible analysis workflow.  Similarly, we now put all of our analysis code online upon publication, if not earlier.

None of this is a guarantee, and I'm almost certain that in 20 years, either a very gray (and probably much more crotchety) version of myself or someone else will come along and tell us why the analyses we were doing in 2016 were wrong in some way that seems completely obvious in hindsight.  That's not something that I will get defensive about, because it means that we are progressing as a science.  But it also doesn't mean that we weren't justified to do what we are doing now, trying to follow the best practices that we know how.





Saturday, May 21, 2016

Scam journals will literally publish crap

In the last couple of years, researchers have started to experience an onslaught of invitations to attend scam conferences and submit papers to scam journals.  Many of these seem to emanate from the OMICS group of Henderson, NV and its various subsidiaries.  A couple of months ago I decided to start trolling these scammers, just to see if I could get a reaction.  After sending many of these, I finally got a response yesterday, which speaks to the complete lack of quality of these journals.  

This was the solicitation:
On May 20, 2016, at 12:55 AM, Abnormal and Behavioural Psychology <behaviouralpsychol@omicsinc.com> wrote: 
Dear Dr. Russell A. Poldrack,Greetings from the Journal of Abnormal and Behavioural Psychology
Journal of Abnormal and Behavioural Psychology is successfully publishing quality articles with the support of eminent scientists like you.
We have chosen selective scientists who have contributed excellent work, Thus I kindly request you to contribute a (Research, Review, Mini Review, Short commentary) or any type of article.
The Journal is indexed in with EBSCO (A-Z), Google Scholar, SHERPA-Romeo, Open J-gate, Journal Seek, Electronic Journals Library, Academic Keys, Safety Lit and many more reputed indexing databases.
 
We publish your manuscript within seven days of Acceptance. For your Excellent Research work we are offering huge discount in the publishing fee (70%). So, we will charge you only 300 USD. This huge offer we are giving in this month only. 
...
With kind regards
Sincerely,
Joyce V. Andria

I had previously received exactly this same solicitation about a month ago, to which I had responded like this:
Dear Ms Andria, 
Thanks for your message.  I just spent three minutes reading and thinking about your email.  My rate for commercial consulting is $500/hour.  Can you please remit your payment of $25 to me at the address below?  I’m sure you can understand that the messages from your organization take valuable time away from scientists, and that you would agree that it’s only fair to remunerate us for this time.
I look forward to receiving your payment promptly.  If you do not remit within 30 days I will be forced to send this invoice out for collection.
Sincerely,
Russ Poldrack
I got no response to that message.  So when I received the new message, I decided to step up my troll-fu:
Dear Ms. Andria,
Many thanks for your message soliciting a (Research, Review, Mini Review, Short commentary) or any type of article for your journal. I have a paper that I would like to submit but I am not sure what kind of article it qualifies as. The title is "Tracking the gut microbiome". The paper does not include any text; it is composed entirely of photos of my bowel movements taken every morning for one year. Please let me know if your journal has the capability to publish such a paper; I have found that many other journals are not interested.
Sincerely,
Russell Poldrack
Within 12 hours, I had a response:
From: Abnormal and Behavioural Psychology <behaviouralpsychol@omicsinc.com>
Subject: RE: Appreciated your Excellent Research work
Date: May 20, 2016 at 9:47:28 PM PDT
To: "'Russell Alan Poldrack'" <russpold@stanford.edu>
Dear Dr. Russell A. Poldrack,

Greetings from the Journal of Abnormal and Behavioural Psychology

Thank you for your reply.

I hereby inform you that your article entitled: “Tracking the gut microbiome” is an image type article.

We are happy to know that you want to publish your manuscript with us.

We are waiting for your  earliest submission.

We want to introduce your research work in this month to our Journal. We will be honored to be a part of your scientific journey.

Kindly submit your article on before 26th may, 2016.


Awaiting your response.,

With kind regards
Sincerely,
Anna Watson
Journal Coordinator
Journal of Advances in Automobile Engineering
There you have it: These journals will literally publish complete crap. I hope the rest of you will join me in trolling these parasites - post your trolls and any results in the comments.

Friday, May 20, 2016

Advice for learning to code from scratch

I met this week with a psychology student who was interested in learning to code but had absolutely no experience.  I personally think it’s a travesty that programming is not part of the basic psychology curriculum, because doing novel and interesting research in psychology increasingly requires the ability to collect and work with large datasets and build new analysis tools, which are almost impossible without solid coding skills.  

Because it’s been a while since I learned to code (back when programs were stored on cassette tapes), I decided to ask my friends on the interwebs for some suggestions.  I got some really great feedback, which I thought I would synthesize for others who might be in the same boat.  

Some of the big questions that one should probably answer before getting started are:

  1. Why do you want to learn to code?  For most people who land in my office, it’s because they want to be able to analyze and wrangle data, run simulations, implement computational models, or create experiments to collect data.  
  2. How do you learn best?  I can’t stand watching videos, but some people swear by them.  Some people like to just jump in and start doing, whereas others like to learn the concepts and theories first.  Different strokes...
  3. What language should you start with?  This is the stuff of religious wars.  What’s important to realize, though, is that learning to program is not the same as learning to use a specific language.  Programming is about how to think algorithmically to solve problems; the specific language is just an expression of that thinking.  That said, languages differ in lots of ways, and some are more useful than others for particular purposes.  My feeling is that one should start by learning a first-class language, because it will be easier to learn good practices that are more general.  Your choice of a general purpose language should probably be driven by the field you are in; neuroscientists are increasingly turning to Python, whereas in genomics it seems that Java is very popular.  I personally think that Python offers a nice mix of power and usability, and it’s the language that I encourage everyone to start with.  However, if all you care about is performing statistical analyses, then learning R might be your first choice, whereas if you just want to build experiments for mTurk, then Javascript might be the answer.  There may be some problem for which MATLAB is the right answer, but I’m no longer sure what it is. A caveat to all of this is that if you have friends or colleagues who are programming, then you should strongly consider using whatever language they are using, because they will be your best source of help.
  4. What problem do you want to solve?  Some people can learn for the sake of learning, but I find that I need a problem in order to keep me motivated.  I would recommend thinking of a relevant problem that you want to solve and then targeting your learning towards that problem.  One good general strategy is to find a paper in your area of research interest, and try to implement their analysis. Another (suggested by Christina van Heer) is to take some data output from an experiment (e.g. in an Excel file), read it in, and compute some basic statistics.  If you don't have your own data, another alternative is to take a large open dataset (such as health data from NHANES or an openfmri dataset from openfmri.org) and try to wrangle the data into a format that lets you ask an interesting question.  A short sketch of what that kind of first analysis might look like appears just after this list.
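To give a flavor of what that first analysis might look like in Python, here is a minimal sketch using pandas; the file name and column names are made up for illustration:

import pandas as pd

# read trial-by-trial data exported from an experiment (hypothetical file and columns)
data = pd.read_excel('my_experiment_results.xlsx')

# basic descriptive statistics, overall and broken down by condition
print(data['response_time'].describe())
print(data.groupby('condition')['accuracy'].mean())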
OK then, so where do you look for help in getting started?

The overwhelming favorite in my social media poll was Codecademy.  It offers interactive exercises in lots of different languages, including Python.  Another Pythonic suggestion was http://learnpythonthehardway.org/book/ which looks quite good.

For those of you who prefer video courses, there were also a number of votes for online courses, including those from Coursera and FutureLearn; if you like learning from videos, these would be a good option.  There were also a number of other suggestions, including several sites with various potentially useful tips.




Finally, it’s also worth keeping an eye out for local Software Carpentry workshops.

If you have additional suggestions, please leave them in the comments!

Monday, April 18, 2016

How folksy is psychology? The linguistic history of cognitive ontologies

I just returned from a fabulous meeting on Rethinking the Taxonomy of Psychology, hosted by Mike Anderson, Tim Bayne, and Jackie Sullivan.  I think that in another life I must have been a philosopher, because I always have so much fun hanging out with them, and this time was no different.  In particular, the discussions at this meeting moved from simply talking about whether there is a problem with our ontology (which is old hat at this point) to specifically how we can think about using neuroscience to revise the ontology.  I was particularly excited to see all of the interest from a group of young philosophers whose work spans philosophy and cognitive neuroscience, whom I am counting on to keep the field moving forward!

I have long made the point that the conceptual structure of current psychology is not radically different from that of William James in the 19th century.  This seems plausible on its face if you look at some of the section headings from his 1890 Principles of Psychology:
  • “To How Many Things Can We Attend At Once?”
  • “The Varieties Of Attention.”
  • “The Improvement Of Discrimination By Practice”
  • “The Perception Of Time.”
  • “Accuracy Of Our Estimate Of Short Durations”
  • “To What Cerebral Process Is The Sense Of Time Due?”
  • “Forgetting.”
  • “The Neural Process Which Underlies Imagination”
  • “Is Perception Unconscious Inference?”
  • “How The Blind Perceive Space.”
  • “Emotion Follows Upon The Bodily Expression In The Coarser Emotions At Least.”
  • “No Special Brain-Centres For Emotion”
  • “Action After Deliberation”:
Beyond the sometimes flowery language, these are all topics that one could imagine being the subjects of research papers today, but for my talk I wanted to see if there was more direct evidence that the psychological ontology has changed less from its pre-scientific roots (and is thus more "folksy") than the ontologies of other sciences.  To address this, I did a set of analyses that looked at the linguistic history of terms in the contemporary psychological ontology (as defined in the Cognitive Atlas) as compared to terms from contemporary biology (as enshrined in the Gene Ontology).  I started (with a bit of help from Vanessa Sochat) by examining the proportion of terms from the Cognitive Atlas that were present in James' Principles (from the full text available here).  This showed that 22.9% of the terms in our current ontology were present in James's text (some examples are: goal, deductive reasoning, effort, false memory, object perception, visual attention, task set, anxiety, mental imagery, unconscious perception, internal speech, primary memory, theory of mind, judgment).
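The analysis itself is simple in spirit; here is a minimal sketch of the kind of term matching involved (this is not the actual analysis code, and the file names and crude substring matching are purely illustrative):

# count how many ontology terms appear in the full text of a historical textbook
with open('james_principles_fulltext.txt') as f:
    text = f.read().lower()

with open('cognitive_atlas_terms.txt') as f:
    terms = [line.strip().lower() for line in f if line.strip()]

present = [term for term in terms if term in text]
print("%d of %d terms present (%.1f%%)" % (len(present), len(terms), 100.0 * len(present) / len(terms)))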

How does this compare to biology?  To ask this, I obtained two biology textbooks published around the same time as James' Principles (T. H. Huxley's Course of Elementary Instruction in Practical Biology from 1892, and T. J. Parker's Lessons in Elementary Biology from 1893), which are both available in full text from Google Books.  In each of these books I assessed the presence of each term from the Gene Ontology, separately for each of the GO subdomains (biological processes, molecular functions, and cellular components).  Here are the results:

                               Huxley        Parker        Overlap
biological process (28,566)    0.09% (26)    0.1% (32)     20
molecular functions (10,057)   0             0             -
cellular components (3,903)    1.05% (41)    1.01% (40)    25

The percentages of overlap are much lower, perhaps not surprisingly since the number of GO terms is so much larger than the number of Cognitive Atlas terms.  But even the absolute numbers are substantially lower, and there is not one mention of any of the GO molecular functions (striking but completely unsurprising, since molecular biology would not be developed for many more decades).

These results were interesting, but it could be that they are specific to these particular books, so I generalized the analysis using the Google N-Gram corpus, which indexes the presence of individual words and phrases across more than 3 million books.  Using a python package that accesses the ngram viewer API, I estimated the presence of all of the Cognitive Atlas terms as well as randomly selected subsets of each of the GO subdomains in the English literature between 1800 and 2000; I'm planning to rerun the analysis on the full corpus using the downloaded version of the N-grams corpus, but using this API required throttling that prevented me from running the full sets of GO terms.  Here are the results for the Cognitive Atlas:

It is difficult to imagine stronger evidence that the ontology of psychology is relying on pre-scientific concepts; around 80% of the one-word terms in the ontology were already in use in 1800! Compare this to the Gene Ontology terms (note that there were not enough single-word molecular function terms to get a reasonable estimate):




It's clear that while a few of the terms in these ontologies were in use prior to the development of the biosciences, the proportion is much smaller than what one sees for psychology. In my talk, I laid out two possibilities arising from this:

  1. Psychology has special access to its ontology that obviates the need for a rejection of folk concepts
  2. Psychology is due for a conceptual revolution that will leave behind at least some of our current concepts
My guess is that the truth lies somewhere in between these.  The discussions that we had at the meeting in London provided some good ideas about how to conceptualize the kinds of changes that neuroscience might drive us to make to this ontology. Perhaps the biggest question to come out of the meeting was whether a data-driven approach can ever overcome the fact that the data were collected from experiments that are based on the current ontology. I am guessing that it can (given, e.g. the close relations between brain activity present in task and rest), but this remains one of the biggest questions to be answered.  Fortunately there seems to be lots of interest and I'm looking forward to great progress on these questions in the next few years.

Friday, February 26, 2016

Reproducibility and quantitative training in psychology

We had a great Town Hall Meeting of our department earlier this week, focused on issues around reproducibility, which Mike Frank has already discussed in his blog.  A number of the questions that were raised by both faculty and graduate students centered around training, and this has gotten many of us thinking about how we should update our quantitative training to address these concerns.  Currently the graduate statistics course is fairly standard, covering basic topics in probability and statistics including basic probability theory, sampling distributions, null hypothesis testing, general(ized) linear models (regression, ANOVA), and mixed models, with exercises done primarily using R.  While many of these topics remain essential for psychologists and neuroscientists, it's equally clear that there are a number of other topics that we might want to cover that are highly relevant to issues of reproducibility:

  • the statistics of reproducibility (e.g., implications of power for predictive validity; Ioannidis, 2005)
  • Bayesian estimation and inference
  • bias/variance tradeoffs and regularization
  • generalization and cross-validation
  • model-fitting and model comparison
There are also a number of topics that are clearly related to reproducibility but fall more squarely under the topic of "software hygiene":
  • data management
  • code validation and testing
  • version control
  • reproducible workflows (e.g., virtualization/containerization)
  • literate programming
I would love to hear your thoughts about what a 21st century graduate statistics course in psychology/neuroscience should cover; please leave comments below!