russpoldrack.org: August 2016

Wednesday, August 24, 2016

Interested in the Poldrack Lab for graduate school?

Updates:

The Poldrack Lab will be accepting new graduate students for 2022.
I have instituted a policy that I will no longer meet one-on-one with potential graduate students prior to the application process to discuss potential admission into my lab, as this has the potential to exacerbate existing disparities in graduate school admissions. I am willing to meet with individuals (particularly those from from underrepresented groups) to discuss the graduate admissions process and other academic issues more generally, as time permits.

This is the time of year when I start getting lots of emails asking whether I am accepting new grad students for next year. The answer is almost always going to be yes (unless I am moving, and I don’t plan on doing that again for a long time!), because I am always on the lookout for new superstars to join the lab. If you are interested, here are some thoughts and tips that I hope will help make you more informed about the process. These are completely my own opinions, and some of them may be totally inaccurate regarding other PIs or graduate programs, so please take them for what they are worth and no more.

Which program should I apply to? I am affiliated with three graduate programs at Stanford: Psychology, Neuroscience, and Biomedical Informatics. In choosing a program, there are several important differences:

Research: While most of these programs are fairly flexible, there are generally some expectations regarding the kind of research you will do, depending on the specific program. For example, if you joining the BMI program then your work is expected to have at least some focus on novel data analysis or informatics methods, whereas if you are joining Psychology your work is expected to make some contact with psychological function. Having said that, most of what we do in our lab could be done by a student in any of these programs.
Coursework: Perhaps the biggest difference between programs is the kind of courses you are required to take. Each program has a set of core requirements. In psychology, you will take a number of core courses in different areas of psychology (cognitive, neuroscience, social, affective, developmental). In the neuroscience program you will take a set of core modules spanning different areas of neuroscience (including one on cognitive neuroscience that Justin Gardner and I teach), whereas in BMI you take core courses around informatics-related topics. In each program you will also take elective courses (often outside the department) that establish complementary core knowledge that is important for your particular research; for example, you can take courses in our world-class statistics department regardless of which program you enroll in. One way to think about this is: What do I want to learn about that is outside of my specific content area? Take a look at the core courses in each program and see which ones interest you the most.
First-year experience: In Psychology, students generally jump straight into a specific lab (or a collaboration between labs), and spend their first year doing a first-year project that they present to their area meeting at the end of the year. In Neuroscience and BMI, students do rotations in multiple labs in their first year, and are expected to pick a lab by the end of their first year.
Admissions: All of these programs are highly selective, but each differs in the nature of its admissions process. At one end of the spectrum is the Psychology admissions process, where initial decisions for who to interview are made by the combined faculty within each area of the department. At the other end is the Neuroscience program, where initial decisions are made by an admissions committee. As a generalization, I would say that the Psychology process is better for candidates whose interests and experience fit very closely with a specific PI or set of PIs, whereas the committee process caters towards candidates who may not have settled on a specific topic or PI.
Career positioning: I think that the specific department that one graduates from matters a lot less than people think it does. For example, I have been in psychology departments that have hired people with PhDs in physics, applied mathematics, and computer science. I think that the work that you do and the skills that you acquire ultimately matter a lot more than the name of the program that is listed on your diploma.

What does it take to get accepted? There are always more qualified applicants than there are spots in our graduate programs, and there is no way to guarantee admission to any particular program. On the flipside, there are also no absolute requirements: A perfect GRE score and a 4.0 GPA are great, but we look at the whole picture, and other factors can sometimes outweigh a weak GRE score or GPA. There are a few factors that are particularly important for admission to my lab:

Research experience: It is very rare for someone to be accepted into any of the programs I am affiliated with at Stanford without significant research experience. Sometimes this can be obtained as an undergraduate, but more often successful applicants to our program have spent at least a year working as a research assistant in an active research laboratory. There are a couple of important reasons for this. First, we want you to understand what you are getting into; many people have rosy ideas of what it’s like to be a scientist, which can fall away pretty quickly in light of the actual experience of doing science. Spending some time in a lab helps you make sure that this is how you want to spend your life. In addition, it provides you with someone who can write a recommendation letter that speaks very directly to your potential as a researcher. Letters are a very important part of the admissions process, and the most effective letters are those that go into specific detail about your abilities, aptitude, and motivation.
Technical skills: The research that we do in my lab is highly technical, requiring knowledge of computing systems, programming, and math/statistics. I would say that decent programming ability is a pretty firm prerequisite for entering my lab; once you enter the lab I want you to be able to jump directly into doing science, and this just can’t happen if you have to spend a year teaching yourself how to program from scratch. More generally, we expect you to be able to pick up new technical topics easily; I don’t expect students to necessarily show up knowing how a reinforcement learning model works, but I expect them to be able to go and figure it out on their own by reading the relevant papers and then implement it on their own. The best way to demonstrate programming ability is to show a specific project that you have worked on. This could be an open source project that you have contributed to, or a project that you did on the side for fun (for example, mine your own social media feed, or program a cognitive task and measure how your own behavior changes from day to day). If you don’t currently know how to program, see my post on learning to program from scratch, and get going!
Risk taking and resilience: If we are doing interesting science then things are going to fail, and we have to learn from those failures and move on. I want to know that you are someone who is willing to go out on a limb to try something risky, and can handle the inevitable failures gracefully. Rather than seeing a statement of purpose that only lists all of your successes, I find it very useful to also know about risks you have taken (be they physical, social, or emotional), challenges you have faced, failures you have experienced, and most importantly what you learned from all of these experiences.

What is your lab working on? The ongoing work in my lab is particularly broad, so if you want to be in a lab that is deeply focused on one specific question then my lab is probably not the right place for you. There are few broad questions that encompass much of the work that we are doing:

How can neuroimaging inform the structure of the mind? My general approach to this question is outlined in my Annual Review chapter with Tal Yarkoni. Our ongoing work on this topic is using large-scale behavioral studies (both in-lab and online) and imaging studies to characterize the underlying structure of the concept of “self-regulation” as it is used across multiple areas of psychology. This work also ties into the Cognitive Atlas project, which aims to formally characterize the ontology of psychological functions and their relation to cognitive tasks. Much of the work in this domain is discovery-based data-driven, in the sense that we aim to discover structure using multivariate analysis techniques rather than testing specific existing theories.
How do brains and behavior change over time? We are examining this at several different timescales. First, we are interested in how experience affects value-based choices, and particularly how the exertion of cognitive control or response inhibition can affect representations of value (Schonberg et al., 2014). Second, we are studying dynamic changes in both resting state and task-related functional connectivity over the seconds/minutes timescale (Shine et al, 2016), in order to relate network-level brain function to cognition. Third, we are mining the MyConnectome data and other large datasets to better understand how brain function changes over the weeks/months timescale (Shine et al, 2016, Poldrack et al., 2015).
How can we make science better? Much of our current effort is centered on developing frameworks for improving the reproducibility and transparency of science. We have developed the OpenfMRI and Neurovault projects to help researchers share data, and our Center for Reproducible Neuroscience is currently developing a next-generation platform for analysis and sharing of neuroimaging data. We have also developed the Experiment Factory infrastructure for performing large-scale online behavioral testing. We are also trying to do our best to make our own science as reproducible as possible; for example, we now pre-register all of our studies, and for discovery studies we try when possible to validate the results using a held-out validation sample.

These aren’t the only topics we study, and we are always looking for new and interesting extensions to our ongoing work, so if you are interested in other topics then it’s worth inquiring to see if they would fit with the lab’s interests. At present, roughly half of the lab is engaged in basic cognitive neuroscience questions, and the other half is engaged in questions related to data analysis/sharing and open science. This can make for some interesting lab meetings, to say the least.

What kind of adviser am I? Different advisers have different philosophies, and it’s important to be sure that you pick an advisor whose style is right for you. I would say that the most important characteristic of my style is that I am to foster independent thinking in my trainees. Publishing papers is important, but not as important as developing one’s ability to conceive novel and interesting questions and ask them in a rigorous way. This means that beyond the first year project, I don’t generally hand my students problems to work on; rather, I expect them to come up with their own questions, and then we work together to devise the right experiments to test them. Another important thing to know is that I try to motivate by example, rather than by command. I rarely breathe down my trainees necks about getting their work done, because I work on the assumption that they will work at least as hard as I work without prodding. On the other hand, I’m fairly hands-on in the sense that I still love to get deep in the weeds of experimental design and analysis code. I would also add that I am highly amenable to joint mentorship with other faculty.

If you have further questions about our lab, please don’t hesitate to contact me by email. As noted above, I have a policy not to meet with potential graduate applicants one-on-one, but I try to do my best to answer specific questions by email about our lab’s current and future research interests.

Sunday, August 21, 2016

The principle of assumed error

I’m going to be talking at the Neurohackweek meeting in a few weeks, giving an overview of issues around reproducibility in neuroimaging research. In putting together my talk, I have been thinking about what general principles I want to convey, and I keep coming back to the quote from Richard Feynman in his 1974 Caltech commencement address: "The first principle is that you must not fool yourself and you are the easiest person to fool.” In thinking about how can we keep from fooling ourselves, I have settled on a general principle, which I am calling the “principle of assumed error” (I doubt this is an original idea, and I would be interested to hear about relevant prior expressions of it). The principle is that whenever one finds something using a computational analysis that fits with one’s predictions or seems like a “cool” finding, they should assume that it’s due to an error in the code rather than reflecting reality. Having made this assumption, one should then do everything they can to find out what kind of error could have resulted in the effect. This is really no different from the strategy that experimental scientists use (in theory), in which upon finding an effect they test every conceivable confound in order to rule them out as a cause of the effect. However, I find that this kind of thinking is much less common in computational analyses. Instead, when something “works” (i.e. gives us an answer we like) we run with it, whereas when the code doesn’t give us a good answer then we dig around for different ways to do the analysis that give a more satisfying answer. Because we will be more likely to accept errors that fit our hypotheses than those that do not due to confirmation bias, this procedure is guaranteed to increase the overall error rate of our research. If this sounds a lot like p-hacking, that’s because it is; as Gelman & Loken pointed out in their Garden of Forking Paths paper, one doesn't have to be on an explicit fishing expedition in order to engage in practices that inflate error due to data-dependent analysis choices and confirmation bias. Ultimately I think that the best solution to this problem is to always reserve a validation dataset to confirm the results of any discovery analyses, but before one burns their only chance at such a validation, it’s important to make sure that the analysis has been thoroughly vetted.

Having made the assumption that there is an error, how does one go about finding it? I think that standard software testing approaches offer a bit of help here, but in general it’s going to be very difficult to find complex algorithmic errors using basic unit tests. Instead, there are a couple of strategies that I have found useful for diagnosing errors.

Parameter recovery

If your model involves estimating parameters from data, it can be very useful to generate data with known values of those parameters and test whether the estimates match the known values. For example, I recently wrote a python implementation of the EZ-diffusion model, which is a simple model for estimating diffusion model parameters from behavioral data. In order to make sure that the model is correctly estimating these parameters, I generated simulated data using parameters randomly sampled from a reasonable range (using the rdiffusion function from the rtdists R package), and then estimated the correlation between the parameters used to generate the data and the model estimates. I set an aribtrary threshold of 0.9 for the correlation between the estimated and actual parameters; since there will be some noise in the data, we can't expect them to match exactly, but this seems close enough to consider successful. I set up a test using pytest, and then added CircleCI automated testing for my Github repo (which automatically runs the software tests any time a new commit is pushed to the repo)¹. This shows how we can take advantage of software testing tools to do parameter recovery tests to make sure that our code is operating properly. I would argue that whenever one implements a new model fitting routine, this is the first thing that should be done.

Imposing the null hypothesis

Another approach is to generate data for which the null hypothesis is true, and make sure that the results come out as expected under the null. This is a good way to protect one from cases where the error results in an overly optimistic result (e.g. as I discussed here previously). One place I have found this particularly useful is in checking to make sure that there is no data peeking when doing classification analysis. In this example (Github repo here), I show how one can use random shuffling of labels to test whether a classification procedure is illegally peeking at test data during classifier training. In the following function, there is an error in which the classifier is trained on all of the data, rather than just the training data in each fold:

def cheating_classifier(X,y):

skf=StratifiedKFold(y,n_folds=4)

pred=numpy.zeros(len(y))

knn=KNeighborsClassifier()

for train,test in skf:

knn.fit(X,y) # this is training on the entire dataset!

pred[test]=knn.predict(X[test,:])

return numpy.mean(pred==y)

Fit to a dataset with a true relation between the features and the outcome variable, this classifier predicts the outcome with about 80% accuracy. In comparison, the correct procedure (separating training and test data):

def crossvalidated_classifier(X,y):

skf=StratifiedKFold(y,n_folds=4)

pred=numpy.zeros(len(y))

knn=KNeighborsClassifier()

for train,test in skf:

knn.fit(X[train,:],y[train])

pred[test]=knn.predict(X[test,:])

return numpy.mean(pred==y)

predicts the outcome with about 68% accuracy. How would we know that the former is incorrect? What we can do is to perform the classification repeatedly, each time shuffling the labels. This is basically making the null hypothesis true, and thus accuracy should be at chance (which in this case is 50% because there are two outcomes with equal frequency). We can assess this using the following:

def shuffle_test(X,y,clf,nperms=10000):

acc=[]

y_shuf=y.copy()

for i in range(nperms):

numpy.random.shuffle(y_shuf)

acc.append(clf(X,y_shuf))

return acc

This shuffles the data 10,000 times and assesses classifier accuracy. When we do this with the crossvalidated classifier, we see that accuracy is now about 51% - close enough to chance that we can feel comfortable that our procedure is not biased. However, when we submit the cheating classifier to this procedure, we see mean accuracy of about 69%; thus, our classifier will exhibit substantial classification accuracy even when there is no true relation between the labels and the features, due to overfitting of noise in the test data.

Randomization is not perfect; in particular, one needs to make sure that the samples are exchangeable under the null hypothesis. This will generally be true when the samples were acquired through random sampling, but can fail when there is structure in the data (e.g. when the samples are individual subjects, but some sets of subjects are related). However, it’s often a very useful strategy when this assumption holds.

I’d love to hear other ideas about how to implement the principle of assumed error for computational analyses. Please leave your comments below!

1 This should have been simple, but I hit some snags that point to just how difficult it can be to build truly reproducible analysis workflows. Running the code on my Mac, I found that my tests passed (i.e. the correlation between the estimated parameters using EZ-diffusion and the actual parameters used to generate the data was > 0.9), confirming that my implementation seemed to be accurate. However, when I ran it on CircleCI (which implements the code within a Ubuntu Linux virtual machine), the tests failed, showing much lower correlations between estimated and actual values. Many things differed between the two systems, but my hunch was that it was due to the R code that was used to generate the simulated data (since the EZ diffusion model code is quite simple). I found that when I updated my Mac to the latest version of the rtdists package used to generate the data, I reproduced the poor results that I had seen on the CircleCI test. (I turns out that the parameterization of the function that was using had changed, leading to bad results with the previous function call.). My interim solution was to simply install the older version of the package as part of my CircleCI setup; having done this, the CircleCI tests now pass as well. ^↩