Friday, May 20, 2016

Advice for learning to code from scratch

I met this week with a psychology student who was interested in learning to code but had absolutely no experience.  I personally think it’s a travesty that programming is not part of the basic psychology curriculum, because doing novel and interesting research in psychology increasingly requires the ability to collect and work with large datasets and build new analysis tools, which are almost impossible without solid coding skills.  

Because it’s been a while since I learned to code (back when programs were stored on cassette tapes), I decided to ask my friends on the interwebs for some suggestions.  I got some really great feedback, which I thought I would synthesize for others who might be in the same boat.  

Some of the big questions that one should probably answer before getting started are:

  1. Why do you want to learn to code?  For most people who land in my office, it’s because they want to be able to analyze and wrangle data, run simulations, implement computational models, or create experiments to collect data.  
  2. How do you learn best?  I can’t stand watching videos, but some people swear by them.  Some people like to just jump in and start doing, whereas others like to learn the concepts and theories first.  Different strokes...
  3. What language should you start with?  This is the stuff of religious wars.  What’s important to realize, though, is that learning to program is not the same as learning to use a specific language.  Programming is about how to think algorithmically to solve problems; the specific language is just an expression of that thinking.  That said, languages differ in lots of ways, and some are more useful than others for particular purposes.  My feeling is that one should start by learning a first-class, general-purpose language, because it will be easier to learn good practices that are more general.  Your choice of a general-purpose language should probably be driven by the field you are in; neuroscientists are increasingly turning to Python, whereas in genomics it seems that Java is very popular.  I personally think that Python offers a nice mix of power and usability, and it’s the language that I encourage everyone to start with.  However, if all you care about doing is performing statistical analyses, then learning R might be your first choice, whereas if you just want to build experiments for mTurk, then JavaScript might be the answer.  There may be some problem for which MATLAB is the right answer, but I’m no longer sure what it is.  A caveat to all of this is that if you have friends or colleagues who are programming, then you should strongly consider using whatever language they are using, because they will be your best source of help.
  4. What problem do you want to solve?  Some people can learn for the sake of learning, but I find that I need a problem in order to keep me motivated.  I would recommend thinking of a relevant problem that you want to solve and then targeting your learning towards that problem.  One good general strategy is to find a paper in your area of research interest and try to implement its analysis.  Another (suggested by Christina van Heer) is to take some data output from an experiment (e.g., in an Excel file), read it in, and compute some basic statistics.  If you don't have your own data, another alternative is to take a large open dataset (such as health data from NHANES or an openfmri dataset from openfmri.org) and try to wrangle the data into a format that lets you ask an interesting question.
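To give a flavor of what that "read it in and compute some basic statistics" exercise might look like in Python, here is a minimal sketch.  The column names (subject, condition, rt_ms), the numbers, and the summarize function are all made up for illustration, and the data are embedded in the script so that it runs on its own; with real data you would export the Excel sheet to CSV and open that file instead.

```python
import csv
import io
import statistics

# Hypothetical experiment output, invented for this example. In practice this
# would be a CSV file exported from Excel and opened with open("mydata.csv").
RAW_DATA = """subject,condition,rt_ms
1,congruent,512
1,incongruent,634
2,congruent,478
2,incongruent,590
3,congruent,530
3,incongruent,661
"""


def summarize(csv_file):
    """Return {condition: (mean, standard deviation)} of the reaction times."""
    times_by_condition = {}
    for row in csv.DictReader(csv_file):
        # Group the reaction times by experimental condition.
        times_by_condition.setdefault(row["condition"], []).append(float(row["rt_ms"]))
    return {
        condition: (statistics.mean(times), statistics.stdev(times))
        for condition, times in times_by_condition.items()
    }


if __name__ == "__main__":
    for condition, (mean_rt, sd_rt) in summarize(io.StringIO(RAW_DATA)).items():
        print(f"{condition}: mean = {mean_rt:.1f} ms, SD = {sd_rt:.1f} ms")
```

This uses only the Python standard library.  Once something like this feels comfortable, a natural next step is to try the same thing on your own data.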
OK then, so where do you look for help in getting started?

The overwhelming favorite in my social media poll was Codecademy.  It offers interactive exercises in lots of different languages, including Python.  Another Pythonic suggestion was http://learnpythonthehardway.org/book/ which looks quite good.

For those of you who prefer video courses, there were also a number of votes for online courses, including those from Coursera and FutureLearn.

Other suggestions included:

Here are some suggested sites with various potentially useful tips




Finally, it’s also worth keeping an eye out for local Software Carpentry workshops.

If you have additional suggestions, please leave them in the comments!

8 comments:

  1. For Matlab (I know, soooo old-fashioned) the easiest way to start is probably the book that Geoff and I wrote:

    http://www.amazon.com/Matlab-Behavioral-Sciences-Program-Experiment/dp/0195320689/ref=sr_1_1?ie=UTF8&qid=1463778426&sr=8-1&keywords=matlab+for+the+behavioral+sciences

  2. Do you have any thoughts or recommendations on how to establish coding in the psych undergraduate curriculum?

  3. Surely C/C++ should be mentioned? They're hard, yes, but my God will you learn a lot.

    Replies
    1. Sure, but where to stop? Why not recommend some Lisp or Haskell?
      C was the first programming language I learned, but I feel its current relevance is only for performance optimization, not for science.

  4. I also recommend the series of books by Allen Downey (e.g. Think Python), available free online at http://greenteapress.com/wp/

  5. Late to the game here but PyQuick from Google was a great way for me to get started with Python.

    https://developers.google.com/edu/python/introduction

  6. Nice, I really like DJ Mannion's lectures. They are really good for a psychologist wanting to learn programming. Personally, I think that if you manage to learn Python, which is quite an easy language to learn, you can learn more advanced languages later.

    I would also like to add a very recent guide on how to use Python in psychology. I came across a post called "Python programming in Psychology", which shows how to use Python for everything from creating your experiment to visualizing and analysing the collected data.
