Tuesday, October 19, 2010

statistical redistricting: how to save lots of time and money and get just about the same result

I had promised myself that I wouldn't blog about politics, but this is really more about statistics so I think it's ok.

David Sparks has posted an interesting piece about using statistical clustering to determine US Congressional districts (h/t R-Bloggers).  He uses k-means clustering, and then analyzes the "partisanship" of the resulting districts by assuming that districts with above-median population density are Democratic and those with below-median density are Republican (I'm not sure how good an assumption that is).  The result is that you get much more reasonable-looking districts than the crazy ones that politicians come up with, but the partisan balance doesn't seem to change (again, under the assumption that density=party).  Here is an example of the map for Texas:


This is, of course, way too reasonable to actually be put into practice.
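
For readers who want to see the idea in code, here is a minimal Python sketch of the two steps described above: cluster locations with k-means, then call a district "D" if its density is above the median and "R" otherwise. This is not Sparks's code. The function names, the use of per-block population and area to compute density, and the plain unweighted k-means are all assumptions made for illustration.

```python
import random
from statistics import median


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda c: (point[0] - centers[c][0]) ** 2 + (point[1] - centers[c][1]) ** 2,
    )


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on (x, y) tuples; returns one district index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return [nearest(p, centers) for p in points]


def label_by_density(labels, populations, areas, k):
    """Hypothetical density=party rule: above-median density is 'D', else 'R'."""
    density = []
    for c in range(k):
        pop = sum(p for p, lab in zip(populations, labels) if lab == c)
        area = sum(a for a, lab in zip(areas, labels) if lab == c)
        density.append(pop / area if area else 0.0)
    cutoff = median(density)
    return ["D" if d > cutoff else "R" for d in density]
```

Calling `kmeans(points, k)` with one point per census block and then `label_by_density(labels, populations, areas, k)` gives one party guess per district. A real redistricting run would also need districts of roughly equal population, which this sketch does not attempt.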

Friday, October 15, 2010

My workflow for writing papers (or, why I switched to LaTeX)

In the last few years I have changed my workflow for writing papers pretty radically.  Previously, I used Microsoft Word along with EndNote as my primary platform (on the Mac, of course). My decision to change was driven by several factors:



  • I had grown tired of the clunkiness of EndNote and the lags in its integration with new versions of Microsoft Word.

  • I had grown even more tired of Word's tendency to crash, or to do crazy things that could only be fixed by starting with a completely new file.

  • I was just starting to work on a book, and I knew that for a large project like that, using Word would be a nightmare. In addition, my coauthors and I wanted to use a source code management system to coordinate changes to the document, and this was not really practical with Word files.


In the end, I decided to move to LaTeX as my primary platform for writing papers and books.  For those not familiar with LaTeX, you can think of it as a markup language like HTML, only for writing papers rather than web pages.  Editing a paper in LaTeX is not WYSIWYG - that is, you don't see the actual layout of the paper as you type.  Rather, you have to typeset the paper in a separate step.  For example, a very short paper might look like this in LaTeX:



\documentclass[11pt]{article}
\title{My Article}
\author{Russ Poldrack}
\begin{document}
\maketitle
\section{Introduction}
This is the content of the paper.
\end{document}



Why on earth, you might ask, would I want to give up WYSIWYG editing to write my papers using some obscure markup language? The main reason is that it's very flexible, both in how you use it and what it can do.  Because the files are plain text, you can edit them using any editor you wish.  I use a package called TeXShop which has a built-in editor and makes it easy to write, build, and view documents, but I know many others prefer Emacs.  There are also many different packages and style files available, which allow a ton of flexibility in layouts and formatting.  Finally, the fact that they are plain text files with a known format means that you can do tricky things like automatically generating LaTeX files from the information in a spreadsheet or database.  I did this a couple of years ago when we had application packets from about 150 people for a summer course; I was able to take the application data from a web database and turn each person's data into a nicely-formatted package, all done using a few pages of Python code.
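
To give a flavor of what generating LaTeX from tabular data can look like, here is a small Python sketch. It is not the code used for the summer course packets: the column names (`name`, `institution`, `statement`), the page template, and the helper names are all made up for illustration, and the input is assumed to be CSV text.

```python
import csv
import io
from string import Template

# Characters that LaTeX treats specially and their escaped forms.
# Backslash, ~ and ^ are deliberately left out to keep the sketch short.
LATEX_SPECIALS = {
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
}


def escape_latex(text):
    """Escape the most common LaTeX special characters in plain text."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


# string.Template uses $name placeholders, so LaTeX braces need no escaping.
PAGE = Template(r"""\documentclass[11pt]{article}
\begin{document}
\section*{$name}
\textbf{Institution:} $institution

$statement
\end{document}
""")


def render_applicants(csv_text):
    """Return a dict mapping each applicant's name to a LaTeX document string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = {}
    for row in reader:
        fields = {key: escape_latex(value) for key, value in row.items()}
        documents[row["name"]] = PAGE.substitute(fields)
    return documents
```

Each returned string can be written to its own `.tex` file and typeset in the usual way. `string.Template` is used here instead of `str.format` because LaTeX is full of curly braces, which `str.format` would try to interpret as placeholders.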


Another major reason for moving was BibTeX, which is the reference management system used with LaTeX.  After all of my annoyances with EndNote + Word, BibTeX was like a dream.  I use the BibDesk application to organize my libraries; it includes integrated searching of PubMed and other repositories and has met my needs almost perfectly.  It's also possible to export BibTeX libraries from Papers, but BibDesk is nice because it operates directly on the BibTeX library so there is no need to export.


There is only one thing that I seriously miss from my days of using Word, and that is the "Track Changes" feature for collaborative writing.  One can use Unix tools like diff to find where two files differ, but that still only tells you which lines were changed, not what actual text was changed.  There is at least one open source tool that provides something similar to Word's track changes for LaTeX (latexdiff) but I've not yet been able to get it to work on my Mac.
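
As a rough illustration of the difference between a line-level and a word-level comparison, here is a short Python sketch using the standard library's `difflib`. It is not how latexdiff works and it knows nothing about LaTeX markup; the function name `word_changes` is made up, and the sketch simply compares two versions of a piece of text word by word.

```python
import difflib


def word_changes(old, new):
    """List (operation, old_words, new_words) for every word-level change."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is 'replace', 'delete', or 'insert'
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes
```

For example, `word_changes("we used Word", "we used LaTeX")` returns `[("replace", "Word", "LaTeX")]`, which is closer to what Track Changes shows than a report that the whole line differs.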


Another problem is that many of my collaborators are not LaTeX users, so I can't exactly send them a file of raw LaTeX code and expect them to edit it.  There are a couple of alternatives.  The first is to save it as a PDF and let the colleagues make comments on the file, but this doesn't let them actually edit the text.  What I generally do is export the file to RTF (using latex2rtf) and then send that to my colleagues.  Then I have to put their edits back into the LaTeX file by hand.  Not exactly optimal, but it gets the job done.


Writing papers using LaTeX is not for everyone.  There is definitely a learning curve, and occasionally things happen that require some pretty serious debugging.  It also helps if your collaborators are LaTeX users.  But in general, it's been a welcome change from Word+EndNote.


Here are some resources that have been useful for me:




Thursday, October 14, 2010

Welcome

I've set up this personal site so I will have a place to spout off about the various things that I think about that don't fall under the purview of my Huffington Post blog (which is focused on implications of research for daily living) or the Cognitive Atlas blog (which is focused on topics related to our Cognitive Atlas project).  I will probably focus mostly on science issues, productivity tools and workflows, and food and travel.