Software solutions
Many scientists spend a good proportion of their time using software to organize and analyse data, which means they also spend time writing, debugging and maintaining software. But only a handful of researchers have ever been taught how to do this efficiently. After an introductory course in C, Java or Fortran, many scientists have to rediscover or reinvent the rest of programming on their own. As a result, they spend too much time wrestling with software when they would be better off pursuing their research — studying gene expression in mustard plants or the molecular dynamics of nanotubes, for example.
What is potentially worse is that most scientists don't know how reliable their software is, and are often unable to trace the results they publish back to the programs that produced them.
To address these issues, I am working with the Python Software Foundation, a non-profit organization devoted to advancing open-source technology, to develop a course that will teach scientists and engineers the 10% of software engineering they need to solve 90% of their problems. Our goal is not to turn bench researchers into computer scientists, but to introduce them to some open-source tools and working practices that can reduce the amount of time they spend programming by up to 25%.
Some of the topics the course addresses are the equivalent of good laboratory practice, such as using version-control systems to track changes to documents and data files, and automated unit testing to ensure that once a bug is fixed, it stays fixed. The course also looks at how to read and manipulate common data formats, from plain text to XML, how to integrate new data with legacy code (such as the tangled Fortran program you just inherited from your PhD supervisor), and how web-based tools such as wikis and weblogs can be used to coordinate the work of geographically distributed research teams.
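To make the idea of a bug that "stays fixed" concrete, here is a minimal sketch using Python's standard unittest module. It is an illustration only, not taken from the course notes: the function mean, the imagined bug (an earlier version crashing on an empty list) and the helper run_tests are all hypothetical.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a sequence of numbers.

    Hypothetical example: an earlier version is imagined to have raised
    ZeroDivisionError on an empty list. The fix returns None instead.
    """
    if not values:
        return None
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def test_typical_values(self):
        # Ordinary case: the mean of 1, 2 and 3 is 2.
        self.assertEqual(mean([1.0, 2.0, 3.0]), 2.0)

    def test_empty_input_regression(self):
        # Regression test: records the fixed behaviour so that the old
        # bug cannot quietly return in a later edit.
        self.assertIsNone(mean([]))


def run_tests():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMean)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() after every change to mean gives an immediate check that the old failure has not crept back. Keeping such tests under version control alongside the code also makes it possible to tie a published result to the version of the program that produced it.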
A draft version of the course notes will soon be online at http://www.third-bit.com/swc. All of the material will be freely available for personal, academic and commercial use under an open-source licence, and questions and comments are welcome.





