Demonstration of Cookiecutters in Neuroscience Research Onboarding


Abstract
This post demonstrates using Cookiecutter to streamline onboarding onto an experiment or session code base. The example comes from a Neuroscience paper about Neurodata Without Borders (NWB) and DataJoint, but the approach applies to almost any research area where the code used by different researchers and groups has a lot in common.

Prerequisites

  • Familiarity with Python
  • Familiarity with virtual environments
  • Interest in the big-picture story of using a Cookiecutter rather than creating one
  • Familiarity with GitHub and basic terms like fork

Relevance
Setting up a code base requires generating files and making decisions about the license, requirements, base code, programming language, file structure and other factors. Researchers who need to use the code and already have an agreed-upon configuration can streamline onboarding by using a Cookiecutter. This saves time and helps reduce bugs caused by manually updating code.

Introduction
Sometimes in Neuroscience, different researchers, such as graduate students and postdocs, need to use the same equipment, experimental setup and code base, but with slight modifications such as different credentials or file directories. A common way to handle these differences is to copy and paste existing code, then search and replace the parts that are unique to a researcher’s experimental setup. A better approach is to use a templating system that requests information from the researcher and then uses that information to create the code base. This post highlights one potential solution, Cookiecutter, and demonstrates its efficacy with an example. The example is taken from Reimer et al. from the Tan lab at Yale. This post does not show you how to develop a Cookiecutter; it only demonstrates how to use one.

What is a Cookiecutter?
Cookiecutter is a templating system that collects information from users and uses it to automatically populate parts of a code base. The Cookiecutter system this post discusses is the one created by Audrey Roy Greenfeld, who is well known in the Django community. It is written mostly in Python and is commonly used for Python projects, but it can be used for other languages such as MATLAB, C++ and more. The post focuses on consuming a ready-made Cookiecutter and will not discuss creating a Cookiecutter project from scratch. Matias Calderini wrote an excellent post on building a Cookiecutter project, and you can study his work to learn how to make one yourself: https://maticalderini.github.io/blog/tutorial/2020/04/13/cookiecutter.html.
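
To make the idea concrete, here is a minimal sketch of what a templating step does. It is not Cookiecutter's implementation (Cookiecutter renders its templates with Jinja2); the render helper and the template text are hypothetical, and only the {{ cookiecutter.<name> }} placeholder style and the database_host and database_user names mirror the demonstration later in this post.

```python
import re

# Matches placeholders written in the Cookiecutter style: {{ cookiecutter.some_name }}
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(template: str, answers: dict) -> str:
    """Replace each placeholder with the matching answer (hypothetical helper)."""

    def substitute(match):
        key = match.group(1)
        if key not in answers:
            raise KeyError(f"No answer supplied for '{key}'")
        return str(answers[key])

    return PLACEHOLDER.sub(substitute, template)


# Illustrative template text; the real template files may look different.
template = (
    'host = "{{ cookiecutter.database_host }}"\n'
    'user = "{{ cookiecutter.database_user }}"'
)
answers = {
    "database_host": "tutorial-db.datajoint.io",
    "database_user": "marikelreimer",
}
print(render(template, answers))
```

The point is only that the researcher supplies a handful of answers once, and every file that mentions them is filled in consistently.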

Demonstration
In the paper by Reimer et al., there are a number of Jupyter notebook files a researcher can modify freely and use for their own experiments (it is recommended that you read the paper and explore the code). To use the code from the paper for a specific experimental session, a researcher needs to fork the code from https://github.com/MarikeReimer/Big-Data-with-DataJoint-and-NWB/ and add credentials manually. To demonstrate the power of Cookiecutter, two of the files from the paper will be examined: Chapter2WorkingWithNWBAndDataJointData.ipynb and ImportsAndTableDefinitions.py.

Using Cookiecutter simply involves pulling down a template, setting values and using the autogenerated code. To start, create a virtual environment. You can use your favorite tool, but this demonstration will use conda.

conda create -n openFieldStudy python=3.8

Now, activate the virtual environment –

conda activate openFieldStudy

Next, download Cookiecutter.

conda install cookiecutter

The Cookiecutter code base at https://github.com/zoldello/OpenFieldStudyCookieCutter was made specifically for this post. Again, a detailed discussion of how it was made is beyond the scope of this post, but you can find material through a web search, YouTube or the documentation if you are interested in making your own.

When creating a code base from https://github.com/zoldello/OpenFieldStudyCookieCutter, you will be asked a series of questions. You can either keep pressing the Enter (Return) key to accept the defaults or type in values yourself. The command to create the code base from the Cookiecutter is –

cookiecutter https://github.com/zoldello/OpenFieldStudyCookieCutter.git

You will see a list of options, presented one at a time. The value in square brackets is the default. The final screen display is –

name [Phil]:
email [phil@example.com]:
twitter_handle [zoldello]:
github_username [zoldello]:
project_name [Session]:
lab [Tan Lab]:
project_slug [session]:
project_short_description [Python Boilerplate contains all the boilerplate you need to create a NWB and DataJoint Pipeline to process mouse location preference-experiments.]:
pypi_username [zoldello]:
version [0.1.0]:
Select open_source_license:
1 - MIT license
2 - BSD license
3 - ISC license
4 - Apache Software License 2.0
5 - GNU General Public License v3
6 - Not open source
Choose from 1, 2, 3, 4, 5, 6 [1]:
database_host [tutorial-db.datajoint.io]:
database_user [marikelreimer]:
database_password [AreRA3c5yc]:
subject_id [Mouse_5025]:
session_id [Session_22]:
experiment_path [C:/Users/meowm/OneDrive/DataWarehouse/Experimenter1]:
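
The same answers can also be supplied from Python rather than typed at the prompts, which may help when several sessions need to be set up in a row. The sketch below assumes Cookiecutter's Python entry point, cookiecutter.main.cookiecutter, with its no_input and extra_context arguments; the build_context and generate_project helpers are hypothetical names made up for this illustration, and the keys are taken from the prompts above. How this particular template behaves without prompts has not been verified here.

```python
TEMPLATE_URL = "https://github.com/zoldello/OpenFieldStudyCookieCutter.git"


def build_context(database_host, database_user, subject_id, session_id):
    """Collect answers that would otherwise be typed at the prompts."""
    return {
        "database_host": database_host,
        "database_user": database_user,
        "subject_id": subject_id,
        "session_id": session_id,
    }


def generate_project(context, output_dir="."):
    """Render the template without prompting; unset options keep their defaults."""
    # Imported here so the helper above can be used without Cookiecutter installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        TEMPLATE_URL,
        no_input=True,          # skip the interactive questions
        extra_context=context,  # override selected defaults
        output_dir=output_dir,
    )


context = build_context(
    database_host="tutorial-db.datajoint.io",
    database_user="marikelreimer",
    subject_id="Mouse_5025",
    session_id="Session_22",
)
print(context)
# To actually generate the code base (needs network access):
# project_dir = generate_project(context)
```

Any option left out of the context, such as the license, simply keeps the default shown in brackets.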

The code is stored in an autogenerated folder named Session. Enter it –

cd Session

Look at the content –

ls

You should see four files –

  • Chapter2WorkingWithNWBAndDataJointData.ipynb
  • ImportsAndTableDefinitions.py
  • notes.txt
  • requirements.txt

Pull down dependencies with –

pip install -r requirements.txt

While it is best to stick with conda when using a conda virtual environment (conda install --file requirements.txt), some of the packages could not be installed with conda, so I fell back to pip.

Examine the code base. The Cookiecutter did things like –

  • Populated notes.txt with the options you selected.
  • Added your credentials to Chapter2WorkingWithNWBAndDataJointData.ipynb.
  • Added your credentials to ImportsAndTableDefinitions.py.

Anyone can get the code, answer the questions as it suits them, and be up and running very quickly.
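
If you would rather not inspect the generated folder by hand, a small script can do a first pass. The sketch below is a hypothetical helper, not part of the template: it checks that the four files listed above exist in the Session folder and that none of them still contains an unfilled {{ cookiecutter.… }} placeholder.

```python
from pathlib import Path

# The four files this post says the template generates.
EXPECTED_FILES = [
    "Chapter2WorkingWithNWBAndDataJointData.ipynb",
    "ImportsAndTableDefinitions.py",
    "notes.txt",
    "requirements.txt",
]


def check_project(project_dir):
    """Return a list of problems; an empty list means every check passed."""
    root = Path(project_dir)
    problems = []
    for name in EXPECTED_FILES:
        path = root / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # A leftover placeholder suggests the template was not fully rendered.
        if "{{ cookiecutter." in text or "{{cookiecutter." in text:
            problems.append(f"unrendered placeholder in: {name}")
    return problems


if __name__ == "__main__":
    # Run from the directory that contains the generated Session folder.
    for line in check_project("Session") or ["all checks passed"]:
        print(line)
```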

There are a variety of Cookiecutters freely available online. You can search on GitHub (https://github.com/search?q=cookiecutter&type=Repositories), although most are not directly geared toward Neuroscience. If you create one you feel others will value, you can post it on PyPI.

Conclusion
This post illustrates the merits of using Cookiecutter. As Neuroscience and Genetics research relies ever more heavily on software development, many practices honed in professional software development can be borrowed to streamline scientific research. If you find the Cookiecutter technique useful, you can adopt it in your own experiment or session code base. Otherwise, it is good to be familiar with it in case it becomes useful in the future for you or a colleague.

Notes

  • The associated code (https://github.com/zoldello/OpenFieldStudyCookieCutter) is meant for illustrative purposes. It is not production ready, nor is it intended to be. You have to understand the principle and apply it to your own system if you so desire. Please don’t blindly copy and paste it and hope for a miracle. Science is not a religion.
  • If you are wary of using plain-text passwords, you can store them in environment variables or in .bashrc and read them from there, or set up a public/private key system. The goal of this blog post is to illustrate Cookiecutter, not to dwell on computer security principles.
  • I am not affiliated with the Tan lab, nor am I a contributor to Reimer et al. Any errors or omissions in this post are entirely my own. – @zoldello
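
For the note on plain-text passwords, here is a minimal sketch of reading credentials from environment variables instead of typing them into the template. The variable names (OPENFIELD_DB_HOST and so on) and the load_credentials helper are hypothetical. The returned keys are named after DataJoint's database.host, database.user and database.password settings on the assumption that you would then copy them into dj.config yourself.

```python
import os

# Hypothetical variable names; use whatever your lab agrees on.
REQUIRED_VARIABLES = {
    "database.host": "OPENFIELD_DB_HOST",
    "database.user": "OPENFIELD_DB_USER",
    "database.password": "OPENFIELD_DB_PASSWORD",
}


def load_credentials(environ=os.environ):
    """Read database credentials from environment variables."""
    missing = [var for var in REQUIRED_VARIABLES.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Map each setting name to the value found in the environment.
    return {setting: environ[var] for setting, var in REQUIRED_VARIABLES.items()}


if __name__ == "__main__":
    try:
        credentials = load_credentials()
        # Avoid printing the password itself.
        print("Loaded settings:", sorted(credentials))
    except RuntimeError as error:
        print(error)
```

With this approach, the password prompt in the template could be left at a harmless placeholder value and the real secret would never be written into the generated files.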

Homework

  • Try to dockerize the setup illustrated in this blog post.
  • Try adding an underscore to the project name option. The system fails because of a hook that validates that the project name is not empty. You can explore hooks further.
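
As a starting point for the hooks exercise, the sketch below shows the general shape of a pre-generation hook. Cookiecutter runs scripts such as hooks/pre_gen_project.py before generating files, renders any {{ cookiecutter.… }} values inside them first, and aborts if the script exits with a non-zero status. The validation rule here is a guess made up for illustration; the actual hook in the template is not shown in this post and may check something different.

```python
import re
import sys

# Hypothetical rule: a name made of letters and digits only, starting with a letter.
VALID_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")


def validate_project_name(name):
    """Return an error message, or None when the name is acceptable."""
    if not name.strip():
        return "project_name must not be empty"
    if not VALID_NAME.match(name):
        return f"'{name}' is not a valid project name"
    return None


if __name__ == "__main__":
    # In a real hook this literal would be written as "{{ cookiecutter.project_name }}",
    # which Cookiecutter fills in before running the script.
    project_name = "Session"
    error = validate_project_name(project_name)
    if error:
        print(f"ERROR: {error}")
        sys.exit(1)  # a non-zero exit status stops project generation
```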

References/Further Reading