I’m now a semester into my Ph.D. at Brandeis, and after a couple of projects, I’ve got some thoughts on a suitable research workflow. Note that this is simply what works for me at this time and may change in the future.
What was worth it
- Experiment management software – Guild.AI
  - The TUI for `guild compare` is great, for example
- Scripts with verbose arguments (see the click sketch after this list)
  - Do: `--input`, `--output`, `--mode`
  - Don’t: `-i`, `-o`, `-m`
  - Use click for creating CLIs
- Coreutils + command line tools like `grep`, `sed`, `awk`, `cut`, `paste`, ...
  - In particular, these tools are very, very useful inside little shell scripts
- Infrastructure-as-code is surprisingly worthwhile (e.g. `guild.yml`)
- Symlinking is also very worthwhile (see the sketch after this list)
  - Create an `experiments` folder that is the sole entry point for data on each experiment
  - Inside each experiment folder, simply symlink to data / checkpoints etc.
  - This way, any script only needs to navigate to the folder of that particular experiment
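As a rough sketch of what the verbose-argument style looks like with click (the script name, option names, and defaults here are illustrative, not taken from a real project):

```python
# cli.py: a minimal sketch of a click-based CLI with verbose option names.
# Option names and defaults are illustrative only.
import click


@click.command()
@click.option("--input", "input_path", required=True,
              type=click.Path(exists=True), help="Path to the input data file.")
@click.option("--output", "output_path", required=True,
              type=click.Path(), help="Where to write the results.")
@click.option("--mode", type=click.Choice(["train", "eval"]), default="train",
              help="Whether to train a model or evaluate an existing one.")
def main(input_path, output_path, mode):
    """Run the experiment in the requested mode."""
    click.echo(f"mode={mode}, input={input_path}, output={output_path}")


if __name__ == "__main__":
    main()
```

Invoked as, e.g., `python cli.py --input data.txt --output results.txt --mode eval`.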
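Along the same lines, a minimal sketch of setting up one experiment folder purely out of symlinks, using the Python standard library (all paths and names below are hypothetical):

```python
# make_experiment.py: sketch of creating one experiment folder that contains
# only symlinks to the real data and checkpoints. All paths are hypothetical.
from pathlib import Path

EXPERIMENTS_ROOT = Path("experiments")
DATA_ROOT = Path("/shared/datasets/my_corpus")          # hypothetical location
CHECKPOINTS_ROOT = Path("/shared/checkpoints/run-42")   # hypothetical location


def create_experiment(name: str) -> Path:
    """Create experiments/<name>/ with symlinks to data and checkpoints."""
    exp_dir = EXPERIMENTS_ROOT / name
    exp_dir.mkdir(parents=True, exist_ok=True)
    # Later scripts only need to look inside exp_dir, never at the real locations.
    for link_name, target in [("data", DATA_ROOT), ("checkpoints", CHECKPOINTS_ROOT)]:
        link = exp_dir / link_name
        if not link.is_symlink():
            link.symlink_to(target, target_is_directory=True)
    return exp_dir


if __name__ == "__main__":
    print(create_experiment("baseline-lstm"))
```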
What I never found a use case for
- A lot of examples seem to have training & evaluation done in the same script
  - This often doesn’t work when you train with one metric & evaluate using another
  - Case in point: evaluating MT / seq2seq outputs decoded using beam search (see the sketch below)
- I also haven’t yet felt the need for `cookiecutter` or similar tools that produce boilerplate code.
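To make the train/eval point concrete, here is a rough sketch of what a standalone evaluation step could look like: training optimises something like cross-entropy elsewhere, while this script only scores already-decoded beam-search outputs with BLEU. The file names and the choice of sacrebleu are illustrative assumptions, not prescriptions:

```python
# evaluate.py: sketch of an evaluation step that is deliberately separate from
# training. It reads beam-search decoded hypotheses and reference translations
# from disk and reports corpus BLEU. File names and the use of sacrebleu are
# assumptions for illustration.
import sacrebleu


def read_lines(path: str) -> list[str]:
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def main() -> None:
    hypotheses = read_lines("decoded.beam5.txt")   # produced by a separate decode step
    references = read_lines("references.txt")
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    print(f"BLEU = {bleu.score:.2f}")


if __name__ == "__main__":
    main()
```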