A hallmark of good practice in scientific computing is simulation reproducibility: ensuring that all computational results in a simulation can be re-generated when needed. Computational work is difficult and time-consuming, and it is easy to jump straight to publication of results without checking to ensure that all the steps to obtain them have been properly documented.
Outline of a Paper Repository
This is an example of a paper repository containing simulation results. The Alamo convention is used here, but the principles can be followed for any simulation results.
PaperDescription              # Repo name should be one word beginning with "Paper"
  ./main.tex                  # Always name the main file main.tex
  ./main.pdf                  # Always ensure that your .gitignore is
  ./main.out                  #   up to date
  ./figures/                  # Put illustrations ONLY here
    MyFigure.svg              # If saving as SVG, also save a copy as
    MyFigure.pdf              #   a PDF that can be generated from the SVG
  ./results/                  # Everything relating to actual results goes here
    TestCaseA/                # Subdirectory for different result types
      output1/                # Each simulation has its own directory
        input.in              # ALWAYS include the input file
        metadata              # ALWAYS store all metadata, including git IDs
        diff.patch            # If the reference code has changed since the most
                              #   recent commit, store the git patch
        smalldatafile.dat     # Store small data files when possible, if there are not too many
        bigdatafile.dat       # Do not commit large or binary files to git
        figure1.pdf           # Store all figures presenting this data in this folder
      output2/
        ...
      postprocess.py          # Scripts used for more than one dataset go here
      comparefigure.pdf       # Figures that compare more than one dataset go here
    TestCaseB/
      ...
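The metadata and diff.patch entries in the outline can be produced automatically. The following is a minimal sketch for a simulation code that does not already record this information itself. The function names (read_git_state, write_provenance) and the two-line layout of the metadata file are assumptions made for illustration, not part of the Alamo convention.

```python
import subprocess
from pathlib import Path


def read_git_state(code_dir):
    """Return (commit hash, uncommitted diff) for the git repository at code_dir."""

    def git(*args):
        # Run git inside code_dir and return its standard output as text.
        result = subprocess.run(
            ["git", *args], cwd=code_dir,
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    commit = git("rev-parse", "HEAD").strip()
    patch = git("diff", "HEAD")  # empty string if the working tree is clean
    return commit, patch


def write_provenance(output_dir, commit, patch):
    """Write a metadata file, plus diff.patch when there are uncommitted changes."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical metadata layout: one "key = value" pair per line.
    metadata = f"git_commit = {commit}\ngit_dirty = {bool(patch)}\n"
    (output_dir / "metadata").write_text(metadata)
    if patch:
        (output_dir / "diff.patch").write_text(patch)
```

A typical call, run just before the simulation starts, would be write_provenance("results/TestCaseA/output1", *read_git_state("path/to/code")), where path/to/code stands for wherever the simulation source is checked out.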
Guiding Principles
Use the following principles when organizing your data.
- Every paper is a git repository. Every paper is written on Overleaf and can be cloned to your desktop using git.
- Simulation data is stored in the associated paper’s git repository. All simulation files that are small enough to store in a git repository should be stored inside the git repository associated with the paper. Data that is too large to store in a git repository should still be kept in the same directory, just not added to the repository (you can use a .gitignore file for this).
- Every simulation gets its own simulation subdirectory: Each simulation result should be stored in a self-contained directory with a unique name. Here, “self-contained” means that you should be able to send only the contents of the directory to someone else for them to reproduce your results.
- The simulation directory contains everything needed to generate the simulation: this means input files, data, etc.
- Visualizations are stored as close to the data as possible: if your visualization (for example a figure) contains data from a single simulation, it should be stored in that simulation’s output directory. If it contains data from multiple simulations, it should be stored in the lowest directory that contains all of the relevant output directories.
- Scripts to generate visualizations are stored as close to the visualizations as possible: where possible, they should also be named similarly. A figure titled “stress_xx.pdf” is ideally generated using a script called “stress_xx.py” stored in the same directory.
- Results are not figures: The figures directory is for figures only, meaning illustrations that do not contain actual scientific results.
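The two placement rules above (figures go in the lowest directory containing all of their data, and scripts sit beside their figures under a matching name) can be expressed in a few lines. This is only an illustrative sketch: figure_directory and script_for are hypothetical helper names, and the paths follow the example outline.

```python
import os


def figure_directory(simulation_dirs):
    """Return the lowest directory that contains every given simulation directory."""
    # For a single simulation this is the simulation directory itself.
    return os.path.commonpath(simulation_dirs)


def script_for(figure_path):
    """Return the path of the similarly named script expected next to a figure."""
    stem, _extension = os.path.splitext(figure_path)
    return stem + ".py"


# A figure comparing two runs of the same test case belongs in TestCaseA/.
same_case = figure_directory([
    os.path.join("results", "TestCaseA", "output1"),
    os.path.join("results", "TestCaseA", "output2"),
])

# A figure comparing runs from different test cases belongs in results/.
across_cases = figure_directory([
    os.path.join("results", "TestCaseA", "output1"),
    os.path.join("results", "TestCaseB", "output1"),
])
```

Using os.path.commonpath rather than a string prefix comparison means paths are compared component by component, so output1 and output10 are not mistaken for sharing a directory named output1.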