A hallmark of good practice in scientific computing is simulation reproducibility: ensuring that every computational result in a simulation can be regenerated when needed. Computational work is difficult and time-consuming, and it is easy to rush to publication without confirming that every step used to obtain the results has been properly documented.

Outline of a Paper Repository

The following is an example layout for a paper repository that contains simulation results. It follows the Alamo convention, but the same principles apply to any simulation results.

PaperDescription # Repo name should be all one word beginning with "Paper"
     ./main.tex # Always name the main file main.tex
     ./main.pdf # Always ensure that your .gitignore is
     ./main.out # up to date
     ./figures/ # Put illustrations ONLY here
         MyFigure.svg # If saving as SVG, also save a copy as
         MyFigure.pdf # a PDF generated from the SVG
     ./results/ # Everything relating to actual results goes here
         TestCaseA/ # Subdirectory for different result types
             output1/ # Each simulation has its own directory
                 input.in          # ALWAYS include the input file
                 metadata          # ALWAYS store all metadata including git IDs
                 diff.patch        # If the reference code has uncommitted changes since
                                   # its most recent commit, store them as a git patch
                 smalldatafile.dat # Store small data files, provided there are not too many
                 bigdatafile.dat   # Keep large or binary files here, but do not commit them
                 figure1.pdf       # Store all figures presenting this data in the folder
             output2/
             ...
             postprocess.py    # Any scripts used for more than one dataset should go here
             comparefigure.pdf # Figures that compare more than one dataset should go here
         TestCaseB/
         ...
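The metadata and diff.patch entries above can be captured automatically just before a run. The Python sketch below shows one way to do this, assuming the simulation code lives in its own git checkout. The function names `git_state` and `write_provenance` and the key names written to `metadata` are hypothetical; they are not part of Alamo or any existing tool. The sketch uses `git rev-parse HEAD` for the commit ID and `git diff HEAD` for any uncommitted changes.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def git_state(repo: Path) -> tuple[str, str]:
    """Return (commit ID, uncommitted diff) for the git checkout at `repo`."""
    commit = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # `git diff HEAD` covers staged and unstaged changes to tracked files;
    # untracked files are NOT included.
    diff = subprocess.run(
        ["git", "-C", str(repo), "diff", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return commit, diff


def write_provenance(outdir: Path, commit: str, diff: str) -> None:
    """Write `metadata` (and `diff.patch` if needed) into a simulation directory.

    The key = value format of `metadata` is a hypothetical choice.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    lines = [
        f"git_commit = {commit}",
        f"recorded_utc = {datetime.now(timezone.utc).isoformat()}",
        f"uncommitted_changes = {'yes' if diff else 'no'}",
    ]
    (outdir / "metadata").write_text("\n".join(lines) + "\n")
    if diff:
        # Only write a patch when the code differs from the recorded commit.
        (outdir / "diff.patch").write_text(diff)
```

For example, `write_provenance(Path("results/TestCaseA/output1"), *git_state(Path("path/to/code")))` records the state of a hypothetical checkout at `path/to/code`. Because `git diff HEAD` ignores untracked files, new source files must be added with `git add` before they appear in the patch.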

Guiding Principles

Use the following principles when organizing your data.

  • Every paper is a git repository. Every paper is written on Overleaf and can be cloned to your desktop using git.
  • Simulation data is stored in the associated paper’s git repository. All simulation files small enough to store in a git repository should be committed to the repository associated with the paper. Data too large to store in a git repository should still be kept in the same directory, just not added to the repository (list it in a .gitignore file to keep it out).
  • Every simulation gets its own simulation subdirectory: each simulation result should be stored in a self-contained directory with a unique name. Here, “self-contained” means that someone else should be able to reproduce your results using only the contents of that directory.
  • The simulation directory contains everything needed to regenerate the simulation: input files, data, metadata (including git commit IDs), and a patch for any uncommitted code changes.
  • Visualizations are stored as close to the data as possible: if a visualization (for example, a figure) contains data from a single simulation, it should be stored in that simulation’s output directory. If it contains data from multiple simulations, it should be stored in the deepest directory that contains all of the relevant output directories.
  • Scripts that generate visualizations are stored as close to the visualizations as possible and, where practical, named to match: a figure named “stress_xx.pdf” should ideally be generated by a script called “stress_xx.py” stored in the same directory.
  • Results are not figures: the figures directory is for illustrations only, that is, images that do not contain actual scientific results. Plots of simulation data belong in the results directory.
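Several of these principles can be checked mechanically. The Python sketch below walks the results directory and reports simulation directories missing input.in or metadata, PDF figures without a same-named .py script beside them, and files above a size threshold that should probably be git-ignored. It assumes, hypothetically, that simulation directories sit exactly two levels below results/ (as in results/TestCaseA/output1/); the function name `check_paper_repo` and the 10 MiB threshold are illustrative placeholders, not limits imposed by git or Overleaf.

```python
from pathlib import Path

# Files every simulation directory must contain, per the repository outline.
REQUIRED = ("input.in", "metadata")


def check_paper_repo(root: Path, max_bytes: int = 10 * 1024**2) -> list[str]:
    """Return a list of problems found under root/results.

    Assumption: simulation directories sit exactly two levels below results/,
    e.g. results/TestCaseA/output1/. max_bytes is an arbitrary placeholder.
    """
    problems = []
    results = root / "results"

    def rel(p: Path) -> str:
        return p.relative_to(root).as_posix()

    # Self-contained simulation directories: required files must be present.
    for sim_dir in sorted(p for p in results.glob("*/*") if p.is_dir()):
        for name in REQUIRED:
            if not (sim_dir / name).is_file():
                problems.append(f"{rel(sim_dir)}: missing {name}")

    for f in sorted(p for p in results.rglob("*") if p.is_file()):
        # Figures should ideally sit next to a script with the same stem.
        if f.suffix == ".pdf" and not f.with_suffix(".py").is_file():
            problems.append(f"{rel(f)}: no matching {f.stem}.py")
        # Large files belong in the directory but should be git-ignored.
        if f.stat().st_size > max_bytes:
            problems.append(f"{rel(f)}: larger than {max_bytes} bytes; git-ignore it")
    return problems
```

Treat the figure-script check as a warning rather than an error: a comparison figure such as comparefigure.pdf generated by a shared postprocess.py is acceptable under the outline but will still be reported.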