Timothy Wood - Hints and Scripts for Running Experiments

Running experiments is an important part of doing research, but it often involves a lot of tedious steps. Here are some hints and scripts for making the process a bit less painful.

For a very brief bash primer, see Learn X in Y Minutes.

Basic Experiment Shell Script

At the very start of your experimenting you may need to do several manual tests to figure out what parameters or types of scenarios to run. However, as soon as you can, you should automate as much of the experimental process as possible. Here is the skeleton of a basic bash shell script. It takes two arguments (the number of runs and a parameter value) and calls a placeholder program, ./runSystem, once per run. Even if you don't know bash, you should be able to follow what the script is doing from the comments (hint: comments are lines starting with "#").

#!/bin/bash
# Simple Experiment Script
# by T.W.W. - Public Domain 2011

# Parse the command line input parameters, and verify they were set
numberOfRuns=$1
param1=$2
if [ -z "$param1" ]
then
  echo "ERROR: Requires 2 arguments: $0 [Num Runs] [param 1]"
  exit 1
fi

# Create directory for output files. A run is skipped if it would overwrite an old data set
mkdir -p output

echo "Starting experiment with $numberOfRuns iterations and param1=$param1 at $(date)"
# Repeat the experiment $numberOfRuns times
for i in $(seq "$numberOfRuns")
do
  # One output file per parameter value and run (this naming scheme is only an example)
  outputFile="output/param1-$param1.run-$i.txt"
  # Check if you have already run this data set
  if [ -s "$outputFile" ]
  then
    echo "WARNING: File $outputFile already exists -- skipping run $i"
    continue
  fi
  echo "Running iteration $i at $(date)"
  # Put the commands to actually run your program / system here
  ./runSystem "$param1" > "$outputFile"
  # Possibly put some cleanup steps here, or a sleep command if you want a delay between experiments
done
# Do any final cleanup here
echo "Finished at $(date)"
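If you want to repeat the experiment for several values of param1, one option is a small wrapper that calls the script once per value. This is a sketch, not part of the original script: the function name sweep and the script name ./experiment.sh are hypothetical, and the example assumes the script above was saved as experiment.sh and made executable.

```shell
#!/bin/bash
# sweep SCRIPT RUNS VALUE...
# Hypothetical helper: runs SCRIPT once per VALUE, passing RUNS as the
# first argument and VALUE as the second (param1).
# Stops at the first invocation that returns a non-zero exit status.
sweep() {
  local script=$1 runs=$2
  shift 2
  for p in "$@"; do
    "$script" "$runs" "$p" || return 1
  done
}

# Example (assumes the script above is saved as ./experiment.sh):
# sweep ./experiment.sh 5 10 20 40
```

Because each call goes through the same script, every parameter value gets the same "skip if the output file already exists" protection, so an interrupted sweep can simply be restarted.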

Parsing the Output

Next you need to be able to analyze the data from your experiment. Let's assume your program outputs a sequence of numbers in three columns like this:

#Interval RespTime Ops
1         15       145
2         19       157
3         22       160
4         27       159
5         23       142
6         21       136

The first column is the measurement interval, the second is the response time, and the third is the number of operations performed during that interval (maybe we are getting data from a web or database server).
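To try the commands in this section yourself, you can recreate the sample output above as datafile.txt (the file name the examples below use) with a here-document. This only reproduces the six rows shown; your own program's output will of course differ.

```shell
#!/bin/bash
# Write the sample output shown above to datafile.txt so the awk
# commands in this section can be tried on it.
cat > datafile.txt <<'EOF'
#Interval RespTime Ops
1         15       145
2         19       157
3         22       160
4         27       159
5         23       142
6         21       136
EOF
```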

Computing Statistics

The first thing we might want to do is compute the total of a column. We can easily use Awk for this:

awk '{sum+=$3} END{print sum}' datafile.txt

This will print the sum of the third column in the file datafile.txt.
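The one-liner hard-codes the column number. If you sum columns often, a small shell function that takes the column as an argument can save some typing. The name colsum is hypothetical; the sketch passes the column into awk with -v and skips lines starting with "#".

```shell
#!/bin/bash
# colsum COLUMN [FILE]
# Hypothetical helper: prints the sum of COLUMN, skipping lines that
# start with "#". Reads standard input if no FILE is given.
colsum() {
  awk -v col="$1" '!/^#/ {sum += $col} END {print sum + 0}' "${2:--}"
}
```

With the sample data above, colsum 3 datafile.txt prints 899. The "+ 0" makes an empty input print 0 instead of an empty line.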

Or maybe we want the average of a column. Again, awk to the rescue:

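awk '!/^#/ {sum+=$2; n++} END{print sum/n}' datafile.txt

This will print the average of the second column in the file datafile.txt. The !/^#/ pattern skips the header line that starts with "#", and n counts only the data lines; dividing by NR instead would count the header too and divide the sum by 7 rather than 6.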
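The first few intervals of a run can reflect warm-up effects (for example caches filling). If that is true of your system, which is an assumption about your experiment rather than something the data above shows, you may want to drop those intervals before averaging. A sketch with a hypothetical function colavg_skip and a skip count you choose yourself:

```shell
#!/bin/bash
# colavg_skip COLUMN SKIP [FILE]
# Hypothetical helper: averages COLUMN over the data lines (lines not
# starting with "#") after discarding the first SKIP of them.
colavg_skip() {
  awk -v col="$1" -v skip="$2" '
    # && short-circuits, so only data lines advance the counter
    !/^#/ && ++seen > skip { sum += $col; n++ }
    END {
      if (n == 0) { print "no data left after skipping" > "/dev/stderr"; exit 1 }
      print sum / n
    }
  ' "${3:--}"
}
```

With the sample data above, colavg_skip 2 2 datafile.txt averages 22, 27, 23 and 21 and prints 23.25.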

Or better yet, let's get both the average and standard deviation:

awk '!/^#/ {sum+=$2; sumsq+=$2*$2; n++} END {print "Avg: " sum/n "  StdDev: " sqrt(sumsq/n - (sum/n)^2)}' datafile.txt

This uses ^ for exponentiation, since ** is not part of POSIX awk.
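The one-liner divides by n, so it reports the population standard deviation. If you treat your measurements as a sample and want the sample standard deviation (dividing by n - 1), a variant might look like this; colstats is a hypothetical name, and the sketch again skips lines starting with "#".

```shell
#!/bin/bash
# colstats COLUMN [FILE]
# Hypothetical helper: prints the mean and the *sample* standard
# deviation (divides by n - 1) of COLUMN, skipping "#" lines.
colstats() {
  awk -v col="$1" '
    !/^#/ { sum += $col; sumsq += $col * $col; n++ }
    END {
      if (n < 2) { print "need at least 2 values" > "/dev/stderr"; exit 1 }
      mean = sum / n
      var = (sumsq - sum * sum / n) / (n - 1)
      if (var < 0) var = 0   # guard against tiny negative rounding error
      print "Avg: " mean "  StdDev: " sqrt(var)
    }
  ' "${2:--}"
}
```

For example, printf '1\n2\n3\n' | colstats 1 prints "Avg: 2  StdDev: 1". For long runs with large values, the sum-of-squares formula can lose precision; a two-pass or running-update method is more robust if that matters for your data.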
learn/experiments.txt · Last modified: 2013/11/22 18:59 by twood