Kartchner Bioinformatics Scripts


These scripts are written in the Ruby programming language. You can view more information here.

You must have the Ruby programming language installed to use these scripts. To make sure you have Ruby installed, type the following into the prompt:

ruby -v

Many of these scripts also require the BioRuby gem to be installed. You can find information about installing the BioRuby gem, and view its documentation, here.
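
A quick way to confirm that the gem is available is to try loading it from Ruby. This is a minimal sketch: the helper name gem_available? is made up for illustration, and it relies only on the fact that the BioRuby gem is published as 'bio'.

```ruby
# Returns true if the named library can be loaded, false otherwise.
# (gem_available? is a hypothetical helper, not part of these scripts.)
def gem_available?(name)
  require name
  true
rescue LoadError
  false
end

if gem_available?("bio")
  puts "BioRuby is installed"
else
  puts "BioRuby is missing; try: gem install bio"
end
```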

split-fasta.rb

Download

Splits a fasta file into smaller fasta files.

Usage: split-fasta.rb [options] file1 file2 ...
    -v, --verbose                    Print more info
    -l, --log                        Log information to file
    -r, --records [NUM]              Number of records to get (default: all)
    -n, --per-file [NUM]             Number of records per file (default: 100)
    -o, --offset [NUM]               Number of records to skip (default: 0)
    -d, --dir [DIR]                  Output dir (default: out)
    -p, --print                      Print reads
    -t, --test                       Test file lengths afterward
    -c, --deep-test                  Test file contents afterward
    -h, --help                       Display this screen

Example:

split-fasta.rb -n 1000 -d sample_split my_fasta_file.fasta

This example splits my_fasta_file.fasta into a series of fasta files containing up to 1000 reads each (the last file may hold fewer). The files are written to the directory 'sample_split'.
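
The splitting step can be pictured with the short sketch below. It is not the code of split-fasta.rb: the method names and the output naming scheme (base name, underscore, index) are assumptions made for illustration, and it uses plain Ruby rather than BioRuby.

```ruby
require "fileutils"

# Groups FASTA text into records. Each record is a string that starts
# with its ">" header line and includes the sequence lines after it.
def fasta_records(text)
  records = []
  text.each_line do |line|
    if line.start_with?(">")
      records << line.dup
    elsif !records.empty?
      records.last << line
    end
  end
  records
end

# Writes the records to numbered files holding at most per_file records
# each and returns the paths written. The "<base>_<index>.fasta" naming
# is an assumption, not necessarily what split-fasta.rb produces.
def split_fasta(path, per_file: 100, dir: "out")
  FileUtils.mkdir_p(dir)
  base = File.basename(path, ".*")
  records = fasta_records(File.read(path))
  records.each_slice(per_file).each_with_index.map do |chunk, i|
    out = File.join(dir, "#{base}_#{i}.fasta")
    File.write(out, chunk.join)
    out
  end
end
```

Calling split_fasta("my_fasta_file.fasta", per_file: 1000, dir: "sample_split") would mirror the command above under these assumptions.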


multi-blast.rb

Download

Use this script to submit multiple blast jobs to the PBS system.

Usage: multi-blast.rb [options]
    -v, --verbose                    Print more info
    -i, --input-dir [NAME]           Input directory
    -o, --output-dir [NAME]          Output directory
        --output-script              Output script rather than submitting it
    -e, --extension [NAME]           Extension of fasta files [default: fna]
    -p, --prefix [NAME]              PBS job prefix [default: kartchner]
    -g, --group [NAME]               Group to use [default: standard]
    -q, --queue [NAME]               Queue to use [default: rmaier]
    -n, --num-cpus [NUM]             Number of CPUs to use [default: 12]
    -r, --ram [NUM]                  Amount of RAM to use [default: 23gb]
    -c, --cpu-time [NUM]             Amount of CPU time to use [default: 50:0:0]
    -w, --wall-time [NUM]            Amount of Wall time to use [default: 7:0:0]
    -s, --start [NUM]                Start Index [default: 0]
    -z, --end [NUM]                  End index [default: 50]
    -d, --db-path [PATH]             BLAST Database path [default: /genome/nr]
    -b, --base-name [NAME]           Base name of FASTA files
        --blast-path [PATH]          Path of blast executable
    -t, --test-only                  Test instead of submitting jobs

Example:

multi-blast.rb -i /gsfs1/xdisk/bmf/datasets/main -o /gsfs1/xdisk/bmf/results -b my_fasta_file_ -q windfall {0..9}

This example submits a series of blast jobs to the PBS system. Use this script with files created by the split-fasta.rb script. The base name (here 'my_fasta_file_') is the part of each fasta filename that comes before the number. For example, if your files are named my_fasta_file_1, my_fasta_file_2, ..., your base name is 'my_fasta_file_'.
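
The base name rule can be expressed in a few lines of Ruby. This is only a sketch of the rule as described above; the method name base_name is hypothetical, and stripping the file extension first is an assumption.

```ruby
# Returns the part of a file name that precedes its trailing number,
# or nil when the name does not end in a number. Any extension such as
# ".fna" is removed before the number is looked for.
def base_name(filename)
  name = File.basename(filename, ".*")
  match = name.match(/\A(.*?)(\d+)\z/)
  match && match[1]
end

puts base_name("my_fasta_file_1")
puts base_name("my_fasta_file_12.fna")
```

Both calls print my_fasta_file_, the value to pass to the -b switch.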


combine-blast.rb

Download

Used to combine multiple blast output files into one file.

Usage: combine-blast.rb [options] file1 file2 ...
    -v, --verbose                    Print more info
    -l, --log                        Log information to file
    -f, --output-file [FILE]         Output file
    -d, --dir [DIR]                  Output dir (default: out)
    -h, --help                       Display this screen

Example:

combine-blast.rb -f output.blastx *.blastx

This example will copy all blastx files in the current directory into a single new file called 'output.blastx'. Make sure you have enough space to contain both the original and new copies of the data.
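
Combining files in this way amounts to streaming each input into one output file. The sketch below shows the idea under stated assumptions: combine_files is a hypothetical name, and skipping the output file when it appears among the inputs is a precaution added here (a glob such as *.blastx would match output.blastx on a second run), not documented behavior of combine-blast.rb.

```ruby
# Streams every input file into the output file, in order, without
# loading whole files into memory. The output path is skipped if it
# also appears in the input list.
def combine_files(inputs, output)
  target = File.expand_path(output)
  File.open(output, "wb") do |out|
    inputs.each do |path|
      next if File.expand_path(path) == target
      File.open(path, "rb") { |src| IO.copy_stream(src, out) }
    end
  end
end
```

For example, combine_files(Dir.glob("*.blastx").sort, "output.blastx") would correspond to the command above.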


fasta-rewrite.rb

Download

Reads in a fasta file and then rewrites the file.

Usage: fasta-rewrite.rb [options]
    -v, --verbose                    Print more info
    -l, --log                        Log information to file
    -o, --out-file [FILE]            Output file
    -d, --dir [DIR]                  Output dir (default: out)
    -i, --in-file [FILE]             Input file
    -h, --help                       Display this screen

Example:

fasta-rewrite.rb -i input.fasta -o output.fasta

This example reads in input.fasta and writes every read to a new file, output.fasta. If your fasta file seems to contain errors, try using this script to remove them.


fasta-stat.rb

Download

Counts the reads in a fasta file.

Usage: fasta-stat.rb [options] file1 file2 ...
    -v, --verbose                    Print more info
    -c, --count                      Count reads in file
    -h, --help                       Display this screen

Example:

fasta-stat.rb -c my_fasta_file.fasta

This example counts the reads in a fasta file.
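
Because every fasta record begins with a header line starting with '>', counting reads comes down to counting those lines. A minimal sketch, with count_reads as a hypothetical method name rather than the script's own code:

```ruby
# Counts FASTA records by counting header lines, which start with ">".
# The file is read line by line, so large files are not loaded whole.
def count_reads(path)
  File.foreach(path).count { |line| line.start_with?(">") }
end
```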


fastq-stat.rb

Download

Calculates Mean, Median, Mode, and Standard Deviation for FASTQ data.

Usage: fastq-stat.rb [options] file1 file2 ...
    -v, --verbose                    Print more info
    -n, --num-records [NUM]          Number of records to get (default: all)
    -h, --help                       Display this screen

Example:

fastq-stat.rb my_fastq_data.fastq

This example generates statistics for the file 'my_fastq_data.fastq'.
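
The four statistics can be computed as in the sketch below. Several things here are assumptions: the text above does not say which quantity fastq-stat.rb measures, so the sketch uses read lengths; it expects the common four-line FASTQ layout; it reports the population standard deviation; and on a tie it returns the smallest mode. The method names are hypothetical.

```ruby
# Returns the sequence length of every record in FASTQ text, assuming
# four lines per record (header, sequence, "+", qualities).
def read_lengths(text)
  text.each_line.each_slice(4).map { |record| record[1].to_s.chomp.length }
end

# Returns mean, median, mode and population standard deviation of a
# list of numbers, or nil for an empty list.
def summary(values)
  return nil if values.empty?
  sorted = values.sort
  n = sorted.size
  mean = sorted.sum.to_f / n
  mid = n / 2
  median = n.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
  # Groups equal values; with sorted input the smallest value wins ties.
  mode = sorted.group_by(&:itself).max_by { |_, group| group.size }.first
  variance = sorted.sum { |v| (v - mean)**2 } / n
  { mean: mean, median: median, mode: mode, sd: Math.sqrt(variance) }
end
```

Under these assumptions, summary(read_lengths(File.read("my_fastq_data.fastq"))) would give the statistics for the file in the example.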


qsub.rb

Download

Use this to submit a job to the PBS system without creating a separate script.

Important: You must use absolute paths in your command when using this script. If you use relative paths, the PBS system may not be able to find your files.

Usage: qsub.rb [options] command
    -v, --verbose                    Print more info
    -o, --output-script              Output script to file and do not submit
    -n, --num-cpus [NUM]             Number of CPUs to use
    -r, --ram [NUM]                  Amount of RAM to use
    -c, --cpu-time [NUM]             Amount of CPU time to use
    -w, --wall-time [NUM]            Amount of Wall time to use
    -t, --test-only                  Test and print script to stdout

Example:

qsub.rb -v -n 12 /bin/blastx -num_threads 8 -db /genome/nr -query /datasets/main/organism.fna

This example submits the blastx command as a PBS job that requests 12 CPUs, and prints verbose information about the operation of the script.
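
To give a rough idea of what wrapping a command in a PBS job involves, the sketch below builds a job script as text and flags arguments that look like relative paths. It is not the script that qsub.rb generates: the method names, the directive layout, and the default values (borrowed from the multi-blast.rb listing) are all assumptions, and the exact resource syntax depends on how PBS is set up at your site.

```ruby
require "shellwords"

# Builds the text of a PBS job script that runs one command, given as
# an array of arguments. The directive layout is an assumption and not
# the output of qsub.rb.
def pbs_script(command, name: "job", cpus: 12, ram: "23gb",
               cpu_time: "50:0:0", wall_time: "7:0:0")
  [
    "#!/bin/bash",
    "#PBS -N #{name}",
    "#PBS -l select=1:ncpus=#{cpus}:mem=#{ram}",
    "#PBS -l cput=#{cpu_time}",
    "#PBS -l walltime=#{wall_time}",
    Shellwords.join(command),
  ].join("\n") + "\n"
end

# Returns the arguments that look like relative paths: they contain a
# "/" but do not start with one.
def relative_paths(command)
  command.select { |arg| arg.include?("/") && !arg.start_with?("/") }
end
```

Checking relative_paths before submitting is one way to catch the relative-path problem described in the note above.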


quality-filter.rb

Download

Filters FASTA data and provides detailed information about the process.

Usage: quality-filter.rb [options] file
    -v, --verbose                    Print more info
    -p, --print                      Print out sequence data in FastQ format
    -l, --log                        Log the output
    -b, --output-bad-seqs            Output bad sequences
    -d, --output-dir [DIR]           Select output dir (default: out)
    -o, --output [FILE]              Select output file (default: output.fasta)
    -r, --num-records [NUM]          Number of records to get (default: all)
    -h, --help                       Display this screen

Example:

quality-filter.rb -b -d output_dir -o quality-filtered.fasta

In this example, the -b switch causes the script to output the sequences that are excluded by the quality filter rather than discard them. The -d switch chooses the output directory, and the -o switch chooses the output file.
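
The effect of the -b switch, keeping the rejected sequences instead of discarding them, can be illustrated with a partition step. The filter criterion used by quality-filter.rb is not described here, so the sketch substitutes a stand-in: a minimum mean Phred score of 20 over Phred+33 encoded quality strings. The method names, the record shape, and the threshold are all hypothetical.

```ruby
# Mean Phred score of a quality string, assuming Phred+33 encoding
# (each character's ASCII code minus 33 is its score).
def mean_quality(quality)
  return 0.0 if quality.empty?
  quality.each_byte.sum { |b| b - 33 } / quality.length.to_f
end

# Splits records into [kept, rejected] by a minimum mean quality.
# Each record is a hash with :id, :seq and :qual keys (hypothetical
# shape); the threshold of 20 is a stand-in, not the script's rule.
def quality_partition(records, min_mean: 20)
  records.partition { |r| mean_quality(r[:qual]) >= min_mean }
end

reads = [
  { id: "r1", seq: "ACGT", qual: "IIII" },
  { id: "r2", seq: "ACGT", qual: "####" },
]
kept, rejected = quality_partition(reads)
puts "kept: #{kept.map { |r| r[:id] }.join(', ')}"
puts "rejected: #{rejected.map { |r| r[:id] }.join(', ')}"
```

With -b, the rejected list would be written out as well rather than dropped.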