Package blast

import "github.com/TuftsBCB/apps/blast"

Overview

Package blast provides functions and types for running any of the BLAST suite of programs. It defines a `Blaster` interface; any value whose type implements it can be passed to this package's `Blast` function to execute a BLAST search.

The results of a BLAST search are captured as XML data and loaded into the `BlastResults` structure automatically.

Note that this package does not execute remote BLAST queries against NCBI's website. It runs local programs such as "blastp" against a local database.

type BlastHSP

type BlastHSP struct {
    XMLName     xml.Name `xml:"Hsp"`
    Num         int      `xml:"Hsp_num"`
    BitScore    float64  `xml:"Hsp_bit-score"`
    Score       float64  `xml:"Hsp_score"`
    EValue      float64  `xml:"Hsp_evalue"`
    QueryFrom   int      `xml:"Hsp_query-from"`
    QueryTo     int      `xml:"Hsp_query-to"`
    HitFrom     int      `xml:"Hsp_hit-from"`
    HitTo       int      `xml:"Hsp_hit-to"`
    PatternFrom int      `xml:"Hsp_pattern-from"`
    PatternTo   int      `xml:"Hsp_pattern-to"`
    QueryFrame  int      `xml:"Hsp_query-frame"`
    HitFrame    int      `xml:"Hsp_hit-frame"`
    Identity    int      `xml:"Hsp_identity"`
    Positive    int      `xml:"Hsp_positive"`
    Gaps        int      `xml:"Hsp_gaps"`
    AlignLength int      `xml:"Hsp_align-len"`
    Density     int      `xml:"Hsp_density"`
    AlignQuery  string   `xml:"Hsp_qseq"`
    AlignHit    string   `xml:"Hsp_hseq"`
    AlignMiddle string   `xml:"Hsp_midline"`
}

type BlastHit

type BlastHit struct {
    XMLName   xml.Name   `xml:"Hit"`
    Num       int        `xml:"Hit_num"`
    Id        string     `xml:"Hit_id"`
    Def       string     `xml:"Hit_def"`
    Accession string     `xml:"Hit_accession"`
    Length    int        `xml:"Hit_len"`
    Hsps      []BlastHSP `xml:"Hit_hsps>Hsp"`
}

type BlastIteration

type BlastIteration struct {
    XMLName  xml.Name        `xml:"Iteration"`
    Num      int             `xml:"Iteration_iter-num"`
    QueryID  string          `xml:"Iteration_query-ID"`
    QueryDef string          `xml:"Iteration_query-def"`
    QueryLen int             `xml:"Iteration_query-len"`
    Hits     []BlastHit      `xml:"Iteration_hits>Hit"`
    Stats    BlastStatistics `xml:"Iteration_stat>Statistics"`
    Message  string          `xml:"Iteration_message"`
}

type BlastParams

type BlastParams struct {
    XMLName     xml.Name `xml:"Parameters"`
    Matrix      string   `xml:"Parameters_matrix"`
    Expect      float64  `xml:"Parameters_expect"`
    Include     float64  `xml:"Parameters_include"`
    ScMatch     int      `xml:"Parameters_sc-match"`
    ScMismatch  int      `xml:"Parameters_sc-mismatch"`
    GapOpen     int      `xml:"Parameters_gap-open"`
    GapExtend   int      `xml:"Parameters_gap-extend"`
    Filter      string   `xml:"Parameters_filter"`
    Pattern     string   `xml:"Parameters_pattern"`
    EntrezQuery string   `xml:"Parameters_entrez-query"`
}

type BlastResults

type BlastResults struct {
    XMLName    xml.Name         `xml:"BlastOutput"`
    Program    string           `xml:"BlastOutput_program"`
    Version    string           `xml:"BlastOutput_version"`
    Reference  string           `xml:"BlastOutput_reference"`
    DB         string           `xml:"BlastOutput_db"`
    QueryID    string           `xml:"BlastOutput_query-ID"`
    QueryDef   string           `xml:"BlastOutput_query-def"`
    QueryLen   int              `xml:"BlastOutput_query-len"`
    QuerySeq   string           `xml:"BlastOutput_query-seq"`
    Params     BlastParams      `xml:"BlastOutput_param>Parameters"`
    Iterations []BlastIteration `xml:"BlastOutput_iterations>Iteration"`
}

BlastResults is the top-level struct representing the XML output of the BLAST family of programs. Nested XML elements are represented by the other `Blast*` types.

The types are meant to be comprehensive with respect to NCBI's DTD, found at http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd. Note that most of the element definitions are actually in http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.mod.dtd.
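
To illustrate how the struct tags map onto BLAST's XML, here is a minimal sketch that decodes a hand-written fragment with `encoding/xml`. The types `results`, `iteration` and `hit` are hypothetical, trimmed-down mirrors of `BlastResults`, `BlastIteration` and `BlastHit`, and the `sample` document is invented for illustration; it is not real BLAST output.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Hypothetical, trimmed-down mirrors of BlastHit, BlastIteration and
// BlastResults. Only a few fields are kept to show how the tags work.
type hit struct {
	Num int    `xml:"Hit_num"`
	Def string `xml:"Hit_def"`
}

type iteration struct {
	Num  int   `xml:"Iteration_iter-num"`
	Hits []hit `xml:"Iteration_hits>Hit"`
}

type results struct {
	XMLName    xml.Name    `xml:"BlastOutput"`
	Program    string      `xml:"BlastOutput_program"`
	Iterations []iteration `xml:"BlastOutput_iterations>Iteration"`
}

// sample is a hand-written fragment, not real BLAST output.
const sample = `<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_def>TFC3 example</Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>`

// decode unmarshals an XML document into the mirror types above.
func decode(data string) (*results, error) {
	var r results
	if err := xml.Unmarshal([]byte(data), &r); err != nil {
		return nil, err
	}
	return &r, nil
}

func main() {
	r, err := decode(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(r.Program, r.Iterations[0].Hits[0].Def)
}
```

The `a>b` form in a tag tells `encoding/xml` to descend through element `a` to reach `b`, which is why `Hits` can be a flat slice even though the XML wraps each `Hit` in an `Iteration_hits` element.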

func Blast

func Blast(blaster Blaster) (*BlastResults, error)

Blast executes the search query described by blaster. The results are parsed from BLAST's XML output format.
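
As a rough illustration of what running a `Blaster` could involve, the sketch below assembles an `exec.Cmd` from the three interface methods and appends the XML output flag itself. This is not the package's actual implementation: `buildCmd` and `staticBlaster` are hypothetical names, the `Blaster` interface is copied from this package's documentation so the sketch compiles on its own, and `-outfmt 5` is assumed to be the XML output mode of the BLAST+ command line tools.

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
)

// Blaster is copied from this package's documentation.
type Blaster interface {
	Executable() string
	CmdArgs() []string
	Stdin() io.Reader
}

// staticBlaster is a hypothetical Blaster with fixed arguments and no stdin.
type staticBlaster struct {
	exe  string
	args []string
}

func (b staticBlaster) Executable() string { return b.exe }
func (b staticBlaster) CmdArgs() []string  { return b.args }
func (b staticBlaster) Stdin() io.Reader   { return nil }

// buildCmd prepares, but does not start, the BLAST process described by b.
// It appends "-outfmt 5" itself, which is why a Blaster must not set it.
func buildCmd(b Blaster) *exec.Cmd {
	args := append([]string{}, b.CmdArgs()...)
	args = append(args, "-outfmt", "5")
	cmd := exec.Command(b.Executable(), args...)
	if in := b.Stdin(); in != nil {
		cmd.Stdin = in
	}
	return cmd
}

func main() {
	cmd := buildCmd(staticBlaster{exe: "blastp", args: []string{"-db", "/path/to/db"}})
	fmt.Println(cmd.Args)
}
```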

Example

ExampleBlast demonstrates a very simple protein BLAST search. Note that you'll need to change `dbPath` to your own local BLAST database. The one I used in the example is a BLAST database containing all of the protein sequences from each strain of yeast from http://www.yeastgenome.org.

Code:

dbPath := "/home/andrew/research/repeats/data/blast/amino"
sequence := seq.Sequence{
    Name: "YAL001C",
    Residues: []seq.Residue(`
MVLTIYPDELVQIVSDKIASNKGKITLNQLWDISGKYFDLSDKKVKQFVLSCVILKKDIE
VYCDGAITTKNVTDIIGDANHSYSVGITEDSLWTLLTGYTKKESTIGNSAFELLLEVAKS
GEKGINTMDLAQVTGQDPRSVTGRIKKINHLLTSSQLIYKGHVVKQLKLKKFSHDGVDSN
PYINIRDHLATIVEVVKRSKNGIRQIIDLKRELKFDKEKRLSKAFIAAIAWLDEKEYLKK
VLVVSPKNPAIKIRCVKYVKDIPDSKGSPSFEYDSNSADEDSVSDSKAAFEDEDLVEGLD
NFNATDLLQNQGLVMEEKEDAVKNEVLLNRFYPLQNQTYDIADKSGLKGISTMDVVNRIT
GKEFQRAFTKSSEYYLESVDKQKENTGGYRLFRIYDFEGKKKFFRLFTAQNFQKLTNAED
EISVPKGFDELGKSRTDLKTLNEDNFVALNNTVRFTTDSDGQDIFFWHGELKIPPNSKKT
PNKNKRKRQVKNSTNASVAGNISNPKRIKLEQHVSTAQEPKSAEDSPSSNGGTVVKGKVV
NFGGFSARSLRSLQRQRAILKVMNTIGGVAYLREQFYESVSKYMGSTTTLDKKTVRGDVD
LMVESEKLGARTEPVSGRKIIFLPTVGEDAIQRYILKEKDSKKATFTDVIHDTEIYFFDQ
TEKNRFHRGKKSVERIRKFQNRQKNAKIKASDDAISKKSTSVNVSDGKIKRRDKKVSAGR
TTVVVENTKEDKTVYHAGTKDGVQALIRAVVVTKSIKNEIMWDKITKLFPNNSLDNLKKK
WTARRVRMGHSGWRAYVDKWKKMLVLAIKSEKISLRDVEELDLIKLLDIWTSFDEKEIKR
PLFLYKNYEENRKKFTLVRDDTLTHSGNDLAMSSMIQREISSLKKTYTRKISASTKDLSK
SQSDDYIRTVIRSILIESPSTTRNEIEALKNVGNESIDNVIMDMAKEKQIYLHGSKLECT
DTLPDILENRGNYKDFGVAFQYRCKVNELLEAGNAIVINQEPSDISSWVLIDLISGELLN
MDVIPMVRNVRPLTYTSRRFEIRTLTPPLIIYANSQTKLNTARKSAVKVPLGKPFSRLWV
NGSGSIRPNIWKQVVTMVVNEIIFHPGITLSRLQSRCREVLSLHEISEICKWLLERQVLI
TTDFDGYWVNHNWYSIYEST*
`),
}

blaster := NewBlastp([]seq.Sequence{sequence}, dbPath)
blaster.SetFlag("evalue", 0.1)

results, err := Blast(blaster)
if err != nil {
    fmt.Println(err)
    return
}

hit := results.Iterations[0].Hits[0].Def
fmt.Println(strings.Contains(strings.ToLower(hit), "tfc3"))

Output:

true

type BlastStatistics

type BlastStatistics struct {
    XMLName      xml.Name `xml:"Statistics"`
    NumSequences int      `xml:"Statistics_db-num"`
    Length       int      `xml:"Statistics_db-len"`
    HSPLength    int      `xml:"Statistics_hsp-len"`
    EffSpace     float64  `xml:"Statistics_eff-space"`
    Kappa        float64  `xml:"Statistics_kappa"`
    Lambda       float64  `xml:"Statistics_lambda"`
    Entropy      float64  `xml:"Statistics_entropy"`
}

type Blaster

type Blaster interface {
    // Executable should return the blast executable to run.
    Executable() string

    // CmdArgs should return a list of command line flags to pass to the
    // blast executable. This list must not include the `-outfmt` flag,
    // since clients of this interface may set it in order to retrieve
    // results in an expected format.
    CmdArgs() []string

    // Stdin, when not nil, will be used for the stdin of the blast process.
    Stdin() io.Reader
}

Blaster represents values that can execute a BLAST search. This package provides some slim implementations of this interface for a couple of variations of BLAST. Clients requiring access to some of BLAST's more sophisticated options should provide their own Blaster.

type Query

type Query struct {
    // The BLAST executable to use.
    Exec string
    // contains filtered or unexported fields
}

Query is a generic blaster for any type of BLAST search. It provides a thin wrapper around setting command line flags to pass to a BLAST executable.

func NewBlastn

func NewBlastn(queries []seq.Sequence, database string) *Query

NewBlastn calls NewQuery with "blastn" as the executable.

func NewBlastp

func NewBlastp(queries []seq.Sequence, database string) *Query

NewBlastp calls NewQuery with "blastp" as the executable.

func NewQuery

func NewQuery(exec string, queries []seq.Sequence, database string) *Query

NewQuery constructs a generic blast search with default parameters. Parameters can be overridden using the `SetFlag` method.

Note that `queries` may have length 0. If it does, then the obligation is on the caller to set the `-query` flag (or provide some other means of giving BLAST a search query).

NewQuery also sets the `-num_threads` flag to the number of logical CPUs on your machine.

func (Query) CmdArgs

func (fs Query) CmdArgs() []string

func (*Query) Executable

func (b *Query) Executable() string

func (*Query) SetFlag

func (b *Query) SetFlag(name string, value interface{})

SetFlag adds a command line switch (without the preceding "-") to the set of BLAST arguments. `value` should be a string, integer, float, bool or other type with an appropriate `Stringer` implementation that results in a valid command line flag value.

If `value` is `false`, then the flag is removed from the BLAST arguments.
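
The set-or-remove contract described above can be sketched with a small map-backed type. This is an illustration only and not how `Query` is implemented: `flagSet`, `set` and `args` are hypothetical names, and rendering `true` as a bare switch with no value is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// flagSet is a hypothetical stand-in for Query's unexported flag storage.
type flagSet map[string]interface{}

// set mirrors the documented SetFlag contract: false removes the flag,
// anything else is stored under name (given without the leading "-").
func (fs flagSet) set(name string, value interface{}) {
	if b, ok := value.(bool); ok && !b {
		delete(fs, name)
		return
	}
	fs[name] = value
}

// args renders the flags in sorted order so the output is deterministic.
// Rendering true as a bare switch is an assumption of this sketch.
func (fs flagSet) args() []string {
	names := make([]string, 0, len(fs))
	for name := range fs {
		names = append(names, name)
	}
	sort.Strings(names)

	var out []string
	for _, name := range names {
		out = append(out, "-"+name)
		if b, ok := fs[name].(bool); ok && b {
			continue
		}
		out = append(out, fmt.Sprint(fs[name]))
	}
	return out
}

func main() {
	fs := flagSet{}
	fs.set("evalue", 0.1)
	fs.set("num_threads", 4)
	fs.set("num_threads", false) // removes the flag again
	fmt.Println(fs.args())
}
```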

func (*Query) Stdin

func (b *Query) Stdin() io.Reader