Package fragbag

import "github.com/TuftsBCB/fragbag"

Overview

Package fragbag provides interfaces for using fragment libraries along with several implementations of fragment libraries. This package makes it possible for clients to define their own fragment libraries while reusing all of the infrastructure which operates on fragment libraries.

The central type of this package is the Library interface, along with its child interfaces: SequenceLibrary, StructureLibrary and WeightedLibrary. The Library interface states that every library has a name, a collection of fragments of uniform size, possibly a sub library, and a uniquely identifying tag. The tag is used to recover the type of a fragment library (via the Openers map) when it is read from disk.

Libraries may also wrap other libraries to provide additional functionality. For example, the WeightedLibrary interface describes any fragment library that can weight the raw frequency of a fragment against a query. This functionality can be added to an existing library by wrapping it with additional information. (For example, see the implementation of the WeightedTfIdf library.)

A central design decision of this package is that all fragment libraries are immutable. Once they are created, they cannot be changed. Therefore, all actions defined by the Library interfaces never mutate an existing library.

Variables

var Openers map[string]MakeEmptyLib

Openers stores initializers for each type of fragment library. The keys should be values returned by the Tag method in the Library interface. Clients may add to this map, which enables the Open function in this package to return their custom libraries. (N.B. This is not required for clients that do not use the Open function.)
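As a rough sketch of how such a registration could look, the following self-contained program mirrors the Openers map with local stand-in types. The Library interface here is trimmed to two methods, and myLib, makeMyLib, openers and lookup are hypothetical names used only for illustration. Real client code would add to fragbag.Openers and satisfy the full Library interface.

```go
package main

import (
	"errors"
	"fmt"
)

// Library is a trimmed stand-in for fragbag.Library (assumption: only the
// methods needed for this sketch are included).
type Library interface {
	Name() string
	Tag() string
}

// MakeEmptyLib mirrors the documented function type.
type MakeEmptyLib func(subTags ...string) (Library, error)

// openers is a local stand-in for fragbag.Openers.
var openers = map[string]MakeEmptyLib{}

// myLib is a hypothetical custom library type.
type myLib struct{ name string }

func (l *myLib) Name() string { return l.name }
func (l *myLib) Tag() string  { return "my-custom-lib" }

// makeMyLib returns an empty myLib. This library does not wrap another
// library, so it rejects any sub tags.
func makeMyLib(subTags ...string) (Library, error) {
	if len(subTags) > 0 {
		return nil, errors.New("my-custom-lib does not wrap other libraries")
	}
	return &myLib{}, nil
}

// lookup finds the initializer registered for tag and calls it.
func lookup(tag string, subTags ...string) (Library, error) {
	mk, ok := openers[tag]
	if !ok {
		return nil, fmt.Errorf("unknown fragment library tag %q", tag)
	}
	return mk(subTags...)
}

func main() {
	// Register the initializer under the value returned by Tag.
	openers[(&myLib{}).Tag()] = makeMyLib

	lib, err := lookup("my-custom-lib")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("created empty library with tag:", lib.Tag())
}
```

Keying the map by the Tag value keeps the dispatch independent of concrete types: the reader only needs the tag stored with the data to find the right initializer.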

func IsSequence

func IsSequence(lib Library) bool

IsSequence returns true if the given library is a sequence fragment library. Returns false otherwise. This also works on wrapped libraries. Namely, it will be recursively called on sub libraries.

func IsStructure

func IsStructure(lib Library) bool

IsStructure returns true if the given library is a structure fragment library. Returns false otherwise. This also works on wrapped libraries. Namely, it will be recursively called on sub libraries.
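The recursion described for IsSequence and IsStructure can be sketched as follows. This is not the package's implementation: the interfaces are local stand-ins reduced to the methods needed here, plain Go types replace seq.Sequence and structure.Coords, and seqLeaf and wrapper are hypothetical types. The sketch descends to the innermost library before testing the interface, on the assumption that a wrapper may satisfy both interfaces while an unwrapped library satisfies exactly one.

```go
package main

import "fmt"

// Library is a trimmed stand-in for fragbag.Library.
type Library interface {
	Name() string
	SubLibrary() Library
}

// SequenceLibrary and StructureLibrary are stand-ins that use plain Go
// types in place of seq.Sequence and structure.Coords.
type SequenceLibrary interface {
	Library
	BestSequenceFragment(s string) int
}

type StructureLibrary interface {
	Library
	BestStructureFragment(atoms [][3]float64) int
}

// seqLeaf is a hypothetical unwrapped sequence library.
type seqLeaf struct{}

func (seqLeaf) Name() string                    { return "seq" }
func (seqLeaf) SubLibrary() Library             { return nil }
func (seqLeaf) BestSequenceFragment(string) int { return -1 }

// wrapper is a hypothetical wrapper library that satisfies both interfaces.
type wrapper struct{ sub Library }

func (w wrapper) Name() string                           { return "wrapped-" + w.sub.Name() }
func (w wrapper) SubLibrary() Library                    { return w.sub }
func (w wrapper) BestSequenceFragment(string) int        { return -1 }
func (w wrapper) BestStructureFragment([][3]float64) int { return -1 }

// isSequence follows sub libraries down to the innermost library and
// reports whether that library is a sequence library.
func isSequence(lib Library) bool {
	if sub := lib.SubLibrary(); sub != nil {
		return isSequence(sub)
	}
	_, ok := lib.(SequenceLibrary)
	return ok
}

// isStructure is the structure counterpart of isSequence.
func isStructure(lib Library) bool {
	if sub := lib.SubLibrary(); sub != nil {
		return isStructure(sub)
	}
	_, ok := lib.(StructureLibrary)
	return ok
}

func main() {
	lib := wrapper{sub: seqLeaf{}}
	fmt.Println(isSequence(lib), isStructure(lib)) // prints "true false"
}
```

A direct type assertion on the wrapper would report true for both checks, which is why the sketch inspects the sub library first.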

func Save

func Save(w io.Writer, lib Library) error

Save writes the given fragment library to the writer provided.

type Library

type Library interface {
    // Name returns a canonical name for this fragment library.
    Name() string

    // Size returns the number of fragments in the library.
    Size() int

    // FragmentSize returns the size of every fragment in the library.
    // All fragments in a library must have the same size.
    FragmentSize() int

    // Tag returns a uniquely identifying string for this type of fragment
    // library. It is used to dispatch on when opening a fragment library.
    Tag() string

    // SubLibrary returns a library contained inside of this one and returns
    // nil otherwise. When non-nil, this library is a wrapper library which
    // may implement both the StructureLibrary and SequenceLibrary interfaces.
    // When nil, it is guaranteed that only one of the interfaces will be
    // satisfied.
    SubLibrary() Library

    // String returns a custom string representation of the library.
    // This may be anything.
    String() string

    // FragmentString returns a custom string representation of the
    // given fragment.
    FragmentString(fragNum int) string

    // Fragment returns a representation of the fragment
    // corresponding to fragNum. The representation is specific to the
    // library.
    Fragment(fragNum int) interface{}
}

Library defines the base methods necessary for any value to be considered a fragment library. All libraries that do *not* wrap another library should implement either the StructureLibrary or the SequenceLibrary interface, and never both.

func Open

func Open(r io.Reader) (Library, error)

Open reads a library from the reader provided. If there is a problem reading or parsing the data as a library, an error is returned. If no error is returned, the Library returned is guaranteed to satisfy either the StructureLibrary or SequenceLibrary interface. It is possible that a wrapper library is returned which satisfies both the StructureLibrary and SequenceLibrary interfaces. This type of library can be inspected with the SubLibrary interface method, along with the IsStructure and IsSequence functions in this package.

type MakeEmptyLib

type MakeEmptyLib func(subTags ...string) (Library, error)

MakeEmptyLib represents a function that returns an empty value whose type implements the Library interface. This is used inside the Open function. Namely, when a fragment library file is opened, its tag is used to look up a function with this type in the Openers map. Once the empty value is retrieved, it is initialized with data from the fragment library file.

The subTags parameter is used when opening a library which wraps another library. Namely, it will contain all tags of libraries within it. It will be empty for a library that doesn't wrap another library.

type SequenceLibrary

type SequenceLibrary interface {
    Library

    // BestSequenceFragment returns the fragment number of the best matching
    // fragment against the sequence given. Note that the sequence given must
    // have length N where N is the size of each fragment in this library.
    //
    // If no "good" fragments can be found, then `-1` is returned.
    BestSequenceFragment(seq.Sequence) int

    // AlignmentProb returns the probability (as a negative log-odds) that
    // a query sequence matches a particular fragment.
    AlignmentProb(fragNum int, query seq.Sequence) seq.Prob
}

SequenceLibrary adds methods specific to the operations defined on a library of sequence fragments.

func NewSequenceHMM

func NewSequenceHMM(
    name string,
    fragments []*seq.HMM,
) (SequenceLibrary, error)

NewSequenceHMM initializes a new Fragbag sequence library with the given name and fragments.

Fragments for this library are represented as profile HMMs. Computing the best fragment for any particular sequence uses the score produced by Viterbi.

func NewSequenceProfile

func NewSequenceProfile(
    name string,
    fragments []*seq.Profile,
) (SequenceLibrary, error)

NewSequenceProfile initializes a new Fragbag sequence library with the given name and fragments. All sequence profiles given must have the same number of columns.

Fragments for this library are represented as regular sequence profiles. Namely, each column plainly represents the composition of each amino acid.

type StructureLibrary

type StructureLibrary interface {
    Library

    // BestStructureFragment returns the fragment number of the best matching
    // fragment against the alpha-carbon coordinates given. Note that there
    // must be N coordinates where N is the size of each fragment in this
    // library.
    //
    // If no "good" fragments can be found, then `-1` is returned.
    BestStructureFragment([]structure.Coords) int

    // Atoms returns a list of alpha-carbon coordinates for a particular
    // fragment.
    Atoms(fragNum int) []structure.Coords
}

StructureLibrary adds methods specific to the operations defined on a library of structure fragments.
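To illustrate how BestStructureFragment and its -1 result might be consumed, here is a minimal sketch that slides a window of FragmentSize coordinates over a chain and counts the best fragment for each window. Everything in it is a stand-in: Coords replaces structure.Coords, fragmentMatcher lists only the methods the sketch needs, and toyLib is a hypothetical library that compares coordinates by summed squared distance without superposition, which is a simplification chosen for brevity and not the package's matching method.

```go
package main

import "fmt"

// Coords is a stand-in for structure.Coords.
type Coords struct{ X, Y, Z float64 }

// fragmentMatcher lists the StructureLibrary methods this sketch relies on.
type fragmentMatcher interface {
	Size() int
	FragmentSize() int
	BestStructureFragment(atoms []Coords) int
}

// toyLib is a hypothetical library. A window matches a fragment only if the
// summed squared distance between corresponding atoms is below cutoff.
type toyLib struct {
	frags  [][]Coords
	cutoff float64
}

func (l toyLib) Size() int         { return len(l.frags) }
func (l toyLib) FragmentSize() int { return len(l.frags[0]) }

func (l toyLib) BestStructureFragment(atoms []Coords) int {
	best, bestDist := -1, l.cutoff
	for i, frag := range l.frags {
		d := 0.0
		for j := range frag {
			dx := atoms[j].X - frag[j].X
			dy := atoms[j].Y - frag[j].Y
			dz := atoms[j].Z - frag[j].Z
			d += dx*dx + dy*dy + dz*dz
		}
		if d < bestDist {
			best, bestDist = i, d
		}
	}
	return best // still -1 if no fragment was within the cutoff
}

// countFragments returns, for each fragment number, how many windows of the
// chain had that fragment as their best match.
func countFragments(lib fragmentMatcher, chain []Coords) []int {
	counts := make([]int, lib.Size())
	n := lib.FragmentSize()
	for i := 0; i+n <= len(chain); i++ {
		best := lib.BestStructureFragment(chain[i : i+n])
		if best < 0 {
			continue // no "good" fragment for this window
		}
		counts[best]++
	}
	return counts
}

func main() {
	lib := toyLib{
		frags: [][]Coords{
			{{0, 0, 0}, {1, 0, 0}},
			{{0, 0, 0}, {0, 1, 0}},
		},
		cutoff: 0.5,
	}
	chain := []Coords{{0, 0, 0}, {1, 0, 0}, {9, 9, 9}}
	fmt.Println(countFragments(lib, chain)) // prints "[1 0]"
}
```

Each window passed to BestStructureFragment has exactly FragmentSize coordinates, which is the precondition stated in the interface comment.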

func NewStructureAtoms

func NewStructureAtoms(
    name string,
    fragments [][]structure.Coords,
) (StructureLibrary, error)

NewStructureAtoms initializes a new Fragbag structure library with the given name and fragments. All fragments given must have exactly the same size.
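The uniform-size requirement can be checked before calling the constructor. The helper below is a hypothetical sketch, not part of this package: Coords stands in for structure.Coords, and treating an empty fragment list as an error is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Coords is a stand-in for structure.Coords.
type Coords struct{ X, Y, Z float64 }

// uniformFragmentSize returns the common size of all fragments, or an error
// if the list is empty (an assumption of this sketch) or the sizes differ.
func uniformFragmentSize(fragments [][]Coords) (int, error) {
	if len(fragments) == 0 {
		return 0, errors.New("no fragments given")
	}
	size := len(fragments[0])
	for i, frag := range fragments {
		if len(frag) != size {
			return 0, fmt.Errorf(
				"fragment %d has %d atoms, but fragment 0 has %d",
				i, len(frag), size)
		}
	}
	return size, nil
}

func main() {
	good := [][]Coords{make([]Coords, 4), make([]Coords, 4)}
	bad := [][]Coords{make([]Coords, 4), make([]Coords, 5)}

	size, err := uniformFragmentSize(good)
	fmt.Println(size, err)

	_, err = uniformFragmentSize(bad)
	fmt.Println(err)
}
```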

type WeightedLibrary

type WeightedLibrary interface {
    Library

    // AddWeights turns a raw frequency into a weighted frequency. The
    // frequency given should be related to the fragment given. (e.g., The
    // frequency is the number of times the fragment appeared in a particular
    // query.)
    AddWeights(fragNum int, frequency float32) float32
}

WeightedLibrary adds methods specific to the operations defined on a library of weighted fragments.

func NewWeightedTfIdf

func NewWeightedTfIdf(lib Library, idfs []float32) (WeightedLibrary, error)

NewWeightedTfIdf wraps any fragment library and stores a list of inverse document frequencies for each fragment in the wrapped library.

Note that this library satisfies both the StructureLibrary and SequenceLibrary interfaces.

When computing a BOW from this library, the AddWeights method should be applied to the regular unweighted BOW. Note that this is done for you if you're using the bow sub-package.
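As a minimal sketch of the wrapping idea, the program below weights an unweighted BOW, represented here as a plain slice of per-fragment frequencies. It is written under stated assumptions and is not this package's implementation: the weighted value is taken to be the raw frequency multiplied by the fragment's inverse document frequency, the Library interface is trimmed to two methods, and baseLib, tfIdf, newTfIdf and weightBow are hypothetical names.

```go
package main

import "fmt"

// Library is a trimmed stand-in for fragbag.Library.
type Library interface {
	Name() string
	Size() int
}

// baseLib is a hypothetical unweighted library.
type baseLib struct {
	name string
	size int
}

func (l baseLib) Name() string { return l.name }
func (l baseLib) Size() int    { return l.size }

// tfIdf wraps a Library. Embedding the interface forwards Name and Size to
// the wrapped library, which itself is never modified.
type tfIdf struct {
	Library
	idfs []float32
}

// newTfIdf requires one inverse document frequency per fragment.
func newTfIdf(lib Library, idfs []float32) (*tfIdf, error) {
	if len(idfs) != lib.Size() {
		return nil, fmt.Errorf("got %d IDFs for %d fragments",
			len(idfs), lib.Size())
	}
	// Copy the slice so later changes by the caller cannot alter the library.
	return &tfIdf{Library: lib, idfs: append([]float32(nil), idfs...)}, nil
}

// AddWeights scales a raw frequency by the fragment's IDF (assumption: a
// plain product is used as the weighting).
func (w *tfIdf) AddWeights(fragNum int, frequency float32) float32 {
	return frequency * w.idfs[fragNum]
}

// weightBow applies AddWeights to every entry and returns a new slice,
// leaving the unweighted frequencies untouched.
func weightBow(w *tfIdf, freqs []float32) []float32 {
	out := make([]float32, len(freqs))
	for fragNum, f := range freqs {
		out[fragNum] = w.AddWeights(fragNum, f)
	}
	return out
}

func main() {
	lib, err := newTfIdf(baseLib{"toy", 3}, []float32{0.5, 2, 1})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(weightBow(lib, []float32{4, 1, 0})) // prints "[2 2 0]"
}
```

Returning a new slice from weightBow and copying the IDFs in newTfIdf follow the immutability rule stated in the overview.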

Subdirectories

Name     Synopsis
bow      Package bow provides a representation of a bag-of-words (BOW) along with definitions of common operations.
bowdb    Package bowdb provides functions for reading, writing and searching databases of Bowed values.