Package bow

import "github.com/TuftsBCB/fragbag/bow"
Overview

Package bow provides a representation of a bag-of-words (BOW) along with definitions of common operations. These operations include computing the cosine or Euclidean distance between two BOWs, comparing BOWs, and producing BOWs from values of other types (like a PDB chain or a biological sequence).

This package also includes functions for interoperating with the original FragBag implementation written by Rachel Kolodny. Namely, BOWs in the original implementation are encoded as strings (Bow.StringOldStyle writes them and NewOldStyleBow reads them).

type Bow

type Bow struct {
    // Freqs holds, for each fragment number, the number of occurrences of
    // that fragment in this "bag of words." Its length always equals the
    // number of fragments in the library.
    Freqs []float32
}

Bow represents a bag-of-words vector of size N for a particular fragment library, where N corresponds to the number of fragments in the fragment library.

Note that a Bow may be weighted. It is up to the fragment library to apply weights to a Bow.

func NewBow

func NewBow(size int) Bow

NewBow returns a bag-of-words with all fragment frequencies set to 0.
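Since Freqs is a dense slice indexed by fragment number, a fresh Bow is simply a zeroed slice whose length is the library size. The following is a minimal sketch of that idea; `bowSketch` and `newBowSketch` are hypothetical stand-ins that mirror the documented type, not the package's own code.

```go
package main

// bowSketch mirrors the documented Bow type: a dense vector of
// frequencies indexed by fragment number (hypothetical stand-in).
type bowSketch struct {
	Freqs []float32
}

// newBowSketch plays the role of NewBow: all `size` fragment
// frequencies start at zero because make zero-initializes the slice.
func newBowSketch(size int) bowSketch {
	return bowSketch{Freqs: make([]float32, size)}
}

// Len mirrors (Bow).Len: the vector length equals the library size.
func (b bowSketch) Len() int { return len(b.Freqs) }
```

Recording one occurrence of fragment 3 is then just `b.Freqs[3]++`.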

func NewOldStyleBow

func NewOldStyleBow(size int, oldschool string) (Bow, error)

NewOldStyleBow returns a bag-of-words from Fragbag's original bag-of-words vector output.

The format works by assigning the first 26 fragment numbers (0 through 25) the letters 'a' ... 'z', the next 26 fragment numbers (26 through 51) the letters 'A' ... 'Z', and any additional fragment numbers (52, 53, 54, ...) their decimal representation. Moreover, each number is terminated by a '#' character, while the letters are not delimited by anything.

Please see the documentation for (Bow).StringOldStyle for a grammar describing the format.

If the string is malformed, NewOldStyleBow will return an error.
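The format can be parsed with a single left-to-right scan. The following is a minimal sketch of a decoder for the documented grammar, not the package's implementation. `decodeOldStyle` is a hypothetical name, it returns a plain []float32 rather than a Bow, and it assumes that each token records one occurrence of a fragment, so repeated tokens accumulate.

```go
package main

import (
	"fmt"
	"strconv"
)

// decodeOldStyle is a hypothetical sketch of reading the old-style format.
// Each token names one occurrence of a fragment (an assumption):
//   'a'..'z'              -> fragments 0..25
//   'A'..'Z'              -> fragments 26..51
//   digits followed by '#' -> fragment numbers >= 52
func decodeOldStyle(size int, s string) ([]float32, error) {
	freqs := make([]float32, size)
	for i := 0; i < len(s); {
		c := s[i]
		var frag int
		switch {
		case c >= 'a' && c <= 'z':
			frag = int(c - 'a')
			i++
		case c >= 'A' && c <= 'Z':
			frag = 26 + int(c-'A')
			i++
		case c >= '0' && c <= '9':
			// Consume the whole run of digits, then require a '#'.
			j := i
			for j < len(s) && s[j] >= '0' && s[j] <= '9' {
				j++
			}
			if j == len(s) || s[j] != '#' {
				return nil, fmt.Errorf("missing '#' after number at offset %d", i)
			}
			n, err := strconv.Atoi(s[i:j])
			if err != nil {
				return nil, err
			}
			if n < 52 {
				return nil, fmt.Errorf("fragment %d must be written as a letter", n)
			}
			frag = n
			i = j + 1
		default:
			return nil, fmt.Errorf("unexpected character %q at offset %d", c, i)
		}
		if frag >= size {
			return nil, fmt.Errorf("fragment %d out of range for library of size %d", frag, size)
		}
		freqs[frag]++
	}
	return freqs, nil
}
```

Rejecting numbers below 52 enforces the invariant that those fragments only ever appear as letters.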

func SequenceBow

func SequenceBow(lib fragbag.SequenceLibrary, s seq.Sequence) Bow

SequenceBow is a helper function to compute a bag-of-words given a sequence fragment library and a query sequence.

If the lib given is a weighted library, then the BOW returned will also be weighted.

Note that this function should only be used when providing your own implementation of the SequenceBower interface. Otherwise, BOWs should be computed using the SequenceBow method of the interface.

func StructureBow

func StructureBow(lib fragbag.StructureLibrary, atoms []structure.Coords) Bow

StructureBow is a helper function to compute a bag-of-words given a structure fragment library and a list of alpha-carbon atoms.

If the lib given is a weighted library, then the Bow returned will also be weighted.

Note that this function should only be used when providing your own implementation of the StructureBower interface. Otherwise, BOWs should be computed using the StructureBow method of the interface.

func (Bow) Add

func (b Bow) Add(b2 Bow) Bow

Add returns a new Bow whose fragment frequencies are the element-wise sums of the frequencies in b and b2. Add will panic if the operands have different lengths.
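The element-wise sum can be sketched on raw frequency slices as follows. `addFreqs` is a hypothetical helper illustrating the described behavior (a fresh result, unchanged operands, a panic on mismatched lengths), not the package's code.

```go
package main

// addFreqs sketches the element-wise sum described for (Bow).Add.
// It allocates a new slice, so neither operand is modified, and it
// panics when the lengths differ, as Add is documented to do.
func addFreqs(a, b []float32) []float32 {
	if len(a) != len(b) {
		panic("bow sizes differ")
	}
	sum := make([]float32, len(a))
	for i := range a {
		sum[i] = a[i] + b[i]
	}
	return sum
}
```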

func (Bow) Cosine

func (b Bow) Cosine(b2 Bow) float64

Cosine returns the cosine distance between b and b2.

func (Bow) Dot

func (b Bow) Dot(b2 Bow) float64

Dot returns the dot product of b and b2.

func (Bow) Equal

func (b Bow) Equal(b2 Bow) bool

Equal tests whether two Bows are equal.

Two Bows are equivalent when the frequencies of every fragment are equal.
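The equality rule amounts to comparing the two frequency vectors entry by entry, which is also why it ignores the order in which fragments were originally observed. `equalFreqs` is a hypothetical sketch; treating vectors of different lengths as unequal is an assumption of the sketch.

```go
package main

// equalFreqs sketches the rule stated for (Bow).Equal: two vectors are
// equal when every fragment frequency matches exactly. Different
// lengths are treated as unequal (an assumption).
func equalFreqs(a, b []float32) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}
```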

func (Bow) Euclid

func (b Bow) Euclid(b2 Bow) float64

Euclid returns the euclidean distance between b and b2.
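The Euclidean distance is the square root of the summed squared differences of corresponding frequencies. `euclid` below is a hypothetical sketch on raw slices, assuming the standard definition.

```go
package main

import "math"

// euclid sketches the Euclidean distance described for (Bow).Euclid:
// sqrt(sum_i (a[i] - b[i])^2), accumulated in float64.
func euclid(a, b []float32) float64 {
	if len(a) != len(b) {
		panic("bow sizes differ")
	}
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}
```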

func (Bow) Len

func (b Bow) Len() int

Len returns the size of the vector, which always equals the number of fragments in the corresponding library.

func (Bow) Magnitude

func (b Bow) Magnitude() float64

Magnitude returns the Euclidean norm (vector length) of b.

func (Bow) String

func (b Bow) String() string

String returns a string representation of the Bow vector. Only fragments with non-zero frequency are emitted.

The output looks like '{fragNum: frequency, fragNum: frequency, ...}'. For example, '{1: 4, 3: 1}' means that fragment 1 has a frequency of 4, fragment 3 has a frequency of 1, and every other fragment has a frequency of zero.
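A sparse rendering like this can be produced by skipping zero entries while walking the vector. The sketch below reproduces the documented example; `formatFreqs` is hypothetical, and both the ascending fragment order and the `%v` number formatting are assumptions that may differ from the package's actual output.

```go
package main

import (
	"fmt"
	"strings"
)

// formatFreqs sketches the documented (Bow).String layout: only
// non-zero frequencies, as "fragNum: frequency" pairs in ascending
// fragment order (assumed), wrapped in braces.
func formatFreqs(freqs []float32) string {
	var parts []string
	for i, f := range freqs {
		if f != 0 {
			parts = append(parts, fmt.Sprintf("%d: %v", i, f))
		}
	}
	return "{" + strings.Join(parts, ", ") + "}"
}
```

For instance, `formatFreqs([]float32{0, 4, 0, 1})` yields `{1: 4, 3: 1}`, and an all-zero vector yields `{}`.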

func (Bow) StringOldStyle

func (b Bow) StringOldStyle() string

StringOldStyle returns a bag-of-words vector formatted as a string that matches the old Fragbag program's output.

The format works by assigning the first 26 fragment numbers (0 through 25) the letters 'a' ... 'z', the next 26 fragment numbers (26 through 51) the letters 'A' ... 'Z', and any additional fragment numbers (52, 53, 54, ...) their decimal representation. Moreover, each number is terminated by a '#' character, while the letters are not delimited by anything.

Here is a grammar describing the output:

    output       = { fragment }
    fragment     = lower-letter | upper-letter | digit, { digit }, "#"
    lower-letter = "a" | ... | "z"
    upper-letter = "A" | ... | "Z"
    digit        = "0" | ... | "9"

The essential invariants are that any fragment number less than 52 is written as a single letter from the set { 'a', ..., 'z', 'A', ..., 'Z' }, and any fragment number greater than or equal to 52 is written as its decimal value (>= 52) followed by a '#' character.

Note that the string returned by this function cannot be compared with Fragbag's output using string equality. Namely, Fragbag outputs fragment numbers in an arbitrary order (probably the order in which they are found in the input PDB file). This order is not captured or preserved by BOW values in this package. Thus, the only reliable way to test for equality is to convert Fragbag's output to a BOW using NewOldStyleBow and compare it with the (Bow).Equal method.
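Producing the format is the inverse of parsing it: each fragment maps to a letter or a "number#" token, repeated once per occurrence. The following is a hypothetical sketch, not the package's implementation. It assumes fragments are written in ascending order and that frequencies are whole numbers; a weighted Bow with fractional frequencies has no exact representation here, and this sketch simply truncates.

```go
package main

import (
	"strconv"
	"strings"
)

// encodeOldStyle sketches writing the old-style format. Assumptions:
// ascending fragment order, and one token per unit of frequency
// (fractional parts are truncated by int(f)).
func encodeOldStyle(freqs []float32) string {
	var sb strings.Builder
	for frag, f := range freqs {
		var tok string
		switch {
		case frag < 26:
			tok = string(rune('a' + frag)) // fragments 0..25
		case frag < 52:
			tok = string(rune('A' + frag - 26)) // fragments 26..51
		default:
			tok = strconv.Itoa(frag) + "#" // fragments >= 52
		}
		for n := 0; n < int(f); n++ {
			sb.WriteString(tok)
		}
	}
	return sb.String()
}
```

Because this output is sorted while Fragbag's is not, two strings describing the same bag can differ, which is why comparison should happen on decoded vectors rather than on strings.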

func (Bow) Weighted

func (b Bow) Weighted(lib fragbag.WeightedLibrary) Bow

Weighted transforms any Bow into a weighted Bow using the scheme of the given weighted fragment library. The size of the Bow must equal the size of the library.

type BowDiff

type BowDiff struct {
    Freqs []float32
}

BowDiff represents the difference between two bag-of-words vectors. The type mirrors Bow, except that each entry of Freqs holds the difference between the frequencies of a particular fragment number in the two vectors; that is, the BOW difference is simply the pairwise differences of fragment frequencies.

func NewBowDiff

func NewBowDiff(oldbow, newbow Bow) BowDiff

NewBowDiff creates a new BowDiff by subtracting the 'old' frequencies from the 'new' frequencies.

NewBowDiff will panic if 'oldbow' and 'newbow' have different lengths.

func (BowDiff) IsSame

func (bdiff BowDiff) IsSame() bool

IsSame returns true if there are no differences, i.e., if every difference frequency is zero.
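NewBowDiff and IsSame can be sketched together on raw slices: subtract old from new entry by entry, then check whether every difference is zero. `diffFreqs` and `isSame` are hypothetical helpers illustrating the documented behavior, not the package's code.

```go
package main

// diffFreqs sketches NewBowDiff: newFreqs[i] - oldFreqs[i] for every
// fragment, panicking when the lengths differ as documented.
func diffFreqs(oldFreqs, newFreqs []float32) []float32 {
	if len(oldFreqs) != len(newFreqs) {
		panic("bow sizes differ")
	}
	diff := make([]float32, len(oldFreqs))
	for i := range oldFreqs {
		diff[i] = newFreqs[i] - oldFreqs[i]
	}
	return diff
}

// isSame mirrors (BowDiff).IsSame: true when every difference is zero.
func isSame(diff []float32) bool {
	for _, d := range diff {
		if d != 0 {
			return false
		}
	}
	return true
}
```

A negative entry therefore means the fragment occurs less often in the newer vector.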

func (BowDiff) String

func (bdiff BowDiff) String() string

String returns a string representation of the BOW diff vector. Only fragments with non-zero differences are emitted.

The output looks like '{fragNum: diff-frequency, fragNum: diff-frequency, ...}'. For example, '{1: 4, 3: 1}' means that the frequency of fragment 1 differs by 4, the frequency of fragment 3 differs by 1, and every other fragment has a difference frequency of zero.

type Bowed

type Bowed struct {
    // A globally unique identifier corresponding to the source of the bow.
    // e.g., a PDB identifier "1ctf" or a PDB identifier with a chain
    // identifier "1ctfA" or a sequence accession number.
    Id string

    // Arbitrary data associated with the source. May be empty.
    Data []byte

    // The bag-of-words.
    Bow Bow
}

Bowed corresponds to a bag-of-words with metadata about its source. For example, a PDB chain can have a BOW computed for it. Metadata might include that chain's identifier (e.g., 1ctfA) and perhaps that chain's sequence.

Values of this type correspond to records in a BOW database.

type SequenceBower

type SequenceBower interface {
    // Computes a bag-of-words given a sequence fragment library.
    SequenceBow(lib fragbag.SequenceLibrary) Bowed
}

SequenceBower is implemented by values that can compute a BOW given a sequence fragment library.

func BowerFromSequence

func BowerFromSequence(s seq.Sequence) SequenceBower

BowerFromSequence provides a reference implementation of the SequenceBower interface for biological sequences.

type StructureBower

type StructureBower interface {
    // Computes a bag-of-words given a structure fragment library.
    // For example, to compute the bag-of-words of a chain in a PDB entry:
    //
    //     lib := someStructureFragmentLibrary()
    //     chain := somePdbChain()
    //     fmt.Println(BowerFromChain(chain).StructureBow(lib))
    //
    // This is made easier by using pre-defined types in this package that
    // implement this interface.
    StructureBow(lib fragbag.StructureLibrary) Bowed
}

StructureBower is implemented by values that can compute a BOW given a structure fragment library.

func BowerFromChain

func BowerFromChain(c *pdb.Chain) StructureBower

BowerFromChain provides a reference implementation of the StructureBower interface for PDB chains.

func BowerFromCifChain

func BowerFromCifChain(c *pdbx.Chain) StructureBower

BowerFromCifChain provides a reference implementation of the StructureBower interface for chains in PDBx/mmCIF formatted files.

func BowerFromModel

func BowerFromModel(c *pdb.Model) StructureBower

BowerFromModel provides a reference implementation of the StructureBower interface for PDB models.