Package bowdb

import "github.com/TuftsBCB/fragbag/bowdb"

Overview

Package bowdb provides functions for reading, writing and searching databases of Bowed values. A Bowed value pairs a bag-of-words (BOW) with metadata about the value on which the BOW was computed (like a PDB chain identifier or SCOP domain).

While reading a database and searching it have been heavily optimized, the search itself is still exhaustive. No attempt has been made yet at constructing a reverse index.

Every BOW database is associated with one and only one fragment library. When a BOW database is saved, a copy of the fragment library is embedded into the database. This library, and only this library, should be used to compute Bowed values for use with the Search function.

Constants

const (
    SortByEuclid = iota
    SortByCosine
)
const (
    OrderAsc = iota
    OrderDesc
)

Variables

var SearchClose = SearchOptions{
    Limit:  -1,
    Min:    0.0,
    Max:    0.35,
    SortBy: SortByCosine,
    Order:  OrderAsc,
}

SearchClose provides search settings that limit results by closeness instead of by number.

var SearchDefault = SearchOptions{
    Limit:  25,
    Min:    0.0,
    Max:    math.MaxFloat64,
    SortBy: SortByCosine,
    Order:  OrderAsc,
}

SearchDefault provides default search settings. Namely, it restricts the result set to a predefined number of hits, and sorts the results by closest distance using Cosine distance.

type DB

type DB struct {
    // The fragment library used to make this database.
    Lib fragbag.Library

    // The name of this database (which always corresponds to the base
    // name of the database's file path).
    Name string
    // contains filtered or unexported fields
}

DB represents a BOW database. It is always connected to a particular fragment library. The disk representation of the database is a directory containing a copy of the fragment library used to create the database and a binary-formatted file of all the frequency vectors computed.

func Create

func Create(lib fragbag.Library, fpath string) (*DB, error)

Create creates a new BOW database on disk at 'fpath'. If the directory already exists or cannot be created, an error is returned.

When you're finished adding entries, you must call Close.

Once a BOW database is created, it cannot be modified. (This restriction may be lifted in the future.)

func Open

func Open(fpath string) (*DB, error)

Open opens an existing BOW database for reading. In particular, all entries in the database will be loaded into memory.

func (*DB) Add

func (db *DB) Add(e bow.Bowed)

Add adds a row to the database. It is safe to call Add from multiple goroutines. The Bowed value given must have been computed with the fragment library given to Create.

Add will panic if it is called on a BOW database that has been opened for reading.

func (*DB) Close

func (db *DB) Close() error

Close should be called when done reading from or writing to a BOW database.

func (*DB) ReadAll

func (db *DB) ReadAll() ([]bow.Bowed, error)

ReadAll reads all entries from disk and returns them in a slice. Subsequent calls do not read from disk; the already read entries are returned.

ReadAll will panic if it is called on a database that was made with the Create function.

func (*DB) Search

func (db *DB) Search(opts SearchOptions, query bow.Bowed) []SearchResult

Search performs an exhaustive search against the query entry. The best N results are returned with respect to the options given. The query given must have been computed with this database's fragment library.

Note that if the ReadAll method hasn't been called before, Search will call it for you. (This means that the first search could take longer than one would otherwise expect.)

It is safe to call Search on the same database from multiple goroutines.

func (*DB) String

func (db *DB) String() string

String returns the name of the database.

type SearchOptions

type SearchOptions struct {
    // Limit constrains the number of results returned by the search.
    Limit int

    // Min specifies a minimum score such that any entry with distance
    // to the query below the minimum will not be shown.
    Min float64

    // Max specifies a maximum score such that any entry with distance
    // to the query above the maximum will not be shown.
    Max float64

    // SortBy specifies which metric to sort results by.
    // Currently, only SortByEuclid and SortByCosine are supported.
    SortBy int

    // Order specifies whether the results are returned in ascending (OrderAsc)
    // or descending (OrderDesc) order.
    Order int
}

SearchOptions corresponds to the parameters of a search.

type SearchResult

type SearchResult struct {
    bow.Bowed
    Cosine, Euclid float64
}

SearchResult corresponds to a single result returned from a search. It embeds a Bowed result (which includes metadata about the entry) along with values for all distance metrics.