Package fasta

import "github.com/TuftsBCB/io/fasta"

Overview

Package fasta provides routines for reading and writing FASTA files and aligned FASTA files.

The format used is the one described by NCBI: http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml

By default, sequences are checked to make sure they contain only valid characters: a-z, A-Z, * and -. All lower case letters are translated to their upper case equivalents.

func QuickSequenceCount

func QuickSequenceCount(r io.Reader) (int, error)

QuickSequenceCount consumes the given reader and returns the number of times ">" appears at the start of a line.
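The counting rule described above can be sketched with the standard library alone. This is not the package's implementation; countHeaders and countHeadersIn are hypothetical names, and the sketch assumes that lines end in '\n' and that only a '>' in the very first column counts.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countHeaders (hypothetical) consumes r and counts the lines whose
// first byte is '>'.
func countHeaders(r io.Reader) (int, error) {
	br := bufio.NewReader(r)
	count := 0
	atLineStart := true
	for {
		b, err := br.ReadByte()
		if err == io.EOF {
			return count, nil
		}
		if err != nil {
			return count, err
		}
		if atLineStart && b == '>' {
			count++
		}
		// The next byte starts a line only if this one was a newline.
		atLineStart = b == '\n'
	}
}

// countHeadersIn (hypothetical) is a convenience wrapper for in-memory text.
func countHeadersIn(text string) int {
	n, _ := countHeaders(strings.NewReader(text))
	return n
}

func main() {
	// The '>' inside "AC>GT" is not at the start of a line, so it is ignored.
	fmt.Println(countHeadersIn(">seq1\nACGT\n>seq2\nAC>GT\n")) // prints 2
}
```

Because the reader is consumed, a caller that also wants to parse the entries would need to reopen or rewind the input afterwards.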

func SequenceFasta

func SequenceFasta(s seq.Sequence, cols int) string

SequenceFasta returns the FASTA string corresponding to a sequence with the sequence wrapped at the number of columns given.

If cols is <= 0, then no wrapping is done.

func SequenceString

func SequenceString(s seq.Sequence, cols int) []string

SequenceString chops up one long sequence into multiple strings based on the number of columns provided.

If cols is <= 0, then no wrapping is done and a single string is returned.
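The wrapping behavior can be illustrated on a plain string. This is a minimal sketch, not the package's code: wrap is a hypothetical helper that takes the residues as a string instead of a seq.Sequence, and how the real function treats an empty sequence is not stated here.

```go
package main

import "fmt"

// wrap (hypothetical) splits residues into chunks of at most cols
// characters. When cols <= 0 the input is returned as a single string.
func wrap(residues string, cols int) []string {
	if cols <= 0 {
		return []string{residues}
	}
	var lines []string
	for len(residues) > cols {
		lines = append(lines, residues[:cols])
		residues = residues[cols:]
	}
	if len(residues) > 0 {
		lines = append(lines, residues)
	}
	return lines
}

func main() {
	fmt.Println(wrap("ACGTACGTAC", 4)) // prints [ACGT ACGT AC]
	fmt.Println(wrap("ACGTACGTAC", 0)) // prints [ACGTACGTAC]
}
```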

func TranslateNormal

func TranslateNormal(b byte) (seq.Residue, bool)

TranslateNormal is the default translator for regular (NOT aligned) FASTA files.

type Reader

type Reader struct {
    // When set to true, the sequences will not be checked for errors.
    // If you trust the data, this may improve performance.
    // This may be set at any time.
    TrustSequences bool
    // contains filtered or unexported fields
}

A Reader reads entries from FASTA encoded input.

If TrustSequences is true, then sequence data will not be checked to make sure that it conforms to the NCBI spec. (See the Read method for details.) By default, TrustSequences is false.

func NewReader

func NewReader(r io.Reader) *Reader

NewReader creates a new Reader that is ready to read sequences from some io.Reader.

func (*Reader) Read

func (r *Reader) Read() (s seq.Sequence, err error)

Read will read the next entry in the FASTA input. The format roughly corresponds to that described by NCBI: http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml

In particular, the only characters allowed in the sequence section are a-z, A-Z, * and -. Any other character will result in an error.

All lower case letters in the sequence section are translated to upper case.

Blank lines and leading and trailing whitespace are always ignored (regardless of where they are).

No distinction is made between DNA/RNA or amino acid sequences. (Currently.)

It is NOT safe to call this function from multiple goroutines.

If the underlying reader is seekable, it is OK to use its seek operation provided that you call (*Reader).SeekerReset before the next time Read is called. If you don't, the behavior is undefined. Moreover, seeking will result in erroneous line numbers in error messages. Finally, you MUST seek to a location that corresponds precisely to an entry boundary; that is, the file pointer should be at a '>' character.
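The character rules that Read applies to the sequence section can be sketched for a single line of input. This is an illustration under stated assumptions, not the package's parser: cleanLine is a hypothetical helper, it works on one line at a time, and the error text is invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanLine (hypothetical) applies the documented rules to one line of
// sequence data: surrounding whitespace is dropped, lower case letters
// are upper-cased, and any byte other than a letter, '*' or '-' is
// rejected with an error.
func cleanLine(line string) (string, error) {
	line = strings.TrimSpace(line)
	out := make([]byte, 0, len(line))
	for i := 0; i < len(line); i++ {
		b := line[i]
		switch {
		case b >= 'a' && b <= 'z':
			out = append(out, b-'a'+'A')
		case (b >= 'A' && b <= 'Z') || b == '*' || b == '-':
			out = append(out, b)
		default:
			return "", fmt.Errorf("invalid character %q at offset %d", b, i)
		}
	}
	return string(out), nil
}

func main() {
	s, err := cleanLine("  acgT-N*  ")
	fmt.Println(s, err) // prints ACGT-N* <nil>

	_, err = cleanLine("AC1GT")
	fmt.Println(err) // prints invalid character '1' at offset 2
}
```

Setting TrustSequences to true corresponds to skipping a check of this kind, which is why it is only appropriate for data that is already known to be clean.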

func (*Reader) ReadAll

func (r *Reader) ReadAll() ([]seq.Sequence, error)

ReadAll will read all entries in the FASTA input and return them as a slice. If an error is encountered, processing is stopped, and the error is returned.

func (*Reader) ReadSequence

func (r *Reader) ReadSequence(translate Translator) (seq.Sequence, error)

ReadSequence is exported for use in other packages that read FASTA-like files.

The 'translate' function is used when sequences are checked for valid characters.

If you're just reading FASTA files, this method SHOULD NOT be used.

func (*Reader) SeekerReset

func (r *Reader) SeekerReset()

SeekerReset will reset the internal state of Reader to allow Read to be called at arbitrary entry boundaries in the input.

See the comments for Read for more details.

type Translator

type Translator func(b byte) (seq.Residue, bool)

A Translator is a function that accepts a single character, checks whether it's valid, and optionally maps it to a new character. Additionally, if the zero byte is returned, then the character should not be included in the final sequence.

Translators are ONLY applicable to developers writing their own parsers for FASTA-like files. They should not be used to read regular FASTA files.
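For developers who do write such a parser, the shape of a translator can be sketched as follows. Everything here is hypothetical: residue stands in for seq.Residue, translator mirrors the Translator signature with that local type, and the rules of translateGapped ('.' mapped to '-', digits dropped) are invented for illustration. The sketch also assumes that the boolean result reports whether the input character is valid.

```go
package main

import "fmt"

// residue is a hypothetical stand-in for seq.Residue.
type residue byte

// translator mirrors the shape of Translator using the local type.
type translator func(b byte) (residue, bool)

// translateGapped (hypothetical) upper-cases letters, passes '-' and '*'
// through, maps '.' to '-', drops digits, and rejects everything else.
func translateGapped(b byte) (residue, bool) {
	switch {
	case b >= 'a' && b <= 'z':
		return residue(b - 'a' + 'A'), true
	case (b >= 'A' && b <= 'Z') || b == '-' || b == '*':
		return residue(b), true
	case b == '.':
		return '-', true
	case b >= '0' && b <= '9':
		return 0, true // zero byte: valid, but left out of the sequence
	}
	return 0, false
}

// apply runs t over every byte of line and skips zero residues.
func apply(t translator, line string) (string, bool) {
	out := make([]byte, 0, len(line))
	for i := 0; i < len(line); i++ {
		r, ok := t(line[i])
		if !ok {
			return "", false
		}
		if r != 0 {
			out = append(out, byte(r))
		}
	}
	return string(out), true
}

func main() {
	fmt.Println(apply(translateGapped, "ac.G12t")) // prints AC-GT true

	// A space is not accepted by this translator, so ok is false.
	_, ok := apply(translateGapped, "AC GT")
	fmt.Println(ok) // prints false
}
```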

type Writer

type Writer struct {
    // The number of columns to wrap a sequence at. By default, this
    // is set to 60. A value <= 0 will result in no wrapping.
    Columns int

    // Whether to append a '*' at the end of each sequence.
    // By default, this is false.
    Asterisk bool
    // contains filtered or unexported fields
}

A Writer writes entries to a FASTA encoded file.

The 'Columns' field corresponds to the number of columns at which a sequence is wrapped. If it's <= 0, then no wrapping will be used.

The header text is never wrapped.
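The effect of the two fields on a written entry can be sketched on plain strings. This is not the Writer's implementation: formatEntry is a hypothetical helper, the exact header layout (">" followed directly by the name) is an assumption, and so is the choice to append the '*' before wrapping rather than on a line of its own.

```go
package main

import (
	"fmt"
	"strings"
)

// formatEntry (hypothetical) renders one entry: a header line that is
// never wrapped, followed by the residues wrapped at cols. A cols value
// <= 0 disables wrapping. When asterisk is true, a '*' is appended to
// the residues before they are wrapped.
func formatEntry(name, residues string, cols int, asterisk bool) string {
	if asterisk {
		residues += "*"
	}
	var sb strings.Builder
	sb.WriteString(">" + name + "\n")
	if cols <= 0 {
		cols = len(residues) // one line holds everything
	}
	for len(residues) > 0 {
		n := cols
		if n > len(residues) {
			n = len(residues)
		}
		sb.WriteString(residues[:n] + "\n")
		residues = residues[n:]
	}
	return sb.String()
}

func main() {
	fmt.Print(formatEntry("seq1 a long description that stays on one line", "ACGTACGTAC", 4, true))
	// Output:
	// >seq1 a long description that stays on one line
	// ACGT
	// ACGT
	// AC*
}
```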

func NewWriter

func NewWriter(w io.Writer) *Writer

NewWriter creates a new FASTA writer that can write FASTA entries to an io.Writer.

func (*Writer) Flush

func (w *Writer) Flush() error

Flush writes any buffered data to the underlying io.Writer.

func (*Writer) Write

func (w *Writer) Write(s seq.Sequence) error

Write writes a single FASTA entry to the underlying io.Writer.

You may need to call Flush in order for the changes to be written.

XXX: Currently, the sequence is not checked. Should it be?
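The reason a call to Flush may be needed can be seen with any buffered writer. The sketch below uses bufio.Writer from the standard library as a stand-in; that the FASTA Writer buffers in a comparable way is an assumption drawn from the Flush documentation, and pendingThenFlushed is a hypothetical helper.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// pendingThenFlushed (hypothetical) writes text through a bufio.Writer
// and reports how many bytes reached the destination before and after
// Flush was called.
func pendingThenFlushed(text string) (before, after int, err error) {
	var dst bytes.Buffer
	bw := bufio.NewWriter(&dst)
	if _, err = bw.WriteString(text); err != nil {
		return 0, 0, err
	}
	before = dst.Len() // small writes are still held in the buffer
	if err = bw.Flush(); err != nil {
		return before, 0, err
	}
	return before, dst.Len(), nil
}

func main() {
	before, after, err := pendingThenFlushed(">seq1\nACGT\n")
	fmt.Println(before, after, err) // prints 0 11 <nil>
}
```

A program that exits without flushing can therefore lose the final entries, which is one reason WriteAll calls Flush itself.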

func (*Writer) WriteAll

func (w *Writer) WriteAll(seqs []seq.Sequence) error

WriteAll writes a slice of FASTA entries to the underlying io.Writer, and calls Flush.