biogo is a bioinformatics library for Go

bíogo


Installation

    $ go get github.com/biogo/biogo/...

Overview

bíogo is a bioinformatics library for the Go language.

Getting help

Help or similar requests are preferred on the biogo-user Google Group.

https://groups.google.com/forum/#!forum/biogo-user

Contributing

If you find any bugs, feel free to file an issue on the GitHub issue tracker. Pull requests are welcome, though if they involve changes to the API or the addition of features, please first open a discussion at the biogo-dev Google Group.

https://groups.google.com/forum/#!forum/biogo-dev

Citing

If you use bíogo, please cite Kortschak, Snyder, Maragkakis and Adelson "bíogo: a simple high-performance bioinformatics toolkit for the Go language", doi:10.21105/joss.00167, and Kortschak and Adelson "bíogo: a simple high-performance bioinformatics toolkit for the Go language", doi:10.1101/005033.

The Purpose of bíogo

bíogo stems from the need to address the size and structure of modern genomic and metagenomic data sets. These properties enforce requirements on the libraries and languages used for analysis:

  • speed - size of data sets
  • concurrency - problems often embarrassingly parallelisable

In addition to the computational burden of massive data set sizes in modern genomics, there is an increasing need for complex pipelines to resolve questions in a tightening problem space, and a growing need to develop new algorithms that allow novel approaches to interesting questions. These issues suggest the need for simplicity in syntax to facilitate:

  • ease of coding
  • checking for correctness in development and particularly in peer review

Related to the second issue is the reluctance of some researchers to release code because of quality concerns.

The issue of code release is the first of the principles formalised in the Science Code Manifesto.

Code  All source code written specifically to process data for a published
      paper must be available to the reviewers and readers of the paper.

A language with a simple, yet expressive, syntax should facilitate development of higher quality code and thus help reduce this barrier to research code release.

Articles

bíogo: a simple high-performance bioinformatics toolkit for the Go language

Analysis of Illumina sequencing data using bíogo

Using and extending types in bíogo

Yet Another Bioinformatics Library

It seems that nearly every language has its own bioinformatics library, some of which are very mature, for example BioPerl and BioPython. Why add another one?

The different libraries excel in different fields, acting as scripting glue for applications in a pipeline (much of [1, 2, 3]) and interacting with external hosts [1, 2, 4, 5], wrapping lower level high performance languages with more user friendly syntax [1, 2, 3, 4] or providing bioinformatics functions for high performance languages [5, 6].

The intended niche for bíogo lies somewhere between the scripting libraries and high performance language libraries in being easy to use for both small and large projects while having reasonable performance with computationally intensive tasks.

The intent is to reduce the level of investment required to develop new research software for computationally intensive tasks.

  1. BioPerl http://genome.cshlp.org/content/12/10/1611.full http://www.springerlink.com/content/pp72033m171568p2

  2. BioPython http://bioinformatics.oxfordjournals.org/content/25/11/1422

  3. BioRuby http://bioinformatics.oxfordjournals.org/content/26/20/2617

  4. PyCogent http://genomebiology.com/2007/8/8/R171

  5. BioJava http://bioinformatics.oxfordjournals.org/content/24/18/2096

  6. SeqAn http://www.biomedcentral.com/1471-2105/9/11

Library Structure and Coding Style

The bíogo library structure is influenced by the Go core library.

The coding style should be aligned with normal Go idioms as represented in the Go core libraries.

Quality Scores

Quality scores are supported for all sequence types, including protein. Phred and Solexa scoring systems can be read from files; however, quality scores are represented internally as Phred scores, so there will be precision loss in conversion. A Solexa quality score type is provided for use where this would be a problem.

Copyright and License

Copyright ©2011-2013 The bíogo Authors except where otherwise noted. All rights reserved. Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.

The bíogo logo is derived from Bitstream Charter, Copyright ©1989-1992 Bitstream Inc., Cambridge, MA.

BITSTREAM CHARTER is a registered trademark of Bitstream Inc.

Comments
  • The orientation of a feature should not be defined relative to its location.

    Background: I tried to get the 5' UTR from a transcript. The UTR5start() and UTR5end() functions check the transcript orientation to decide where the 5' end is. This is wrong, as seen in the example below. Notice that the transcript orientation is Forward because it is defined relative to a gene, and this results in the wrong identification of the 5' end (the 3' end is returned instead of the 5' end).

    Transcript:          3'<------------5'       Forward
    Gene:              <---------------------    Reverse
    Chrom:       -----------------------------------
    

    Suggestion: I thought about this and realized that the problem is not the function implementation. The problem is that the transcript does not have the required information and the only option is to look down the feature chain.

    If we think of the oriented feature as a vector, the notation for its coordinates would be [x1, x2], where the first number is the start of the vector and the second number is the end of the vector. Examples:

        A: [10, 20]
        B: [20, 10]

    In this notation we can easily see that B is the reverse of A. In other words, the orientation is an invariant of the notation. If the coordinate system changes, then the actual numbers might change but the first number would always be the start and the second number the end of the vector.

    Instead, in biogo we use a different notation, like so:

        A: [10, 20], Forward
        B: [10, 20], Reverse

    Therefore, I think it's not correct to document the orientation as "// Orientation returns the orientation of the feature relative to its location." Instead we should use "// Orientation returns the orientation of the feature." and make it clear in the documentation what this actually means and that it does not depend on the feature location. This will probably result in some code changes, but I think it is required.
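The two notations discussed above can be compared with a short sketch. The Vector, Oriented and Orientation types below are hypothetical stand-ins written for this illustration; they are not the biogo feat types, and the conversion assumes the orientation is absolute rather than relative to a parent feature.

```go
package main

import "fmt"

// Orientation is a hypothetical stand-in for an orientation value.
type Orientation int

const (
	Reverse Orientation = -1
	Forward Orientation = 1
)

// Vector is the [x1, x2] notation: From is where the feature begins
// and To is where it ends, so direction is implied by their order.
type Vector struct{ From, To int }

// Oriented is the [start, end], orientation notation with Start <= End.
type Oriented struct {
	Start, End int
	Orient     Orientation
}

// Oriented converts the vector notation to start/end plus orientation.
func (v Vector) Oriented() Oriented {
	if v.From <= v.To {
		return Oriented{v.From, v.To, Forward}
	}
	return Oriented{v.To, v.From, Reverse}
}

// Vector converts back; the orientation alone decides which end is first.
func (o Oriented) Vector() Vector {
	if o.Orient == Reverse {
		return Vector{o.End, o.Start}
	}
	return Vector{o.Start, o.End}
}

func main() {
	a := Vector{10, 20}
	b := Vector{20, 10}
	fmt.Printf("A: %+v -> %+v\n", a, a.Oriented())
	fmt.Printf("B: %+v -> %+v\n", b, b.Oriented())
}
```

The round trip only works if the orientation stored with the feature does not depend on anything outside the feature, which is the point the suggestion makes.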

    Any thoughts?

  • Issues about filter BAM record

    Hi, first of all, thanks for all this amazing work. Recently I have been trying to filter some records out of a valid BAM file, but I can't deal with the headers properly. The code I used is as follows. BTW, I also tried to create a new sam.Header, but really can't figure out how to do it.

    	f, err := os.Open(input_bam)
    	if err != nil {
    		log.Fatalf("could not open file %q: %v\n", input_bam, err)
    	}
    	defer f.Close()
    	ok, err := bgzf.HasEOF(f)
    	if err != nil {
    		log.Fatalf("could not check %q for the bgzf EOF block: %v\n", input_bam, err)
    	}
    	if !ok {
    		log.Printf("file %q has no bgzf magic block: may be truncated\n", input_bam)
    	}
    
    	b, err := bam.NewReader(f, threads)
    	if err != nil {
    		log.Fatalf("could not read bam: %q\n", err)
    	}
    	defer b.Close()
    
    
    	fo, err := os.OpenFile(output_bam, os.O_WRONLY|os.O_CREATE, 0644)
    	if err != nil {
    		log.Fatalf("could not open file %q: %v\n", output_bam, err)
    	}
    	defer fo.Close()

    	// I need a header that exactly matches the BAM records, so I try to filter out header Refs here.
    	header := b.Header().Clone()
    
    	removedRef := make([]*sam.Reference, 0)
    	for _, i := range header.Refs() {
    		if _, ok := rna[i.Name()]; !ok {
    			removedRef = append(removedRef, i)
    		}
    	}
    
    	for _, i := range removedRef {
    		err = header.RemoveReference(i)
    		if err != nil {
    			log.Fatalf("remove header reference failed, %v\n", err)
    		}
    	}
    
    	w, err := bam.NewWriter(fo, header, threads)
    	if err != nil {
    		log.Fatalf("could not create BAM writer for %q: %v\n", output_bam, err)
    	}
    	defer w.Close()
    
    	// Iterate over all BAM records.
    	for {
    		rec, err := b.Read()
    		if err == io.EOF {
    			break
    		}
    		if err != nil {
    			log.Fatalf("error reading bam: %v", err)
    		}
    
    		if _, ok := rna[rec.Ref.String()]; ok {
    			err = w.Write(rec)
    
    			if err != nil {
    				log.Fatal(err)
    			}
    		}
    
    	}
    
  • code.google.com imports causing problems in go1.8

    not sure if this is specific to 1.8, but I see:

    brentp@x1:~/go/src/github.com/biogo/biogo/io/seqio$ go test
    ../../seq/annotation.go:8:2: cannot find package "code.google.com/p/biogo/alphabet" in any of:
    	/home/brentp/go/go1.8beta1/go/src/code.google.com/p/biogo/alphabet (from $GOROOT)
    	/home/brentp/go/src/code.google.com/p/biogo/alphabet (from $GOPATH)
    ../../seq/annotation.go:9:2: cannot find package "code.google.com/p/biogo/feat" in any of:
    	/home/brentp/go/go1.8beta1/go/src/code.google.com/p/biogo/feat (from $GOROOT)
    	/home/brentp/go/src/code.google.com/p/biogo/feat (from $GOPATH)
    brentp@x1:~/go/src/github.com/biogo/biogo/io/seqio$ go version
    go version go1.8beta1 linux/amd64
    

    you can also see this with:

    go get github.com/biogo/biogo/...
    
  • Add a method to determine whether a file is bgzf or not

    I use many compressors (zip, gzip, pgzip, bgzf) and need to know which 
    format the underlying file is in.
    For example, if I download a bgzf file I need to take a code path that is able 
    to seek inside the file, while for gzip/pgzip I need to switch to other things 
    (like enabling more CPUs or not).
    Is it possible to add such a method?
    

    Original issue reported on code.google.com by [email protected] on 15 Feb 2015 at 12:13

  • biogo.bam fails to iterate over valid BAM file

    Please check that you are using the latest version of bíogo: execute `git
    describe --always' in your bíogo repository and check that it matches the
    latest master at http://code.google.com/p/biogo/source/browse/
    
    What steps will reproduce the problem? (If possible please include a
    program that is a minimal self-contained reproducing case).
    1. Tried to run play.go over play.bam, iterating through each Record and 
    printing its QNAME
    
    
    What is the expected output?
    
    A listing of all QNAME in the BAM file in stdout
    
    What do you see instead?
    
    After printing three QNAME, got an error saying "truncated sequence"
    
    
    What version of the product are you using (`git describe --always')? Which
    version of Go is being used (`go version')? On what operating system?
    
    Using biogo.bam @ 53b55fc, Go version 1.0.3
    
    Please provide any additional information below.
    
    BAM file seems to be valid; samtools view can display it without any problems.
    

    Original issue reported on code.google.com by [email protected] on 8 May 2013 at 8:38

  • align: improve documentation for accessing alignment scores

    In the structure holding the pairwise alignment results in biogo/align/align.go, the alignment score is not exported and the featPair type is private to the package.

        type featPair struct {
            a, b  feature
            score int
        }

    As the alignment methods return []feat.Pair, there is no way to access the alignment score as far as I can see. Is that right? If yes, it would be nice to have a way to do that (other than parsing the string representation of the featPair objects).

  • Testsuite is failing

    Hi @kortschak

    Thanks a lot for your work on biogo. I maintain this as a package in Debian, and during a recent re-build, tests have started to fail for biogo, in particular these:

    === RUN   Test
    
    ----------------------------------------------------------------------
    FAIL: errors_test.go:44: S.TestCaller
    
    errors_test.go:49:
        c.Check(ln, check.Equals, 45)
    ... obtained int = 46
    ... expected int = 45
    
    === RUN   Test
    
    ----------------------------------------------------------------------
    FAIL: fai_test.go:27: S.TestReadFrom
    
    fai_test.go:178:
        c.Assert(err, check.DeepEquals, t.err)
    ... obtained *csv.ParseError = &csv.ParseError{StartLine:8, Line:8, Column:1, Err:(*errors.errorString)(0xc0000563f0)} ("record on line 8: wrong number of fields")
    ... expected *csv.ParseError = &csv.ParseError{StartLine:8, Line:8, Column:0, Err:(*errors.errorString)(0xc0000563f0)} ("record on line 8: wrong number of fields")
    ... Difference:
    ...     Column: 1 != 0
    
    
    OOPS: 0 passed, 1 FAILED
    --- FAIL: Test (0.00s)
    FAIL
    FAIL	github.com/biogo/biogo/io/seqio/fai	0.023s
    

    The full log can be found here. There is some delta in the column values that seems to trigger this, but I cannot do much beyond this point. Please consider fixing this, and thanks again!
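The log above shows the expected and obtained csv.ParseError values differing only in Column, which appears to depend on the Go version in use. As a hedged sketch (not the fix applied in biogo), a check can compare the fields that are stable instead of the whole struct. The helper name fieldCountError is hypothetical, and the tab-separated input is an assumption chosen to resemble a tabular index file.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
)

// fieldCountError parses tab-separated input and returns the
// *csv.ParseError for a wrong-number-of-fields failure, or nil if the
// input parsed cleanly or failed for a different reason.
func fieldCountError(input string) *csv.ParseError {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t'
	_, err := r.ReadAll()
	var pe *csv.ParseError
	if errors.As(err, &pe) && errors.Is(pe.Err, csv.ErrFieldCount) {
		return pe
	}
	return nil
}

func main() {
	// The second record has one field where two are expected.
	pe := fieldCountError("a\tb\nc\n")
	if pe != nil {
		// Column is printed for information only; a test would avoid
		// asserting on it.
		fmt.Printf("line %d, column %d: %v\n", pe.Line, pe.Column, pe.Err)
	}
}
```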

  • End coordinate in biogo.bam is off by one

    In record.go (biogo.bam), the End() function is supposed to return the end 
    coordinate but it's off by one.
    
    The first nucleotide is counted twice: by r.Pos and the match.
    
    Solution:
    end := r.Pos
    should be replaced by
    end := r.Pos - 1
    
    

    Original issue reported on code.google.com by [email protected] on 13 Aug 2014 at 11:31
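Whether starting from r.Pos or r.Pos - 1 is correct depends on the coordinate convention of the returned value. The sketch below assumes a 0-based start (which is how BAM stores positions) and a half-open end, and shows how the same alignment reads under a 1-based closed convention. The cigarOp type and the refEnd function are hypothetical stand-ins for illustration and do not describe the biogo implementation.

```go
package main

import "fmt"

// cigarOp is a hypothetical stand-in for a CIGAR operation: kind is the
// SAM operation character and n is its length.
type cigarOp struct {
	kind byte
	n    int
}

// consumesRef reports whether a CIGAR operation advances the reference
// position (M, D, N, = and X in the SAM specification).
func consumesRef(kind byte) bool {
	switch kind {
	case 'M', 'D', 'N', '=', 'X':
		return true
	}
	return false
}

// refEnd returns the half-open end of an alignment that starts at the
// 0-based reference position pos.
func refEnd(pos int, cigar []cigarOp) int {
	end := pos
	for _, op := range cigar {
		if consumesRef(op.kind) {
			end += op.n
		}
	}
	return end
}

func main() {
	cigar := []cigarOp{{'M', 3}, {'I', 2}, {'M', 4}}
	pos := 10
	end := refEnd(pos, cigar)
	fmt.Printf("0-based half-open: [%d, %d)\n", pos, end)
	fmt.Printf("1-based closed:    [%d, %d]\n", pos+1, end)
}
```

Under the half-open reading the last aligned base is at end-1, so no base is counted twice; subtracting one is only needed if the caller wants a 0-based closed end.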

  • New Overlap method for Record

    Record is missing a method to compute the overlap between the record and user 
    range.
    
    Could you please add it to biogo.bam?
    
    If you could write tests that would be nice. Thanks
    
    Also min and max might already be somewhere in biogo.
    
    func min(a, b int) int {
        if a > b {
            return b
        }
        return a
    }
    
    func max(a, b int) int {
        if a < b {
            return b
        }
        return a
    }
    
    // Overlap returns the length of the overlap between the alignment of the read
    // and the interval specified by the start and end on the reference sequence.
    func (r *Record) Overlap(start int, end int) int {
        var overlap, o int
        pos := r.Pos
        for _, co := range r.Cigar {
            t := co.Type()
            l := co.Len()
            if consume[t].query && consume[t].ref {
                o = min(pos + l, end) - max(pos, start)
                if o > 0 {
                    overlap += o
                }
            }
            // Only operations that consume the reference advance the reference position.
            if consume[t].ref {
                pos += l
            }
        }
        return overlap
    }
    

    Original issue reported on code.google.com by [email protected] on 14 Aug 2014 at 1:45

  • How to create a new QSeq using an old one?

    Hello, I am new to Go so apologies for the newbie question. I'm benchmarking Go vs Python to do a simple string manipulation in the ID section of a FASTQ, to turn 2:N:0:0|SEQORIENT=F|PRIMER=RT_IgM_long_12N|BARCODE=TCGGAAAT,ACGGCAGA into 2:N:0:0_SEQORIENT=F_PRIMER=RT_IgM_long_12N_BARCODE:TCGGAAAT (for read 1, the other side of the comma for read 2). Here is my full program, including a test dataset. It can be run with:

    ./format_barcodes BX-R1_primers-pass_pair-pass_first3.fastq 1 test_output.fastq
    

    If seq, err = reader.Read(), then I thought this would work:

    seq_replaced := linear.NewQSeq(new_id, seq.Seq, seq.Alphabet(), seq.Encode)
    

    I'm getting the error that there is no field or function Seq or Encode:

    format_barcodes/main.go:77:44: seq.Seq undefined (type seq.Sequence has no field or method Seq)
    format_barcodes/main.go:77:69: seq.Encode undefined (type seq.Sequence has no field or method Encode)
    

    But I see those fields in the debugger of GoLand:

    [Screenshot of the GoLand debugger, 2018-04-16]

    How can I access these fields? I'm getting very confused between what is a seq.Sequence vs a linear.QSeq

    Thank you! Warmest, Olga
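The compiler errors quoted above are the usual symptom of holding a value through an interface: the debugger shows the fields of the concrete value stored inside it, but the compiler only allows the methods declared on the interface type. A type assertion recovers the concrete type. The sketch below uses hypothetical stand-in types (Sequence, QSeq, read) rather than the real seq.Sequence and linear.QSeq, so field names and constructors are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Sequence stands in for an interface type returned by a reader.
type Sequence interface {
	Name() string
}

// QSeq stands in for a concrete sequence type with exported fields.
type QSeq struct {
	ID   string
	Seq  []byte
	Qual []byte
}

// Name satisfies the Sequence interface.
func (q *QSeq) Name() string { return q.ID }

// read simulates a reader that returns the interface type.
func read() (Sequence, error) {
	return &QSeq{ID: "2:N:0:0|SEQORIENT=F", Seq: []byte("ACGT"), Qual: []byte("IIII")}, nil
}

// renamed asserts the concrete type and builds a new value with a
// rewritten ID, reusing the sequence and quality data.
func renamed(s Sequence) (*QSeq, error) {
	q, ok := s.(*QSeq)
	if !ok {
		return nil, fmt.Errorf("unexpected sequence type %T", s)
	}
	return &QSeq{ID: strings.ReplaceAll(q.ID, "|", "_"), Seq: q.Seq, Qual: q.Qual}, nil
}

func main() {
	s, err := read()
	if err != nil {
		fmt.Println(err)
		return
	}
	q, err := renamed(s)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(q.ID, string(q.Seq))
}
```

The two-value form of the assertion avoids a panic if the reader was set up with a different concrete sequence type.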

  • align: add semi-global alignment

    I recently wrote an NW (Needleman-Wunsch) implementation with penalty-free end gaps for semi-global alignment, before I discovered biogo. Any interest in accepting such a thing, if contributed? (I just want to check in before putting in much effort.)
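For readers unfamiliar with the idea, the change from global to semi-global alignment is small: leading gaps are made free by initialising the first row and column of the dynamic programming table to zero, and trailing gaps are made free by taking the best score anywhere in the last row or last column. The sketch below computes only the score, with a linear gap penalty. It is an illustration under those assumptions, with a hypothetical function name, and is neither the contributor's implementation nor the biogo align API.

```go
package main

import "fmt"

// semiGlobalScore returns the best Needleman-Wunsch score for a and b
// when gaps at either end of either sequence are not penalised.
func semiGlobalScore(a, b string, match, mismatch, gap int) int {
	// Row 0 is all zeros: leading gaps in a are free.
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	best := 0 // the empty overlap always scores 0
	for i := 1; i <= len(a); i++ {
		curr[0] = 0 // leading gaps in b are free
		for j := 1; j <= len(b); j++ {
			s := mismatch
			if a[i-1] == b[j-1] {
				s = match
			}
			v := prev[j-1] + s
			if up := prev[j] + gap; up > v {
				v = up
			}
			if left := curr[j-1] + gap; left > v {
				v = left
			}
			curr[j] = v
		}
		// Last column: trailing gaps in a are free.
		if curr[len(b)] > best {
			best = curr[len(b)]
		}
		prev, curr = curr, prev
	}
	// Last row (now held in prev): trailing gaps in b are free.
	for _, v := range prev {
		if v > best {
			best = v
		}
	}
	return best
}

func main() {
	fmt.Println(semiGlobalScore("ACGT", "TTACGTTT", 1, -1, -1))
}
```

Recovering the alignment itself would additionally need a traceback from the cell where the best score was found.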

  • example of getting the consensus of multiple sequences?

    I have many similar sequences of different lengths and need to get their consensus. From the docs, the seq module should work for this, but is there an example?

  • Using custom-defined errors

    Currently, errors are returned as a plain error. For example, fasta.Reader can return an I/O error, a badly formed line error, or a badly formed header error. The error string is sufficient for a human to recognize where the error occurred. However, dealing with the error programmatically requires a string-matching library, since all errors have the same type.

    The article "Error handling and Go" gives some examples of custom-defined errors: a struct that satisfies the error interface is defined, and additional details about the error can be included in the struct. Those details can also be included in the error string through a custom-defined Error() method. When handling the error, a type assertion can be used to figure out what went wrong.

    Is it possible to introduce custom-defined errors in a future release of biogo?

  • align: a better way to do formatting exists

    The align.feature type contains a loc field (as it should). It would have been sensible (and was probably my intention) to point the location of each alignment segment at the input sequence value. This can still be done, though not in a nice way because of the types I went with when the API was 'designed'.

  • Nice, but not friendly

    Hi guys,

    I've looked at a few files and it looks really nice. The problem is that your readme does not say what the library is about. Non-academic bioinformaticians won't click on the links to your papers, and people bounce because they do not see any nice example or anything that will ignite their interest. Please add at least some basic examples that show people what it is about (and maybe a feature list?!).

    Thank you guys. Keep up the good work, we need better languages in bioinformatics than C++ and Perl.
