@gmod/gff

v1.3.0

read and write GFF3 data as streams

Read and write GFF3 data performantly. This module aims to be a complete implementation of the GFF3 specification.

  • streaming parsing and streaming formatting
  • proper escaping and unescaping of attribute and column values
  • supports features with multiple locations and features with multiple parents
  • reconstructs feature hierarchies of both Parent and Derives_from relationships
  • parses FASTA sections
  • does no validation except for referential integrity of Parent and Derives_from relationships
  • only compatible with GFF3

Install

$ npm install --save @gmod/gff

Usage

// CommonJS
const gff = require('@gmod/gff').default
// or, as an ES module (recommended), use this line instead:
// import gff from '@gmod/gff'

const fs = require('fs')

// parse a file from a file name
// parses only features and sequences by default,
// set options to parse directives and/or comments
fs.createReadStream('path/to/my/file.gff3')
  .pipe(gff.parseStream({ parseAll: true }))
  .on('data', (data) => {
    if (data.directive) {
      console.log('got a directive', data)
    } else if (data.comment) {
      console.log('got a comment', data)
    } else if (data.sequence) {
      console.log('got a sequence from a FASTA section')
    } else {
      console.log('got a feature', data)
    }
  })

// parse a string of gff3 synchronously
const stringOfGFF3 = fs.readFileSync('my_annotations.gff3').toString()
const arrayOfThings = gff.parseStringSync(stringOfGFF3)

// format an array of items to a string
const newStringOfGFF3 = gff.formatSync(arrayOfThings)

// format a stream of things to a stream of text.
// inserts sync marks automatically.
myStreamOfGFF3Objects
  .pipe(gff.formatStream())
  .pipe(fs.createWriteStream('my_new.gff3'))

// format a stream of things and write it to
// a gff3 file. inserts sync marks and a
// '##gff-version 3' header if one is not
// already present
gff.formatFile(
  myStreamOfGFF3Objects,
  fs.createWriteStream('my_new_2.gff3', { encoding: 'utf8' }),
)

Object format

features

In GFF3, features can have more than one location. We parse features as arrays of all the lines that share that feature's ID. Column values that are . in the GFF3 are null in the output.

A simple feature that's located in just one place:

[
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "gene",
    "start": 1000,
    "end": 9000,
    "score": null,
    "strand": "+",
    "phase": null,
    "attributes": {
      "ID": [
        "gene00001"
      ],
      "Name": [
        "EDEN"
      ]
    },
    "child_features": [],
    "derived_features": []
  }
]

A CDS called cds00001 located in two places:

[
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "CDS",
    "start": 1201,
    "end": 1500,
    "score": null,
    "strand": "+",
    "phase": "0",
    "attributes": {
      "ID": ["cds00001"],
      "Parent": ["mRNA00001"]
    },
    "child_features": [],
    "derived_features": []
  },
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "CDS",
    "start": 3000,
    "end": 3902,
    "score": null,
    "strand": "+",
    "phase": "0",
    "attributes": {
      "ID": ["cds00001"],
      "Parent": ["mRNA00001"]
    },
    "child_features": [],
    "derived_features": []
  }
]
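
To make the multi-location representation concrete, here is a minimal sketch of grouping already-parsed feature lines by their ID attribute. It is not the library's implementation: the helper name groupLinesById is hypothetical, and it assumes that a line without an ID becomes a single-line feature of its own.

```javascript
// Hypothetical helper: group parsed feature lines that share an ID
// into one array per feature, preserving first-seen order.
function groupLinesById(lines) {
  const byId = new Map()
  const features = []
  for (const line of lines) {
    // Optional chaining yields undefined when attributes or ID is missing.
    const id = line.attributes?.ID?.[0]
    if (id === undefined) {
      // Assumption: a line without an ID is a feature on its own.
      features.push([line])
      continue
    }
    let feature = byId.get(id)
    if (!feature) {
      feature = []
      byId.set(id, feature)
      features.push(feature)
    }
    feature.push(line)
  }
  return features
}

const grouped = groupLinesById([
  { type: 'CDS', start: 1201, end: 1500, attributes: { ID: ['cds00001'] } },
  { type: 'gene', start: 1000, end: 9000, attributes: { ID: ['gene00001'] } },
  { type: 'CDS', start: 3000, end: 3902, attributes: { ID: ['cds00001'] } },
])
console.log(grouped.map((feature) => feature.length)) // [ 2, 1 ]
```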

directives

parseDirective("##gff-version 3\n")
// returns
{
  "directive": "gff-version",
  "value": "3"
}
parseDirective('##sequence-region ctg123 1 1497228\n')
// returns
{
  "directive": "sequence-region",
  "value": "ctg123 1 1497228",
  "seq_id": "ctg123",
  "start": "1",
  "end": "1497228"
}
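
The shape of these return values can be illustrated with a small stand-alone sketch. The function name parseDirectiveSketch and its regular expression are assumptions made for illustration, not the library's code; it only handles the generic case and the sequence-region case shown above.

```javascript
// Hypothetical sketch of directive parsing; not the library's implementation.
function parseDirectiveSketch(line) {
  const match = /^##\s*(\S+)\s*(.*)$/.exec(line.trim())
  // "###" is a synchronization mark, not a directive.
  if (!match || match[1].startsWith('#')) return null
  const [, directive, value] = match
  const parsed = { directive, value }
  if (directive === 'sequence-region') {
    // Coordinates are kept as strings, as in the example output above.
    const [seq_id, start, end] = value.split(/\s+/)
    Object.assign(parsed, { seq_id, start, end })
  }
  return parsed
}

console.log(parseDirectiveSketch('##gff-version 3\n'))
console.log(parseDirectiveSketch('##sequence-region ctg123 1 1497228\n'))
```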

comments

parseComment('# hi this is a comment\n')
// returns
{
  "comment": "hi this is a comment"
}

sequences

These come from any embedded ##FASTA section in the GFF3 file.

parseSequences(`##FASTA
>ctgA test contig
ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA`)
// returns
[
  {
    "id": "ctgA",
    "description": "test contig",
    "sequence": "ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA"
  }
]
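
As a rough illustration of how a FASTA section maps onto these objects, the following sketch splits the text into records by header line. parseFastaSketch is a hypothetical name, and details such as leaving the description as an empty string when the header has none are assumptions rather than documented library behavior.

```javascript
// Hypothetical sketch: turn the text of a ##FASTA section into
// { id, description, sequence } objects. Not the library's implementation.
function parseFastaSketch(text) {
  const sequences = []
  let current = null
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim()
    if (line === '' || line.startsWith('##FASTA')) continue
    if (line.startsWith('>')) {
      // Header: the first token is the ID, the remainder is the description.
      const [, id, description] = /^>\s*(\S*)\s*(.*)$/.exec(line)
      current = { id, description, sequence: '' }
      sequences.push(current)
    } else if (current) {
      // Sequence data may span several lines; concatenate them.
      current.sequence += line
    }
  }
  return sequences
}

console.log(parseFastaSketch('##FASTA\n>ctgA test contig\nACTGACTAGC\nTAGCATCAGC\n'))
```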

API

ParseOptions

Parser options

encoding

Text encoding of the input GFF3. default 'utf8'

Type: BufferEncoding

parseFeatures

Whether to parse features, default true

Type: boolean

parseDirectives

Whether to parse directives, default false

Type: boolean

parseComments

Whether to parse comments, default false

Type: boolean

parseSequences

Whether to parse sequences, default true

Type: boolean

parseAll

Parse all features, directives, comments, and sequences. Overrides other parsing options. Default false.

Type: boolean

bufferSize

Maximum number of GFF3 lines to buffer, default 1000

Type: number
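
The way these options combine can be sketched as follows. The defaults are the ones listed above; the function name resolveParseOptions and the merge strategy are assumptions for illustration, not the library's internals.

```javascript
// Defaults as documented above.
const DEFAULT_PARSE_OPTIONS = {
  encoding: 'utf8',
  parseFeatures: true,
  parseDirectives: false,
  parseComments: false,
  parseSequences: true,
  parseAll: false,
  bufferSize: 1000,
}

// Hypothetical helper: merge user options over the defaults and let
// parseAll override the individual parse* switches.
function resolveParseOptions(options = {}) {
  const resolved = { ...DEFAULT_PARSE_OPTIONS, ...options }
  if (resolved.parseAll) {
    resolved.parseFeatures = true
    resolved.parseDirectives = true
    resolved.parseComments = true
    resolved.parseSequences = true
  }
  return resolved
}

// parseAll wins even when another switch is explicitly turned off.
console.log(resolveParseOptions({ parseAll: true, parseComments: false }).parseComments) // true
```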

parseStream

Parse a stream of text data into a stream of feature, directive, comment, and sequence objects.

Parameters

Returns GFFTransform stream (in objectMode) of parsed items

parseStringSync

Synchronously parse a string containing GFF3 and return an array of the parsed items.

Parameters

  • str string GFF3 string
  • inputOptions ({encoding: BufferEncoding?, bufferSize: number?} | undefined)? Parsing options

Returns Array<(GFF3Feature | GFF3Sequence)> array of parsed features, directives, comments and/or sequences

formatSync

Format an array of GFF3 items (features, directives, comments) into a string of GFF3. Does not insert synchronization (###) marks.

Parameters

  • items Array<GFF3Item> Array of features, directives, comments and/or sequences

Returns string the formatted GFF3

formatStream

Format a stream of features, directives, comments and/or sequences into a stream of GFF3 text.

Inserts synchronization (###) marks automatically.

Parameters

  • options FormatOptions formatting options (optional, default {})

Returns FormattingTransform

formatFile

Format a stream of features, directives, comments and/or sequences into a GFF3 file and write it to the filesystem.

Inserts synchronization (###) marks and a ##gff-version directive automatically (if one is not already present).

Parameters

  • stream Readable the stream to write to the file
  • writeStream Writable
  • options FormatOptions formatting options (optional, default {})
  • filename the file path to write to

Returns Promise<null> promise for null that resolves when the stream has been written

About util

There is also a util module that contains super-low-level functions for dealing with lines and parts of lines.

// non-ES6
const util = require('@gmod/gff').default.util
// or, with ES6, use these two lines instead:
// import gff from '@gmod/gff'
// const util = gff.util

const gff3Lines = util.formatItem({
  seq_id: 'ctgA',
  ...
})

util

unescape

Unescape a string value used in a GFF3 attribute.

Parameters

  • stringVal string Escaped GFF3 string value

Returns string An unescaped string value

escape

Escape a value for use in a GFF3 attribute value.

Parameters

Returns string An escaped string value

escapeColumn

Escape a value for use in a GFF3 column value.

Parameters

Returns string An escaped column value
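
For readers unfamiliar with GFF3 escaping, here is a minimal sketch of the idea behind unescape, escape and escapeColumn. It follows the percent-encoding rules of the GFF3 specification as I understand them (control characters and % everywhere, plus ; = & , inside attribute values). The function names are hypothetical and the exact character sets used by the library may differ.

```javascript
// Percent-encode a single character as %XX (uppercase hex).
function percentEncode(char) {
  return '%' + char.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0')
}

// Column values: escape control characters (including tab and newline) and %.
function escapeColumnSketch(value) {
  return String(value).replace(/[\x00-\x1f\x7f%]/g, percentEncode)
}

// Attribute values: additionally escape the attribute delimiters ; = & ,
function escapeAttributeSketch(value) {
  return String(value).replace(/[\x00-\x1f\x7f%;=&,]/g, percentEncode)
}

// Reverse any %XX sequence back to its character.
function unescapeSketch(value) {
  return value.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16)),
  )
}

console.log(escapeAttributeSketch('a;b=c')) // a%3Bb%3Dc
console.log(escapeColumnSketch('a;b\tc')) // a;b%09c
console.log(unescapeSketch('a%3Bb%3Dc')) // a;b=c
```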

parseAttributes

Parse the 9th column (attributes) of a GFF3 feature line.

Parameters

  • attrString string String of GFF3 9th column

Returns GFF3Attributes Parsed attributes
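
The structure of the 9th column (semicolon-separated tag=value pairs, with comma-separated multiple values) can be sketched like this. parseAttributesSketch and decodePercent are hypothetical names, and returning an empty object for a . column is an assumption of this sketch rather than documented behavior.

```javascript
// Reverse %XX escapes (same idea as util.unescape).
const decodePercent = (text) =>
  text.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) => String.fromCharCode(parseInt(hex, 16)))

// Hypothetical sketch: parse "tag=v1,v2;tag2=v3" into { tag: [v1, v2], tag2: [v3] }.
function parseAttributesSketch(attrString) {
  const attributes = {}
  if (!attrString || attrString.trim() === '.') return attributes
  for (const pair of attrString.replace(/\r?\n$/, '').split(';')) {
    const eq = pair.indexOf('=')
    if (eq === -1) continue // skip empty or malformed entries
    const tag = decodePercent(pair.slice(0, eq).trim())
    // Split on commas before unescaping so an escaped %2C stays inside its value.
    const values = pair
      .slice(eq + 1)
      .split(',')
      .map((value) => decodePercent(value.trim()))
    if (!Object.prototype.hasOwnProperty.call(attributes, tag)) attributes[tag] = []
    attributes[tag].push(...values)
  }
  return attributes
}

console.log(parseAttributesSketch('ID=cds00001;Parent=mRNA00001,mRNA00002;Note=a%3Bb'))
```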

parseFeature

Parse a GFF3 feature line

Parameters

Returns GFF3FeatureLine The parsed feature

parseDirective

Parse a GFF3 directive line.

Parameters

  • line string GFF3 directive line

Returns (GFF3Directive | GFF3SequenceRegionDirective | GFF3GenomeBuildDirective | null) The parsed directive

formatAttributes

Format an attributes object into a string suitable for the 9th column of GFF3.

Parameters

Returns string GFF3 9th column string

formatFeature

Format a feature object or array of feature objects into one or more lines of GFF3.

Parameters

Returns string A string of one or more GFF3 lines
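
Going the other way, a single feature line maps back to nine tab-separated columns, with null written as a dot. The sketch below shows only that mapping. formatFeatureLineSketch is a hypothetical name, and escaping and child features are deliberately left out to keep it short, so it is not equivalent to the library's formatFeature.

```javascript
// The first eight GFF3 columns, in file order.
const COLUMNS = ['seq_id', 'source', 'type', 'start', 'end', 'score', 'strand', 'phase']

// Hypothetical sketch: format one feature line object as one GFF3 line.
// Assumption: values contain no characters that need escaping.
function formatFeatureLineSketch(line) {
  const columns = COLUMNS.map((name) =>
    line[name] === null || line[name] === undefined ? '.' : String(line[name]),
  )
  const attributes = Object.entries(line.attributes || {})
    .filter(([, values]) => Array.isArray(values))
    .map(([tag, values]) => `${tag}=${values.join(',')}`)
    .join(';')
  columns.push(attributes || '.')
  return columns.join('\t') + '\n'
}

// Prints the gene from the "Object format" section as a single GFF3 line.
process.stdout.write(
  formatFeatureLineSketch({
    seq_id: 'ctg123',
    source: null,
    type: 'gene',
    start: 1000,
    end: 9000,
    score: null,
    strand: '+',
    phase: null,
    attributes: { ID: ['gene00001'], Name: ['EDEN'] },
  }),
)
```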

formatDirective

Format a directive into a line of GFF3.

Parameters

Returns string A directive line string

formatComment

Format a comment into a GFF3 comment. Yes I know this is just adding a # and a newline.

Parameters

Returns string A comment line string

formatSequence

Format a sequence object as FASTA

Parameters

Returns string Formatted single FASTA sequence string

formatItem

Format a directive, comment, sequence, or feature, or array of such items, into one or more lines of GFF3.

Parameters

Returns (string | Array<string>) A formatted string or array of strings

GFF3Attributes

A record of GFF3 attribute identifiers and the values of those identifiers

Type: Record<string, (Array<string> | undefined)>

GFF3FeatureLine

A representation of a single line of a GFF3 file

seq_id

The ID of the landmark used to establish the coordinate system for the current feature

Type: (string | null)

source

A free text qualifier intended to describe the algorithm or operating procedure that generated this feature

Type: (string | null)

type

The type of the feature

Type: (string | null)

start

The start coordinates of the feature

Type: (number | null)

end

The end coordinates of the feature

Type: (number | null)

score

The score of the feature

Type: (number | null)

strand

The strand of the feature

Type: (string | null)

phase

For features of type "CDS", the phase indicates where the next codon begins relative to the 5' end of the current CDS feature

Type: (string | null)

attributes

Feature attributes

Type: (GFF3Attributes | null)

GFF3FeatureLineWithRefs

Extends GFF3FeatureLine

A GFF3 Feature line that includes references to other features defined in their "Parent" or "Derives_from" attributes

child_features

An array of child features

Type: Array<GFF3Feature>

derived_features

An array of features derived from this feature

Type: Array<GFF3Feature>

GFF3Feature

A GFF3 feature, which may include multiple individual feature lines

Type: Array<GFF3FeatureLineWithRefs>
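
To show how these types fit together, here is a minimal in-memory sketch of reconstructing a Parent hierarchy. It assumes all features are already available as arrays of lines with empty child_features arrays, and that every line of a parent references the same child feature. The name linkParentsSketch is hypothetical, Derives_from is omitted, and the library itself works on a stream rather than on a complete array.

```javascript
// Hypothetical sketch: attach each feature to the child_features of its
// Parent(s) and return the features that have no parent.
function linkParentsSketch(features) {
  // Index features by ID so parents can be found regardless of file order.
  const byId = new Map()
  for (const feature of features) {
    const id = feature[0].attributes?.ID?.[0]
    if (id !== undefined) byId.set(id, feature)
  }
  const topLevel = []
  for (const feature of features) {
    const parentIds = feature[0].attributes?.Parent ?? []
    if (parentIds.length === 0) {
      topLevel.push(feature)
      continue
    }
    for (const parentId of parentIds) {
      const parent = byId.get(parentId)
      // Referential integrity: a Parent must refer to a known feature.
      if (!parent) throw new Error(`Parent "${parentId}" not found`)
      for (const parentLine of parent) parentLine.child_features.push(feature)
    }
  }
  return topLevel
}

const makeLine = (type, attributes) => ({ type, attributes, child_features: [] })
const gene = [makeLine('gene', { ID: ['gene00001'] })]
const mrna = [makeLine('mRNA', { ID: ['mRNA00001'], Parent: ['gene00001'] })]
const cds = [
  makeLine('CDS', { ID: ['cds00001'], Parent: ['mRNA00001'] }),
  makeLine('CDS', { ID: ['cds00001'], Parent: ['mRNA00001'] }),
]
const topLevel = linkParentsSketch([cds, mrna, gene])
console.log(topLevel.length) // 1
console.log(gene[0].child_features[0] === mrna) // true
```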

GFF3Directive

A GFF3 directive

directive

The name of the directive

Type: string

value

The string value of the directive

Type: string

GFF3SequenceRegionDirective

Extends GFF3Directive

A GFF3 sequence-region directive

value

The string value of the directive

Type: string

seq_id

The sequence ID parsed from the directive

Type: string

start

The sequence start parsed from the directive

Type: string

end

The sequence end parsed from the directive

Type: string

GFF3GenomeBuildDirective

Extends GFF3Directive

A GFF3 genome-build directive

value

The string value of the directive

Type: string

source

The genome build source parsed from the directive

Type: string

buildName

The genome build name parsed from the directive

Type: string

GFF3Comment

A GFF3 comment

comment

The text of the comment

Type: string

GFF3Sequence

A GFF3 FASTA single sequence

id

The ID of the sequence

Type: string

description

The description of the sequence

Type: string

sequence

The sequence

Type: string

License

MIT © Robert Buels