@gmod/gtf

v0.0.9

read and write GTF data as streams

GTF, the Gene Transfer Format, is identical to GFF version 2. This module reads and writes GTF data and aims to be a complete implementation of the GTF specification.

  • streaming parsing and streaming formatting
  • creates transcript features with child_features
  • only compatible with GTF

Note: For JBrowse, we generally encourage GFF3 over GTF

For GFF3, check out the @gmod/gff-js package.

Install

$ npm install --save @gmod/gtf

Usage


import fs from 'fs'
import gtf from '@gmod/gtf'

// parse a file from a file name
gtf.parseFile('path/to/my/file.gtf', { parseAll: true })
  .on('data', data => {
    if (data.directive) {
      console.log('got a directive', data)
    } else if (data.comment) {
      console.log('got a comment', data)
    } else if (data.sequence) {
      console.log('got a sequence from a FASTA section')
    } else {
      console.log('got a feature', data)
    }
  })

// parse a stream of GTF text
fs.createReadStream('path/to/my/file.gtf')
  .pipe(gtf.parseStream())
  .on('data', data => {
    console.log('got item', data)
  })
  .on('end', () => {
    console.log('done parsing!')
  })

// parse a string of gtf synchronously
const stringOfGTF = fs.readFileSync('my_annotations.gtf').toString()
const arrayOfThings = gtf.parseStringSync(stringOfGTF)

// format an array of items to a string
// (named differently from the input string to avoid redeclaring it)
const formattedGTF = gtf.formatSync(arrayOfThings)

// format a stream of things to a stream of text.
// inserts sync marks automatically.
// note: this could create new gtf lines for transcript features
myStreamOfGTFObjects
.pipe(gtf.formatStream())
.pipe(fs.createWriteStream('my_new.gtf'))

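// format a stream of things and write it to
// a gtf file. inserts sync marks.
// note: this could create new gtf lines for transcript features.
// formatFile takes the stream as its first argument and returns a
// promise for the written filename (see the API section)
gtf.formatFile(myStreamOfGTFObjects, 'path/to/destination.gtf')
  .then(filename => console.log('wrote', filename))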

Object format

features

Because GTF cannot handle a three-level hierarchy (gene -> transcript -> exon), we parse GTF by creating transcript features with child features.

We do not create features from the gene_id. Values that are . in the GTF are null in the output.

ctgA	bare_predicted	CDS	10000	11500	.	+	0	transcript_id "Apple1";

Note that this creates an additional transcript feature from the transcript_id when featureType is not 'transcript'. It then creates a child CDS feature from the line of GTF shown above.

[
    [
        {
            "seq_name": "ctgA",
            "source": "bare_predicted",
            "featureType": "transcript",
            "start": 10000,
            "end": 11500,
            "score": null,
            "strand": "+",
            "frame": "0",
            "attributes": { "transcript_id": [ "\"Apple1\"" ] },
            "child_features": [[
                {
                    "seq_name": "ctgA",
                    "source": "bare_predicted",
                    "featureType": "CDS",
                    "start": 10000,
                    "end": 11500,
                    "score": null,
                    "strand": "+",
                    "frame": "0",
                    "attributes": { "transcript_id": [ "\"Apple1\"" ] },
                    "child_features": [],
                    "derived_features": []
                }
            ]],
            "derived_features": []
        }
    ]
]
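
As a rough illustration of the structure above, the sketch below turns the example line into the same nested shape. It is not the module's implementation: `parseLineSketch`, `parseAttributesSketch`, and `wrapInTranscript` are hypothetical names, and the attribute handling is simplified (it assumes `key "value";` pairs with no semicolons inside values, and keeps the surrounding quotes as the output above does).

```javascript
// Hypothetical helpers for illustration only; not part of @gmod/gtf.

// GTF uses '.' as a placeholder; the parsed output uses null instead.
const dotToNull = value => (value === '.' ? null : value)
const dotToNumber = value => (value === '.' ? null : Number(value))

// Simplified parsing of the 9th column: 'key "value"; key2 "value2";'
// Values keep their double quotes, matching the output shown above.
function parseAttributesSketch(column) {
  const attributes = {}
  for (const part of column.split(';')) {
    const trimmed = part.trim()
    if (!trimmed) continue
    const space = trimmed.indexOf(' ')
    const key = space === -1 ? trimmed : trimmed.slice(0, space)
    const value = space === -1 ? '' : trimmed.slice(space + 1).trim()
    if (!attributes[key]) attributes[key] = []
    attributes[key].push(value)
  }
  return attributes
}

// Split one tab-separated GTF line into a feature object.
function parseLineSketch(line) {
  const [seqName, source, featureType, start, end, score, strand, frame, attrs] =
    line.replace(/\r?\n$/, '').split('\t')
  return {
    seq_name: dotToNull(seqName),
    source: dotToNull(source),
    featureType: dotToNull(featureType),
    start: dotToNumber(start),
    end: dotToNumber(end),
    score: dotToNumber(score),
    strand: dotToNull(strand),
    frame: dotToNull(frame),
    attributes: parseAttributesSketch(attrs || ''),
    child_features: [],
    derived_features: [],
  }
}

// Wrap a non-transcript feature in a transcript feature that copies its columns.
function wrapInTranscript(feature) {
  if (feature.featureType === 'transcript') return [feature]
  return [{ ...feature, featureType: 'transcript', child_features: [[feature]] }]
}

const line =
  'ctgA\tbare_predicted\tCDS\t10000\t11500\t.\t+\t0\ttranscript_id "Apple1";'
const parsed = [wrapInTranscript(parseLineSketch(line))]
console.log(JSON.stringify(parsed, null, 2))
```

The real parser also has to merge several lines that share a transcript_id into one transcript; this sketch only handles a single line.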

directives, comments, sequences

parseDirective("##gtf\n")
// returns
{
  "directive": "gtf"
}

parseComment('# hi this is a comment\n')
// returns
{
  "comment": "hi this is a comment"
}

// Sequence objects come from any embedded `##FASTA` section in the GTF file.
{
  "id": "ctgA",
  "description": "test contig",
  "sequence": "ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA"
}
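
The distinction between these item types can be sketched as a small line classifier. This is an assumption-laden illustration, not the module's parser: `classifyLineSketch` and the `syncMark` and `featureLine` keys are hypothetical, and only the two documented cases (`##gtf` and a `#` comment) are modelled on the output above.

```javascript
// Hypothetical classifier for illustration; not part of @gmod/gtf.
function classifyLineSketch(line) {
  const text = line.replace(/\r?\n$/, '')
  // '###' is a synchronization mark, not a directive (assumed handling).
  if (/^###\s*$/.test(text)) {
    return { syncMark: true }
  }
  // '##name' lines are directives.
  if (text.startsWith('##')) {
    return { directive: text.slice(2).trim() }
  }
  // Any other line starting with '#' is a comment.
  if (text.startsWith('#')) {
    return { comment: text.slice(1).trim() }
  }
  // Everything else is treated as a feature line and left unparsed here.
  return { featureLine: text }
}

console.log(classifyLineSketch('##gtf\n'))
console.log(classifyLineSketch('# hi this is a comment\n'))
```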

API

parseStream

Parse a stream of text data into a stream of feature, directive, and comment objects.

Parameters

  • options Object optional options object (optional, default {})

    • options.encoding string text encoding of the input GTF. default 'utf8'
    • options.parseAll boolean default false. if true, will parse all items. overrides other flags
    • options.parseFeatures boolean default true
    • options.parseDirectives boolean default false
    • options.parseComments boolean default false
    • options.parseSequences boolean default true
    • options.bufferSize Number maximum number of GTF lines to buffer. defaults to 1000

Returns ReadableStream stream (in objectMode) of parsed items
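
To make the interaction between `parseAll` and the other flags concrete, here is a minimal sketch of how the effective options could be resolved. `resolveParseOptionsSketch` is a hypothetical helper; the defaults are copied from the list above, and the override behaviour is an assumption based on the phrase "overrides other flags".

```javascript
// Defaults as documented in the parameter list above.
const DEFAULT_PARSE_OPTIONS = {
  encoding: 'utf8',
  parseAll: false,
  parseFeatures: true,
  parseDirectives: false,
  parseComments: false,
  parseSequences: true,
  bufferSize: 1000,
}

// Hypothetical helper: merge user options over the defaults, then let
// parseAll switch every item type on regardless of the other flags.
function resolveParseOptionsSketch(options = {}) {
  const merged = { ...DEFAULT_PARSE_OPTIONS, ...options }
  if (merged.parseAll) {
    merged.parseFeatures = true
    merged.parseDirectives = true
    merged.parseComments = true
    merged.parseSequences = true
  }
  return merged
}

// Comments are only emitted when asked for explicitly...
console.log(resolveParseOptionsSketch({ parseComments: true }))
// ...or when parseAll is set, which wins over parseFeatures: false.
console.log(resolveParseOptionsSketch({ parseAll: true, parseFeatures: false }))
```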

parseFile

Read and parse a GTF file from the filesystem.

Parameters

  • filename string the filename of the file to parse

  • options Object optional options object

    • options.encoding string the file's string encoding, defaults to 'utf8'
    • options.parseAll boolean default false. if true, will parse all items. overrides other flags
    • options.parseFeatures boolean default true
    • options.parseDirectives boolean default false
    • options.parseComments boolean default false
    • options.parseSequences boolean default true
    • options.bufferSize Number maximum number of GTF lines to buffer. defaults to 1000

Returns ReadableStream stream (in objectMode) of parsed items

parseStringSync

Synchronously parse a string containing GTF and return an array of the parsed items.

Parameters

  • str string

  • inputOptions Object optional options object (optional, default {})

    • inputOptions.parseAll boolean default false. if true, will parse all items. overrides other flags
    • inputOptions.parseFeatures boolean default true
    • inputOptions.parseDirectives boolean default false
    • inputOptions.parseComments boolean default false
    • inputOptions.parseSequences boolean default true

Returns Array array of parsed features, directives, and/or comments
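
The returned array is nested, so consuming it usually means walking it. The sketch below is one way to do that, assuming the shape shown under "Object format": each feature item is an array of feature objects, `child_features` has the same array-of-arrays shape, and directives or comments appear as plain objects. `eachFeature` is a hypothetical helper and the sample data is hand-written rather than produced by the module.

```javascript
// Hand-written sample mirroring the documented output shape.
const items = [
  [
    {
      seq_name: 'ctgA',
      featureType: 'transcript',
      start: 10000,
      end: 11500,
      child_features: [
        [
          {
            seq_name: 'ctgA',
            featureType: 'CDS',
            start: 10000,
            end: 11500,
            child_features: [],
            derived_features: [],
          },
        ],
      ],
      derived_features: [],
    },
  ],
  { comment: 'hi this is a comment' },
]

// Hypothetical helper: yield every feature object, parents before children.
function* eachFeature(list) {
  for (const item of list) {
    // Directives, comments, and sequences are plain objects; skip them.
    if (!Array.isArray(item)) continue
    for (const feature of item) {
      yield feature
      yield* eachFeature(feature.child_features)
    }
  }
}

const types = [...eachFeature(items)].map(feature => feature.featureType)
console.log(types.join(', ')) // prints "transcript, CDS"
```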

formatSync

Format an array of GTF items (features, directives, comments) into a string of GTF. Does not insert synchronization (###) marks. Does not insert a directive if one is not already there.

Parameters

  • items

Returns String the formatted GTF

formatStream

Format a stream of items (of the type produced by this script) into a stream of GTF text.

Inserts synchronization (###) marks automatically.

Parameters

  • options Object

    • options.minSyncLines Number minimum number of lines between ### marks. default 100
    • options.insertVersionDirective Boolean if the first item in the stream is not a ##gff-version directive, insert one to show it's GTF. default false
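
The effect of `minSyncLines` can be sketched as follows. This is a guess at the general idea rather than the module's code: `insertSyncMarksSketch` is a hypothetical function, and the rule used here (emit `###` after an item once at least `minSyncLines` lines have been written since the last mark) is an assumption.

```javascript
// Hypothetical sketch of sync-mark insertion; not part of @gmod/gtf.
// `chunks` holds already formatted GTF text, one chunk per top-level
// item. A chunk may contain several lines.
function insertSyncMarksSketch(chunks, minSyncLines = 100) {
  const out = []
  let linesSinceSync = 0
  for (const chunk of chunks) {
    out.push(chunk)
    // Count the non-empty lines this item contributed.
    linesSinceSync += chunk.split('\n').filter(Boolean).length
    // Only insert a mark between items, never inside one.
    if (linesSinceSync >= minSyncLines) {
      out.push('###\n')
      linesSinceSync = 0
    }
  }
  return out.join('')
}

const chunks = ['a\n', 'b\nc\n', 'd\n']
console.log(insertSyncMarksSketch(chunks, 2))
```

With `minSyncLines` set to 2, the mark lands after the second chunk, once three lines have been written, and the counter restarts for the remaining item.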

formatFile

Format a stream of items (of the type produced by this script) into a GTF file and write it to the filesystem.

Inserts synchronization (###) marks and a ##gtf directive automatically (if one is not already present).

Parameters

  • stream ReadableStream the stream to write to the file

  • filename String the file path to write to

  • options Object (optional, default {})

    • options.encoding String default 'utf8'. encoding for the written file
    • options.minSyncLines Number minimum number of lines between sync (###) marks. default 100
    • options.insertVersionDirective Boolean if the first item in the stream is not a ##gtf directive, insert one. default false

Returns Promise promise for the written filename

util

unescape

Unescape a string/text value used in a GTF attribute. Textual attributes should be surrounded by double quotes. Source info: https://mblab.wustl.edu/GTF22.html and https://en.wikipedia.org/wiki/Gene_transfer_format

Parameters

Returns String

_escape

Escape a value for use in a GTF attribute value.

Parameters

Returns String

escapeColumn

Escape a value for use in a GTF column value.

Parameters

Returns String

parseAttributes

Parse the 9th column (attributes) of a GTF feature line.

Parameters

Returns Object

parseFeature

Parse a GTF feature line.

Parameters

  • line String

Returns the parsed line in an object

parseDirective

Parse a GTF directive/comment line.

Parameters

Returns Object the information in the directive

formatAttributes

Format an attributes object into a string suitable for the 9th column of GTF.

Parameters

formatFeature

Format a feature object or array of feature objects into one or more lines of GTF.

Parameters

  • featureOrFeatures
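
For a sense of what formatting a single feature involves, here is a minimal sketch that writes one feature object back out as a tab-separated line. It is not the module's `formatFeature`: the function names are hypothetical, child features are ignored, and no escaping is applied. Attribute values are written as stored, which assumes they already carry their double quotes as in the parsed output under "Object format".

```javascript
// Hypothetical formatter for illustration; not part of @gmod/gtf.

// null (or undefined) becomes the GTF placeholder '.'.
const nullToDot = value =>
  value === null || value === undefined ? '.' : String(value)

// Write each attribute value as 'key value;', separated by spaces.
function formatAttributesSketch(attributes = {}) {
  return Object.entries(attributes)
    .flatMap(([key, values]) => values.map(value => `${key} ${value};`))
    .join(' ')
}

// Join the eight fixed columns and the attribute column with tabs.
function formatFeatureLineSketch(feature) {
  const columns = [
    feature.seq_name,
    feature.source,
    feature.featureType,
    feature.start,
    feature.end,
    feature.score,
    feature.strand,
    feature.frame,
  ].map(nullToDot)
  columns.push(formatAttributesSketch(feature.attributes) || '.')
  return columns.join('\t') + '\n'
}

const cds = {
  seq_name: 'ctgA',
  source: 'bare_predicted',
  featureType: 'CDS',
  start: 10000,
  end: 11500,
  score: null,
  strand: '+',
  frame: '0',
  attributes: { transcript_id: ['"Apple1"'] },
  child_features: [],
  derived_features: [],
}
const formattedLine = formatFeatureLineSketch(cds)
console.log(formattedLine)
```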

formatDirective

Format a directive into a line of GTF.

Parameters

Returns String

formatComment

Format a comment into a GTF comment. Yes I know this is just adding a # and a newline.

Parameters

Returns String

formatSequence

Format a sequence object as FASTA

Parameters

Returns String formatted single FASTA sequence
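
As an illustration, a sequence object of the shape shown under "directives, comments, sequences" could be written as FASTA like this. `formatSequenceSketch` is a hypothetical name, and the exact header layout (id, a space, then the description) is an assumption based on common FASTA conventions.

```javascript
// Hypothetical sketch; not the module's formatSequence.
function formatSequenceSketch(seq) {
  // FASTA header: '>' + id, optionally followed by a description.
  const header = seq.description
    ? `>${seq.id} ${seq.description}`
    : `>${seq.id}`
  return `${header}\n${seq.sequence}\n`
}

const fasta = formatSequenceSketch({
  id: 'ctgA',
  description: 'test contig',
  sequence: 'ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA',
})
console.log(fasta)
```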

formatItem

Format a directive, comment, or feature, or array of such items, into one or more lines of GTF.

Parameters

Notes and resources

License

MIT © Robert Buels