@gmod/gtf
v0.0.9
read and write GTF data as streams
GTF (General Transfer Format, also known as Gene Transfer Format) is identical to GFF version 2. This module reads and writes GTF data and aims to be a complete implementation of the GTF specification.
- streaming parsing and streaming formatting
- creates transcript features with child_features
- only compatible with GTF
Note: For JBrowse, we generally encourage GFF3 over GTF.
For GFF3, check out the @gmod/gff-js package.
Install
$ npm install --save @gmod/gtf
Usage
import gtf from '@gmod/gtf'
// parse a file from a file name
gtf.parseFile('path/to/my/file.gtf', { parseAll: true })
  .on('data', data => {
    if (data.directive) {
      console.log('got a directive', data)
    } else if (data.comment) {
      console.log('got a comment', data)
    } else if (data.sequence) {
      console.log('got a sequence from a FASTA section')
    } else {
      console.log('got a feature', data)
    }
  })
// parse a stream of GTF text
import fs from 'fs'
fs.createReadStream('path/to/my/file.gtf')
  .pipe(gtf.parseStream())
  .on('data', data => {
    console.log('got item', data)
  })
  .on('end', () => {
    console.log('done parsing!')
  })
// parse a string of gtf synchronously
const stringOfGTF = fs
  .readFileSync('my_annotations.gtf')
  .toString()
const arrayOfThings = gtf.parseStringSync(stringOfGTF)
// format an array of items to a string
const formattedGTF = gtf.formatSync(arrayOfThings)
// format a stream of things to a stream of text.
// inserts sync marks automatically.
// note: this could create new gtf lines for transcript features
myStreamOfGTFObjects
  .pipe(gtf.formatStream())
  .pipe(fs.createWriteStream('my_new.gtf'))
// format a stream of things and write it to
// a gtf file. inserts sync marks.
// formatFile takes the stream and the filename (see the API section)
// and returns a promise for the written filename.
// note: this could create new gtf lines for transcript features
gtf.formatFile(myStreamOfGTFObjects, 'path/to/destination.gtf')
Object format
features
Because GTF cannot represent a three-level hierarchy (gene -> transcript -> exon), we parse GTF by creating transcript features with child features.
We do not create features from the gene_id. Values that are . in the GTF are null in the output.
ctgA bare_predicted CDS 10000 11500 . + 0 transcript_id "Apple1";
Note that this creates an additional transcript feature from the transcript_id when featureType is not 'transcript'. It then creates a child CDS feature from the line of GTF shown above.
[
  [
    {
      "seq_name": "ctgA",
      "source": "bare_predicted",
      "featureType": "transcript",
      "start": 10000,
      "end": 11500,
      "score": null,
      "strand": "+",
      "frame": "0",
      "attributes": { "transcript_id": [ "\"Apple1\"" ] },
      "child_features": [[
        {
          "seq_name": "ctgA",
          "source": "bare_predicted",
          "featureType": "CDS",
          "start": 10000,
          "end": 11500,
          "score": null,
          "strand": "+",
          "frame": "0",
          "attributes": { "transcript_id": [ "\"Apple1\"" ] },
          "child_features": [],
          "derived_features": []
        }
      ]],
      "derived_features": []
    }
  ]
]
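The grouping described above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the library's implementation: `groupByTranscript` is a hypothetical helper, it takes already-parsed flat feature objects, and it assumes the synthetic transcript copies its columns from the first child and spans the smallest start and largest end of its children.

```javascript
// Illustrative sketch only; `groupByTranscript` is a hypothetical name,
// not part of @gmod/gtf.
function groupByTranscript(features) {
  const transcripts = new Map()
  for (const feature of features) {
    const [id] = feature.attributes.transcript_id || []
    if (id === undefined) continue // sketch: skip lines without a transcript_id
    if (!transcripts.has(id)) {
      // Assumption: the transcript copies its columns from the first child seen.
      transcripts.set(id, {
        ...feature,
        featureType: 'transcript',
        attributes: { transcript_id: [id] },
        child_features: [],
        derived_features: [],
      })
    }
    const transcript = transcripts.get(id)
    // Assumption: the transcript spans all of its children.
    transcript.start = Math.min(transcript.start, feature.start)
    transcript.end = Math.max(transcript.end, feature.end)
    if (feature.featureType !== 'transcript') {
      // Each child is wrapped in its own array, mirroring the [[ ... ]] shape above.
      transcript.child_features.push([feature])
    }
  }
  // Each top-level item is an array holding one transcript feature.
  return [...transcripts.values()].map(t => [t])
}

// Usage with the CDS line from the example above, already parsed into an object.
const exampleCds = {
  seq_name: 'ctgA',
  source: 'bare_predicted',
  featureType: 'CDS',
  start: 10000,
  end: 11500,
  score: null,
  strand: '+',
  frame: '0',
  attributes: { transcript_id: ['"Apple1"'] },
  child_features: [],
  derived_features: [],
}
console.log(JSON.stringify(groupByTranscript([exampleCds]), null, 2))
```

Keying a `Map` on the transcript_id keeps the grouping to a single pass and preserves the order in which transcripts first appear.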
directives, comments, sequences
parseDirective("##gtf\n")
// returns
{
  "directive": "gtf"
}
parseComment('# hi this is a comment\n')
// returns
{
  "comment": "hi this is a comment"
}
// These come from any embedded `##FASTA` section in the GTF file.
{
  "id": "ctgA",
  "description": "test contig",
  "sequence": "ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA"
}
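To show how the directive and comment shapes above relate to raw lines, here is a minimal sketch. `classifyLine` is a hypothetical helper, not a function exported by this module, and it assumes that a `##` prefix marks a directive and a single `#` marks a comment. It does not treat `###` sync marks or `##FASTA` sections specially.

```javascript
// Hypothetical classifier for non-feature lines; mirrors the object shapes above.
function classifyLine(line) {
  const text = line.replace(/\r?\n$/, '') // drop the trailing newline, if any
  if (text.startsWith('##')) {
    return { directive: text.slice(2).trim() }
  }
  if (text.startsWith('#')) {
    return { comment: text.slice(1).trim() }
  }
  return null // neither directive nor comment; a real parser would try it as a feature
}

console.log(classifyLine('##gtf\n'))
console.log(classifyLine('# hi this is a comment\n'))
```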
API
parseStream
Parse a stream of text data into a stream of feature, directive, and comment objects.
Parameters
options
Object optional options object (optional, default {})
options.encoding
string text encoding of the input GTF. default 'utf8'
options.parseAll
boolean default false. if true, will parse all items. overrides other flags
options.parseFeatures
boolean default true
options.parseDirectives
boolean default false
options.parseComments
boolean default false
options.parseSequences
boolean default true
options.bufferSize
Number maximum number of GTF lines to buffer. defaults to 1000
Returns ReadableStream stream (in objectMode) of parsed items
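The way these option defaults combine can be sketched as follows. The default values come from the parameter list above; `resolveParseOptions` and `DEFAULT_PARSE_OPTIONS` are hypothetical names used only for illustration, and the library may resolve its options differently.

```javascript
// Defaults as documented for parseStream; the names here are hypothetical.
const DEFAULT_PARSE_OPTIONS = {
  encoding: 'utf8',
  parseAll: false,
  parseFeatures: true,
  parseDirectives: false,
  parseComments: false,
  parseSequences: true,
  bufferSize: 1000,
}

function resolveParseOptions(options = {}) {
  const resolved = { ...DEFAULT_PARSE_OPTIONS, ...options }
  if (resolved.parseAll) {
    // parseAll overrides the individual flags.
    resolved.parseFeatures = true
    resolved.parseDirectives = true
    resolved.parseComments = true
    resolved.parseSequences = true
  }
  return resolved
}

console.log(resolveParseOptions({ parseAll: true }))
```

With no options, only features and sequences are emitted, so directives and comments have to be requested explicitly or through `parseAll`.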
parseFile
Read and parse a GTF file from the filesystem.
Parameters
filename
string the filename of the file to parse
options
Object optional options object
options.encoding
string the file's string encoding, defaults to 'utf8'
options.parseAll
boolean default false. if true, will parse all items. overrides other flags
options.parseFeatures
boolean default true
options.parseDirectives
boolean default false
options.parseComments
boolean default false
options.parseSequences
boolean default true
options.bufferSize
Number maximum number of GTF lines to buffer. defaults to 1000
Returns ReadableStream stream (in objectMode) of parsed items
parseStringSync
Synchronously parse a string containing GTF and return an array of the parsed items.
Parameters
Returns Array array of parsed features, directives, and/or comments
formatSync
Format an array of GTF items (features, directives, comments) into a string of GTF. Does not insert synchronization (###) marks. Does not insert a directive if one is not already there.
Parameters
items
Returns String the formatted GTF
formatStream
Format a stream of items (of the type produced by this script) into a stream of GTF text.
Inserts synchronization (###) marks automatically.
Parameters
options
Object
formatFile
Format a stream of items (of the type produced by this script) into a GTF file and write it to the filesystem.
Inserts synchronization (###) marks and a ##gtf directive automatically (if one is not already present).
Parameters
stream
ReadableStream the stream to write to the file
filename
String the file path to write to
options
Object (optional, default {})
Returns Promise promise for the written filename
util
Table of Contents
- util
- unescape
- _escape
- escapeColumn
- parseAttributes
- parseFeature
- parseDirective
- formatAttributes
- formatFeature
- formatDirective
- formatComment
- formatSequence
- formatItem
unescape
Unescape a string/text value used in a GTF attribute. Textual attributes should be surrounded by double quotes. Source info: https://mblab.wustl.edu/GTF22.html and https://en.wikipedia.org/wiki/Gene_transfer_format
Parameters
s
String
Returns String
_escape
Escape a value for use in a GTF attribute value.
Parameters
regex
s
String
Returns String
escapeColumn
Escape a value for use in a GTF column value.
Parameters
s
String
Returns String
parseAttributes
Parse the 9th column (attributes) of a GTF feature line.
Parameters
attrString
String
Returns Object
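As a rough illustration of what parsing the 9th column involves, the sketch below turns an attribute string into an object of arrays. `parseAttributesSketch` is a hypothetical stand-in, not this module's `parseAttributes`. It assumes `key value;` pairs and keeps the surrounding double quotes on values, as in the "Object format" example. It does not do any unescaping.

```javascript
// Hypothetical stand-in for illustration; not the module's parseAttributes.
function parseAttributesSketch(attrString) {
  const attrs = {}
  if (!attrString || attrString === '.') return attrs
  // key, whitespace, then either a double-quoted value or a bare value,
  // terminated by a semicolon or the end of the string
  const pattern = /(\S+)\s+("[^"]*"|[^;]+?)\s*(?:;|$)/g
  for (const [, key, value] of attrString.matchAll(pattern)) {
    if (!attrs[key]) attrs[key] = []
    attrs[key].push(value) // repeated keys accumulate in order
  }
  return attrs
}

console.log(parseAttributesSketch('transcript_id "Apple1";'))
```

Storing every value in an array means a key that appears more than once on a line keeps all of its values.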
parseFeature
Parse a GTF feature line.
Parameters
line
String
Returns Object the parsed line as an object
parseDirective
Parse a GTF directive/comment line.
Parameters
line
String
Returns Object the information in the directive
formatAttributes
Format an attributes object into a string suitable for the 9th column of GTF.
Parameters
attrs
Object
formatFeature
Format a feature object or array of feature objects into one or more lines of GTF.
Parameters
featureOrFeatures
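A minimal sketch of formatting one feature object back into a GTF line, assuming the field names from the "Object format" section. `formatFeatureSketch` is a hypothetical helper, not the module's `formatFeature`. It handles a single feature only, ignores child_features, performs no escaping, and assumes attribute values already carry their double quotes.

```javascript
// Hypothetical helper for illustration; handles one feature and ignores children.
function formatFeatureSketch(feature) {
  // null or undefined columns are written as "."
  const dot = v => (v === null || v === undefined ? '.' : String(v))
  const attrs = Object.entries(feature.attributes || {})
    .flatMap(([key, values]) => values.map(value => `${key} ${value};`))
    .join(' ')
  const columns = [
    feature.seq_name,
    feature.source,
    feature.featureType,
    feature.start,
    feature.end,
    feature.score,
    feature.strand,
    feature.frame,
  ].map(dot)
  columns.push(attrs || '.') // assumption: an empty attribute set is written as "."
  return columns.join('\t') + '\n'
}

process.stdout.write(
  formatFeatureSketch({
    seq_name: 'ctgA',
    source: 'bare_predicted',
    featureType: 'CDS',
    start: 10000,
    end: 11500,
    score: null,
    strand: '+',
    frame: '0',
    attributes: { transcript_id: ['"Apple1"'] },
  }),
)
```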
formatDirective
Format a directive into a line of GTF.
Parameters
directive
Object
Returns String
formatComment
Format a comment into a GTF comment. Yes I know this is just adding a # and a newline.
Parameters
comment
Object
Returns String
formatSequence
Format a sequence object as FASTA
Parameters
seq
Object
Returns String formatted single FASTA sequence
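For reference, formatting a sequence object as FASTA can look like the sketch below. `formatSequenceSketch` is a hypothetical helper, not the module's `formatSequence`. It assumes the `id`, `description`, and `sequence` fields shown in the "Object format" section and writes the sequence on a single line without wrapping.

```javascript
// Hypothetical helper for illustration; no line wrapping is applied.
function formatSequenceSketch(seq) {
  const header = seq.description ? `>${seq.id} ${seq.description}` : `>${seq.id}`
  return `${header}\n${seq.sequence}\n`
}

process.stdout.write(
  formatSequenceSketch({
    id: 'ctgA',
    description: 'test contig',
    sequence: 'ACTGACTAGCTAGCATCAGCGTCGTAGCTATTATATTACGGTAGCCA',
  }),
)
```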
formatItem
Format a directive, comment, or feature, or array of such items, into one or more lines of GTF.
Parameters
Notes and resources
- This is an adaptation of the JBrowse GTF parser
- GTF docs
License
MIT © Robert Buels