vsm-dictionary-uniprot

v1.0.16

Implementation of a VSM-dictionary that interfaces with the UniProt REST API

Summary

vsm-dictionary-uniprot is an implementation of the 'VsmDictionary' parent-class/interface (from the package vsm-dictionary), that communicates with UniProt's REST API and translates the provided protein data into a VSM-specific format.

Install

Run: npm install

Example use

Create a directory test-dir and inside run npm install vsm-dictionary-uniprot. Then, create a test.js file and include this code:

const DictionaryUniProt = require('vsm-dictionary-uniprot');
const dict = new DictionaryUniProt({log: true});

dict.getEntryMatchesForString('tp53', { page: 1, perPage: 10 }, 
  (err, res) => {
    if (err) 
      console.log(JSON.stringify(err, null, 4));
    else
      console.log(JSON.stringify(res, null, 4));
  }
);

Then, run node test.js

Browsers

<script src="https://unpkg.com/vsm-dictionary-uniprot@^1.0.0/dist/vsm-dictionary-uniprot.min.js"></script>

After loading this script, the dictionary is accessible as the global variable VsmDictionaryUniProt.

Tests

Run npm test, which runs the source code tests with Mocha.
If you want to quickly live test the UniProt API, go to the test directory and run:

node getEntries.test.js
node getEntryMatchesForString.test.js

'Build' configuration

To use a VsmDictionary in Node.js, one can simply run npm install and then use require(). But it is also convenient to have a version of the code that can just be loaded via a <script>-tag in the browser.

Therefore, we included webpack.config.js, which is a Webpack configuration file for generating such a browser-ready package.

By running npm run build, the built file will appear in a 'dist' subfolder. You can use it by including: <script src="../dist/vsm-dictionary-uniprot.min.js"></script> in the head of an HTML file.

Specification

Like all VsmDictionary subclass implementations, this package follows the parent class specification. In the next sections we will explain the mapping between the data offered by UniProt's API and the corresponding VSM objects. Useful links for the API are:

  • https://www.uniprot.org/help/query-fields
  • https://www.uniprot.org/help/uniprotkb_column_names
  • https://www.uniprot.org/help/api_queries
  • https://www.uniprot.org/help/text-search

Note also that error handling is strict: whenever we launch multiple parallel queries to UniProt's REST API (see the function specifications below) and any one of them returns an error (either a string or an error JSON object response), the overall result is an error object, even if all the other calls returned proper results.
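The strict behaviour described above can be sketched with Promise.all, which rejects as soon as any one of the parallel requests fails. This is only an illustration: fetchOne and fetchAllStrict are hypothetical names, fetchOne is a stand-in for a single UniProt request, and the package itself may be implemented differently.

```javascript
// Hypothetical stand-in for one UniProt request: resolves with a result,
// or rejects with an error object when the ID is 'BAD'.
function fetchOne(id) {
  return id === 'BAD'
    ? Promise.reject({ status: 400, error: 'Bad request' })
    : Promise.resolve({ id: id });
}

// Run all requests in parallel. A single failure makes the whole call
// fail, even if every other request succeeded.
function fetchAllStrict(ids, cb) {
  Promise.all(ids.map(fetchOne)).then(
    results => cb(null, results),
    err => cb(err)
  );
}

fetchAllStrict(['P12345', 'BAD', 'P04637'], (err, res) => {
  if (err) console.log('error: ' + JSON.stringify(err));
  else console.log(JSON.stringify(res));
});
```

The callback keeps the (err, res) shape used in the example at the top of this README, so a caller only has to check err once.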

If the error response is not a JSON string that we can parse, we formulate the error as a JSON object ourselves in the following format:

{
  status: <number>,
  error: <response> 
}

where the response from the server is JSON stringified.
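A minimal sketch of this fallback, assuming the HTTP status code and the raw response text are available as two values. The function name toErrorObject and its parameter names are hypothetical, not part of the package's API.

```javascript
// Turn a failed response into an error object. `status` is the HTTP status
// code and `body` the raw response text (both names are assumptions).
function toErrorObject(status, body) {
  try {
    const parsed = JSON.parse(body);
    // Only accept parsed JSON objects; anything else falls through.
    if (parsed !== null && typeof parsed === 'object') return parsed;
  } catch (e) {
    // Not parseable as JSON: use the generic format below.
  }
  return { status: status, error: JSON.stringify(body) };
}

console.log(toErrorObject(500, 'Internal Server Error'));
console.log(toErrorObject(400, '{"status":400,"error":"Bad query"}'));
```

In the first call the plain-text body is wrapped and stringified; in the second the parseable JSON error is passed through unchanged.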

Map UniProt to DictInfo VSM object

This specification relates to the function:
getDictInfos(options, cb)

If the options.filter.id is not properly defined or the https://www.uniprot.org dictID is included in the list of ids used for filtering, getDictInfos returns a static object with the following properties:

  • id: 'https://www.uniprot.org' (will be used as a dictID)
  • abbrev: 'UniProt'
  • name: 'Universal Protein Resource'

Otherwise, an empty result is returned.
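The rule above can be sketched as follows. This is an illustration only: it assumes that a "properly defined" filter is a non-empty array, it returns a plain array (the actual result wrapper is defined by the parent class specification), and the names dictInfosFor and isProperFilter are hypothetical.

```javascript
const UNIPROT_DICT_ID = 'https://www.uniprot.org';
const UNIPROT_DICT_INFO = {
  id: UNIPROT_DICT_ID,
  abbrev: 'UniProt',
  name: 'Universal Protein Resource'
};

// Assumption: a filter is "properly defined" when it is a non-empty array.
function isProperFilter(list) {
  return Array.isArray(list) && list.length > 0;
}

// Return the static dictInfo unless the id filter excludes UniProt.
function dictInfosFor(options = {}) {
  const ids = options.filter && options.filter.id;
  if (!isProperFilter(ids) || ids.includes(UNIPROT_DICT_ID)) {
    return [UNIPROT_DICT_INFO];
  }
  return [];
}

console.log(dictInfosFor({}).length);
console.log(dictInfosFor({ filter: { id: ['https://example.org'] } }).length);
```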

Map UniProt to Entry VSM object

This specification relates to the function:
getEntries(options, cb)

First, if options.filter.dictID is properly defined and the https://www.uniprot.org dictID is not included in its list of dictIDs, an empty array of entry objects is returned.

If the options.filter.id is properly defined (with IDs like https://www.uniprot.org/uniprot/P12345) then for each ID (in parallel) we send a query like this one:

https://www.uniprot.org/uniprot/?query=id:P12345&columns=id%2Ccomment%28FUNCTION%29%2Cprotein%20names%2Cgenes%2Corganism%2Creviewed%2Centry%20name%2Cannotation%20score&format=tab

Otherwise, we ask for all ids (by default id sorted) with this query:

https://www.uniprot.org/uniprot/?query=*&columns=id%2Ccomment%28FUNCTION%29%2Cprotein%20names%2Cgenes%2Corganism%2Creviewed%2Centry%20name%2Cannotation%20score&sort=id&desc=no&limit=5&offset=0&format=tab

Note that depending on the options.page and options.perPage options we adjust the limit and offset parameters accordingly. There is no maximum value for the limit parameter, but we chose a value of 50 to use in case perPage is not defined properly (the default value for offset is 0).
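As a rough sketch of how page and perPage could translate into limit and offset: the fallback of 50 comes from the text above, while the 1-based page arithmetic, the "positive integer" validity check and the helper names are assumptions made for illustration.

```javascript
// Fallback from the text: perPage defaults to 50 when not properly defined.
const DEFAULT_PER_PAGE = 50;

// Assumption: "properly defined" means a positive integer.
function isPositiveInt(n) {
  return Number.isInteger(n) && n > 0;
}

function pagingParams(options = {}) {
  const perPage = isPositiveInt(options.perPage) ? options.perPage : DEFAULT_PER_PAGE;
  const page = isPositiveInt(options.page) ? options.page : 1;
  // Assumption: pages are 1-based, so page 1 starts at offset 0.
  return { limit: perPage, offset: (page - 1) * perPage };
}

// Query-string fragment to append to the request URL.
function pagingQuery(options) {
  const { limit, offset } = pagingParams(options);
  return `&limit=${limit}&offset=${offset}`;
}

console.log(pagingQuery({ page: 1, perPage: 5 }));  // &limit=5&offset=0
console.log(pagingQuery({ page: 3, perPage: 20 })); // &limit=20&offset=40
console.log(pagingQuery({}));                       // &limit=50&offset=0
```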

Only when requesting specific IDs do we sort the results on the client, depending on the options.sort value: results can be either id-sorted or str-sorted, according to the specification of the parent 'VsmDictionary' class. We then prune these results according to the values of options.page (default: 1) and options.perPage (default: 50).
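The client-side sort-and-prune step for specific IDs might look like the sketch below. The exact ordering rules are defined by the parent class specification; here we simply assume that 'str' sorts by str with id as a tie-breaker and that any other value sorts by id. The function name sortAndPrune is hypothetical.

```javascript
// Sort entries by `id`, or by `str` with `id` as a tie-breaker (an
// assumption), then keep only the requested page.
function sortAndPrune(entries, options = {}) {
  const page = Number.isInteger(options.page) && options.page > 0 ? options.page : 1;
  const perPage =
    Number.isInteger(options.perPage) && options.perPage > 0 ? options.perPage : 50;

  const byId = (a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0);
  const byStr = (a, b) => (a.str < b.str ? -1 : a.str > b.str ? 1 : byId(a, b));

  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...entries].sort(options.sort === 'str' ? byStr : byId);
  const start = (page - 1) * perPage;
  return sorted.slice(start, start + perPage);
}

const sample = [
  { id: 'b', str: 'x' },
  { id: 'a', str: 'y' },
  { id: 'c', str: 'x' }
];
console.log(sortAndPrune(sample, { sort: 'id' }).map(e => e.id));
console.log(sortAndPrune(sample, { sort: 'str' }).map(e => e.id));
```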

As of July 2019, UniProt offered the results from its REST API in various formats, but not JSON :( We therefore chose the tab-separated format, as shown in the above queries (&format=tab). The returned tab-separated lines are mapped to VSM entries. The following table shows the exact mapping:

UniProt column | Type | Required | VSM entry/match object property | Notes
:---:|:---:|:---:|:---:|:---:
Entry | String | YES | id | the full URL of the UniProt ID
FUNCTION [CC] | String | NO | descr | The protein's function
Protein names | String | NO? | str,terms[i].str | Recommended and alternative names for the protein
Gene names | String | NO | z.genes | An array of gene names
Organism | String | NO | z.species | The organism this protein was found
Status | String | NO | z.status | Is the protein information reviewed, unreviewed, deleted (obsolete), etc.
Entry name | String | YES | z.entry | A UniProt-specific ID for the entry, e.g. VPS73_YEAST
Annotation | String | NO | z.score | Annotation score, a quality index for the protein information (e.g. '4 out of 5')

Note that the above mapping is the one that we, as developers, considered most reasonable. There is, however, a global option optimap that you can pass to the DictionaryUniProt object, which optimizes the mapping for curator clarity and use. Its default value is true. The mapping table above actually describes optimap: false; with optimap: true, the str property of the VSM entry/match object takes the value of the Entry name instead. The reason is that the Entry name (UniProt's internal id) is different for every returned result and thus distinguishable, whereas the first protein name (used as str in the original mapping) is not.
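To make the mapping and the effect of optimap more concrete, here is a simplified sketch that converts one tab-separated line into an entry-like object. Several things are assumptions for illustration: the columns arrive in the order requested in the query URL, protein names look like "Main name (Alt 1) (Alt 2)", gene names are space-separated, and with optimap the Entry name is placed first in terms. The sample line is hand-written, not real API output, and the function names are hypothetical.

```javascript
const URL_PREFIX = 'https://www.uniprot.org/uniprot/';

// Naive split of "Main name (Alt 1) (Alt 2)" into its parts. Names that
// themselves contain parentheses are not handled correctly.
function splitProteinNames(field) {
  const first = field.indexOf(' (');
  if (first === -1) return [field.trim()].filter(Boolean);
  const main = field.slice(0, first).trim();
  const rest = field.slice(first + 2).replace(/\)$/, '');
  return [main, ...rest.split(') (')].filter(Boolean);
}

// Map one tab-separated result line to an entry-like object.
function lineToEntry(line, optimap = true) {
  const [acc, func, names, genes, organism, status, entryName, score] = line.split('\t');
  const proteinNames = splitProteinNames(names || '');
  // optimap: use the unique Entry name as `str`; otherwise the first protein name.
  const str = optimap ? entryName : (proteinNames[0] || entryName);
  return {
    id: URL_PREFIX + acc,
    descr: func,
    str: str,
    // Assumption: `str` comes first in `terms`, without duplicates.
    terms: [...new Set([str, ...proteinNames])].map(s => ({ str: s })),
    z: {
      genes: genes ? genes.split(' ') : [],
      species: organism,
      status: status,
      entry: entryName,
      score: score
    }
  };
}

// Hand-written sample line (shortened function text).
const sampleLine = [
  'P04637', 'Acts as a tumor suppressor.',
  'Cellular tumor antigen p53 (Antigen NY-CO-13)', 'TP53 P53',
  'Homo sapiens (Human)', 'reviewed', 'P53_HUMAN', '5 out of 5'
].join('\t');

console.log(lineToEntry(sampleLine).str);        // P53_HUMAN
console.log(lineToEntry(sampleLine, false).str); // Cellular tumor antigen p53
```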

Map UniProt to Match VSM object

This specification relates to the function:
getEntryMatchesForString(str, options, cb)

First, if options.filter.dictID is properly defined and the https://www.uniprot.org dictID is not included in its list of dictIDs, an empty array of match objects is returned.

Otherwise, a URL string is built and sent to UniProt's REST API. For example, when requesting tp53, the URL is:

https://www.uniprot.org/uniprot/?query=tp53&columns=id%2Ccomment%28FUNCTION%29%2Cprotein%20names%2Cgenes%2Corganism%2Creviewed%2Centry%20name%2Cannotation%20score&sort=score&limit=20&offset=0&format=tab

The requested columns, and their mapping to VSM properties, are the same as in the getEntries(options, cb) case (see the table above). Queries requesting string matches always return results sorted by an internal, UniProt-specific score value (note the sort=score in the URL). In practice this ensures that the most requested and best-quality results are returned first, and they are the same as what you would expect when searching for a term in the main search box of the UniProt website: https://www.uniprot.org/uniprot/?query=tp53&sort=score.

For the limit and offset parameters the same things apply as in the getEntries specification. No sorting whatsoever is done on the client after the results are returned from UniProt's REST API.
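A small sketch of how such a match URL could be assembled. The columns string is copied verbatim from the example URL above; URL-encoding the search string with encodeURIComponent is an assumption, and buildMatchUrl is a hypothetical helper, not a function exported by this package.

```javascript
// Column list copied as-is (already URL-encoded) from the example URL.
const COLUMNS =
  'id%2Ccomment%28FUNCTION%29%2Cprotein%20names%2Cgenes%2Corganism' +
  '%2Creviewed%2Centry%20name%2Cannotation%20score';

// Build the string-match URL. Results come back sorted by UniProt's own
// score, so no further sorting is needed on the client.
function buildMatchUrl(str, limit, offset) {
  return 'https://www.uniprot.org/uniprot/?query=' + encodeURIComponent(str) +
    '&columns=' + COLUMNS +
    '&sort=score&limit=' + limit + '&offset=' + offset + '&format=tab';
}

console.log(buildMatchUrl('tp53', 20, 0));
```

Called with 'tp53', 20 and 0, this reproduces the example URL shown earlier in this section.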

License

This project is licensed under the AGPL license - see LICENSE.md.