extract-relationships
v0.0.7
A client library for the Watson Relationship Extraction service on IBM Bluemix.
What is this for?
Identifying things mentioned in unstructured text.
You provide text, and it'll extract:
- the entities mentioned in that text
- the different mentions of each entity
- the relationships between the entities contained in the text
How this works
This package is a thin wrapper around the IBM Watson Relationship Extraction service on IBM Bluemix. All of the natural language processing happens in that hosted service. This package just formats the request for you, then parses and cleans up the response to make it easier to consume in a Node.js / JavaScript application.
There is nothing here that you couldn't get by driving those APIs yourself, but hopefully using this thin wrapper will be quicker and easier.
This means:
- You don't need to run your application on Bluemix - the APIs are accessible over the web
- You do need an Internet connection - this package sends HTTP requests to an API hosted on IBM's Bluemix platform. It is not doing the NLP locally.
- You do need to sign up for a Bluemix account first, because the API is authenticated - but signing up is quick and relatively painless
Usage
Install
npm install extract-relationships --save
Basic
Using default options (see examples/bluemix.js)
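var watson = require('extract-relationships');

// example input: replace with the text you want to analyse
var yourtext = 'John Smith studied at the University.';

watson.extract(yourtext, function (err, response) {
    if (err) {
        return console.error(err);
    }
    // output is contained in response
});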
Using custom options (see examples/detailed.js)
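var watson = require('extract-relationships');

// example input: replace with the text you want to analyse
var yourtext = 'John Smith studied at the University.';
// any of the settings described under Options below
var options = { includeRelationships : true };

watson.extract(yourtext, options, function (err, response) {
    if (err) {
        return console.error(err);
    }
    // output is contained in response
});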
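If you prefer Promises to callbacks, the callback-style extract function can be wrapped. This is a minimal sketch that assumes only the two call forms shown above, extract(text, callback) and extract(text, options, callback), each calling back with (err, response). The extractAsync helper is hypothetical and is not part of this package.

```javascript
// Hypothetical helper (not part of extract-relationships): wraps the
// callback-style extract() in a Promise. `client` is the object returned
// by require('extract-relationships').
function extractAsync(client, text, options) {
  return new Promise(function (resolve, reject) {
    function done(err, response) {
      if (err) {
        return reject(err);
      }
      resolve(response);
    }
    // use the two-argument form when no options are given, so the
    // library applies its own defaults
    if (options === undefined) {
      client.extract(text, done);
    } else {
      client.extract(text, options, done);
    }
  });
}
```

For example, extractAsync(require('extract-relationships'), yourtext, options).then(...) resolves with the same response object that the callback would have received.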
Demo
Grab the contents of a news story, and use the Watson Relationship Extraction service to pick out the names of the people mentioned in that news story.
Full working source in examples/newspeople.js
Snippet here:
var async = require('async');
var request = require('request');
var unfluff = require('unfluff');
var watson = require('extract-relationships');

// bbcNewsStoryUrl (the news story to download) and bluemixoptions
// (see Options below) are defined in the full examples/newspeople.js

async.waterfall([
    //
    // download the contents of a BBC news story
    function (callback) {
        request(bbcNewsStoryUrl, callback);
    },
    //
    // get the text contents out of the story
    function (response, body, callback) {
        var contents = unfluff(body).text;
        callback(null, contents);
    },
    //
    // submit the story text to the Relationship Extraction service
    function (text, callback) {
        watson.extract(text, bluemixoptions, callback);
    },
    //
    // get the names of people from the response
    function (storyinfo, callback) {
        // filter the responses to pick out the people entities
        var people = storyinfo.entities.filter(function (entity) {
            return entity.type === 'PERSON' &&
                entity.level === 'NAM' &&
                // add a threshold so we ignore entities
                // with a very low confidence score
                entity.score >= 0.5;
        });
        // get the names out of those responses
        var names = people.map(function (person) {
            var personnames = [];
            // Look through each mention of this person
            // as some of the mentions could refer to their occupation or job title
            // and some of the mentions will be 'he', 'she', 'they', etc.
            // We're just interested in the names
            person.mentions.forEach(function (mention) {
                if (mention.role === 'PERSON' && mention.mtype === 'NAM') {
                    personnames.push(mention.text);
                }
            });
            return personnames;
        });
        callback(null, names);
    }
], function (err, result) {
    if (err) {
        return console.error(err);
    }
    // print out the names we found
    console.log(result);
});
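The filtering step in the demo can also be pulled out into a pure function, which makes it easy to test against a saved response without calling the service. This is a sketch using the field names from the demo above; peopleNames and its minScore parameter are hypothetical. Unlike the demo, which returns one array of names per person, it returns a single de-duplicated, sorted list.

```javascript
// Hypothetical helper: pick out the named PERSON mentions from a response.
// minScore defaults to the 0.5 threshold used in the demo.
function peopleNames(storyinfo, minScore) {
  const threshold = typeof minScore === 'number' ? minScore : 0.5;
  const names = new Set();
  (storyinfo.entities || [])
    .filter(function (entity) {
      return entity.type === 'PERSON' &&
        entity.level === 'NAM' &&
        entity.score >= threshold;
    })
    .forEach(function (person) {
      (person.mentions || []).forEach(function (mention) {
        // keep only named mentions, skipping pronouns and job titles
        if (mention.role === 'PERSON' && mention.mtype === 'NAM') {
          names.add(mention.text);
        }
      });
    });
  // one flat, de-duplicated, alphabetically sorted list of names
  return Array.from(names).sort();
}
```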
Options
var options = {
    // If true, entities found in the text are returned, together with
    // the mentions of each entity.
    // Defaults to true if not provided.
    includeMentions : true,
    // If true, relationships found between entities in the text
    // are returned, with the mentions of each relationship.
    // Defaults to false if not provided.
    includeRelationships : false,
    // If true, the locations of items found in the text are
    // returned, as offsets into the input text.
    // Defaults to false if not provided.
    includeLocations : false,
    // If true, the confidence scores in each item returned by the
    // API are included in the response (as numbers between
    // 0.0 and 1.0).
    // Defaults to true if not provided.
    includeScores : true,
    // If true, unique IDs will be included with objects returned
    // in the response.
    includeIds : false,
    // Language and corpus profile used to process the text:
    // 'ie-en-news' is based on English news sources
    // 'ie-es-news' is based on Spanish news sources
    // see http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/apis/#!/relationship-extraction/extract
    // Defaults to 'ie-en-news'.
    dataset : 'ie-en-news',
    // URL to send requests to, and the username and password to use.
    // Not required when running in Bluemix; see 'Authentication' below.
    api : {
        url : 'https://...',
        user : 'your-relationship-extraction-username',
        pass : 'your-relationship-extraction-password'
    }
};
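The defaults listed in the comments above combine with whatever you pass in. This sketch shows one way to picture that merge; applyDefaults is hypothetical and the library's own handling may differ. includeIds has no documented default, so it is left untouched here.

```javascript
// Defaults as documented in the Options comments (includeIds omitted
// because no default is stated for it).
const DOCUMENTED_DEFAULTS = {
  includeMentions: true,
  includeRelationships: false,
  includeLocations: false,
  includeScores: true,
  dataset: 'ie-en-news'
};

// Hypothetical helper: Object.assign copies properties left to right,
// so any value the caller provides (including an explicit false) wins.
function applyDefaults(options) {
  return Object.assign({}, DOCUMENTED_DEFAULTS, options || {});
}
```

For example, applyDefaults({ dataset: 'ie-es-news', includeRelationships: true }) keeps mentions and scores switched on while requesting Spanish-language processing and relationships.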
Interpreting the output
The official documentation can be found in the Watson Developer Cloud Documentation and is the definitive guide to the meaning of the values returned by the API.
An example of what the output looks like can be found here and in the comments in each of the examples.
The structure of the output returned by this client library is outlined here:
{
//
// a list of the entities found
//
entities : [
// see http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/sireapi/#outputEntities
// for the meaning of these values
{
// type of the entity
// see http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/sireapi/#outputEntityTypes
type : 'ORGANIZATION',
subtype : 'EDUCATIONAL',
// class of the entity
// SPC means a reference to a specific thing
// NEG means a negated reference to a specific thing
// GEN means a generic reference, such as a metaphorical reference
class : 'SPC',
// level of the entity
// e.g. NAM for a named entity as a proper name, PRO for pronoun, etc.
level : 'NAM',
// confidence level in the accuracy of this entity annotation
score : 0.9819,
// a list of each of the mentions of this entity
mentions : [
// see http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/sireapi/#outputMentions
// for the meaning of these values
{
// type and class of the mention
mtype : 'NAM',
class : 'SPC',
// context-sensitive role of the entity within this mention
role : 'ORGANIZATION',
// the covered text for this mention
text : 'University',
// location of the mention in the provided text
location : {
// character offsets for the start and end of the mention text
begin : 123, end : 133,
// character offsets for the head word of a phrase in the mention
'head-begin' : 123, 'head-end' : 133
},
scores : {
// confidence level for the accuracy of this mention
score : 0.8271,
// confidence level for the accuracy that this refers
// to the other mentions
coref : 0.9912
}
}
]
}
],
//
// a list of the relationships found between entities
//
relationships : [
// see http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/sireapi/#outputRelations
// for the meaning of these values
{
// type of the relationship
// see http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/sireapi/#outputRelationTypes
// for the list of possible types
type : 'educatedAt',
// the two entities that this is a relationship between
entities : {
// the entity that this is a relationship from
one : {
type : 'PERSON',
class : 'SPC',
level : 'NAM'
},
// the entity that this is a relationship to
two : {
type : 'ORGANIZATION',
subtype : 'EDUCATIONAL',
class : 'SPC',
level : 'NAM'
}
},
// the instances of this relationship found in the text
// see http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/sireapi/#outputRelations
// for the meaning of these values
mentions : [
// each instance of the relationship
{
// specificity of the relation mention
// e.g. SPECIFIC, NEG etc.
class : 'SPECIFIC',
// nature of the relation mention
modality : 'ASSERTED',
// time of the relation mention relative to the publication of the text
// e.g. PAST, PRESENT, FUTURE
tense : 'UNSPECIFIED',
// confidence level for the accuracy of the relation
score : 0.7231,
// the mention of the entity that this is a relationship from
one : {
// type and class of the mention
mtype : 'NAM',
class : 'SPC',
// type of the entity
etype : 'PERSON',
// context-sensitive role of the entity within this mention
role : 'PERSON',
// the covered text for this mention
text : 'John Smith',
// location of the mention in the provided text
location : {
// character offsets for the start and end of the mention text
begin : 223, end : 233,
// character offsets for the head word of a phrase in the mention
'head-begin' : 223, 'head-end' : 233
}
},
// the mention of the entity that this is a relationship to
two : {
// type and class of the mention
mtype : 'NAM',
class : 'SPC',
// type of the entity
etype : 'ORGANIZATION',
// context-sensitive role of the entity within this mention
role : 'ORGANIZATION',
// the covered text for this mention
text : 'University',
// location of the mention in the provided text
location : {
// character offsets for the start and end of the mention text
begin : 123, end : 133,
// character offsets for the head word of a phrase in the mention
'head-begin' : 123, 'head-end' : 133
}
}
}
]
}
]
}
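To turn the relationships section of a response into something readable, each relationship mention can be flattened into a subject / relation / object triple. This is a sketch based only on the field names in the structure above; relationshipTriples is hypothetical, and it assumes the response was requested with includeRelationships set to true.

```javascript
// Hypothetical helper: one triple per relationship mention.
// score may be undefined if scores were not requested.
function relationshipTriples(response) {
  const triples = [];
  (response.relationships || []).forEach(function (relationship) {
    (relationship.mentions || []).forEach(function (mention) {
      triples.push({
        subject: mention.one.text,
        relation: relationship.type,
        object: mention.two.text,
        score: mention.score
      });
    });
  });
  return triples;
}
```

With the example output above, this yields a single triple that reads as "John Smith educatedAt University".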
Authentication
The package submits HTTP requests to an API which requires a username and password.
Bluemix
If you are running your code on the Bluemix platform:
- Add the "Watson Relationship Extraction" service and bind it to your app
- That's it - the client library will pick up the credentials needed for the API from the Bluemix environment
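Bluemix, being built on Cloud Foundry, exposes bound-service credentials to an app as JSON in the VCAP_SERVICES environment variable. This sketch shows how such credentials could be mapped onto the api option; it is not necessarily how this library does it. The function name is hypothetical, and the service label 'relationship_extraction' is an assumption, so check the output of cf env for the label your service actually uses.

```javascript
// Hypothetical helper: read Relationship Extraction credentials from the
// VCAP_SERVICES JSON. The default label is an assumption.
function credentialsFromVcap(vcapJson, label) {
  if (!vcapJson) {
    return null; // not running on Bluemix / Cloud Foundry
  }
  const services = JSON.parse(vcapJson);
  const instances = services[label || 'relationship_extraction'];
  if (!Array.isArray(instances) || instances.length === 0) {
    return null; // no bound service with that label
  }
  const creds = instances[0].credentials || {};
  // map onto the shape of the `api` option described under Options
  return { url: creds.url, user: creds.username, pass: creds.password };
}
```

Inside a Bluemix app this would be called as credentialsFromVcap(process.env.VCAP_SERVICES).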
Running outside Bluemix
If you are running your code anywhere else, outside of Bluemix, you'll first need to provision yourself a Relationship Extraction service.
There is a more detailed walkthrough on the blog post about this, but in short:
- Go to bluemix.net
- Sign in with your IBM ID (creating one if you've not already got one)
- Go to the Bluemix Dashboard
- Create an app
- Create a web app
- Choose SDK for Node.js
- Give it a name
- 'Add a service' and choose "Watson Relationship Extraction"
- From the App view, click on the 'Show Credentials' link for the bound Relationship Extraction service
- Copy the 'url', 'username' and 'password' values shown
Command-line equivalents of these steps can be found in the Watson Developer Cloud documentation.
Once the service has been created and bound to your app, you can also view the url, username and password from the command line using:
cf env <application-name>
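However you obtain the url, username and password, you can keep them out of your source code by reading them from the environment and passing them in the api option. This is a minimal sketch; the environment variable names and the apiOptionsFromEnv helper are hypothetical, so use whatever names your deployment provides.

```javascript
// Hypothetical helper: build the `api` option from environment variables.
// The variable names below are illustrative, not defined by this package.
function apiOptionsFromEnv(env) {
  const url = env.RELATIONSHIP_EXTRACTION_URL;
  const user = env.RELATIONSHIP_EXTRACTION_USERNAME;
  const pass = env.RELATIONSHIP_EXTRACTION_PASSWORD;
  if (!url || !user || !pass) {
    throw new Error('Relationship Extraction url, username and password must all be set');
  }
  return { api: { url: url, user: user, pass: pass } };
}
```

Failing fast when a value is missing gives a clearer error than an authentication failure from the API. Usage would look like watson.extract(yourtext, apiOptionsFromEnv(process.env), callback).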
Usage and trademarks
Bluemix and Watson come from, and are trademarks of, IBM. This client library is not provided or supported by IBM. It aims to make it easier to use a Bluemix API, but makes no claims over what you can or cannot do with that API. It is your responsibility to conform to any terms and conditions for usage of the API that are part of signing up for an account on Bluemix.