retext-lexrank
v1.3.1
Lexrank algorithm for retextjs
Retext Lexrank
Retext plugin for unsupervised text summarization using the LexRank algorithm.
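For readers new to LexRank, the idea can be sketched in a few lines: build a graph whose nodes are sentences, connect sentences that are similar enough, and rank nodes with a PageRank-style power iteration. The sketch below is illustrative only and is not this plugin's implementation. The function names, the `threshold`, `damping`, and `iterations` parameters, the plain bag-of-words cosine similarity (in place of TF-IDF weighting), and the scaling of scores so the top sentence gets 1 are all assumptions made for the example.

```javascript
// Minimal LexRank-style sketch (hypothetical helpers, not the plugin's code).
// Assumption: plain bag-of-words cosine similarity, no TF-IDF weighting.

function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z']+/g) || []
}

function termCounts(tokens) {
  const counts = new Map()
  for (const token of tokens) counts.set(token, (counts.get(token) || 0) + 1)
  return counts
}

function cosine(a, b) {
  let dot = 0
  for (const [term, count] of a) dot += count * (b.get(term) || 0)
  const norm = (m) => Math.sqrt([...m.values()].reduce((sum, c) => sum + c * c, 0))
  const denominator = norm(a) * norm(b)
  return denominator === 0 ? 0 : dot / denominator
}

function lexrankSketch(sentences, { threshold = 0.1, damping = 0.85, iterations = 50 } = {}) {
  const n = sentences.length
  if (n === 0) return []
  const vectors = sentences.map((s) => termCounts(tokenize(s)))

  // Connect two different sentences when their similarity exceeds the threshold.
  const adjacency = vectors.map((a, i) =>
    vectors.map((b, j) => (i !== j && cosine(a, b) > threshold ? 1 : 0))
  )
  const degree = adjacency.map((row) => row.reduce((sum, x) => sum + x, 0))

  // Power iteration: each sentence receives score from the sentences linked to it.
  let scores = new Array(n).fill(1 / n)
  for (let step = 0; step < iterations; step++) {
    scores = scores.map((_, i) => {
      let inbound = 0
      for (let j = 0; j < n; j++) {
        if (adjacency[j][i]) inbound += scores[j] / degree[j]
      }
      return (1 - damping) / n + damping * inbound
    })
  }

  // Assumption: scale so the best sentence scores 1.
  const max = Math.max(...scores)
  return scores.map((s) => s / max)
}

const demoScores = lexrankSketch([
  'The cat sat on the mat.',
  'The cat lay on the mat.',
  'Stock prices fell sharply today.'
])
console.log(demoScores.map((s) => s.toFixed(2)))
```

In this toy input the two sentences about the cat reinforce each other and rank highest, while the unrelated third sentence receives only the baseline score.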
Install
npm i --save retext-lexrank
Use
import { unified } from 'unified'
import latin from 'retext-latin'
import lexrank from 'retext-lexrank'
const processor = unified()
  .use(latin)
  .use(lexrank)
const file = '...' // vfile or text string
const tree = processor.parse(file)
// run() returns a promise when no callback is passed; wait for it before reading scores
await processor.run(tree, file)
Use with retext-keywords
Adding the part-of-speech (retext-pos) and keywords (retext-keywords) plugins to the pipeline, ahead of retext-lexrank, yields more polarized results.
import { unified } from 'unified'
import latin from 'retext-latin'
import pos from 'retext-pos'
import keywords from 'retext-keywords'
import lexrank from 'retext-lexrank'
const processor = unified()
  .use(latin)
  .use(pos)
  .use(keywords)
  .use(lexrank)
Example
Note: The retext-lexrank plugin works best on medium-to-long samples of text, like web articles, blogs, and essays. The following is a simple example.
Using the classic write-music sample from the unifiedjs use-cases:
Write Music (by Gary Provost)
This sentence has five words. Here are five more words.
Five word sentences are fine. But several together
become monotonous. Listen to what is happening. The
writing is getting boring. The sound of it drones. It's
like a stuck record. The ear demands some variety.
Now listen. I vary the sentence length, and I create
music. Music. The writing sings. It has a pleasant
rhythm, a lilt, a harmony. I use short sentences. And I
use sentences of medium length. And sometimes when I am
certain the reader is rested, I will engage him with a
sentence of considerable length, a sentence that burns
with energy and builds with all the impetus of a
crescendo, the roll of the drums, the crash of the
cymbals—sounds that say listen to this, it is important.
So write with a combination of short, medium, and long
sentences. Create a sound that pleases the reader's ear.
Don't just write words. Write music.
Supplying the above text to the processor, we can then find the top-ranked sentences:
import { selectAll } from 'unist-util-select'
import { toString } from 'nlcst-to-string'
selectAll('SentenceNode', tree)
  // Highest lexrank score first
  .sort(({ data: { lexrank: a } }, { data: { lexrank: b } }) => b - a)
  .slice(0, 3)
  .forEach(sentence => {
    const score = sentence.data.lexrank.toFixed(2)
    console.log(`[${score}]: ${toString(sentence)}`)
  })
Running the above yields:
[1.00]: I vary the sentence length, and I create music.
[0.85]: And I use sentences of medium length.
[0.71]: So write with a combination of short, medium, and long sentences.
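A ranked list like the one above is sorted by score, but a readable summary usually presents the chosen sentences in their original order. The following is a minimal sketch of that step, not part of this package: `topSentencesInOrder` is a hypothetical helper, and the `{ text, score }` entries stand in for values you might collect from each sentence node (for example its string form and its `data.lexrank`). The sample texts and scores are placeholders, not plugin output.

```javascript
// Hypothetical helper: keep the `count` best-scoring sentences,
// then put them back in the order they appeared in the document.
function topSentencesInOrder(scored, count) {
  return scored
    .map((entry, position) => ({ ...entry, position })) // remember reading order
    .sort((a, b) => b.score - a.score) // best score first
    .slice(0, count)
    .sort((a, b) => a.position - b.position) // restore reading order
    .map(({ text }) => text)
}

// Placeholder data in document order (scores are made up for illustration).
const scored = [
  { text: 'First sentence.', score: 0.4 },
  { text: 'Second sentence.', score: 1.0 },
  { text: 'Third sentence.', score: 0.2 },
  { text: 'Fourth sentence.', score: 0.85 }
]

const summary = topSentencesInOrder(scored, 3)
console.log(summary.join(' '))
// prints "First sentence. Second sentence. Fourth sentence."
```

Sorting twice keeps the helper short; for very long documents a single pass with a fixed-size heap would avoid sorting every sentence, but that is rarely needed at article length.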
Tests
Run npm test to run tests.
Run npm run coverage to produce a test coverage report.