rp-paragraph-splitter
v1.0.0
Published
Splits walls of text into paragraphs, focusing on dialogue.
RP Paragraph Splitter
This is a component of the website for my IRC RP's logs that takes a huge blob of text and attempts to cut it into paragraphs that make sense. It does not handle the rejoining of cut IRC messages, as that would couple it with my logs website's code.
It has no dependencies, for the sentence tokenizers available on npm did not handle all the odd formattings you'd find in the clash of writing styles that we call RP. The biggest thing is ellipses and periods not always ending sentences when they're inside quotation marks.
The input is expected to all be from one perspective dialogue-wise, though it will split upon meeting a character name as the first word. This requires it be hooked into a character getter (see settings under Reference).
The tag parameter is for anything you want to associate all posts with for the next step of your code. For example, I use the character name on my logs website to create separators when the character changes.
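As a rough illustration of the separator idea, the sketch below walks a list of paragraph-like objects and emits a separator whenever the tag changes. The function name renderWithSeparators and the separator string are hypothetical; the only things assumed about the paragraphs are a tag property and a toString() method.

```javascript
// Hypothetical helper, not part of the module: joins paragraphs and inserts
// a separator line whenever the tag differs from the previous paragraph's.
function renderWithSeparators(paragraphs, separator = '---') {
  const out = [];
  let previousTag = null;
  for (const paragraph of paragraphs) {
    if (previousTag !== null && paragraph.tag !== previousTag) {
      out.push(separator);
    }
    out.push(paragraph.toString());
    previousTag = paragraph.tag;
  }
  return out.join('\n\n');
}
```

Any object with a tag and a toString() works here, so the same helper can be tried with stand-in objects before feeding it real output.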
I put it here in the hopes that someone else might find it interesting or useful, and it's released under the permissive ISC license.
Goal
The goal is to improve the reading experience for people reading up on RP logs on my site. It doesn't have to be perfect, just good enough.
Rules
The paragraph can be split if any of the rules below is true. The numbers can be tuned with the settings object. The current sentence will make up the "topic" sentence of the next paragraph.
- Length is past 45 words, the current sentence is 7 words long, the first dialogue has been done, and it's not in the middle of dialogue.
- The last sentence is one complete quotation, and the current is 7 words long.
- The first word is a character name.
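The three rules can be sketched as a single predicate. This is a minimal sketch rather than the module's actual code: the state fields, the shouldSplit name, and the isCharacter argument are all hypothetical, and "7 words long" is read here as "at least 7 words", which is an assumption.

```javascript
// Default numbers taken from the rules above.
const defaults = { paragraphLength: 45, topicLength: 7 };

// state:    { wordCount, hadDialogue, inDialogue, lastWasFullQuote } (hypothetical shape)
// sentence: { length, firstWord }
function shouldSplit(state, sentence, isCharacter, settings = defaults) {
  // Assumption: a "topic" sentence has at least topicLength words.
  const isTopic = sentence.length >= settings.topicLength;

  // Rule 1: the paragraph is long, dialogue has occurred, and none is open.
  const rule1 = state.wordCount > settings.paragraphLength &&
    isTopic && state.hadDialogue && !state.inDialogue;

  // Rule 2: the previous sentence was one complete quotation.
  const rule2 = state.lastWasFullQuote && isTopic;

  // Rule 3: the sentence opens with a character name (looked up in lowercase).
  const rule3 = Boolean(isCharacter(sentence.firstWord.toLowerCase()));

  return rule1 || rule2 || rule3;
}
```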
Example
const rpps = require('rp-paragraph-splitter');

let text = `Paste a huge block of text here.`;
let tag = 'Character Name';
let paragraphs = rpps.Paragraph.split(text, tag);

for (let i = 0; i < paragraphs.length; ++i) {
    let paragraph = paragraphs[i];
    console.log(`${paragraph.toString()}\n`);
}
Reference
All the objects below are children of the main module object.
Sentence
The sentence tokenizer. You don't have to touch this to use it, but here it is anyway.
Properties
string text
: The entire sentence text.
char first
: The first character.
char last
: The last character.
string firstWord
: The first word.
string lastWord
: The last word.
int quoteCoount
: The number of quotation marks.
int length
: The number of words.
bool dialogue
: The sentence opened or closed dialogue; i.e. had an odd number of quotation marks.
bool fullDialogue
: The sentence started and ended on a '"'.
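To make the quote-related properties concrete, here is a small sketch of how they could be derived from a sentence string. It is not the module's tokenizer: describeSentence is a hypothetical helper, the field names are illustrative, and only the plain '"' character is counted.

```javascript
// Hypothetical helper: derives word and quote statistics for one sentence.
function describeSentence(text) {
  const trimmed = text.trim();
  const words = trimmed.split(/\s+/).filter(Boolean);
  const quoteCount = (trimmed.match(/"/g) || []).length;
  return {
    length: words.length,             // number of words
    quoteCount,                       // number of '"' characters
    dialogue: quoteCount % 2 === 1,   // odd count: dialogue opened or closed
    // Starts and ends on a quotation mark (needs at least two characters).
    fullDialogue: trimmed.length > 1 &&
      trimmed.startsWith('"') && trimmed.endsWith('"'),
  };
}
```

For example, a sentence such as `She said, "I am not sure` has one quotation mark, so dialogue would be true while fullDialogue stays false.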
Functions
Sentence(text)
: Creates a sentence with the text.
string .toString()
: Returns the text property.
Sentence[] Sentence.split(text)
: Splits the text into multiple Sentences.
Paragraph
Properties
Sentence[] sentences
: The sentences in this paragraph.string tag
: Arbitrary tag.
Functions
Paragraph(sentences, tag)
: Creates a paragraph with the sentences.
string .toString()
: Displays the paragraph content as text.
Paragraph[] Paragraph.split(sentences, tag)
: Groups the Sentences together into paragraphs.
Paragraph[] Paragraph.split(text, tag)
: Splits the text into Sentences and turns them into paragraphs.
settings
int paragraphLength
: The minimum length for rule 1.
int topicLength
: The topic sentence length for rules 1 and 2.
function characterCallback(string name)
: Called to check whether a name belongs to a character. The rule treats true or any non-null object as success. The name argument is the first word in lowercase.
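A character getter might look like the sketch below. The character store and its entries are made up for illustration, and the final wiring line is an assumption about how the settings object is reached, so it is left as a comment.

```javascript
// Hypothetical character store keyed by lowercase first name.
const characters = new Map([
  ['mira', { name: 'Mira Vale' }],
  ['tobin', { name: 'Tobin Ash' }],
]);

// Receives the first word of a sentence in lowercase. Returns the character
// object on a match and null otherwise, so only real matches count as success.
function characterCallback(name) {
  return characters.get(name) || null;
}

// Assumed wiring, not confirmed by this README:
// rpps.settings.characterCallback = characterCallback;
```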