tex-linebreak2
v0.8.4
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) ![npm version](https://img.shields.io/npm/v/tex-linebreak2.svg)
tex-linebreak2 (beta)
tex-linebreak2 is a JavaScript library for breaking text into lines of optimal width. It can be used to lay out justified text as you would find in a newspaper, book or technical paper. It implements the Knuth-Plass line-breaking algorithm (PDF), which is used by TeX.
This library can be used to lay out the text of webpages, plain text, or for rendering justified text to a canvas.
Features
- Can be applied to webpages (with certain limitations). Rendering the elements that are visible on screen usually does not take more than 40ms; rendering a very large document in its entirety may take 200ms.
- Hanging punctuation
- Breakpoints in accordance with the Unicode line breaking algorithm.[^1]
- Supports custom breaking rules.
- Supports both justification and ragged alignment.
- Can be used in a browser or a Node.js environment[^2] to render to any target (`<canvas>`, HTML elements, PDF).
About the Knuth-Plass algorithm
Most text on the web is presented with "ragged-right" margins, as opposed to
the justified text you would find in e.g. a scientific paper or newspaper.
Text can be justified in web pages using `text-align: justify`.
However, this option alone tends to produce large spaces between words, which
are distracting to read. This is due to the use of "first fit" line-breaking
algorithms, where the browser considers only the current line when finding the
next breakpoint. Some browsers support hyphenation via `hyphens: auto`, which
reduces this effect. However, the first-fit approach can still produce lines
with wide spacing, and it can also produce more hyphenated lines than necessary.
The Knuth-Plass algorithm, on the other hand, optimizes the spacing between words over the whole paragraph, seeking to minimize the overall "badness" of the layout. This factor depends on the amount by which spaces have been shrunk or stretched and on the number of hyphenated lines. The benefits of this approach are greater when rendering narrower columns of text (e.g. on small screens).
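To make the difference concrete, here is a minimal sketch. It is not this library's implementation and not the full Knuth-Plass algorithm: it compares a greedy first-fit breaker with a simplified whole-paragraph optimizer that minimizes the sum of squared leftover space per line (the last line is free). It assumes monospace text where every character has width 1, and the names `firstFit` and `totalFit` are made up for this illustration.

```javascript
// Greedy "first fit": put as many words as fit on the current line.
function firstFit(words, lineWidth) {
  const lines = [];
  let current = [];
  let length = 0;
  for (const word of words) {
    const needed = current.length === 0 ? word.length : length + 1 + word.length;
    if (current.length > 0 && needed > lineWidth) {
      lines.push(current.join(" "));
      current = [word];
      length = word.length;
    } else {
      current.push(word);
      length = needed;
    }
  }
  if (current.length > 0) lines.push(current.join(" "));
  return lines;
}

// Simplified "total fit": dynamic programming over all possible breakpoints.
// best[i] is the minimal cost of laying out words[i..]; next[i] is the index
// of the first word on the following line.
function totalFit(words, lineWidth) {
  const n = words.length;
  const best = new Array(n + 1).fill(Infinity);
  const next = new Array(n + 1).fill(n);
  best[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let length = -1; // cancels the space counted before the first word
    for (let j = i; j < n; j++) {
      length += 1 + words[j].length;
      if (length > lineWidth && j > i) break; // line is full
      const isLastLine = j === n - 1;
      const slack = Math.max(0, lineWidth - length);
      const cost = (isLastLine ? 0 : slack * slack) + best[j + 1];
      if (cost < best[i]) {
        best[i] = cost;
        next[i] = j + 1;
      }
    }
  }
  const lines = [];
  for (let i = 0; i < n; i = next[i]) {
    lines.push(words.slice(i, next[i]).join(" "));
  }
  return lines;
}

const words = "aaa bb cc ddddd".split(" ");
console.log(firstFit(words, 6)); // [ 'aaa bb', 'cc', 'ddddd' ]
console.log(totalFit(words, 6)); // [ 'aaa', 'bb cc', 'ddddd' ]
```

First fit fills the first line completely but leaves four empty columns on the second; the whole-paragraph version accepts a slightly looser first line so that no line is very loose.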
This table compares the same text rendered in the same environment (font, font size, device width, margins) using CSS justification, CSS justification + hyphenation and this library:
Installation
As a third-party script
When loaded as a third-party script, methods are available through the global variable `texLinebreak`.
<head>
<script src="https://unpkg.com/tex-linebreak2/dist/lib.js"></script>
</head>
<body>
<p>Example text</p>
<script>
texLinebreak.texLinebreakDOM("p");
</script>
</body>
As a package
npm install tex-linebreak2
Usage
Webpages
Use the `texLinebreakDOM` function to lay out the paragraphs of a website:
import { texLinebreakDOM } from "tex-linebreak2";
texLinebreakDOM("p"); // Selects all <p/> elements
The function accepts either a query selector string or a list of elements:
import { texLinebreakDOM } from "tex-linebreak2";
texLinebreakDOM(document.querySelectorAll("p"), { align: "left" });
Options are passed as the second parameter of this function.
The library will listen for window resizing (this can be turned off with the option `{ updateOnWindowResize: false }`), but it will not listen for dynamic DOM changes. If you alter the DOM in a way that may cause the available space for the paragraph to change, you must call `texLinebreakDOM` again.
Limitations on webpages
⚠️ A major limitation is that all line breaks are included in any text the user copies, and all hyphenated lines will include the hyphen character when copied.
Underlined text will not have any underlines between words.
Floating elements only have basic support:
- All floating elements that are children of absolutely positioned or sticky elements are ignored.
- Floating elements that are nested within the text itself (e.g. `<p>text <FloatingElement/> text</p>`) are not supported.
- Floating elements that do not have `line-height` set are not supported.
- If you run into layout issues and aren't using floating elements in your text, you can turn on the option `ignoreFloatingElements`.
If you're using third-party webfonts, you must apply `texLinebreakDOM` after the fonts have fully loaded, which may introduce a noticeable lag.
There may be issues regarding nested inline-block elements.
Using `<br/>` instead of new paragraph elements sometimes causes very wide spaces instead of optimal spacing, which is a bug.
The following are not supported:
- Columns
- Contenteditable elements
- The CSS properties `font-stretch`, `font-size-adjust`, and `font-kerning`
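For the webfont limitation mentioned above, one possible approach is to wait for the browser's font-loading promise before laying out the text. The sketch below assumes the standard CSS Font Loading API (`document.fonts.ready`); the helper name `layoutWhenFontsReady` is hypothetical and not part of this library.

```javascript
// Hypothetical helper: run `layout` once the document's fonts have loaded.
// `doc` is expected to be the browser's `document`. If the CSS Font Loading
// API is unavailable, the layout callback runs on the next microtask instead.
function layoutWhenFontsReady(doc, layout) {
  const ready =
    doc && doc.fonts && doc.fonts.ready ? doc.fonts.ready : Promise.resolve();
  return ready.then(() => layout());
}

// In a browser, with texLinebreakDOM imported as in the usage examples:
// layoutWhenFontsReady(document, () => texLinebreakDOM("p"));
```

The text stays in the browser's default layout until the fonts arrive, which is the lag described above.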
Other limitations
- Does not support Swedish, German, and Dutch hyphenation rules that cause spelling to change (e.g. tuggummi becoming tugg-gummi and opaatje becoming opa-tje).
Other types of text
import { TexLinebreak } from "tex-linebreak2";
const text =
"Chamæleon animal est quadrupes, macrum & gibbosum, capite galeato, corpore & cauda lacertæ majoris, cervice penè nulla, costis plus minus sedecim, obliquo ductu ventri junctis ut piscibus.";
const t = new TexLinebreak(text, {
lineWidth: 45,
/*
A function that measures the width of a string of text.
(For monospace text, you should however use the
option `preset: "plaintext"`, which will correctly
calculate a string's width)
*/
measureFn: (word) => word.length,
/* Spaces should not expand */
glueStretchFactor: 0,
/* Spaces should not contract */
glueShrinkFactor: 0,
});
/* Get output as plain text */
console.log(t.plaintext);
/*
Output:
Chamæleon animal est quadrupes, macrum &
gibbosum, capite galeato, corpore & cauda
lacertæ majoris, cervice penè nulla, costis
plus minus sedecim, obliquo ductu ventri
junctis ut piscibus.
*/
/* Get output as positioned items */
console.log(t.lines.map((line) => line.positionedItems));
/*
Output:
[[{ type: 'box', text: 'Chamæleon', xOffset: 0, width: 9 },
{ type: 'glue', text: ' ', xOffset: 9, width: 1 },
...
*/
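As an illustration of what positioned items can be used for, the following sketch rebuilds a monospace line from them. It assumes items shaped like the sample output above (`type`, `text`, `xOffset`) and a character width of 1; `renderMonospaceLine` is a hypothetical helper, not a function exported by the library.

```javascript
// Hypothetical helper: place each box's text at its xOffset column,
// padding the gaps (the glue) with spaces.
function renderMonospaceLine(positionedItems) {
  let line = "";
  for (const item of positionedItems) {
    if (item.type !== "box") continue; // glue only contributes empty space
    const column = Math.round(item.xOffset);
    line = line.padEnd(column, " ") + item.text;
  }
  return line;
}

const sampleLine = [
  { type: "box", text: "Chamæleon", xOffset: 0, width: 9 },
  { type: "glue", text: " ", xOffset: 9, width: 1 },
  { type: "box", text: "animal", xOffset: 10, width: 6 },
];
console.log(renderMonospaceLine(sampleLine)); // "Chamæleon animal"
```

The same loop structure applies to other targets: for a canvas, for example, each box would be drawn at its `xOffset` instead of being appended to a string.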
Arbitrary items
You can also lay out arbitrary items (be it text or something else). The algorithm works with generic "box" (typeset material), "glue" (spaces that may have flexible sizing) and "penalty" items. Typically "boxes" are words, "glue" items are spaces and "penalty" items represent hyphenation points or other possible breaking points. These concepts are more versatile than they may appear at first, as is discussed in the original paper.
import { TexLinebreak, MIN_COST } from "tex-linebreak2";
const items = [
{ type: "box", width: 10 },
{ type: "glue", width: 4, stretch: 2, shrink: 1 },
{ type: "box", width: 20 },
{ type: "penalty", cost: MIN_COST },
];
const positionedItems = new TexLinebreak(items, {
lineWidth: 45,
}).positionedItems;
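To show how box and glue items interact, here is a small sketch of the adjustment ratio as it is defined in the Knuth-Plass paper: the ratio says how much of the available stretch (positive) or shrink (negative) a line uses to reach the target width. This is an illustration using the item shape from the example above, not the library's internal code; `adjustmentRatio` and `adjustedWidth` are names chosen for this sketch.

```javascript
// Adjustment ratio of one line:
//   0                                if the natural width fits exactly
//   (lineWidth - natural) / stretch  if the line has to be stretched
//   (lineWidth - natural) / shrink   if the line has to be shrunk
// Penalty items are skipped here, which is a simplification.
function adjustmentRatio(items, lineWidth) {
  let natural = 0;
  let stretch = 0;
  let shrink = 0;
  for (const item of items) {
    if (item.type === "penalty") continue;
    natural += item.width;
    if (item.type === "glue") {
      stretch += item.stretch;
      shrink += item.shrink;
    }
  }
  const diff = lineWidth - natural;
  if (diff === 0) return 0;
  // Dividing by zero flexibility yields +/-Infinity: the line cannot be set.
  return diff / (diff > 0 ? stretch : shrink);
}

// Width of a single item after applying a finite adjustment ratio.
function adjustedWidth(item, ratio) {
  if (item.type !== "glue") return item.width || 0;
  return item.width + ratio * (ratio >= 0 ? item.stretch : item.shrink);
}

const line = [
  { type: "box", width: 10 },
  { type: "glue", width: 4, stretch: 2, shrink: 1 },
  { type: "box", width: 20 },
];
console.log(adjustmentRatio(line, 36)); // 1 (natural width is 34, stretch is 2)
console.log(adjustedWidth(line[1], 1)); // 6
console.log(adjustmentRatio(line, 33.5)); // -0.5
console.log(adjustedWidth(line[1], -0.5)); // 3.5
```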
Options
See `TexLinebreakOptions` for a list of available options. Of these, the most relevant ones are:
- `justify` (boolean)
- `align` ("left", "right", "center", default "left", does not need to be set on webpages)
- `hangingPunctuation` (boolean, default true)
- `glueStretchFactor` (default 1.2, i.e. becoming 220% of the space's original width) – How much a glue (space) is allowed to stretch. This is not a hard limit; see `renderLineAsUnjustifiedIfAdjustmentRatioExceeds` for hard limits.
- `glueShrinkFactor` (default 0.2, i.e. becoming 80% of the space's original width) – How much a glue (space) is allowed to shrink. This is a hard limit.
- `softHyphenPenalty` (default 50) – Set to `MAX_COST` to prohibit breaking on soft hyphens.
- `forceOverflowToBreak` (default false) – Whether long words should be forcibly broken.
- `setElementWidthToMaxLineWidth` – Can be used to shrink a DOM element to the necessary width.
- `ignoreFloatingElements` – If you run into an issue where a floating element is interfering with the layout, and you're not using floating elements in the text, you can set this option to true.
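The percentages given for `glueStretchFactor` and `glueShrinkFactor` can be worked through with a short sketch. It assumes that each factor is multiplied by the space's natural width to obtain the glue's stretch and shrink, which is inferred from the "220%" and "80%" figures above rather than taken from the library's source; `glueForSpace` is a hypothetical helper.

```javascript
// Hypothetical helper: build a glue item for a space of a given natural width.
// Assumption: stretch and shrink are the natural width times the factor.
function glueForSpace(
  spaceWidth,
  { glueStretchFactor = 1.2, glueShrinkFactor = 0.2 } = {}
) {
  return {
    type: "glue",
    width: spaceWidth,
    stretch: spaceWidth * glueStretchFactor,
    shrink: spaceWidth * glueShrinkFactor,
  };
}

const glue = glueForSpace(10);
console.log(glue.width + glue.stretch); // 22, i.e. 220% of the original width
console.log(glue.width - glue.shrink); // 8, i.e. 80% of the original width

// With both factors set to 0 (as in the plain-text example), spaces are rigid.
const rigid = glueForSpace(1, { glueStretchFactor: 0, glueShrinkFactor: 0 });
console.log(rigid.stretch, rigid.shrink); // 0 0
```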
API
The class `TexLinebreak` is used to break text into lines. It takes as input either text or items, along with options:
new TexLinebreak("text", {});
// or
new TexLinebreak([{ type: "box", width: 10 }], {});
It has the following properties:
- `lines` is an array of `Line` objects which describe each line of the output
- `plaintext` will output the text as plain text with newlines
- `items`, the input text represented as boxes, glues, and penalties
- `breakpoints`, the indices of items that break a line
A `Line` object describes a single line of the output. It has the following properties:
- `positionedItems` – An array of the items (box, glue, and penalties) that are relevant for rendering the line (with irrelevant penalties having been removed and with irrelevant glue having been given a width of zero), along with their positioning information given as `xOffset` and `adjustedWidth` (which is the width including any stretching or shrinking).
- `plaintext`
Hyphenation
The recommended way to add hyphenation is to preprocess your text server-side and add soft hyphen characters (`&shy;` in HTML, `\u00AD` in Unicode) to your text. This can be achieved with a library such as Hyphen. Client-side hyphenation will cause a noticeable amount of lag on large documents.
tex-linebreak2 will strip the soft hyphen characters from the output so that they won't bother users who copy your text (this can be turned off with the option `stripSoftHyphensFromOutputText`).
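As a rough illustration of what the preprocessing and stripping steps do to a string, the sketch below inserts soft hyphens at known positions and removes them again. Both helpers are hypothetical; a real hyphenation library would determine the break positions from language-specific patterns, and the library's own stripping logic may differ.

```javascript
const SOFT_HYPHEN = "\u00AD";

// Hypothetical helper: insert soft hyphens at the given character indices
// (assumed to be sorted in ascending order).
function insertSoftHyphens(word, breakIndices) {
  let result = "";
  let previous = 0;
  for (const index of breakIndices) {
    result += word.slice(previous, index) + SOFT_HYPHEN;
    previous = index;
  }
  return result + word.slice(previous);
}

// Remove all soft hyphens, e.g. before text is copied or exported.
function stripSoftHyphens(text) {
  return text.replace(/\u00AD/g, "");
}

const hyphenated = insertSoftHyphens("hyphenation", [2, 6]); // hy|phen|ation
console.log(hyphenated.length); // 13: 11 letters plus 2 invisible soft hyphens
console.log(stripSoftHyphens(hyphenated)); // "hyphenation"
```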
Credits
This project is an extension of tex-linebreak by Robert Knight.
References
- D. E. Knuth and M. F. Plass, “Breaking paragraphs into lines” (PDF), Software: Practice and Experience, vol. 11, no. 11, pp. 1119–1184, Nov. 1981.
Notes
[^1]: However there may exist a handful of exceptions regarding some non-Latin scripts.
[^2]: For Node.js, you do however have to supply your own function to measure the width of text. Furthermore, server-side processing of HTML documents is not supported out of the box.