split-html
v1.1.0
Published
Split HTML into two valid fragments wherever a certain selector is matched. Works on the server side.
Downloads
1,846
Readme
split-html
Given a string containing an HTML fragment, split that string into two or more correctly balanced HTML fragments wherever the specified selector is found. Returns both the new fragments and the elements that matched the selector, in alternation. Works on the server and in the browser. Powered by Cheerio on the server side, jQuery in the browser.
var splitHtml = require('split-html');
var html = '<div>' +
'<h4>First component.</h4>' +
'<img src="/test.jpg">' +
'<p>Second component.</p>' +
'</div>';
var fragments = splitHtml(html, 'img');
console.log(fragments);
This outputs:
[
'<div><h4>First component.</h4></div>',
'<img src="/test.jpg">',
'<div><p>Second component.</p></div>'
]
Note that the img
itself is returned. The first element in the array is always an HTML fragment, the second is always an element that matched the selector, and so on in alternation.
Any container tags already open when the img
tag is encountered are automatically closed at the end of the first fragment and re-opened at the start of the next one with the same attributes.
Optional test function
If a jQuery/CSS-style selector isn't specific enough, you can pass a function as the third argument. This function is called with a Cheerio or jQuery object representing the matching element. If you want to split around this element, return true
. Otherwise, return false
.
This is useful because Cheerio does not currently support :has
, and also because in some situations even :has
might not be specific enough.
// Split on 'a', but only if it contains 'img'
var result = splitHtml(html, 'a', function($el) {
if ($el.find('img').length) {
return true;
} else {
return false;
}
});
Additional options
The following options should be passed as the fourth argument to splitHtml
in an object.
cheerio
An object of Cheerio v0.x .load()
options, as documented here. This is used when using split-html on the server when preparing the HTML fragment to parse.
Why?
We wanted to import Wordpress blog posts into the ApostropheCMS CMS. Wordpress uses HTML to embed images and videos, while Apostrophe represents blocks of text and widgets like slideshows as separate objects in an array. split-html
allows us to neatly slice and dice existing HTML so we can transform it into Apostrophe widgets easily.
What about errors?
If split-html
encounters something it can't figure out, such as terrible markup, it will return the original string as the only element in the array.
Using split-html in the browser
split-html
has been coded to work with either Cheerio or actual jQuery. It will automatically just use jQuery if that is present in the browser. We use this feature in production in ApostropheCMS's rich text editor.
About P'unk Avenue and ApostropheCMS
split-html
was created at P'unk Avenue for use in many projects built with Apostrophe, an open-source content management system built on node.js. If you like split-html
you should definitely check out apostrophecms.com.
Support
Feel free to open issues on github.
Changelog
1.1.0 - 2020-09-09
- Adds the Cheerio configuration option in a fourth options argument.
CHANGES IN 1.0.3
- Included an explicit LICENSE.md file (no change, still MIT licensed). No changes in functionality.
CHANGES IN 1.0.2
- Undeclared variable fixed. No functional changes.
CHANGES IN 1.0.1
- Clarified that this code is mature for browser use as well. No code changes.
CHANGES IN 1.0.0
- Updated documentation and released 1.0.0 stable. No code changes.
CHANGES IN 0.1.1
- Works correctly with actual jQuery, in addition to working correctly in node with Cheerio as before. This required changes to be more pedantic about closing parent tags in the first fragment, and a better simulation of Cheerio's document object.
- Handles nested parent elements correctly.
CHANGES IN 0.1.0
Initial release. With shiny unit tests, of course.