extract-html
v0.1.2
extract text from html
extract-html
Extract text out of HTML, magically!
Install
npm install extract-html
Example
Before
A sample HTML file:
<html>
<title>
</title>
<body>
This is text 1
<p> This is text 2 </p>
<span> This is text 3 </span>
<p> This is text 4 <span> This is text 5 </span> </p>
This is text 6
</body>
</html>
After extract-html
test_after.html:
<html>
<head>
<title>
</title>
</head>
<body>{{0}}
<p>{{1}}</p>
<span>{{2}}</span>
<p>{{3}}<span>{{4}}</span> </p>{{5}}</body>
</html>
test_after.json:
{
"0": "This is text 1",
"1": "This is text 2",
"2": "This is text 3",
"3": "This is text 4",
"4": "This is text 5",
"5": "This is text 6"
}
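To make the transformation concrete, here is a minimal, dependency-free sketch of the same idea: every non-blank text run between tags is trimmed, stored in a map under the next number, and replaced by a `{{n}}` placeholder. The function name `extractTextSketch` is hypothetical, and this is an assumption about the technique, not extract-html's actual code. In particular it does no beautifying, so the whitespace around placeholders differs from the output above.

```javascript
// Naive sketch of the placeholder technique, NOT extract-html's implementation.
// It splits on tags with a regex and does no beautifying, so it only suits
// simple, well-formed HTML (no comments, scripts, or ">" inside attributes).
function extractTextSketch(html) {
  var map = {};
  var index = 0;
  // With a capture group, split() keeps the tags: even indexes are text, odd are tags.
  var parts = html.split(/(<[^>]+>)/);
  var template = parts.map(function (part, i) {
    if (i % 2 === 1) return part;   // a tag: keep it unchanged
    var text = part.trim();
    if (text === '') return part;   // whitespace-only run: keep it unchanged
    map[String(index)] = text;      // record the trimmed text under its number
    return '{{' + index++ + '}}';   // replace the whole run with a placeholder
  });
  return { html: template.join(''), json: JSON.stringify(map, null, 2) };
}
```

For example, `extractTextSketch('<p> This is text 4 <span> This is text 5 </span> </p>').html` gives `'<p>{{0}}<span>{{1}}</span> </p>'`, mirroring the last paragraph of the sample.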
Sample
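var fs = require('fs');
var path = require('path');
var extract = require('extract-html');
extract(path.join(__dirname, 'extract.html'), function(err, html, json) {
  if (err) throw err; // stop on read/convert errors
  fs.writeFileSync('./test_after.html', html); // HTML with {{n}} placeholders
  fs.writeFileSync('./test_after.json', json); // extracted text, keyed by n
});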
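The two files written above are meant to be read together: each `{{n}}` in test_after.html corresponds to entry `"n"` in test_after.json. As a rough illustration, substituting the JSON values back into the template restores the text (though not the original whitespace), for example after editing the strings in the JSON file. The `fillTemplate` helper is hypothetical and not part of extract-html.

```javascript
// Hypothetical helper, not part of extract-html: replaces each {{n}}
// placeholder in the template with entry "n" of the extracted-text JSON.
// Values are inserted verbatim (no HTML escaping); placeholders with no
// matching key are left as-is.
function fillTemplate(templateHtml, jsonText) {
  var map = JSON.parse(jsonText);
  return templateHtml.replace(/\{\{(\d+)\}\}/g, function (match, key) {
    return Object.prototype.hasOwnProperty.call(map, key) ? map[key] : match;
  });
}
```

With the Sample's outputs you would call it as `fillTemplate(fs.readFileSync('./test_after.html', 'utf8'), fs.readFileSync('./test_after.json', 'utf8'))`; this assumes the `json` value written there is a JSON string, as the output above suggests. Using a replacer function (rather than a replacement string) keeps `$` sequences in the text from being interpreted specially.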
Options
extract-html uses the package https://github.com/beautify-web/js-beautify to beautify HTML and JS, so you can set up a js-beautify options object, opt, like
var opt = {
"indent_size": 4,
"indent_char": " ",
"indent_level": 0,
"indent_with_tabs": false,
"preserve_newlines": true,
"max_preserve_newlines": 10,
"jslint_happy": false,
"space_after_anon_function": false,
"brace_style": "collapse",
"keep_array_indentation": false,
"keep_function_indentation": false,
"space_before_conditional": true,
"break_chained_methods": false,
"eval_code": false,
"unescape_strings": false,
"wrap_line_length": 0
}
and pass it to the API functions.
Example:
// opt is the options object defined above
extract(path.join(__dirname, 'extract.html'), opt, function(err, html, json) {
  if (err) throw err;
  fs.writeFileSync('./test_after.html', html);
  fs.writeFileSync('./test_after.json', json);
});
API
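extract(html path, opt, callback)
- html path: the absolute path to your HTML file.
- opt: optional js-beautify options object (see Options); the Sample above omits it.
- callback: function(err, html, json), called with an error (if any), the converted HTML, and the JSON of extracted text.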
extract.html(html, opt, callback)
- html: a raw HTML string.
- opt: js-beautify options object (see Options).
- callback: callback function, called with the converted HTML, as for extract().
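If you prefer Promises, a small adapter can wrap the callback style shown in the Samples, where the callback receives `(err, html, json)`. This is a sketch under that assumption: `extractAsync` is a hypothetical name, not part of extract-html, and the extract function is passed in rather than required so the sketch makes no assumption about how you load the package.

```javascript
// Hypothetical wrapper, not part of extract-html: turns a callback-style
// extract function whose callback is (err, html, json) into a Promise.
function extractAsync(extractFn, input, opt) {
  return new Promise(function (resolve, reject) {
    function done(err, html, json) {
      if (err) reject(err);
      else resolve({ html: html, json: json });
    }
    // Mirror both call shapes used in this README: with and without opt.
    if (opt === undefined) extractFn(input, done);
    else extractFn(input, opt, done);
  });
}
```

For example, assuming `var extract = require('extract-html');`, `extractAsync(extract, path.join(__dirname, 'extract.html'), opt).then(function (r) { /* r.html, r.json */ })`. A hand-written Promise is used instead of `util.promisify` because `promisify` resolves with only the first callback value and would drop `json`.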
License
MIT