extract-html

v0.1.2

Published

2 years ago

extract text from html

chilijung

extract html data

extract-html

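Extract the text from an HTML document: each text node is replaced with a numbered {{n}} placeholder, and the text itself is collected into a JSON map.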

Install

npm install extract-html

Example

Before

A sample HTML file:

<html>
  <title>
  </title>

  <body>
    This is text 1

    <p> This is text 2 </p>

    <span> This is text 3 </span>

    <p> This is text 4 <span> This is text 5 </span> </p>

    This is text 6
  </body>

</html>

After extract-html

test_after.html:

<html>

<head>
    <title>
    </title>

</head>

<body>{{0}}
    <p>{{1}}</p>

    <span>{{2}}</span>

    <p>{{3}}<span>{{4}}</span> </p>{{5}}</body>

</html>

test_after.json:

{
    "0": "This is text 1",
    "1": "This is text 2",
    "2": "This is text 3",
    "3": "This is text 4",
    "4": "This is text 5",
    "5": "This is text 6"
}
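The placeholders and the JSON map above are enough to rebuild the original text, for example after the strings have been translated. The following is a minimal sketch of that reverse step; `restore` is a hypothetical helper written for illustration and is not part of extract-html.

```javascript
// Hypothetical helper, not part of extract-html: put extracted text back
// into a template that uses {{n}} placeholders.
function restore(template, json) {
    // Accept either a JSON string (as written to disk) or a parsed object.
    var texts = typeof json === 'string' ? JSON.parse(json) : json;
    return template.replace(/\{\{(\d+)\}\}/g, function (match, key) {
        // Leave unknown placeholders untouched rather than dropping them.
        return Object.prototype.hasOwnProperty.call(texts, key) ? texts[key] : match;
    });
}

var template = '<body>{{0}}<p>{{1}}</p>{{9}}</body>';
var texts = { '0': 'This is text 1', '1': 'This is text 2' };
console.log(restore(template, texts));
// <body>This is text 1<p>This is text 2</p>{{9}}</body>
```

Leaving unmatched placeholders in place makes missing keys easy to spot in the output.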

Sample

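var fs = require('fs')
var path = require('path')
var extract = require('extract-html')
var extract_test = extract(path.join(__dirname, 'extract.html'), function(err, html, json) {
    if (err) throw err
    fs.writeFileSync('./test_after.html', html)
    fs.writeFileSync('./test_after.json', json)
});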

Options

extract-html uses js-beautify (https://github.com/beautify-web/js-beautify) to beautify the HTML and JS, so you can set up an opt object with its options, like

var opt = {
    "indent_size": 4,
    "indent_char": " ",
    "indent_level": 0,
    "indent_with_tabs": false,
    "preserve_newlines": true,
    "max_preserve_newlines": 10,
    "jslint_happy": false,
    "space_after_anon_function": false,
    "brace_style": "collapse",
    "keep_array_indentation": false,
    "keep_function_indentation": false,
    "space_before_conditional": true,
    "break_chained_methods": false,
    "eval_code": false,
    "unescape_strings": false,
    "wrap_line_length": 0
}

and pass it as the second argument to the API functions.

Example:

var extract_test = extract(path.join(__dirname, 'extract.html'), opt, function(err, html, json) {
    fs.writeFileSync('./test_after.html', html)
    fs.writeFileSync('./test_after.json', json)
});

API

extract(html path, opt, callback)

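html path: the absolute path to your HTML file.
opt: js-beautify options (see Options); can be omitted, as in the Sample.
callback: callback function, called with (err, html, json), where html is the converted HTML and json holds the extracted text.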

extract.html(html, opt, callback)

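html: the raw HTML string.
opt: js-beautify options (see Options).
callback: callback function, called with (err, html, json), where html is the converted HTML and json holds the extracted text.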
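Both functions take a callback, so they can be wrapped in a Promise if that suits your code better. The sketch below assumes the callback is Node-style, (err, html, json), as in the Sample, and that an empty opt object is accepted. `extractAsync` and `fakeExtract` are hypothetical names used only for illustration; `fakeExtract` stands in for the real function so the snippet runs without the package installed.

```javascript
// Hypothetical wrapper, not part of extract-html. `extractFn` is assumed to be
// either `extract` or `extract.html`, called as extractFn(input, opt, callback)
// with a Node-style callback(err, html, json).
function extractAsync(extractFn, input, opt) {
    return new Promise(function (resolve, reject) {
        extractFn(input, opt || {}, function (err, html, json) {
            if (err) return reject(err);
            resolve({ html: html, json: json });
        });
    });
}

// Stand-in with the same call shape; in a real project you would pass
// require('extract-html').html (or the file-based function) instead.
function fakeExtract(html, opt, callback) {
    callback(null, '<p>{{0}}</p>', '{"0": "hello"}');
}

extractAsync(fakeExtract, '<p>hello</p>').then(function (result) {
    console.log(result.html); // <p>{{0}}</p>
    console.log(result.json);
});
```

Taking the function as a parameter keeps one wrapper usable for both the file-based and the raw-HTML call.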

License

MIT
