canner-extract
v0.2.3
Canner prompt CLI tool for extracting data from HTML
An HTML extractor for Canner.
Install
npm install -g canner-extract
Usage
Usage: canner-extract <html_file, default value: "index.html">
Options:
  -h, --help      output usage information
  -V, --version   output the version number
  -m, --manually  manually naming text node
Auto-extracting
Automatically extract the text of an HTML file into canner.json and a layout file.
canner-extract index.html
Before
A sample HTML file:
<html>
<title>
</title>
<body>
This is text 1
<p> This is text 2 </p>
<span> This is text 3 </span>
<p> This is text 4 <span> This is text 5 </span> </p>
This is text 6
</body>
</html>
After
test_after.html:
<html>
<head>
<title>
</title>
</head>
<body>{{0}}
<p>{{1}}</p>
<span>{{2}}</span>
<p>{{3}}<span>{{4}}</span> </p>{{5}}</body>
</html>
test_output.json:
{
"0": "This is text 1",
"1": "This is text 2",
"2": "This is text 3",
"3": "This is text 4",
"4": "This is text 5",
"5": "This is text 6"
}
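The example suggests the core idea of auto mode: each non-blank text run becomes a numbered placeholder in the layout, and its trimmed text is stored under the same number in the JSON. The sketch below illustrates that idea only; it is not canner-extract's implementation. extractText is a hypothetical helper, and it assumes a naive regex tokenizer in place of a real HTML parser, so it ignores comments, script/style contents and a '>' inside attribute values. Unlike the output above, it also replaces the whitespace around each text run.

```javascript
// Hypothetical sketch, NOT canner-extract's code: replace every non-blank text
// run in an HTML string with {{n}} and collect the trimmed text under key n.
// A regex tokenizer stands in for a real HTML parser (no comments, scripts,
// or '>' inside attribute values).
function extractText(html) {
  const data = {};
  let next = 0;
  // Splitting on a capturing group keeps the tags in the array:
  // even indices are text runs, odd indices are tags.
  const parts = html.split(/(<[^>]*>)/);
  const layout = parts
    .map((part, i) => {
      // Leave tags and whitespace-only text untouched.
      if (i % 2 === 1 || part.trim() === "") return part;
      const key = String(next++);
      data[key] = part.trim();
      return "{{" + key + "}}";
    })
    .join("");
  return { layout, data };
}
```

For the nested case in the sample, extractText('<p> This is text 4 <span> This is text 5 </span> </p>') yields the layout '<p>{{0}}<span>{{1}}</span> </p>', the same shape as the line in test_after.html; the whitespace-only text before </p> is kept.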
Manually-extracting
Extract the text of an HTML file into canner.json and a layout file, naming each text node yourself.
canner-extract -m index.html
The tool prompts you to name the extracted text nodes; your answers become the keys in canner.json, as in the result below.
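Where auto mode keys the text as 0, 1, 2 and so on, manual mode's result uses descriptive keys such as name and side-tab1. The sketch below shows only that renaming step, assuming the prompt answers have already been collected into an array called names (a hypothetical input, one entry per placeholder index). applyNames is a hypothetical helper, not the package's code; keeping the numeric key for an unnamed node and rejecting duplicate names are choices made for the sketch.

```javascript
// Hypothetical sketch, NOT canner-extract's code: turn an auto-mode
// layout/data pair ({{0}}, {{1}}, ...) into a named one, given names[i]
// for placeholder i.
function applyNames(layout, data, names) {
  const renamed = layout.replace(/\{\{(\d+)\}\}/g, (match, index) => {
    const name = names[Number(index)];
    // Keep the numeric placeholder when no name was supplied.
    return name ? "{{" + name + "}}" : match;
  });
  const named = {};
  for (const [index, text] of Object.entries(data)) {
    const name = names[Number(index)] || index;
    // Two nodes with the same name would overwrite each other's text.
    if (Object.prototype.hasOwnProperty.call(named, name)) {
      throw new Error("Duplicate name: " + name);
    }
    named[name] = text;
  }
  return { layout: renamed, data: named };
}
```

For example, applyNames('<h1>{{0}}</h1>', { "0": "Willis Corto" }, ["name"]) gives the layout '<h1>{{name}}</h1>' and the data { name: "Willis Corto" }, matching the naming style of the canner.json result.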
Result
canner.json:
{
"layout": "layout.hbs",
"filename": "output.html",
"data": {
"name": "Willis Corto",
"side-title1": "I got reprogrammed by a rogue AI",
"side-title2": "and now I'm totally cray",
"side-tab1": "About",
"side-tab2": "Things I Can Do",
"side-tab3": "A Few Accomplishments",
"side-tab4": "Contact",
"side-twitter": "Twitter",
"side-facebook": "Facebook",
"side-instagram": "Instagram",
"side-github": "Github",
"side-email": "Email",
"main-title1": "Read Only",
"main-title2": "Things I Can Do",
"main-subtitle1": "Just an incredibly simple responsive site",
"main-subtitle2": "template freebie by",
"main-content1": "Faucibus sed lobortis aliquam lorem blandit. Lorem eu nunc metus col. Commodo id in arcu ante lorem ipsum sed accumsan erat praesent faucibus commodo ac mi lacus
....
...
}
}
layout.hbs:
...
...
<body>
<div id="wrapper">
<!-- Header -->
<section id="header" class="skel-layers-fixed">
<header>
<span class="image avatar"><img src="images/avatar.jpg" alt=""></span>
<h1 id="logo"><a href="#">{{name}}</a></h1>
<p>{{side-title1}}
<br>{{side-title2}}</p>
</header>
<nav id="nav">
<ul>
<li><a href="#one" class="active">{{side-tab1}}</a>
</li>
<li><a href="#two">{{side-tab2}}</a>
</li>
<li><a href="#three">{{side-tab3}}</a>
</li>
<li><a href="#four">{{side-tab4}}</a>
</li>
</ul>
</nav>
...
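The layout keeps {{key}} placeholders and canner.json holds the matching values under data; its layout and filename fields suggest that Canner renders layout.hbs with that data to produce output.html. layout.hbs is a Handlebars template, so a real build would compile it with the Handlebars library. The sketch below is a simplified, hypothetical stand-in that handles plain {{key}} lookups only, HTML-escaping each value and rendering missing keys as empty strings, in the spirit of Handlebars' default {{...}} behavior.

```javascript
// Hypothetical stand-in for Handlebars: substitute {{key}} with data[key].
// Handles plain lookups only (no helpers, blocks, or partials).
const ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#x27;" };

function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => ESCAPES[ch]);
}

function render(layout, data) {
  // Keys may contain hyphens, e.g. {{side-title1}}.
  return layout.replace(/\{\{\s*([^{}\s]+)\s*\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? escapeHtml(data[key]) : ""
  );
}
```

Rendering a line of test_after.html with the values from test_output.json, for instance render('<span>{{2}}</span>', { "2": "This is text 3" }), restores the original text: '<span>This is text 3</span>'.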
API
autoParse(html, opt)
- html: the absolute path to your HTML file.
Returns a promise that resolves to a result object with html and json properties.
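var path = require('path');
// Assumes the package's main export provides autoParse, as this API section describes.
var canner_extract = require('canner-extract');
var html = path.resolve('index.html'); // absolute path to the HTML file
var opt = {}; // placeholder: the available options are not documented here
canner_extract.autoParse(html, opt)
  .then(function(result) {
    console.log(result.html); // extracted layout (cf. test_after.html)
    console.log(result.json); // extracted text (cf. test_output.json)
  });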
manuallyParse(html, opt)
- html: the absolute path to your HTML file.
Returns a promise that resolves to a result object with html and json properties.
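var path = require('path');
// Assumes the package's main export provides manuallyParse, as this API section describes.
var canner_extract = require('canner-extract');
var html = path.resolve('index.html'); // absolute path to the HTML file
var opt = {}; // placeholder: the available options are not documented here
canner_extract.manuallyParse(html, opt)
  .then(function(result) {
    console.log(result.html); // layout using the names you entered (cf. layout.hbs)
    console.log(result.json); // extracted text keyed by those names (cf. canner.json)
  });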
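Both functions resolve to a result with html and json, which mirrors the two files the CLI produces. The sketch below writes such a result to disk. saveResult is a hypothetical helper, not part of canner-extract; it assumes result.html is a string and accepts result.json either as an object or as an already-serialized string, since the README does not say which one the package returns.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper (not part of canner-extract): write the { html, json }
// result of autoParse or manuallyParse to the two given file paths.
function saveResult(result, htmlPath, jsonPath) {
  for (const file of [htmlPath, jsonPath]) {
    // Create parent directories so nested output paths work.
    fs.mkdirSync(path.dirname(file), { recursive: true });
  }
  fs.writeFileSync(htmlPath, result.html);
  // Serialize objects as pretty-printed JSON; write strings unchanged.
  const json =
    typeof result.json === "string"
      ? result.json
      : JSON.stringify(result.json, null, 2) + "\n";
  fs.writeFileSync(jsonPath, json);
  return { htmlPath, jsonPath };
}
```

Inside the then callback of either API example, a call such as saveResult(result, path.resolve('test_after.html'), path.resolve('test_output.json')) would produce files shaped like the auto-extracting example.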
Example
https://github.com/Canner/readonly-can
License
MIT