table-scraper
v1.0.3
Easily scrape any website's html table data into an array of JavaScript objects.
Simple utility for scraping data from html tables on a given website into a list of javascript objects.
Installation
npm install --save table-scraper
Methods
get(url)
Returns a promise that resolves to a list of the tables found on the page at the given URL. Each HTML table row is converted to a JavaScript object whose keys are the table's column headings.
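Conceptually, the conversion pairs each cell in a row with the heading in the same column. The function below is a hypothetical sketch of that idea, not table-scraper's actual implementation. It assumes you already have the heading texts and each row's cell texts as plain arrays of strings, with one heading per column.

```javascript
// Hypothetical helper illustrating the row-to-object conversion described above.
// Not part of table-scraper's API; assumes one heading per column.
function rowsToObjects(headings, rows) {
  return rows.map(function (cells) {
    var obj = {};
    cells.forEach(function (text, i) {
      // The heading in the same column becomes the key for this cell
      obj[headings[i]] = text;
    });
    return obj;
  });
}

var table = rowsToObjects(
  ['State', 'Capital City', 'Pop.'],
  [
    ['Minnesota', 'Saint Paul', '3'],
    ['New York', 'Albany', 'Eight Million']
  ]
);
console.log(table[1]['Capital City']); // prints: Albany
```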
For example, suppose the page at http://www.some-fake-url.com consisted of the following:
<html>
  <head>
  </head>
  <body>
    <table>
      <thead>
        <tr><th>State</th><th>Capital City</th><th>Pop.</th></tr>
      </thead>
      <tbody>
        <tr><td>Minnesota</td><td>Saint Paul</td><td>3</td></tr>
        <tr><td>New York</td><td>Albany</td><td>Eight Million</td></tr>
      </tbody>
    </table>
  </body>
</html>
The following code would resolve with the array shown in its comment:
var scraper = require('table-scraper');

scraper
  .get('http://www.some-fake-url.com')
  .then(function(tableData) {
    /*
      tableData ===
      [
        [
          { State: 'Minnesota', 'Capital City': 'Saint Paul', 'Pop.': '3' },
          { State: 'New York', 'Capital City': 'Albany', 'Pop.': 'Eight Million' }
        ]
      ]
    */
  })
  .catch(function(err) {
    // Log any error that rejects the promise (e.g. the request failed)
    console.error(err);
  });
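Because `get` returns a promise, it can also be consumed with `async`/`await`, with `try`/`catch` handling a rejection. The sketch below assumes only the `get(url)` behaviour described above, and assumes failures surface as a rejected promise. `firstTable` is a hypothetical helper, and the scraper is passed in as a parameter so the function can be exercised with a stub instead of a live site.

```javascript
// Hypothetical helper: fetch a page's tables and return the first one.
// `scraper` is assumed to expose get(url) returning a promise of a list of
// tables (each a list of row objects), as described for table-scraper's get().
async function firstTable(scraper, url) {
  var tableData;
  try {
    tableData = await scraper.get(url);
  } catch (err) {
    // Assumption: scraping failures arrive as a rejected promise
    throw new Error('Could not scrape ' + url + ': ' + ((err && err.message) || err));
  }
  if (tableData.length === 0) {
    throw new Error('No tables found at ' + url);
  }
  return tableData[0];
}

// Usage with the real package (requires network access):
//   var scraper = require('table-scraper');
//   firstTable(scraper, 'http://www.some-fake-url.com').then(console.log);

// Offline stub that resolves like a page with one table
var stubScraper = {
  get: function () {
    return Promise.resolve([
      [{ State: 'Minnesota', 'Capital City': 'Saint Paul', 'Pop.': '3' }]
    ]);
  }
};
```

Injecting the scraper keeps the helper testable without a network connection; in real code you would simply pass the object returned by `require('table-scraper')`.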
Important to note: the tableData returned is a list of lists. So, if some-fake-url.com contained three tables, the response would have this structure:
[
  [ /* list of data from the first table */ ],
  [ /* list of data from the second table */ ],
  [ /* list of data from the third table */ ]
]
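Since each element of the result is itself a list of row objects, you select a table by its position and then read fields from its rows. A minimal sketch, using the example's result as a literal instead of a live request; `columnValues` is a hypothetical helper, not part of table-scraper.

```javascript
// Result shape from the example above: one table with two rows
var tableData = [
  [
    { State: 'Minnesota', 'Capital City': 'Saint Paul', 'Pop.': '3' },
    { State: 'New York', 'Capital City': 'Albany', 'Pop.': 'Eight Million' }
  ]
];

// Hypothetical helper: collect one column's values from the table at tableIndex.
// Returns an empty array if the page had no table at that position.
function columnValues(tableData, tableIndex, key) {
  var table = tableData[tableIndex] || [];
  return table.map(function (row) {
    return row[key];
  });
}

console.log(columnValues(tableData, 0, 'Capital City')); // [ 'Saint Paul', 'Albany' ]
```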
If a table has NO headings (no <th> elements), the object keys are simply the column indices:
[
  {'0': <first column data of first row>, '1': <second column data of first row>, ... }
]
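When a table has no headings, you may want to name the columns yourself. A minimal sketch, assuming rows keyed by column index as described above; `nameColumns` and the column names are hypothetical and supplied by the caller.

```javascript
// Hypothetical helper: replace index keys ('0', '1', ...) with caller-supplied names.
// Columns without a supplied name keep their original index key.
function nameColumns(rows, names) {
  return rows.map(function (row) {
    var named = {};
    Object.keys(row).forEach(function (key) {
      var i = Number(key);
      named[i < names.length ? names[i] : key] = row[key];
    });
    return named;
  });
}

// Rows shaped like the output for a heading-less table
var rows = [
  { '0': 'Minnesota', '1': 'Saint Paul' },
  { '0': 'New York', '1': 'Albany' }
];
console.log(nameColumns(rows, ['State', 'Capital City'])[0]);
// { State: 'Minnesota', 'Capital City': 'Saint Paul' }
```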
Contributing
Feedback/PRs welcome! Please include tests for any new functionality, and make sure existing tests pass:
npm test
Credits
The following node libraries make this utility super easy: