scrape-brrr
v1.1.0
Simple web page scraping
Install
yarn add scrape-brrr
Try it online
Usage examples
The following examples use TypeScript-style imports. For plain Node.js, use:
const { scrape } = require('scrape-brrr')
Dead-simple usage
/**
 * <body>
 *   <div>
 *     <span>
 *       <p>sentence 1</p>
 *       <p>sentence 2</p>
 *       <p>sentence 3</p>
 *     </span>
 *   </div>
 *   <p>footer</p>
 * </body>
 */
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', 'div p:not(:first-child)')
// ["sentence 2", "sentence 3"]
Scrape single item
/**
 * <body>
 *   <div>Best wof</div>
 *   <span>Largest wof</span>
 * </body>
 */
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', [
  {
    name: 'stats',
    selector: 'div',
  },
  {
    name: 'another-stats',
    selector: 'span',
  },
])
// {
//   stats: "Best wof",
//   "another-stats": "Largest wof"
// }
Scrape multiple items
/**
 * <body>
 *   <div>
 *     <span class="name">husky</span>
 *     <span class="name">golden</span>
 *   </div>
 * </body>
 */
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', [{
  name: 'bestWofs',
  selector: 'div .name',
  many: true
}])
// { bestWofs: ["husky", "golden"] }
Nested fields
/**
 * <body>
 *   <div>
 *     <span class="name">husky</span>
 *   </div>
 *   <div>
 *     <span class="name">golden</span>
 *   </div>
 * </body>
 */
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', [{
  name: 'bestWofs',
  selector: 'div',
  many: true,
  nested: [
    {
      name: 'name',
      selector: 'span',
    }
  ]
}])
// {
//   bestWofs: [
//     { name: "husky" },
//     { name: "golden" },
//   ]
// }
Extract link / HTML element attribute
/**
 * <body>
 *   <span class="title" id="best">Best wof</span>
 *   <a href="/other-stats">other stats</a>
 * </body>
 */
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', [
  {
    name: 'key',
    selector: 'span',
    attr: 'id'
  },
  {
    name: 'otherLink',
    selector: 'a',
    attr: 'href'
  },
])
// {
//   key: "best",
//   otherLink: "/other-stats"
// }
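The href in the output above is returned exactly as written in the page, so it is relative. A minimal sketch of resolving it against the page URL with the standard URL class follows. The scraped object is hard-coded to mirror the output above, and resolveLink is a hypothetical helper, not part of scrape-brrr.

```typescript
// Hypothetical helper (not part of scrape-brrr): resolve a scraped href
// against the URL of the page it was scraped from.
function resolveLink(href: string, pageUrl: string): string {
  return new URL(href, pageUrl).href
}

// Stand-in for the result of the scrape() call above.
const scraped = { key: 'best', otherLink: '/other-stats' }

const absoluteLink = resolveLink(scraped.otherLink, 'http://website.com')
console.log(absoluteLink) // http://website.com/other-stats
```

The same function could also be passed as a transform (see the next section) if the link should be absolute in the scrape result itself.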
Transform
/**
 * <body>
 *   <div>
 *     <span class="rank">1</span>
 *     <span class="name">husky</span>
 *   </div>
 *   <div>
 *     <span class="rank">2</span>
 *     <span class="name">golden</span>
 *   </div>
 * </body>
 */
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', [{
  name: 'best',
  selector: 'div',
  many: true,
  nested: [
    {
      name: 'rank',
      selector: '.rank',
    },
    {
      name: 'name',
      selector: '.name',
    }
  ],
  transform: arr => arr[0]
}])
// {
//   best: { rank: "1", name: "husky" },
// }
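Judging from the example above, transform receives the value scraped for its field (here, the array of nested objects), and whatever it returns replaces that value. The sketch below applies two such functions to hard-coded data in that shape. Wof, firstOnly and withNumericRank are illustrative names, scraped text is assumed to arrive as strings, and nothing here calls scrape-brrr itself.

```typescript
// Assumed shape of one nested item (scraped text taken to be a string).
type Wof = { rank: string; name: string }

// Stand-in for what the 'best' field holds before transform runs.
const scrapedBest: Wof[] = [
  { rank: '1', name: 'husky' },
  { rank: '2', name: 'golden' },
]

// Same as `arr => arr[0]` in the example: keep only the first item.
const firstOnly = (arr: Wof[]): Wof => arr[0]

// Another possible transform: convert the rank text to a number.
const withNumericRank = (arr: Wof[]) =>
  arr.map(item => ({ ...item, rank: Number(item.rank) }))

console.log(firstOnly(scrapedBest)) // { rank: '1', name: 'husky' }
console.log(withNumericRank(scrapedBest))
```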
Websites with content rendered by JavaScript
Pass { dynamic: true } as the third argument to load the page with Puppeteer, so that content rendered by JavaScript is available to scrape.
/**
 * <body>
 *   <h1>
 *     tick tok tick tok
 *   </h1>
 *   <script>
 *     document.querySelector('h1').textContent = 'boom!'
 *   </script>
 * </body>
 */
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', 'h1', { dynamic: true })
// ["boom!"]
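The options used across the examples can be summarised as TypeScript types. These are inferred from the examples in this README, not copied from the package's published typings, so the names FieldSpec and ScrapeOptions, the optionality of each property, and the comments describing them should be read as assumptions.

```typescript
// Inferred from the README examples; not the package's own type definitions.
interface FieldSpec {
  name: string                        // key in the result object
  selector: string                    // CSS selector
  many?: boolean                      // true: collect every match into an array
  attr?: string                       // read this attribute instead of the text
  nested?: FieldSpec[]                // fields resolved inside each match
  transform?: (value: any) => unknown // post-process the scraped value
}

interface ScrapeOptions {
  dynamic?: boolean // true: load the page with puppeteer so scripts run first
}

// The field list from the "Transform" example, checked against the sketch.
const fields: FieldSpec[] = [{
  name: 'best',
  selector: 'div',
  many: true,
  nested: [
    { name: 'rank', selector: '.rank' },
    { name: 'name', selector: '.name' },
  ],
  transform: arr => arr[0],
}]

const options: ScrapeOptions = { dynamic: true }
```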
Other features
- Handles responses in non-UTF-8 charsets (e.g. the Chinese encoding big5)
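One general way to handle such responses is to read the charset from the Content-Type header and decode the raw bytes with the standard TextDecoder. The sketch below shows that technique only; it is not scrape-brrr's actual implementation. charsetFromContentType and decodeBody are hypothetical helpers, and decoding a legacy encoding such as big5 assumes a runtime whose TextDecoder supports it (recent Node.js builds with full ICU do).

```typescript
// Hypothetical helper: pull the charset out of a Content-Type header value,
// falling back to utf-8 when none is declared.
function charsetFromContentType(contentType: string | null): string {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType ?? '')
  return match ? match[1].toLowerCase() : 'utf-8'
}

// Hypothetical helper: decode raw response bytes using the declared charset.
function decodeBody(bytes: Uint8Array, contentType: string | null): string {
  // TextDecoder throws a RangeError for labels the runtime does not support.
  return new TextDecoder(charsetFromContentType(contentType)).decode(bytes)
}

const charset = charsetFromContentType('text/html; charset=Big5')
console.log(charset) // big5

const utf8Bytes = new TextEncoder().encode('wof')
console.log(decodeBody(utf8Bytes, 'text/html')) // wof
```

With fetch, the bytes would come from new Uint8Array(await response.arrayBuffer()) and the header value from response.headers.get('content-type').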
Development
yarn install
yarn test