stream-sitemap-parser
v4.0.3
Receives any type of sitemap stream and parses it, streaming back the list of URLs or any errors found.
sitemap-parser
Stream a sitemap file and get back a stream of URLs or any error found while parsing the file.
Usage
const fs = require('fs');
const { fetch, verify, getRules } = require('stream-sitemap-parser');

fs.createReadStream(file)
  .pipe(fetch())
  .on('data', function (url) {
    // each chunk contains a URL and all of its given attributes, e.g.:
    // {
    //   loc: 'www.google.com',
    //   lastmod: '2017-01-01T00:00:00.000Z',
    //   changefreq: 'monthly',
    //   priority: '0.8',
    //   alternate: [
    //     {
    //       href: 'https://www.google.com/es/',
    //       hreflang: 'es'
    //     }
    //   ]
    // }
  });
verify(fs.createReadStream(file))
  .then(result => {
    // result is an object describing any warnings or errors found while parsing the sitemap, e.g.:
    // {
    //   messages: [
    //     {
    //       type: 'tooManyTags',
    //       details: {
    //         parent: 'url',
    //         tag: 'loc'
    //       }
    //     }
    //   ],
    //   alternates: [
    //     {
    //       loc: 'https://www.google.com',
    //       alternate: [
    //         {
    //           href: 'https://www.google.com/es/',
    //           hreflang: 'es'
    //         }
    //       ]
    //     }
    //   ]
    // }
  });
getRules();
// returns an object of all loaded rules of the parser
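The two outputs shown above could be consumed along the following lines. This is a minimal sketch, not part of the package: `collectEntries` and `countMessagesByType` are hypothetical helpers, the sample stream built with `Readable.from` stands in for `fs.createReadStream(file).pipe(fetch())`, and it assumes parsing failures surface as stream `error` events, which the README does not confirm.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: gather every entry from an object-mode stream,
// such as the one produced by piping a sitemap into fetch().
function collectEntries(stream) {
  return new Promise((resolve, reject) => {
    const entries = [];
    stream.on('data', (entry) => entries.push(entry));
    stream.on('error', reject); // assumption: failures arrive as 'error' events
    stream.on('end', () => resolve(entries));
  });
}

// Hypothetical helper: count the messages of a verify() result by `type`.
function countMessagesByType(result) {
  const counts = {};
  for (const message of result.messages || []) {
    counts[message.type] = (counts[message.type] || 0) + 1;
  }
  return counts;
}

// Stand-in for fs.createReadStream(file).pipe(fetch())
const sampleStream = Readable.from([
  { loc: 'https://www.google.com', priority: '0.8' },
  { loc: 'https://www.google.com/es/', priority: '0.5' },
]);

collectEntries(sampleStream).then((entries) => {
  console.log(entries.map((entry) => entry.loc));
});

// Sample result shaped like the verify() output shown above
console.log(countMessagesByType({
  messages: [
    { type: 'tooManyTags', details: { parent: 'url', tag: 'loc' } },
    { type: 'tooManyTags', details: { parent: 'url', tag: 'lastmod' } },
  ],
}));
```

Collecting into an array gives up the memory benefit of streaming, so it is mainly suited to small sitemaps or tests.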
fetch and verify can take several options:
fetch({ contentType, domain, maxSize, maxUrls })
verify(sitemapStream, { contentType, domain, maxSize, maxUrls })
contentType defaults to xml. Set it to txt when streaming a plain-text sitemap.
domain defaults to null. Set it to a domain to require that every parsed URL belongs to that domain.
maxSize defaults to 50MB. Set it to a different size to cap how large the stream may be.
maxUrls defaults to 50000. Set it to a different value to cap how many URLs are parsed.
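One way to assemble an options object from the defaults listed above is sketched below. `buildOptions` and `DOCUMENTED_DEFAULTS` are hypothetical names, not part of the package. The sketch assumes xml and txt are the only accepted content types and that domain is given as a bare host name; neither is confirmed here. maxSize is passed through untouched because its expected unit is not stated.

```javascript
// Defaults as documented above. maxSize is omitted on purpose: it is
// documented only as "50MB", without the unit the option expects.
const DOCUMENTED_DEFAULTS = {
  contentType: 'xml',
  domain: null,
  maxUrls: 50000,
};

// Hypothetical helper: merge caller overrides onto the documented defaults
// and reject values the option descriptions appear to rule out.
function buildOptions(overrides = {}) {
  const options = { ...DOCUMENTED_DEFAULTS, ...overrides };
  // Assumption: only the two content types named above are valid.
  if (!['xml', 'txt'].includes(options.contentType)) {
    throw new TypeError(`Unsupported contentType: ${options.contentType}`);
  }
  if (!Number.isInteger(options.maxUrls) || options.maxUrls < 1) {
    throw new RangeError('maxUrls must be a positive integer');
  }
  return options;
}

const options = buildOptions({
  contentType: 'txt',
  domain: 'www.google.com', // assumption: bare host name format
  maxUrls: 1000,
});
console.log(options);
```

The resulting object would then be passed as fetch(options) or as the second argument of verify(sitemapStream, options).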