@yeskiy/sitemapper
v5.0.0
Parser for XML Sitemaps to be used with Robots.txt and web crawlers
Sitemap-parser
NOTE: This is a fork of the original sitemapper package with a full migration to ESM and TypeScript. The original package can be found here.
Parse through a sitemap's XML to get all the URLs for your crawler.
Installation
npm install @yeskiy/sitemapper --save
Simple Example
import Sitemapper from '@yeskiy/sitemapper';
const sitemap = new Sitemapper();
sitemap.fetch('https://www.google.com/work/sitemap.xml').then((sites) => {
console.log(sites);
});
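The same call also works with async/await. A minimal sketch, assuming the resolved value exposes a sites array of URL strings, as in the original sitemapper API:

import Sitemapper from '@yeskiy/sitemapper';

const sitemap = new Sitemapper();

// Assumption: the resolved object contains a `sites` array of URL strings.
const { sites } = await sitemap.fetch('https://www.google.com/work/sitemap.xml');
console.log(`Found ${sites.length} URLs`);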
Options
You can add options to the initial Sitemapper object when instantiating it, as shown in the example after the list below.
requestHeaders: (Object) - Additional request headers (e.g. User-Agent)
timeout: (Number) - Maximum timeout in ms for a single URL. Default: 15000 (15 seconds)
url: (String) - Sitemap URL to crawl
debug: (Boolean) - Enables/disables debug console logging. Default: false
concurrency: (Number) - Sets the maximum number of concurrent sitemap crawling threads. Default: 10
retries: (Number) - Sets the maximum number of retries to attempt in case of an error response (e.g. 404 or timeout). Default: 0
rejectUnauthorized: (Boolean) - If true, it will throw on invalid certificates, such as expired or self-signed ones. Default: true
lastmod: (Number) - Timestamp of the minimum lastmod value allowed for returned URLs
gotParams: (GotOptions) - Additional options to pass to the got library. See Got Options
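A minimal instantiation sketch combining several of these options. The User-Agent string and lastmod timestamp are illustrative placeholders, and the sketch assumes fetch() falls back to the url option when called without an argument and resolves with a sites array, as in the original sitemapper API:

import Sitemapper from '@yeskiy/sitemapper';

const sitemap = new Sitemapper({
    url: 'https://www.google.com/work/sitemap.xml', // crawled when fetch() is called without an argument
    timeout: 15000,            // ms per request
    concurrency: 10,           // parallel sitemap fetches
    retries: 1,                // retry once on errors such as 404 or timeout
    debug: false,
    rejectUnauthorized: true,  // throw on expired or self-signed certificates
    lastmod: Date.parse('2024-01-01'), // example cutoff: only URLs modified on or after this timestamp
    requestHeaders: {
        'User-Agent': 'Mozilla/5.0 (compatible; ExampleCrawler/1.0)', // placeholder header value
    },
    gotParams: {
        followRedirect: true,  // passed through to the got library
    },
});

const { sites } = await sitemap.fetch();
console.log(sites);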