sitemap2doc
v0.0.5
Sitemap 2 Doc
This module downloads all web pages listed in the Sitemap.xml file and compiles them into a single document.
Requires Node.js v20 or later.
Designed for AI embedding generation.
Quickstart
Terminal
npm init -y && npm i sitemap2doc
Node index.mjs
import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()
await s2d.getDocument( {
    'projectName': 'test',
    'sitemapUrl': 'https://...'
} )
Terminal
node index.mjs
Methods
getDocument()
| Key         | Type    | Description             | Required | Default |
| ----------- | ------- | ----------------------- | -------- | ------- |
| projectName | String  | Set project name        | true     |         |
| sitemapUrl  | String  | Set sitemap source      | true     |         |
| silent      | Boolean | Control terminal output | false    | false   |
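The table above can be read as a small pre-flight check on the options object. The sketch below is illustrative only: `prepareGetDocumentOptions` is a hypothetical helper that is not part of sitemap2doc, and the example URL is a placeholder. It assumes the Required and Default columns apply as written.

```javascript
// Hypothetical helper (not part of sitemap2doc): checks an options object
// against the table above and fills in the documented default for `silent`.
function prepareGetDocumentOptions( options ) {
    const { projectName, sitemapUrl, silent = false } = options

    if( typeof projectName !== 'string' || projectName === '' ) {
        throw new TypeError( 'projectName is required and must be a String' )
    }
    if( typeof sitemapUrl !== 'string' || sitemapUrl === '' ) {
        throw new TypeError( 'sitemapUrl is required and must be a String' )
    }
    if( typeof silent !== 'boolean' ) {
        throw new TypeError( 'silent must be a Boolean' )
    }

    return { projectName, sitemapUrl, silent }
}

// Placeholder values for illustration.
const checkedOptions = prepareGetDocumentOptions( {
    'projectName': 'test',
    'sitemapUrl': 'https://example.com/sitemap.xml',
    'silent': true
} )
console.log( checkedOptions )
```

The returned object has the same keys that getDocument() accepts, so it could be passed on unchanged.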
Example
import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()
await s2d.getDocument( {
    'projectName': 'test',
    'sitemapUrl': 'https://...'
} )
Terminal
Get Sitemap https://...
Get Pages 0 1 2 3 4 5 6 7 8 9
Merge 0
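The terminal output suggests three stages: reading the sitemap, fetching the pages, and merging them. As a rough illustration of the first and last stage, here is a minimal sketch. It is not the module's actual implementation: `extractSitemapUrls` and `mergePages` are hypothetical helpers, the regex approach is a simplification that does not decode XML entities, and the section format of the merged document is made up.

```javascript
// Hypothetical helper: pull every <loc> URL out of a sitemap XML string.
// Simplified on purpose; a real parser would also decode entities such as &amp;.
function extractSitemapUrls( xml ) {
    const matches = xml.matchAll( /<loc>\s*([^<\s]+)\s*<\/loc>/g )
    return Array.from( matches, ( m ) => m[ 1 ] )
}

// Hypothetical helper: join page texts into one document,
// with one titled section per page.
function mergePages( pages ) {
    return pages
        .map( ( { url, text } ) => `## ${url}\n\n${text.trim()}` )
        .join( '\n\n' )
}

// Placeholder sitemap for illustration.
const sampleXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/docs </loc></url>
</urlset>`

const sitemapUrls = extractSitemapUrls( sampleXml )
console.log( sitemapUrls )
```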
getConfig()
Returns the current config. The default config is defined in ./src/data/config.mjs.
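import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()

// Read the current config and change a single value.
const config = s2d.getConfig()
config['download']['chunkSize'] = 4

// getDocument() is asynchronous, so await the end of the chain.
await s2d
    .setConfig( { config } )
    .getDocument( { ... } )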
setConfig()
All module settings are stored in a config file (see ./src/data/config.mjs). The defaults can be completely overridden by passing a config object during initialization.
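import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()

// Start from the current config, then apply the modified copy.
const config = s2d.getConfig()
config['download']['chunkSize'] = 4

// getDocument() is asynchronous, so await the end of the chain.
await s2d
    .setConfig( { config } )
    .getDocument( { ... } )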
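The README does not define what `chunkSize` controls. One plausible reading, given the key name `download`, is that pages are fetched in batches of that size. The sketch below shows that general technique under this assumption; `toChunks`, `processInChunks`, and `fakeDownload` are hypothetical names, and none of this is taken from the module's source.

```javascript
// Split a list into consecutive chunks of at most `chunkSize` items.
function toChunks( items, chunkSize ) {
    const chunks = []
    for( let i = 0; i < items.length; i += chunkSize ) {
        chunks.push( items.slice( i, i + chunkSize ) )
    }
    return chunks
}

// Process one chunk at a time; items inside a chunk run concurrently.
// Results keep the order of the input list.
async function processInChunks( items, chunkSize, worker ) {
    const results = []
    for( const chunk of toChunks( items, chunkSize ) ) {
        const chunkResults = await Promise.all( chunk.map( ( item ) => worker( item ) ) )
        results.push( ...chunkResults )
    }
    return results
}

// Stand-in worker so the sketch runs offline; a real one might call fetch( url ).
const fakeDownload = async ( url ) => `content of ${url}`

// Placeholder URLs for illustration.
const pageUrls = Array.from( { length: 10 }, ( _, i ) => `https://example.com/page-${i}` )

processInChunks( pageUrls, 4, fakeDownload )
    .then( ( pages ) => console.log( `${pages.length} pages downloaded` ) )
```

With ten URLs and a chunk size of 4, the loop runs three batches of 4, 4, and 2 requests. A smaller chunk size lowers the number of simultaneous requests at the cost of more sequential rounds.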
License
The module is available as open source under the terms of the Apache 2.0 License.