makestatic-sitemap

v1.2.4

Generates sitemap files

makestatic-plugin verify html text xml sitemap map tree

Sitemap

Generate sitemap files

Plugin that inspects the resource graph and generates a sitemap.

It can generate plain text and XML sitemaps for robots, a JSON document for further processing and an HTML document for humans.

See Google's sitemap documentation and sitemaps.org for more information.

For the text and xml formats we recommend assigning this plugin to the emit phase. For the html format it is best used during the transform phase so that the modified sitemap template is optimized.

Install

yarn add makestatic-sitemap

API

SiteMap

Inspect the resource graph and generate a sitemap.

Supported formats are xml, text, json and html.

Enable this plugin for the emit phase when writing the text, xml and json formats. For the html format the transform phase is recommended so the modified sitemap template is optimized.

The xml and text formats are designed to be exposed to robots to indicate how they should crawl the site. The json format allows programmatic processing of the sitemap, for example to fetch resources after invalidating a CDN cache.

The html format is designed to be exposed to humans on your website to allow them to find relevant information. It generates an unordered list of the sitemap tree hierarchy. When using the html format you should also specify the template that the list will be injected into using the template option.

The template file must have an HTML AST available and declare an element with an id attribute of sitemap; the generated list is injected as a child of that element. This allows you to work on the sitemap markup and styles as you would work with any other document and ensures the sitemap data is always consistent with the website structure.

SiteMap

new SiteMap(context, options)

Create a SiteMap plugin.

The base URL used to make links absolute is resolved in order: the base option is used first, then a url configuration option, and finally the homepage field of a package.json file in the current working directory.
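
The lookup order for the base URL can be sketched as follows. This is an illustration only, not the plugin's source: the resolveBase function and the shape of its options, config and pkg arguments are assumptions made for this example.

```javascript
// Hypothetical helper mirroring the documented fallback order:
// options.base, then config.url, then the package.json homepage.
function resolveBase (options, config, pkg) {
  const base = (options && options.base) ||
    (config && config.url) ||
    (pkg && pkg.homepage)
  if (!base) {
    // The plugin documents an Error when no base URL is available.
    throw new Error('no base URL is available')
  }
  return base
}

// The url configuration option wins over the package.json homepage.
const base = resolveBase(
  {},
  {url: 'https://example.com'},
  {homepage: 'https://example.org'}
)
```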

When no formats are given the xml format is used.

The name option should not include a file extension.

The rules option allows you to set fields for the xml output based on regular expression test patterns, for example:

{
  rules: [
    {
      test: /docs\//,
      changefreq: 'weekly',
      priority: 0.8
    }
  ]
}

If changefreq is invalid it is ignored; if priority is outside the zero to one range it is clamped.
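
A minimal sketch of how such rules might be evaluated, assuming each rule is tested against a document id and that later matching rules override earlier ones (both are assumptions, as is the applyRules name). The list of valid changefreq values comes from the sitemaps.org protocol.

```javascript
// Valid changefreq values defined by the sitemaps.org protocol.
const FREQUENCIES = [
  'always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'
]

// Hypothetical helper: collect the xml fields for a document id.
function applyRules (id, rules) {
  const fields = {}
  rules.forEach((rule) => {
    if (!rule.test.test(id)) {
      return
    }
    // An invalid changefreq is ignored.
    if (FREQUENCIES.indexOf(rule.changefreq) !== -1) {
      fields.changefreq = rule.changefreq
    }
    // A priority outside the zero to one range is clamped.
    if (typeof rule.priority === 'number') {
      fields.priority = Math.min(1, Math.max(0, rule.priority))
    }
  })
  return fields
}

const fields = applyRules('docs/api/index.html', [
  {test: /docs\//, changefreq: 'weekly', priority: 0.8}
])
```

Avoid the global flag on the test patterns in a sketch like this, because a global regular expression keeps state between calls to test().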

When the image option is set the xml output format will include image resources in documents. The image:loc node is always set to an absolute URL using the src attribute of the img element.

The meta data to add to the sitemap is extracted from attributes on the element so you can declare sitemap meta data in the HTML document. The attribute to XML node name map for img elements:

title: image:title
data-caption: image:caption
data-geo-location: image:geo_location
data-license: image:license

Image elements in sitemap URLs are limited to 1000 per page; if the number of images in a page exceeds this limit the sitemap only includes the first 1000.
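
The attribute mapping and the per-page limit can be illustrated with a short sketch. The imageEntries function and the plain attribute objects it accepts are assumptions for this example; the plugin itself reads attributes from img elements in the document AST.

```javascript
// Attribute to XML node name map, taken from the list above.
const IMAGE_ATTRIBUTES = {
  'title': 'image:title',
  'data-caption': 'image:caption',
  'data-geo-location': 'image:geo_location',
  'data-license': 'image:license'
}

// Documented limit of images per page.
const IMAGE_LIMIT = 1000

// Hypothetical helper: `images` is an array of attribute objects and
// `base` is the absolute site URL.
function imageEntries (images, base) {
  return images.slice(0, IMAGE_LIMIT).map((attrs) => {
    // image:loc is always an absolute URL built from the src attribute.
    const entry = {'image:loc': new URL(attrs.src, base).href}
    Object.keys(IMAGE_ATTRIBUTES).forEach((name) => {
      if (attrs[name] !== undefined) {
        entry[IMAGE_ATTRIBUTES[name]] = attrs[name]
      }
    })
    return entry
  })
}

const entries = imageEntries(
  [{src: '/img/a.png', title: 'A', 'data-caption': 'First image'}],
  'https://example.com'
)
```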

When the video option is set the xml output format will include video resources in documents. The video:content_loc node is always set to an absolute URL using the src attribute of the video element.

Video sitemap meta data is extracted from the element attributes. The attribute to XML node name map for video elements:

title: video:title
data-description: video:description
data-thumbnail-loc: video:thumbnail_loc
data-duration: video:duration
data-expiration-date: video:expiration_date
data-rating: video:rating
data-view-count: video:view_count
data-publication-date: video:publication_date
data-family-friendly: video:family_friendly
data-tag: video:tag
data-category: video:category
data-restriction: video:restriction
data-restriction-relationship: video:restriction (relationship attribute)
data-gallery-loc: video:gallery_loc
data-gallery-loc-title: video:gallery_loc (title attribute)
data-price: video:price
data-price-currency: video:price (currency attribute)
data-requires-subscription: video:requires_subscription
data-uploader: video:uploader
data-uploader-info: video:uploader (info attribute)
data-platform: video:platform
data-platform-relationship: video:platform (relationship attribute)
data-live: video:live

For the data-tag attribute you can separate multiple tags with a comma and they are expanded to multiple video:tag elements in the xml.
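
As a rough illustration of the tag expansion, the sketch below splits a data-tag value on commas. The videoTags name is hypothetical, and trimming whitespace and dropping empty tags are assumptions about behaviour the documentation does not spell out. XML escaping is omitted for brevity.

```javascript
// Hypothetical helper: expand a data-tag value into video:tag elements.
function videoTags (value) {
  return value.split(',')
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0)
    .map((tag) => `<video:tag>${tag}</video:tag>`)
}

const tags = videoTags('cats, dogs')
```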

If the video element contains child embed elements a video:player_loc xml element is created for each child embed element with a src attribute.

Videos are also limited to 1000 per page.

No validation is performed on the video attributes; you should read the corresponding documentation to verify that attribute values are correct.

You can pass options specific to a format using the renderer option and the format key, for example:

{
  renderer: {
    html: {
      builder: CustomHtmlBuilder
    }
  }
}

When the robots option is set you should have a robots.txt file being processed and have enabled the parse-robots plugin, otherwise the option has no effect. When configured correctly this option adds Sitemap entries to the robots.txt file for the text and xml formats.

If you are creating sitemaps in both text and xml formats two Sitemap entries will be created.
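
The resulting robots.txt lines could look like the output of the sketch below. The robotsEntries function, the txt file extension for the text format and the placement of the files at the site root are all assumptions for this example; only the Sitemap directive itself is standard robots.txt syntax.

```javascript
// Hypothetical helper: build robots.txt Sitemap lines for the formats
// that are exposed to robots.
function robotsEntries (base, name, formats) {
  // Assumed file extensions for the robot formats.
  const extensions = {xml: 'xml', text: 'txt'}
  return formats
    .filter((format) => extensions[format] !== undefined)
    .map((format) => {
      const file = `${name}.${extensions[format]}`
      return `Sitemap: ${new URL(file, base).href}`
    })
}

// Two robot formats produce two Sitemap entries; html is skipped.
const lines = robotsEntries('https://example.com', 'sitemap', ['xml', 'text', 'html'])
```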

context Object the processing context.
options Object the plugin options.

Options

base String URL for the site domain.
path String name of a sub-directory path.
index String=index.html name of index documents.
name String=sitemap name of the output file.
formats String|Array=xml list of output formats to emit.
rules Array set changefreq, priority fields for xml.
clean Boolean=true use clean URLs for index documents.
slash Boolean=true use trailing slashes.
robots Boolean=false add Sitemap entries to robots.txt.
image Boolean=false generate image xml entries.
video Boolean=false generate video xml entries.
include Boolean=false should the sitemap template be included.
indent Number=0 number of spaces to indent.
template String=sitemap.html template for the HTML format.
renderer Object format specific renderer options.
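
For orientation, an options object that combines several of the options listed above might look like this. Only option names from the list are used; the values are illustrative, and how the object is passed to the plugin depends on your makestatic configuration, which is not shown here.

```javascript
// Illustrative values only; every key is an option documented above.
const options = {
  base: 'https://example.com',
  name: 'sitemap', // no file extension
  formats: ['xml', 'text'],
  clean: true,
  slash: true,
  robots: true,
  image: true,
  rules: [
    {test: /docs\//, changefreq: 'weekly', priority: 0.8}
  ]
}
```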

Throws

Error if no resource graph is available.
Error if no base URL is available.
Error if an unknown format is detected.

.before

SiteMap.prototype.before(context, options)

Generate the sitemap.

When the exclude option is given each entry should be a regular expression pattern. If a pattern matches an HTML document id in the resource graph it is not included in the sitemap.

context Object the processing context.
options Object the plugin options.

Options

exclude Array list of regular expressions to exclude.
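
The effect of the exclude option can be sketched with a simple filter. The filterExcluded function and the example document ids are hypothetical; the plugin applies the patterns to HTML document ids in the resource graph.

```javascript
// Hypothetical helper: drop document ids matched by any exclude pattern.
function filterExcluded (ids, exclude) {
  return ids.filter((id) => {
    return !exclude.some((pattern) => pattern.test(id))
  })
}

const included = filterExcluded(
  ['index.html', 'drafts/post.html', '404.html'],
  [/^drafts\//, /^404\.html$/]
)
```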

HtmlBuilder

Default implementation for generating a DOM structure of the sitemap within the sitemap template file.

It is recommended that you use this default implementation and style the lists, but if you really want to use different elements for the sitemap you can supply an alternative builder class as a renderer option.
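
A custom builder would typically extend the default class and override one of its documented methods. The sketch below uses a stand-in HtmlBuilder class so that it runs on its own; the stand-in, the node shape with title and name fields, and the upper-casing behaviour are assumptions for illustration. In a real project you would extend the HtmlBuilder exported by this package instead.

```javascript
// Stand-in for the real HtmlBuilder so this sketch is self-contained.
// The node shape ({title, name}) is an assumption.
class HtmlBuilder {
  getLinkText (node) {
    return node.title || node.name
  }
}

// Custom builder overriding a documented method.
class CustomHtmlBuilder extends HtmlBuilder {
  getLinkText (node) {
    return super.getLinkText(node).toUpperCase()
  }
}

// Supply the class using the renderer option and the html format key.
const options = {
  renderer: {
    html: {
      builder: CustomHtmlBuilder
    }
  }
}
```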

HtmlBuilder

new HtmlBuilder(context, sitemap, ast, template, options)

Create an HtmlBuilder.

context Object the processing context.
sitemap Object raw sitemap data.
ast Object the sitemap AST.
template Object reference to the sitemap template.
options Object the renderer options.

.getHref

HtmlBuilder.prototype.getHref(node)

Get an href attribute value.

Returns a string href.

node Object the sitemap tree node.

.getTitle

HtmlBuilder.prototype.getTitle(node)

Get the title for a node.

This value is used for the link text and the link title attribute.

Returns a string title.

node Object the sitemap tree node.

.getLinkText

HtmlBuilder.prototype.getLinkText(node)

Get the text for a link node.

Prefers a title when available otherwise uses the page name.

Returns a string for the link text.

node Object the sitemap tree node.

.getDescription

HtmlBuilder.prototype.getDescription(node)

Get the description for a node.

Extract the content attribute from a meta element with name set to description.

Returns a string description.

node Object the sitemap tree node.

.getRootElement

HtmlBuilder.prototype.getRootElement(node)

Create the root element to append as a child of the element with an id of sitemap in the template file.

This implementation returns a ul element.

Returns an element.

node Object the sitemap AST node.

.getItemElement

HtmlBuilder.prototype.getItemElement(node)

Get an element for each tree node item.

This implementation returns a li element.

Returns an element.

node Object the sitemap AST node.

onEnter

onEnter(node)

Invoked when a node is entered.

node Object the sitemap tree node.

onExit

onExit(node)

Invoked when a node is exited.

node Object the sitemap tree node.

.build

HtmlBuilder.prototype.build(parent)

Main DOM builder function, generates unordered lists representing the sitemap.

parent Object the parent DOM node.

TextRenderer

Renders the sitemap as plain text.

#render

static render(context, sitemap)

Render the plain text format.

Returns an object with the text file content.

context Object the processing context.
sitemap Object raw sitemap data.
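
A plain text sitemap, as described by the sitemaps.org protocol, lists one absolute URL per line. The sketch below shows that shape only; the renderText function and its array argument are assumptions and do not reflect the raw sitemap data structure or the object returned by the real renderer.

```javascript
// Hypothetical sketch: `urls` is an array of absolute URL strings.
function renderText (urls) {
  return urls.join('\n') + '\n'
}

const text = renderText([
  'https://example.com/',
  'https://example.com/docs/'
])
```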

extension

static extension

Get the file extension for the text format.

HtmlRenderer

Renders the sitemap as HTML.

#render

static render(context, sitemap, options)

Render the HTML format.

Unlike other formats this renderer does not return a content string as it modifies the template AST and marks the AST as dirty before updating the file content.

When no builder option is given the default HtmlBuilder class is used.

When the strategy option is given it should be one of root, absolute or relative. The default strategy root builds links with a leading slash, the absolute strategy uses the base URL to make links include the domain name and the relative strategy resolves links relative to the sitemap template file.

If an unsupported strategy is given the default is used.
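
The three strategies can be compared with a small sketch. The getHref function here is hypothetical and is not the HtmlBuilder method of the same name; it assumes site-relative file paths such as docs/index.html for both the linked document and the template.

```javascript
const path = require('path')

// Hypothetical helper: `file` and `template` are site-relative paths,
// `base` is the absolute site URL.
function getHref (file, strategy, base, template) {
  switch (strategy) {
    case 'absolute':
      // Include the domain name using the base URL.
      return new URL(file, base).href
    case 'relative':
      // Resolve relative to the directory of the sitemap template.
      return path.posix.relative(path.posix.dirname(template), file)
    default:
      // root is the default and is also used for unsupported strategies.
      return '/' + file
  }
}

const base = 'https://example.com'
const root = getHref('docs/index.html', 'root', base, 'about/sitemap.html')
const absolute = getHref('docs/index.html', 'absolute', base, 'about/sitemap.html')
const relative = getHref('docs/index.html', 'relative', base, 'about/sitemap.html')
```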

Returns an object with the sitemap ast.

context Object the processing context.
sitemap Object raw sitemap data.
options Object renderer options.

Options

strategy String=root link href strategy.
selector String=#sitemap query for the parent element.
builder Function the HTML builder class.

Throws

Error if the sitemap template could not be found.
Error if the parent element could not be found.

JsonRenderer

Renders the sitemap as JSON and generates an AST of the sitemap structure.

extension

static extension

Get the file extension for the json format.

XmlRenderer

Renders the sitemap as XML.

#render

static render(context, sitemap)

Render the XML format.

Returns an object with the xml file content.

context Object the processing context.
sitemap Object raw sitemap data.

Options

image Boolean=false include document images.
video Boolean=false include document videos.
rules Array list of document rules.
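
For reference, the general shape of an XML sitemap as defined by the sitemaps.org protocol can be produced with a sketch like the one below. The renderXml and escapeXml functions and the entry objects are assumptions for this example and do not reflect the real renderer's input or return value; image and video nodes are left out.

```javascript
// Escape the characters that are significant in XML text.
function escapeXml (value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

// Hypothetical sketch: `entries` is an array of
// {loc, changefreq, priority} objects.
function renderXml (entries) {
  const urls = entries.map((entry) => {
    const fields = ['loc', 'changefreq', 'priority']
      .filter((name) => entry[name] !== undefined)
      .map((name) => `    <${name}>${escapeXml(entry[name])}</${name}>`)
    return ['  <url>'].concat(fields, '  </url>').join('\n')
  })
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ].concat(urls, '</urlset>').join('\n')
}

const xml = renderXml([
  {loc: 'https://example.com/?a=1&b=2', changefreq: 'weekly', priority: 0.8}
])
```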

extension

static extension

Get the file extension for the xml format.

License

MIT

Created by mkdoc on March 12, 2017
