@sifrr/seo
v0.0.9
Published
Server Side Redering for any js based app as a express middleware.
Downloads
4
Maintainers
Readme
sifrr-seo ·
Server Side Pre-Redering for any js based app using puppeteer (headless chrome) with caching. Mainly focused on serving rendered content to crawlers/bots.
Features
- Works with Custom Elements, Shadow DOM
- Add custom JS to execute before or after rendering
- key based Caching
How to use
Do npm i @sifrr/seo
or yarn add @sifrr/seo
or add the package to your package.json
file.
Api
Basic usage
SifrrSeo listens for load
page event and waits for any fetch
, xhr
request to complete before returning rendered HTML. It doesn't load any media content on server.
const SifrrSeo = require('@sifrr/seo');
// options
// `cacheStore`: same as store in [node-cache-manager](https://github.com/BryanDonovan/node-cache-manager) options, default: memory store with 100MB storage
// `maxCacheSize`: Maximum in-memory cache size (in MegaBytes)
// `ttl`: time to live for a cache request (in Seconds) 0 means infinity
// `cacheKey`: function that returns cache key for given req object
// `fullUrl`: function for middleware to determine fullUrl of express request
// `beforeRender`: this function will be executed in browser before rendering, doesn't take any arguments
// `afterRender`: this function will be executed in browser after rendering, doesn't take any arguments
// `filterOutgoingRequests`: This function is executed for every outgoing request in sifrr renderer, if this return false request will be blocked, else it will be allowed
//
// default values
const options = {
cacheStore: 'memory', // default in memory caching
maxCacheSize: 100,
ttl: 0,
cacheKey: (url, headers) => url,
beforeRender: () => {},
afterRender: () => {},
filterOutgoingRequests: (url) => true
}
const sifrrSeo = new SifrrSeo(/* Array of user agents to render for */, options);
// By default array is made up of these crawl bot user agents:
// 'Googlebot', // Google
// 'Bingbot', // Bing
// 'Slurp', // Slurp
// 'DuckDuckBot', // DuckDuckGo
// 'Baiduspider', //Baidu
// 'YandexBot', // Yandex
// 'Sogou', // Sogou
// 'Exabot', // Exalead
// Add your own user agent for which you want to server render
// You can give sub string of regex string like '(Google|Microsoft).*'
sifrrSeo.addUserAgent(/* string */ 'Opera Mini');
// add middleware to any connect/express like server
// for example in express:
const express = require('express');
const server = express();
// Only use for GET requests as a express middleware
server.get(sifrrSeo.getExpressMiddleware(/* function to get full url from express request */ expressReq => `http://127.0.0.1:80${expressReq.originalUrl}`));
server.listen(8080);
// Use it programatically - Only renders get urls
// these url, headers are passed to other functions
sifrrSeo.render(
url, /* Full url of page to render with protocol, domain, port, etc. */,
headers = {
/* Headers to send with GET request */
}
).then(html => ...).catch((e) => {
// It won't render the page if [rendering logic](#rendering-logic) is not satisfied and will throw error.
// e.message === 'No Render' when it doesn't render
});
node-cache-manager supports a lot of stores: list.
Rendering logic
sifrr-seo only renders a request if it has no Referer
header (i.e. direct browser requests) and if shouldRender
returns true
and if content-type is html
.
Changing shouldRender()
Change sifrrSeo.shouldRender
, by default it returns this._isUserAgent(headers)
(details). eg:
sifrrSeo.shouldRender = (url, headers) => {
// req is request argument given by server (express/connect)
// return true to render it server-side, return false to not render it.
return this.isUserAgent(req) && req.fullUrl.indexOf('html') >= 0;
};
Clearing cache
By default, server side rendered html is cached till you restart the server or if you close the browser. You can manually clear cache using
sifrrSeo.clearCache();
Higher level API
render()
returns Promise
which resolves in server rendered html
if url response has content-type html, else resolves in false
.
sifrrSeo.render(
url, /* Full url of page to render with protocol, domain, port, etc. */,
headers = {
/* Headers to send with GET request */
}
);
_isUserAgent(headers)
Returns true if headers['user-agent'] matches any of user-agents given in initialization
close()
closes puppeteer browser instance
sifrrSeo.close();
setPuppeteerOption()
adds puppeteer launch option. see list of options here.
Example: sifrrSeo.setPuppeteerOption('headless', false)
to run it without headless mode
sifrrSeo.addPuppeteerOption('headless', false);
puppeteerOptions
return options that will be used to launch puppeteer instance.
sifrrSeo.puppeteerOptions;
Note: Note that first server render will be slow (depending on server machine), but subsequent requests will be really fast because of caching (depending on efficiency of cache key).
Tips
Don't use external scripts in pages without a good cache age.
Pre-render a bunch of urls
const fs = require('fs');
const joinPath = require('path').join;
const seo = new SifrrSeo();
seo.shouldRender = () => true;
async function renderUrls(
urls = [
/* array of urls */
],
path = url => url
) {
for (let i = 0; i < urls.length; i++) {
const html = await seo.render(urls[i]);
await new Promise((res, rej) =>
fs.writeFile(path(urls[i]), html, err => {
if (err) rej(err);
res('The file has been saved!');
})
);
}
await seo.close();
}
renderUrls(['http://localhost:8080/abcd', 'http://localhost:8080/whatever'], u =>
joinPath(__dirname, '.' + u.slice(21))
);