kasha
v0.14.6
Kasha
Pre-render your Single-Page Application.
Features
- Prerender the Single-Page Application.
- Automatically collect sitemaps from <meta> tags.
- Generate robots.txt with sitemap directives.
- Sync prerendering.
- Async prerendering with callback URL.
- URL rewriting.
- Works as a proxy server.
- Rich APIs.
- Caching.
Requirements
SPA compatibility adjustments
For the pre-rendered SPA to work correctly in the client-side browser, you need to make some adjustments:
- When pre-rendering, intercept the anonymous AJAX requests and store the responses in a <script> tag, so the AJAX requests are not sent again on the client side. Our AJAX libraries teleman and teleman-ssr-cache may help you.
- On the client side, mount the SPA and replace the pre-rendered content.
- Set <meta> tags so that search engines can learn more about the page. You can use set-meta.
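The first adjustment can be sketched as follows. This is a minimal illustration and not the teleman-ssr-cache API: the function names, the ssr-cache element id, and the cache shape (a map from request URL to response body) are all assumptions made for the example.

```javascript
// Hypothetical helpers; names and cache shape are assumptions for illustration.

// During pre-rendering: serialize the collected AJAX responses into a <script> tag.
// "<" is escaped so that a response containing "</script>" cannot end the tag early.
function serializeCache(cache) {
  const json = JSON.stringify(cache).replace(/</g, '\\u003c')
  return '<script type="application/json" id="ssr-cache">' + json + '</script>'
}

// On the client: parse the text content of that tag back into an object.
function parseCache(text) {
  return JSON.parse(text)
}

// On the client: return a cached response once, so later requests go to the network.
function takeCached(cache, url) {
  if (!Object.prototype.hasOwnProperty.call(cache, url)) return undefined
  const body = cache[url]
  delete cache[url]
  return body
}
```

In a browser, the text passed to parseCache would typically come from document.getElementById('ssr-cache').textContent, and the AJAX layer would call takeCached before sending a request.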
Installation
npm i -g kasha
Docker:
docker pull kasha/kasha
Configuration
See config.sample.js
Running
Start the server:
kasha server --config=/path/to/config.js
Docker:
docker run -v /path/to/config.js:/dest/to/config.js kasha/kasha server --config=/dest/to/config.js
Start the worker:
kasha worker --config=/path/to/config.js
# async worker
# requests with 'callbackURL' parameter will be dispatched to async workers.
kasha worker --async --config=/path/to/config.js
Docker:
docker run -v /path/to/config.js:/dest/to/config.js kasha/kasha worker [--async] --config=/dest/to/config.js
Site Config
db.sites.insert({
// The hostname of your site.
host: 'www.example.com',
// In proxy mode, if the request doesn't contain 'X-Forwarded-Proto' or 'Forwarded:...proto=...' header,
// then use 'defaultProtocol'.
defaultProtocol: 'https',
// If your site uses REST-style URLs, like /article/123, and the query string isn't needed to render the page,
// you can remove the query string to improve the cache hit rate:
// keepQuery: false,
// You can also keep the required query parameters of specific URLs:
keepQuery: [
[
'/search', // the first element is the pathname of URL.
'type', // starting from the second element, specifies the query names you need to keep.
'keyword'
],
// another URL and its query names
['/product', 'id']
],
// You can use the '/render' API to crawl the hash-based Single-page application.
// For example, you can crawl https://www.example.com/app/#/home via
// /render?url=https%3A%2F%2Fwww.example.com%2Fapp%2F%23%2Fhome
// But if this site is not hash-based, you can remove the hash:
keepHash: false,
// Rewrites the request URL.
rewrites: [
// [from, to]
// If 'to' is an empty string, the request will be aborted.
// For the pattern syntax, see https://github.com/jiangfengming/url-router#pattern
// route all requests to the entry point HTML file
['https://www.example.com/(.*)', 'https://static.example.com/index.html'],
// except robots.txt
['https://www.example.com/robots.txt', 'https://static.example.com/robots.txt'],
// or block it if you do not have one
// ['https://www.example.com/robots.txt', ''],
// block google analytics requests
['https://www.googletagmanager.com/(.*)', '']
],
// Excludes the pages that don't need pre-rendering.
excludes: [
'/your-account/(.*)',
'/your-orders/(.*)'
],
// But include these pages even though they match an excludes pattern
includes: [
'/your-account/signin'
],
// Specifies the User-Agent
userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Safari/537.36',
// You can create profiles for different device types.
// A profile can override keepQuery, keepHash, rewrites, excludes, includes, userAgent.
profiles: {
desktop: {
userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Safari/537.36',
rewrites: [
[
'https://www.example.com/(.*)',
'https://static.example.com/desktop/index.html'
]
]
},
mobile: {
userAgent: 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Mobile Safari/537.36',
rewrites: [
[
'https://www.example.com/(.*)',
'https://static.example.com/mobile/index.html'
]
]
}
},
// If profile param of the request isn't set, use this profile
defaultProfile: 'desktop'
})
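To make the effect of keepQuery and keepHash concrete, here is a minimal sketch of the URL normalization they describe. normalizeURL is a hypothetical helper, not kasha's implementation. It assumes that pathnames are compared exactly and that a path not listed in the keepQuery array loses its whole query string; kasha itself may behave differently on both points.

```javascript
// Hypothetical helper illustrating keepQuery / keepHash; not part of kasha.
function normalizeURL(rawURL, { keepQuery = true, keepHash = true } = {}) {
  const url = new URL(rawURL)

  if (keepQuery === false) {
    // Drop the whole query string.
    url.search = ''
  } else if (Array.isArray(keepQuery)) {
    // Assumption: exact pathname comparison; unlisted paths keep no query params.
    const rule = keepQuery.find(([pathname]) => pathname === url.pathname)
    const allowed = rule ? rule.slice(1) : []
    const kept = new URLSearchParams()
    for (const [name, value] of url.searchParams) {
      if (allowed.includes(name)) kept.append(name, value)
    }
    url.search = kept.toString()
  }

  if (!keepHash) url.hash = ''
  return url.href
}

const siteConfig = {
  keepQuery: [['/search', 'type', 'keyword'], ['/product', 'id']],
  keepHash: false
}

console.log(normalizeURL('https://www.example.com/search?type=a&keyword=b&utm_source=x#top', siteConfig))
// prints "https://www.example.com/search?type=a&keyword=b"
```

Two URLs that differ only in dropped parameters normalize to the same string, which is why trimming the query improves the cache hit rate.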
APIs
Please confirm that apiHost has been set correctly.
For example, if apiHost is set to '127.0.0.1:3000', then only requests to http(s)://127.0.0.1:3000/* can access the APIs.
All other domains are served in proxy mode.
GET /render
Renders the page.
Query string params:
- url: The encoded URL of the webpage to render.
- profile: The profile to use.
- type: Sets the response type. Defaults to json.
  - html: Returns HTML with the header Content-Type: text/html.
  - json: Returns JSON with the header Content-Type: application/json.
  - static: Returns HTML with the header Content-Type: text/html, but with the <script> tags and on* event handlers stripped.
- callbackURL: Don't wait for the result. Once the job is done, the result is sent as JSON in a POST request to the given URL. If callbackURL is set, type is ignored.
- metaOnly: If type is json, returns only the metadata without the HTML content.
- followRedirect: Follows the redirect if the page returns 301/302.
- refresh: Forces a cache refresh.
- noWait: Don't wait for the response. This is useful for pre-caching the page.
- fallback: If no cache is found or the cache has expired, the request is proxied to the origin directly. If fallback is set, type must be html, and callbackURL, metaOnly, followRedirect, refresh and noWait cannot be set.

For boolean parameters, if the param is absent or set to 0, it means false.
If it is set to 1 or has an empty value (e.g., &refresh, &refresh=, &refresh=1), it means true.
Example: http://localhost:3000/render?url=https%3A%2F%2Fdavidwalsh.name%2Ffacebook-meta-tags
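A request URL like the one above can be assembled programmatically. The sketch below is only an illustration: buildRenderURL is a hypothetical helper, and http://localhost:3000 is assumed to be the configured apiHost. Boolean options set to true are emitted as a bare name, which the rules above treat as true.

```javascript
// Hypothetical helper for building /render URLs; not part of kasha.
function buildRenderURL(apiOrigin, pageURL, options = {}) {
  // The page URL must be percent-encoded because it is itself a query value.
  const parts = ['url=' + encodeURIComponent(pageURL)]

  for (const [name, value] of Object.entries(options)) {
    if (value === true) {
      parts.push(name) // bare name means true, e.g. &refresh
    } else if (value !== false && value != null) {
      parts.push(name + '=' + encodeURIComponent(value))
    }
    // false, null and undefined are omitted: an absent param means false.
  }

  return apiOrigin + '/render?' + parts.join('&')
}

console.log(buildRenderURL('http://localhost:3000', 'https://davidwalsh.name/facebook-meta-tags'))
// prints "http://localhost:3000/render?url=https%3A%2F%2Fdavidwalsh.name%2Ffacebook-meta-tags"

console.log(buildRenderURL('http://localhost:3000', 'https://www.example.com/', { type: 'html', refresh: true }))
// prints "http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html&refresh"
```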
The returned JSON format example:
{
"url": "https://davidwalsh.name/facebook-meta-tags",
"profile": "",
"status": 200,
"redirect": null,
"meta": {
"title": "Facebook Open Graph META Tags",
"description": "Facebook's Open Graph protocol allows for web developers to turn their websites into Facebook \"graph\" objects, allowing a certain level of customization over how information is carried over from a non-Facebook website to Facebook when a page is \"recommended\" and \"liked\".",
"image": "https://davidwalsh.name/demo/facebook-developers-logo.png",
"canonicalUrl": "https://davidwalsh.name/facebook-meta-tags",
"author": "David Walsh",
"keywords": null
},
"openGraph": {
"og": {
"locale": {
"current": "en_US"
},
"type": "article",
"title": "Facebook Open Graph META Tags",
"description": "Facebook's Open Graph protocol allows for web developers to turn their websites into Facebook \"graph\" objects, allowing a certain level of customization over how information is carried over from a non-Facebook website to Facebook when a page is \"recommended\" and \"liked\".",
"url": "https://davidwalsh.name/facebook-meta-tags",
"site_name": "David Walsh Blog",
"updated_time": "2016-02-23T00:44:54+00:00",
"image": [
{
"url": "https://davidwalsh.name/demo/facebook-developers-logo.png",
"secure_url": "https://davidwalsh.name/demo/facebook-developers-logo.png"
},
{
"url": "https://davidwalsh.name/demo/david-facebook-share.png",
"secure_url": "https://davidwalsh.name/demo/david-facebook-share.png"
}
]
},
"article": {
"publisher": "https://www.facebook.com/davidwalshblog",
"section": "APIs",
"published_time": "2011-04-25T09:24:28+00:00",
"modified_time": "2016-02-23T00:44:54+00:00"
}
},
"content": "<!DOCTYPE html><html>...</html>",
"date": "2018-03-13T09:53:00.921Z"
}
GET /:url
Alias of /render?url=ENCODED_URL&type=html.
For example, http://localhost:3000/https://www.example.com/ is equivalent to http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html
The profile param can be set with the Kasha-Profile header, and fallback can be set with the Kasha-Fallback header.
Notice: the hash of the URL won't be sent to the server. If you need the hash to be sent to the server, use the /render API.
Proxy mode
If the Host header of the request is not apiHost, or the X-Forwarded-Host or Forwarded:...host=... header is set, then the requested URL is treated as the url query param of the /render API, and type is set to html.
For example, the following request
GET /
Host: www.example.com
Kasha-Profile: mobile
Kasha-Fallback: 1
is equivalent to http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html&profile=mobile&fallback=1
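The mapping from a proxied request to its /render equivalent can be sketched as below. toRenderQuery is a hypothetical helper that only illustrates the rule, not kasha's implementation. It assumes lower-cased header names (as in Node.js req.headers) and, for brevity, does not parse the Forwarded header.

```javascript
// Hypothetical helper illustrating proxy mode; not part of kasha.
// headers: object with lower-cased header names.
// defaultProtocol: the site's defaultProtocol setting.
function toRenderQuery(path, headers, defaultProtocol = 'https') {
  // Assumption: X-Forwarded-* values win over Host and defaultProtocol.
  const host = headers['x-forwarded-host'] || headers.host
  const protocol = headers['x-forwarded-proto'] || defaultProtocol

  const params = [
    'url=' + encodeURIComponent(protocol + '://' + host + path),
    'type=html' // proxy mode always responds with HTML
  ]

  if (headers['kasha-profile']) {
    params.push('profile=' + encodeURIComponent(headers['kasha-profile']))
  }
  if (headers['kasha-fallback'] !== undefined) {
    params.push('fallback=' + encodeURIComponent(headers['kasha-fallback']))
  }

  return '/render?' + params.join('&')
}

console.log(toRenderQuery('/', {
  host: 'www.example.com',
  'kasha-profile': 'mobile',
  'kasha-fallback': '1'
}))
// prints "/render?url=https%3A%2F%2Fwww.example.com%2F&type=html&profile=mobile&fallback=1"
```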
GET /cache?url=URL
Alias of /render?url=ENCODED_URL&noWait
GET /:site/robots.txt
Gets the robots.txt file with the sitemaps collected by kasha, e.g.:
http://localhost:3000/https://www.example.com/robots.txt
It fetches the https://www.example.com/robots.txt file, then appends sitemap directives at the end. Example result:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
Sitemap: https://www.example.com/sitemaps.index.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.news.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.image.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.video.1.xml
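The append step that produces a result like the one above can be sketched as a small pure function. appendSitemaps is a hypothetical name, and the sketch makes no claim about how kasha fetches robots.txt or chooses the sitemap URLs.

```javascript
// Hypothetical helper: append Sitemap directives to the text of a robots.txt file.
function appendSitemaps(robotsTxt, sitemapURLs) {
  // Trim trailing whitespace so the directives start on their own line.
  const base = robotsTxt.replace(/\s+$/, '')
  const directives = sitemapURLs.map(url => 'Sitemap: ' + url)
  // filter(Boolean) drops the empty base when the origin has no robots.txt content.
  return [base, ...directives].filter(Boolean).join('\n') + '\n'
}

const original = 'User-agent: *\nDisallow: /tmp/\n'
console.log(appendSitemaps(original, [
  'https://www.example.com/sitemaps.index.1.xml',
  'https://www.example.com/sitemaps.index.google.1.xml'
]))
```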
GET /:site/sitemaps.:page.xml
Get sitemap of page N.
For example:
http://localhost:3000/https://www.example.com/sitemaps.1.xml
GET /:site/sitemaps.google.:page.xml
Get Google sitemap of page N.
GET /:site/sitemaps.google.news.:page.xml
Get Google news sitemap of page N.
GET /:site/sitemaps.google.image.:page.xml
Get Google image sitemap of page N.
GET /:site/sitemaps.google.video.:page.xml
Get Google video sitemap of page N.
GET /:site/sitemaps.index.:page.xml
Get sitemap index file of page N.
GET /:site/sitemaps.index.google.:page.xml
Get Google sitemap index file of page N.
GET /:site/sitemaps.index.google.news.:page.xml
Get Google news sitemap index file of page N.
GET /:site/sitemaps.index.google.image.:page.xml
Get Google image sitemap index file of page N.
GET /:site/sitemaps.index.google.video.:page.xml
Get Google video sitemap index file of page N.
Collecting sitemap data
kasha can collect sitemap data from custom Open Graph <meta> tags. For example:
<head prefix="og: http://ogp.me/ns# sitemap: https://kasha-io.github.io/kasha/ns/sitemap#">
<!--
canonical url is used as <loc> tag of sitemap xml.
<meta property="og:url" content="..."> can be used also.
-->
<link rel="canonical" href="https://www.example.com/test.html">
<meta property="sitemap:changefreq" content="hourly">
<meta property="sitemap:priority" content="1">
<meta property="sitemap:news:publication:name" content="The Example Times">
<meta property="sitemap:news:publication:language" content="en">
<meta property="sitemap:news:publication_date" content="2018-05-25T09:19:54.000Z">
<meta property="sitemap:news:title" content="Page Title">
<meta property="sitemap:image:loc" content="http://examples.opengraphprotocol.us/media/images/train.jpg">
<meta property="sitemap:image:caption" content="The caption of the image.">
<meta property="sitemap:image:geo_location" content="Limerick, Ireland">
</head>
Sitemap data will be collected only if the origin of the canonical URL is the same as that of the current page.
See the sitemap protocol and the Google sitemap extensions for the available tags.
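The same-origin condition can be illustrated with the URL class. sitemapEntry is a hypothetical helper, and the meta object (a map from property names to content values) is an assumed shape for the example rather than kasha's internal data structure.

```javascript
// Hypothetical helper: build a sitemap entry only when the canonical URL
// has the same origin (scheme, host and port) as the rendered page.
function sitemapEntry(pageURL, canonicalURL, meta) {
  if (new URL(canonicalURL).origin !== new URL(pageURL).origin) {
    return null // different origin: no sitemap data is collected
  }
  return {
    loc: canonicalURL, // used as the <loc> tag
    changefreq: meta['sitemap:changefreq'],
    priority: meta['sitemap:priority']
  }
}

const meta = { 'sitemap:changefreq': 'hourly', 'sitemap:priority': '1' }

console.log(sitemapEntry('https://www.example.com/test.html?ref=a', 'https://www.example.com/test.html', meta))
// prints { loc: 'https://www.example.com/test.html', changefreq: 'hourly', priority: '1' }

console.log(sitemapEntry('https://www.example.com/test.html', 'https://cdn.example.com/test.html', meta))
// prints null
```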
License
The logo is made from Prosymbols's camera icon licensed by Creative Commons BY 3.0.