dalia
v0.0.3-rc3
SEO tool for SPAs and other websites
Dalia (SPA) SEO Tool
A tool for massive HTML analysis, useful for SEO and for indexing SPAs (Single Page Applications).
Overview
Dalia is a flexible library that uses PhantomJS to index web pages served from your site. A page is only saved when a specified selector is detected as visible in the output HTML. This tool is useful when your site consists largely of AJAX content, or is an SPA, and you want your dynamic content indexed by search engines.
Dalia is essentially a wrapper around PhantomJS that lets you extract exactly the information you need from each PhantomJS call.
Getting Started
Installation
The simplest way to install nodejs-dalia is with npm: running npm install dalia will download nodejs-dalia and all of its dependencies.
Classes Documentation
org.itmc.dalia.Logger
The Logger class is derived from the debug-logger package. To use it, just get its instance:
const logger = require('dalia').Logger.getInstance();
Please be aware that, in order to see any logs, you need to run
export DEBUG="dalia:*"
before running your own script. For finer control, set only the level you wish to see:
export DEBUG="dalia:level"
where level is the logger directive you wish to monitor (for example info or warn). To monitor all levels, use *.
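To make the DEBUG patterns above concrete, here is a simplified sketch of how a value such as dalia:* selects namespaces. It follows the matching rules of the debug package that debug-logger builds on (patterns separated by commas or spaces, * as a wildcard, a leading - to exclude a namespace), but it is only an illustration, not that package's code; the function name isNamespaceEnabled is hypothetical.

```javascript
// Simplified, hypothetical re-implementation of DEBUG namespace matching,
// modeled on the rules of the `debug` package (not its actual code).
function isNamespaceEnabled(debugEnv, namespace) {
  // "dalia:*,-dalia:trace" -> ["dalia:*", "-dalia:trace"]
  const patterns = (debugEnv || '').split(/[\s,]+/).filter(Boolean);
  // Escape regex metacharacters, then turn each `*` into a lazy wildcard.
  const toRegExp = (p) =>
    new RegExp('^' + p.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*?') + '$');
  const skips = patterns.filter((p) => p.startsWith('-')).map((p) => toRegExp(p.slice(1)));
  const names = patterns.filter((p) => !p.startsWith('-')).map(toRegExp);
  // An exclusion always wins over an inclusion.
  if (skips.some((re) => re.test(namespace))) return false;
  return names.some((re) => re.test(namespace));
}
```

For example, with DEBUG="dalia:*,-dalia:trace" the sketch enables dalia:info and dalia:warn but not dalia:trace.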
org.itmc.dalia.Phantom
Phantom is a simple wrapper that runs PhantomJS and captures its result, so you can work with it further:
const phantom = require('dalia').Phantom.getInstance();
phantom.run('http://html5rocks.com')
.then((data) => { console.log(data) });
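PhantomJS runs as a separate process, so a wrapper like this has to start it and collect what it prints. The sketch below shows one way to do that in Node with child_process.execFile. It assumes the PhantomJS script writes a single JSON document to stdout; the helper name runAndCapture and the script name detect.js are hypothetical and not part of Dalia's API.

```javascript
const { execFile } = require('child_process');

// Hypothetical sketch: run a command (e.g. phantomjs with a page script) and
// resolve with its stdout parsed as JSON. Assumes exactly one JSON document.
function runAndCapture(command, args, { timeout = 20000 } = {}) {
  return new Promise((resolve, reject) => {
    execFile(command, args, { timeout }, (error, stdout) => {
      if (error) return reject(error); // spawn failure, non-zero exit or timeout kill
      try {
        resolve(JSON.parse(stdout));
      } catch (parseError) {
        reject(parseError); // the script printed something that is not JSON
      }
    });
  });
}

// Example (needs phantomjs on PATH; detect.js is a hypothetical script):
// runAndCapture('phantomjs', ['detect.js', 'http://html5rocks.com']).then(console.log);
```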
Run options
- selector : 'body' => The selector PhantomJS waits for once the page has loaded.
- timeout : 20000 => Time, in milliseconds, after which the PhantomJS request is considered expired.
- checkInterval : 200 => Interval, in milliseconds, between checks for the selector.
- detector : Function() => The default detector looks for the selector in the page; when it is found, it returns the page's document object (return document) if a callback.onDetect handler is set, and true otherwise; when it is not found, it returns false. This parameter can also be a string path to a file that exports a different detector function.
// default detector
//...
detector: function(options) {
  if (document.querySelectorAll(options.selector).length > 0) {
    // Selector found: hand back the whole document only if onDetect is listening.
    if (options.callback && options.callback.onDetect) {
      return document;
    }
    return true;
  }
  // Selector not found yet.
  return false;
}
//...
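The selector, timeout and checkInterval options suggest a polling loop around a detector such as the one above. The following sketch shows that idea in plain Node: call the detector every checkInterval milliseconds until it returns something truthy, and give up once timeout milliseconds have passed. Both values are assumed to be milliseconds, and waitForDetector is a hypothetical name, not part of Dalia.

```javascript
// Hypothetical helper: poll `detector` until it returns a truthy value.
// `timeout` and `checkInterval` are assumed to be in milliseconds.
function waitForDetector(detector, { timeout = 20000, checkInterval = 200 } = {}) {
  return new Promise((resolve, reject) => {
    const startedAt = Date.now();
    const check = () => {
      let result;
      try {
        result = detector();
      } catch (err) {
        reject(err); // a throwing detector ends the wait immediately
        return;
      }
      if (result) {
        resolve(result); // selector found: pass the detector's value on
      } else if (Date.now() - startedAt >= timeout) {
        reject(new Error(`Detector did not succeed within ${timeout} ms`));
      } else {
        setTimeout(check, checkInterval); // not yet: check again later
      }
    };
    check();
  });
}
```

Inside a real PhantomJS run the detector would be evaluated in the page context (for example through the webpage module's page.evaluate) rather than called directly as here.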
// detector returning all the urls within the page as well
// ...
// Runs inside the page through PhantomJS, so it sticks to ES5 syntax.
module.exports = function(options) {
  if (document.querySelectorAll(options.selector).length > 0) {
    // Collect the resolved href of every anchor on the page.
    var anchors = document.querySelectorAll('a');
    return Array.prototype.map.call(anchors, function(a) {
      return a.href;
    });
  }
  // Selector not found yet.
  return false;
};
// ...
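The hrefs returned by the link-collecting detector above usually need some cleanup before they are worth indexing. The sketch below is one hypothetical post-processing step (sameOriginLinks is not a Dalia function): it resolves each href against the page URL, drops non-HTTP schemes and fragments, keeps only same-origin links and removes duplicates, using Node's built-in URL class.

```javascript
// Hypothetical cleanup for detector output: absolute, same-origin,
// fragment-free, de-duplicated URLs in first-seen order.
function sameOriginLinks(hrefs, pageUrl) {
  const base = new URL(pageUrl);
  const seen = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, base); // resolves relative hrefs against the page
    } catch (err) {
      continue; // skip malformed hrefs
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue; // mailto:, javascript:, ...
    if (url.origin !== base.origin) continue; // external site
    url.hash = ''; // "#section" links point at the same document
    seen.add(url.href);
  }
  return [...seen];
}
```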
org.itmc.dalia.Dalia
For version 0.1.0, Dalia serves only as a URL indexer. This class was born out of the need to index our applications' URLs in order to create either page snapshots or a sitemap XML file.
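Dalia.getInstance().indexUrls(url, options) is driven by a maxDepth option. A common way to implement depth-limited URL indexing is a breadth-first crawl, sketched below. Here getLinks stands in for a PhantomJS run with a link-collecting detector, and both it and indexUrlsSketch are hypothetical. How Dalia itself counts depth is an assumption: in this sketch the start page is depth 0 and links are followed maxDepth levels deep.

```javascript
// Hypothetical breadth-first URL indexer limited by maxDepth.
// getLinks(url) must resolve to an array of absolute URLs found on that page.
async function indexUrlsSketch(startUrl, getLinks, maxDepth = 2) {
  const visited = new Set([startUrl]);
  let frontier = [startUrl]; // pages discovered at the current depth
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      for (const link of await getLinks(url)) {
        if (!visited.has(link)) {
          visited.add(link); // record each URL once
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return [...visited]; // in discovery order
}
```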
Dalia's events:
TODO: Events are not documented yet. Please check the source code for details.
Usage Examples
Creating Sitemap from Indexed Urls
Using the sitemap package (https://www.npmjs.com/package/sitemap), you can create your own sitemap for the website.
const fs = require('fs');
const sm = require('sitemap');
const Dalia = require('dalia').Dalia;
const options = {
  maxDepth: 2,
  selectors: {
    __default: 'body'
  }
};
Dalia.getInstance()
  .indexUrls('http://html5rocks.com', options)
  .then((urls) => {
    // createSitemap() is the API of older sitemap releases; newer ones use SitemapStream
    const sitemap = sm.createSitemap({
      hostname: 'http://html5rocks.com',
      cacheTime: 600000,
      // parentheses make the arrow function return the object literal
      urls: urls.map((url) => ({ url: url, changefreq: 'daily', priority: 0.3 }))
    });
    fs.writeFileSync('/path/to/sitemap.xml', sitemap.toString());
  })
  .catch((err) => console.error(err));
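For reference, the file the sitemap package writes follows the sitemaps.org protocol: a urlset element holding one url entry per page, each with loc, changefreq and priority children. The hand-rolled sketch below builds that XML from an array of URLs just to show the format; buildSitemapXml and escapeXml are hypothetical helpers, and for real use the sitemap package remains the better choice.

```javascript
// Escape the five XML special characters so URLs with query strings stay valid.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical minimal sitemap builder: one <url> entry per indexed URL.
function buildSitemapXml(urls, { changefreq = 'daily', priority = 0.3 } = {}) {
  const entries = urls.map((loc) =>
    '  <url>\n' +
    `    <loc>${escapeXml(loc)}</loc>\n` +
    `    <changefreq>${changefreq}</changefreq>\n` +
    `    <priority>${priority}</priority>\n` +
    '  </url>'
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n');
}
```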
Creating Snapshots from Indexed Urls
Using html-snapshots (https://github.com/localnerve/html-snapshots), you can also create snapshots of the entire website.
const Dalia = require('dalia').Dalia;
const htmlSnapshots = require('html-snapshots');
const options = {
  maxDepth: 2,
  selectors: {
    __default: 'body'
  }
};
Dalia.getInstance()
  .indexUrls('http://html5rocks.com', options)
  .then((urls) => {
    // snapshot every indexed URL into ./snapshots
    htmlSnapshots.run({
      input: 'array',
      source: urls,
      outputDir: './snapshots',
      outputDirClean: true,
      selector: options.selectors
    });
  });
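The selectors option maps URLs to selectors, with __default as the fallback, and the same object is passed on as the selector option of html-snapshots, which accepts either a single string or such a per-URL object. The lookup can be sketched as follows; selectorFor is a hypothetical helper, not part of Dalia or html-snapshots.

```javascript
// Hypothetical lookup: a string applies to every page; an object maps
// specific URLs to selectors and falls back to its "__default" entry.
function selectorFor(url, selectors) {
  if (typeof selectors === 'string') return selectors;
  if (Object.prototype.hasOwnProperty.call(selectors, url)) return selectors[url];
  return selectors.__default;
}
```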
Creating Custom Snapshots from Indexed Urls
For applications built with frameworks like Aurelia or Angular, you can use html-snapshots (https://github.com/localnerve/html-snapshots) to create snapshots of the entire website and serve them alongside your website for correct bot indexing.
Of course, you can always use the version above together with .htaccess to achieve the same thing. This example is only for exercise purposes.
const path = require('path');
const Dalia = require('dalia').Dalia;
const htmlSnapshots = require('html-snapshots');
const options = {
  maxDepth: 2,
  selectors: {
    __default: 'body'
  }
};
Dalia.getInstance()
  .indexUrls('http://html5rocks.com', options)
  .then((urls) => {
    htmlSnapshots.run({
      input: 'array',
      source: urls,
      outputDir: './snapshots',
      outputDirClean: true,
      selector: options.selectors,
      // post-process every snapshot with the filter exported by myFilter.js
      snapshotScript: {
        script: 'customFilter',
        module: path.join(__dirname, 'myFilter.js')
      }
    });
  });
myFilter.js
Please note that this example applies to Aurelia only.
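// Receives the snapshot HTML and returns it with the Aurelia bootstrap
// scripts inserted before the first </body>.
// Plain ES5 string concatenation keeps it usable if PhantomJS evaluates the filter.
module.exports = function(content) {
  var filterVersion = "1.0-20141123"; // version tag for this filter (not used below)
  return content.replace('</body>',
    '<script src="jspm_packages/system.js"></script>\n' +
    '<script src="config.js"></script>\n' +
    '<script>\n' +
    "  System.import('aurelia-bootstrapper');\n" +
    '</script>\n' +
    '</body>');
};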
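One caveat with the filter above: String.prototype.replace with a string pattern rewrites only the first "</body>" it finds, which may not be the real closing tag if that text also appears earlier (for example inside an inline script). The variant below is a hedged sketch that inserts the snippet before the last "</body>" instead; injectBeforeLastBodyClose and BOOTSTRAP are hypothetical names.

```javascript
// Aurelia bootstrap scripts, as in the filter above.
const BOOTSTRAP = [
  '<script src="jspm_packages/system.js"></script>',
  '<script src="config.js"></script>',
  "<script>System.import('aurelia-bootstrapper');</script>",
].join('\n');

// Insert `snippet` right before the LAST closing body tag.
function injectBeforeLastBodyClose(content, snippet = BOOTSTRAP) {
  const index = content.lastIndexOf('</body>');
  if (index === -1) return content; // no closing body tag: leave the snapshot untouched
  return content.slice(0, index) + snippet + '\n' + content.slice(index);
}
```

To use it as myFilter.js, export it with module.exports = injectBeforeLastBodyClose;.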
Testing
Download and install the Aurelia Skeleton (esnext version).
git clone https://github.com/aurelia/skeleton-navigation.git
cd skeleton-navigation/skeleton-esnext
npm install
jspm install
gulp watch
Then run mocha in the Dalia project folder.
npm install mocha -g # only if not installed
cd nodejs-dalia
mocha # we recommend: clean; gulp build && mocha
NOTE: Events are not tested.
Documentation
To generate documentation, please run (within the project root folder):
npm install -g esdoc # globally install esdoc
npm install # install project packages (esdoc currently depends on them)
esdoc -c esdoc.json # run esdoc