dalia

v0.0.3-rc3

Published

8 months ago

SEO Tool for SPA and not only

templ-project

spa seo

Dalia (SPA) SEO Tool

Tool for massive HTML analysis, useful for SEO and SPA (single-page application) indexing.

Overview

Dalia is a flexible library that uses PhantomJS to index webpages served from your site. A page is saved only once a specified selector is detected as visible in the rendered HTML. This is useful when your site is largely Ajax content, or an SPA, and you want your dynamic content indexed by search engines.

Dalia is basically a wrapper over PhantomJS, giving the user the possibility to extract information from the phantom call exactly as they need.

Getting Started

Installation

The simplest way to install nodejs-dalia is with npm: npm install dalia downloads nodejs-dalia and all of its dependencies.

Classes Documentation

org.itmc.dalia.Logger

The Logger class is built on the debug-logger package and does nothing more than instantiate it:

const logger = require('dalia').Logger.getInstance();

Note that to see any logs you need to run export DEBUG="dalia:*" before running your own script. To see only a specific level, set it explicitly, as follows:

export DEBUG="dalia:level"

where level is the logger directive you wish to monitor. To monitor all, use *.
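
The effect of these DEBUG patterns can be illustrated with a small matcher. This is a simplified sketch of the wildcard matching used by debug-style loggers, not the library's own code; isNamespaceEnabled is a hypothetical name, and the level names info, debug, and error are assumed examples.

```javascript
// Simplified sketch of DEBUG-style namespace matching (not the library's code).
// `debugValue` is the content of the DEBUG variable, `namespace` is e.g. "dalia:info".
function isNamespaceEnabled(debugValue, namespace) {
  const patterns = (debugValue || '').split(/[\s,]+/).filter(Boolean);
  let enabled = false;
  for (const pattern of patterns) {
    const negated = pattern.startsWith('-');
    const body = negated ? pattern.slice(1) : pattern;
    // Escape regex metacharacters, then turn each "*" into ".*".
    const source = body
      .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
      .replace(/\*/g, '.*');
    if (new RegExp('^' + source + '$').test(namespace)) {
      if (negated) {
        return false; // an exclusion such as "-dalia:debug" always wins
      }
      enabled = true;
    }
  }
  return enabled;
}

console.log(isNamespaceEnabled('dalia:*', 'dalia:info'));      // true
console.log(isNamespaceEnabled('dalia:error', 'dalia:info'));  // false
```

Under this model, DEBUG="dalia:*" enables every Dalia level, while DEBUG="dalia:error" enables only that one.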

org.itmc.dalia.Phantom

Phantom is a thin wrapper that launches PhantomJS and captures its result so you can work with it further:

const phantom = require('dalia').Phantom.getInstance();
phantom.run('http://html5rocks.com')
    .then((data) => { console.log(data) });

Run options

selector : 'body' => The selector PhantomJS waits for once the page is loaded.
timeout : 20000 => Time, in milliseconds, after which the PhantomJS request is considered expired.
checkInterval : 200 => Interval, in milliseconds, between checks for the selector.
detector : Function() => Function evaluated in the page to decide whether it is ready. The default detector checks for the selector and returns the page's document object (return document) or true when it is found, and false otherwise. This parameter can also be a string path to a file that exports a different detector function.

// default detector
//...
  detector: function(options) {
    if (document.querySelectorAll(options.selector).length > 0) {
      if (options.callback && options.callback.onDetect) {
        return document;
      }
      return true;
    }
    return false;
  }
//...

// detector returning all the urls within the page as well
// ...
module.exports = function(options) {
  if (document.querySelectorAll(options.selector).length > 0) {
    var alist = document.querySelectorAll('a'), hlist = [];
    Array.prototype.forEach.call(alist.length ? alist : [], function(a) {
      hlist.push(a.href);
    });
    return hlist;
  }
  return false;
}
// ...
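
How the run options and a detector fit together can be pictured as a polling loop. The sketch below is an assumption about the general technique, not Dalia's actual code: simulatePolling and fakeDetector are hypothetical names, a simulated clock stands in for PhantomJS, and timeout and checkInterval are both assumed to be in milliseconds.

```javascript
// Hypothetical helper (not part of Dalia): simulates calling a detector
// every `checkInterval` ms until it returns a truthy value or `timeout` ms pass.
function simulatePolling(detector, options) {
  var checks = 0;
  for (var elapsed = 0; elapsed <= options.timeout; elapsed += options.checkInterval) {
    checks += 1;
    var result = detector(elapsed);
    if (result) {
      // The detector's truthy return value is what the caller receives.
      return { detected: true, result: result, elapsed: elapsed, checks: checks };
    }
  }
  // The selector never showed up within the allowed time.
  return { detected: false, result: false, elapsed: options.timeout, checks: checks };
}

// Stand-in detector: pretend the selector appears 450 ms after page load.
function fakeDetector(elapsedMs) {
  return elapsedMs >= 450;
}

var outcome = simulatePolling(fakeDetector, { timeout: 20000, checkInterval: 200 });
console.log(outcome); // { detected: true, result: true, elapsed: 600, checks: 4 }
```

Under these assumptions, a selector that never appears would be checked roughly 100 times with the default values before the request is treated as expired.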

org.itmc.dalia.Dalia

As of version 0.1.0, Dalia serves only as a URL indexer. This class was born out of the need to index our applications' URLs, in order to either create page snapshots or generate a sitemap XML file.
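
To make the idea of depth-limited URL indexing concrete, here is a minimal breadth-first sketch. It is not Dalia's implementation: crawlToDepth and getLinks are hypothetical names, the site is an in-memory object rather than real pages, and treating maxDepth as the number of link hops from the start URL is an assumption.

```javascript
// Illustrative breadth-first URL indexer; NOT Dalia's implementation.
// `getLinks(url)` is a hypothetical function returning the links found on a page.
function crawlToDepth(startUrl, getLinks, maxDepth) {
  const seen = new Set([startUrl]);
  let frontier = [startUrl];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      for (const link of getLinks(url)) {
        if (!seen.has(link)) {
          seen.add(link); // remember the URL so it is visited only once
          next.push(link);
        }
      }
    }
    frontier = next; // move one hop further from the start URL
  }
  return Array.from(seen);
}

// Tiny in-memory "site" used in place of real pages.
const site = {
  '/': ['/a', '/b'],
  '/a': ['/c'],
  '/b': ['/'],
  '/c': ['/d'],
};

const found = crawlToDepth('/', (url) => site[url] || [], 2);
console.log(found); // [ '/', '/a', '/b', '/c' ]
```

The page '/d' is three hops away, so it is left out when the depth limit is 2.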

Dalia's events:

TODO: Events are not documented yet. Please consult the source code for details.

Usage Examples

Creating Sitemap from Indexed Urls

Using sitemap (https://www.npmjs.com/package/sitemap), you can create your own sitemap for the website.


const Dalia = require('dalia').Dalia;
const sm = require('sitemap');

const options = {
    maxDepth: 2,
    selectors: {
        __default: 'body'
    }
};

Dalia.getInstance()
    .indexUrls('http://html5rocks.com', options)
    .then((urls) => {
        let sitemap = sm.createSitemap ({
            hostname: 'http://html5rocks.com',
            cacheTime: 600000,
            urls: urls.map(url => ({ url: url, changefreq: 'daily', priority: 0.3 }))
        });
        require('fs').writeFileSync('/path/to/sitemap.xml', sitemap.toString());
    });
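
Before handing the indexed URLs to a sitemap generator, it can help to deduplicate them and drop links that point to other hosts. The helper below is a sketch under assumptions: toSitemapEntries is a hypothetical name, and it assumes the indexed URLs arrive as an array of absolute or relative URL strings.

```javascript
// Hypothetical helper: turns indexed URLs into sitemap entry objects,
// keeping only unique URLs that belong to the given hostname.
function toSitemapEntries(urls, hostname) {
  const host = new URL(hostname).host;
  const unique = Array.from(new Set(urls));
  return unique
    // Relative URLs are resolved against the hostname before comparing hosts.
    .filter((url) => new URL(url, hostname).host === host)
    .map((url) => ({ url: url, changefreq: 'daily', priority: 0.3 }));
}

const entries = toSitemapEntries(
  [
    'http://html5rocks.com/en/',
    'http://html5rocks.com/en/',
    'http://other.example/page',
  ],
  'http://html5rocks.com'
);
console.log(entries);
```

The resulting array has the same shape as the urls value built in the example above, so it could be passed in its place.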

Creating Snapshots from Indexed Urls

Using html-snapshots (https://github.com/localnerve/html-snapshots), you can also create snapshots of the entire website.


const Dalia = require('dalia').Dalia;
const htmlSnapshots = require('html-snapshots');

const options = {
    maxDepth: 2,
    selectors: {
        __default: 'body'
    }
};

Dalia.getInstance()
    .indexUrls('http://html5rocks.com', options)
    .then((urls) => {
        var result = htmlSnapshots.run({
            input: 'array',
            source: urls,
            outputDir: './snapshots',
            outputDirClean: true,  
            selector: options.selectors
        });
    });

Creating Custom Snapshots from Indexed Urls

For applications built with frameworks like Aurelia or Angular, you can use html-snapshots (https://github.com/localnerve/html-snapshots) to create snapshots of the entire website and serve them alongside your site so that bots index it correctly.

Of course, you can always use the version above together with .htaccess to do the same thing. This example is for exercise purposes only.


const Dalia = require('dalia').Dalia;
const htmlSnapshots = require('html-snapshots');
const path = require('path');

const options = {
    maxDepth: 2,
    selectors: {
        __default: 'body'
    }
};

Dalia.getInstance()
    .indexUrls('http://html5rocks.com', options)
    .then((urls) => {
        var result = htmlSnapshots.run({
            input: 'array',
            source: urls,
            outputDir: './snapshots',
            outputDirClean: true,  
            selector: options.selectors,            
            snapshotScript: {
                script: "customFilter",
                module: path.join(__dirname, "myFilter.js")
            },
        });
    });

myFilter.js

Please note, this example is Aurelia only.

module.exports = function(content) {
  var filterVersion = "1.0-20141123";

  return content
    .replace('</body>', `<script src="jspm_packages/system.js"></script>
    <script src="config.js"></script>
    <script>
    System.import('aurelia-bootstrapper');
    </script>
</body>`)
    ;
};
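
The filter above calls replace with a string pattern, which substitutes only the first '</body>' it finds. A framework-independent variant that targets the last closing tag could look like the sketch below; injectBeforeBodyClose is a hypothetical helper name, and the script path in the usage line is only an example.

```javascript
// Hypothetical helper: inserts `snippet` right before the LAST "</body>" in `content`.
function injectBeforeBodyClose(content, snippet) {
  var index = content.lastIndexOf('</body>');
  if (index === -1) {
    return content; // no closing body tag, so leave the content untouched
  }
  return content.slice(0, index) + snippet + content.slice(index);
}

var html = '<html><body><p>Hello</p></body></html>';
var patched = injectBeforeBodyClose(html, '<script src="config.js"></script>');
console.log(patched);
// <html><body><p>Hello</p><script src="config.js"></script></body></html>
```

Content without a closing body tag passes through unchanged, which keeps the filter safe to run on partial output.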

Testing

Download and install the esnext version of the Aurelia Skeleton.

git clone https://github.com/aurelia/skeleton-navigation.git
cd skeleton-navigation/skeleton-esnext
npm install
jspm install
gulp watch

Then run mocha in the Dalia project folder.

npm install mocha -g # only if not installed

cd nodejs-dalia
mocha # we recommend: clean; gulp build && mocha

NOTE: Events are not tested.

Documentation

To generate documentation, please run (within the project root folder):

npm install -g esdoc    # globally install esdoc
npm install             # install project packages (esdoc depends on packages (for now))
esdoc -c esdoc.json     # run esdoc