
spa-crawler

v2.0.2

Crawl 100% JS single page apps with phantomjs and node.


Install

npm install spa-crawler --save

Why?

Single page apps are great at a lot of things, but not so great at others. One of those not-so-great things is that they aren't easily crawlable. This module uses phantomjs and node to crawl single page apps.

Note: if you find that you really, really need to crawl your single page app, a single page app might not be the best solution to the problem you're trying to solve.

That being said, I think the fact that this is possible is just really cool and that's why I built it.

Usage

Here's an example of how you'd crawl a local single page app. Check out the sample directory for an example that uses this in conjunction with moonboots-express, a module that streamlines single page app development in Express. You can run this example with npm start.

var Crawler = require('spa-crawler')

var crawler = new Crawler({
  rndr: {
    // The single page app should emit this event
    // when it is done rendering each page
    readyEvent: 'rendered'
  },
  // The initial url of the single page app
  app: 'http://localhost:3000'
})

// Start our crawler when your app is ready and listen for urls
crawler.start().crawler
  // Log each url
  .on('spaurl', console.log.bind(console))
  // When the crawler is done, kill the process
  .on('complete', () => process.exit(0))

The above code will output:

$ npm start

http://localhost:3000/
http://localhost:3000/page1
http://localhost:3000/page3
http://localhost:3000/page2

The single page app in the example above is in sample/client-app. Check out the code or run npm run start:client and go to http://localhost:3000 to see what the rendered HTML looks like. Also check out the source to see that it's just a <script> tag.

API

Options

  • app (required): This is the url of the initial page of the single page app that you wish to crawl.
  • rndr (default: {}): This object is passed directly to rndr-me. You can use all the options that are available in its documentation. Note: defaults of port: 8001 and readyEvent: 'load' will be set on the rndr server.
  • crawler (default: {}): This object is passed directly to simplecrawler. You can use all the options that are available in its documentation.
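
As a sketch, the options above might be combined like this. interval and maxConcurrency are simplecrawler settings, and all of the values here are illustrative, not recommendations:

```javascript
// Illustrative options object for new Crawler(options)
var options = {
  // Initial url of the single page app (required)
  app: 'http://localhost:3000',
  // Passed through to rndr-me (overrides its defaults)
  rndr: {
    port: 8001,
    readyEvent: 'rendered'
  },
  // Passed through to simplecrawler (e.g. to throttle requests)
  crawler: {
    interval: 250,
    maxConcurrency: 2
  }
}
```

You'd pass this object straight to new Crawler(options).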

rndr-me

spa-crawler utilizes rndr-me, which has a very apt description "an HTTP server that uses PhantomJS to render HTML".

One caveat to using it this way is that you will almost always want to use the readyEvent option. See the API section above for specific instructions on how to do that.

This is because most single page apps will not be ready when the window.load event fires (which is what rndr-me listens to by default). In my tests even the most basic use of Backbone + writing to the DOM once had race conditions where it wouldn't always be ready.

Events

Each instance of spa-crawler will have a crawler property. This property will emit all the same events as simplecrawler. There is also one additional event:

  • spaurl (url): Fired for each unique url found within the single page app.

Methods

  • start: Starts the rndr-me server and the crawler.
  • close: Kills the rndr-me server.

Test

Run npm test.

Sample

Run npm start to see the sample crawler run. Or run npm run start:client to examine the sample single page app at http://localhost:3000.

License

MIT