larry-crawler

v0.0.1

A simple yet flexible Twitter Crawler for Kayako Twitter Challenge

Downloads

7

Kayako Twitter challenge

Installation

npm install --save larry-crawler

Usage

Navigate to the node_modules directory that contains larry-crawler, then run the example script:

cd larry-crawler/usage
node get-tweets.js

Test

npm test

Output

The application fetches tweets in batches of 100. Unless you stop it (CTRL+C), it keeps running until every tweet matching the defined criteria has been fetched. See result.

NOTE: A batch might produce fewer than 100 tweets if you've applied a secondary filter (such as retweetCount). For example, if 100 tweets were retrieved for the specified hashtag and 30 of them have not been retweeted, only 70 tweets are returned in the response.statuses array.
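The effect described in the note can be illustrated with a small sketch. The helper name filterByRetweetCount is hypothetical and is not the module's own filter class; it assumes only the $gt operator from the criteria example and the retweet_count field that Twitter API status objects carry.

```javascript
// Hypothetical helper for illustration; not the module's SecondaryFilterForTweets class.
// Supports only the $gt operator.
function filterByRetweetCount (statuses, retweetCount) {
  if (!retweetCount || typeof retweetCount.$gt !== 'number') {
    return statuses; // no secondary filter requested
  }
  // retweet_count is the field name used on Twitter API status objects.
  return statuses.filter ((status) => status.retweet_count > retweetCount.$gt);
}

// A made-up batch of three statuses standing in for a batch of 100.
const batch = [
  { id_str: '3', retweet_count: 5 },
  { id_str: '2', retweet_count: 0 },
  { id_str: '1', retweet_count: 2 }
];

const kept = filterByRetweetCount (batch, { $gt: 0 });
console.log (kept.map ((status) => status.id_str)); // [ '3', '1' ]
```

The batch shrinks from three statuses to two because one of them has never been retweeted, which is the same reason a real batch of 100 can come back smaller.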

Module API

larry-crawler exposes a TwitterCrawler class for crawling Twitter. To import it:

const {TwitterCrawler} = require ('./larry-crawler');

Get your app or user credentials from https://dev.twitter.com/, then create a new object like:

const crawler = new TwitterCrawler ({

	consumerKey: process.env.TWITTER_CONSUMER_KEY,
	consumerSecret: process.env.TWITTER_CONSUMER_SECRET,
	accessTokenKey: process.env.TWITTER_ACCESS_TOKEN_KEY,
	accessTokenSecret: process.env.TWITTER_ACCESS_TOKEN_SECRET

});

If you are authenticating as a Twitter app rather than as a user, use bearerToken instead of accessTokenKey and accessTokenSecret.

The new object exposes a getTweets() method, which fetches tweets matching the given criteria and returns a Promise.

const criteria = { hashtags: ['custserv'], retweetCount: {$gt: 0} };

crawler.getTweets (criteria).then ((response) => {
  console.log (JSON.stringify (response, null, 2));
}).catch ((err) => {
  // Log failures instead of silently discarding them.
  console.error (err);
});

To fetch the next page, set the max_id parameter through the criteria object:

criteria.maxIdString = status.id_str

where status is an item in the response.statuses array.

See get-tweets.js for a full example.
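As a rough sketch of how a pagination loop over getTweets() might look, the example below runs against a stand-in crawler so it can be executed without credentials. Several things here are assumptions rather than documented behaviour: the function name fetchAllTweets, the fake crawler, the idea that an empty statuses array marks the end of the results, and the choice to decrement the oldest ID in the caller before assigning it to maxIdString (Twitter's max_id is inclusive). BigInt is used for the decrement only to keep the sketch short.

```javascript
// Sketch only: `crawler` is any object whose getTweets (criteria) resolves to
// { statuses: [...] } ordered newest first.
async function fetchAllTweets (crawler, criteria) {
  const all = [];
  const query = Object.assign ({}, criteria); // avoid mutating the caller's object

  for (;;) {
    const response = await crawler.getTweets (query);
    const statuses = response.statuses || [];
    if (statuses.length === 0) break; // assumption: empty batch means we are done
    all.push (...statuses);

    // max_id is inclusive, so ask for everything strictly older than the
    // oldest tweet seen so far. The subtraction is done on a BigInt, never a Number.
    const oldest = statuses[statuses.length - 1];
    query.maxIdString = (BigInt (oldest.id_str) - 1n).toString ();
  }
  return all;
}

// Stand-in crawler (hypothetical) that serves five fake tweets in batches of two.
const fakeTweets = ['105', '104', '103', '102', '101'].map ((id_str) => ({ id_str }));
const fakeCrawler = {
  getTweets (criteria) {
    const max = criteria.maxIdString ? BigInt (criteria.maxIdString) : null;
    const eligible = fakeTweets.filter ((t) => max === null || BigInt (t.id_str) <= max);
    return Promise.resolve ({ statuses: eligible.slice (0, 2) });
  }
};

fetchAllTweets (fakeCrawler, { hashtags: ['custserv'] })
  .then ((tweets) => console.log (tweets.map ((t) => t.id_str)))
  .catch ((err) => console.error (err));
```

With a secondary filter in place, a batch could in principle come back empty while older matching tweets still exist, so a real loop may need a different stopping condition than the one assumed here.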

Technical Details

The module has only one dependency: twitter.

  1. Searching by hashtag is simple because the Twitter API has built-in support for it. To further refine tweets by number of retweets, the module contains a class, SecondaryFilterForTweets.

See Working with search API

  2. Since at most 100 tweets are returned per request, an effective pagination strategy had to be implemented using the max_id parameter so that ALL tweets can be retrieved, going back to the very beginning. This strategy was followed to achieve pagination.

  3. The primary challenge was dealing with the 64-bit integer IDs provided by the Twitter API. JavaScript numbers can represent integers exactly only up to 53 bits. Hence, the application uses the id_str field at all times, and a special decrement function in usage/utils.js operates on the string ID.

See Working with 64-bit id in Twitter
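To make the 64-bit ID problem concrete, here is a minimal sketch of a string-based decrement. It is not the function from usage/utils.js, whose implementation is not shown here; the name decrementIdString and its error handling are assumptions made for illustration.

```javascript
// Why id_str matters: IDs above Number.MAX_SAFE_INTEGER (2^53 - 1) do not
// survive a round trip through a JavaScript Number.
const idStr = '9007199254740993'; // 2^53 + 1
console.log (String (Number (idStr)) === idStr); // false

// Hypothetical helper: subtract 1 from a non-negative decimal string
// without ever converting the whole value to a Number.
function decrementIdString (value) {
  if (!/^\d+$/.test (value)) {
    throw new TypeError ('expected a string of decimal digits');
  }
  const digits = value.split ('');
  let i = digits.length - 1;
  // Borrow through trailing zeros: ...1000 becomes ...0999.
  while (i >= 0 && digits[i] === '0') {
    digits[i] = '9';
    i--;
  }
  if (i < 0) {
    throw new RangeError ('cannot decrement zero');
  }
  digits[i] = String (Number (digits[i]) - 1);
  // Strip leading zeros produced by the borrow, but keep a lone '0'.
  return digits.join ('').replace (/^0+(?=\d)/, '');
}

console.log (decrementIdString ('1000')); // 999
console.log (decrementIdString (idStr)); // 9007199254740992
```

On Node.js versions with BigInt support, (BigInt (value) - 1n).toString () gives the same result with less code; the digit-by-digit version works on older runtimes too.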