
deepsalter

v7.3.0

Trust should be earned. Let's do something about it.

About Deepsalter

What it does

Deepsalter watches new reddit submissions, checks whether each one names one or more journos who have a page on Deepfreeze.it, and posts a reply containing links to their Deepfreeze pages.

How it does it

Getting new submissions

The bot simply polls reddit at a regular interval; whenever it finds new posts, it analyzes them.
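A minimal sketch of that loop might look like the following. It uses reddit's public JSON listing for brevity (the real bot goes through the OAuth API), and the fixed interval is an assumption, since Deepsalter actually tunes its own timing:

    // Minimal sketch of the main polling loop (Node 18+ for built-in fetch).
    // The fixed interval is an assumption; Deepsalter adjusts its timing.
    const POLL_INTERVAL_MS = 10000;

    async function fetchNewPosts(subreddit, limit = 20) {
      const res = await fetch(`https://www.reddit.com/r/${subreddit}/new.json?limit=${limit}`);
      const listing = await res.json();
      return listing.data.children;
    }

    async function mainLoop(subreddits) {
      while (true) {
        for (const subreddit of subreddits) {
          const posts = await fetchNewPosts(subreddit);
          for (const post of posts) {
            console.log(post.data.id, post.data.title); // analysis would go here
          }
        }
        await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS));
      }
    }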

Processing link posts

If a post is a link with no body, Deepsalter scrapes the linked webpage looking for the author and the body of the article, throwing away everything else (sidebars and the like). This functionality relies partially on Mercury, which takes care of extracting the title and body of the article. The author-guessing function was adapted from unfluff. Since unfluff can't possibly identify the author correctly on every webpage, especially when the website is 100% dogshit, Deepsalter also relies on a number of special rules defined in data/matchers.json. Those precise rules are tried first, and if they fail Deepsalter resorts to guessing.
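The lookup order could be sketched like this. The matcher format shown here (hostname mapped to a CSS selector) is an assumption about what data/matchers.json contains, and guessAuthor stands in for the function adapted from unfluff:

    // Sketch of the author lookup: precise per-site rules first, generic
    // guessing second. The matcher format is an assumption.
    const cheerio = require('cheerio');
    const matchers = require('./data/matchers.json');

    function findAuthor(url, html) {
      const selector = matchers[new URL(url).hostname];
      if (selector) {
        const author = cheerio.load(html)(selector).first().text().trim();
        if (author) return author;
      }
      return guessAuthor(html); // fallback adapted from unfluff
    }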

To scrape a webpage successfully, it's important to get its raw, unmodified HTML code. When a page has been archived on archive.is or the Wayback Machine, Deepsalter discards it and gets the live page instead. Multiple archives are supported - Deepsalter peels them away like an onion until it gets to the original page.
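The peeling can be pictured as repeatedly stripping known archive URL wrappers until nothing matches anymore. The URL shapes below are assumptions, not an exhaustive list of what Deepsalter handles:

    // Sketch of the "onion peeling" idea. The URL patterns are assumptions.
    function unwrapArchives(url) {
      const patterns = [
        /^https?:\/\/archive\.(?:is|today|ph)\/[\w]+\/(https?:\/\/.+)$/i, // archive.is style
        /^https?:\/\/web\.archive\.org\/web\/\d+\/(https?:\/\/.+)$/i      // Wayback Machine style
      ];
      let current = url;
      let changed = true;
      while (changed) {
        changed = false;
        for (const pattern of patterns) {
          const match = current.match(pattern);
          if (match) {
            current = match[1];
            changed = true;
          }
        }
      }
      return current;
    }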

If the webpage doesn't exist anymore, the scrape fails; Deepsalter pretends that the page is empty and carries on.

Processing self posts

If a reddit post is instead a self post, Deepsalter scrapes anything linked in the self post body. The scraping mechanism is the same.
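A sketch of the link extraction, using a deliberately simple regex that will miss edge cases the real bot presumably handles:

    // Sketch: pull URLs out of a self post body and de-duplicate them.
    function extractLinks(selftext) {
      const urls = selftext.match(/https?:\/\/[^\s)\]]+/g) || [];
      return [...new Set(urls)];
    }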

And finally!!

Once everything has been collected, Deepsalter matches the list of journos that have a page on Deepfreeze.it against the post title, its body (if it's a self post) and any of the scraped links. This may result in a list of journos who are named in any of those resources or who are the authors of the linked articles.
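The matching step might look roughly like this; the field names on the journo and article objects are assumptions:

    // Sketch of the matching step. journos would come from the Deepfreeze
    // database; the field names here are assumptions.
    function findNamedJournos(journos, post, scrapedArticles) {
      const haystacks = [
        post.title,
        post.selftext || '',
        ...scrapedArticles.map(a => `${a.author || ''} ${a.body || ''}`)
      ].map(text => text.toLowerCase());

      return journos.filter(journo =>
        haystacks.some(text => text.includes(journo.name.toLowerCase()))
      );
    }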

The list of journos is used to generate a comment that is then posted as a reply to the submission.

Other tech stuff

Deepsalter doesn't write anything to disk and doesn't use any database, relying on reddit's own "save" function instead. Its internal state can be safely thrown away whenever it's done saving and sending replies, making it highly resilient to reboots and failures of all kinds.
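The bookkeeping idea can be sketched as follows. Posts that reddit reports as already saved were handled in a previous run and can be skipped; redditGet and redditPost are hypothetical thin wrappers over the API, not Deepsalter's real function names:

    // Sketch of the save-based bookkeeping. redditGet/redditPost are
    // hypothetical thin wrappers over authenticated API requests.
    async function newUnhandledPosts(subreddit, limit) {
      const listing = await redditGet(`/r/${subreddit}/new`, { limit });
      // when fetched with the bot account, each item carries a `saved` flag
      return listing.data.children.filter(post => !post.data.saved);
    }

    async function markHandled(post) {
      // POST /api/save marks the submission as saved for the bot account
      await redditPost('/api/save', { id: post.data.name });
    }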

While several reddit API wrappers are available, I eventually decided not to use any of them, opting for the smallest and simplest implementation I could write - just a handful of functions.
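For a sense of scale, here is what one of those functions might look like: fetching an OAuth token via the password grant reddit offers to script-type apps. This is a sketch, not Deepsalter's actual code; the env field names match the variables documented below, and Node 18+ is assumed for the built-in fetch:

    // Sketch: fetch an OAuth token using reddit's password grant.
    async function getToken(env) {
      const basic = Buffer.from(
        `${env.reddit_clientId}:${env.reddit_clientSecret}`
      ).toString('base64');
      const res = await fetch(env.reddit_authUrl, {
        method: 'POST',
        headers: {
          Authorization: `Basic ${basic}`,
          'Content-Type': 'application/x-www-form-urlencoded',
          'User-Agent': env.reddit_userAgent
        },
        body: new URLSearchParams({
          grant_type: 'password',
          username: env.reddit_username,
          password: env.reddit_password
        })
      });
      return (await res.json()).access_token;
    }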

Deepsalter automatically adjusts the timing of its requests in order to be as responsive as possible without going over reddit's API usage budget of 60 requests per minute.
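The pacing could be implemented as a simple point budget that refills once per window. This is a sketch of the idea, with numbers mirroring the defaults documented below:

    // Sketch of the budget: spend one point per request, wait out the
    // window once the points run out. Defaults mirror the docs below.
    function makeBudget(points = 60, durationMs = 61000) {
      let remaining = points;
      let resetAt = Date.now() + durationMs;
      return async function spend() {
        if (Date.now() >= resetAt) {
          remaining = points;
          resetAt = Date.now() + durationMs;
        }
        if (remaining <= 0) {
          // budget exhausted: sleep until the window resets
          await new Promise(resolve => setTimeout(resolve, resetAt - Date.now()));
          remaining = points;
          resetAt = Date.now() + durationMs;
        }
        remaining -= 1;
      };
    }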

This project has been feature-complete for a while now. Open an issue or contact whoever is operating it to request corrections or additions.

USAGE

Deepsalter no longer accepts command-line arguments. Run it with yarn start.

CONFIGURATION

As of September 2017, in order to reduce bloat, Deepsalter no longer supports logging to a file directly. Use pm2 to capture logs, or whatever is supported by the cloud service you're using. All Deepsalter does is write to stderr/stdout.

As of June 2018, Deepsalter no longer supports reading a JSON configuration file from an arbitrary path. Either set the environment variables described below or put a file called .env containing key=value pairs in its source folder, as explained in the dotenv documentation.

Deepsalter understands the following environment variables. Names are case-sensitive.

deepfreeze_endpoint: Deepfreeze API endpoint. Deepsalter will GET a JSON document from that address.

deepfreeze_journoPageBaseURL: URL fragment to prepend to the urlencoded journo name. The resulting URL will be the link to the journo page.

deepfreeze_TTL: In hours, how long before the Deepfreeze database is re-fetched.

reddit_clientId: Reddit authentication details.

reddit_clientSecret: Reddit authentication details.

reddit_username: Reddit authentication details.

reddit_password: Reddit authentication details.

reddit_tokenExpiry: In minutes, how long before the auth token is refreshed. Defaults to 55.

reddit_authUrl: Reddit auth endpoint, defaults to https://www.reddit.com/api/v1/access_token

reddit_apiBaseUrl: Reddit OAuth API endpoint, defaults to https://oauth.reddit.com/

reddit_subreddits: Comma-separated list of subreddits Deepsalter will watch. Defaults to an empty list.

reddit_limit: How many posts Deepsalter should fetch from each subreddit's new feed at the start of every cycle of the main loop. Defaults to 20.

reddit_userAgent: User agent string Deepsalter sends to reddit with each request. Defaults to Node/${process.version} Deepsalter/v${package_json.version}

reddit_delay: In milliseconds, how long Deepsalter should wait between requests even if the budget isn't exhausted. Defaults to 100.

reddit_concurrency: How many requests Deepsalter should send concurrently. Defaults to 2; currently capped to 1 by architectural constraints that should be lifted before the final release of v5.0.0.

reddit_budget: How many points Deepsalter can spend during the period set in reddit_budgetDuration. Defaults to 60 points. Every request sent to reddit burns one point.

reddit_budgetDuration: In milliseconds, how long the budget lasts before it's reset to the value specified in reddit_budget. Defaults to 61000.

reddit_signature: Text that should be appended to every comment Deepsalter generates. Defaults to a simple cautionary message that also links here.

scraper_userAgent: User agent string Deepsalter sends to websites when it downloads a webpage.

scraper_maxDownloadSize: In bytes, maximum size of a webpage. The download of anything larger is interrupted and the result is nulled.

scraper_delay: In milliseconds, how long to wait before moving on to the next webpage. Defaults to 100.

scraper_concurrency: How many websites Deepsalter should scrape concurrently. Currently capped to 1 by design because scraping burns too much memory and cloud hosts would terminate the process far too often if it scraped more than one webpage at a time.

scraper_budget: You can set a budget and budgetDuration for scraping, if you want. Defaults to Infinity.

scraper_budgetDuration: Defaults to Infinity.
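Put together, a minimal .env might look like this. Every value below is a placeholder, not a real endpoint or credential:

    # Sample .env - all values are placeholders, substitute your own
    deepfreeze_endpoint=https://example.com/deepfreeze/export.json
    deepfreeze_journoPageBaseURL=https://example.com/journalists/
    deepfreeze_TTL=24
    reddit_clientId=your-client-id
    reddit_clientSecret=your-client-secret
    reddit_username=your-bot-account
    reddit_password=your-bot-password
    reddit_subreddits=subredditone,subreddittwo
    reddit_limit=20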

RUNNING THE BOT

Run it:

    cd source/directory
    yarn
    yarn start

You can use screen or pm2 to keep it running, or write a system service script. It runs just fine on Windows.
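For example, one common pm2 pattern (assuming pm2 is installed globally; npm start runs the same package script as yarn start):

    pm2 start npm --name deepsalter -- start
    pm2 logs deepsalter

Since Deepsalter only writes to stderr/stdout, pm2 logs is also how you capture its output, as noted under CONFIGURATION.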