tasr
v0.0.2
tasr: Tool for Analyzing SubReddits
Status: initial development.
Usage
Tasr requires a working installation of Node.js and npm. To install, run npm install -g tasr. Functionality is currently very limited; the rest of this document describes how tasr is intended to work once it is developed further.
Overview
Goal: to create an app that analyzes subreddits (individual forums within reddit.com) based on the content of the posts they contain. Specifically, it should collect posts from a given time frame and analyze their content: reposts, positive/negative emotional quality, etc. At first, tasr will be a command-line tool. Later it will hopefully be integrated into a website, so that results can be visualized in real time and the tool can reach more users.
Motivation
One of the subreddits I frequent seems to have become more negative lately. Substantive, positive, helpful posts are few and far between; it's mostly just posts complaining about irrelevant things. But was it always like that? Or has it only recently changed? Longtime users of any subreddit often notice that the content of posts changes over time, and they often voice concerns about it.
Some subreddits also suffer from reposts - content that gets posted multiple times, usually a few months after the original. The comments sections of reposts often contain users lamenting that "there isn't enough original content anymore," or "too many reposts these days," etc. Are these claims justified? Or have those users just been around too long?
Potential solution: a tool that can analyze subreddit content and provide data about trends in post content.
In summary, whenever someone on reddit complains that "x subreddit has become y lately," this app should help people test that hypothesis.
Spec
Tasr will be designed as a command-line tool, written in Node.js and distributed via npm.
The program should take the following as input:
- subreddit
- time period to analyze
- upvote filter (minimum voting score a post needs in order to be included - e.g. you could say "filter out posts with < 100 upvotes")
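The three inputs above could be read from the command line roughly as follows. This is only a sketch: tasr does not define a CLI interface yet, so the flag names (--subreddit, --from, --to, --min-score), the defaults, and the parseOptions helper are all assumptions made for illustration. It relies on parseArgs from Node's built-in util module, which requires a reasonably recent Node.js release.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical flag names; nothing here is part of tasr's actual interface.
function parseOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      subreddit: { type: "string", short: "s" },
      from: { type: "string" },
      to: { type: "string" },
      "min-score": { type: "string" },
    },
  });

  if (!values.subreddit) {
    throw new Error("a subreddit is required, e.g. --subreddit node");
  }

  // Assumed defaults: no lower bound on time, "now" as the upper bound.
  const from = values.from ? new Date(values.from) : new Date(0);
  const to = values.to ? new Date(values.to) : new Date();
  if (Number.isNaN(from.getTime()) || Number.isNaN(to.getTime())) {
    throw new Error("--from and --to must be valid dates, e.g. 2015-01-01");
  }

  // Assumed default: a minimum score of 0 when no filter is given.
  const minScore =
    values["min-score"] === undefined ? 0 : Number(values["min-score"]);
  if (!Number.isFinite(minScore)) {
    throw new Error("--min-score must be a number");
  }

  return { subreddit: values.subreddit, from, to, minScore };
}
```

A command-line entry point could then call parseOptions(process.argv.slice(2)) and pass the result on to whatever code retrieves the posts.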
And as output, it will produce:
- positive/negative post ratio
- average voting score for all posts (or other upvote/downvote metrics)
- reposts/original content ratio (within the given subreddit; analyzing "x-posts" will not be in the scope)
- most common words in post titles
- etc.
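Some of these outputs are simple to compute once the posts are in memory. The sketch below covers the average score, the most common title words, and a repost count. It assumes a hypothetical post shape of { title, score, url }, a tiny illustrative stop-word list, and a deliberately naive definition of "repost" (a URL already seen earlier in the same list); a real implementation would likely need something smarter. The positive/negative ratio is left out because it needs a sentiment model, which is not decided yet.

```javascript
// Hypothetical post shape: { title: string, score: number, url: string }.
// The stop-word list is a small illustrative sample, not a complete one.
const STOP_WORDS = new Set(["a", "an", "the", "of", "to", "in", "and", "is", "for", "on"]);

// Mean voting score across all posts (0 for an empty list).
function averageScore(posts) {
  if (posts.length === 0) return 0;
  return posts.reduce((sum, post) => sum + post.score, 0) / posts.length;
}

// Most frequent words in post titles as [word, count] pairs,
// sorted by count (descending), then alphabetically.
function topWords(posts, limit = 10) {
  const counts = new Map();
  for (const { title } of posts) {
    const words = title.toLowerCase().match(/[a-z0-9']+/g) || [];
    for (const word of words) {
      if (STOP_WORDS.has(word)) continue;
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// Naive repost detection: a post counts as a repost if its URL
// has already appeared earlier in the list.
function countReposts(posts) {
  const seen = new Set();
  let reposts = 0;
  for (const { url } of posts) {
    if (seen.has(url)) reposts += 1;
    else seen.add(url);
  }
  return { reposts, originals: posts.length - reposts };
}
```

Returning counts from countReposts rather than a single number leaves it to the caller to decide how to present the reposts/original content ratio, including the case where there are no posts at all.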
Looking into the Reddit API will be beneficial, especially for retrieving posts from a subreddit within a given time frame. Scraping pages by sending HTTP requests and parsing the returned HTML would be very tedious, so hopefully that can be avoided.
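As a starting point, here is a rough sketch of retrieving posts as JSON instead of HTML. It assumes Reddit's public JSON listing at https://www.reddit.com/r/<subreddit>/new.json, with its limit and after query parameters and a response of the form { data: { children: [{ data: post }] } }, where each post has title, score, url, and created_utc (in seconds). Those details should be checked against the current API documentation; Reddit may also require authentication or a particular User-Agent and may rate-limit requests. The function names, the User-Agent string, and the from/to/minScore option names are hypothetical, and fetchPage relies on the global fetch available in newer Node.js releases.

```javascript
// Build the URL for one page of a subreddit's newest posts.
// "after" is the pagination cursor returned by the previous page, if any.
function listingUrl(subreddit, after) {
  const url = new URL(
    `https://www.reddit.com/r/${encodeURIComponent(subreddit)}/new.json`
  );
  url.searchParams.set("limit", "100");
  if (after) url.searchParams.set("after", after);
  return url.toString();
}

// Keep only posts inside the time frame and at or above the minimum score.
// Assumes created_utc is a Unix timestamp in seconds.
function selectPosts(listing, { from, to, minScore }) {
  return listing.data.children
    .map((child) => child.data)
    .filter((post) => {
      const created = post.created_utc * 1000;
      return (
        created >= from.getTime() &&
        created <= to.getTime() &&
        post.score >= minScore
      );
    })
    .map(({ title, score, url, created_utc }) => ({ title, score, url, created_utc }));
}

// Fetch a single page of the listing. The User-Agent value is a placeholder.
async function fetchPage(subreddit, after) {
  const response = await fetch(listingUrl(subreddit, after), {
    headers: { "User-Agent": "tasr-example/0.0 (hypothetical)" },
  });
  if (!response.ok) {
    throw new Error(`Reddit responded with HTTP ${response.status}`);
  }
  return response.json();
}
```

Because the listing is paginated, covering a whole time frame would mean calling fetchPage repeatedly with the cursor from the previous response until the posts fall outside the requested range.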