imdb-data

v1.0.0

A JSON file of 50,000 IMDB movie reviews to be used in machine learning applications.

Downloads

8

IMDB Data

This is the ubiquitous "Large Movie Review Dataset" from Stanford University in JSON format. A discussion of the dataset can be found here. The dataset consists of 50,000 movie reviews from IMDb.

This is a great starter dataset for TensorFlow.js and for learning text classification and machine learning!

This version of the dataset differs slightly from the source data. Rather than being pre-split into test and training data, this dataset simply presents all records. The records are evenly split between positive and negative sentiment.
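Because the records are not pre-split, you will probably want to make your own training and test sets. The following is a minimal sketch of one way to do that, not something the package provides: `trainTestSplit` is a hypothetical helper, the 20% test fraction is an arbitrary assumption, and the small `sample` array stands in for the real data so the snippet runs without the download.

```javascript
// Hypothetical helper (not part of imdb-data): shuffle a copy of the records
// and cut it into training and test sets.
function trainTestSplit(records, testFraction = 0.2, random = Math.random) {
  const shuffled = records.slice();
  // Fisher-Yates shuffle, walking from the end of the array to the start.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const testSize = Math.round(shuffled.length * testFraction);
  return {
    test: shuffled.slice(0, testSize),
    train: shuffled.slice(testSize),
  };
}

// Stand-in records so the sketch runs on its own.
// With the real package you would pass require("imdb-data") instead.
const sample = Array.from({ length: 10 }, (_, i) => ({ t: `review ${i}`, s: i % 2 }));
const { train: trainSet, test: testSet } = trainTestSplit(sample, 0.2);
console.log(trainSet.length, testSet.length);
// 8 2
```

A plain random shuffle does not guarantee that each set keeps the exact 50/50 sentiment balance; on a dataset this size the split should be close, but check the label counts if balance matters for your model.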

Usage

You can install the dataset using npm. This will download the 63.3 MB data file.

npm i imdb-data

Next, require the data in your project.

const reviews = require("imdb-data");

You can check to see that there are 50,000 reviews.

console.log(reviews.length);
// 50000

Data Structure

Each element in the reviews array contains a text property t and a sentiment property s. We can see this in the first element of the array:

console.log(reviews[0]);
{
    t: 'Once again Mr. Costner has dragged out a movie for far longer than necessary. Aside from the terrific sea rescue sequences, of which there are very few I just did not care about any of the characters. Most of us have ghosts in the closet, and Costner's character are realized early on, and then forgotten until much later, by which time I did not care. The character we should really care about is a very cocky, overconfident Ashton Kutcher. The problem is he comes off as kid whothinks he's better than anyone else around him and shows no signs of a cluttered closet. His only obstacle appears to be winning over Costner. Finally when we are well past the half way point of this stinker, Costner tells us all about Kutcher's ghosts. We are told why Kutcher is driven to be the best with no prior inkling or foreshadowing. No magic here, it was all I could do to keep from turning it off an hour in.',
    s: 0
}

A sentiment of 1 indicates a positive review, whereas a sentiment of 0 indicates a negative review.
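As a quick sanity check on the labels, you can tally records by their s value. This is only a sketch: `countBySentiment` is a hypothetical helper, not part of the package, and `sampleReviews` is made-up stand-in data in the same { t, s } shape.

```javascript
// Hypothetical helper (not part of imdb-data): tally records by sentiment label.
function countBySentiment(records) {
  const counts = { positive: 0, negative: 0 };
  for (const { s } of records) {
    if (s === 1) counts.positive += 1;
    else if (s === 0) counts.negative += 1;
  }
  return counts;
}

// Made-up stand-in records; with the real package, pass require("imdb-data").
const sampleReviews = [
  { t: "Loved every minute.", s: 1 },
  { t: "A complete waste of time.", s: 0 },
  { t: "Great cast, great script.", s: 1 },
];
console.log(countBySentiment(sampleReviews));
// { positive: 2, negative: 1 }
```

Run against the full dataset, the two counts should come out equal, since the records are evenly split between positive and negative sentiment.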