epub-wordcount

v2.1.2

Count the number of words in an ebook

Given an epub file, do our best to count the number of words in it.

Installation

This package is available on npm as epub-wordcount. It can be installed using common JS tools:

npm i -g epub-wordcount

Basic Usage

On the CLI:

% word-count path/to/book.epub

The Strange Case of Dr. Jekyll and Mr. Hyde
-------------------------------------------
  * 26,341 words

In code:

// TS:
import { countWords } from 'epub-wordcount'

// JS:
// const { countWords } = require('epub-wordcount')

countWords('./books/some-book.epub').then((numWords) => {
  console.log(`There are ${numWords} words`)
})
// There are 106190 words

CLI

There's also a CLI tool for quickly getting the word count of any epub file. Invoke it via:

word-count path/to/file.epub

or

word-count directory/of/books

See word-count -h for more info.

Options

  • -c, --chars - Print the character count instead of the word count
  • -r, --raw - Instead of printing the nice title, just print out a numeral
  • -t, --text - Print out the whole text of the book. Great for passing into other Unix commands, like wc.
  • --ignore-drm - If the CLI says your file has DRM when you know it doesn't, you can pass this flag to force it to ignore the DRM warning. Might cause weird results if the file actually does have DRM.
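
To make the difference between the default output and --raw concrete, here is a minimal sketch that imitates the two shapes shown in this README. The formatReport helper is hypothetical and is not taken from the package source; it only reproduces the title, underline, and comma-grouped count from the sample output above.

```typescript
// Hypothetical formatter for illustration only; not the package's implementation.
function formatReport(title: string, wordCount: number, raw: boolean): string {
  if (raw) {
    // --raw style: a bare numeral, convenient for scripts
    return String(wordCount)
  }
  // Underline the title with one dash per character, as in the sample output
  const underline = "-".repeat(title.length)
  // en-US grouping produces the comma separator (26341 becomes 26,341)
  const pretty = wordCount.toLocaleString("en-US")
  return `${title}\n${underline}\n  * ${pretty} words`
}
```

The raw form is the one to reach for when another program consumes the count, since it needs no parsing.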

Code API

There are a number of functions exported from this package. Each one takes either a path to a file or an already-parsed file. Mostly you'll use the path, but if the epub you're parsing is in a non-standard format, you might parse it yourself first (with parseEpubAtPath, listed below) to ensure the file parses correctly. See here for the options available.

  • countWords(pathOrEpub, ignoreDrm?) => Promise<number>
  • countCharacters(pathOrEpub, ignoreDrm?) => Promise<number>
  • getText(pathOrEpub, ignoreDrm?) => Promise<string[]>

Each of the above can be passed the result of the following:

  • parseEpubAtPath(path, ignoreDrm?) => Promise<EPub>
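
Since getText resolves to a string[], you may want to derive your own numbers from it. The sketch below is one way to do that; summarizeText is a hypothetical helper, and splitting on whitespace is an assumption made for illustration, not necessarily how countWords or countCharacters compute their results.

```typescript
// Hypothetical helper, not part of epub-wordcount.
// Assumes a "word" is any run of non-whitespace characters.
function summarizeText(sections: string[]): { words: number; characters: number } {
  let words = 0
  let characters = 0
  for (const section of sections) {
    const trimmed = section.trim()
    // Skip whitespace-only sections so they don't count as one empty word
    if (trimmed.length > 0) {
      words += trimmed.split(/\s+/).length
    }
    // Characters include whitespace here; adjust if you want visible characters only
    characters += section.length
  }
  return { words, characters }
}
```

You would pass it the resolved value of getText, for example `getText(path).then(summarizeText)`. Expect the totals to differ from the package's own counts if its tokenization differs from this assumption.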

Limitations

There's no programmatic representation for the table of contents in epub, so it's hard to skip over the reviews, copyright, etc. An effort is made to parse only the actual story text, but there's a margin of error, probably no more than 500 words or so.

Pull requests welcome.

Test Books

Unit tests are run on the following e-books:

  • Stevenson's THE STRANGE CASE OF DR. JEKYLL AND MR. HYDE, from the public domain, provided by Standard Ebooks
  • A copy of The Martian, by Andy Weir. This is my copy, purchased from Apple Books. It's DRM-encumbered, so releasing it publicly should not be seen as copyright infringement.

Fake ePubs

In modern versions of macOS, dragging a book out of the Books app won't give you an actual epub; it'll give you a folder with the .epub extension. Unsurprisingly, this doesn't play well with ePub tooling.

To fix them, run the following command (pulled from here), which works for me:

# from inside the rogue directory
zip -X -r ../fixed.epub mimetype *
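
If you process many books, it can help to spot these folder-style epubs before parsing. The following is a minimal sketch using Node's built-in fs module; classifyEpubPath is a hypothetical helper and is not exported by this package.

```typescript
import { statSync } from "node:fs"

type EpubPathKind = "file" | "folder" | "missing"

// Hypothetical helper, not part of epub-wordcount.
// Reports whether a path is a regular file, a folder (a "fake" epub), or absent.
function classifyEpubPath(path: string): EpubPathKind {
  // throwIfNoEntry: false makes statSync return undefined for a missing path
  const stats = statSync(path, { throwIfNoEntry: false })
  if (stats === undefined) return "missing"
  return stats.isDirectory() ? "folder" : "file"
}
```

A result of "folder" means the path needs the zip command above before it can be treated as a real epub.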