
filesqueeze v1.0.1 (Published)
A file compression tool that uses Huffman coding
Downloads: 56,847

FileSqueeze - Huffman Compression Tool

This project implements a custom Huffman compression tool for text-oriented files such as .txt, .json, .docx, and .pdf. It uses Huffman coding to reduce file size by encoding each character according to its frequency of occurrence in the source data.

Features

  • Text Compression: Compresses text-based file formats by analyzing the frequency of characters and encoding them using the Huffman coding algorithm.
  • Binary Output: Generates compressed files in a binary format.
  • Metadata: Saves metadata related to the compression process, including the Huffman tree structure, to allow for decompression.
  • Compression Metrics: Reports on the compression ratio and the original and compressed file sizes.

Supported Formats

  • .txt
  • .json
  • .docx
  • .pdf (with consideration that only the text is compressed; embedded images are not compressed)

Note: The algorithm is designed for text-based formats. When handling PDFs containing images, only the text portion will be compressed.

Setup Instructions

Prerequisites

Before running the project, ensure you have the following dependencies installed:

  • Node.js (v16 or higher)
  • npm or yarn for managing packages

Installing

  1. Clone this repository:

    git clone https://github.com/HUMBLEF0OL/file-squeeze.git
  2. Navigate to the project directory:

    cd file-squeeze
  3. Install the required dependencies:

    npm install

Usage

Command-line Tool

  1. Compress a file:

    Use the filesqueeze command with the compress option to compress a file.

    filesqueeze compress <inputFile> [--output <outputDir>]
    • <inputFile>: The file to be compressed (e.g., sample.txt).
    • [--output <outputDir>]: The directory to store the compressed files (defaults to ./output).
  2. Decompress a file:

    To decompress a previously compressed file, use the decompress command.

    filesqueeze decompress <inputDir> [--output <outputDir>]
    • <inputDir>: The directory containing the compressed file (encoded.bin and metaData.bin).
    • [--output <outputDir>]: The directory to store the decompressed files (defaults to ./output).

Report Generation

The project generates a compression report for each file processed. The report includes:

  • Original File Size: Size of the file before compression.
  • Compressed File Size: Size of the file after compression.
  • Compression Ratio: The ratio of the original file size to the compressed file size.
  • Time Taken: Time spent to process and compress the file.

You can view the results in the console after the compression completes.
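The reported ratio is simply the original size divided by the compressed size. As a minimal sketch (the function name here is illustrative, not part of filesqueeze's API):

```javascript
// Illustrative helper: compute the compression ratio reported after a run.
// Values greater than 1 mean the output is smaller than the input.
function compressionRatio(originalBytes, compressedBytes) {
  return originalBytes / compressedBytes;
}
```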

Compression Algorithm Overview

1. Frequency Analysis

  • The algorithm starts by analyzing the frequency of each character in the input file.
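This step amounts to a single pass over the input, tallying character counts. A sketch of what it might look like (the function name is hypothetical, not filesqueeze's actual API):

```javascript
// Count how often each character appears in the input text.
// The resulting map drives the rest of the Huffman pipeline.
function buildFrequencyMap(text) {
  const freq = new Map();
  for (const ch of text) {
    freq.set(ch, (freq.get(ch) || 0) + 1);
  }
  return freq;
}
```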

2. Priority Queue

  • A priority queue (min-heap) is built using the frequency data. This queue ensures that the least frequent characters are processed first.

3. Huffman Tree Construction

  • The Huffman tree is built by combining nodes based on their frequencies. The two nodes with the least frequency are merged into a parent node, and this process is repeated until only one node (the root) remains.
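The merging loop described above can be sketched as follows. For brevity, a re-sorted array stands in for the min-heap, and the node shape is hypothetical rather than filesqueeze's internal representation:

```javascript
// Build a Huffman tree from a frequency map by repeatedly merging the two
// lowest-frequency nodes until a single root remains.
function buildHuffmanTree(freq) {
  let nodes = [...freq.entries()].map(
    ([ch, f]) => ({ ch, freq: f, left: null, right: null })
  );
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.freq - b.freq); // cheapest nodes first (stand-in for a min-heap)
    const left = nodes.shift();
    const right = nodes.shift();
    nodes.push({ ch: null, freq: left.freq + right.freq, left, right });
  }
  return nodes[0]; // the root
}
```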

4. Code Generation

  • Once the tree is built, a binary code is assigned to each character based on its position in the tree. More frequent characters sit closer to the root and therefore receive shorter codes, which is what makes Huffman coding an optimal prefix code for the measured frequencies.
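Code assignment is a walk over the tree, appending "0" for each left branch and "1" for each right branch. A sketch, assuming the same hypothetical node shape as above (ch is null for internal nodes):

```javascript
// Walk the tree and record the accumulated bit string at each leaf.
function generateCodes(node, prefix = "", codes = {}) {
  if (!node) return codes;
  if (node.ch !== null) {
    // Leaf: record its code. A single-symbol input still needs one bit.
    codes[node.ch] = prefix || "0";
    return codes;
  }
  generateCodes(node.left, prefix + "0", codes);
  generateCodes(node.right, prefix + "1", codes);
  return codes;
}
```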

5. Serialization

  • The Huffman tree is serialized and saved in binary format for use in decompression.
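One common way to serialize such a tree is a preorder walk that writes a marker plus the character for each leaf and a different marker for each internal node; the matching reader rebuilds the tree during decompression. This is only an illustration; filesqueeze's actual metaData.bin layout may differ:

```javascript
// Preorder serialization: "1" + character for a leaf, "0" for an internal node.
function serializeTree(node) {
  if (node.ch !== null) return "1" + node.ch; // leaf
  return "0" + serializeTree(node.left) + serializeTree(node.right);
}
```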

6. Compression and Saving

  • The input text is encoded using the generated Huffman codes. Both the compressed data and metadata (Huffman tree) are saved into files.
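Encoding then reduces to a table lookup per character. A sketch; a real implementation would pack these bits into bytes before writing encoded.bin:

```javascript
// Map each input character to its Huffman code and concatenate the bits.
function encode(text, codes) {
  let bits = "";
  for (const ch of text) bits += codes[ch];
  return bits;
}
```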

7. Decompression

  • The decompression process reads the serialized Huffman tree and decodes the compressed data back into its original form.
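Decoding walks the rebuilt tree bit by bit; reaching a leaf emits that character and restarts from the root. A sketch, using the same hypothetical node shape as the earlier steps:

```javascript
// Follow the tree for each bit ("0" = left, "1" = right); a leaf yields a character.
function decode(bits, root) {
  let out = "";
  let node = root;
  for (const bit of bits) {
    node = bit === "0" ? node.left : node.right;
    if (node.ch !== null) { // leaf reached
      out += node.ch;
      node = root;
    }
  }
  return out;
}
```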

Example

Sample Input (Text File)

hello world

Compressed Output (Encoded File)

  • The file will be compressed into a binary file (encoded.bin), and metadata will be saved in a separate file (metaData.bin).

Metrics Example

File 1: example.txt

  • Original File Size: 90 KB
  • Compressed File Size: 48 KB
  • Compression Ratio: 1.875 (original size / compressed size)

Contributing

If you'd like to contribute to this project, feel free to open a pull request. For bug reports or suggestions, please create an issue in the GitHub repository.

License

This project is licensed under the MIT License.

Acknowledgments

  • The core compression algorithm is based on the Huffman coding technique. You can read more about it here: Huffman coding - Wikipedia.
  • Special thanks to libraries like pdf-lib and pdf-parse for PDF text extraction and manipulation.