© 2024 – Pkg Stats / Ryan Hefner

logfile-binary-search

v0.0.16

efficiently search and extract date-ranged entries from large log files using binary search

Downloads

476

Readme

logfile-binary-search

Efficiently search and extract date-ranged entries from large log files using binary search, with a built-in Express server for easy API access.

How It Works

The LogfileBinarySearch class uses binary search to quickly locate and extract log entries within a specified date range, even in very large log files. Here's how it works:

  1. Minimal Assumptions: The content of the log entries can vary, as long as the entries appear in chronological order (binary search depends on it) and each chunk read from the file contains at least one timestamp.

  2. Configurable Timestamp Pattern: The timestamp pattern is defined by the dateRegex property, which can be customized to match various date formats. By default, it's set to match ISO 8601 timestamps, but you can modify it to work with any timestamp format.

  3. Binary Search: Instead of reading the entire file sequentially, the algorithm performs a binary search to quickly narrow down the relevant portion of the file.

  4. Chunked Reading: The file is read in chunks to minimize memory usage while maintaining performance.

  5. Date Extraction: The class extracts dates from log entries using the specified regular expression, allowing it to work with the timestamp format in your logs.

  6. Range Identification: Once the start and end positions are found, the class extracts all log entries within the specified date range.

  7. Built-in Server: The module includes an Express server that provides an API for accessing and searching the log data.
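To make steps 3 to 6 concrete, here is a minimal sketch of a binary search over byte offsets in a log file. It is not the package's source: the names `findStartOffset` and `firstTimestampAt` and the ISO 8601 UTC pattern are assumptions for illustration. Like the package, it assumes entries are in chronological order and that every chunk of `chunkSize` bytes contains at least one complete timestamp.

```javascript
const fs = require('fs');

// Assumed timestamp pattern for this sketch: ISO 8601 UTC, e.g. 2023-01-15T10:00:00Z.
const DATE_REGEX = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z/;

// Hypothetical helper: read up to `chunkSize` bytes at `offset` and return the
// first timestamp found there with its absolute byte offset, or null if none.
function firstTimestampAt(fd, offset, chunkSize) {
  const buf = Buffer.alloc(chunkSize);
  const bytesRead = fs.readSync(fd, buf, 0, chunkSize, offset);
  // latin1 maps each byte to one character, so match.index is a byte offset
  const match = DATE_REGEX.exec(buf.toString('latin1', 0, bytesRead));
  return match ? { date: new Date(match[0]), offset: offset + match.index } : null;
}

// Hypothetical sketch: binary search over byte offsets for the first entry at
// or after `target`. Returns that timestamp's offset, or the file size if none.
function findStartOffset(filePath, target, chunkSize = 2000) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const fileSize = fs.fstatSync(fd).size;
    let lo = 0;
    let hi = fileSize;
    // Find the smallest offset whose next timestamp is >= target. A chunk
    // with no timestamp (end of file) counts as "not earlier than target".
    while (lo < hi) {
      const mid = Math.floor((lo + hi) / 2);
      const hit = firstTimestampAt(fd, mid, chunkSize);
      if (hit !== null && hit.date < target) lo = mid + 1;
      else hi = mid;
    }
    const hit = firstTimestampAt(fd, lo, chunkSize);
    return hit === null ? fileSize : hit.offset;
  } finally {
    fs.closeSync(fd);
  }
}
```

Each probe reads only one chunk, so the number of reads grows with the logarithm of the file size. A similar search for the first timestamp later than `endDate` would give the end of the range; everything between the two offsets is what gets extracted.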

Installation

npm install logfile-binary-search express

[express is optional; install it only if you want to use the included server]

Features

  • Fast binary search for date ranges in large log files
  • Flexible: works with any log format as long as it contains timestamps
  • Configurable timestamp pattern via dateRegex
  • Supports both synchronous and asynchronous operations
  • Estimates row count and file statistics
  • Finds first and last dates in the log file
  • Configurable chunk size and maximum results
  • Built-in Express server for easy API access to log data
  • Efficient tail function for retrieving the most recent log entries

Usage

Basic Usage

const { LogfileBinarySearch } = require('logfile-binary-search');

const filePath = 'path/to/your/logfile.log';
const searcher = new LogfileBinarySearch(filePath);

// Date range shared by both examples below
const startDate = new Date('2023-01-01T00:00:00Z');
const endDate = new Date('2023-01-31T23:59:59Z');

// Asynchronous usage
async function searchLogs() {
  const results = await searcher.findDateRange(startDate, endDate);
  console.log(results);
}

searchLogs();

// Synchronous usage
const results = searcher.findDateRangeSync(startDate, endDate);
console.log(results);

// Known issue: both search dates must fall strictly inside the range covered
// by the log file, otherwise the search fails. In practice this means the
// first and last lines of the log file cannot be matched. Workaround: clamp
// the range to just inside the file's bounds (inside an async function):
//
// const { firstDate, lastDate } = await searcher.findFirstAndLastDates();
// const firstValidStartDate = new Date(firstDate.getTime() + 1); // 1 ms after the first entry
// const lastValidEndDate = new Date(lastDate.getTime() - 1);     // 1 ms before the last entry

// The built-in server (see below) applies this restriction automatically.
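The known-issue workaround in the comments above can be wrapped in a small helper. `clampToLogRange` is a hypothetical name, not part of the package: it only does the date arithmetic and assumes `firstDate` and `lastDate` come from `findFirstAndLastDates()`. It is not necessarily how the built-in server applies the restriction.

```javascript
// Hypothetical helper: clamp a requested range to strictly inside the file's
// [firstDate, lastDate] span (1 ms margin on each side), or return null if the
// requested range and the file's range do not overlap.
function clampToLogRange(startDate, endDate, firstDate, lastDate) {
  const minStart = firstDate.getTime() + 1; // 1 ms after the first entry
  const maxEnd = lastDate.getTime() - 1;    // 1 ms before the last entry
  const start = Math.max(startDate.getTime(), minStart);
  const end = Math.min(endDate.getTime(), maxEnd);
  return start <= end ? { startDate: new Date(start), endDate: new Date(end) } : null;
}
```

In an async function you might then call `const range = clampToLogRange(startDate, endDate, firstDate, lastDate);` and only run `searcher.findDateRange(range.startDate, range.endDate)` when `range` is not null.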

Using the Built-in Server

To start the built-in server:

const { startServer } = require('logfile-binary-search');

const filePath = './path/to/your/logfile.log';
const port = 3000; // optional, defaults to 3000
const maxResults = 9999; // optional
const chunkSize = 2000; // optional
const useCORS = false; // optional, defaults to false

startServer(filePath, port, maxResults, chunkSize, useCORS);

If you set useCORS to true, make sure to install the cors package first:

npm install cors

The cors package is not needed if CORS stays disabled.

This will start an Express server with the following endpoint:

  • GET /logs: Query the log file

Query parameters:

  • startDate: Start date for the search range (ISO 8601 format)
  • endDate: End date for the search range (ISO 8601 format)
  • doReset: If set to any value, it will reset the log date range (useful if the log file has been updated)
  • tail: Number of lines to return from the end of the file (positive integer)

If no startDate or endDate is provided, it returns metadata about the log file.
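When calling the endpoint from code rather than from a shell, letting `URLSearchParams` build the query string avoids quoting and encoding mistakes. This is a hypothetical client-side helper (`buildLogsUrl` is not part of the package), and it assumes the server accepts the millisecond-precision ISO 8601 strings produced by `Date.prototype.toISOString()`.

```javascript
// Hypothetical client helper: build a GET /logs URL from optional parameters.
// URLSearchParams percent-encodes values, e.g. ':' in timestamps becomes %3A.
function buildLogsUrl(baseUrl, { startDate, endDate, tail, doReset } = {}) {
  const url = new URL('/logs', baseUrl);
  if (startDate) url.searchParams.set('startDate', startDate.toISOString());
  if (endDate) url.searchParams.set('endDate', endDate.toISOString());
  if (tail !== undefined) url.searchParams.set('tail', String(tail));
  if (doReset) url.searchParams.set('doReset', 'true'); // any value triggers a reset
  return url.toString();
}
```

With Node.js 18 or later the result can be passed directly to the global `fetch`, for example `await fetch(buildLogsUrl('http://localhost:3000', { tail: 1000 }))`. The response format is not documented here, so inspect it before parsing.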

Example usage:

# Get log file metadata
curl "http://localhost:3000/logs"

# Search for logs in a date range (the quotes stop the shell from interpreting & and ?)
curl "http://localhost:3000/logs?startDate=2023-01-01T00:00:00Z&endDate=2023-01-31T23:59:59Z"

# Reset log date range
curl "http://localhost:3000/logs?doReset=true"

# Get the last 1000 lines of the log file
curl "http://localhost:3000/logs?tail=1000"

The tail parameter is a quick way to retrieve the most recent log entries without specifying a date range, which is handy for real-time monitoring or debugging. Because the tail function reads backward from the end of the file, it stays fast even for very large log files.
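Reading from the end can be sketched as follows. This is an illustrative implementation, not the package's tail function: `tailLines` and `blockSize` are hypothetical names, and lines are assumed to end with `\n`.

```javascript
const fs = require('fs');

// Hypothetical sketch: return the last `n` lines of a file by reading
// fixed-size blocks backward from the end instead of scanning from the start.
function tailLines(filePath, n, blockSize = 4096) {
  if (n <= 0) return [];
  const fd = fs.openSync(filePath, 'r');
  try {
    let position = fs.fstatSync(fd).size;
    const blocks = [];
    let newlines = 0;
    // n + 1 newlines guarantee the earliest of the n lines is complete
    // (a newline at the very end of the file counts as one of them).
    while (position > 0 && newlines <= n) {
      const size = Math.min(blockSize, position);
      position -= size;
      const buf = Buffer.alloc(size);
      fs.readSync(fd, buf, 0, size, position);
      blocks.unshift(buf); // keep blocks in file order
      for (const byte of buf) if (byte === 0x0a) newlines++;
    }
    const lines = Buffer.concat(blocks).toString('utf8').split('\n');
    if (lines[lines.length - 1] === '') lines.pop(); // empty piece after a trailing newline
    return lines.slice(-n);
  } finally {
    fs.closeSync(fd);
  }
}
```

The cost depends on how many bytes the last `n` lines occupy, not on the total file size. Blocks are joined before decoding, so a multi-byte UTF-8 character split across a block boundary is still decoded correctly.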

Server Function

[requires express to be installed]

startServer(filePath, port = 3000, maxResults = 9999, chunkSize = 2000, useCORS = false)
  • filePath: Path to the log file
  • port: Port number for the server (default: 3000)
  • maxResults: Maximum number of results returned (default: 9999)
  • chunkSize: Size in bytes of the chunks read while searching the log file (default: 2000)
  • useCORS: Enable or disable CORS (default: false)

Starts the built-in Express server to serve the log file data. If useCORS is set to true, CORS will be enabled for all routes. Remember to install the cors package if you enable this feature.

API

Constructor

new LogfileBinarySearch(filePath, maxResults = 9999, chunkSize = 2000)
  • filePath: Path to the log file
  • maxResults: Maximum number of results to return (default: 9999)
  • chunkSize: Size of chunks to read from the file (default: 2000 bytes)

Properties

  • dateRegex: Regular expression used to match timestamps in log entries. Can be customized to match different timestamp formats.
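Before customizing dateRegex, it helps to check a candidate pattern against a few real lines. The sketch below assumes (this README does not say) that each match is converted with `new Date(...)`, so a pattern is only useful if its matches parse to valid dates. `isoWithOffset` and `checkPattern` are hypothetical names.

```javascript
// Hypothetical pattern for ISO 8601 timestamps with optional milliseconds and
// a Z or explicit UTC offset, e.g. "2023-01-15T10:00:00.123+02:00".
const isoWithOffset =
  /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{1,3})?(?:Z|[+-]\d{2}:\d{2})/;

// For each sample line, return the first match normalized to UTC ISO format,
// or null if the line has no match or the match is not a valid date.
function checkPattern(regex, lines) {
  return lines.map((line) => {
    const match = regex.exec(line);
    if (!match) return null;
    const date = new Date(match[0]);
    return Number.isNaN(date.getTime()) ? null : date.toISOString();
  });
}
```

If the pattern checks out, assign it on an instance before searching, e.g. `searcher.dateRegex = isoWithOffset;` (assuming direct assignment is how the property is meant to be customized).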

Methods

Asynchronous Methods

  • estimateRowCountAsync(nChunksToSample = 100): Estimates the number of rows in the file
  • findFirstAndLastDates(): Finds the first and last dates in the log file
  • findDateRange(startDate, endDate): Searches for log entries within the specified date range

Synchronous Methods

  • estimateRowCountSync(nChunksToSample = 100): Synchronous version of estimateRowCountAsync
  • findFirstAndLastDatesSync(): Synchronous version of findFirstAndLastDates
  • findDateRangeSync(startDate, endDate): Synchronous version of findDateRange
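One plausible way to estimate a row count without reading the whole file is to sample chunks spread across it, measure the average line length, and divide the file size by it. This is a hypothetical sketch of that idea: `estimateRowCount` is not the package's method, and the package's sampling strategy may differ.

```javascript
const fs = require('fs');

// Hypothetical sketch: estimate the number of lines by sampling evenly spaced
// chunks, computing average bytes per line, and scaling by the file size.
function estimateRowCount(filePath, nChunksToSample = 100, chunkSize = 2000) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const fileSize = fs.fstatSync(fd).size;
    if (fileSize === 0) return 0;
    const buf = Buffer.alloc(chunkSize);
    let bytesSampled = 0;
    let newlines = 0;
    for (let i = 0; i < nChunksToSample; i++) {
      // Spread sample start offsets evenly across the file
      const offset = Math.floor((i * fileSize) / nChunksToSample);
      const bytesRead = fs.readSync(fd, buf, 0, chunkSize, offset);
      bytesSampled += bytesRead;
      for (let j = 0; j < bytesRead; j++) if (buf[j] === 0x0a) newlines++;
    }
    if (newlines === 0) return 1; // no line breaks seen: treat as a single line
    return Math.round(fileSize / (bytesSampled / newlines));
  } finally {
    fs.closeSync(fd);
  }
}
```

Because sample windows may overlap or be cut short near the end of the file, the result is only an estimate; it is most accurate when line lengths are fairly uniform.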


License

MIT