logfile-binary-search
v0.0.16
Efficiently search and extract date-ranged entries from large log files using binary search, with a built-in Express server for easy API access.
How It Works
The LogfileBinarySearch class uses a binary search algorithm to quickly locate and extract log entries within a specified date range, even in very large log files. Here's how it works:
Minimal Assumptions: The content of the log entries can vary, as long as each 'chunk' retrieved contains at least one timestamp.
Configurable Timestamp Pattern: The timestamp pattern is defined by the dateRegex property, which can be customized to match various date formats. By default it matches ISO 8601 timestamps, but you can modify it to work with any timestamp format.
Binary Search: Instead of reading the entire file sequentially, the algorithm performs a binary search to quickly narrow down the relevant portion of the file.
Chunked Reading: The file is read in chunks to minimize memory usage while maintaining performance.
Date Extraction: The class extracts dates from log entries using the specified regular expression, allowing it to work with the timestamp format in your logs.
Range Identification: Once the start and end positions are found, the class extracts all log entries within the specified date range.
Built-in Server: The module includes an Express server that provides an API for accessing and searching the log data.
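The binary search and chunked reading steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the names `ISO_REGEX`, `firstDateInChunk` and `findOffsetForDate` are hypothetical, and the sketch assumes chronologically ordered entries with ISO 8601 timestamps.

```javascript
const fs = require('fs');

// Assumed pattern for this sketch: ISO 8601 timestamps such as 2023-01-15T10:30:00Z.
const ISO_REGEX = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z/;

// Read up to `chunkSize` bytes at `offset` and return the first timestamp found, or null.
function firstDateInChunk(fd, offset, chunkSize) {
  const buf = Buffer.alloc(chunkSize);
  const bytesRead = fs.readSync(fd, buf, 0, chunkSize, offset);
  const match = ISO_REGEX.exec(buf.toString('utf8', 0, bytesRead));
  return match ? new Date(match[0]) : null;
}

// Hypothetical helper: return a byte offset located before the first entry
// dated >= target. A caller would scan forward from this offset.
function findOffsetForDate(filePath, target, chunkSize = 2000) {
  const fd = fs.openSync(filePath, 'r');
  try {
    let low = 0;
    let high = fs.fstatSync(fd).size;
    // Halve the byte range until it is no larger than one chunk.
    while (high - low > chunkSize) {
      const mid = Math.floor((low + high) / 2);
      const date = firstDateInChunk(fd, mid, chunkSize);
      if (date !== null && date < target) {
        low = mid; // entries at mid are still too early
      } else {
        high = mid; // entries at mid are at or past the target (or unknown)
      }
    }
    return low;
  } finally {
    fs.closeSync(fd);
  }
}
```

Each iteration reads a single chunk, so the number of reads grows with the logarithm of the file size rather than with the file size itself.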
Installation
npm install logfile-binary-search express
[express is optional; install it only if you want to use the included server]
Features
- Fast binary search for date ranges in large log files
- Flexible: works with any log format as long as it contains timestamps
- Configurable timestamp pattern via dateRegex
- Supports both synchronous and asynchronous operations
- Estimates row count and file statistics
- Finds first and last dates in the log file
- Configurable chunk size and maximum results
- Built-in Express server for easy API access to log data
- Efficient tail function for retrieving the most recent log entries
Usage
Basic Usage
const { LogfileBinarySearch } = require('logfile-binary-search');
const filePath = 'path/to/your/logfile.log';
const searcher = new LogfileBinarySearch(filePath);
// Date range shared by both examples below
const startDate = new Date('2023-01-01T00:00:00Z');
const endDate = new Date('2023-01-31T23:59:59Z');
// Asynchronous usage
async function searchLogs() {
  const results = await searcher.findDateRange(startDate, endDate);
  console.log(results);
}
searchLogs();
// Synchronous usage
const results = searcher.findDateRangeSync(startDate, endDate);
console.log(results);
// **Known issue**: the dates in the search term must be within the range
// covered by the logfile, otherwise the search will fail. This means we
// have to sacrifice the first and last lines in the log file:
//
// const { firstDate, lastDate } = await searcher.findFirstAndLastDates();
// const firstValidStartDate = new Date(firstDate.getTime() + 1); // 1 millisecond after
// const lastValidEndDate = new Date(lastDate.getTime() - 1); // 1 millisecond before
//
// The built-in server [see below] automatically applies this restriction.
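One way to apply the workaround described in the known-issue note is to clamp the requested range before searching. The helper below is a hypothetical sketch (`clampToLogRange` is not part of the package); it assumes `firstDate` and `lastDate` are `Date` objects such as those returned by `findFirstAndLastDates()`.

```javascript
// Hypothetical helper: clamp a requested range so that it lies strictly
// inside the first and last timestamps of the log file.
// Returns null when the requested range does not overlap the log at all.
function clampToLogRange(startDate, endDate, firstDate, lastDate) {
  const minStart = firstDate.getTime() + 1; // 1 millisecond after the first entry
  const maxEnd = lastDate.getTime() - 1; // 1 millisecond before the last entry
  const start = new Date(Math.max(startDate.getTime(), minStart));
  const end = new Date(Math.min(endDate.getTime(), maxEnd));
  return start <= end ? { start, end } : null;
}

// Possible usage with the library (not executed here):
// const { firstDate, lastDate } = await searcher.findFirstAndLastDates();
// const range = clampToLogRange(startDate, endDate, firstDate, lastDate);
// if (range) console.log(await searcher.findDateRange(range.start, range.end));
```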
Using the Built-in Server
To start the built-in server:
const { startServer } = require('logfile-binary-search');
const filePath = './path/to/your/logfile.log';
const port = 3000; // optional, defaults to 3000
const maxResults = 9999; // optional
const chunkSize = 2000; // optional
const useCORS = false; // optional, defaults to false
startServer(filePath, port, maxResults, chunkSize, useCORS);
If you set useCORS to true, make sure to install the cors package first:
npm install cors
The cors package is not needed when useCORS is left at false.
This will start an Express server with the following endpoint:
GET /logs: Query the log file
Query parameters:
- startDate: Start date for the search range (ISO 8601 format)
- endDate: End date for the search range (ISO 8601 format)
- doReset: If set to any value, resets the log date range (useful if the log file has been updated)
- tail: Number of lines to return from the end of the file (positive integer)
If no startDate or endDate is provided, the endpoint returns metadata about the log file.
Example usage:
# Get log file metadata
curl "http://localhost:3000/logs"
# Search for logs in a date range (quote the URL so the shell does not interpret & or ?)
curl "http://localhost:3000/logs?startDate=2023-01-01T00:00:00Z&endDate=2023-01-31T23:59:59Z"
# Reset log date range
curl "http://localhost:3000/logs?doReset=true"
# Get the last 1000 lines of the log file
curl "http://localhost:3000/logs?tail=1000"
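When calling the endpoint from JavaScript rather than curl, the query string can be built with the standard `URL` API so that the parameters are encoded correctly. The helper name `buildLogsUrl` is hypothetical, and the base URL assumes the server runs locally on the default port.

```javascript
// Hypothetical helper: build a /logs URL with properly encoded query parameters.
// Date values are serialized as ISO 8601 strings; undefined/null values are skipped.
function buildLogsUrl(baseUrl, params = {}) {
  const url = new URL('/logs', baseUrl);
  for (const [key, value] of Object.entries(params)) {
    if (value === undefined || value === null) continue;
    const text = value instanceof Date ? value.toISOString() : String(value);
    url.searchParams.set(key, text);
  }
  return url.toString();
}

// Possible usage (Node 18+ ships a global fetch; not executed here):
// const url = buildLogsUrl('http://localhost:3000', { tail: 1000 });
// const response = await fetch(url);
```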
The tail parameter provides a quick way to retrieve the most recent log entries without needing to specify a date range. This can be particularly useful for real-time monitoring or debugging. The tail function is implemented efficiently, reading the file from the end, which makes it fast even for very large log files.
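Reading a file from the end can be sketched as below. This illustrates the general technique only and is not the package's own code: `tailLines` is a hypothetical name, and the sketch assumes newline-terminated entries.

```javascript
const fs = require('fs');

// Hypothetical helper: return the last `n` lines of a file by reading
// fixed-size chunks backwards from the end of the file.
function tailLines(filePath, n, chunkSize = 2000) {
  if (n <= 0) return [];
  const fd = fs.openSync(filePath, 'r');
  try {
    let position = fs.fstatSync(fd).size;
    const chunks = [];
    let newlines = 0;
    // Keep reading until more than n newlines are buffered, which guarantees
    // that the last n lines are complete even with a trailing newline.
    while (position > 0 && newlines <= n) {
      const length = Math.min(chunkSize, position);
      position -= length;
      const buf = Buffer.alloc(length);
      fs.readSync(fd, buf, 0, length, position);
      for (const byte of buf) {
        if (byte === 0x0a) newlines++;
      }
      chunks.unshift(buf); // earlier chunks go in front
    }
    // Decode once after concatenating so multi-byte characters are not split.
    const lines = Buffer.concat(chunks).toString('utf8').split('\n');
    if (lines[lines.length - 1] === '') lines.pop(); // empty piece after a trailing newline
    return lines.slice(-n);
  } finally {
    fs.closeSync(fd);
  }
}
```

Only the final portion of the file is read, so the cost depends on the number of lines requested rather than on the total file size.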
Server Function
startServer(filePath, port = 3000, maxResults = 9999, chunkSize = 2000, useCORS = false)
- filePath: Path to the log file
- port: Port number for the server (default: 3000)
- maxResults: Limits results returned (default: 9999)
- chunkSize: Chunk size used while searching logfile (default: 2000)
- useCORS: Enable or disable CORS (default: false)
Starts the built-in Express server to serve the log file data. If useCORS is set to true, CORS will be enabled for all routes. Remember to install the cors package if you enable this feature.
API
Constructor
new LogfileBinarySearch(filePath, maxResults = 9999, chunkSize = 2000)
- filePath: Path to the log file
- maxResults: Maximum number of results to return (default: 9999)
- chunkSize: Size of chunks to read from the file (default: 2000 bytes)
Properties
- dateRegex: Regular expression used to match timestamps in log entries. Can be customized to match different timestamp formats.
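As an illustration, a pattern for logs that use a space instead of "T" between date and time might look like the sketch below. The pattern and sample line are made up for this example. How the library turns the matched text into a date, and whether it expects particular regex flags, is not stated here, so treat the commented assignment as an assumption to check against the source.

```javascript
// Hypothetical pattern for timestamps such as "2023-01-15 10:30:00".
const spaceSeparatedRegex = /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/;

// Quick check of the pattern against a made-up sample line.
const sampleLine = '2023-01-15 10:30:00 [INFO] service started';
const sampleMatch = sampleLine.match(spaceSeparatedRegex);
console.log(sampleMatch ? sampleMatch[0] : 'no timestamp found');

// With the library, the pattern would be assigned to the documented property:
// const searcher = new LogfileBinarySearch(filePath);
// searcher.dateRegex = spaceSeparatedRegex;
```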
Methods
Asynchronous Methods
- estimateRowCountAsync(nChunksToSample = 100): Estimates the number of rows in the file
- findFirstAndLastDates(): Finds the first and last dates in the log file
- findDateRange(startDate, endDate): Searches for log entries within the specified date range
Synchronous Methods
- estimateRowCountSync(nChunksToSample = 100): Synchronous version of estimateRowCountAsync
- findFirstAndLastDatesSync(): Synchronous version of findFirstAndLastDates
- findDateRangeSync(startDate, endDate): Synchronous version of findDateRange
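The nChunksToSample parameter suggests that the row count is extrapolated from a sample of chunks rather than from a full scan. One plausible way to do that is sketched below. This is an assumption about the approach, not the package's actual code, and `estimateRowCount` is a hypothetical name.

```javascript
const fs = require('fs');

// Hypothetical helper: estimate the number of rows by counting newlines in
// evenly spaced sample chunks and scaling the density to the full file size.
function estimateRowCount(filePath, nChunksToSample = 100, chunkSize = 2000) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const size = fs.fstatSync(fd).size;
    if (size === 0) return 0;
    const buf = Buffer.alloc(chunkSize);
    let bytesSampled = 0;
    let newlines = 0;
    for (let i = 0; i < nChunksToSample; i++) {
      // Spread the sample offsets evenly across the file.
      const offset = Math.floor((size * i) / nChunksToSample);
      const bytesRead = fs.readSync(fd, buf, 0, chunkSize, offset);
      bytesSampled += bytesRead;
      for (let j = 0; j < bytesRead; j++) {
        if (buf[j] === 0x0a) newlines++;
      }
    }
    // newlines per sampled byte, scaled up to the whole file
    return Math.round((newlines / bytesSampled) * size);
  } finally {
    fs.closeSync(fd);
  }
}
```

The estimate is only as good as the sample: files whose line lengths vary strongly between regions will produce a rougher figure.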
Server Function
[requires express to be installed]
startServer(filePath, port = 3000, maxResults = 9999, chunkSize = 2000, useCORS = false)
- filePath: Path to the log file
- port: Port number for the server (default: 3000)
- maxResults: Limits results returned (default: 9999)
- chunkSize: Chunk size used while searching logfile (default: 2000)
- useCORS: Enable or disable CORS (default: false)
Starts the built-in Express server to serve the log file data.
License
MIT