saw-the-table
v0.0.5
Extract tables from spreadsheets
saw-the-table 🪚 👁️
First it saw the table, then it sawed the tables! A tool that first "sees" tables in your spreadsheets, then "cuts" them out into separate CSVs.
What it does
Given any spreadsheet:
- 👁️ SEES tables using density and layout analysis
- 🪚 SAWS them into clean, separate CSV files
No more manual copying and pasting of table regions!
Algorithm
The tool runs a density-based scan to detect regions of data: cells are treated as connected when they are close enough to each other and the resulting region is dense enough. Table boundaries come from analyzing gaps and density patterns in the data, much as a person identifies tables by eye.
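To make the idea concrete, here is a minimal sketch of one way a density-based scan could work. It is not the package's actual implementation: the `Grid` and `Region` types, the `detectRegions` function, and the `maxGap` and `minDensity` parameters are all hypothetical names chosen for illustration. The sketch flood-fills non-empty cells that lie within `maxGap` empty cells of each other, then keeps a region only if enough of its bounding box is filled.

```typescript
// Illustrative sketch only; names and parameters are assumptions,
// not part of the saw-the-table API.
type Grid = (string | null)[][];

interface Region {
  top: number;
  left: number;
  bottom: number;
  right: number;
  filled: number; // number of non-empty cells in the region
}

const isFilled = (grid: Grid, r: number, c: number): boolean => {
  const v = grid[r]?.[c];
  return v != null && v.trim() !== '';
};

function detectRegions(grid: Grid, maxGap = 0, minDensity = 0.5): Region[] {
  const rows = grid.length;
  const cols = Math.max(0, ...grid.map((row) => row.length));
  const reach = maxGap + 1; // how far apart two cells may be and still connect
  const seen = new Set<number>();
  const regions: Region[] = [];

  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      if (!isFilled(grid, r, c) || seen.has(r * cols + c)) continue;

      // Flood fill from this cell, growing the bounding box as we go.
      const region: Region = { top: r, left: c, bottom: r, right: c, filled: 0 };
      const stack: [number, number][] = [[r, c]];
      seen.add(r * cols + c);

      while (stack.length > 0) {
        const [cr, cc] = stack.pop()!;
        region.filled++;
        region.top = Math.min(region.top, cr);
        region.bottom = Math.max(region.bottom, cr);
        region.left = Math.min(region.left, cc);
        region.right = Math.max(region.right, cc);

        for (let dr = -reach; dr <= reach; dr++) {
          for (let dc = -reach; dc <= reach; dc++) {
            const nr = cr + dr;
            const nc = cc + dc;
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            const key = nr * cols + nc;
            if (seen.has(key) || !isFilled(grid, nr, nc)) continue;
            seen.add(key);
            stack.push([nr, nc]);
          }
        }
      }

      // Keep the region only if its bounding box is dense enough.
      const area =
        (region.bottom - region.top + 1) * (region.right - region.left + 1);
      if (region.filled / area >= minDensity) regions.push(region);
    }
  }
  return regions;
}

// Two 2x2 blocks separated by an empty row and an empty column.
const sample: Grid = [
  ['a', 'b', null, null, null],
  ['1', '2', null, null, null],
  [null, null, null, null, null],
  [null, null, null, 'x', 'y'],
  [null, null, null, '3', '4'],
];

console.log(detectRegions(sample, 0, 0.5).length); // 2
```

With `maxGap = 0` the two blocks stay separate. Raising `maxGap` to 1 would merge them into one sparse region, which the density check then rejects, so the two thresholds have to be tuned together.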
Features
- Automatic Table Detection: Finds tables in spreadsheets using density and layout analysis
- Multiple File Types: Supports .xlsx, .xls, .csv, and .tsv files
- Batch Processing: Process single files or entire directories
- Multi-sheet Support: Handles multiple sheets in Excel files
- Clean Output: Each detected table is saved as a separate CSV file
- Developer Friendly: Can be used as a CLI tool or as a library in your code
Installation
# Using npm
npm install saw-the-table
# Using yarn
yarn add saw-the-table
# Using bun
bun add saw-the-table
Usage
As a CLI Tool
# Process a single file
saw-the-table -i ./path/to/spreadsheet.xlsx -o ./output/dir
# Process a directory (recursively finds all compatible files)
saw-the-table -i ./path/to/spreadsheets -o ./output/dir
# With verbose output
saw-the-table -i ./input -o ./output -v
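For readers curious how directory input might be resolved, the sketch below shows one plausible way to collect compatible files recursively. It is an assumption about the behavior, not code from the package: `findSpreadsheets` and `isSupportedFile` are hypothetical names, and the case-insensitive extension match is a guess.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Extensions listed under Features. Matching them case-insensitively
// is an assumption made for this sketch.
const SUPPORTED_EXTENSIONS = new Set(['.xlsx', '.xls', '.csv', '.tsv']);

// Hypothetical helper: true if the file extension is one the tool supports.
function isSupportedFile(filePath: string): boolean {
  return SUPPORTED_EXTENSIONS.has(path.extname(filePath).toLowerCase());
}

// Hypothetical helper: accepts a file or a directory and returns every
// supported file found, descending into subdirectories.
function findSpreadsheets(inputPath: string): string[] {
  if (fs.statSync(inputPath).isFile()) {
    return isSupportedFile(inputPath) ? [inputPath] : [];
  }
  const found: string[] = [];
  for (const entry of fs.readdirSync(inputPath, { withFileTypes: true })) {
    const full = path.join(inputPath, entry.name);
    if (entry.isDirectory()) {
      found.push(...findSpreadsheets(full));
    } else if (entry.isFile() && isSupportedFile(full)) {
      found.push(full);
    }
  }
  return found;
}
```

Handling the single-file case inside the same function means the `-i` argument can be treated uniformly whether it points at a file or a directory.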
As a Library
import * as fs from 'node:fs/promises';
import { extractTablesFromXLSX, extractTablesFromCSV } from 'saw-the-table';
// For Excel files: pass the raw file contents as a Buffer
const buffer = await fs.readFile('spreadsheet.xlsx');
const xlsxTables = await extractTablesFromXLSX(buffer);
// For CSV/TSV files: pass the file contents as a UTF-8 string
const content = await fs.readFile('data.csv', 'utf-8');
const csvTables = await extractTablesFromCSV(content);
Output Format
For each input file, you get one CSV file per detected table, named using the pattern:
{original_name}-{sheet_name}-table{n}.csv
For CSV and TSV inputs, which have no sheets, the sheet name segment is omitted, as the example shows.
Example:
input/
report.xlsx (containing 2 tables in Sheet1)
data.csv (containing 1 table)
output/
report-Sheet1-table1.csv
report-Sheet1-table2.csv
data-table1.csv
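The naming pattern can be expressed as a small function. This is a sketch inferred from the example above, not the package's own code; `tableFileName` is a hypothetical helper, and dropping the sheet segment when no sheet name is given is an assumption based on the `data-table1.csv` entry.

```typescript
// Hypothetical helper that mirrors the documented naming pattern.
// tableIndex is 1-based, matching "table1", "table2" in the example.
function tableFileName(
  inputPath: string,
  tableIndex: number,
  sheetName?: string,
): string {
  // Strip any directory part, then the final extension.
  const file = inputPath.split(/[\\/]/).pop() ?? inputPath;
  const dot = file.lastIndexOf('.');
  const base = dot > 0 ? file.slice(0, dot) : file;

  // CSV/TSV inputs have no sheet, so that segment is left out.
  const parts = sheetName ? [base, sheetName] : [base];
  return `${parts.join('-')}-table${tableIndex}.csv`;
}

console.log(tableFileName('input/report.xlsx', 2, 'Sheet1')); // report-Sheet1-table2.csv
console.log(tableFileName('input/data.csv', 1)); // data-table1.csv
```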
Development
# Install dependencies
bun install
# Run in development mode (with watch)
bun run dev
# Run the CLI directly from source (for testing)
bun run dev:run -- -i ./input -o ./output -v
# Build
bun run build
Future Roadmap
- Implement streaming support for files
- Make algorithm parameters configurable when used as SDK (density thresholds, gap rules, etc.)