offmute
An experiment in meeting transcription and diarization with just an LLM.
npx offmute 🎙️
Intelligent meeting transcription and analysis using Google's Gemini models
Features • Quick Start • Installation • Usage • Advanced • How It Works
🚀 Features
- 🎯 Transcription & Diarization: Convert audio/video content to text while identifying different speakers
- 🎭 Smart Speaker Identification: Attempts to identify speakers by name and role when possible
- 📊 Meeting Reports: Generates structured reports with key points, action items, and participant profiles
- 🎬 Video Analysis: Extracts and analyzes visual information from video meetings, including recognizing when demos are being displayed
- ⚡ Multiple Processing Tiers: From budget-friendly to premium processing options
- 🔄 Robust Processing: Handles long meetings with automatic chunking and proper cleanup
- 📁 Flexible Output: Markdown-formatted transcripts and reports with optional intermediate outputs
🏃 Quick Start
# Set your Gemini API key
export GEMINI_API_KEY=your_key_here
# Run on a meeting recording
npx offmute path/to/your/meeting.mp4
📦 Installation
As a CLI Tool
npx offmute <Meeting_Location> <options>
As a Package
npm install offmute
Get Help
npx offmute --help
Tip: `bunx offmute` (or `bun`) works faster if you have it installed!
💻 Usage
Command Line Interface
npx offmute <input-file> [options]
Options:
- `-t, --tier <tier>`: Processing tier (first, business, economy, budget) [default: "business"]
- `-a, --all`: Save all intermediate outputs
- `-sc, --screenshot-count <number>`: Number of screenshots to extract [default: 4]
- `-ac, --audio-chunk-minutes <number>`: Length of audio chunks in minutes [default: 10]
- `-r, --report`: Generate a structured meeting report
- `-rd, --reports-dir <path>`: Custom directory for report output
Processing Tiers
- First Tier (`first`): Pro models for all operations
- Business Tier (`business`): Pro for description, Flash for transcription
- Economy Tier (`economy`): Flash models for all operations
- Budget Tier (`budget`): Flash for description, 8B for transcription
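The tier-to-model mapping above can be sketched as a lookup table. This is an illustrative assumption about how the tiers resolve to Gemini model IDs, not offmute's actual internals; the type and function names are hypothetical.

```typescript
type Tier = "first" | "business" | "economy" | "budget";

interface TierConfig {
  descriptionModel: string;
  transcriptionModel: string;
}

// Hypothetical mapping inferred from the tier descriptions above.
const TIER_CONFIGS: Record<Tier, TierConfig> = {
  first: { descriptionModel: "gemini-1.5-pro", transcriptionModel: "gemini-1.5-pro" },
  business: { descriptionModel: "gemini-1.5-pro", transcriptionModel: "gemini-1.5-flash" },
  economy: { descriptionModel: "gemini-1.5-flash", transcriptionModel: "gemini-1.5-flash" },
  budget: { descriptionModel: "gemini-1.5-flash", transcriptionModel: "gemini-1.5-flash-8b" },
};

function resolveTier(tier: Tier): TierConfig {
  return TIER_CONFIGS[tier];
}
```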
As a Module
import {
generateDescription,
generateTranscription,
generateReport,
} from "offmute";
// Generate description and transcription
const description = await generateDescription(inputFile, {
screenshotModel: "gemini-1.5-pro",
audioModel: "gemini-1.5-pro",
mergeModel: "gemini-1.5-pro",
showProgress: true,
});
const transcription = await generateTranscription(inputFile, description, {
transcriptionModel: "gemini-1.5-pro",
showProgress: true,
});
// Generate a structured report
const report = await generateReport(
description.finalDescription,
transcription.chunkTranscriptions.join("\n\n"),
{
model: "gemini-1.5-pro",
reportName: "meeting_summary",
showProgress: true,
}
);
🔧 Advanced Usage
Intermediate Outputs
When run with the `-a` flag, offmute saves intermediate processing files:
input_file_intermediates/
├── screenshots/ # Video screenshots
├── audio/ # Processed audio chunks
├── transcription/ # Per-chunk transcriptions
└── report/ # Report generation data
Custom Chunk Sizes
Adjust processing for different content types:
# Longer chunks for presentations
offmute presentation.mp4 -ac 20
# More screenshots for visual-heavy content
offmute workshop.mp4 -sc 8
⚙️ How It Works
offmute uses a multi-stage pipeline:
Content Analysis
- Extracts screenshots from videos at key moments
- Chunks audio into processable segments
- Generates initial descriptions of visual and audio content
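The chunking step above can be sketched as splitting the recording's duration into fixed-length windows. This is a minimal sketch assuming a simple fixed-size split; offmute's actual chunking logic (and any overlap handling) may differ.

```typescript
interface Chunk {
  start: number; // seconds
  end: number;   // seconds
}

// Plan chunk boundaries for a recording of `durationSec` seconds,
// using `chunkMinutes`-long windows (the CLI default is 10).
function planChunks(durationSec: number, chunkMinutes = 10): Chunk[] {
  const size = chunkMinutes * 60;
  const chunks: Chunk[] = [];
  for (let start = 0; start < durationSec; start += size) {
    chunks.push({ start, end: Math.min(start + size, durationSec) });
  }
  return chunks;
}
```

For example, a 25-minute recording with the default 10-minute chunks yields three segments, the last one truncated to the recording's end.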
Transcription & Diarization
- Processes audio chunks with context awareness
- Identifies and labels speakers
- Maintains conversation flow across chunks
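One way to maintain context across chunks, sketched below, is to carry the known speaker labels and the tail of the previous transcript into each chunk's prompt. The function and field names here are assumptions for illustration, not offmute's actual API.

```typescript
interface ChunkResult {
  transcript: string;
  speakers: string[];
}

// Build a transcription prompt for one chunk, threading in context
// from the previous chunk so speaker labels stay consistent.
function buildChunkPrompt(chunkIndex: number, previous?: ChunkResult): string {
  const lines = [`Transcribe audio chunk ${chunkIndex}, labeling each speaker.`];
  if (previous) {
    lines.push(`Known speakers so far: ${previous.speakers.join(", ")}.`);
    lines.push(`The previous chunk ended with: "${previous.transcript.slice(-200)}"`);
  }
  return lines.join("\n");
}
```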
Report Generation (Spreadfill)
- Uses a unique "Spreadfill" technique:
  - Generates report structure with section headings
  - Fills each section independently using full context
  - Ensures coherent narrative while maintaining detailed coverage
Spreadfill Technique
The Spreadfill approach helps maintain consistency while allowing detailed analysis:
// 1. Generate structure
const structure = await generateHeadings(description, transcript);
// 2. Fill sections independently
const sections = await Promise.all(
structure.sections.map((section) => generateSection(section, fullContext))
);
// 3. Combine into coherent report
const report = combineResults(sections);
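The three steps above can be run end to end as a self-contained toy, with a stubbed model call standing in for Gemini. Everything here (`fakeModel`, the heading list, the join-based combine) is illustrative, not offmute's real implementation.

```typescript
// Stand-in for an LLM call; the real pipeline calls Gemini here.
async function fakeModel(prompt: string): Promise<string> {
  return `Generated content for: ${prompt}`;
}

// Spreadfill: fill every section independently, but hand each
// one the full context, then combine into a single report.
async function spreadfill(headings: string[], context: string): Promise<string> {
  const sections = await Promise.all(
    headings.map(async (h) => `## ${h}\n${await fakeModel(`${h} | ${context}`)}`)
  );
  return sections.join("\n\n");
}
```

Because each section is filled in its own independent call, sections can be generated in parallel, and a single long response never has to stay coherent across the whole report.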
🛠️ Requirements
- Node.js 14 or later
- ffmpeg installed on your system
- Google Gemini API key
Contributing
You can start in TODOs.md to help with things I'm thinking about, or you can steel yourself and check out PROBLEMS.md.
Created by Hrishi Olickel • Support offmute by starring our GitHub repository