sanechain
v0.0.5
Published
Extended langchain basically, with some added sanity.
Sane Chain
An attempt to make langchainjs easier to work with
WIP - ~~nothing works yet, just saving the name~~ Some things work, just um - not tested, no warranties :1st_place_medal:
Adds the following utility classes and loaders:
Utility Classes
DocumentLoader
DocumentLoader packages up all of the langchainjs loaders (plus the ones sanechain adds) behind a single class, so you can hand it a mixed list of files, directories, and URLs and it loads them all regardless of type.
Example:
// Import path assumed to match the GithubRepoLoader import shown further down.
import { DocumentLoader } from 'sanechain'

const filesAndDirectories = [
  'path/to/somefile.md',
  'path/to/somefile.pdf',
  'path/to/somefile.text',
  'path/to/somefile.html',
  'path/to/somedirectory',
  'https://github.com/some/repo',
  'https://github.com/some/other_repo',
  'path/to/chatgpt.json'
]
const documentLoader = new DocumentLoader(filesAndDirectories)
// Awaited on the assumption that both methods are async (see the note below).
const documents = await documentLoader.loadDocuments()
const splitDocuments = await documentLoader.splitDocuments()
// Loading might take time. It is already async; a queue system is probably coming to speed things up.
// @todo: add full parity with all langchain Python loaders.
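To give a feel for what "regardless of type" involves, here is a minimal sketch of sorting a mixed input list by kind before handing each group to a loader. It is not sanechain's implementation: `classifySource`, `groupByKind`, and the `EXTENSION_KINDS` table are hypothetical names made up for illustration, and the extension mapping is an assumption.

```javascript
// Hypothetical sketch: none of these names come from sanechain itself.
const EXTENSION_KINDS = {
  '.md': 'markdown',
  '.pdf': 'pdf',
  '.txt': 'text',
  '.text': 'text',
  '.html': 'html',
  '.json': 'json',
};

function classifySource(source) {
  // Remote sources first: anything that starts with http(s)://.
  if (/^https?:\/\//i.test(source)) {
    const { hostname } = new URL(source);
    return hostname === 'github.com' ? 'github' : 'url';
  }
  // Local paths: look at the extension of the last path segment.
  const base = source.split(/[\\/]/).pop();
  const dot = base.lastIndexOf('.');
  // No extension (or a dotfile): guess "directory".
  // A real implementation would stat the path instead of guessing.
  if (dot <= 0) return 'directory';
  const ext = base.slice(dot).toLowerCase();
  return EXTENSION_KINDS[ext] || 'unknown';
}

// Group inputs by kind so each group can go to the matching loader.
function groupByKind(sources) {
  const groups = {};
  for (const source of sources) {
    const kind = classifySource(source);
    if (!groups[kind]) groups[kind] = [];
    groups[kind].push(source);
  }
  return groups;
}

const groups = groupByKind([
  'path/to/somefile.md',
  'path/to/somedirectory',
  'https://github.com/some/repo',
]);
console.log(Object.keys(groups));
```

Classifying up front keeps the per-type loading code separate from the "what is this thing?" question, which is also where a future queue could slot in.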
Loaders
ChatGPT Loader
import { ChatGPTLoader } from './chat_gpt_loader.js';
const loader = new ChatGPTLoader('path/to/chat/log.json', 10);
const documents = await loader.load();
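For orientation, here is a rough sketch of how a ChatGPT export could be flattened into documents. It is loosely modeled on LangChain's Python ChatGPTLoader, where the second constructor argument is `num_logs` (how many conversations to load); whether the `10` above means the same thing here is an assumption. The export shape (`title`, `mapping`, `message.author.role`, `message.content.parts`) and the helper names `conversationToText` and `exportToDocuments` are also assumptions for illustration, not sanechain's actual code.

```javascript
// Hypothetical sketch; sanechain's actual ChatGPTLoader may work differently.

// Turn one exported conversation into a single block of text.
function conversationToText(conversation) {
  const lines = [];
  // Note: this walks the mapping in insertion order, which is assumed
  // (not guaranteed) to match the order of the conversation.
  for (const node of Object.values(conversation.mapping)) {
    const message = node.message;
    if (!message || !message.content || !Array.isArray(message.content.parts)) continue;
    const role = message.author ? message.author.role : 'unknown';
    if (role === 'system') continue; // skip system prompts
    const text = message.content.parts
      .filter((part) => typeof part === 'string')
      .join('\n')
      .trim();
    if (text) lines.push(`${conversation.title} - ${role}: ${text}`);
  }
  return lines.join('\n\n');
}

// Build langchainjs-style documents ({ pageContent, metadata }).
// numLogs limits how many conversations are used; omit it to use all of them.
function exportToDocuments(conversations, source, numLogs) {
  const selected = numLogs > 0 ? conversations.slice(0, numLogs) : conversations;
  return selected.map((conversation) => ({
    pageContent: conversationToText(conversation),
    metadata: { source },
  }));
}

// Tiny made-up export, just to exercise the functions.
const sampleExport = [
  {
    title: 'Greeting',
    mapping: {
      a: { message: null },
      b: { message: { author: { role: 'user' }, content: { parts: ['Hello'] } } },
      c: { message: { author: { role: 'assistant' }, content: { parts: ['Hi there'] } } },
    },
  },
  {
    title: 'Second chat',
    mapping: {
      a: { message: { author: { role: 'user' }, content: { parts: ['Another question'] } } },
    },
  },
];

const sampleDocs = exportToDocuments(sampleExport, 'path/to/chat/log.json', 1);
console.log(sampleDocs[0].pageContent);
```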
Simpler GithubRepoLoader
Pass in a GitHub link, get the repo's documents back.
import {GithubRepoLoader} from 'sanechain'
const loader = new GithubRepoLoader("https://github.com/owner/repo", { /*params*/ });
const documents = await loader.load();
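Under the hood, any link-based loader has to pull the owner and repo name out of the URL. The sketch below shows one way that could look; `parseGithubUrl` is a hypothetical helper, not a sanechain or langchainjs export, and the handling of `/tree/<branch>` links is an assumption about what you might want.

```javascript
// Hypothetical sketch: not taken from sanechain or langchainjs.
function parseGithubUrl(link) {
  const url = new URL(link); // throws a TypeError on malformed input
  if (url.hostname !== 'github.com') {
    throw new Error(`Not a github.com URL: ${link}`);
  }
  // "/owner/repo/tree/branch" -> ["owner", "repo", "tree", "branch"]
  const [owner, repoPart, marker, ...rest] = url.pathname.split('/').filter(Boolean);
  if (!owner || !repoPart) {
    throw new Error(`Expected https://github.com/owner/repo, got: ${link}`);
  }
  // Accept clone-style links that end in ".git".
  const repo = repoPart.replace(/\.git$/, '');
  // Only the first segment after /tree/ is used, so branch names
  // containing "/" are not handled by this sketch.
  const branch = marker === 'tree' && rest.length > 0 ? rest[0] : undefined;
  return { owner, repo, branch };
}

console.log(parseGithubUrl('https://github.com/owner/repo'));
```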
Roadmap
- [ ] Models
  - [ ] General
  - [ ] Chat
  - [ ] Embeddings
- [ ] Prompts
  - [ ] General Templates
  - [ ] Chat Template
  - [ ] Example Selectors
  - [ ] Output Parsers
- [ ] Indexes (Primary focus at first)
  - [ ] Document Loaders
    - [ ] Airbyte JSON
    - [ ] Apify Dataset
    - [ ] Arxiv
    - [ ] AWS S3
    - [ ] AZLyrics
    - [ ] Azure Blob Storage
    - [ ] Bilibili
    - [ ] Blackboard
    - [ ] Blockchain
    - [x] ChatGPT Data
    - [ ] Confluence
    - [ ] CoNLL-U
    - [ ] Copy / Paste
    - [x] CSV (langchainjs)
    - [ ] Diffbot
    - [ ] Discord
    - [ ] DuckDB
    - [x] EPub (langchainjs)
    - [ ] EverNote
    - [ ] Facebook Chat
    - [ ] Figma
    - [x] File Directory (langchainjs)
    - [x] Git (langchainjs + custom url loader)
    - [ ] GitBook
    - [ ] Google BigQuery
    - [ ] Google Cloud Storage
    - [ ] Google Drive
    - [ ] Gutenberg
    - [ ] Hacker News
    - [ ] HTML
    - [ ] HuggingFace dataset
    - [ ] iFixit
    - [ ] Images
    - [ ] Image captions
    - [ ] IMDB
    - [ ] JSON Files (langchain)
    - [ ] Jupyter Notebook
    - [x] Markdown (sorta, just parses using TextLoader)
    - [ ] MediaWikiDump
    - [ ] Microsoft OneDrive
    - [ ] Microsoft PowerPoint
    - [x] Microsoft Word (langchainjs)
    - [ ] Modern Treasury
    - [ ] Notion DB 1/2
    - [ ] Notion DB 2/2
    - [ ] Obsidian
    - [ ] Pandas DataFrame
    - [x] PDF (langchain)
    - [ ] Using PyPDFium2
    - [ ] ReadTheDocs Documentation
    - [ ] Roam
    - [ ] Sitemap
    - [ ] Slack
    - [ ] Spreedly
    - [ ] Stripe
    - [ ] Subtitle (langchain)
    - [ ] Telegram
    - [ ] TOML
    - [ ] Unstructured File (half way)
    - [x] URL (langchainjs via puppeteer, playwright, cheerio, etc.)
    - [ ] Selenium URL Loader
    - [x] Playwright URL Loader (langchainjs)
    - [ ] WebBaseLoader
    - [ ] WhatsApp Chat
    - [ ] Wikipedia
    - [ ] YouTube transcripts
  - [ ] Text Splitters
    - [ ] Character Text Splitter
    - [ ] HuggingFace Length Function
    - [ ] LaTeX Text Splitter
    - [ ] Markdown Text Splitter
    - [ ] NLTK Text Splitter
    - [ ] RecursiveCharacterTextSplitter
    - [ ] Spacy Text Splitter
    - [ ] tiktoken (OpenAI) Length Function
    - [ ] TiktokenTextSplitter
  - [ ] Vector stores
  - [ ] Retrievers
- [ ] Memory (TBD)
- [ ] Chains (TBD)
- [ ] Agents
  - [ ] Tools (TBD)
  - [ ] Agents (TBD)
  - [ ] Toolkits (TBD)
  - [ ] AgentExecutors (TBD)