article-archiver
v1.5.0
Published
Node tool to scrape and transform articles for local reading
Downloads
3
Maintainers
Readme
Article Archiver
The purpose of this library is to convert online articles and blog posts into local markdown by only preserving:
- article content
- media assets
- meta data
The heavy lifting around scraping is done with Cypress and the content is enhanced with Mozilla Readability.
Getting Started
⚠️ This library is under development and not expected to work until the TODO's are completed ⚠️
Installation
npm install -g article-archiver
Usage
npx article-archiver <urls>
Architecture
TODO
- [x] setup cypress
- [x] configure cypress to scrape URL's
- [x] implement code cleaner and enhancer
- [ ] implement readability
- [ ] wire up scraper to enhancer
- [ ] setup http server for tmp files
- [ ] setup website-scraper
- [ ] wire up archiver to save local assets to tmp folder
- [ ] setup utf8 and turndown transformers
- [ ] wire up transformer to merge meta data and write to output