semo-plugin-read
v1.0.22
A tool that converts web pages to Markdown and many other formats, based on Semo
This is a Semo plugin that provides a CLI tool to grab a web page and convert it into many useful formats for learning purposes.
Usage
npm i -g @semo/cli semo-plugin-read
semo read [URL|local markdown file] --format=[FORMAT]
semo read [url]
Parse and read a URL or a Markdown file in your favorite format.
Options:
--format, -F Output format, use --available-formats to see all supported formats, default: markdown.
[default: "markdown"]
--clipboard Input from clipboard
--proxy, -P Proxy images to prevent anti-hotlinking.
--port Web server port. [default: 3000]
--domain Set source input from which domain, without protocol and www.
--open-browser, --open, -B Auto open browser.
--clear-console, --clear, -C Auto clear console.
--title Prepend title, use no-title to disable.
--footer Append footer, use no-footer to disable. [default: true]
--toc Include TOC
--rename, -R New name, with extension.
--output, -O Location for output.
--available-formats, -A List all supported formats
Extend plugin
There are two kinds of extensions: one defines formats, the other processes content. Many extensions already exist under the /packages directory.
Define formats
hook_define_format: new Utils.Hook('semo-plugin-read', async ({ format, title, markdown, argv, converted }) => {})
Arguments:
- format: Semo read option for format
- title: Web page url
- markdown: Parsed Markdown
- converted: converted.content is the HTML of the main part of the page body
- argv: Semo's argv
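To make the arguments above more concrete, here is a minimal sketch of a handler with the same argument shape as hook_define_format. The format name `plaintext`, the helper `stripMarkdown`, and the choice to return the rendered text are all assumptions made for illustration; this README does not state what the handler is expected to return or do.

```javascript
// Hypothetical helper (not part of semo-plugin-read):
// strips a few common Markdown markers to produce plain text.
function stripMarkdown(markdown) {
  return markdown
    .replace(/^#{1,6}[ \t]+/gm, '')            // heading markers
    .replace(/\*\*(.+?)\*\*/g, '$1')           // bold markers
    .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')   // links become their link text
}

// Handler taking the arguments listed above.
// Assumption: it only reacts to its own (hypothetical) format name
// and returns the rendered text.
async function definePlaintextFormat({ format, title, markdown, argv, converted }) {
  if (format !== 'plaintext') return undefined
  const body = stripMarkdown(markdown)
  // Assumption: passing no-title results in argv.title === false.
  return argv.title === false ? body : `${title}\n\n${body}`
}

// In a plugin's hook file this might be wired up as:
// hook_define_format: new Utils.Hook('semo-plugin-read', definePlaintextFormat)
```

Keeping the handler as a named function, separate from the `Utils.Hook` wrapper, makes it possible to exercise it without loading Semo.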
Domain processing
hook_domain: new Utils.Hook('semo-plugin-read', {
preprocess: (html, argv) => html,
postprocess: (markdown, argv) => markdown
})
- html: original html
- markdown: parsed markdown
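As an illustration of the two callbacks, the sketch below cleans the HTML before conversion and tidies the Markdown afterwards. The object name `exampleDomainHook` and the specific clean-up rules are hypothetical. How a hook is matched to a particular domain is not described here, so the example does not try to show it.

```javascript
// Hypothetical domain hook: both functions are plain string transforms.
const exampleDomainHook = {
  // Runs on the original HTML: drop <script> and <style> blocks
  // so they do not leak into the converted output.
  preprocess: (html, argv) =>
    html.replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ''),

  // Runs on the parsed Markdown: collapse runs of three or more
  // newlines into a single blank line and trim the ends.
  postprocess: (markdown, argv) =>
    markdown.replace(/\n{3,}/g, '\n\n').trim()
}

// In a plugin's hook file this might be wired up as:
// hook_domain: new Utils.Hook('semo-plugin-read', exampleDomainHook)
```

Both callbacks return a new string instead of mutating their input, matching the `html => html` and `markdown => markdown` shape shown above.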
Examples
semo read https://juejin.im/post/5d82e116e51d453b7779d5f6
semo read README.md --format console
semo read --format=wechat # wechat format is defined by plugin
semo run read URL --format=markdown # Semo can run read command in this way
semo read --available-formats # Show all formats
Built-in formats
Read plugins define many formats; only the built-in formats are listed here.
- markdown or md: Convert web page to Markdown
- console: Output Markdown to console
- debug: Output the parsed main page HTML, for debugging
Known bugs
- The mobi plugin cannot save remote images. As a workaround, first save to epub format, then convert to mobi using the ebook-convert command.
- Ajax content is not supported for now.
Contributions
PRs, Issues, Plugins are all welcome.
About Semo
Semo is the core that drives this plugin. It is a command-line development framework that I developed as a wrapper around the open-source project yargs. If you are interested, you can learn more at https://semo.js.org and https://github.com/semojs
LICENSE
MIT