jastshare
v0.0.0
A template package
Web for AI
A library that provides a web interface for AI
Features
- Ultra-precise HTML2Markdown conversion
- Ultra-precise Markdown segmentation based on AST
- Ultra-precise HTML retrieval functions using headless browsers
- The core functionality is edge-native (runs on Cloudflare Workers!!!)
Demo
A demo API for Html2Markdown is deployed on Cloudflare Workers. Please access the following link.
Quick Start
Install the packages, then run the script:
pnpm i webforai playwright
import { promises as fs } from "fs";
import { htmlToMarkdown, htmlToMdast } from "webforai";
import { loadHtml } from "webforai/loaders/playwright";
const url = "https://www.npmjs.com/package/webforai";
const html = await loadHtml(url);
const markdown = htmlToMarkdown(html, { baseUrl: url });
await fs.writeFile("output.md", markdown);
Other examples are available in examples.
Examples
- Simple Example
- Scraping With ChatGPT API
- Translate Markdown with Splitter
- Cloudflare Worker with puppeteer & DO
Usage
Main Functions
htmlToMarkdown(html: string, options?: HtmlToMarkdownOptions): string
Converts HTML to Markdown. By default, unnecessary HTML is stripped before conversion.
If solveLinks is specified, the relative links in the Mdast will be resolved.
Internally, this function simply calls htmlToMdast and then mdastToMarkdown.
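To make "resolving relative links" concrete, here is a minimal sketch built on the standard WHATWG URL constructor. resolveHref is a hypothetical helper written for illustration; it is not part of webforai, and the library's own resolution logic may differ.

```typescript
// Hypothetical helper (not a webforai API): resolve one href against a base URL.
function resolveHref(href: string, baseUrl: string): string {
  try {
    // The URL constructor handles "../", "./", "/path", "?query" and "#hash".
    return new URL(href, baseUrl).href;
  } catch {
    // Leave values that cannot be parsed untouched rather than dropping them.
    return href;
  }
}

const base = "https://example.com/docs/guide/intro.html";
console.log(resolveHref("../api/index.html", base)); // https://example.com/docs/api/index.html
console.log(resolveHref("/pricing", base)); // https://example.com/pricing
console.log(resolveHref("https://other.example/x", base)); // https://other.example/x
```

Links that are already absolute pass through unchanged, so a helper like this can be applied to every link in a document without checking first.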
htmlToMdast(html: string, options?: HtmlToMdastOptions): Mdast
This project uses Hast and Mdast as defined by syntax-tree internally.
This function converts HTML to Mdast, an intermediate representation, which is required when using mdastSplitter, etc.
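The value of an intermediate tree is that a document can be split on structural boundaries instead of raw character offsets. The sketch below illustrates the idea on a hand-written subset of the mdast node shapes. splitAtHeadings and the Md* types are assumptions made for this example; this is not the library's mdastSplitter, whose signature is not shown here.

```typescript
// Minimal subset of mdast node shapes, declared locally for the sketch.
type MdText = { type: "text"; value: string };
type MdHeading = { type: "heading"; depth: number; children: MdText[] };
type MdParagraph = { type: "paragraph"; children: MdText[] };
type MdBlock = MdHeading | MdParagraph;
type MdRoot = { type: "root"; children: MdBlock[] };

// Hypothetical splitter: start a new chunk at every heading of depth <= maxDepth.
function splitAtHeadings(root: MdRoot, maxDepth = 2): MdBlock[][] {
  const chunks: MdBlock[][] = [];
  let current: MdBlock[] = [];
  for (const node of root.children) {
    const startsSection = node.type === "heading" && node.depth <= maxDepth;
    if (startsSection && current.length > 0) {
      chunks.push(current);
      current = [];
    }
    current.push(node);
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Small constructors to keep the sample tree readable.
const heading = (depth: number, value: string): MdHeading => ({
  type: "heading",
  depth,
  children: [{ type: "text", value }],
});
const paragraph = (value: string): MdParagraph => ({
  type: "paragraph",
  children: [{ type: "text", value }],
});

const sampleRoot: MdRoot = {
  type: "root",
  children: [
    heading(1, "Intro"),
    paragraph("What this is."),
    heading(2, "Usage"),
    paragraph("How to call it."),
    heading(3, "Details"),
    paragraph("Stays with its parent section."),
  ],
};

const chunks = splitAtHeadings(sampleRoot);
console.log(chunks.map((c) => c.length)); // [ 2, 4 ]
```

Because the depth-3 heading is below the threshold, it stays in the same chunk as its parent section, which keeps related content together when chunks are later sent to a model.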
mdastToMarkdown(mdast: Mdast | RootContent[], options?: { solveLinks?: string }): string
Converts Mdast to Markdown. If solveLinks is specified, the relative links in the Mdast will be resolved.
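As a rough picture of what serializing a tree to Markdown involves, the sketch below walks a tiny mdast-like tree and emits text. All type and function names here are hypothetical, only headings, paragraphs, text, and links are handled, and no escaping is performed. The library's mdastToMarkdown covers far more than this.

```typescript
// Hand-written subset of mdast-like nodes, declared locally for the sketch.
type SketchInline =
  | { type: "text"; value: string }
  | { type: "link"; url: string; children: SketchInline[] };
type SketchBlock =
  | { type: "heading"; depth: number; children: SketchInline[] }
  | { type: "paragraph"; children: SketchInline[] };

// Serialize inline content; links recurse into their own children.
function inlineToMarkdown(nodes: SketchInline[]): string {
  return nodes
    .map((n) =>
      n.type === "text" ? n.value : `[${inlineToMarkdown(n.children)}](${n.url})`,
    )
    .join("");
}

// Serialize blocks, separating them with one blank line.
function blocksToMarkdown(blocks: SketchBlock[]): string {
  const parts = blocks.map((b) =>
    b.type === "heading"
      ? `${"#".repeat(b.depth)} ${inlineToMarkdown(b.children)}`
      : inlineToMarkdown(b.children),
  );
  return parts.join("\n\n") + "\n";
}

const sampleBlocks: SketchBlock[] = [
  { type: "heading", depth: 2, children: [{ type: "text", value: "Docs" }] },
  {
    type: "paragraph",
    children: [
      { type: "text", value: "See " },
      {
        type: "link",
        url: "https://example.com/a",
        children: [{ type: "text", value: "the guide" }],
      },
      { type: "text", value: "." },
    ],
  },
];

const rendered = blocksToMarkdown(sampleBlocks);
console.log(rendered);
// ## Docs
//
// See [the guide](https://example.com/a).
```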
Loader Functions
loadHtml(url: string, options?: LoadHtmlOptions): Promise<string>
Load HTML from the specified URL. This function uses Playwright internally.