scraper.ai
v2.0.2
A simple web scraper using Google's Generative AI to extract and format text from web pages into markdown.
webscraperbysourav
Description
webscraperbysourav is a Node.js library that uses Google's Generative AI to crawl web pages, extract their visible text, and format it as clean, structured markdown. It is useful for pulling content out of websites and organizing it into a readable format free of HTML tags.
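The README does not say how the visible text is extracted. As a rough illustration only, the sketch below assumes a naive regular-expression approach: drop script, style and noscript blocks and comments, turn closing block tags into line breaks, strip the remaining tags, decode a few common entities, and tidy the whitespace. extractVisibleText is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper, NOT part of webscraperbysourav: a naive sketch of
// turning an HTML string into its visible text with regular expressions.
function extractVisibleText(html) {
  return html
    // Remove blocks whose contents are never rendered as text.
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Remove HTML comments.
    .replace(/<!--[\s\S]*?-->/g, " ")
    // Turn closing block tags and <br> into line breaks so paragraphs survive.
    .replace(/<\/(p|div|h[1-6]|li|tr)>|<br\s*\/?>/gi, "\n")
    // Strip every remaining tag.
    .replace(/<[^>]+>/g, " ")
    // Decode a few common entities (&amp; last, so "&amp;lt;" becomes "&lt;").
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&")
    // Collapse spaces and tabs, trim each line, and drop empty lines.
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}
```

For example, extractVisibleText("<body><h1>Title</h1><p>Hello &amp; welcome</p><script>x()</script></body>") yields "Title\nHello & welcome". For real-world pages, a proper HTML parser is sturdier than regular expressions.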
Features
- Fetches web page content using Axios.
- Splits content into manageable chunks for processing.
- Uses Google's Generative AI to format the content into markdown.
- Saves the output as a .md file with the content structured and cleaned.
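The README does not specify how large the chunks are or how the output file is named. The sketch below assumes a character budget per chunk: it splits on paragraph boundaries (blank lines) where possible, hard-splits any single paragraph longer than the budget, and writes the joined markdown with Node's fs module. splitIntoChunks, saveMarkdown and the 4000-character default are hypothetical choices for illustration, not the library's API.

```javascript
const fs = require("node:fs");

// Hypothetical helper: split text into chunks of at most maxChars characters,
// preferring to break between paragraphs (blank-line separated).
function splitIntoChunks(text, maxChars = 4000) {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new RangeError("maxChars must be a positive integer");
  }
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks = [];
  let current = "";

  for (const para of paragraphs) {
    // A paragraph larger than the budget is hard-split into fixed slices.
    // Slicing counts UTF-16 code units, which is acceptable for a sketch.
    if (para.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }
    // The candidate includes the "\n\n" separator used to rejoin paragraphs.
    const candidate = current ? current + "\n\n" + para : para;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Hypothetical helper: join formatted chunks and write them to a .md file.
function saveMarkdown(chunks, filePath) {
  const target = filePath.endsWith(".md") ? filePath : filePath + ".md";
  fs.writeFileSync(target, chunks.join("\n\n") + "\n", "utf8");
  return target;
}
```

Breaking on paragraph boundaries keeps related sentences in the same request, which tends to give the model more coherent context than cutting at an arbitrary character offset.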
Installation
To install the package, run:
npm install webscraperbysourav
API
SouravClient
Constructor
new SouravClient({ apiKey }): Creates a new SouravClient instance.
- apiKey (string): Your Google API key for accessing Generative AI.
Methods
crawl({ url, goal }): Crawls the given URL and formats its content as markdown.
- url (string): The URL of the web page to crawl.
- goal (string): Instructions describing what text to extract and how to format it.
- Returns: A promise that resolves to the markdown content of the page.
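The README documents the shape of crawl({ url, goal }) but not its internals. The sketch below mirrors that contract under stated assumptions so the flow can run offline: the page is fetched, tags are stripped, the text is cut into fixed-size chunks, each chunk is sent with goal to a formatter, and the results are joined into one markdown string. SketchClient, fetchHtml, formatChunk and chunkSize are hypothetical stand-ins; per the README, the real library fetches with Axios and formats with Google's Generative AI, and its actual code may differ.

```javascript
// Hypothetical stand-in for SouravClient, NOT the library's implementation.
// Network and model calls are injected so the pipeline can run without them.
class SketchClient {
  /**
   * @param {object} deps
   * @param {(url: string) => Promise<string>} deps.fetchHtml - returns raw HTML (Axios in the real library).
   * @param {(chunk: string, goal: string) => Promise<string>} deps.formatChunk - returns markdown (the AI model in the real library).
   * @param {number} [deps.chunkSize=4000] - assumed maximum characters per chunk.
   */
  constructor({ fetchHtml, formatChunk, chunkSize = 4000 }) {
    if (typeof fetchHtml !== "function" || typeof formatChunk !== "function") {
      throw new TypeError("fetchHtml and formatChunk must be functions");
    }
    this.fetchHtml = fetchHtml;
    this.formatChunk = formatChunk;
    this.chunkSize = chunkSize;
  }

  // Same call shape as the documented crawl({ url, goal }); resolves to markdown.
  async crawl({ url, goal }) {
    const html = await this.fetchHtml(url);
    const text = html
      .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop non-visible blocks
      .replace(/<[^>]+>/g, " ") // strip remaining tags
      .replace(/\s+/g, " ") // collapse whitespace
      .trim();

    // Cut the text into fixed-size chunks.
    const chunks = [];
    for (let i = 0; i < text.length; i += this.chunkSize) {
      chunks.push(text.slice(i, i + this.chunkSize));
    }

    // Format chunks one at a time, in page order.
    const parts = [];
    for (const chunk of chunks) {
      parts.push(await this.formatChunk(chunk, goal));
    }
    return parts.join("\n\n");
  }
}
```

Formatting chunks sequentially keeps the output in page order and avoids bursts of API calls; running them concurrently with Promise.all would be faster but may hit rate limits on the model API.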
Contributing
If you want to contribute to this project, please fork the repository and submit a pull request. Ensure that your changes are well-tested and documented.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contact
For any issues or inquiries, please contact Sourav Dubey. Email: [email protected].
Acknowledgments
Google Generative AI for providing the AI model used in this library.
Axios for HTTP requests.