web-scrap-ai

v1.0.4

A chatbot CLI for web scraping using Puppeteer

Downloads: 8

Web-Scrap-AI

Version: 1.0.3
Author: Akash Singh
License: GPL-3.0

Table of Contents

  1. Overview
  2. Features
  3. Installation
  4. Usage
  5. How It Works
  6. Environment Variables
  7. Error Handling
  8. Docker Support
  9. Contributing
  10. License

Overview

Web-Scrap-AI is a powerful command-line tool that combines the capabilities of web scraping and artificial intelligence. It allows users to scrape data from websites using Puppeteer and interact with that data through an AI-powered chatbot. The tool is designed to be intuitive, allowing both automatic and custom class-based scraping, with answers tailored to the specific JSON data scraped from the website.

Features

  • Docker Support: Dockerfile provided for easy containerization and deployment.
  • Automatic Web Scraping: Quickly scrape a website's data using the built-in automatic scraping function.
  • Custom Class-Based Scraping: Target specific parts of a website by specifying class names and assigning custom titles to the scraped data.
  • AI-Powered Chatbot: Interact with the scraped data using an AI chatbot that can answer questions based on the data in JSON format.
  • Context-Specific Queries: Ask the chatbot about specific sections of the scraped data, like titles or paragraphs, using simple commands such as /title or /paragraph.
  • Environment Configuration: Easily configure and manage your Groq API key for AI interactions.

Step-by-Step Guide

  1. Start the CLI: Run the web-scrap-ai command in your terminal.
  2. Enter the Website URL: The CLI will prompt you to enter the URL of the website you want to scrape.
  3. Choose Scraping Method:
    • Automatic Scraping: The tool will automatically scrape the most relevant data from the website.
    • Custom Class-Based Scraping: You can specify the class names you want to scrape and assign custom titles to the scraped data.

Custom Class-Based Scraping

Custom class-based scraping allows you to target specific elements of a website by providing the class names of those elements. This is particularly useful when you want to scrape specific sections of a page, such as product details, reviews, or any other content marked with identifiable class names.

How Class-Based Scraping Works

  • Class Names: You provide the class names of the elements you want to scrape.
  • Titles: For each class name, you assign a title that will be used in the resulting JSON file to categorize the data.

Example

Let's say you want to scrape a website's product names and prices:

  • Class name for product names: .product-title
  • Class name for product prices: .product-price
  • Assigned titles: Product Names and Product Prices

During the scraping process, the tool will extract data from elements with these class names and store them under the provided titles in the JSON file.
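The class-name-to-title mapping described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the function name scrapeByClass, the shape of the mappings array, and the layout of the resulting JSON are all assumptions. The extractor is injected so the sketch runs without a browser; with Puppeteer it could be built on page.$$eval.

```javascript
// Sketch only: the mapping format and JSON layout are assumptions,
// not Web-Scrap-AI's confirmed internals.

/**
 * Build a { title: [texts] } object from class-name/title pairs.
 * `extractTexts` is injected. With Puppeteer it could look like:
 *   (selector) => page.$$eval(selector, (els) => els.map((el) => el.textContent.trim()))
 */
async function scrapeByClass(mappings, extractTexts) {
  const result = {};
  for (const { className, title } of mappings) {
    // Accept "product-title" or ".product-title" and normalise to a class selector.
    const selector = className.startsWith(".") ? className : `.${className}`;
    result[title] = await extractTexts(selector);
  }
  return result;
}

// Stand-in for a real page (hypothetical data) so the sketch needs no browser.
const fakePage = {
  ".product-title": ["Laptop", "Phone"],
  ".product-price": ["$999", "$599"],
};
const fakeExtract = async (selector) => fakePage[selector] || [];

const mappings = [
  { className: ".product-title", title: "Product Names" },
  { className: "product-price", title: "Product Prices" },
];

scrapeByClass(mappings, fakeExtract).then((data) => {
  // Prints an object with "Product Names" and "Product Prices" keys.
  console.log(JSON.stringify(data, null, 2));
});
```

Keeping the extractor separate from the grouping logic means the same function can be exercised with test data or with a live Puppeteer page.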

Installation

To install Web-Scrap-AI, use npm:

npm install web-scrap-ai

This will install the package and automatically set up the required script in your package.json.

Usage

Command-Line Interface (CLI)

After installation, run the web-scrap-ai command to start the tool.

How It Works

Web Scraping

  1. Automatic Scraping: The tool can automatically scrape data from the provided URL using Puppeteer. It tries to intelligently extract meaningful content from the page.
  2. Custom Class-Based Scraping: Users can manually specify the class names of the HTML elements they want to scrape. The tool will prompt for the class names and the titles to assign to the data in the JSON file.

AI Chatbot

  1. Initiating the Chatbot: After scraping, the tool can start an AI chatbot that interacts with the scraped data. The chatbot only answers questions related to the scraped JSON data.
  2. Selecting the JSON File: If there are multiple JSON files in the root directory, the user can select which file to use for the chatbot session.
  3. Data-Specific Responses: The AI can automatically detect commands like /title or /paragraph and respond with information specific to those sections.
  4. Fine-Tuning: The model is fine-tuned during the session to ensure it only answers questions relevant to the selected JSON file.
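One way the command detection in step 3 might work is sketched below. It assumes the scraped JSON has top-level keys such as "title" and "paragraph" that match the command names; the function selectContext and the sample data are hypothetical and are not taken from the package's source.

```javascript
// Sketch only: assumes scraped JSON keys match command names (e.g. "title").

/**
 * Split chatbot input into the data to send as context and the question.
 * "/title What is this page?" narrows the context to data.title;
 * anything else keeps the whole scraped object as context.
 */
function selectContext(input, data) {
  const trimmed = input.trim();
  const match = trimmed.match(/^\/(\w+)\s*([\s\S]*)$/);
  if (!match) {
    return { context: data, question: trimmed };
  }
  const command = match[1];
  const rest = match[2];
  // Unknown commands fall back to the full data set.
  if (!Object.prototype.hasOwnProperty.call(data, command)) {
    return { context: data, question: trimmed };
  }
  return { context: { [command]: data[command] }, question: rest.trim() };
}

// Hypothetical scraped data for demonstration.
const scraped = {
  title: ["Example Store"],
  paragraph: ["We sell laptops.", "Free shipping on all orders."],
};

console.log(selectContext("/title What is the page called?", scraped));
console.log(selectContext("Summarise the page", scraped));
```

Narrowing the context before calling the model keeps the prompt small and makes it easier for the chatbot to stay on the requested section.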

Environment Variables

A Groq API key is needed only if the default AI model doesn't work or you are not satisfied with its answers. The key is stored in a .env file and is used to access Groq's AI model.

  • To set the API key, enter "api key" while the chatbot is running.

Setting Up the API Key

  • If the .env file doesn't exist, it will be created automatically when you enter the API key for the first time.
  • The API key is stored under the variable name GROQ_API_KEY.
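The behaviour described above (create the .env file on first use and store the key under GROQ_API_KEY) could be implemented along these lines. This is a sketch using only Node's built-in modules; the helper name saveApiKey is hypothetical, and the demo writes to a temporary directory rather than the project root.

```javascript
// Sketch only: not the package's actual code.
const fs = require("fs");
const os = require("os");
const path = require("path");

/**
 * Write GROQ_API_KEY to the given .env file, creating the file if needed
 * and replacing any previous GROQ_API_KEY entry. Blank lines are dropped.
 */
function saveApiKey(envPath, apiKey) {
  const entry = `GROQ_API_KEY=${apiKey}`;
  const existing = fs.existsSync(envPath) ? fs.readFileSync(envPath, "utf8") : "";
  const lines = existing.split(/\r?\n/).filter((line) => line.length > 0);
  const index = lines.findIndex((line) => line.startsWith("GROQ_API_KEY="));
  if (index === -1) {
    lines.push(entry);
  } else {
    lines[index] = entry;
  }
  fs.writeFileSync(envPath, lines.join("\n") + "\n");
}

// Demo in a temporary directory so no real .env file is touched.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "web-scrap-ai-"));
const envPath = path.join(demoDir, ".env");
saveApiKey(envPath, "first-key");  // creates the file
saveApiKey(envPath, "second-key"); // replaces the stored key
console.log(fs.readFileSync(envPath, "utf8")); // GROQ_API_KEY=second-key
```

Replacing the existing entry instead of appending avoids duplicate GROQ_API_KEY lines when the key is re-entered.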

Error Handling

  • Invalid URLs: If the provided URL is invalid, the tool will display an error message and terminate the session.
  • API Errors: If there is an issue with the Groq API, the tool will prompt the user to re-enter the API key.
  • File System Errors: The tool includes error handling for file reading/writing operations. If it cannot access the package.json or any other required file, it will display an appropriate error message.
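As an illustration of the invalid-URL check, the sketch below uses Node's built-in URL class. The function name isValidHttpUrl and the restriction to http and https are assumptions; the tool's real validation may differ.

```javascript
// Sketch only: one possible URL check, not the package's actual logic.

/** Return true when `input` parses as an absolute http(s) URL. */
function isValidHttpUrl(input) {
  try {
    const url = new URL(input);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    // new URL() throws a TypeError for strings it cannot parse.
    return false;
  }
}

console.log(isValidHttpUrl("https://example.com/products")); // true
console.log(isValidHttpUrl("example.com"));                  // false (no scheme)
console.log(isValidHttpUrl("ftp://example.com"));            // false (not http/https)
```

Validating before launching the browser lets the tool report the error immediately instead of waiting for navigation to fail.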

Docker Support

Web-Scrap-AI comes with a Dockerfile to allow easy containerization and deployment. You can build and run the Docker container as follows:

Contributing

Contributions to Web-Scrap-AI are welcome! If you have ideas for new features or improvements, feel free to submit a pull request or open an issue.

Support

If you like this project, show your support & love!

buy me a coffee

License

This project is licensed under the GPL-3.0 License. See the LICENSE file for details.