
web-scraper-node v1.0.0

A lightweight Node.js web scraper with multiple features.


Web Scraper Library (PHP & Node.js)

This web scraper library is designed to scrape data from websites, extract specific HTML elements, and track page updates. It supports both PHP and Node.js, with features such as pagination, metadata extraction, dynamic page scraping, and form submission.

Features

  • HTML Element Scraping: Select and extract specific elements (e.g., headings, paragraphs) using CSS selectors.
  • Pagination Support: Scrape data across multiple pages with pagination.
  • Metadata Extraction: Extract meta tags such as title, description, and keywords from HTML pages.
  • Dynamic Content Scraping: Use Puppeteer (Node.js) to scrape pages that load content dynamically using JavaScript.
  • Proxy and User-Agent Support: Customize User-Agent headers and use proxies to avoid detection.
  • Track Page Updates: Continuously monitor a webpage for changes at set intervals.
  • Form Submission: Perform login actions and form submissions before scraping.
  • Export to JSON: Save scraped data in JSON format for easy use.
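The metadata extraction feature above can be illustrated with a small, dependency-free sketch. The library itself parses HTML with cheerio; here plain regular expressions stand in so the example runs on its own, and extractMeta is a hypothetical name used only for this illustration.

```javascript
// Sketch of metadata extraction: pull <title> and common <meta> tags out of
// an already-fetched HTML string. Regex parsing is a simplification for the
// example; a real implementation would use a proper HTML parser like cheerio.
function extractMeta(html) {
  const meta = {};
  const titleMatch = html.match(/<title>([^<]*)<\/title>/i);
  if (titleMatch) meta.title = titleMatch[1].trim();
  const tagRe = /<meta\s+name=["'](description|keywords)["']\s+content=["']([^"']*)["']/gi;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    meta[m[1].toLowerCase()] = m[2];
  }
  return meta;
}

const html = `<html><head>
  <title>Example Page</title>
  <meta name="description" content="A demo page">
  <meta name="keywords" content="demo,example">
</head><body></body></html>`;

console.log(extractMeta(html));
// → { title: 'Example Page', description: 'A demo page', keywords: 'demo,example' }
```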

Installation

PHP Version:

  1. Add the following composer.json configuration:
{
    "name": "fathkoc/php-web-scraper",
    "require": {
        "php": ">=7.4",
        "guzzlehttp/guzzle": "^7.0",
        "symfony/dom-crawler": "^5.0",
        "symfony/css-selector": "^5.0"
    }
}
  2. Install the dependencies via Composer:
composer install

Node.js Version:

  1. Install the dependencies via npm:
npm install axios cheerio puppeteer
  2. Your package.json should then contain:
{
  "name": "node-web-scraper",
  "version": "1.0.0",
  "main": "src/scraper.js",
  "dependencies": {
    "axios": "^0.21.1",
    "cheerio": "^1.0.0-rc.10",
    "puppeteer": "^10.0.0"
  }
}

Usage

PHP:

use WebScraper\Scraper;

$scraper = new Scraper();
$html = $scraper->fetchPageContent('https://example.com');
$data = $scraper->scrapeElement($html, 'h1');
print_r($data); // Output all <h1> elements

Node.js:

const Scraper = require('./src/scraper');

const scraper = new Scraper();
(async () => {
    const html = await scraper.fetchPageContent('https://example.com');
    const data = scraper.scrapeElement(html, 'h2');
    console.log(data); // Output all <h2> elements
})();
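The pagination feature can be sketched as a loop that fetches successive page URLs and collects the results. The ?page=N URL pattern, the scrapeAllPages name, and the stub fetcher below are assumptions for this example; in real use, the fetcher would be the Scraper's fetchPageContent shown above.

```javascript
// Sketch of pagination support: fetch pages 1..lastPage and collect one item
// per <li>. The regex extraction is a stand-in for real CSS-selector scraping.
async function scrapeAllPages(baseUrl, lastPage, fetchPage) {
  const items = [];
  for (let page = 1; page <= lastPage; page++) {
    const html = await fetchPage(`${baseUrl}?page=${page}`);
    for (const m of html.matchAll(/<li>([^<]*)<\/li>/g)) items.push(m[1]);
  }
  return items;
}

// Stub fetcher serving two fake pages instead of hitting the network.
const pages = {
  'https://example.com/list?page=1': '<ul><li>a</li><li>b</li></ul>',
  'https://example.com/list?page=2': '<ul><li>c</li></ul>',
};

scrapeAllPages('https://example.com/list', 2, async (url) => pages[url])
  .then((items) => console.log(items)); // [ 'a', 'b', 'c' ]
```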

License

MIT License