
playwright-demo

v1.0.0

<p align="center"> <a href="" rel="noopener"> <img width=200px height=200px src="https://i.imgur.com/6wj0hh6.jpg" alt="Project logo"></a> </p>




📝 Table of Contents

  • About
  • Getting Started

🧐 About

Scrapes product data from Shopee. Site: https://shopee.com.my/

  • Result data structure (a sketch of how a record might be populated follows below)
      let result = {
        toy_category: toyCategory,    // Excel sheet title (category)
        keyword: title,               // Search keyword
        name: "",                     // Product name
        image: "",                    // Main product image
        images: [],                   // Additional product images
        price: "",                    // Product price
        price_before_discount: "",    // Price before discount
        price_max: "",                // Maximum price of the price range
        price_max_before_discount: "",// Maximum price of the range before discount
        price_min: "",                // Minimum price of the price range
        price_min_before_discount: "",// Minimum price of the range before discount
        status: "",                   // Product status
        type: "",                     // Product type
        stock: "",                    // Product stock
        rating_star: "",              // Product rating
        star_5_count: "",             // Number of 5-star ratings
        star_4_count: "",             // Number of 4-star ratings
        star_3_count: "",             // Number of 3-star ratings
        star_2_count: "",             // Number of 2-star ratings
        star_1_count: "",             // Number of 1-star ratings
        shop_name: "",                // Shop name
        shop_location: "",            // Shop address
        shop_place: "",               // Shop location
        shop_rating_star: "",         // Shop rating
        shop_rating_good: "",         // Shop positive rating count
        shop_rating_normal: "",       // Shop neutral rating count
        shop_rating_bad: "",          // Shop negative rating count
        source_url: "",               // Product detail page URL
        shopid: "",                   // Shop ID
        itemid: "",                   // Product ID
        create_time: new Date(),      // Record creation time
        crawl_date: new Date(),       // Crawl time
      }
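
The sketch below shows how such a record might be populated from a scraped product item. It is illustrative only: the `item` field names are assumptions made for the example and are not taken from the project's actual parsing code.

      // Sketch only: map a hypothetical scraped `item` into the result shape above.
      // The `item` fields (item.name, item.price, item.url, ...) are assumed names
      // for illustration, not the project's actual parser output.
      function buildResult(toyCategory, title, item) {
        return {
          toy_category: toyCategory,          // Excel sheet title (category)
          keyword: title,                     // Search keyword
          name: item.name,
          image: item.image,
          images: item.images || [],
          price: item.price,
          price_min: item.price_min,
          price_max: item.price_max,
          stock: item.stock,
          rating_star: item.rating_star,
          shop_name: item.shop_name,
          shopid: item.shopid,
          itemid: item.itemid,
          source_url: item.url,               // product detail page URL
          create_time: new Date(),
          crawl_date: new Date(),
        };
      }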

🏁 Getting Started

Environment variable configuration

PRODUCTION        # Default false; set to true to run in headless mode
PAGE              # Default 2; number of browser pages used concurrently for crawling
CRON              # Cron expression for scheduled runs, e.g. '01 12 * * *'
TIMEOUT           # Timeout for a run, default 60000 (60 seconds)
KEYWORDINDEX      # Default 0; starting index into the search keywords. For example, with 200 keywords in total, a value of 100 starts crawling from the 101st keyword
WORKERRETRYTIMES  # Default 5; maximum number of retries per task
CRAWLERLISTRETRYTIMES # Default 10; maximum number of retries for crawling list details within a task
SAVE              # Default false; set to true to store the scraped data in the database

CRAWLAB_MONGO_HOST # MongoDB host, default: localhost
CRAWLAB_MONGO_PORT # MongoDB port, default: 27017
CRAWLAB_MONGO_DB   # Database name
CRAWLAB_MONGO_USERNAME # Username
CRAWLAB_MONGO_PASSWORD # Password
CRAWLAB_MONGO_AUTHSOURCE # Authentication database, default: admin
CRAWLAB_COLLECTION  # Collection name for scraped product data
CRAWLAB_COLLECTION_KEYWORDCOUNT # Collection name for keyword counts
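
For reference, these variables might be read with their documented defaults and used to open the target MongoDB collection roughly as follows. This is only a sketch using the official mongodb Node.js driver; the project's actual wiring (and the helper name getCollection) is assumed here, not taken from the source.

    // Sketch: read the environment variables above with their documented defaults
    // and open the target MongoDB collection. Uses the official `mongodb` driver;
    // the function name getCollection is illustrative, not part of the project.
    const { MongoClient } = require('mongodb');

    const config = {
      production: process.env.PRODUCTION === 'true',           // headless mode
      pages: parseInt(process.env.PAGE || '2', 10),             // concurrent pages
      cron: process.env.CRON,                                   // e.g. '01 12 * * *'
      timeout: parseInt(process.env.TIMEOUT || '60000', 10),
      keywordIndex: parseInt(process.env.KEYWORDINDEX || '0', 10),
      workerRetryTimes: parseInt(process.env.WORKERRETRYTIMES || '5', 10),
      crawlerListRetryTimes: parseInt(process.env.CRAWLERLISTRETRYTIMES || '10', 10),
      save: process.env.SAVE === 'true',
    };

    async function getCollection() {
      const host = process.env.CRAWLAB_MONGO_HOST || 'localhost';
      const port = process.env.CRAWLAB_MONGO_PORT || '27017';
      const user = process.env.CRAWLAB_MONGO_USERNAME;
      const pass = process.env.CRAWLAB_MONGO_PASSWORD;
      const authSource = process.env.CRAWLAB_MONGO_AUTHSOURCE || 'admin';
      const auth = user ? `${encodeURIComponent(user)}:${encodeURIComponent(pass)}@` : '';
      const uri = `mongodb://${auth}${host}:${port}/?authSource=${authSource}`;

      const client = new MongoClient(uri);
      await client.connect();
      return client
        .db(process.env.CRAWLAB_MONGO_DB)
        .collection(process.env.CRAWLAB_COLLECTION);
    }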

Startup command

  • Start the crawler with the command below; see package.json for other options. A rough sketch of what the entry script does follows the command.
docker run -d --restart=always \
  -e KEYWORDINDEX= \
  -e PRODUCTION=true \
  -e CRAWLAB_MONGO_HOST=mongo \
  -e CRAWLAB_MONGO_USERNAME=muser \
  -e CRAWLAB_MONGO_PASSWORD=mon0420 \
  -e CRAWLAB_MONGO_DB=crawlab_test \
  -e CRAWLAB_COLLECTION=shopee \
  -e SAVE=true \
  -e CRAWLAB_IS_DEDUP=true \
  -e CRAWLAB_DEDUP_FIELD=keyword,source_url \
  -e CRAWLAB_IS_DEDUP_KEYWORDCOUNT=true \
  -e CRAWLAB_COLLECTION_KEYWORDCOUNT=shopee_keyword_count \
  -e CRAWLAB_DEDUP_FIELD_KEYWORDCOUNT=keyword,create_date \
  -e CRON='01 12 * * *' \
  -e PARALLEL=1 \
  -e PAGE=3 \
  -e TIMEOUT=20000 \
  -e WORKERRETRYTIMES=5 \
  -e CRAWLERLISTRETRYTIMES=10 \
  --ipc=host --name shopee --network crawlab_default \
  registry.cn-hongkong.aliyuncs.com/globalmisc/crawler-shopee:latest
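
For orientation, the entry point launched inside the container roughly amounts to starting Playwright in headless mode with the configured number of pages. The snippet below is a sketch under that assumption; the project's actual entry script (defined in package.json) may differ.

    // Sketch: launch the browser headless when PRODUCTION=true and open
    // PAGE pages for concurrent crawling. `config` is the parsed environment
    // configuration shown in the Getting Started section above.
    const { chromium } = require('playwright');

    async function startCrawler(config) {
      const browser = await chromium.launch({ headless: config.production });
      const pages = await Promise.all(
        Array.from({ length: config.pages }, () => browser.newPage())
      );
      // ...dispatch search keywords (starting at config.keywordIndex) across `pages`...
      return { browser, pages };
    }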