wowsearch

v1.0.1

Configuration

concurrency

Maximum concurrency of the crawler.

  • Type: number
  • Default: Math.max(cpus().length - 1, 1)

js_render

Crawl the HTML with the headless browser puppeteer.
Note: this option is intended for sites that do not use server-side rendering.

  • Type: boolean
  • Default: false

js_render_options

The options passed to puppeteer.launch:

await puppeteer.launch(
  js_render_options
)

Note: this option applies only when js_render = true.

  • Type: {}

js_render_evaluate

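A client-side script to run before the page HTML is analyzed, for example to trigger click events that expand collapsed text.

Note: this option applies only when js_render = true.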

  • Type: string | Function

js_waitfor

See waitFor.

  • Type: string | number | Function
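
The rendering options above can be combined as in the following sketch. It assumes the configuration is a plain JavaScript object; the `.expand-button` class and the `.post` selector are made up for illustration, and `headless` and `args` are ordinary puppeteer launch options rather than anything specific to wowsearch.

```javascript
// Hypothetical configuration for a client-rendered site.
const config = {
  start_urls: ['https://blog.com'],
  js_render: true,
  // Passed to puppeteer.launch(); `headless` and `args` are standard launch options.
  js_render_options: {
    headless: true,
    args: ['--no-sandbox'],
  },
  // Client-side script run before the HTML is analyzed.
  // `.expand-button` is an assumed class name, not something wowsearch defines.
  js_render_evaluate: function () {
    document.querySelectorAll('.expand-button').forEach((button) => button.click());
  },
  // A selector string here; the documented type also allows a number or a function.
  js_waitfor: '.post',
};
```

The evaluate function only touches `document`, so it is never meant to be called from Node itself.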

timeout

Timeout, in milliseconds, when crawling data.

  • Type: number
  • Default: 30000

start_urls

Entry URLs of the site to crawl, e.g. ['https://blog.com']

  • Type: string[]
  • Default: []
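
As a rough illustration of the basic options so far, a minimal configuration might look like the sketch below. Only the option names and defaults come from this README; the object shape as a plain JavaScript value and the chosen timeout are assumptions.

```javascript
const { cpus } = require('os');

// Hypothetical minimal configuration.
const config = {
  // Same expression as the documented default, written out explicitly.
  concurrency: Math.max(cpus().length - 1, 1),
  // The documented default is 30000 ms; doubled here for a slow site.
  timeout: 60000,
  start_urls: ['https://blog.com'],
};
```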

start_urls_patterns

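URL matching rules, e.g. ['https://blog.com/**']

As a special case, the following configuration can be used to distinguish between environments (for example, Chinese and English documentation):

[
  {
    test: 'https://imcuttle.github.io/edam/**/*_zh',
    tags: ['zh']
  },
  {
    test: 'https://imcuttle.github.io/edam/**/*',
    tags: ['en']
  }
]

  • Type: Array<Rule> (see Rule)
  • Default: [/.*/]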

stop_urls_patterns

URL matching rules for addresses to ignore.

  • Type: Array<Rule> (see Rule)
  • Default: []
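
A small sketch of how the include and exclude patterns might be used together, assuming a plain JavaScript config object. The excluded paths are invented for illustration; the mix of a glob string and a RegExp relies on the Rule type described further down.

```javascript
// Hypothetical configuration: crawl the whole blog except a few areas.
const config = {
  start_urls: ['https://blog.com'],
  // Crawl everything under the blog ...
  start_urls_patterns: ['https://blog.com/**'],
  // ... except tag listings and URLs ending in .pdf (both assumed exclusions).
  stop_urls_patterns: ['https://blog.com/tags/**', /\.pdf$/],
};
```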

sitemap_urls

Sitemap URLs. Both sitemap.xml and sitemap.txt are supported.

  • Type: string[]
  • Default: []

selectors_exclude

Selectors for nodes to ignore, typically used to skip nodes that do not need to be analyzed.

  • Type: Selector[]

selectors

The set of selectors for the page nodes to crawl.

anchor_selector

Used with the lvl selectors: the selector for locating the anchor.

anchor_attribute_name

Used with the lvl selectors: the attribute of the anchor node that holds the anchor.

  • Type: string
  • Default: 'id'

smart_crawling

Enables smart crawling mode: the a tags in each page are analyzed to extend the set of URLs to crawl.

  • Type: boolean
  • Default: false

smart_crawling_selector

In smart crawling mode, the selector used to find the a tags to analyze.

force_crawling_urls

Forces the URLs found by smart crawling to be crawled; otherwise they are filtered through start_urls_patterns and stop_urls_patterns.

  • Type: boolean
  • Default: false
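
The three smart crawling options could be set together roughly as follows. This is a sketch under assumptions: the config is a plain JavaScript object, and `smart_crawling_selector` is taken to be a CSS selector string, which this README does not state.

```javascript
// Hypothetical configuration for link discovery.
const config = {
  start_urls: ['https://blog.com'],
  start_urls_patterns: ['https://blog.com/**'],
  smart_crawling: true,
  // Assumed to be a CSS selector string; the README gives no type for it.
  smart_crawling_selector: '.post a[href]',
  // Keep filtering discovered links through the start/stop patterns.
  force_crawling_urls: false,
};
```

Leaving `force_crawling_urls` at `false` keeps discovered links inside the site boundary set by the patterns.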

request_cookie

Cookie information to inject when crawling a page, for example for sites that require user authentication.

  • Type: string

source_adaptor

Data adaptor that specifies the remote destination the data is pushed to, for example:

{
  "name": "wowsearch-elastic-adaptor/node",
  "options": {
    "endpoint": "https://example.elasticsearch.com",
    "index_name": "my_blog"
  }
}

  • Type: {name: string, options: any}

Rule

  • Type: RegExp | Function | string | {test: RegExp | Function | string, [key: string]: any}
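
To make the four Rule shapes concrete, here is a sketch of how a rule could be evaluated against a URL. It is not wowsearch's implementation: `globToRegExp`, `matchRule`, and `tagsFor` are hypothetical helpers, the glob handling is a simplified stand-in that only understands `*` and `**`, and the "first matching rule wins" behavior is an assumption based on the order of the zh/en example.

```javascript
// Simplified glob support (assumption): '*' stays within one path segment,
// '**' may span several segments.
function globToRegExp(glob) {
  let out = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch !== '*') {
      // Escape regex metacharacters in literal text.
      out += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    } else if (glob[i + 1] === '*') {
      i++;
      if (glob[i + 1] === '/') {
        i++;
        out += '(?:.*/)?'; // '**/' matches zero or more directories
      } else {
        out += '.*';
      }
    } else {
      out += '[^/]*';
    }
  }
  return new RegExp('^' + out + '$');
}

// Accepts RegExp | Function | string | { test, ...extra }.
function matchRule(rule, url) {
  const isPlainObject = rule !== null && typeof rule === 'object' && !(rule instanceof RegExp);
  const test = isPlainObject ? rule.test : rule;
  if (test instanceof RegExp) return test.test(url);
  if (typeof test === 'function') return Boolean(test(url));
  if (typeof test === 'string') return globToRegExp(test).test(url);
  return false;
}

// Returns the tags of the first matching rule (first-match-wins is assumed).
function tagsFor(rules, url) {
  const hit = rules.find((rule) => matchRule(rule, url));
  return hit && hit.tags ? hit.tags : [];
}

const rules = [
  { test: 'https://imcuttle.github.io/edam/**/*_zh', tags: ['zh'] },
  { test: 'https://imcuttle.github.io/edam/**/*', tags: ['en'] },
];
```

With these rules, a URL ending in `_zh` is matched by the first entry, which is why the more specific pattern is listed before the general one.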

Selectors

  • Type: Object

lvl0

Level-0 selector, typically used to select the article title, for example:

{
  global: true,
  selector: '.post .title'
}

lvl1

Level-1 selector, typically used to select first-level headings in the article, for example:

{
  global: true,
  selector: '.post article h1'
}

lvl2

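Level-2 selector, typically used to select second-level headings in the article.

lvl3

Same as above.

lvl4

Same as above.

lvl5

Same as above.

lvl6

Same as above.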

text

Plain-text selector, e.g. .post article li, .post article p, .post article pre
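
Put together, a selectors block might look like the sketch below. The lvl0, lvl1, and text values are the examples given above; the lvl2 value and the `selectors_exclude` entry are assumptions added for illustration, and the sketch assumes each entry accepts either a string or an object.

```javascript
// Hypothetical configuration built from the examples in this section.
const config = {
  start_urls: ['https://blog.com'],
  selectors: {
    lvl0: { global: true, selector: '.post .title' },
    lvl1: { global: true, selector: '.post article h1' },
    // Assumed by analogy with lvl1, written as a bare selector string.
    lvl2: '.post article h2',
    text: '.post article li, .post article p, .post article pre',
  },
  // Assumed example of a node to skip during analysis.
  selectors_exclude: ['.post .comments'],
};
```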

Selector

  • Type: string | StrictSelector

StrictSelector

  • Type: Object

strip_chars

Characters to strip from the selected text, used to remove unwanted characters.

  • Type: string
  • Default: ' .,;:§¶'
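
The README does not say whether these characters are removed everywhere or only at the edges of the text. The sketch below assumes they are trimmed from both ends; `stripChars` is a hypothetical helper written for illustration, not a wowsearch function.

```javascript
// Trim any of the given characters from both ends of the text.
// Edge-only trimming is an assumption about how strip_chars behaves.
function stripChars(text, chars = ' .,;:§¶') {
  let start = 0;
  let end = text.length;
  while (start < end && chars.includes(text[start])) start++;
  while (end > start && chars.includes(text[end - 1])) end--;
  return text.slice(start, end);
}

const cleaned = stripChars('§ Getting started.');
```

Under that assumption, punctuation inside the text (for example the comma in "Install, then run") is left untouched.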

type

The type of the selector.

  • Type: 'xpath' | 'css'
  • Default: 'css'

global

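Whether this is a global selector. A global selector matches a single node for the whole page and is typically used to find the article title.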

  • Type: boolean

default_value

The default value to use when nothing is found.

  • Type: string

anchor_attribute_name

anchor_selector
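
For illustration, a StrictSelector using the fields described above might be written as follows. The XPath expression, the default value, and the custom `strip_chars` are all made-up values; only the field names and the allowed `type` values come from this README.

```javascript
// Hypothetical StrictSelector for the article title.
const titleSelector = {
  selector: '//article//h1', // assumed XPath for the title node
  type: 'xpath',             // documented values: 'xpath' | 'css'
  global: true,              // one title per page
  default_value: 'Untitled', // used when nothing is found
  strip_chars: ' #',         // overrides the documented default ' .,;:§¶'
};

// The plain string form is the other variant of the Selector type.
const textSelector = '.post article p';
```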