© 2024 – Pkg Stats / Ryan Hefner

test-syj-babel-aaa

v1.0.5-beta.1

Published

Fetch web pages with superagent and parse them with cheerio

Downloads

1

Readme

# Building a Simple Crawler with superagent and cheerio

Fetch web pages with superagent and parse them with cheerio

## Content

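Node.js is always bragging about how amazing its asynchronous features are, but for beginners it is not easy to find a scenario that really puts async to good use. After thinking it over, I settled on crawling as a good fit: whenever we are bored, we can crawl a few sites asynchronously and concurrently for fun.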
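To make the "async and concurrent" idea concrete, here is a minimal sketch of fetching several pages concurrently while capping how many requests are pending at once. mapWithConcurrency and fakeFetch are hypothetical names; fakeFetch stands in for a real request (for example superagent.get(url)) so the sketch runs without network access.

```javascript
// mapWithConcurrency and fakeFetch are hypothetical names for this sketch.
// Runs worker(item, index) over items with at most `limit` calls pending at
// once, and resolves to the results in the same order as the input.
async function mapWithConcurrency(items, limit, worker) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(items.length);
  let nextIndex = 0; // index of the next item no runner has claimed yet

  // Each runner keeps claiming the next unclaimed item until none remain.
  // No locking is needed: this code runs on one thread, and nextIndex++
  // happens synchronously between awaits.
  async function runner() {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners = [];
  for (let k = 0; k < Math.min(limit, items.length); k++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}

// Usage: fakeFetch pretends to download a page after 10 ms,
// standing in for a real HTTP request.
const fakeFetch = (url) =>
  new Promise((resolve) => setTimeout(() => resolve(`<html>${url}</html>`), 10));

mapWithConcurrency(['page-1', 'page-2', 'page-3', 'page-4'], 2, fakeFetch)
  .then((pages) => console.log(pages));
```

Capping concurrency, instead of firing every request at once with a bare Promise.all, keeps the crawler polite to the target site. The limit of 2 here is arbitrary.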

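I originally wanted to show you how to crawl the GitHub API, but GitHub enforces a rate limit, so instead we will have to sacrifice the CNode community (the most professional Node.js open-source community in China) and learn how to crawl that.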

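This time we need three dependencies: express, superagent and cheerio.

A quick introduction first:

  • superagent (http://visionmedia.github.io/superagent/) is an HTTP library that can send GET or POST requests.

  • cheerio (https://github.com/cheeriojs/cheerio) can be thought of as a Node.js version of jQuery: it extracts data from a web page using CSS selectors, and you use it exactly the way you use jQuery.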

## Running

Install the dependencies: npm install

Write the application logic

The core logic of our application looks like this:

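const express = require('express');
const superagent = require('superagent');
const cheerio = require('cheerio');

const app = express();

app.get('/', (req, res, next) => {
  superagent.get('https://cnodejs.org/')
    .end((err, sres) => {
      // Standard error handling: pass the error on to Express
      if (err) {
        return next(err);
      }
      // sres.text holds the page's HTML. Passing it to cheerio.load gives
      // us an object that implements the jQuery interface, which by
      // convention we name `$`. Everything after that is plain jQuery.
      const $ = cheerio.load(sres.text);
      const items = [];
      $('#topic_list .topic_title').each((idx, element) => {
        const $element = $(element);
        items.push({
          title: $element.attr('title'),
          href: $element.attr('href')
        });
      });

      res.send(items);
    });
});

// Port 3000 is an arbitrary choice
app.listen(3000, () => {
  console.log('app is listening on port 3000');
});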
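superagent requests are thenable, so the same handler can also be written with async/await instead of .end(). The following is a minimal sketch assuming Express 4, which does not forward a rejected promise from a route handler to next() by itself. makeTopicsHandler, fetchHtml and parseTopics are hypothetical names. The fetch and parse steps are injected so the control flow can be exercised without network access or installed packages.

```javascript
// makeTopicsHandler, fetchHtml and parseTopics are hypothetical names.
// fetchHtml(url) -> Promise<string>; parseTopics(html) -> array of topics.
function makeTopicsHandler(fetchHtml, parseTopics) {
  // Returns an Express-style (req, res, next) handler.
  return async (req, res, next) => {
    try {
      const html = await fetchHtml('https://cnodejs.org/');
      res.send(parseTopics(html));
    } catch (err) {
      // Express 4 does not catch rejected promises from handlers,
      // so pass the error to next() explicitly.
      next(err);
    }
  };
}

// Possible wiring, used instead of the .end() version (assumes express,
// superagent and cheerio are installed and `app` is the Express app):
//
// app.get('/', makeTopicsHandler(
//   async (url) => (await superagent.get(url)).text,
//   (html) => {
//     const $ = cheerio.load(html);
//     return $('#topic_list .topic_title')
//       .map((idx, element) => ({
//         title: $(element).attr('title'),
//         href: $(element).attr('href')
//       }))
//       .get();
//   }
// ));
```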

Running npm run build copies every file under src into the build directory and transpiles the JS files there to ES5 syntax.
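The readme does not show the build script itself. Below is a minimal sketch of the copy step that uses only Node's fs module. copyTree is a hypothetical name. The transpile step is left commented out because it assumes Babel (suggested only by the package name) with @babel/core and @babel/preset-env installed.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// copyTree is a hypothetical helper. It recursively copies every file under
// srcDir into outDir, creating directories as needed, and returns the paths
// of the copied .js files (the ones a transpiler would then rewrite).
function copyTree(srcDir, outDir) {
  const jsFiles = [];
  fs.mkdirSync(outDir, { recursive: true });
  for (const entry of fs.readdirSync(srcDir, { withFileTypes: true })) {
    const from = path.join(srcDir, entry.name);
    const to = path.join(outDir, entry.name);
    if (entry.isDirectory()) {
      jsFiles.push(...copyTree(from, to));
    } else if (entry.isFile()) {
      fs.copyFileSync(from, to);
      if (to.endsWith('.js')) jsFiles.push(to);
    }
  }
  return jsFiles;
}

// Hypothetical transpile step (assumes @babel/core and @babel/preset-env
// are installed; not run here):
//
// const babel = require('@babel/core');
// for (const file of copyTree('src', 'build')) {
//   const { code } = babel.transformFileSync(file, { presets: ['@babel/preset-env'] });
//   fs.writeFileSync(file, code);
// }
```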

Running npm run release first runs the build command, then checks the git branch, the version and whether there are updates, and finally publishes to npm.
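The release checks are described only in outline, so here is one possible reading, sketched as a pure decision function plus a thin wrapper that gathers its inputs. releaseProblems and readReleaseState are hypothetical names. The required branch (master), the clean-working-tree rule and the "version not yet published" rule are assumptions, not the package's actual script. The wrapper relies on git rev-parse --abbrev-ref HEAD, git status --porcelain and npm view <name> version, which prints the version currently tagged latest.

```javascript
const { execSync } = require('node:child_process');

// releaseProblems and readReleaseState are hypothetical names. The rules
// below (required branch, clean working tree, unpublished version) are
// assumptions about what "check the branch, version and updates" means.
function releaseProblems(state, requiredBranch = 'master') {
  const problems = [];
  if (state.branch !== requiredBranch) {
    problems.push(`on branch "${state.branch}", expected "${requiredBranch}"`);
  }
  if (state.porcelainStatus.trim() !== '') {
    problems.push('working tree has uncommitted changes');
  }
  if (state.localVersion === state.publishedVersion) {
    problems.push(`version ${state.localVersion} is already published`);
  }
  return problems;
}

// Collects the inputs from git and the npm registry. It needs a git
// checkout and network access, so it is defined here but not called.
function readReleaseState(pkgName, localVersion) {
  const run = (cmd) => execSync(cmd, { encoding: 'utf8' }).trim();
  return {
    branch: run('git rev-parse --abbrev-ref HEAD'),
    porcelainStatus: run('git status --porcelain'),
    localVersion,
    publishedVersion: run(`npm view ${pkgName} version`),
  };
}

// Possible release flow in the order the readme describes (not run here):
// const pkg = require('./package.json');
// execSync('npm run build', { stdio: 'inherit' });
// const problems = releaseProblems(readReleaseState(pkg.name, pkg.version));
// if (problems.length > 0) {
//   console.error(problems.join('\n'));
//   process.exit(1);
// }
// execSync('npm publish', { stdio: 'inherit' });
```

Keeping the decision logic separate from the shell calls means the rules can be tested without a repository or network access.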