cartoon-catch

v2.1.7

Published

3 years ago

cartoon-catch

Downloads

0High
0Medium
0Low

tangjun

cartoon-catch

漫画爬虫

可以爬虫90%的漫画网站，不支持可以提供网址给作者，作者更新代码。

如不懂使用，可查看例子

实例

// import grab from 'cartoon-catch'; // ts
const {default: grab} = require('cartoon-catch') // js
const options = {
	name: '../漫画名‘,
	host: 'http://xx.com',
	target: '.list a'
}

grab('http://xx.com/m12345', options, function(html, url, title){
	const img = html.match(/https:\/\/img.xxx.com.*\d+.jpg/g)
	const title = html.match(/\<strong\>(.*.)\<\/strong\>/);
	return [imgs, title]
})

options 参数

|参数名|值|是否必填|类型|备注| |:----|:----|:----|:----|:----| | name | ../鬼灭之刃 | 是 | string | 指定下载路径 | | target | .list a | 是 | string | 跳转到详情的dom | | host | http://xxx.com | 否 | string | 设置详情页host, 默认使用页面host | | slice | [0, 2] | 否 | array | 指定数组返回选定的元素 | | imageHost | http://cdn.xxx.com | 否 | string | 图片地址的host，如无就无需传 | | encoding | true | 否 | boolean | 请求返回值乱码时传false | | headers | {} | 否 | object | 自定义头部 | | urlReplace | ['1.htm', '{index}'] | 否 | array | url替换 | | titleReplace | ['鬼灭之刃', '鬼灭'] | 否 | array | title替换 | | beforeFunction | function([url, title]){} | 否 | function | 自定义爬取方式 | | downloadOptions | {} | 否 | object | 设置下载图片参数 |

downloadOptions 参数

|参数名|值|是否必填|类型|备注| |:----|:----|:----|:----|:----| | extract | jpg | 否 | string | 下载图片后缀，默认jpg | | parallel | 5 | 否 | number | 并行下载，默认5 | | timeout | 5000 | 否 | number | 请求超时时间，默认5000 | | gainInterval | 3000 | 否 | number | 重试时间，默认3000 | | againTimes | 0 | 否 | number | 重试次数，默认无限次 |

其他

如果需要DEBUG，环境变量传入DEBUG=debug
支持爬虫网站如下

|网址| |:---| |https://www.manhuaren.com| |http://mangabz.com| |https://www.soman.com| |https://m.kuaikanmanhua.com| |https://www.77mh.cc| |http://m.pufei.org| |http://m.ikkdm.com| |https://www.kanbl.cc| |https://m.36mh.com/| |https://m.kuaikanmanhua.com/| |http://www.qiman6.com| |...|

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

漫画爬虫

实例

options 参数

downloadOptions 参数

其他