@specify_/mangascraper
v3.6.0
A manga scraper for fetching manga from a variety of sources (Mangakakalot, Manganato, etc.)
Mangascraper is a package for scraping manga. It offers a way to retrieve manga from sources that do not provide an API. Mangascraper can run either asynchronously, returning a Promise, or synchronously if a callback function is provided.
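The dual Promise/callback style described above can be sketched as follows. This is only an illustration of the pattern, not mangascraper's actual implementation: `withOptionalCallback` and `fakeSearch` are hypothetical names, and the callback here is simply invoked once the underlying work completes.

```typescript
// Node-style callback: error first, result second.
type Callback<T> = (err: Error | null, result?: T) => void;

// Hypothetical helper: returns a Promise when no callback is given,
// otherwise reports the outcome through the callback.
function withOptionalCallback<T>(work: () => Promise<T>, callback?: Callback<T>): Promise<T> | void {
  const promise = work();
  if (!callback) return promise;
  promise.then(
    (result) => callback(null, result),
    (err) => callback(err instanceof Error ? err : new Error(String(err))),
  );
}

// Hypothetical stand-in for a scraper method such as search().
function fakeSearch(title: string): Promise<string[]>;
function fakeSearch(title: string, callback: Callback<string[]>): void;
function fakeSearch(title: string, callback?: Callback<string[]>): Promise<string[]> | void {
  return withOptionalCallback(() => Promise.resolve([`${title} (result)`]), callback);
}
```

Callers can then pick whichever style fits their code: `await fakeSearch('One Piece')` or `fakeSearch('One Piece', (err, mangas) => { ... })`.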
Installation
npm
npm install @specify_/mangascraper
Sources
Currently, mangascraper supports the 7 sources marked ✔️ below, and will support more in the future.
| Source       | Supported? | Uses puppeteer? | Uses axios? |
| ------------ | ---------- | --------------- | ----------- |
| MangaBox     | ✔️         | ✔️              | ✔️          |
| Mangafreak   | ❌         | ---             | ---         |
| Mangakakalot | ✔️         | ❌              | ✔️          |
| Manganato    | ✔️         | ❌              | ✔️          |
| Mangahasu    | ✔️         | ❌              | ✔️          |
| Mangaparkv2  | ✔️         | ✔️              | ❌          |
| Mangasee     | ✔️         | ✔️              | ❌          |
| Readmng      | ✔️         | ✔️              | ✔️          |
| Kissmanga    | ❌         | ---             | ---         |
If a supported source uses axios, mangascraper will try to use axios as much as possible to save computer resources. If the network request is blocked by Cloudflare, mangascraper will resort to using puppeteer.
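A minimal sketch of this "lightweight request first, browser as fallback" idea is shown below. It is not mangascraper's internal code: the two fetchers are injected as plain functions, and treating HTTP 403 and 503 as "blocked" is an assumption made for the example.

```typescript
interface LightResponse {
  status: number;
  body: string;
}

// Cheap HTTP request (the role axios plays) and expensive browser load (the role puppeteer plays).
type LightFetcher = (url: string) => Promise<LightResponse>;
type HeavyFetcher = (url: string) => Promise<string>;

// Assumption: a blocked request shows up as one of these status codes.
function looksBlocked(status: number): boolean {
  return status === 403 || status === 503;
}

async function fetchHtml(
  url: string,
  light: LightFetcher,
  heavy: HeavyFetcher,
): Promise<{ html: string; via: 'light' | 'heavy' }> {
  const res = await light(url);
  if (!looksBlocked(res.status)) return { html: res.body, via: 'light' };
  // Only pay the cost of a browser when the cheap request was blocked.
  return { html: await heavy(url), via: 'heavy' };
}
```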
If a supported source uses both axios and puppeteer, it means one or more methods in the source use either axios or puppeteer. For example, Readmng uses puppeteer for search(), but uses axios for getMangaMeta() and getPages().
Usage
To start using the package, import a class such as Mangakakalot from the package and use its methods to get mangas from that source.
Here's an example:
import { Manganato } from '@specify_/mangascraper';
const manganato = new Manganato();
(async () => {
const mangas = await manganato.search('One Piece');
const meta = await manganato.getMangaMeta(mangas[0].url);
console.log(meta.chapters);
})();
which outputs...
[
{
name: 'Chapter 1007',
url: 'https://readmanganato.com/manga-aa951409/chapter-1007',
views: '730,899',
uploadDate: 2021-03-12T07:00:00.000Z
},
{
name: 'Chapter 1006',
url: 'https://readmanganato.com/manga-aa951409/chapter-1006',
views: '364,964',
uploadDate: 2021-03-05T07:00:00.000Z
},
... and more items
]
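In the output above, views is a comma-formatted string rather than a number. A small sketch of how such chapter objects might be post-processed follows; the `Chapter` interface is inferred from the sample output only, and `parseViews` and `mostViewed` are hypothetical helpers, not part of the package.

```typescript
// Shape inferred from the sample output; the real type may carry more fields.
interface Chapter {
  name: string;
  url: string;
  views: string;
  uploadDate: Date;
}

// Turn a string such as '730,899' into the number 730899.
function parseViews(views: string): number {
  return Number(views.replace(/,/g, ''));
}

// Return the chapter with the highest view count, or undefined for an empty list.
function mostViewed(chapters: Chapter[]): Chapter | undefined {
  let best: Chapter | undefined;
  for (const chapter of chapters) {
    if (best === undefined || parseViews(chapter.views) > parseViews(best.views)) {
      best = chapter;
    }
  }
  return best;
}
```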
Configuring puppeteer
Connecting to an endpoint
If you already have an existing puppeteer endpoint, mangascraper can connect to that endpoint instead and perform faster concurrent operations.
Mangascraper also includes its own puppeteer launch arguments, and it is recommended to use them for scraping to go smoothly.
import puppeteer from 'puppeteer';
import { initPuppeteer, MangaSee } from '@specify_/mangascraper';
(async () => {
const browser = await puppeteer.launch({ ...initPuppeteer });
const endpoint = browser.wsEndpoint();
browser.disconnect();
const mangasee = new MangaSee({ puppeteerInstance: { instance: 'endpoint', wsEndpoint: endpoint } });
const mangas = await mangasee.search('Haikyu!');
})();
Since you are using your own puppeteer package, mangascraper cannot make any modifications to the browser, such as including a proxy.
const browser = await puppeteer.launch();
const mangapark = new MangaPark({
proxy: { host: '127.0.0.1', port: 8080 },
puppeteerInstance: { instance: 'custom', browser },
}); // ❌ Mangascraper cannot include proxy
const browser = await puppeteer.launch({ args: ['--proxy-server=127.0.0.1:8080'] });
const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } }); // ✔️ Our own browser instance will launch with a proxy
Because mangascraper is connecting to an existing endpoint, you must set all browser arguments yourself, outside of mangascraper.
Overriding mangascraper's puppeteer launch arguments
If you want to override the launch arguments mangascraper uses, you can pass your own to any manga class, such as MangaSee, as long as you are using the default instance. Any other instance requires you to supply your own launch options or reuse mangascraper's puppeteer options via initPuppeteer.
const mangasee = new MangaSee({ puppeteerInstance: { instance: 'default', launch: { ...myCustomLaunchOptions } } });
If you want to include a proxy, mangascraper will automatically put it into the launch arguments.
const mangahasu = new Mangahasu({
proxy: { host: 'proxy_host', port: 8080 },
puppeteerInstance: { instance: 'default' },
});
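As a rough sketch of what "putting a proxy into the launch arguments" can amount to, the helper below turns a `{ host, port }` object into the `--proxy-server` flag used earlier in this README. `withProxyArg` is a hypothetical name and this is not mangascraper's internal code; the same helper could be used to build arguments for a browser you launch yourself.

```typescript
interface ProxyConfig {
  host: string;
  port: number;
}

// Return a new argument list with exactly one --proxy-server flag when a proxy is given.
function withProxyArg(args: string[], proxy?: ProxyConfig): string[] {
  if (!proxy) return args.slice();
  // Drop any earlier proxy flag so the new one is the only one present.
  const kept = args.filter((arg) => arg.indexOf('--proxy-server=') !== 0);
  kept.push(`--proxy-server=${proxy.host}:${proxy.port}`);
  return kept;
}
```

The resulting array would be passed as the `args` option of `puppeteer.launch`.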
Using an existing puppeteer package
Passing in a browser from the puppeteer package already installed in your app lets mangascraper reuse one browser instead of opening a new browser per operation. Mangascraper can then also scrape manga concurrently. This approach puts less load on Chromium and can save a lot of time if you are handling many scraping operations. It is the best approach if you do not want to connect to an existing endpoint.
However, you must have puppeteer already installed.
This is the most basic setup:
import puppeteer from 'puppeteer';
import { MangaPark, initPuppeteer } from '@specify_/mangascraper';
(async () => {
const browser = await puppeteer.launch(initPuppeteer);
const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } });
})();
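When many operations share one browser, it may help to cap how many run at once. The helper below is a generic sketch of bounded concurrency; `mapWithLimit` is a hypothetical utility, not something mangascraper exports, and the right limit depends on your machine.

```typescript
// Run fn over items with at most `limit` calls in flight, preserving input order in the result.
async function mapWithLimit<T, R>(items: T[], limit: number, fn: (item: T) => Promise<R>): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```

With a scraper instance, a call might look like `mapWithLimit(mangas, 3, (manga) => mangapark.getMangaMeta(manga.url))`.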
Since you are using your own puppeteer package, mangascraper cannot add any modifications to the browser such as including a proxy.
const browser = await puppeteer.launch();
const mangapark = new MangaPark({
proxy: { host: '127.0.0.1', port: 8080 },
puppeteerInstance: { instance: 'custom', browser },
}); // ❌ Mangascraper cannot include a proxy
const browser = await puppeteer.launch({ args: ['--proxy-server=127.0.0.1:8080'] });
const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } }); // ✔️ Our own browser instance will launch with a proxy
By default, mangascraper does not close the browser after an operation ends. If you want the browser to close after an operation has finished, add the following to puppeteerInstance:
puppeteerInstance: {
instance: 'custom',
browser: browser,
options: {
closeAfterOperation: true // After an operation is finished, close the browser
}
}
However, this will prevent mangascraper from proceeding to another operation after one is finished such as this example:
const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser, options: { closeAfterOperation: true } } });
await mangapark
.search('Naruto', { orderBy: 'latest_updates' })
.then(async (mangas) => await Promise.all(mangas.map((manga) => mangapark.getMangaMeta(manga.url)))); // ❌ Browser will close after gathering results of mangas that match the title Naruto and will not gather metadata from each source.
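One way around this is to leave closeAfterOperation off and close the browser yourself once every operation in the chain has finished. The sketch below shows that shape with a stand-in browser; `FakeBrowser` and `searchThenMeta` are hypothetical and only mimic "operations fail once the browser is closed".

```typescript
// Hypothetical stand-in for a browser handle: tasks fail once it has been closed.
class FakeBrowser {
  closed = false;

  async run<T>(task: () => T): Promise<T> {
    if (this.closed) throw new Error('browser is closed');
    return task();
  }

  async close(): Promise<void> {
    this.closed = true;
  }
}

// Search first, then gather metadata for every result, and close the browser only at the very end.
async function searchThenMeta(browser: FakeBrowser, title: string): Promise<string[]> {
  try {
    const urls = await browser.run(() => [`${title}/1`, `${title}/2`]);
    return await Promise.all(urls.map((url) => browser.run(() => `meta for ${url}`)));
  } finally {
    // Runs whether the operations succeeded or threw.
    await browser.close();
  }
}
```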
Examples
const mangas = await mangahasu.search('Fairytail');
console.log(mangas);
mangahasu.search('Fairytail', null, (err, mangas) => {
if (err) return console.error(err);
console.log(mangas);
});
Get a list of manga that match the title Black Clover
import { Mangakakalot } from '@specify_/mangascraper';
const mangakakalot = new Mangakakalot();
mangakakalot.search('Black Clover', function (err, mangas) {
console.log(mangas);
});
Get a list of manga from the Isekai genre
import { Mangakakalot } from '@specify_/mangascraper';
const mangakakalot = new Mangakakalot();
mangakakalot.getMangas({ genre: 'Isekai' }, function (err, mangas) {
console.log(mangas);
});
Get the metadata of the Jaryuu Tensei manga
import { Mangakakalot } from '@specify_/mangascraper';
const mangakakalot = new Mangakakalot();
mangakakalot.getMangaMeta('https://mangakakalot.com/read-qt9nz158504844280', function (err, meta) {
console.log(meta);
});
Get a list of manga that match the title Naruto
import { Manganato } from '@specify_/mangascraper';
const manganato = new Manganato();
manganato.search('Naruto', null, function (err, mangas) {
console.log(mangas);
});
Get a list of manga from the Romance genre that do not have the Drama genre
import { Manganato } from '@specify_/mangascraper';
const manganato = new Manganato();
manganato.search(null, { genre: { include: ['Romance'], exclude: ['Drama'] } }, function (err, mangas) {
console.log(mangas);
});
Get the metadata of the Solo Leveling manhwa
import { Manganato } from '@specify_/mangascraper';
const manganato = new Manganato();
manganato.getMangaMeta('https://readmanganato.com/manga-dr980474', function (err, meta) {
console.log(meta);
});
Simple search for manga that match the genre, which uses less compute power compared to getMangas()
import { MangaNato } from '@specify_/mangascraper';
const manganato = new MangaNato();
manganato.getMangasFromGenre('Comedy', {}, (err, mangas) => {
console.log(mangas);
});
Get a list of manga
import { Mangahasu } from '@specify_/mangascraper';
const mangahasu = new Mangahasu();
mangahasu.search(null, null, (err, mangas) => {
console.log(mangas);
});
Get the metadata of Attack on Titan manga
import { Mangahasu } from '@specify_/mangascraper';
const mangahasu = new Mangahasu();
mangahasu.getMangaMeta('https://mangahasu.se/shingeki-no-kyojin-v6-p27286.html', (err, meta) => {
console.log(meta);
});
Get the pages of the first chapter (index 0) in the Attack on Titan chapters array.
import { Mangahasu } from '@specify_/mangascraper';
const mangahasu = new Mangahasu();
(async () => {
const mangas = await mangahasu.search('Attack on Titan');
const meta = await mangahasu.getMangaMeta(mangas[0].url);
const pages = await mangahasu.getPages(meta.chapters[0].url);
console.log(pages);
})();
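Chains like the one above index into results with `[0]`, which fails with an unhelpful error if a search or chapter list comes back empty. A small guard can make that case explicit; `firstOrThrow` is a hypothetical helper, not part of mangascraper.

```typescript
// Return the first element, or throw a descriptive error when the list is empty.
function firstOrThrow<T>(items: T[], what: string): T {
  if (items.length === 0) throw new Error(`No ${what} found`);
  return items[0];
}

// Possible use inside the chain above (not executed here):
//   const manga = firstOrThrow(await mangahasu.search('Attack on Titan'), 'manga');
//   const meta = await mangahasu.getMangaMeta(manga.url);
//   const chapter = firstOrThrow(meta.chapters, 'chapter');
```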
Get a list of manga that match the title the melancholy of haruhi suzumiya, and open puppeteer in headful mode (useful for debugging).
import { MangaSee } from '@specify_/mangascraper';
const mangasee = new MangaSee({ debug: true }); // Opens puppeteer in headful mode
(async () => {
const mangas = await mangasee.search('the melancholy of haruhi suzumiya');
console.log(mangas);
})();
Get all mangas from the MangaSee directory.
import { MangaSee } from '@specify_/mangascraper';
const mangasee = new MangaSee();
(async () => {
const mangas = await mangasee.directory();
console.log(mangas);
})();
Get the metadata of the Berserk manga
import { MangaSee } from '@specify_/mangascraper';
const mangasee = new MangaSee();
(async () => {
const berserk = await mangasee.getMangaMeta('https://mangasee123.com/manga/Berserk');
console.log(berserk);
})();
Get the Chapter 363 pages of the Berserk manga
import { MangaSee } from '@specify_/mangascraper';
const mangasee = new MangaSee();
(async () => {
const chapter363 = await mangasee.getPages('https://mangasee123.com/read-online/Berserk-chapter-363-index-2.html');
console.log(chapter363);
})();
Search for a manga that matches the title noragami.
Get the first result and get the meta
Then get the pages of the latest chapter
import puppeteer from 'puppeteer';
import { MangaPark, initPuppeteer } from '@specify_/mangascraper';
(async () => {
const browser = await puppeteer.launch(initPuppeteer);
const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } });
const mangas = await mangapark.search('noragami');
const meta = await mangapark.getMangaMeta(mangas[0].url);
const pages = await mangapark.getPages(meta.chapters[meta.chapters.recentlyUpdated][0].pages);
console.log(pages);
})();
Get 50 of the most viewed mangas
import { ReadMng } from '@specify_/mangascraper';
(async () => {
const readmng = new ReadMng();
const mangas = await readmng.search();
console.log(mangas);
})();
For React JS
Get pages and display them on a webpage. Do note that the getMangaMeta method of this class requires puppeteer, so if you want to get the manga meta, consider fetching it from a custom API that uses the mangascraper package.
import React from 'react';
import { MangaBox } from '@specify_/mangascraper';
const mangabox = new MangaBox();
const App: React.FC = () => {
const [pages, setPages] = React.useState<string[]>([]);
React.useEffect(() => {
mangabox
.getPages('https://mangabox.org/manga/solo-leveling-manhua-manga/chapter-159/')
.then((pages) => setPages(pages))
.catch((e) => console.error(e));
}, []);
return (
<div>
{pages.map((page) => (
<img key={page} src={page} alt="Manga page" />
))}
</div>
);
};
export default App;
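For the custom API route mentioned above, the client side might look roughly like the sketch below. Everything about the endpoint is an assumption made for illustration: the `/meta` path, the `url` query parameter, and the `apiBase` value are hypothetical, and the HTTP call is injected as `getJson` so the sketch does not depend on a particular fetch library.

```typescript
// Any function that performs a GET request and resolves with the parsed JSON body.
type GetJson = (url: string) => Promise<unknown>;

// Build the request URL for a hypothetical GET {apiBase}/meta?url=... endpoint.
function buildMetaRequest(apiBase: string, mangaUrl: string): string {
  return `${apiBase}/meta?url=${encodeURIComponent(mangaUrl)}`;
}

// Ask the custom API (which would run mangascraper server-side) for a manga's metadata.
async function fetchMeta(apiBase: string, mangaUrl: string, getJson: GetJson): Promise<unknown> {
  return getJson(buildMetaRequest(apiBase, mangaUrl));
}
```

In a React component, `getJson` could be a thin wrapper such as `(url) => fetch(url).then((res) => res.json())`.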
API Reference
License
Distributed under MIT © Joseph Marbella