@specify_/mangascraper

v3.6.0

Published

3 years ago

A mangascraper for fetching manga from a variety of manga sources (mangakakalot, manganato, and others).


specify_

scraper manga anime manhua manhwa

Mangascraper is a package for scraping manga. It is a way to retrieve manga from sources that do not offer an API. Each method either returns a Promise or, if a callback function is provided, passes its result to that callback.
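The two calling styles can live on the same method. The sketch below is not mangascraper's implementation; `withOptionalCallback` and `fakeSearch` are hypothetical names, used only to illustrate how one function might return a Promise while also honoring a Node-style `(err, result)` callback.

```typescript
// Node-style callback: error first, result second.
type Callback<T> = (err: Error | null, result?: T) => void;

// Hypothetical helper: always returns the Promise, and also reports to the
// callback when one is supplied.
function withOptionalCallback<T>(work: Promise<T>, callback?: Callback<T>): Promise<T> {
  if (callback) {
    work.then(
      (result) => callback(null, result),
      (err) => callback(err instanceof Error ? err : new Error(String(err))),
    );
  }
  return work;
}

// Hypothetical stand-in for a scraper method such as search().
function fakeSearch(title: string, callback?: Callback<string[]>): Promise<string[]> {
  const work: Promise<string[]> = title
    ? Promise.resolve([`${title} (result 1)`, `${title} (result 2)`])
    : Promise.reject(new Error('title is required'));
  return withOptionalCallback(work, callback);
}
```

With this shape, `await fakeSearch('One Piece')` and `fakeSearch('One Piece', (err, mangas) => { ... })` both work, which mirrors the two styles used in the examples in this README.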

Installation

npm

npm install @specify_/mangascraper

Sources

Currently, mangascraper supports 7 of the 9 sources listed below, and will support more in the future.

| Source       | Supported? | Uses puppeteer? | Uses axios? |
| ------------ | ---------- | --------------- | ----------- |
| MangaBox     | ✔️         | ✔️              | ✔️          |
| Mangafreak   | ❌         | ---             | ---         |
| Mangakakalot | ✔️         | ❌              | ✔️          |
| Manganato    | ✔️         | ❌              | ✔️          |
| Mangahasu    | ✔️         | ❌              | ✔️          |
| Mangaparkv2  | ✔️         | ✔️              | ❌          |
| Mangasee     | ✔️         | ✔️              | ❌          |
| Readmng      | ✔️         | ✔️              | ✔️          |
| Kissmanga    | ❌         | ---             | ---         |

If a supported source uses axios, mangascraper will try to use axios as much as possible to save computer resources. If the network request is blocked by Cloudflare, mangascraper will resort to using puppeteer.

If a supported source uses both axios and puppeteer, it means that some methods in the source use axios and others use puppeteer. For example, Readmng uses puppeteer for search(), but uses axios for getMangaMeta() and getPages().
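The "lightweight request first, browser as a fallback" idea can be sketched as follows. This is only an illustration: `fetchWithFallback`, `blockedLightFetcher`, and `fakeBrowserFetcher` are hypothetical names, and how mangascraper itself detects a Cloudflare block is not shown in this README. In a real setup the light fetcher would wrap an axios request and the heavy one would drive a puppeteer page.

```typescript
// A fetcher takes a URL and resolves to the page HTML.
type Fetcher = (url: string) => Promise<string>;

interface FetchResult {
  html: string;
  via: 'light' | 'heavy';
}

// Hypothetical helper: try the cheap fetcher first, and only fall back to
// the expensive one if the cheap one fails (for example, when blocked).
async function fetchWithFallback(url: string, light: Fetcher, heavy: Fetcher): Promise<FetchResult> {
  try {
    return { html: await light(url), via: 'light' };
  } catch {
    return { html: await heavy(url), via: 'heavy' };
  }
}

// Stand-ins so the sketch runs without a network or a browser.
const blockedLightFetcher: Fetcher = async () => {
  throw new Error('403: request blocked');
};
const fakeBrowserFetcher: Fetcher = async (url) => `<html>rendered ${url}</html>`;
```

Calling `fetchWithFallback(url, blockedLightFetcher, fakeBrowserFetcher)` resolves through the heavy path, while a light fetcher that succeeds never touches the browser.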

Usage

To start using the package, import a class such as Mangakakalot from the package and use the methods to get mangas from that source.

Here's an example:

import { Manganato } from '@specify_/mangascraper';

const manganato = new Manganato();

(async () => {
  const mangas = await manganato.search('One Piece');
  const meta = await manganato.getMangaMeta(mangas[0].url);
  console.log(meta.chapters);
})();

which outputs...

[
  {
    name: 'Chapter 1007',
    url: 'https://readmanganato.com/manga-aa951409/chapter-1007',
    views: '730,899',
    uploadDate: 2021-03-12T07:00:00.000Z
  },
  {
    name: 'Chapter 1006',
    url: 'https://readmanganato.com/manga-aa951409/chapter-1006',
    views: '364,964',
    uploadDate: 2021-03-05T07:00:00.000Z
  },
  ... and more items
]
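In the output above, `views` is a formatted string rather than a number. If you want to compare chapters by view count, a small conversion helps. The `Chapter` interface below is an assumption inferred from that sample output, and `parseViews` and `mostViewed` are hypothetical helpers, not part of the package.

```typescript
// Shape assumed from the sample output above.
interface Chapter {
  name: string;
  url: string;
  views: string; // e.g. '730,899'
  uploadDate: Date;
}

// Strip thousands separators and convert to a number.
function parseViews(views: string): number {
  return Number(views.replace(/,/g, ''));
}

// Return the chapter with the highest view count, or undefined for an empty list.
function mostViewed(chapters: Chapter[]): Chapter | undefined {
  return chapters.reduce<Chapter | undefined>(
    (best, chapter) =>
      best === undefined || parseViews(chapter.views) > parseViews(best.views) ? chapter : best,
    undefined,
  );
}
```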

Configuring puppeteer

Connecting to an endpoint

If you already have an existing puppeteer endpoint, mangascraper can connect to that endpoint instead and perform faster concurrent operations.

Mangascraper also includes its own puppeteer launch arguments, and it is recommended to use them for scraping to go smoothly.

import puppeteer from 'puppeteer';
import { initPuppeteer, MangaSee } from '@specify_/mangascraper';

(async () => {
  const browser = await puppeteer.launch({ ...initPuppeteer });
  const endpoint = browser.wsEndpoint();
  browser.disconnect();

  const mangasee = new MangaSee({ puppeteerInstance: { instance: 'endpoint', wsEndpoint: endpoint } });

  const mangas = await mangasee.search('Haikyu!');
})();

Since you launched the browser with your own puppeteer package, mangascraper cannot make any modifications to it, such as adding a proxy.

const browser = await puppeteer.launch();
const mangapark = new MangaPark({
  proxy: { host: '127.0.0.1', port: 8080 },
  puppeteerInstance: { instance: 'custom', browser },
}); // ❌ Mangascraper cannot include proxy

const browser = await puppeteer.launch({ args: ['--proxy-server=127.0.0.1:8080'] });
const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } }); // ✔️ Our own browser instance will launch with a proxy

Because mangascraper is connecting to an existing endpoint, you must set all browser arguments yourself, outside of mangascraper.
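If you launch the browser yourself, one way to keep the proxy setting in a single place is to build the launch arguments with a small helper. This is a sketch: `ProxyConfig`, `proxyArgs`, and `withProxy` are hypothetical names, and the only thing taken from the example above is the Chromium `--proxy-server=host:port` flag.

```typescript
interface ProxyConfig {
  host: string;
  port: number;
}

// Build the Chromium command-line flag used in the example above.
function proxyArgs(proxy: ProxyConfig): string[] {
  return [`--proxy-server=${proxy.host}:${proxy.port}`];
}

// Merge the proxy flag into any launch options you already have,
// keeping existing args and leaving the input object unmodified.
function withProxy<T extends { args?: string[] }>(
  launchOptions: T,
  proxy: ProxyConfig,
): T & { args: string[] } {
  return {
    ...launchOptions,
    args: [...(launchOptions.args ?? []), ...proxyArgs(proxy)],
  };
}

// Usage (assumes puppeteer is installed):
//   const browser = await puppeteer.launch(
//     withProxy({ headless: true }, { host: '127.0.0.1', port: 8080 }),
//   );
```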

Overriding mangascraper's puppeteer launch arguments

If you want to override the launch arguments mangascraper uses, you can add this to any manga class such as MangaSee, as long as you are using the default instance. With any other instance, you must supply your own puppeteer options or inherit mangascraper's options through initPuppeteer.

const mangasee = new MangaSee({ puppeteerInstance: { instance: 'default', launch: { ...myCustomLaunchOptions } } });

If you want to include a proxy, mangascraper will automatically put it into the launch arguments.

const mangahasu = new Mangahasu({
  proxy: { host: 'proxy_host', port: 8080 },
  puppeteerInstance: { instance: 'default' },
});

Using an existing puppeteer package

Passing in a browser launched by the puppeteer package already in your app lets mangascraper reuse one browser instead of opening a new browser for each operation. It also lets mangascraper scrape manga concurrently. This approach uses fewer Chromium resources and can save you a lot of time if you run many scraping operations. It is the best approach if you do not want to connect to an existing endpoint.

However, you must have puppeteer already installed.

This is the most basic setup:

import puppeteer from 'puppeteer';
import { MangaPark, initPuppeteer } from '@specify_/mangascraper';

(async () => {
  const browser = await puppeteer.launch(initPuppeteer);
  const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } });
})();

Since you are using your own puppeteer package, mangascraper cannot add any modifications to the browser such as including a proxy.

const browser = await puppeteer.launch();
const mangapark = new MangaPark({
  proxy: { host: '127.0.0.1', port: 8080 },
  puppeteerInstance: { instance: 'custom', browser },
}); // ❌ Mangascraper cannot include a proxy

const browser = await puppeteer.launch({ args: ['--proxy-server=127.0.0.1:8080'] });
const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } }); // ✔️ Our own browser instance will launch with a proxy
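With one shared browser, several operations can run at the same time. Running every operation at once may still open more pages than you want, so it can help to cap how many are in flight. The limiter below is a generic sketch; `mapWithLimit` is a hypothetical helper and not part of mangascraper.

```typescript
// Run `task` over `items` with at most `limit` tasks in flight at once.
// Results keep the same order as the input.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage with a shared browser (assumes `mangapark` and `mangas` exist):
//   const metas = await mapWithLimit(mangas, 3, (manga) => mangapark.getMangaMeta(manga.url));
```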

By default, mangascraper does not close the browser when an operation ends. If you want the browser to close after an operation has finished, add the following to puppeteerInstance:

puppeteerInstance: {
  instance: 'custom',
  browser: browser,
  options: {
    closeAfterOperation: true // After an operation is finished, close the browser
  }
}

However, this will prevent mangascraper from proceeding to another operation after one is finished such as this example:

const mangapark = new MangaPark({
  puppeteerInstance: { instance: 'custom', browser, options: { closeAfterOperation: true } },
});
await mangapark
  .search('Naruto', { orderBy: 'latest_updates' })
  .then(async (mangas) => await Promise.all(mangas.map((manga) => mangapark.getMangaMeta(manga.url)))); // ❌ Browser will close after gathering results of mangas that match the title Naruto and will not gather metadata from each source.
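If you need to chain several operations, one alternative is to leave closeAfterOperation off and close the browser yourself once the whole chain is done. The wrapper below is a sketch under that assumption; `withBrowser` and `Closable` are hypothetical names, and the only thing required of the browser object is a `close()` method that returns a Promise.

```typescript
// Anything with an async close() method, such as a browser you launched yourself.
interface Closable {
  close(): Promise<void>;
}

// Launch, run all the work, then close exactly once, even if the work throws.
async function withBrowser<B extends Closable, T>(
  launch: () => Promise<B>,
  work: (browser: B) => Promise<T>,
): Promise<T> {
  const browser = await launch();
  try {
    return await work(browser);
  } finally {
    await browser.close();
  }
}

// Usage (assumes puppeteer and MangaPark are imported):
//   const metas = await withBrowser(
//     () => puppeteer.launch(),
//     async (browser) => {
//       const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } });
//       const mangas = await mangapark.search('Naruto', { orderBy: 'latest_updates' });
//       return Promise.all(mangas.map((manga) => mangapark.getMangaMeta(manga.url)));
//     },
//   );
```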

Examples

const mangas = await mangahasu.search('Fairytail');
console.log(mangas);

mangahasu.search('Fairytail', null, (err, mangas) => {
  if (err) return console.error(err);
  console.log(mangas);
});

Get a list of manga that match the title Black Clover

import { Mangakakalot } from '@specify_/mangascraper';

const mangakakalot = new Mangakakalot();

mangakakalot.search('Black Clover', function (err, mangas) {
  console.log(mangas);
});

Get a list of manga from the Isekai genre

import { Mangakakalot } from '@specify_/mangascraper';

const mangakakalot = new Mangakakalot();

mangakakalot.getMangas({ genre: 'Isekai' }, function (err, mangas) {
  console.log(mangas);
});

Get the metadata of the Jaryuu Tensei manga

import { Mangakakalot } from '@specify_/mangascraper';

const mangakakalot = new Mangakakalot();

mangakakalot.getMangaMeta('https://mangakakalot.com/read-qt9nz158504844280', function (err, meta) {
  console.log(meta);
});

Get a list of manga that match the title Naruto

import { MangaNato } from '@specify_/mangascraper';

const manganato = new MangaNato();

manganato.search('Naruto', null, function (err, mangas) {
  console.log(mangas);
});

Get a list of manga from the Romance genre that do not have the Drama genre

import { MangaNato } from '@specify_/mangascraper';

const manganato = new MangaNato();

manganato.search(null, { genre: { include: ['Romance'], exclude: ['Drama'] } }, function (err, mangas) {
  console.log(mangas);
});

Get the metadata of the Solo Leveling manhwa

import { MangaNato } from '@specify_/mangascraper';

const manganato = new MangaNato();

manganato.getMangaMeta('https://readmanganato.com/manga-dr980474', function (err, meta) {
  console.log(meta);
});

Simple search for manga that match the genre, which uses less compute power compared to getMangas()

import { MangaNato } from '@specify_/mangascraper';

const manganato = new MangaNato();

manganato.getMangasFromGenre('Comedy', {}, (err, mangas) => {
  console.log(mangas);
});

Get a list of manga

import { Mangahasu } from '@specify_/mangascraper';

const mangahasu = new Mangahasu();

mangahasu.search(null, null, (err, mangas) => {
  console.log(mangas);
});

Get the metadata of Attack on Titan manga

import { Mangahasu } from '@specify_/mangascraper';

const mangahasu = new Mangahasu();

mangahasu.getMangaMeta('https://mangahasu.se/shingeki-no-kyojin-v6-p27286.html', (err, meta) => {
  console.log(meta);
});

Get the pages of the first chapter (index 0) in the Attack on Titan chapters array.

import { Mangahasu } from '@specify_/mangascraper';

const mangahasu = new Mangahasu();

(async () => {
  const mangas = await mangahasu.search('Attack on Titan');
  const meta = await mangahasu.getMangaMeta(mangas[0].url);
  const pages = await mangahasu.getPages(meta.chapters[0].url);

  console.log(pages);
})();

Get a list of manga that match the title the melancholy of haruhi suzumiya, and open puppeteer in headful mode (useful for debugging).

import { MangaSee } from '@specify_/mangascraper';

const mangasee = new MangaSee({ debug: true }); // Opens puppeteer in headful mode

(async () => {
  const mangas = await mangasee.search('the melancholy of haruhi suzumiya');
  console.log(mangas);
})();

Get all mangas from the MangaSee directory.

import { MangaSee } from '@specify_/mangascraper';

const mangasee = new MangaSee();

(async () => {
  const mangas = await mangasee.directory();
  console.log(mangas);
})();

Get the metadata of the Berserk manga

import { MangaSee } from '@specify_/mangascraper';

const mangasee = new MangaSee();

(async () => {
  const berserk = await mangasee.getMangaMeta('https://mangasee123.com/manga/Berserk');
  console.log(berserk);
})();

Get the Chapter 363 pages of the Berserk manga

import { MangaSee } from '@specify_/mangascraper';

const mangasee = new MangaSee();

(async () => {
  const chapter363 = await mangasee.getPages('https://mangasee123.com/read-online/Berserk-chapter-363-index-2.html');
  console.log(chapter363);
})();

Search for a manga that matches the title noragami.

Get the first result and get the meta

Then get the pages of the latest chapter

import puppeteer from 'puppeteer';
import { MangaPark, initPuppeteer } from '@specify_/mangascraper';

(async () => {
  const browser = await puppeteer.launch(initPuppeteer);
  const mangapark = new MangaPark({ puppeteerInstance: { instance: 'custom', browser } });

  const mangas = await mangapark.search('noragami');
  const meta = await mangapark.getMangaMeta(mangas[0].url);
  const pages = await mangapark.getPages(meta.chapters[meta.chapters.recentlyUpdated][0].pages);

  console.log(pages);
})();

Get 50 of the most viewed mangas

import { ReadMng } from '@specify_/mangascraper';

(async () => {
  const readmng = new ReadMng();

  const mangas = await readmng.search();

  console.log(mangas);
})();

For React JS

Get pages and display them on a webpage. Note that the getMangaMeta method of this class requires puppeteer, so if you need the manga meta, consider fetching it from a custom API that uses the mangascraper package.

import React from 'react';
import { MangaBox } from '@specify_/mangascraper';

const mangabox = new MangaBox();

const App: React.FC = () => {
  const [pages, setPages] = React.useState<string[]>([]);

  React.useEffect(() => {
    mangabox
      .getPages('https://mangabox.org/manga/solo-leveling-manhua-manga/chapter-159/')
      .then((pages) => setPages(pages))
      .catch((e) => console.error(e));
  }, []);

  return (
    <div>
      {pages.map((page) => (
        <img key={page} src={page} />
      ))}
    </div>
  );
};

export default App;
