meta-scrapper
v1.0.1
Meta tag scrapper. Minimal. Fast. Easy to use.
This library parses the meta tags of websites. To extract information from a site, call the methods described below and pass them a link.
The library was created for the linkmarker application but was split out into its own package, since there are currently few comparable packages on NPM.
Warning
Because of CORS, data cannot be parsed on the client side, so use this library on the backend.
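One way to work with this restriction is to expose a small endpoint on your own server and let the browser call that instead. The following is a minimal, framework-agnostic sketch. `handleMetaRequest` and the `?url=` query parameter are hypothetical, and `scrapMeta` is passed in as an argument so the sketch makes no assumption about how the package is imported.

```javascript
// Sketch of a server-side handler (hypothetical, not part of the library).
// `scrapMeta` is supplied by the caller; `requestUrl` is the incoming
// request path, e.g. "/meta?url=telegram.org".
async function handleMetaRequest(scrapMeta, requestUrl) {
  // A base is needed because the request path is relative.
  const target = new URL(requestUrl, 'http://localhost').searchParams.get('url');
  if (!target) {
    return { status: 400, body: { error: 'Missing "url" query parameter' } };
  }
  try {
    // The scraping runs on the server, so the browser never performs
    // the cross-origin request itself.
    const meta = await scrapMeta(target);
    return { status: 200, body: { title: meta.title, icons: meta.iconList } };
  } catch (err) {
    // Report scraping failures as an upstream error.
    return { status: 502, body: { error: String(err) } };
  }
}
```

The returned `status` and `body` can then be written to the response object of whichever HTTP framework the backend uses.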
How to use?
Installation is very simple:
pnpm i js-meta-parser # For PNPM
npm i js-meta-parser # For NPM
yarn add js-meta-parser # For YARN
The module is available as both CJS and ESM.
import {scrapMeta} from 'js-meta-parser';
// OR
const scrapMeta = require('js-meta-parser');
All modules ship with TypeScript declarations 😌
Examples
Medium
Suppose you want to parse all the meta information from Medium:
import scrapMeta from 'js-meta-parser';
const mediumMeta = scrapMeta('medium.com');
mediumMeta.then(meta => {
  console.log(meta.info);
});
// Output
{
  title: 'Medium – Where good ideas find you.',
  url: URL {
    href: 'https://medium.com/',
    origin: 'https://medium.com',
  },
  descriptionList: [
    'Medium is an open platform where readers find dynamic thinking, and where expert and undiscovered voices can share their writing on any topic.',
    'Medium is an open platform where readers find dynamic thinking, and where expert and undiscovered voices can share their writing on any topic.'
  ],
  iconList: [
    'https://miro.medium.com/1*m-R_BkNf1Qjr1YbyOIJY2w.png',
    'https://miro.medium.com/fit/c/152/152/1*sHhtYhaCe2Uc3IU0IgKwIQ.png'
  ],
  preview: 'https://miro.medium.com/fit/c/152/152/1*sHhtYhaCe2Uc3IU0IgKwIQ.png',
  themeColor: '#000000',
  locale: 'en_US',
  siteName: 'Medium',
  appId: null,
  type: 'website'
}
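Both `descriptionList` entries in this output are identical, so callers may want to collapse duplicates before displaying them. A minimal sketch, assuming the result has the shape printed above. `summarizeMeta` is a hypothetical helper and not part of the library, and the sample description is shortened for brevity.

```javascript
// Hypothetical helper, not part of the library: condenses a result shaped
// like the Medium output into a few display-ready fields.
function summarizeMeta(info) {
  // Drop repeated descriptions (the default and OG tags often match).
  const descriptions = [...new Set(info.descriptionList ?? [])];
  const icons = info.iconList ?? [];
  return {
    title: info.title ?? null,
    description: descriptions[0] ?? null,
    // Prefer the preview image and fall back to the first icon.
    image: info.preview ?? icons[0] ?? null,
    themeColor: info.themeColor ?? null,
  };
}

// Sample data modelled on the Medium output (description shortened).
const sample = {
  title: 'Medium – Where good ideas find you.',
  descriptionList: ['Medium is an open platform.', 'Medium is an open platform.'],
  iconList: ['https://miro.medium.com/1*m-R_BkNf1Qjr1YbyOIJY2w.png'],
  preview: 'https://miro.medium.com/fit/c/152/152/1*sHhtYhaCe2Uc3IU0IgKwIQ.png',
  themeColor: '#000000',
};

console.log(summarizeMeta(sample));
```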
Telegram
As with the previous site, this one is just as easy:
const tgMeta = scrapMeta(new URL('https://telegram.org/'));
tgMeta.then(meta => {
  // We can also read individual fields one by one
  console.log(
    meta.title,
    meta.type,
    meta.locale,
    meta.descriptionList,
    meta.iconList,
    // ...
  );
});
// Output
'Telegram Messenger', null, 'en_US', [ 'Fast. Secure. Powerful.' ],
[
  'https://telegram.org/img/website_icon.svg?4',
  'https://telegram.org/img/apple-touch-icon.png'
]
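Because `scrapMeta` returns a promise (both examples call `.then` on it), several sites can be scraped concurrently. The sketch below assumes only that behaviour. `scrapMany` is a hypothetical helper, and `scrapMeta` is passed in as an argument rather than imported.

```javascript
// Hypothetical helper, not part of the library: scrapes several links
// concurrently and keeps going when one of them fails.
async function scrapMany(scrapMeta, links) {
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(
    links.map(async (link) => scrapMeta(link))
  );
  // Pair every outcome with the link it belongs to.
  return settled.map((result, i) =>
    result.status === 'fulfilled'
      ? { link: links[i], title: result.value.title }
      : { link: links[i], error: String(result.reason) }
  );
}
```

`Promise.allSettled` is used instead of `Promise.all` so that a single unreachable site does not discard the results of the others.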
Available tags
The library is under active development, but all the main tags have been checked to work and are covered by tests. The following tags are available:
- Title
- Description (Default + OG)
- Icon (Default + OG)
- Type (OG)
- Site Name (OG)
- Preview (Default)
- Theme (Meta)
- Full URL
- Manifest parsing
- App ID (FB)
- Locale (OG)
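For orientation, the sketch below shows how some of these tags conventionally appear in a page's HTML. It is illustrative only and is not the library's implementation. `TAG_SOURCES` and `readTag` are hypothetical names, the attribute names follow the Open Graph protocol and common HTML conventions, and the regular expression handles only simple, double-quoted attributes where `property` or `name` appears before `content`.

```javascript
// Conventional markup behind some of the listed tags (assumed mapping,
// not taken from the library's source).
const TAG_SOURCES = {
  type: { attr: 'property', key: 'og:type' },
  siteName: { attr: 'property', key: 'og:site_name' },
  locale: { attr: 'property', key: 'og:locale' },
  appId: { attr: 'property', key: 'fb:app_id' },
  themeColor: { attr: 'name', key: 'theme-color' },
};

// Returns the content of the matching <meta> tag, or null when absent.
function readTag(html, field) {
  const { attr, key } = TAG_SOURCES[field];
  const pattern = new RegExp(
    `<meta[^>]*\\b${attr}="${key}"[^>]*\\bcontent="([^"]*)"`,
    'i'
  );
  const match = html.match(pattern);
  return match ? match[1] : null;
}

// A tiny sample page head used to exercise the sketch.
const page = `
  <meta property="og:type" content="website">
  <meta property="og:locale" content="en_US">
  <meta name="theme-color" content="#000000">
`;

console.log(readTag(page, 'type'), readTag(page, 'locale'), readTag(page, 'appId'));
```

A real scraper would normally use an HTML parser rather than regular expressions, since attribute order and quoting vary between sites.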