@dscvr-one/link-preview-js
v0.1.0
JavaScript module to extract and fetch HTTP link information from blocks of text.
Before creating an issue
It's more than likely that nothing is wrong with the library:
- It's very simple: fetch HTML, parse HTML, and search for OpenGraph HTML tags.
- Unless HTML or the OpenGraph standard changes, the library will not break.
- If the target website redirects you to a login page, the preview will fail, because the library will parse the login page instead.
- If the target website does not have OpenGraph tags, the preview will most likely fail. There are some fallbacks, but in general it will not work.
- You cannot preview (fetch) another web page from YOUR web page. Browsers block cross-origin requests as an intentional security feature (the same-origin policy, which CORS governs).
If you use this library and find it useful, please consider sponsoring me. Open source takes a lot of time and effort.
Link Preview
Allows you to extract information from an HTTP URL/link (or parse an HTML string) and retrieve meta information such as title, description, images, videos, etc. via OpenGraph tags.
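To make the idea concrete, here is a minimal sketch of what "retrieving meta information via OpenGraph tags" can look like on an HTML string. The function name extractOpenGraph is hypothetical and is not part of this library; the regular expressions are a simplification (they break on attribute values containing ">"), and a real HTML parser would be more robust.

```javascript
// Hypothetical helper, not the library's implementation.
// Collects <meta property="og:*" content="..."> pairs from an HTML string.
function extractOpenGraph(html) {
  const tags = Object.create(null);

  // Reads one attribute value, accepting double or single quotes.
  const attr = (tag, name) => {
    const pattern = new RegExp(`\\b${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`, "i");
    const match = pattern.exec(tag);
    return match ? (match[1] ?? match[2]) : null;
  };

  for (const [tag] of html.matchAll(/<meta\b[^>]*>/gi)) {
    const property = attr(tag, "property");
    const content = attr(tag, "content");
    if (property && property.startsWith("og:") && content !== null) {
      const key = property.slice(3); // "og:title" -> "title"
      if (!(key in tags)) tags[key] = content; // keep the first occurrence
    }
  }
  return tags;
}

const sampleHtml = `
<html><head>
  <meta property="og:title" content="It's a test">
  <meta content="https://example.com/a.png" property='og:image'>
  <meta name="description" content="not an OpenGraph tag">
</head></html>`;

console.debug(extractOpenGraph(sampleHtml));
```

A page without any og: tags yields an empty object here, which mirrors why previews of such pages tend to fail.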
GOTCHAs
- You cannot request a different domain from your web app (browsers block cross-origin requests under the same-origin policy). This library therefore works on Node.js and certain mobile runtimes (Cordova or React Native).
- This library acts as if the user were visiting the page, so sites might redirect you to sign-up pages, consent screens, etc. You can try changing the user-agent header (try google-bot or Twitterbot), but you need to work around these issues yourself.
API
getLinkPreview: pass a string, either a bare URL or a piece of text that contains a URL. The library parses it and returns the info for the first valid HTTP(S) URL it finds.
getPreviewFromContent: useful for passing a pre-fetched Response object from an existing async/etc. call. Refer to the example below for the required object values.
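As an illustration of the "first valid HTTP(S) URL in a piece of text" behaviour described for getLinkPreview, the following sketch scans whitespace-separated tokens and returns the first one that parses as an http or https URL. The name findFirstHttpUrl is hypothetical, and the library's own detection logic may differ (for example in how it treats trailing punctuation).

```javascript
// Hypothetical helper, not the library's implementation.
// Returns the first token that parses as an http(s) URL, or null.
function findFirstHttpUrl(text) {
  for (const token of text.split(/\s+/)) {
    if (!/^https?:\/\//i.test(token)) continue; // skip words and other schemes
    try {
      return new URL(token).href; // throws on malformed URLs
    } catch {
      // not a valid URL, keep looking
    }
  }
  return null;
}

console.debug(
  findFirstHttpUrl(
    "This is a text supposed to be parsed and the first link displayed https://www.youtube.com/watch?v=MejbOFk7H6c"
  )
);
```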
import { getLinkPreview, getPreviewFromContent } from "link-preview-js";
// pass the link directly
getLinkPreview("https://www.youtube.com/watch?v=MejbOFk7H6c").then((data) =>
  console.debug(data)
);
////////////////////////// OR //////////////////////////
// pass a chunk of text
getLinkPreview(
  "This is a text supposed to be parsed and the first link displayed https://www.youtube.com/watch?v=MejbOFk7H6c"
).then((data) => console.debug(data));
////////////////////////// OR //////////////////////////
// pass a pre-fetched response object
// The passed response object should include, at minimum:
// {
//   data: '<!DOCTYPE...><html>...', // response content
//   headers: {
//     ...
//     // should include content-type
//     "content-type": "text/html; charset=ISO-8859-1",
//     ...
//   },
//   url: 'https://domain.com/' // resolved url
// }
yourAjaxCall(url, (response) => {
  getPreviewFromContent(response).then((data) => console.debug(data));
});
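If your HTTP client returns something other than the shape above, a small adapter can build it. The sketch below is one way to do that; toPreviewInput is a hypothetical helper, and lowercasing header names is an assumption made to match the content-type key shown in the comment, not a documented requirement.

```javascript
// Hypothetical adapter, not part of the library.
// Builds the { data, headers, url } object from already-fetched parts.
function toPreviewInput(url, headers, body) {
  const normalized = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = value; // assumption: lowercase keys
  }
  if (!normalized["content-type"]) {
    throw new TypeError("a content-type header is required");
  }
  return { data: body, headers: normalized, url };
}

const previewInput = toPreviewInput(
  "https://domain.com/",
  { "Content-Type": "text/html; charset=ISO-8859-1" },
  "<!DOCTYPE html><html></html>"
);
console.debug(previewInput);
```

The resulting object could then be handed to getPreviewFromContent in place of response.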
Options
Additionally, you can pass an options object as the second argument to control how the link is fetched and parsed:
| Property Name | Result |
| --- | :---: |
| imagesPropertyType (optional) (ex: 'og') | Fetches images only with the specified property, meta[property='${imagesPropertyType}:image'] |
| headers (optional) (ex: { 'user-agent': 'googlebot', 'Accept-Language': 'en-US' }) | Adds request headers to the fetch call |
| timeout (optional) (ex: 1000) | Timeout for the request to fail |
| followRedirects (optional) (default 'error') | For security reasons, the library does not follow redirects by default ('error' value), because a malicious agent can exploit redirects to steal data. Possible values: 'error', 'follow', 'manual' |
| handleRedirects (optional) (with followRedirects 'manual') | When followRedirects is set to 'manual', you need to pass a function that validates whether the redirection is secure; see the Redirections section for an example |
| resolveDNSHost (optional) | Function that resolves the final address of the detected/parsed URL to prevent SSRF attacks |
getLinkPreview("https://www.youtube.com/watch?v=MejbOFk7H6c", {
  imagesPropertyType: "og", // fetches only open-graph images
  headers: {
    "user-agent": "googlebot", // fetches with googlebot crawler user agent
    "Accept-Language": "fr-CA", // fetches site for French language
    // ...other optional HTTP request headers
  },
  timeout: 1000
}).then(data => console.debug(data));
SSRF Concerns
Making requests on behalf of your users, or using user-provided URLs, is dangerous. One such attack is to supply a domain that redirects to localhost, so that the user receives content from your own server (this does not affect mobile runtimes). To mitigate this attack you can use the resolveDNSHost option:
// example of how to use node's dns resolver
const dns = require("node:dns");
getLinkPreview("http://maliciousLocalHostRedirection.com", {
  resolveDNSHost: async (url: string) => {
    return new Promise((resolve, reject) => {
      const hostname = new URL(url).hostname;
      dns.lookup(hostname, (err, address, family) => {
        if (err) {
          reject(err);
          return;
        }
        resolve(address); // if address resolves to localhost or '127.0.0.1' library will throw an error
      });
    });
  },
}).catch((e) => {
  // will throw a detected redirection to localhost
});
This might add some latency to your request but prevents loopback attacks.
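The text above only mentions localhost and 127.0.0.1. If you also want to reject other internal ranges yourself, a check like the following could be run on the resolved address before handing it back. This is a sketch under assumptions: isInternalAddress is a hypothetical helper, the list of ranges is illustrative rather than exhaustive, and whether rejecting inside resolveDNSHost surfaces as a rejected getLinkPreview promise is something to verify against the library.

```javascript
const net = require("node:net");

// Hypothetical helper, not part of the library.
// Returns true for loopback, private, and link-local addresses,
// and for anything that is not an IP literal at all.
function isInternalAddress(address) {
  const version = net.isIP(address); // 0 = not an IP, 4 = IPv4, 6 = IPv6
  if (version === 4) {
    const [a, b] = address.split(".").map(Number);
    return (
      a === 0 ||
      a === 10 ||
      a === 127 ||
      (a === 169 && b === 254) ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168)
    );
  }
  if (version === 6) {
    const lower = address.toLowerCase();
    // IPv4-mapped form such as ::ffff:127.0.0.1
    if (lower.startsWith("::ffff:") && net.isIP(lower.slice(7)) === 4) {
      return isInternalAddress(lower.slice(7));
    }
    return (
      lower === "::" ||
      lower === "::1" ||
      lower.startsWith("fe80:") || // common link-local prefix
      lower.startsWith("fc") ||    // unique local fc00::/7
      lower.startsWith("fd")
    );
  }
  return true; // not an IP literal: treat as unsafe
}

console.debug(isInternalAddress("127.0.0.1"), isInternalAddress("8.8.8.8"));
```

Inside the dns.lookup callback you could call reject(new Error("internal address")) when isInternalAddress(address) is true, and resolve(address) otherwise.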
Redirections
As with SSRF, following redirections is dangerous, so by default the library errors when the response tries to redirect the user. Some simple redirections are valid, however (e.g. HTTP to HTTPS), and you might want to allow them. You can do so via:
await getLinkPreview(`http://google.com/`, {
  followRedirects: `manual`,
  handleRedirects: (baseURL: string, forwardedURL: string) => {
    const urlObj = new URL(baseURL);
    const forwardedURLObj = new URL(forwardedURL);
    if (
      forwardedURLObj.hostname === urlObj.hostname ||
      forwardedURLObj.hostname === "www." + urlObj.hostname ||
      "www." + forwardedURLObj.hostname === urlObj.hostname
    ) {
      return true;
    } else {
      return false;
    }
  },
});
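The validator above can also be written as a standalone function, which makes it easy to test before wiring it into handleRedirects. The sketch below is a variant, not the library's logic: isSafeRedirect is a hypothetical name, the www. comparison is expressed by stripping the prefix on both sides, and the protocol checks (http/https only, no downgrade from HTTPS to HTTP) are additions of this example.

```javascript
// Hypothetical validator, usable as a handleRedirects callback.
function isSafeRedirect(baseURL, forwardedURL) {
  const from = new URL(baseURL);
  const to = new URL(forwardedURL);

  const stripWww = (host) => host.replace(/^www\./, "");
  const sameSite = stripWww(from.hostname) === stripWww(to.hostname);

  // Additions of this sketch: only web protocols, and never HTTPS -> HTTP.
  const webProtocol = to.protocol === "http:" || to.protocol === "https:";
  const downgrade = from.protocol === "https:" && to.protocol === "http:";

  return sameSite && webProtocol && !downgrade;
}

console.debug(isSafeRedirect("http://google.com/", "https://www.google.com/"));
```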
Response
Returns a Promise that resolves with an object describing the provided link. The info object returned varies depending on the content type (MIME type) returned in the HTTP response (see below for the variations). Rejects with an error if the response cannot be parsed or if there was no URL in the text provided.
Text/HTML URL
{
  url: "https://www.youtube.com/watch?v=MejbOFk7H6c",
  title: "OK Go - Needing/Getting - Official Video - YouTube",
  siteName: "YouTube",
  description: "Buy the video on iTunes: https://itunes.apple.com/us/album/needing-getting-bundle-ep/id508124847 See more about the guitars at: http://www.gretschguitars.com...",
  images: ["https://i.ytimg.com/vi/MejbOFk7H6c/maxresdefault.jpg"],
  mediaType: "video.other",
  contentType: "text/html",
  charset: "utf-8",
  videos: [],
  favicons: ["https://www.youtube.com/yts/img/favicon_32-vflOogEID.png", "https://www.youtube.com/yts/img/favicon_48-vflVjB_Qk.png", "https://www.youtube.com/yts/img/favicon_96-vflW9Ec0w.png", "https://www.youtube.com/yts/img/favicon_144-vfliLAfaB.png", "https://s.ytimg.com/yts/img/favicon-vfl8qSV2F.ico"]
}
Image URL
{
  url: "https://media.npr.org/assets/img/2018/04/27/gettyimages-656523922nunes-4bb9a194ab2986834622983bb2f8fe57728a9e5f-s1100-c15.jpg",
  mediaType: "image",
  contentType: "image/jpeg",
  favicons: [ "https://media.npr.org/favicon.ico" ]
}
Audio URL
{
  url: "https://ondemand.npr.org/anon.npr-mp3/npr/atc/2007/12/20071231_atc_13.mp3",
  mediaType: "audio",
  contentType: "audio/mpeg",
  favicons: [ "https://ondemand.npr.org/favicon.ico" ]
}
Video URL
{
  url: "https://www.w3schools.com/html/mov_bbb.mp4",
  mediaType: "video",
  contentType: "video/mp4",
  favicons: [ "https://www.w3schools.com/favicon.ico" ]
}
Application URL
{
  url: "https://assets.curtmfg.com/masterlibrary/56282/installsheet/CME_56282_INS.pdf",
  mediaType: "application",
  contentType: "application/pdf",
  favicons: [ "https://assets.curtmfg.com/favicon.ico" ]
}
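Because the shape of the result depends on the content type, consumer code usually needs to branch on it. The sketch below shows one possible way to normalise the variations above into a small display object. describePreview and the kind/label/thumbnail fields are hypothetical names chosen for this example; only the input fields (url, title, images, mediaType, contentType) come from the responses shown above.

```javascript
// Hypothetical consumer-side helper, not part of the library.
function describePreview(preview) {
  const [type] = (preview.contentType || "").split("/");
  if (type === "text") {
    // HTML pages carry title/images; mediaType holds the OpenGraph type.
    return {
      kind: "page",
      label: preview.title || preview.url,
      thumbnail: preview.images?.[0] ?? null,
    };
  }
  // Image, audio, video and application URLs only carry the basics.
  return {
    kind: preview.mediaType,
    label: preview.url,
    thumbnail: type === "image" ? preview.url : null,
  };
}

console.debug(
  describePreview({
    url: "https://www.w3schools.com/html/mov_bbb.mp4",
    mediaType: "video",
    contentType: "video/mp4",
    favicons: ["https://www.w3schools.com/favicon.ico"],
  })
);
```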
Ship.js Automated Release(s) 🏗
- Once all features/bugfixes are deployed on main, run npm run release. Ship.js will trigger a build with an updated CHANGELOG and proper git tags.
- Follow the guide from the automated PR from Ship.js.
- Once you Squash & Merge the automated PR, wait for the Ship.js trigger workflow to run successfully.
Branching Strategy 🎋
- Create your feature branch from the main branch, e.g. chore/DCP-123-update-config
- Create a new PR from chore/DCP-123-update-config to main
- Once the PR is merged into main, follow the Ship.js Automated Release(s) section
Contributing
- Create your feature branch from main (git checkout -b chore/DCP-123-update-config)
- Commit your changes (git commit -Sam 'feat: add feature')
- Push to the branch (git push origin chore/DCP-123-update-config)
- Create a new Pull Request
Note:
- Please contribute using GitHub Flow
- Commits & PRs will be allowed only if the commit messages & PR titles follow the Conventional Commits standard
- PS. Ensure your commits are signed.
License
MIT license