purlfy

v0.3.11

The ultimate URL purifier.

pURLfy

[!NOTE] Did you know that the name "pURLfy" is a combination of "purify" and "URL"? It can be pronounced as pjuɑrelfaɪ.

🪄 Features

pURLfy is usually used for purifying URLs: removing redundant tracking parameters, skipping redirect pages, and extracting the link that really matters. However, it is not limited to this. It is a rule-based tool for transforming URLs in general; other use cases include replacing the domain name or redirecting to an alternative of the given URL. It features:

  • ⚡ Fast: Purify URLs quickly and efficiently.
  • 🪶 Lightweight: Zero dependencies; the minified script is less than 4 KB.
  • 📃 Rule-based: Perform purification based on rules, making it more flexible.
  • 🔄️ Async: Calling purify won't block your thread.
  • 🔁 Iterative purification: If the URL still contains tracking parameters after a single purification (e.g. URLs returned by redirect rules), it will continue to be purified.
  • 📊 Statistics: You can track statistics of the purification process, including the number of links purified, the number of parameters removed, the number of URLs decoded, the number of URLs redirected, and the number of characters deleted, etc.

🤔 Usage

🚀 Quick Start

Visit our demo page, try out our Tampermonkey script, or simply run node cli.js <url[]> [<options>] to purify a list of URLs (for more information, refer to the comments in the script).

// Import the `Purlfy` class by whatever means suits your platform, e.g. from https://cdn.jsdelivr.net/gh/PRO-2684/pURLfy@latest/purlfy.min.js
const purifier = new Purlfy({ // Instantiate a Purlfy object
    fetchEnabled: true,
    lambdaEnabled: true,
});
const rules = await (await fetch("https://cdn.jsdelivr.net/gh/PRO-2684/pURLfy-rules@core-0.3.x/<ruleset>.json")).json(); // Rules
// You may also use the GitHub raw link to get the very latest rules: https://raw.githubusercontent.com/PRO-2684/pURLfy-rules/core-0.3.x/<ruleset>.json
const additionalRules = {}; // You can also add your own rules
purifier.importRules(rules, additionalRules); // Import rules
purifier.addEventListener("statisticschange", e => { // Add an event listener for statistics change
    console.log("Statistics increment:", e.detail); // Only available in platforms that support `CustomEvent`
    console.log("Current statistics:", purifier.getStatistics());
});
purifier.purify("https://example.com/?utm_source=123").then(console.log); // Purify a URL

Here's a list of test URLs that you can use to test pURLfy:

  • Bilibili's short link: https://b23.tv/SI6OEcv
  • Ordinary Tieba link: https://tieba.baidu.com/p/7989575070?share=none&fr=none&see_lz=none&share_from=none&sfc=none&client_type=none&client_version=none&st=none&is_video=none&unique=none
  • MC Wiki's external link: https://link.mcmod.cn/target/aHR0cHM6Ly9naXRodWIuY29tL3dheTJtdWNobm9pc2UvQmV0dGVyQWR2YW5jZW1lbnRz
  • Bing's search result: https://www.bing.com/ck/a?!&&p=de70ef254652193fJmltdHM9MTcxMjYyMDgwMCZpZ3VpZD0wMzhlNjdlMy1mN2I2LTZmMDktMGE3YS03M2JlZjZhMzZlOGMmaW5zaWQ9NTA2Nw&ptn=3&ver=2&hsh=3&fclid=038e67e3-f7b6-6f09-0a7a-73bef6a36e8c&psq=anti&u=a1aHR0cHM6Ly9nby5taWNyb3NvZnQuY29tL2Z3bGluay8_bGlua2lkPTg2ODkyMg&ntb=1
  • A URL nested too many times that cannot be opened normally: https://www.minecraftforum.net/linkout?remoteUrl=https%3A%2F%2Fwww.urlshare.cn%2Fumirror_url_check%3Furl%3Dhttps%253A%252F%252Fc.pc.qq.com%252Fmiddlem.html%253Fpfurl%253Dhttps%25253A%25252F%25252Fgithub.com%25252Fjiashuaizhang%25252Frpc-encrypt%25253Futm_source%25253Dtest

📚 API

Constructor

new Purlfy({
    fetchEnabled: Boolean, // Enable the redirect mode (default: false)
    lambdaEnabled: Boolean, // Enable the lambda mode (default: false)
    maxIterations: Number, // Maximum number of iterations (default: 5)
    statistics: { // Initial statistics
        url: Number, // Number of links purified
        param: Number, // Number of parameters removed
        decoded: Number, // Number of URLs decoded (`param` mode)
        redirected: Number, // Number of URLs redirected (`redirect` mode)
        visited: Number, // Number of URLs visited (`visit` mode)
        char: Number, // Number of characters deleted
    },
    log: Function, // Log function (default is using `console.log` for output)
    fetch: async Function, // Function to fetch the given URL, should at least support `method`, `headers` and `redirect` in `options` parameter (default is using `fetch`)
})
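
Since the constructor accepts a custom fetch function, one possible use is wrapping the platform's fetch with a timeout. The sketch below is only an illustration: withTimeout is a hypothetical helper (not part of pURLfy), and it assumes a runtime that provides AbortSignal.timeout.

```javascript
// Hypothetical helper, not part of pURLfy: wraps any fetch-like function so that
// every request is aborted after `ms` milliseconds.
function withTimeout(fetchFn, ms) {
    return (url, options = {}) =>
        // Keep the caller's options (`method`, `headers`, `redirect`, ...) and add a signal.
        fetchFn(url, { ...options, signal: AbortSignal.timeout(ms) });
}

// Possible usage, assuming `Purlfy` has been imported as in the Quick Start:
// const purifier = new Purlfy({ fetchEnabled: true, fetch: withTimeout(fetch, 5000) });
```

The wrapper overrides any signal the caller passes in, which is acceptable for a sketch but may need merging in real code.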

Instance Methods

  • importRules(...rulesets: object[]): void: Import a series of rulesets.
  • purify(url: string): Promise<object>: Purify a URL.
    • url: The URL to be purified.
    • Returns a Promise that resolves to an object containing:
      • url: string: The purified URL.
      • rule: string: The matched rule.
  • clearStatistics(): void: Clear statistics.
  • clearRules(): void: Clear all imported rules.
  • getStatistics(): object: Get statistics.
  • addEventListener("statisticschange", callback: function): void: Add an event listener for statistics change.
    • The callback function will receive a CustomEvent or a plain Event object, depending on whether the platform supports CustomEvent.
    • If platform supports CustomEvent, the detail property of the event object will contain the incremental statistics.
  • removeEventListener("statisticschange", callback: function): void: Remove an event listener for statistics change.

Instance Properties

You can change these properties after instantiation, and they will take effect for the next call to purify.

  • fetchEnabled: Boolean: Whether the redirect mode is enabled.
  • lambdaEnabled: Boolean: Whether the lambda mode is enabled.
  • maxIterations: Number: Maximum number of iterations.

Static Properties

  • Purlfy.version: string: The version of pURLfy.

📖 Rulesets

Community-contributed rulesets are hosted on GitHub, and you can find them at pURLfy-rules. The format of a ruleset file is as follows:

{
    "<domain>": {
        "<path>": {
            // A single rule
            "description": "<description>",
            "mode": "<mode>",
            // Other parameters
            "author": "<author>"
        },
        // ...
    },
    // ...
}

Formal definition of the format can be found at ruleset.schema.json.

✅ Path Matching

<domain>, <path>: The domain and a part of the path, such as example.com/, /^.+\.example\.com$, path/ and page. They are interpreted as follows:

  • The basic behavior is like paths on Unix file systems.
    • If a key does not end with /, its value is treated as a rule.
    • If a key ends with /, there are more paths under it, like "folders" (theoretically, you can nest infinitely).
    • / is not allowed in the middle of <domain> or <path>.
  • If a key starts with /, it is treated as a RegExp pattern.
    • For example, /^.+\.example\.com$ will match all subdomains of example.com, and /^\d+$ will match a part of the path that contains only digits.
    • Remember to escape special characters such as . in the pattern, and to double every \ in JSON strings.
    • An empty regex (i.e. / or //) is ignored.
    • Using RegExp is not recommended unless necessary, since it slows down the matching process.
  • If a key is an empty string, it is treated as a fallback rule: this rule is used when no other rule matches at this level.
  • If multiple rules match, the best match is used (exact match > RegExp match > fallback rule).
  • If you want a rule to match all paths under a domain, you can omit <path>, but remember to remove the / after the domain.

A simple example with comments showing the URLs that can be matched:

{
    "example.com/": {
        "a": {
            // The rule here will match "example.com/a"
        },
        "path/": {
            "to/": {
                "page": {
                    // The rule here will match "example.com/path/to/page"
                },
                "/^\\d+$": { // Remember to escape `\`
                    // The rule here will match all paths under "example.com/path/to/" that are composed of digits
                },
                "": {
                    // The rule here will match "example.com/path/to", excluding "page" and digits under it
                }
            },
            "": {
                // The rule here will match "example.com/path", excluding "to" under it
            }
        },
        "": {
            // The rule here will match "example.com", excluding "path" under it
        }
    },
    "example.org": {
        // The rule here will match every path under "example.org"
    },
    "": {
        // Fallback: this rule will be used for all paths that are not matched
    }
}

Here's an erroneous example:

{
    "example.com/": {
        "path/": { // Path ending with `/` will be treated as a "directory", thus you should remove the trailing `/`
            // Attempting to match "example.com/path"
        }
    },
    "example.org": { // Path not ending with `/` will be treated as a rule, thus you should add a trailing `/`
        "page": {
            // Attempting to match "example.org/page"
        }
    },
    "example.net/": {
        "path/to/page": { // Can't contain `/` in the middle - you should nest them
            // Attempting to match "example.net/path/to/page"
        },
        "/^\d+$": { // `\d` won't parse correctly in JSON strings, so use `\\d` instead
            // Attempting to match all paths under "example.net/" that are composed of digits
        }
    }
}
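
The priority order in this section (exact match > RegExp match > fallback rule) can be sketched for a single level of a ruleset as follows. This is an illustration only, not pURLfy's actual matcher: pickRule is a hypothetical name, and the sketch handles only rule keys (those without a trailing /), not "directories".

```javascript
// Hypothetical helper, not pURLfy's implementation: choose the rule for the last
// path segment at one level of a ruleset.
function pickRule(level, segment) {
    // Only keys without a trailing "/" are rules; the others are "directories".
    const ruleKeys = Object.keys(level).filter((key) => !key.endsWith("/"));
    // 1. An exact match has the highest priority.
    if (segment !== "" && ruleKeys.includes(segment)) return level[segment];
    // 2. Keys starting with "/" are RegExp patterns (the leading "/" is stripped).
    for (const key of ruleKeys) {
        if (key.startsWith("/") && new RegExp(key.slice(1)).test(segment)) {
            return level[key];
        }
    }
    // 3. The empty key is the fallback rule, if there is one.
    return level[""] ?? null;
}

const exampleLevel = {
    "page": { description: "exact" },
    "/^\\d+$": { description: "digits" },
    "": { description: "fallback" },
};
console.log(pickRule(exampleLevel, "page").description);  // exact
console.log(pickRule(exampleLevel, "12345").description); // digits
console.log(pickRule(exampleLevel, "other").description); // fallback
```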

📃 A Single Rule

Paths not ending with / are treated as a single rule, and a rule can use one of several modes. The common parameters are as follows:

{
    "description": "<Rule Description>",
    "mode": "<Mode>",
    // Mode-specific parameters
    "author": "<Author>"
}

This table shows supported parameters for each mode:

| Param\Mode | white | black | param | regex | redirect | visit | lambda |
| ---------- | ----- | ----- | ----- | ----- | -------- | ----- | ------ |
| std | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| params | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| acts | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ |
| regex | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| replace | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| ~~ua~~ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| headers | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| lambda | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| continue | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |

🟢 Whitelist Mode white

| Param | Type | Default |
| --- | --- | --- |
| params | string[] | Required |

Under Whitelist mode, only the parameters specified in params are kept, and all others are removed. This is the most commonly used mode.
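
To make the whitelist semantics concrete, here is a minimal sketch built on the standard URL API. It is not pURLfy's implementation; applyWhitelist and the example parameter names are made up for illustration.

```javascript
// Hypothetical helper illustrating whitelist semantics; not pURLfy's implementation.
function applyWhitelist(href, params) {
    const url = new URL(href);
    // Copy the keys first, because deleting while iterating could skip entries.
    for (const name of [...url.searchParams.keys()]) {
        if (!params.includes(name)) url.searchParams.delete(name);
    }
    return url.href;
}

console.log(applyWhitelist("https://example.com/watch?v=abc&utm_source=123&share=none", ["v"]));
// https://example.com/watch?v=abc
```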

🔴 Blacklist Mode black

| Param | Type | Default |
| --- | --- | --- |
| params | string[] | Required |
| std | Boolean | false |

Under Blacklist mode, the parameters specified in params are removed, and all others are kept. std controls whether the URL's search string should be assumed to be standard: the URL is processed only if std is true or the search string is in fact standard.
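
The parameter removal itself can be sketched with the standard URL API, as below. This is an illustration rather than pURLfy's implementation: applyBlacklist is a hypothetical name, the params list is made up, and the std check is left out entirely.

```javascript
// Hypothetical helper illustrating blacklist semantics; the `std` check is omitted.
function applyBlacklist(href, params) {
    const url = new URL(href);
    // `delete` removes every occurrence of the named parameter.
    for (const name of params) url.searchParams.delete(name);
    return url.href;
}

console.log(applyBlacklist("https://tieba.baidu.com/p/7989575070?share=none&fr=none&see_lz=none", ["share", "fr"]));
// https://tieba.baidu.com/p/7989575070?see_lz=none
```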

🟤 Specific Parameter Mode param

| Param | Type | Default |
| --- | --- | --- |
| params | string[] | Required |
| acts | string[] | ["url"] |
| continue | Boolean | true |

Under Specific Parameter mode, pURLfy will:

  1. Attempt to extract the parameters specified in params in order, until the first existing parameter is matched.
  2. Decode the parameter value using the processors specified in the acts array in order (if any acts value is invalid or throws an error, it is considered a failure and the original URL is returned).
  3. Use the final result as the new URL.
  4. If continue is not set to false, purify the new URL again.
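
Steps 1 to 3 above can be sketched as follows. This is a simplified illustration, not pURLfy's code: applyParamMode is a hypothetical name, only the url processor is modeled, validating the result with new URL is an added assumption, and the re-purification in step 4 is not shown.

```javascript
// Hypothetical sketch of Specific Parameter mode; only the "url" processor is modeled.
const paramProcessors = { url: (s) => decodeURIComponent(s) };

function applyParamMode(href, rule) {
    const url = new URL(href);
    const acts = rule.acts ?? ["url"];
    // Step 1: take the first parameter from `params` that exists in the URL.
    const name = rule.params.find((p) => url.searchParams.has(p));
    if (name === undefined) return href;
    // Note: URLSearchParams already percent-decodes the value once.
    let value = url.searchParams.get(name);
    try {
        // Step 2: run the processors in order; an unknown one counts as a failure.
        for (const act of acts) {
            const fn = paramProcessors[act];
            if (!fn) return href;
            value = fn(value);
        }
        // Step 3: use the result as the new URL (validated here as an assumption).
        return new URL(value).href;
    } catch {
        return href; // On failure, the original URL is returned.
    }
}

console.log(applyParamMode(
    "https://example.com/jump?target=https%253A%252F%252Fexample.org%252Fpage%253Fid%253D1",
    { params: ["url", "target"] },
)); // https://example.org/page?id=1
```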

🟣 Regex Mode regex

| Param | Type | Default |
| --- | --- | --- |
| acts | string[] | [] |
| regex | string[] | Required |
| replace | string[] | Required |
| continue | Boolean | true |

Under Regex mode, pURLfy will, for each regex-replace pair:

  1. Match the RegExp pattern specified in regex against the URL.
  2. Replace all matched parts with the "replacement string" specified in replace.
  3. Decode the result using the processors specified in the acts array in order (if any acts value is invalid or throws an error, it is considered a failure and the original URL is returned).

If you'd like to learn more about the syntax of the "replacement string", please refer to the MDN documentation.
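
As a rough illustration of the regex-replace pairs, the sketch below applies each pair in turn using String.prototype.replace. It is not pURLfy's implementation: applyRegexMode and the example rule are hypothetical, the g flag is an assumption made so that all matches are replaced, and the acts step is left out.

```javascript
// Hypothetical sketch of Regex mode; `acts` and `continue` are not modeled.
function applyRegexMode(href, rule) {
    let result = href;
    rule.regex.forEach((source, i) => {
        // The "g" flag is an assumption, so that every match is replaced.
        result = result.replace(new RegExp(source, "g"), rule.replace[i]);
    });
    return result;
}

const exampleRegexRule = {
    regex: ["^https://example\\.com/go/(\\d+)$"], // `\` must be doubled in JSON and JS strings
    replace: ["https://example.com/item?id=$1"],  // `$1` refers to the first capture group
};
console.log(applyRegexMode("https://example.com/go/42", exampleRegexRule));
// https://example.com/item?id=42
```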

🟡 Redirect Mode redirect

[!CAUTION] For compatibility reasons, the redirect mode is disabled by default. Refer to the API documentation for enabling it.

| Param | Type | Default |
| --- | --- | --- |
| ~~ua~~ | string | undefined |
| headers | object | {} |
| continue | Boolean | true |

Under Redirect mode, pURLfy calls the fetch function passed to the constructor to get the redirected URL: it fires a HEAD request at the matched URL, using headers as the request headers, and returns the Location header or the updated response.url. If continue is not set to false, the new URL will be purified again.

Note: The ua parameter will be deprecated in the future; use headers to set the User-Agent header instead.
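
The request described above might look roughly like the following sketch. It is not pURLfy's code: resolveRedirect is a hypothetical name, using redirect: "manual" is an assumption, and fakeFetch is a stub standing in for the network so that the example runs offline.

```javascript
// Hypothetical helper, not pURLfy's implementation. `fetchFn` is any fetch-like
// function that supports `method`, `headers` and `redirect`.
async function resolveRedirect(href, headers = {}, fetchFn = fetch) {
    const response = await fetchFn(href, { method: "HEAD", headers, redirect: "manual" });
    const location = response.headers.get("location");
    // `Location` may be relative, so resolve it against the request URL.
    if (location) return new URL(location, href).href;
    // Otherwise fall back to the (possibly updated) response URL.
    return response.url || href;
}

// Stub response: a Map provides the `get` method used above.
const fakeFetch = async (url) => ({
    url,
    headers: new Map([["location", "/video/BV1xx"]]),
});
resolveRedirect("https://short.example/abc", { "User-Agent": "curl/8.0" }, fakeFetch)
    .then(console.log); // https://short.example/video/BV1xx
```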

🟠 Visit Mode visit

[!CAUTION] For compatibility reasons, the visit mode is disabled by default. Refer to the API documentation for enabling it.

| Param | Type | Default |
| --- | --- | --- |
| ~~ua~~ | string | undefined |
| headers | object | {} |
| acts | string[] | ["regex:<url_pattern>"] |
| continue | Boolean | true |

Under Visit mode, pURLfy will visit the URL with headers as the request headers. If the URL has not been redirected, it will call the processors specified in acts in order (<url_pattern> is https?:\/\/.(?:www\.)?[-a-zA-Z0-9@%._\+~#=]{2,256}\.[a-z]{2,6}\b(?:[-a-zA-Z0-9@:%_\+.~#?!&\/\/=]*)); the initial input to acts is a string, i.e. the text returned by visiting the URL. If the URL has been redirected, the redirected URL will be returned. If continue is not set to false, the new URL will be purified again.

Note: The ua parameter will be deprecated in the future; use headers to set the User-Agent header instead.
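
A simplified sketch of this flow with the default acts is shown below. It is an illustration under assumptions, not pURLfy's code: visitUrl is a hypothetical name, redirection is detected through response.redirected, the original URL is returned when nothing matches, and fakePage is a stub so that the example runs offline.

```javascript
// The default <url_pattern> from the text above.
const URL_PATTERN = /https?:\/\/.(?:www\.)?[-a-zA-Z0-9@%._\+~#=]{2,256}\.[a-z]{2,6}\b(?:[-a-zA-Z0-9@:%_\+.~#?!&\/\/=]*)/;

// Hypothetical helper, not pURLfy's implementation.
async function visitUrl(href, headers = {}, fetchFn = fetch) {
    const response = await fetchFn(href, { headers });
    // If the request was redirected, the final URL is the result.
    if (response.redirected) return response.url;
    // Otherwise, extract the first URL found in the response text.
    const text = await response.text();
    const match = text.match(URL_PATTERN);
    return match ? match[0] : href; // Returning `href` on no match is an assumption.
}

// Stub response standing in for a real page.
const fakePage = async (url) => ({
    redirected: false,
    url,
    text: async () => '<a href="https://example.org/page">Continue</a>',
});
visitUrl("https://jump.example/x", {}, fakePage).then(console.log);
// https://example.org/page
```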

🔵 Lambda Mode lambda

[!CAUTION] For security reasons, the lambda mode is disabled by default. If you trust the rules provider, refer to the API documentation for enabling it.

| Param | Type | Default |
| --- | --- | --- |
| lambda | string | Required |
| continue | Boolean | true |

Under Lambda mode, pURLfy will try to execute the function body specified in lambda and use the result as the new URL. The function is async; its body receives a single parameter url (a URL object) and should return a new URL object. For example:

{
    "example.com": {
        "description": "example",
        "mode": "lambda",
        "lambda": "url.searchParams.delete('key'); return url;",
        "continue": false,
        "author": "PRO-2684"
    },
    // ...
}

If the URL https://example.com/?key=123 matches this rule, the key parameter will be deleted. Since continue is set to false, the URL returned by the function will not be purified again. Of course, this is not a good example, because the same result can be achieved with Blacklist mode.
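
One way such a function body could be turned into a callable async function is the AsyncFunction constructor, sketched below. Whether pURLfy does exactly this is an assumption, and applyLambda is a hypothetical name. The sketch also shows why the mode is disabled by default: the string is executed as arbitrary code.

```javascript
// The AsyncFunction constructor is not a global, but can be obtained like this.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Hypothetical helper, not pURLfy's implementation.
async function applyLambda(href, body) {
    // Build `async function (url) { <body> }` from the rule's `lambda` string.
    const fn = new AsyncFunction("url", body);
    const result = await fn(new URL(href));
    return result.href;
}

applyLambda("https://example.com/?key=123", "url.searchParams.delete('key'); return url;")
    .then(console.log); // https://example.com/
```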

🖇️ Processors

Some processors support parameters. To pass them, append them to the function name, separated by a colon (:): func:arg. The following processors are currently supported:

  • url: string->string, URL decoding (decodeURIComponent)
  • base64: string->string, Base64 decoding of UTF-8 strings (Adapted from MDN)
  • slice:start:end: string->string, String slicing (s.slice(start, end)), start and end will be converted to integers
  • regex:<regex>: string->string, regex matching, returns the first match of the regex or an empty string if no match is found
  • dom: string->Document, parse the string as an HTML Document object (you'll need to define DOMParser globally if using in Node.js)
  • sel:<selector>: Any->Element/null, select the first element using CSS selector <selector> (The input shall have querySelector method)
  • attr:<attribute>: Element->string, get the value of the attribute <attribute> of the element (getAttribute)
  • text: Element->string, get the text content of the element (textContent)
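
The func:arg convention and the chaining of processors can be sketched as below, covering only the string->string processors. This is an illustration, not pURLfy's code: runActs and stringProcessors are hypothetical names, and the sketch assumes atob and TextDecoder are available globally.

```javascript
// Hypothetical re-implementations of the string->string processors.
const stringProcessors = {
    url: (s) => decodeURIComponent(s),
    base64: (s) => {
        // Decode Base64 to bytes, then interpret the bytes as UTF-8.
        const bytes = Uint8Array.from(atob(s), (ch) => ch.charCodeAt(0));
        return new TextDecoder().decode(bytes);
    },
    slice: (s, arg) => {
        const [start, end] = arg.split(":").map((n) => parseInt(n, 10));
        return s.slice(start, Number.isNaN(end) ? undefined : end);
    },
    // Returns the first match, or an empty string if there is none.
    regex: (s, arg) => (s.match(new RegExp(arg)) ?? [""])[0],
};

function runActs(input, acts) {
    return acts.reduce((value, act) => {
        // Split at the first colon only, since the argument may contain colons.
        const sep = act.indexOf(":");
        const name = sep === -1 ? act : act.slice(0, sep);
        const arg = sep === -1 ? "" : act.slice(sep + 1);
        const fn = stringProcessors[name];
        if (!fn) throw new Error(`Unknown processor: ${name}`);
        return fn(value, arg);
    }, input);
}

// Take the last path segment of the MC Wiki test link and Base64-decode it.
console.log(runActs(
    "https://link.mcmod.cn/target/aHR0cHM6Ly9naXRodWIuY29tL3dheTJtdWNobm9pc2UvQmV0dGVyQWR2YW5jZW1lbnRz",
    ["regex:[^/]+$", "base64"],
)); // https://github.com/way2muchnoise/BetterAdvancements
```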

😎 Projects Using pURLfy

[!TIP] If you are using pURLfy in your project, feel free to submit a PR to add your project here!

🎉 Acknowledgments
