@adguard/css-tokenizer

v1.1.1

CSS / Extended CSS Tokenizer


This library provides two distinct CSS tokenizers:

  1. Standard CSS Tokenizer: This tokenizer strictly adheres to the CSS Syntax Level 3 specification outlined by the W3C.
  2. Extended CSS Tokenizer: Designed to extend the capabilities of the standard tokenizer, this component introduces support for special pseudo-classes like :contains() and :xpath().

Installation

You can install the library using

  • Yarn (recommended): yarn add @adguard/css-tokenizer
  • NPM: npm install @adguard/css-tokenizer
  • PNPM: pnpm add @adguard/css-tokenizer

Motivation

To appreciate the necessity for a custom tokenizer, it's essential to understand the concept of Extended CSS, recognize the challenges it poses, and discover how we can effectively address these issues.

What is Extended CSS?

Extended CSS is a superset of CSS used by adblockers to provide more robust filtering capabilities. In practical terms, Extended CSS introduces additional pseudo-classes that are not defined in the CSS specification.

Why do we need a custom tokenizer?

The standard CSS tokenizer cannot handle Extended CSS's pseudo-classes in every case. For example, the :contains() pseudo-class can have the following syntax:

div:contains(i'm a parameter)

A standard CSS tokenizer interprets the single quotation mark (') as the start of a string. Because that string is never closed, the tokenizer consumes the closing ) character as part of the string, and the function is left without its closing parenthesis. The result is a parsing error.
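To make the problem concrete, here is a minimal sketch of how a specification-style tokenizer consumes a string. The consumeString helper is hypothetical and heavily simplified; it is not part of @adguard/css-tokenizer and only illustrates why the closing parenthesis gets lost.

```javascript
// Hypothetical, simplified model of the "consume a string token" step.
// It is NOT the library's implementation.
function consumeString(source, start) {
    const quote = source[start]; // the opening ' or " character
    let i = start + 1;
    while (i < source.length) {
        const ch = source[i];
        if (ch === quote) {
            // Matching closing quote found: a well-formed string
            return { end: i + 1, closed: true };
        }
        if (ch === '\n') {
            // Unescaped newline: the string ends here without being closed
            return { end: i, closed: false };
        }
        if (ch === '\\') {
            i += 1; // skip the escaped character
        }
        i += 1;
    }
    // End of input reached without a closing quote
    return { end: source.length, closed: false };
}

const source = "div:contains(i'm a parameter)";
const quoteIndex = source.indexOf("'");
const result = consumeString(source, quoteIndex);

// Everything from the quote to the end of the input is treated as string content
const swallowed = source.slice(quoteIndex, result.end);
console.log(swallowed); // 'm a parameter)
console.log(result.closed); // false
```

The closing ) ends up inside the unterminated string, so nothing is left to close the contains( function.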

The :xpath() pseudo-class poses a similar challenge for a standard CSS tokenizer, as it can have syntax like this:

div:xpath(//*...)

A standard tokenizer mistakenly identifies the /* sequence as the start of a comment, which leads to incorrect parsing. In reality, the /* sequence is part of the XPath expression.
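The same effect can be sketched for comments. The consumeComment helper below is a hypothetical simplification of the comment-skipping step, not the library's code; it shows that the rest of the XPath expression, including the closing ), is treated as a comment.

```javascript
// Hypothetical, simplified model of the "consume comments" step.
// It is NOT the library's implementation.
function consumeComment(source, start) {
    if (!source.startsWith('/*', start)) {
        return start; // no comment starts here
    }
    const close = source.indexOf('*/', start + 2);
    // An unterminated comment runs to the end of the input
    return close === -1 ? source.length : close + 2;
}

const xpathSource = 'div:xpath(//*...)';
const commentStart = xpathSource.indexOf('/*');
const commentEnd = consumeComment(xpathSource, commentStart);

// The fragment a standard tokenizer would discard as a comment
const treatedAsComment = xpathSource.slice(commentStart, commentEnd);
console.log(treatedAsComment); // /*...)
```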

The solution: Custom function handlers

We've designed the standard CSS tokenizer to adhere rigorously to the CSS Syntax Level 3 specification. However, we've also introduced the ability to handle certain pseudo-classes in a custom manner, akin to how the <url-token> is managed in the CSS specs. When the tokenizer encounters a function token (a function name immediately followed by an opening parenthesis, written "function-name(" here), it looks up the function name in the functionHandlers map and, if a handler exists, calls it.

The custom handler receives a single argument: the shared tokenizer context object, which can be used to manage the function, similar to how other tokens are handled in the library.

This approach allows us to maintain a native, specification-compliant CSS tokenizer with minimal overhead while also providing the flexibility to manage special pseudo-classes in a custom way.

In essence, the Extended CSS tokenizer is a standard CSS tokenizer with custom function handlers for special pseudo-classes.
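The dispatch idea can be illustrated with a toy model. Everything in this sketch is hypothetical: tokenizeFunctions, handleRawArgument, the plain context object and the token labels are made up for illustration, and the map is keyed by function name for readability, whereas the library's functionHandlers map is typed as Map<number, TokenizerContextFunction> and its handlers receive a TokenizerContext.

```javascript
// Toy model of function-handler dispatch. NOT the library's implementation.
function tokenizeFunctions(source, functionHandlers = new Map()) {
    const tokens = [];
    const functionStart = /([a-zA-Z-]+)\(/g; // a name followed by "("
    let match;
    while ((match = functionStart.exec(source)) !== null) {
        const name = match[1];
        const argsStart = functionStart.lastIndex;
        tokens.push({ type: 'function', start: match.index, end: argsStart });

        // Look up a custom handler by function name and call it if present
        const handler = functionHandlers.get(name);
        if (handler) {
            const context = { source, offset: argsStart, tokens };
            handler(context);
            // Resume scanning where the handler stopped
            functionStart.lastIndex = context.offset;
        }
    }
    return tokens;
}

// Hypothetical handler: treats the whole argument as raw text, so quotes
// and "/*" inside it are not interpreted as strings or comments.
function handleRawArgument(context) {
    const { source, tokens } = context;
    let depth = 1;
    let i = context.offset;
    for (; i < source.length; i += 1) {
        if (source[i] === '(') {
            depth += 1;
        } else if (source[i] === ')') {
            depth -= 1;
            if (depth === 0) break; // found the matching ")"
        }
    }
    tokens.push({ type: 'argument', start: context.offset, end: i });
    context.offset = i;
}

const handlers = new Map([['contains', handleRawArgument]]);
const input = "div:contains(i'm a parameter)";
const fragments = tokenizeFunctions(input, handlers)
    .map((token) => input.slice(token.start, token.end));
console.log(fragments); // [ 'contains(', "i'm a parameter" ]
```

Because the handler decides how far to read, the apostrophe never reaches the string-consuming logic, while functions without a handler go through the regular path.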

No new token types

Our implementation uses only the token types specified in the CSS Syntax Level 3 specification and does not introduce any new ones. Preserving the standard token types keeps the library compatible and consistent with other CSS-related tools and workflows, and upholds the language as defined by the W3C.

Example usage

Here's a straightforward example of how to use the library:

// `tokenize` is a regular CSS tokenizer (and doesn't support Extended CSS)
// `tokenizeExtended` is an Extended CSS tokenizer
const { tokenize, tokenizeExtended, getFormattedTokenName } = require('@adguard/css-tokenizer');

// Input to tokenize
const CSS_SOURCE = `div:contains(aa'bb) { display: none !important; }`;

const COLUMNS = Object.freeze({
    TOKEN: 'Token',
    START: 'Start',
    END: 'End',
    FRAGMENT: 'Fragment'
});

// Prepare the data array
const data = [];

// Tokenize the input - feel free to try `tokenize` and `tokenizeExtended`
tokenizeExtended(CSS_SOURCE, (token, start, end) => {
    data.push({
        [COLUMNS.TOKEN]: getFormattedTokenName(token),
        [COLUMNS.START]: start,
        [COLUMNS.END]: end,
        [COLUMNS.FRAGMENT]: CSS_SOURCE.substring(start, end),
    });
});

// Print the tokenization result as a table
console.table(data, Object.values(COLUMNS));

API

Tokenizer functions

tokenize

/**
 * CSS tokenizer function
 *
 * @param source Source code to tokenize
 * @param onToken Tokenizer callback which is called for each token found in source code
 * @param onError Error callback which is called when a parsing error is found (optional)
 * @param functionHandlers Custom function handlers (optional)
 */
function tokenize(
    source: string,
    onToken: OnTokenCallback,
    onError: OnErrorCallback = () => {},
    functionHandlers?: Map<number, TokenizerContextFunction>,
): void;

where

/**
 * Callback which is called when a token is found
 *
 * @param type Token type
 * @param start Token start offset
 * @param end Token end offset
 * @param props Other token properties (if any)
 * @note Hash tokens have a type flag set to either "id" or "unrestricted". The type flag defaults to "unrestricted" if
 * not otherwise set
 */
type OnTokenCallback = (type: TokenType, start: number, end: number, props?: Record<string, unknown>) => void;
/**
 * Callback which is called when a parsing error is found. According to the spec, parsing errors are not fatal and
 * therefore the tokenizer is quite permissive, but if needed, the error callback can be used.
 *
 * @param message Error message
 * @param start Error start offset
 * @param end Error end offset
 * @see {@link https://www.w3.org/TR/css-syntax-3/#error-handling}
 */
type OnErrorCallback = (message: string, start: number, end: number) => void;
/**
 * Function handler
 *
 * @param context Reference to the tokenizer context instance
 * @param ...args Additional arguments (if any)
 */
type TokenizerContextFunction = (context: TokenizerContext, ...args: any[]) => void;

tokenizeExtended

tokenizeExtended is an extended version of the tokenize function that comes with default function handlers for special pseudo-classes like :contains() and :xpath().

/**
 * Extended CSS tokenizer function
 *
 * @param source Source code to tokenize
 * @param onToken Tokenizer callback which is called for each token found in source code
 * @param onError Error callback which is called when a parsing error is found (optional)
 * @param functionHandlers Custom function handlers (optional)
 * @note If you specify custom function handlers, they will be merged with the default function handlers. If you
 * duplicate a function handler, the custom one will be used instead of the default one, so you can override the default
 * function handlers this way, if you want to.
 */
function tokenizeExtended(
    source: string,
    onToken: OnTokenCallback,
    onError: OnErrorCallback = () => {},
    functionHandlers: Map<number, TokenizerContextFunction> = new Map(),
): void;
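The merge behavior described in the note above can be pictured with plain Map semantics. This is only a sketch of the idea, assuming a simple "later entries win" merge; the numeric keys and the handler bodies are placeholders, not values used by the library.

```javascript
// Placeholder keys and handlers, for illustration only
const defaultHandlers = new Map([
    [1, () => 'default handler A'],
    [2, () => 'default handler B'],
]);

const customHandlers = new Map([
    [2, () => 'custom handler B'], // same key as a default: overrides it
    [3, () => 'custom handler C'], // new key: added alongside the defaults
]);

// Spreading both maps into a new Map keeps the last entry for each key,
// so custom handlers take precedence over defaults
const mergedHandlers = new Map([...defaultHandlers, ...customHandlers]);

console.log(mergedHandlers.size); // 3
console.log(mergedHandlers.get(1)()); // default handler A
console.log(mergedHandlers.get(2)()); // custom handler B
```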

Utilities

TokenizerContext

A class that represents the tokenizer context. It is used to manage the tokenizer state and provides access to the source code, current position, and other relevant information.

decodeIdent

/**
 * Decodes a CSS identifier according to the CSS Syntax Module Level 3 specification.
 *
 * @param ident CSS identifier to decode.
 *
 * @example
 * ```ts
 * decodeIdent(String.raw`\00075\00072\0006C`); // 'url'
 * decodeIdent('url'); // 'url'
 * ```
 *
 * @returns Decoded CSS identifier.
 */
function decodeIdent(ident: string): string;

CSS_TOKENIZER_VERSION

/**
 * @adguard/css-tokenizer version
 */
const CSS_TOKENIZER_VERSION: string;

Token types

TokenType

An enumeration of token types recognized by the tokenizer. They are strictly based on the CSS Syntax Level 3 specification.

See https://www.w3.org/TR/css-syntax-3/#tokenization for more details.

getBaseTokenName

/**
 * Get base token name by token type
 *
 * @param type Token type
 *
 * @example
 * ```ts
 * getBaseTokenName(TokenType.Ident); // 'ident'
 * getBaseTokenName(-1); // 'unknown'
 * ```
 *
 * @returns Base token name or 'unknown' if token type is unknown
 */
function getBaseTokenName(type: TokenType): string;

getFormattedTokenName

/**
 * Get formatted token name by token type
 *
 * @param type Token type
 *
 * @example
 * ```ts
 * getFormattedTokenName(TokenType.Ident); // '<ident-token>'
 * getFormattedTokenName(-1); // '<unknown-token>'
 * ```
 *
 * @returns Formatted token name or `'<unknown-token>'` if token type is unknown
 */
function getFormattedTokenName(type: TokenType): string;

Note: Our API and token list are also compatible with CSSTree's tokenizer API. In the long term, we plan to integrate this library into CSSTree via our ECSSTree library.

Benchmark results

You can find the benchmark results in the benchmark/RESULTS.md file.

Ideas & Questions

If you have any questions or ideas for new features, please open an issue or a discussion. We will be happy to discuss them with you.

License

This project is licensed under the MIT license. See the LICENSE file for details.