oniguruma-to-es
v0.8.1
Convert Oniguruma patterns to native JavaScript RegExp
Oniguruma-To-ES
An Oniguruma to JavaScript regex transpiler that runs in the browser and on your server. Use it to:
- Take advantage of Oniguruma's many extended regex features in JavaScript.
- Run regexes written for Oniguruma from JavaScript, such as those used in TextMate grammars (used by VS Code, Shiki syntax highlighter, etc.).
- Share regexes across your Ruby and JavaScript code.
Compared to running the Oniguruma C library via WASM bindings using vscode-oniguruma, this library is less than 4% of the size and its regexes often run much faster since they run as native JavaScript.
Try the demo REPL
Oniguruma-To-ES deeply understands the hundreds of large and small differences between Oniguruma and JavaScript regex syntax and behavior, across multiple JavaScript version targets. It's obsessive about ensuring that the emulated features it supports have exactly the same behavior, even in extreme edge cases. And it's been battle-tested on thousands of real-world Oniguruma regexes used in TextMate grammars (via the Shiki library). A few uncommon features can't be perfectly emulated and allow rare differences, but if you don't want to allow this, you can set the accuracy option to throw for such patterns (see details below).
🕹️ Install and use
npm install oniguruma-to-es
import {toRegExp} from 'oniguruma-to-es';
const str = '…';
const pattern = '…';
// Works with all string/regexp methods since it returns a native regexp
str.match(toRegExp(pattern));
<script src="https://cdn.jsdelivr.net/npm/oniguruma-to-es/dist/index.min.js"></script>
<script>
const {toRegExp} = OnigurumaToES;
</script>
🔑 API
toRegExp
Accepts an Oniguruma pattern and returns an equivalent JavaScript RegExp.
[!TIP] Try it in the demo REPL.
function toRegExp(
  pattern: string,
  options?: OnigurumaToEsOptions
): RegExp | EmulatedRegExp;
Type OnigurumaToEsOptions
type OnigurumaToEsOptions = {
  accuracy?: 'default' | 'strict';
  avoidSubclass?: boolean;
  flags?: string;
  global?: boolean;
  hasIndices?: boolean;
  maxRecursionDepth?: number | null;
  rules?: {
    allowOrphanBackrefs?: boolean;
    allowUnhandledGAnchors?: boolean;
    asciiWordBoundaries?: boolean;
    captureGroup?: boolean;
  };
  target?: 'auto' | 'ES2025' | 'ES2024' | 'ES2018';
  verbose?: boolean;
};
See Options for more details.
toDetails
Accepts an Oniguruma pattern and returns the details needed to construct an equivalent JavaScript RegExp.
function toDetails(
  pattern: string,
  options?: OnigurumaToEsOptions
): {
  pattern: string;
  flags: string;
  subclass?: EmulatedRegExpOptions;
};
Note that the returned flags might also be different than those provided, as a result of the emulation process. The returned pattern, flags, and subclass properties can be provided as arguments to the EmulatedRegExp constructor to produce the same result as toRegExp.
If the only keys returned are pattern and flags, they can optionally be provided to JavaScript's RegExp constructor instead. Setting option avoidSubclass to true ensures that this is always the case, by throwing an error for any patterns that rely on EmulatedRegExp's additional handling.
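The choice between the two constructors can be sketched as a small helper. This is a minimal illustration, assuming a details object shaped like the documented return value; the helper name regExpFromDetails is hypothetical, and the constructor is passed in as a parameter (in real use it would be EmulatedRegExp from the library) so the sketch stays self-contained.

```javascript
// Sketch: pick a constructor based on the keys of a toDetails()-style result.
// `regExpFromDetails` is a hypothetical helper name, not part of the library.
function regExpFromDetails(details, EmulatedRegExpCtor) {
  const {pattern, flags, subclass} = details;
  if (subclass === undefined) {
    // Only `pattern` and `flags` were returned: a native RegExp is enough.
    return new RegExp(pattern, flags);
  }
  // Otherwise the extra options must be passed to the subclass constructor.
  return new EmulatedRegExpCtor(pattern, flags, subclass);
}

// Stand-in data shaped like the documented return value (not real library output).
const plainRegex = regExpFromDetails({pattern: 'a+', flags: 'u'}, RegExp);
console.log(plainRegex.test('caaat'));
```

Checking for the subclass key keeps the decision tied to the documented return shape rather than to any property of the pattern itself.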
toOnigurumaAst
Returns an Oniguruma AST generated from an Oniguruma pattern.
function toOnigurumaAst(
  pattern: string,
  options?: {
    flags?: string;
    rules?: {
      captureGroup?: boolean;
    };
  }
): OnigurumaAst;
EmulatedRegExp
Works the same as JavaScript's native RegExp constructor in all contexts, but can be given results from toDetails to produce the same result as toRegExp.
class EmulatedRegExp extends RegExp {
  constructor(
    pattern: string | EmulatedRegExp,
    flags?: string,
    options?: EmulatedRegExpOptions
  );
};
🔩 Options
The following options are shared by functions toRegExp and toDetails.
accuracy
One of 'default' (default) or 'strict'.
Sets the level of emulation rigor/strictness.
- Default: Permits a few close approximations in order to support additional features.
- Strict: Throws an error if the pattern can't be emulated with identical behavior (even in rare edge cases) for the given target.
Using default accuracy adds support for the following features, depending on target:
- All targets (ES2025 and earlier):
  - Enables use of \X using a close approximation of a Unicode extended grapheme cluster.
  - Enables recursion (ex: \g<0>) with a depth limit specified by option maxRecursionDepth.
- ES2024 and earlier:
  - Enables use of case-insensitive backreferences to case-sensitive groups.
- ES2018:
  - Enables use of POSIX classes [:graph:] and [:print:] using ASCII-based versions rather than the Unicode versions available for ES2024 and later. Other POSIX classes are always Unicode-based.
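One way to combine the two accuracy levels is to try strict first and fall back to default only when strict throws, recording which path was taken. The following is a minimal sketch: compileWithFallback and fakeToRegExp are hypothetical names, the library's toRegExp is passed in as a parameter to keep the example self-contained, and the stand-in function only mimics the documented "strict throws" behavior rather than real emulation.

```javascript
// `toRegExp` is passed in so the sketch stays self-contained; in real use it
// would be the function imported from 'oniguruma-to-es'.
function compileWithFallback(toRegExp, pattern, options = {}) {
  try {
    const regex = toRegExp(pattern, {...options, accuracy: 'strict'});
    return {regex, approximate: false};
  } catch (err) {
    // Strict mode rejected the pattern, so retry with default accuracy.
    // A pattern that is unsupported in both modes still throws from here.
    const regex = toRegExp(pattern, {...options, accuracy: 'default'});
    return {regex, approximate: true};
  }
}

// Hypothetical stand-in for the library function, used only to exercise the
// helper. Replacing \X with a dot is NOT how the library emulates \X.
function fakeToRegExp(pattern, {accuracy}) {
  if (accuracy === 'strict' && pattern.includes('\\X')) {
    throw new Error('not emulatable in strict mode');
  }
  return new RegExp(pattern.replace('\\X', '.'), 'su');
}

console.log(compileWithFallback(fakeToRegExp, 'a\\X').approximate);
console.log(compileWithFallback(fakeToRegExp, 'ab').approximate);
```

The approximate flag lets calling code log or reject patterns whose behavior might differ from Oniguruma in rare cases.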
avoidSubclass
Default: false.
Disables advanced emulation that relies on returning a RegExp subclass. In cases when a subclass would otherwise have been used, this results in one of the following:
- An error is thrown for certain patterns that are not emulatable without a subclass.
- When the regex can still be emulated accurately, subpattern match details (accessed via properties of match objects returned when using the regex) might differ from Oniguruma.
flags
Oniguruma flags; a string with i, m, x, D, S, W in any order (all optional).
Flags can also be specified via modifiers in the pattern.
[!IMPORTANT] Oniguruma and JavaScript both have an m flag but with different meanings. Oniguruma's m is equivalent to JavaScript's s (dotAll).
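The difference between the two JavaScript flags involved can be seen with native regexes alone. This short sketch uses only built-in JavaScript behavior; the variable names are illustrative.

```javascript
const text = 'a\nb';

// JavaScript flag s (dotAll): the dot also matches line terminators.
// This is the behavior that Oniguruma's m flag asks for.
const dotAll = /a.b/s.test(text);

// JavaScript flag m (multiline): only changes ^ and $; the dot still
// refuses to match a newline.
const jsMultiline = /a.b/m.test(text);
const lineAnchor = /^b/m.test(text);

console.log({dotAll, jsMultiline, lineAnchor});
// → { dotAll: true, jsMultiline: false, lineAnchor: true }
```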
global
Default: false.
Include JavaScript flag g (global) in the result.
hasIndices
Default: false.
Include JavaScript flag d (hasIndices) in the result.
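For readers less familiar with these two JavaScript flags, the following sketch shows what g and d change on native regexes. It uses only built-in behavior and hand-written regex literals, not output from this library.

```javascript
const input = 'x=10, y=25';

// Without g, match() returns only the first match (plus capture details).
const firstOnly = input.match(/\d+/)[0];

// With g, match() returns every match as a flat array of strings.
const allMatches = input.match(/\d+/g);

// With d, the match object also carries [start, end] offsets for the
// whole match and for each capturing group.
const detailed = /(?<name>[a-z])=(?<value>\d+)/d.exec(input);
const valueSpan = detailed.indices.groups.value;

console.log(firstOnly, allMatches, valueSpan);
// → 10 [ '10', '25' ] [ 2, 4 ]
```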
maxRecursionDepth
Default: 5.
Specifies the recursion depth limit. Supported values are integers 2–100 and null. If null, any use of recursion results in an error.
Since recursion isn't infinite-depth like in Oniguruma, use of recursion also results in an error if using strict accuracy.
Using a high limit has a small impact on performance. Generally, this is only a problem if the regex has an existing issue with runaway backtracking that recursion exacerbates. Higher limits have no effect on regexes that don't use recursion, so you should feel free to increase this if helpful.
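To see why recursion must be bounded, and why a higher limit produces a larger regex, consider a naive expansion that inlines the pattern into itself a fixed number of times. This is an illustrative sketch only, not the library's actual algorithm: expandRecursion is a hypothetical helper, it handles only the whole-pattern call \g<0> via plain string replacement, and its way of counting depth is an assumption that may differ from how maxRecursionDepth counts.

```javascript
// Inline \g<0> a fixed number of times. The innermost call is replaced by
// (?!), which never matches, so deeper nesting simply fails to match.
function expandRecursion(pattern, depth) {
  const token = '\\g<0>';
  let expanded = '(?!)';
  for (let level = 0; level < depth; level++) {
    expanded = pattern.split(token).join(`(?:${expanded})`);
  }
  return expanded;
}

// Balanced parentheses, written with an Oniguruma-style recursive call.
const balanced = String.raw`\((?:[^()]|\g<0>)*\)`;
const twoLevels = new RegExp(`^(?:${expandRecursion(balanced, 2)})$`);

console.log(twoLevels.test('(a(b)c)'));   // → true (two levels of nesting)
console.log(twoLevels.test('(a(b(c)))')); // → false (three levels exceed the bound)
```

Each extra level repeats the whole pattern once more, which is consistent with the note above that higher limits matter mainly when a regex already has backtracking problems.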
rules
Advanced pattern options that override standard error checking and flags when enabled.
- allowOrphanBackrefs: Useful with TextMate grammars that merge backreferences across patterns.
- allowUnhandledGAnchors: Removes unsupported uses of \G, rather than erroring.
  - Oniguruma-To-ES uses a variety of strategies to accurately emulate many common uses of \G. When using this option, if a \G is found that doesn't have a known emulation strategy, the \G is simply removed. This might lead to some false positive matches, but is useful for non-critical matching (like syntax highlighting) when having some mismatches is better than not working.
- asciiWordBoundaries: Use ASCII-based \b and \B, which increases search performance of generated regexes.
- captureGroup: Oniguruma option ONIG_OPTION_CAPTURE_GROUP. Unnamed captures and numbered calls allowed when using named capture.
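As background on \G, JavaScript has no such anchor, but its sticky flag y gives the same effect for the simplest case of a \G at the very start of a pattern: each match must begin exactly where the previous one ended. The sketch below illustrates that one case with native JavaScript; tokenizeContiguous is a hypothetical helper, and this is not a description of every strategy the library uses.

```javascript
// Collect matches that are strictly contiguous from the start of the input,
// which is what an Oniguruma pattern starting with \G would do.
function tokenizeContiguous(source, input) {
  const re = new RegExp(source, 'y'); // sticky: match only at lastIndex
  const tokens = [];
  let match;
  while ((match = re.exec(input)) !== null) {
    if (match[0] === '') break; // avoid looping forever on empty matches
    tokens.push(match[0]);
  }
  return tokens;
}

console.log(tokenizeContiguous('\\d', '12a3')); // → [ '1', '2' ]
console.log('12a3'.match(/\d/g));               // → [ '1', '2', '3' ]
```

Dropping the anchor, as allowUnhandledGAnchors does for unhandled cases, corresponds to the second result: the trailing 3 becomes a false positive relative to the anchored version.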
target
One of 'auto' (default), 'ES2025', 'ES2024', or 'ES2018'.
JavaScript version used for generated regexes. Using auto detects the best value based on your environment. Later targets allow faster processing, simpler generated source, and support for additional features.
- ES2018: Uses JS flag u.
  - Emulation restrictions: Character class intersection, nested negated character classes, and Unicode properties added after ES2018 are not allowed.
  - Generated regexes might use ES2018 features that require Node.js 10 or a browser version released during 2018 to 2023 (in Safari's case). Minimum requirement for any regex is Node.js 6 or a 2016-era browser.
- ES2024: Uses JS flag v.
  - No emulation restrictions.
  - Generated regexes require Node.js 20 or any 2023-era browser (compat table).
- ES2025: Uses JS flag v and allows use of flag groups and duplicate group names.
  - Benefits: Faster transpilation, simpler generated source, and duplicate group names are preserved across separate alternation paths.
  - Generated regexes might use features that require Node.js 23 or a 2024-era browser (except Safari, which lacks support for flag groups).
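Environment detection of the kind that auto performs can be approximated with feature tests against the RegExp constructor. The sketch below is an assumption about how such detection could work, not the library's actual code; isValidRegExp and detectTarget are hypothetical names, and only flag v and flag groups are probed.

```javascript
// Returns true if the environment accepts the given pattern and flags.
function isValidRegExp(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch {
    return false;
  }
}

// Map supported regex features to one of the documented target names.
function detectTarget() {
  if (!isValidRegExp('', 'v')) {
    return 'ES2018'; // no flag v, so fall back to flag u output
  }
  // Flag groups such as (?i:a) are one of the ES2025 additions.
  return isValidRegExp('(?i:a)', 'v') ? 'ES2025' : 'ES2024';
}

console.log(detectTarget());
```

Feature tests like these avoid relying on version strings, which matters given the note above that Safari lacks flag groups.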
verbose
Default: false.
Disables optimizations that simplify the pattern when it doesn't change the meaning.
✅ Supported features
Following are the supported features by target. The official Oniguruma syntax doc doesn't cover many of the finer details described here.
[!NOTE] Targets ES2024 and ES2025 have the same emulation capabilities. Resulting regexes might have different source and flags, but they match the same strings. See target.
Notice that nearly every feature below has at least subtle differences from JavaScript. Some features listed as unsupported are not emulatable using native JavaScript regexes, but support for others might be added in future versions of this library. Unsupported features throw an error.
The table above doesn't include all aspects that Oniguruma-To-ES emulates (including error handling, most aspects that work the same as in JavaScript, and many aspects of non-JavaScript features that work the same in the other regex flavors that support them).
Footnotes
- Target ES2018 doesn't allow using Unicode property names added in JavaScript specifications after ES2018.
- Unicode blocks (which in Oniguruma are used with an In… prefix) are easily emulatable but their character data would significantly increase library weight. They're also a flawed and arguably unuseful feature, given the ability to use Unicode scripts and other properties.
- With target ES2018, the specific POSIX classes [:graph:] and [:print:] use ASCII-based versions rather than the Unicode versions available for target ES2024 and later, and they result in an error if using strict accuracy.
- Target ES2018 doesn't support nested negated character classes.
- It's not an error for numbered backreferences to come before their referenced group in Oniguruma, but an error is the best path for Oniguruma-To-ES because (1) most placements are mistakes and can never match (based on the Oniguruma behavior for backreferences to nonparticipating groups), (2) erroring matches the behavior of named backreferences, and (3) the edge cases where they're matchable rely on rules for backreference resetting within quantified groups that are different in JavaScript and aren't emulatable. Note that it's not a backreference in the first place if using \10 or higher and not as many capturing groups are defined to the left (it's an octal or identity escape).
- The recursion depth limit is specified by option maxRecursionDepth. Overlapping recursions and the use of backreferences when the recursed subpattern contains captures aren't yet supported. Patterns that would error in Oniguruma due to triggering infinite recursion might find a match in Oniguruma-To-ES since recursion is bounded (future versions will detect this and error at transpilation time).
❌ Unsupported features
The following don't yet have any support, and throw errors. They're all infrequently-used features, with most being extremely rare. Note that Oniguruma-To-ES can handle 99.9% of real-world Oniguruma regexes, based on patterns used in a large collection of TextMate grammars.
- Supportable:
  - Grapheme boundaries: \y, \Y.
  - Flags P (POSIX is ASCII) and y{g}/y{w} (grapheme boundary modes).
  - Rarely-used character specifiers: Non-A-Za-z with \cx, \C-x; meta \M-x, \M-\C-x; bracketed octals \o{…}; octal UTF-8 encoded bytes (≥ \200).
  - Code point sequences: \x{H H …}, \o{O O …}.
  - Whole-pattern modifier: Don't capture (?C).
- Supportable for some uses:
  - Absence functions: (?~…), etc.
  - Conditionals: (?(…)…), etc.
  - Whole-pattern modifiers: Ignore-case is ASCII (?I), find longest (?L).
- Not supportable:
  - Callout functions: (?{…}), etc.
㊗️ Unicode / mixed case-sensitivity
Oniguruma-To-ES fully supports mixed case-sensitivity (and handles the Unicode edge cases) regardless of JavaScript target. It also restricts Unicode properties to those supported by Oniguruma and the target JavaScript version.
Oniguruma-To-ES focuses on being lightweight to make it better for use in browsers. This is partly achieved by not including heavyweight Unicode character data, which imposes a couple of minor/rare restrictions:
- Character class intersection and nested negated character classes are unsupported with target ES2018. Use target ES2024 or later if you need support for these features.
- With targets before ES2025, a handful of Unicode properties that target a specific character case (ex: \p{Lower}) can't be used case-insensitively in patterns that contain other characters with a specific case that are used case-sensitively.
  - In other words, almost every usage is fine, including A\p{Lower}, (?i:A\p{Lower}), (?i:A)\p{Lower}, (?i:A(?-i:\p{Lower})), and \w(?i:\p{Lower}), but not A(?i:\p{Lower}).
  - Using these properties case-insensitively is basically never done intentionally, so you're unlikely to encounter this error unless it's catching a mistake.
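To give a feel for what mixed case-sensitivity involves on targets without flag groups, a case-insensitive literal can be rewritten as character classes so that no i flag is needed for that part of the pattern. The sketch below is a deliberately simplified, ASCII-only illustration and not the library's actual output; caseInsensitiveLiteral is a hypothetical helper, and real Unicode case folding has many more edge cases than it handles.

```javascript
// Rewrite a literal string so it matches case-insensitively without flag i.
// ASCII letters become two-character classes; everything else is escaped.
function caseInsensitiveLiteral(text) {
  return Array.from(text, (ch) => {
    if (/[A-Za-z]/.test(ch)) {
      return `[${ch.toLowerCase()}${ch.toUpperCase()}]`;
    }
    return ch.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
  }).join('');
}

// Rough equivalent of Oniguruma's (?i:ab)c: "ab" in any case, then a
// case-sensitive "c".
const mixedCase = new RegExp(`^${caseInsensitiveLiteral('ab')}c$`);

console.log(mixedCase.source);       // → ^[aA][bB]c$
console.log(mixedCase.test('ABc'));  // → true
console.log(mixedCase.test('abC'));  // → false
```

A property such as \p{Lower} can't be expanded this way without the underlying Unicode character data, which is consistent with the restriction described above.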
👀 Similar projects
JsRegex transpiles Onigmo regexes to JavaScript (Onigmo is a fork of Oniguruma with mostly shared syntax and behavior). It's written in Ruby and relies on the Regexp::Parser Ruby gem, which means regexes must be pre-transpiled on the server to use them in JavaScript. Note that JsRegex doesn't always translate edge case behavior differences.
🏷️ About
Oniguruma-To-ES was created by Steven Levithan.
If you want to support this project, I'd love your help by contributing improvements, sharing it with others, or sponsoring ongoing development.
© 2024–present. MIT License.