npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

cm6-monarch

v0.6.3

Published

A (supercharged) port of the Monaco Editor's Monarch syntax highlighter to CodeMirror 6.

Downloads

39

Readme

cm6-monarch

This is a 'fork' of the Monaco Editor's Monarch syntax highlighter to CodeMirror 6. It is generally compatible with Monaco Editor language sources. In contrast to the original implementation, this version of Monarch has been radically supercharged with many backend improvements. The most useful new features are likely the action parser syntax tree additions, and the usage of sticky regexes in the backend, allowing for full lookbehind support.

Please note: This isn't done. It does work - and I don't think it's too buggy, but it's still got a little ways to go before I can consider it done. It's mainly missing quality of life features.

Usage

If you're wanting to make a language using Monarch, the official tutorial/playground can be helpful, with the examples being something you should look at. However, if you need a reference, I have written a replacement document with a complete description of all presently supported syntax and features, in the docs/syntax.md file. There is some significant additions, although basically only non-breaking ones, like lookbehind support.

A few things to note:

  • Don't use this if you think you can make a Lezer grammar. CodeMirror does better with a proper parser.
  • You should use the CodeMirror 6 highlighter tags. You can actually use your own tags/scopes, the language will automatically create them and export them for you. However, using these is not advised because they'll be entirely custom to your language and hard to support.
  • You can now nest scopes, albeit not as nicely as states - check the docs/syntax.md file (at the bottom) to learn more. The nesting is done as a parser directive, and what I mean is that the parser will actually create a syntax tree from your parser directives. You can use this to add code folding and all kinds of other syntax node related features.

To actually, y'know, use it in the code, it looks like this:

import { createMonarchLanguage } from 'cm6-monarch'

// Just for reference, this is the configuration interface.
interface MonarchLanguageDefinition {
  /** The name of the language. This is actually important for CodeMirror, so make sure it's correct. */
  name: string
  /** The Monarch lexer that will be used to tokenize the language. */
  lexer: IMonarchLanguage
  /** A list of `LanguageDescription` objects that will be used when the parser nests in a language. */
  nestLanguages?: LanguageDescription[]
  /** Configuration options for the parser, such as node props. */
  configure?: MonarchConfigure
  /** A list of aliases for the name of the language. (e.g. 'go' -> ['golang']) */
  alias?: string[]
  /** A list of file extensions. (e.g. ['.ts']) */
  ext?: string[]
  /** The 'languageData' field for the language. CodeMirror plugins use this data to interact with the language. */
  languageData?: { [name: string]: any }
  /** Extra extensions to be loaded. */
  extraExtensions?: Extension[]
}

// And what it returns:
interface MonarchLanguageData {
  /** Creates a `LanguageSupport` object that can be used like an ordinary language/extension. */
  load(): LanguageSupport
  /** A list of new `Tag` objects generated automatically from the language definition. */
  tags: { [name: string]: Tag }
  /** A `LanguageDescription` object, commonly used for nesting languages. */
  description: LanguageDescription
}

// This uses the lexer given in the Monarch tutorial.
// const { load, tags, description } = createMonarchLanguage({
const myLanguage = createMonarchLanguage({
  name: 'myLang',
  lexer: {
    // defaultToken: 'invalid',
    keywords: [
      'abstract', 'continue', 'for', 'new', 'switch', 'assert', 'goto', 'do',
      'if', 'private', 'this', 'break', 'protected', 'throw', 'else', 'public',
      'enum', 'return', 'catch', 'try', 'interface', 'static', 'class',
      'finally', 'const', 'super', 'while', 'true', 'false'
    ],
    typeKeywords: [
      'boolean', 'double', 'byte', 'int', 'short', 'char', 'void', 'long', 'float'
    ],
    operators: [
      '=', '>', '<', '!', '~', '?', ':', '==', '<=', '>=', '!=',
      '&&', '||', '++', '--', '+', '-', '*', '/', '&', '|', '^', '%',
      '<<', '>>', '>>>', '+=', '-=', '*=', '/=', '&=', '|=', '^=',
      '%=', '<<=', '>>=', '>>>='
    ],
    symbols:  /[=><!~?:&|+\-*\/\^%]+/,
    escapes: /\\(?:[abfnrtv\\"']|x[0-9A-Fa-f]{1,4}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})/,

    tokenizer: {
      root: [
        // identifiers and keywords
        [/[a-z_$][\w$]*/, { cases: {'@typeKeywords': 'keyword',
                                    '@keywords': 'keyword',
                                    '@default': 'identifier' } }],
        [/[A-Z][\w\$]*/, 'type.identifier' ],  // to show class names nicely

        // whitespace
        { include: '@whitespace' },

        // delimiters and operators
        [/[{}()\[\]]/, '@brackets'],
        [/[<>](?!@symbols)/, '@brackets'],
        [/@symbols/, { cases: { '@operators': 'operator',
                                '@default'  : '' } } ],

        // @ annotations.
        // As an example, we emit a debugging log message on these tokens.
        // Note: message are supressed during the first load -- change some lines to see them.
        [/@\s*[a-zA-Z_\$][\w\$]*/, { token: 'annotation', log: 'annotation token: $0' }],

        // numbers
        [/\d*\.\d+([eE][\-+]?\d+)?/, 'number.float'],
        [/0[xX][0-9a-fA-F]+/, 'number.hex'],
        [/\d+/, 'number'],

        // delimiter: after number because of .\d floats
        [/[;,.]/, 'delimiter'],

        // strings
        [/"([^"\\]|\\.)*$/, 'string.invalid' ],  // non-teminated string
        [/"/,  { token: 'string.quote', bracket: '@open', next: '@string' } ],

        // characters
        [/'[^\\']'/, 'string'],
        [/(')(@escapes)(')/, ['string','string.escape','string']],
        [/'/, 'string.invalid']
      ],

      comment: [
        [/[^\/*]+/, 'comment' ],
        [/\/\*/,    'comment', '@push' ],    // nested comment
        ["\\*/",    'comment', '@pop'  ],
        [/[\/*]/,   'comment' ]
      ],

      string: [
        [/[^\\"]+/,  'string'],
        [/@escapes/, 'string.escape'],
        [/\\./,      'string.escape.invalid'],
        [/"/,        { token: 'string.quote', bracket: '@close', next: '@pop' } ]
      ],

      whitespace: [
        [/[ \t\r\n]+/, 'white'],
        [/\/\*/,       'comment', '@comment' ],
        [/\/\/.*$/,    'comment'],
      ],
    }
  }
})

// And this is how you would load it in the editor:

import {EditorState} from "@codemirror/state"
import {EditorView, keymap} from "@codemirror/view"
import {defaultKeymap} from "@codemirror/commands"
import {defaultHighlightStyle} from '@codemirror/highlight'

const startState = EditorState.create({
  doc: "Hello World",
  extensions: [
    keymap.of(defaultKeymap),
    defaultHighlightStyle,
    myLanguage.load()
  ]
})

const view = new EditorView({
  state: startState,
  parent: document.body
})

// Note that `description` is also exported by the creation function.
// The `description` object is a `LanguageDescription`,
// which are most commonly used to load nested grammars.

// The `lang-markdown` language supports these, so just to show how that works:

import { markdown } from '@codemirror/lang-markdown'

const myBetterStartState = EditorState.create({
  doc: "Hello World",
  extensions: [
    keymap.of(defaultKeymap),
    defaultHighlightStyle,
    markdown({ codeLanguages: [myLanguage.description] })
  ]
})

// You could of course do this the other way around,
// and nest `LanguageDescription`s in the language you created.

Why?

The dark lord cometh it gets worse

God, I ask myself that every day.

Anyways, it's because Monarch uses regex, and this particular version of it supports stupidly flexible regex. If you're a regex wizard, you'll like this.

To be more specific about why, we'll start with a comparison to Lezer, which is the parser CodeMirror 6 normally uses. Lezer is a proper parser, capable of outputting beautiful syntax trees with stunningly simple grammar definitions. Seriously - go look at the official Lezer grammar for json files - it's really tiny and simple to understand. This parser can't do that.

However, Monarch enthusiastically allows you to do sickening, awful things. It has almost none of the restrictions that Lezer has. This parser can back up its input as far as you like, rematch against a token, enter pseudo not real states, allows you to use the name of those phantom states in the branching paths and token names (wtf), now fully supports lookbehind, and probably worse things that I haven't even discovered yet. It supports all of this while not really being all that hard to use, as long as you know some regex.

If you're making an extension to Markdown, cobbling together a wiki-lang, creating a DSL, etc. this parser might be perfect for you. It lets you get syntax highlighting easily. If you're writing a proper grammar for a, uh, less barbaric language, you'll want to make a Lezer grammar so that you get a proper syntax tree.

Is it fast?

I think! It's definitely not slow. Monarch's runtime tokenizer is actually really tiny and simple. The most complex portions of Monarch are almost certainly its lexer compiling. The runtime itself uses a lot of caching in the backend, and tries not to eat your RAM while it does it. I don't know yet how it handles really huge documents, but it's probably not too bad, and there is still some avenues for further optimization.

Did you make this because your special snowflake version of Markdown needed syntax highlighting?

Yes.