tree-sitter-types-builder

v0.0.3

Tree-sitter helper program that generates static TypeScript definitions for every tree-sitter node type of a language.

tree-sitter-types-builder

This tool is a utility for developers that generates every possible SyntaxNode .type found in a tree-sitter grammar as a string literal. Even in small languages, the number of SyntaxNode types can be quite large (well into the hundreds of definitions). While many of the definitions are redundant (after analysis provided by tree-sitter), it is much easier to remove the types you do not need than to work out by hand which types you will need.
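As a rough illustration of where such a list of node types can come from, the sketch below walks the numeric node-type ids of a loaded language and collects the named ones. It is not taken from this tool's source. `NodeTypeSource` is a hypothetical local interface that mirrors the `nodeTypeCount`, `nodeTypeForId` and `nodeTypeIsNamed` members of web-tree-sitter's `Language` class, and `toPascalCase` is only one assumed naming scheme for the generated keys.

```typescript
// Hypothetical stand-in for the subset of web-tree-sitter's Language class used here.
interface NodeTypeSource {
  readonly nodeTypeCount: number;
  nodeTypeForId(typeId: number): string | null;
  nodeTypeIsNamed(typeId: number): boolean;
}

// Collect every unique, named node type of a language, sorted alphabetically.
function collectNamedNodeTypes(language: NodeTypeSource): string[] {
  const seen = new Set<string>();
  for (let id = 0; id < language.nodeTypeCount; id++) {
    const name = language.nodeTypeForId(id);
    if (name !== null && language.nodeTypeIsNamed(id)) {
      seen.add(name);
    }
  }
  return Array.from(seen).sort();
}

// Assumed naming scheme: 'function_declaration' becomes 'FunctionDeclaration'.
function toPascalCase(nodeType: string): string {
  return nodeType
    .split('_')
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join('');
}
```

With a real grammar, the object returned by web-tree-sitter after loading a wasm file could be passed in place of the stand-in interface.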

Usage/Installation

  1. Install the package globally (using your preferred package manager)

    # npm installation
    npm i -g tree-sitter-types-builder 
    
    # yarn installation
    yarn global add tree-sitter-types-builder
    
    # pnpm installation
    pnpm add --global tree-sitter-types-builder
  2. Use the tree-sitter-types-builder command where needed

    # in some project with a wasm file
    tree-sitter-types-builder --wasm path/to/your.wasm --language your_language --output path/to/your/types.ts 

Note: requires web-tree-sitter and tree-sitter-cli.

  1. Install the package inside your project

    pnpm install --save-dev tree-sitter-types-builder
  2. Build a wasm file

    # for example, to build a wasm file for the bash language
    npx tree-sitter build-wasm ./tree-sitter-bash

    This will create a tree-sitter-bash.wasm file in the tree-sitter-bash directory

    # for newer tree-sitter-cli versions
    npx tree-sitter build --wasm ./tree-sitter-bash
  3. Run the command for your language

    npx tree-sitter-types-builder --wasm path/to/your.wasm --language your_language --output path/to/your/types.ts

    Edit the generated types to fit your needs.

Example (TS) | Introduction

The recommended ✔️ example below assumes that you have already compiled a wasm file for your language and have generated the types. It also assumes that you are using web-tree-sitter to parse your code. Once you have completed these steps, you can use the generated types to build any features for your language.

import { SyntaxNode } from 'web-tree-sitter';
import { LangNodeType } from './types'; // generated by tree-sitter-types-builder

// 1.) initialize parser for a language
// 2.) parse some code to get the Tree of SyntaxNode's from web-tree-sitter
// 3.) build features, by selecting nodes of interest using the generated LangNodeType

function findChildOfType(rootNode: SyntaxNode, type: LangNodeType): SyntaxNode | null {
  if (rootNode.type === type) return rootNode;
  for (const child of rootNode.children) {
    const found = findChildOfType(child, type);
    if (found) return found;
  }
  return null;
}

// now you get auto-completion for LangNodeType.FunctionDeclaration
// and avoid passing incorrect strings to the function
findChildOfType(rootNode, LangNodeType.FunctionDeclaration);

This process automates potentially error-prone manual work and makes the code more robust, more readable, and easier to maintain. When a tree-sitter-{lang} maintainer updates their grammar, regenerating the types surfaces renamed or removed node types at compile time instead of silently breaking their users' code.

Brief outline of how much exact context and naming the tree-sitter API requires without generated types

import { SyntaxNode } from 'web-tree-sitter';

function findChildOfType(rootNode: SyntaxNode, type: string): SyntaxNode | null {
  if (rootNode.type === type) return rootNode;
  for (const child of rootNode.children) {
    const found = findChildOfType(child, type);
    if (found) return found;
  }
  return null;
}

// now, the user must pass the exact string into the findChildOfType function
// and will not be able to get auto-completion for the type of node they are looking for.
findChildOfType(rootNode, 'function_declaration');

// Furthermore, consider implementing features that require multiple types of
// nodes to be selected. The context of the code will be much harder to understand
// and properly deduce.
function findUnreachableCode(rootNode: SyntaxNode): SyntaxNode | null {
  const functionNode = findChildOfType(rootNode, 'function');
  if (!functionNode) return null;
  const blockNode = findChildOfType(functionNode, 'block');
  if (!blockNode) return null;
  const returnNode = findChildOfType(blockNode, 'return_statement');
  // check whether returnNode has siblings after it, within the current
  // block scope
  return returnNode;
}

Did you catch the potential bug in the above code? Depending on the language, a function node might contain nothing other than the identifier for the function name (common in shell languages). The block node might likewise cover only the keyword that opens the block scope.

How do the generated types help? (ADVANCED COMPARISON)

Auto-completion/Intellisense/GoTo-References

Using this package gives you language features project-wide. This is useful for adding other features later, especially if they require implementations or node types similar to those in features you have already completed. You can use a go-to-references request on a LangNodeType to see all the places where that specific node type has been used.

  • tree-sitter API: a wide type definition (every node's type is a plain string)
  • Generated type definitions: a string literal for each type of node
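The difference between the two bullets can be sketched as follows. This is a minimal, self-contained illustration: `NodeLike` is a hypothetical stand-in for the part of web-tree-sitter's SyntaxNode used here, and the three-member `LangNodeType` union is an assumed slice of what a generated file might contain.

```typescript
// Hypothetical stand-in for the parts of SyntaxNode that matter here.
interface NodeLike {
  type: string;
  children: NodeLike[];
}

// Assumed slice of generated output; a real file lists every node type.
type LangNodeType = 'function_declaration' | 'block' | 'return_statement';

// Wide signature: any string is accepted, including misspelled ones.
function countWide(root: NodeLike, type: string): number {
  let count = root.type === type ? 1 : 0;
  for (const child of root.children) {
    count += countWide(child, type);
  }
  return count;
}

// Narrow signature: only known node types are accepted.
function countNarrow(root: NodeLike, type: LangNodeType): number {
  return countWide(root, type);
}

const tree: NodeLike = {
  type: 'program',
  children: [
    {
      type: 'function_declaration',
      children: [{ type: 'block', children: [] }],
    },
  ],
};

// The misspelling compiles and simply never matches.
const missed = countWide(tree, 'function_declaraton');

// The same misspelling would be a compile-time error with the narrow signature:
// countNarrow(tree, 'function_declaraton');
const found = countNarrow(tree, 'function_declaration');
```

The wide version fails silently at runtime, while the narrow version moves the same mistake to the editor and the compiler.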

Extensibility & Ambiguity

Context-wise, you can also extend the types generated by the tool to include additional type-narrowing. For example, allowing only a specific set of nodes to be searched for is much clearer when defined as a single new type definition.

// with the generated types: auto-completion and references for every member
export type BlockScopeNode = LangNodeType.Block | LangNodeType.FunctionDeclaration | LangNodeType.IfStatement | LangNodeType.WhileStatement;

// without the generated types (an alternative definition of the same type):
// no auto-completion for the types of nodes that can be used
// no reference to where the type is used (for block_statement, function_declaration, if_statement, while_statement)
export type BlockScopeNode = 'block' | 'function_declaration' | 'if_statement' | 'while_statement';

// if another type-narrowing intends to use an overlapping type, the tree-sitter
// API can easily hide the use of the wrong string for that type
export type StatementScope = 'block_statement' | 'if_statement' | 'while_statement' | 'for_statement';
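A narrowed type like this is often paired with a runtime type guard, so that a plain string coming from the parser can be checked once and then treated as the narrower type. The sketch below assumes the generated constants look like the hypothetical `LangNodeType` object shown here. The real generated file may be shaped differently, and `isBlockScopeType` is an illustrative helper, not part of this package.

```typescript
// Hypothetical shape for a few generated constants.
const LangNodeType = {
  Block: 'block',
  FunctionDeclaration: 'function_declaration',
  IfStatement: 'if_statement',
  WhileStatement: 'while_statement',
  ReturnStatement: 'return_statement',
} as const;

type BlockScopeNode =
  | typeof LangNodeType.Block
  | typeof LangNodeType.FunctionDeclaration
  | typeof LangNodeType.IfStatement
  | typeof LangNodeType.WhileStatement;

// Single source of truth for the narrowed set.
const BLOCK_SCOPE_TYPES: readonly BlockScopeNode[] = [
  LangNodeType.Block,
  LangNodeType.FunctionDeclaration,
  LangNodeType.IfStatement,
  LangNodeType.WhileStatement,
];

// Type guard: narrows an arbitrary node type string to BlockScopeNode.
function isBlockScopeType(type: string): type is BlockScopeNode {
  return (BLOCK_SCOPE_TYPES as readonly string[]).indexOf(type) !== -1;
}
```

Because the array is typed as `readonly BlockScopeNode[]`, adding a string that is not one of the four members is rejected by the compiler, which is the kind of ambiguity the comparison above is warning about.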

Easy Testability & Maintainability

Allows for the intended types of nodes to be selected and tested before new maintainers approach the code. Consider the following example, where you are comparing two nodes that might correspond to similar string values (these could be different forms of whitespace, comments, or even something like block vs block-scope).

import { SyntaxNode } from 'web-tree-sitter';
import { LangNodeType } from './types'; // generated by tree-sitter-types-builder

function nodeMatchesType(node: SyntaxNode, type: LangNodeType): boolean {
  return node.type === type;
}

const nodeA = LangNodeType.block;
const nodeB = LangNodeType.blockScope;

function getInOrderNodes(rootNode: SyntaxNode, collectedNodes: SyntaxNode[] = []): SyntaxNode[] {
  collectedNodes.push(rootNode);
  for (const child of rootNode.children) {
      if (child) getInOrderNodes(child, collectedNodes);
  }
  return collectedNodes;
}

for (const node of getInOrderNodes(rootNode)) {
  if (nodeMatchesType(node, nodeA)) {
    // do something with nodeA
  } else if (nodeMatchesType(node, nodeB)) {
    // do something with nodeB
  }
}

// can also use the namespace getKeys() function to iterate over all the types
LangNodeType.getKeys().forEach((key) => {
  const type = LangNodeType[key]; // a node type string, not a SyntaxNode
  if (type === nodeA) {
    // do something with nodeA
  } else if (type === nodeB) {
    // do something with nodeB
  }
});

Maintainability is the core reason this tool exists. In a project where I used tree-sitter to parse a language without separately defining the node types, the web-tree-sitter API leaked into the rest of the code, and that coupling became a major issue. Refactoring a large project without statically defined SyntaxNode types becomes much more difficult as the project grows.

Consistency

This file can be used to check for equivalent type conversions across different APIs. This is an important feature for projects that might grow very large. Keeping the relevant types in one location that is easy to navigate to is good practice for any project.
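One way to keep such conversions consistent is an exhaustive lookup table keyed by the generated types. In the sketch below, `LangNodeType` is an assumed three-member slice of generated output and `OutlineKind` is a hypothetical second API invented for illustration. Neither name comes from this package or from tree-sitter.

```typescript
// Assumed slice of generated output.
type LangNodeType = 'function_declaration' | 'variable_declaration' | 'comment';

// Hypothetical second API that the node types need to be converted to.
type OutlineKind = 'Function' | 'Variable';

// Record<LangNodeType, ...> forces an entry for every node type, so a node
// type added after a grammar update is a compile-time error until it is mapped.
const OUTLINE_KIND: Record<LangNodeType, OutlineKind | null> = {
  function_declaration: 'Function',
  variable_declaration: 'Variable',
  comment: null, // deliberately has no equivalent in the second API
};

// Convert an arbitrary node type string, returning null for unknown types.
function toOutlineKind(type: string): OutlineKind | null {
  if (!Object.prototype.hasOwnProperty.call(OUTLINE_KIND, type)) {
    return null;
  }
  return OUTLINE_KIND[type as LangNodeType];
}
```

The table lives in one place, so a go-to-references request on it shows every conversion site.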

Further Reading

The syntax generated by this tool is based on the type definitions in the Language Server Protocol, and it exploits the TypeScript type system's ability to extend types with additional properties and functions (through the use of a namespace). This allows the type definitions to be iterated over while keeping their ability to be statically referenced.
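The pattern described above, a namespace of constants merged with a type alias of the same name, might look roughly like the sketch below. The three members and the `Key` helper type are assumptions made for illustration, and the file this tool actually emits may differ in naming and detail. `getKeys` is included because the earlier example calls it, but its body here is a guess.

```typescript
// Values: one string constant per node type, plus a helper for iteration.
export namespace LangNodeType {
  export const FunctionDeclaration = 'function_declaration';
  export const Block = 'block';
  export const ReturnStatement = 'return_statement';

  // Assumed helper type listing the constant names.
  export type Key = 'FunctionDeclaration' | 'Block' | 'ReturnStatement';

  export function getKeys(): Key[] {
    return ['FunctionDeclaration', 'Block', 'ReturnStatement'];
  }
}

// Type: the union of all string literals, sharing the namespace's name.
export type LangNodeType = (typeof LangNodeType)[LangNodeType.Key];

// Iteration and static references both work against the same name.
export function allNodeTypes(): LangNodeType[] {
  return LangNodeType.getKeys().map((key) => LangNodeType[key]);
}
```

Because the namespace holds values and the alias holds a type, `LangNodeType.Block` can be used in an expression while `LangNodeType` can be used in a parameter annotation.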

The specific type definitions use a string literal to represent the type of SyntaxNode that is being referenced. Not only does this help abstract the tree-sitter API from the user, but it also makes the type definitions more expressive by displaying them all in a single place.

This is especially useful for developers who are just beginning a project that uses the tree-sitter API. They can see all the available types at a glance and determine which ones they need to use. Properly defining the set of SyntaxNode types relevant to the features of the project is a much clearer method than relying on the very wide type definition (a plain string) that a tree-sitter parser provides.

Conclusion

This project aims to provide a clear and testable method for building a feature-rich set of language features from a tree-sitter grammar. It can also be helpful to keep this tool on hand to check for name changes across releases of a language's grammar.
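Checking for name changes across grammar releases can be as simple as comparing the node type strings from two generated files. The helper below is a minimal sketch with hypothetical names (`diffNodeTypes`, `NodeTypeDiff`). It is not part of this package and assumes you have already extracted the two lists of strings.

```typescript
interface NodeTypeDiff {
  added: string[];   // present only in the newer release
  removed: string[]; // present only in the older release
}

// Compare the node types of two grammar releases.
function diffNodeTypes(
  previous: readonly string[],
  current: readonly string[],
): NodeTypeDiff {
  const previousSet = new Set(previous);
  const currentSet = new Set(current);
  return {
    added: current.filter((type) => !previousSet.has(type)).sort(),
    removed: previous.filter((type) => !currentSet.has(type)).sort(),
  };
}

// Example: a release that renames 'function' to 'function_definition'.
const releaseDiff = diffNodeTypes(
  ['block', 'function', 'return_statement'],
  ['block', 'function_definition', 'return_statement'],
);
```

A rename shows up as one entry in `removed` and one in `added`, which points to the places where code that still uses the old name needs attention.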

License

MIT