tree-sitter-types-builder

v0.0.3

Published

4 months ago

Tree sitter helper program to generate static typescript definitions for every tree-sitter node type of a language.

ndonfris

tree-sitter parser typescript

This tool is a utility for generating, as string literals, every .type value a SyntaxNode can have in a tree-sitter grammar. Even small languages can define a large number of SyntaxNode types (well into the hundreds). Many of these definitions turn out to be redundant once you have analyzed what tree-sitter actually produces, but it is much easier to delete the types you do not need than to discover the ones you do.

Usage/Installation

Install the package globally (using your preferred package manager)

# npm installation
npm i -g tree-sitter-types-builder 

# yarn installation
yarn global add tree-sitter-types-builder

# pnpm installation
pnpm add --global tree-sitter-types-builder

Use the tree-sitter-types-builder command where needed

# in some project with a wasm file
tree-sitter-types-builder --wasm path/to/your.wasm --language your_language --output path/to/your/types.ts

Note: requires web-tree-sitter, and tree-sitter-cli.

Install the package inside a project

pnpm install --save-dev tree-sitter-types-builder

Build a wasm file

# for example, to build a wasm file for the bash language
npx tree-sitter build-wasm ./tree-sitter-bash

This will create a tree-sitter-bash.wasm file in the tree-sitter-bash directory

# for newer tree-sitter-cli versions
npx tree-sitter build --wasm ./tree-sitter-bash

Run the command for your language

npx tree-sitter-types-builder --wasm path/to/your.wasm --language your_language --output path/to/your/types.ts

Edit the generated types to fit your needs
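As a rough illustration of what an edited types file could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the member names, the string values, the `allLangNodeTypes` list, and the `isLangNodeType` helper are hypothetical, and the file the tool actually emits may be shaped differently. Only the `LangNodeType` name is taken from this README.

```typescript
// Hypothetical trimmed types file: names and values are illustrative only.
// In a real project these declarations would be exported from ./types.
const LangNodeType = {
  Block: 'block',
  FunctionDeclaration: 'function_declaration',
  IfStatement: 'if_statement',
  WhileStatement: 'while_statement',
} as const;

// Union of the string literals above: 'block' | 'function_declaration' | ...
type LangNodeType = (typeof LangNodeType)[keyof typeof LangNodeType];

// Runtime list of every literal kept in the file, useful for tests.
const allLangNodeTypes: LangNodeType[] = Object.values(LangNodeType);

// Narrow an arbitrary string (such as SyntaxNode.type) to the union.
function isLangNodeType(value: string): value is LangNodeType {
  return (allLangNodeTypes as string[]).includes(value);
}
```

Keeping the literals in one object means a deleted entry disappears from both the type and the runtime list at the same time.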

Example (TS) | Introduction

The recommended example below assumes that you have already compiled a wasm file for your language and generated the types. It also assumes that you are using web-tree-sitter to parse your code. Once these steps are complete, you can use the generated types to build any features for your language.

import { SyntaxNode } from 'web-tree-sitter';
import { LangNodeType } from './types' // generated by tree-sitter-types-builder

// 1.) initialize parser for a language
// 2.) parse some code to get the Tree of SyntaxNodes from web-tree-sitter
// 3.) build features by selecting nodes of interest using the generated LangNodeType

function findChildOfType(rootNode: SyntaxNode, type: LangNodeType): SyntaxNode | null {
  if (rootNode.type === type) return rootNode;
  for (const child of rootNode.children) {
    const found = findChildOfType(child, type);
    if (found) return found;
  }
  return null;
}

// rootNode is the root SyntaxNode of the Tree parsed in step 2.
// now you get auto-completion for LangNodeType.FunctionDeclaration
// and avoid passing incorrect strings to the function
findChildOfType(rootNode, LangNodeType.FunctionDeclaration);

This process automates potentially error-prone manual work and makes the code more robust, more readable, and easier to maintain. When a tree-sitter-{lang} maintainer renames a node in the grammar, regenerating the types surfaces the mismatch at compile time instead of letting it silently break the code of their users.

A brief outline of how quickly the tree-sitter API demands exact type names and context when the generated types are not used

import { SyntaxNode } from 'web-tree-sitter';

function findChildOfType(rootNode: SyntaxNode, type: string): SyntaxNode | null {
  if (rootNode.type === type) return rootNode;
  for (const child of rootNode.children) {
    const found = findChildOfType(child, type);
    if (found) return found;
  }
  return null;
}

// now, the user must pass the exact string to the findChildOfType function
// and gets no auto-completion for the type of node they are looking for.
findChildOfType(rootNode, 'function_declaration');

// Furthermore, consider implementing features that require multiple types of
// nodes to be selected. The context of the code will be much harder to understand
// and properly deduce.
function findUnreachableCode(rootNode: SyntaxNode): SyntaxNode | null {
  const functionNode = findChildOfType(rootNode, 'function');
  if (!functionNode) return null; // findChildOfType may return null
  const blockNode = findChildOfType(functionNode, 'block');
  if (!blockNode) return null;
  const returnNode = findChildOfType(blockNode, 'return_statement');
  // check whether returnNode has siblings after it, within the current
  // block scope
  return returnNode;
}

Did you catch the potential bug in the above code? Depending on the language, a function node might contain nothing other than the identifier for the function name (common in shell languages). The block node might likewise cover only the keyword that opens the block scope.
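The failure mode can be reproduced without a parser. The sketch below uses a hand-built tree and a `MiniNode` interface that stands in for SyntaxNode. Both are assumptions for illustration, as are the node names `word` and `compound_statement`, which are not taken from any particular grammar.

```typescript
// Minimal stand-in for web-tree-sitter's SyntaxNode (assumption: only the
// two fields used here), so the sketch runs without a parser or wasm file.
interface MiniNode {
  type: string;
  children: MiniNode[];
}

function findChildOfType(root: MiniNode, type: string): MiniNode | null {
  if (root.type === type) return root;
  for (const child of root.children) {
    const found = findChildOfType(child, type);
    if (found) return found;
  }
  return null;
}

// Hypothetical shell-like tree: the function node wraps a name and a
// compound statement, and no node is ever called 'block'.
const tree: MiniNode = {
  type: 'program',
  children: [{
    type: 'function',
    children: [
      { type: 'word', children: [] },
      { type: 'compound_statement', children: [
        { type: 'return_statement', children: [] },
      ] },
    ],
  }],
};

const functionNode = findChildOfType(tree, 'function');
// The guess 'block' compiles, because any string is accepted, but finds nothing.
const blockNode = functionNode ? findChildOfType(functionNode, 'block') : null;
// The name this grammar really uses does find the function body.
const bodyNode = functionNode ? findChildOfType(functionNode, 'compound_statement') : null;
```

With string parameters the wrong guess is only discovered at runtime, as a null result. A parameter typed with the generated literals would reject 'block' at compile time for a grammar that never defines it.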

How do the generated types help? (Advanced Comparison)

Auto-completion/Intellisense/GoTo-References

Using this package gives you editor language features (auto-completion, IntelliSense, go-to-references) across the whole project. This is useful for adding other features later, especially if they require implementations or node types similar to those in your completed features. You can run a go-to-references request on a LangNodeType to see every place where that specific node has been used.

Tree-sitter API: a wide type definition, where every node type is a plain string.
Generated type definitions: a string literal for each type of node.

Extensibility & Ambiguity

Context-wise, you can also extend the types generated by the tool with additional type narrowing. For example, restricting a search to a specific set of nodes is much clearer when that set is defined as a single new type.

// with the generated types: auto-completion and go-to-references work for every member
export type BlockScopeNode = LangNodeType.Block | LangNodeType.FunctionDeclaration | LangNodeType.IfStatement | LangNodeType.WhileStatement;

// the same type written with raw strings (an alternative to the definition above):
// no auto-completion for the types of nodes that can be used
// no reference to where the type is used (for block, function_declaration, if_statement, while_statement)
export type BlockScopeNode = 'block' | 'function_declaration' | 'if_statement' | 'while_statement'

// if another type-narrowing intends to use an overlapping type, raw strings
// can easily hide the use of the wrong string for that type
// (here 'block_statement', where 'block' was used above)
export type StatementScope = 'block_statement' | 'if_statement' | 'while_statement' | 'for_statement'
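A narrowed type like this can be paired with a runtime check, so that a plain string coming from the parser is narrowed in one place. The sketch below is one possible way to do that, not the tool's own output: the `LangNodeType` object, its members, the `blockScopeTypes` list, and the `isBlockScopeType` helper are all hypothetical.

```typescript
// Hypothetical subset of generated literals; real names depend on the grammar.
const LangNodeType = {
  Block: 'block',
  FunctionDeclaration: 'function_declaration',
  IfStatement: 'if_statement',
  WhileStatement: 'while_statement',
  ForStatement: 'for_statement',
} as const;
type LangNodeType = (typeof LangNodeType)[keyof typeof LangNodeType];

// One list drives both the runtime check and the narrowed type.
const blockScopeTypes = [
  LangNodeType.Block,
  LangNodeType.FunctionDeclaration,
  LangNodeType.IfStatement,
  LangNodeType.WhileStatement,
] as const;
type BlockScopeNode = (typeof blockScopeTypes)[number];

// Type guard: narrows a plain string (such as SyntaxNode.type) to BlockScopeNode.
function isBlockScopeType(type: string): type is BlockScopeNode {
  return (blockScopeTypes as readonly string[]).includes(type);
}

const accepted = isBlockScopeType('block');            // true
const rejected = isBlockScopeType('block_statement');  // false: not in the list
```

Deriving the type from the list avoids maintaining the same set of names twice, which is where overlapping or mistyped strings tend to creep in.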

Easy Testability & Maintainability

Allows the intended types of nodes to be selected and tested before new maintainers approach the code. Consider the following example, which compares two node types that might correspond to similar string values (these could be different forms of whitespace, comments, or even something like block vs block-scope).

import { SyntaxNode } from 'web-tree-sitter';
import { LangNodeType } from './types'; // generated by tree-sitter-types-builder

function nodeMatchesType(node: SyntaxNode, type: LangNodeType): boolean {
  return node.type === type;
}

const nodeA = LangNodeType.Block;
const nodeB = LangNodeType.BlockScope;

function getInOrderNodes(rootNode: SyntaxNode, collectedNodes: SyntaxNode[] = []): SyntaxNode[] {
  collectedNodes.push(rootNode);
  for (const child of rootNode.children) {
    if (child) getInOrderNodes(child, collectedNodes);
  }
  return collectedNodes;
}

for (const node of getInOrderNodes(rootNode)) {
  if (nodeMatchesType(node, nodeA)) {
    // do something with nodeA
  } else if (nodeMatchesType(node, nodeB)) {
    // do something with nodeB
  }
}

// can also use the namespace getKeys() function to iterate over all the types
LangNodeType.getKeys().forEach((key) => {
  const type = LangNodeType[key]; // a node-type string, not a SyntaxNode
  if (type === nodeA) {
    // do something with nodeA
  } else if (type === nodeB) {
    // do something with nodeB
  }
});

Maintainability is the core reason this tool was created. In a project where I used tree-sitter to parse a language without defining the node types separately, the tree-sitter wasm API was never cleanly separated from the rest of the code, and the resulting complexity became a major issue. Refactoring a large project without statically defined SyntaxNode types becomes increasingly difficult as the project grows.

Consistency

The generated types file can be used to check that type conversions are equivalent across different APIs. This matters for projects that might grow very large. Keeping the relevant types in one location that is easy to navigate to is good practice for any project.
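One way to keep such conversions consistent is to key a lookup table by the node-type union, so the compiler insists on an entry for every type. The sketch below assumes a hypothetical second API with an `OutlineKind` type. The `LangNodeType` members and the `toOutlineKind` helper are also made up for illustration.

```typescript
// Hypothetical subset of generated literals; real names depend on the grammar.
const LangNodeType = {
  FunctionDeclaration: 'function_declaration',
  VariableDeclaration: 'variable_declaration',
  Block: 'block',
} as const;
type LangNodeType = (typeof LangNodeType)[keyof typeof LangNodeType];

// Hypothetical kinds used by some other API (an editor outline, for example).
type OutlineKind = 'function' | 'variable' | 'scope';

// Record<LangNodeType, ...> forces an entry for every node type: dropping
// or renaming a literal makes this object fail to compile.
const outlineKindByNodeType: Record<LangNodeType, OutlineKind> = {
  [LangNodeType.FunctionDeclaration]: 'function',
  [LangNodeType.VariableDeclaration]: 'variable',
  [LangNodeType.Block]: 'scope',
};

// Convert a raw node type string; unknown strings yield undefined.
function toOutlineKind(nodeType: string): OutlineKind | undefined {
  return Object.prototype.hasOwnProperty.call(outlineKindByNodeType, nodeType)
    ? outlineKindByNodeType[nodeType as LangNodeType]
    : undefined;
}
```

The own-property check keeps inherited names such as `toString` from being treated as node types.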

Conclusion

This project aims to provide a clear and testable method for building a feature-rich set of language features from a tree-sitter grammar. It can also be helpful to keep this tool on hand to check for name changes across releases of a language's grammar.
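Checking for name changes can be as simple as comparing the literals from two generated files. The sketch below is a minimal version of that idea. The `diffNodeTypes` helper and the two sample lists are hypothetical and not part of the tool.

```typescript
interface NodeTypeDiff {
  added: string[];
  removed: string[];
}

// Compare the node type names of two grammar releases.
function diffNodeTypes(previous: readonly string[], current: readonly string[]): NodeTypeDiff {
  const before = new Set(previous);
  const after = new Set(current);
  return {
    // present now, absent before
    added: Array.from(after).filter((name) => !before.has(name)).sort(),
    // present before, absent now
    removed: Array.from(before).filter((name) => !after.has(name)).sort(),
  };
}

// Hypothetical literals taken from two generated files for the same language.
const oldRelease = ['block', 'function_declaration', 'if_statement'];
const newRelease = ['block_statement', 'function_declaration', 'if_statement'];

const diff = diffNodeTypes(oldRelease, newRelease);
// diff.added is ['block_statement'] and diff.removed is ['block']
```

A rename shows up as one removed name and one added name, which is usually enough of a hint to find the affected code through go-to-references.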

License

MIT
