tree-sitter-types-builder
v0.0.3
Tree-sitter helper program that generates static TypeScript definitions for every tree-sitter node type of a language.
tree-sitter-types-builder
This tool is a utility for developers that generates, as string literals, every possible .type of a SyntaxNode that can be found in a tree-sitter grammar. Even in most small languages, the number of SyntaxNode types can be quite large (well into the hundreds of definitions). While many of the definitions are redundant (after the analysis provided by tree-sitter), it is much easier to remove unneeded types than to discover which types will be needed.
- Usage/Installation
- Example (TS) | Introduction
- How do the generated types help? (ADVANCED COMPARISON)
- Further Reading
- Conclusion
- License
Usage/Installation
Install the package globally (using your preferred package manager)
# npm installation
npm i -g tree-sitter-types-builder
# yarn installation
yarn global add tree-sitter-types-builder
# pnpm installation
pnpm add --global tree-sitter-types-builder
Use the tree-sitter-types-builder command where needed
# in some project with a wasm file
tree-sitter-types-builder --wasm path/to/your.wasm --language your_language --output path/to/your/types.ts
Note: requires web-tree-sitter, and tree-sitter-cli.
Install the package inside a project
pnpm install --save-dev tree-sitter-types-builder
Build a wasm file
# for example, to build a wasm file for the bash language
npx tree-sitter build-wasm ./tree-sitter-bash
This will create a tree-sitter-bash.wasm file in the tree-sitter-bash directory
# for newer tree-sitter-cli versions
npx tree-sitter build --wasm ./tree-sitter-bash
Run the command for your language
npx tree-sitter-types-builder --wasm path/to/your.wasm --language your_language --output path/to/your/types.ts
Edit the generated types to fit your needs
Example (TS) | Introduction
The recommended example below assumes that you have already compiled a wasm file for your language and have generated the types. It also assumes that you are using web-tree-sitter to parse your code. Once these steps are complete, you can use the generated types to build features for your language.
import { SyntaxNode } from 'web-tree-sitter';
import { LangNodeType } from './types' // generated by tree-sitter-types-builder
// 1.) initialize parser for a language
// 2.) parse some code to get the Tree of SyntaxNode's from web-tree-sitter
// 3.) build features, by selecting nodes of interest using the generated LangNodeType
function findChildOfType(rootNode: SyntaxNode, type: LangNodeType): SyntaxNode | null {
  if (rootNode.type === type) return rootNode;
  for (const child of rootNode.children) {
    const found = findChildOfType(child, type);
    if (found) return found;
  }
  return null;
}
// now you get auto-completion for LangNodeType.FunctionDeclaration
// and avoid passing incorrect strings to the function
findChildOfType(rootNode, LangNodeType.FunctionDeclaration);
This process automates potentially error-prone manual work and makes the code more robust, more readable, and easier to maintain. When a tree-sitter-{lang} maintainer updates their grammar, regenerating the types turns renamed or removed node types into compile-time errors instead of silent runtime mismatches.
For comparison, here is a brief outline of how quickly the plain tree-sitter API requires exact knowledge of type names and their context:
import { SyntaxNode } from 'web-tree-sitter';
function findChildOfType(rootNode: SyntaxNode, type: string): SyntaxNode | null {
  if (rootNode.type === type) return rootNode;
  for (const child of rootNode.children) {
    const found = findChildOfType(child, type);
    if (found) return found;
  }
  return null;
}
// now, the user must pass the exact string into the findChildOfType function
// and will not get auto-completion for the type of node they are looking for.
findChildOfType(rootNode, 'function_declaration');
// Furthermore, consider implementing features that require multiple types of
// nodes to be selected. The context of the code will be much harder to understand
// and properly deduce.
function findUnreachableCode(rootNode: SyntaxNode): SyntaxNode | null {
  const functionNode = findChildOfType(rootNode, 'function');
  const blockNode = findChildOfType(functionNode, 'block');
  const returnNode = findChildOfType(blockNode, 'return_statement');
  // check whether any returnNode has siblings after it, within the current
  // block scope
  return returnNode;
}
Did you catch the potential bug in the above code? Depending on the language, a function node might contain nothing other than the identifier for the function name (common in shell languages). The block node might also cover only the keyword that opens the block scope, rather than the statements inside it.
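As a minimal sketch of how the generated types could make this lookup safer, the version below names each node type once and returns early when an expected node is missing. `NodeLike` is a hypothetical stand-in for web-tree-sitter's `SyntaxNode`, and the `LangNodeType` object is an assumed, hand-written excerpt of a generated file (modeled as an `as const` object for brevity); the real names depend on your grammar.

```typescript
// Hypothetical stand-in for web-tree-sitter's SyntaxNode (only the fields used here).
interface NodeLike {
  type: string;
  children: NodeLike[];
}

// Assumed excerpt of the generated types; real names depend on the grammar.
const LangNodeType = {
  FunctionDeclaration: 'function_declaration',
  Block: 'block',
  ReturnStatement: 'return_statement',
} as const;
type LangNodeType = (typeof LangNodeType)[keyof typeof LangNodeType];

// Depth-first search for the first node of the given type.
function findChildOfType(root: NodeLike, type: LangNodeType): NodeLike | null {
  if (root.type === type) return root;
  for (const child of root.children) {
    const found = findChildOfType(child, type);
    if (found) return found;
  }
  return null;
}

// Returns the first sibling that follows a return statement inside the first
// function's block, or null as soon as any expected node is missing.
function findUnreachableCode(root: NodeLike): NodeLike | null {
  const fn = findChildOfType(root, LangNodeType.FunctionDeclaration);
  if (!fn) return null;
  const block = findChildOfType(fn, LangNodeType.Block);
  if (!block) return null;
  const index = block.children.findIndex(
    (child) => child.type === LangNodeType.ReturnStatement,
  );
  if (index === -1) return null;
  return block.children[index + 1] ?? null;
}
```

Whether `block` really contains the statements is still a grammar-specific question, but a wrong assumption now has a single named constant to correct.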
How do the generated types help? (ADVANCED COMPARISON)
Auto-completion/Intellisense/GoTo-References
Using this package gives you editor language features project-wide. This is useful for adding other features later, especially if they require implementations or node types similar to those of your completed features. You can run a go-to-references request on a LangNodeType member to see all the places where that specific node type has been used.
- Wide Type Definition in tree-sitter API
- Generated type definitions provide a string literal for each type of node
Extensibility & Ambiguity
Context-wise, you can also extend the types generated by the tool to include additional type-narrowing. For example, restricting a search to a specific set of nodes is much clearer when that set is defined as a single new type.
export type BlockScopeNode = LangNodeType.Block | LangNodeType.FunctionDeclaration | LangNodeType.IfStatement | LangNodeType.WhileStatement;
// no auto-completion for the types of nodes that can be used
// no reference to where the type is used (for block_statement, function_declaration, if_statement, while_statement)
export type BlockScopeNode = 'block' | 'function_declaration' | 'if_statement' | 'while_statement'
// if another type-narrowing intends to use an overlapping type, the tree-sitter
// API can easily hide the use of the wrong string for that type
export type StatementScope = 'block_statement' | 'if_statement' | 'while_statement' | 'for_statement'
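A narrowed type like this can also be paired with a runtime check. The following is a minimal sketch, assuming a hand-written excerpt of the generated names (modeled as an `as const` object for brevity); `BLOCK_SCOPE_TYPES` and `isBlockScopeType` are hypothetical helpers, not part of the tool's output.

```typescript
// Assumed excerpt of the generated node-type names.
const LangNodeType = {
  Block: 'block',
  FunctionDeclaration: 'function_declaration',
  IfStatement: 'if_statement',
  WhileStatement: 'while_statement',
  Comment: 'comment',
} as const;

// Single source of truth for the narrowed set: the type is derived from the array.
const BLOCK_SCOPE_TYPES = [
  LangNodeType.Block,
  LangNodeType.FunctionDeclaration,
  LangNodeType.IfStatement,
  LangNodeType.WhileStatement,
] as const;
type BlockScopeNode = (typeof BLOCK_SCOPE_TYPES)[number];

// Type guard: narrows an arbitrary node.type string to BlockScopeNode.
function isBlockScopeType(type: string): type is BlockScopeNode {
  return (BLOCK_SCOPE_TYPES as readonly string[]).includes(type);
}
```

Deriving the type from the array keeps the compile-time union and the runtime check from drifting apart.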
Easy Testability & Maintainability
Allows for the intended types of nodes to be selected and tested before new maintainers approach the code. Consider the following example, where you are comparing two nodes that might correspond to similar string values (this could be different forms of whitespace, comments, or even something like block vs block-scope).
import { SyntaxNode } from 'web-tree-sitter';
import { LangNodeType } from './types';

function nodeMatchesType(node: SyntaxNode, type: LangNodeType): boolean {
  return node.type === type;
}

// the two similarly named node types being compared
const typeA = LangNodeType.Block;
const typeB = LangNodeType.BlockScope;

// collects rootNode and all of its descendants in pre-order
function getInOrderNodes(rootNode: SyntaxNode, collectedNodes: SyntaxNode[] = []): SyntaxNode[] {
  collectedNodes.push(rootNode);
  for (const child of rootNode.children) {
    if (child) getInOrderNodes(child, collectedNodes);
  }
  return collectedNodes;
}

// rootNode is assumed to come from a previously parsed tree
for (const node of getInOrderNodes(rootNode)) {
  if (nodeMatchesType(node, typeA)) {
    // do something with nodes of typeA
  } else if (nodeMatchesType(node, typeB)) {
    // do something with nodes of typeB
  }
}

// can also use the namespace getKeys() function to iterate over all the types
LangNodeType.getKeys().forEach((key) => {
  const type = LangNodeType[key];
  if (type === typeA) {
    // do something with typeA
  } else if (type === typeB) {
    // do something with typeB
  }
});
The project's maintainability is the core reason for the creation of this tool. In a project where I used tree-sitter to parse a language and did not separately define the types of nodes, the lack of separation between the tree-sitter wasm API and the rest of the code became a major source of complexity. Refactoring a large project without statically defined SyntaxNode types becomes increasingly difficult as the project grows.
Consistency
This file can be used to check for equivalent type conversions across different APIs. This is an important feature for projects that might grow very large. Keeping the relevant types in a location that can be easily navigated to is good practice for any project.
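One way to keep such conversions consistent is an exhaustive lookup table keyed by the generated type. The sketch below assumes a hand-written excerpt of the generated names (modeled as an `as const` object for brevity) and a hypothetical `OutlineKind` type standing in for whatever second API the project talks to.

```typescript
// Assumed excerpt of the generated node-type names.
const LangNodeType = {
  FunctionDeclaration: 'function_declaration',
  VariableDeclaration: 'variable_declaration',
  Comment: 'comment',
} as const;
type LangNodeType = (typeof LangNodeType)[keyof typeof LangNodeType];

// Hypothetical kinds used by another API in the project (for example an outline view).
type OutlineKind = 'function' | 'variable' | 'ignored';

// Record<LangNodeType, ...> requires one entry per node type, so a node type that
// is added, renamed, or removed in the generated file stops this table compiling.
const OUTLINE_KIND: Record<LangNodeType, OutlineKind> = {
  [LangNodeType.FunctionDeclaration]: 'function',
  [LangNodeType.VariableDeclaration]: 'variable',
  [LangNodeType.Comment]: 'ignored',
};

function toOutlineKind(type: LangNodeType): OutlineKind {
  return OUTLINE_KIND[type];
}
```

Because every conversion lives in one table, a reviewer can check the mapping in a single place instead of hunting for string comparisons.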
Further Reading
The syntax generated by this tool is based on the type definitions in the language server protocol and exploits the type system's ability to extend types with additional properties/functions (through the use of a namespace). This makes the type definitions more expressive: they can be iterated over while still being statically referenced.
The specific type definitions use a string literal to represent the type of SyntaxNode that is being referenced. Not only does this help abstract the tree-sitter API from the user, but it also makes the type definitions more expressive by displaying them all in a single place.
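To illustrate the pattern described above, here is a minimal sketch of what such a namespace-plus-type definition might look like. The shape is an assumption modeled on the language server protocol style; the actual output of the tool may differ in names and details, and the two node types shown are hypothetical.

```typescript
// Assumed shape only; not the tool's verbatim output.
namespace LangNodeType {
  export const FunctionDeclaration = 'function_declaration';
  export type FunctionDeclaration = typeof FunctionDeclaration;
  export const Block = 'block';
  export type Block = typeof Block;

  // Lists the exported constant names so the types can be iterated at runtime.
  export function getKeys(): Array<'FunctionDeclaration' | 'Block'> {
    return ['FunctionDeclaration', 'Block'];
  }
}
// The type shares the namespace's name, so LangNodeType works both as a value
// (LangNodeType.Block) and as a type (a union of string literals).
type LangNodeType = LangNodeType.FunctionDeclaration | LangNodeType.Block;

// Static reference and runtime iteration over the same definitions.
const blockType: LangNodeType = LangNodeType.Block;
const allTypes: LangNodeType[] = LangNodeType.getKeys().map((key) => LangNodeType[key]);
```

Each member stays a plain string at runtime, so it compares directly against `node.type` without any conversion.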
This is especially useful for developers who are just beginning a project that uses the tree-sitter API. They can see all the types that are available to them and determine which ones they need. Explicitly defining the set of SyntaxNode types relevant to the project's features is much clearer than relying on the very wide type definition (a plain string) exposed by a tree-sitter parser.
Conclusion
This project aims to provide a clear and testable method for building a feature-rich set of language features from a tree-sitter grammar. It can also be helpful to keep this tool on hand to check for name changes across releases of a language's grammar.
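Checking for name changes across grammar releases can be as simple as comparing two lists of node-type strings. The sketch below is illustrative only: `diffNodeTypes` is a hypothetical helper, not something the tool provides, and the two lists are made-up examples of values collected from types generated against two releases.

```typescript
interface NodeTypeDiff {
  added: string[];
  removed: string[];
}

// Compares the node-type names of two grammar releases.
function diffNodeTypes(previous: readonly string[], current: readonly string[]): NodeTypeDiff {
  const previousSet = new Set(previous);
  const currentSet = new Set(current);
  return {
    added: current.filter((type) => !previousSet.has(type)),
    removed: previous.filter((type) => !currentSet.has(type)),
  };
}

// Hypothetical node-type lists from two releases of a grammar.
const releaseA = ['function_declaration', 'block', 'return_statement'];
const releaseB = ['function_definition', 'block', 'return_statement'];

const diff = diffNodeTypes(releaseA, releaseB);
console.log(diff); // one added name and one removed name suggest a rename
```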
License
MIT