@shaderfrog/glsl-parser
v5.3.2
A GLSL ES 1.0 and 3.0 parser and preprocessor that can preserve whitespace and comments
Shaderfrog GLSL Compiler
The Shaderfrog GLSL compiler is an open source GLSL 1.00 and 3.00 parser and preprocessor that compiles back to GLSL. Both the parser and preprocessor can preserve comments and whitespace.
The parser is built with a PEG grammar, via the Peggy JavaScript library. The PEG grammars for both the preprocessor and parser are available on GitHub.
See the state of this library for limitations and goals of this compiler.
Table of Contents
- Installation
- Parsing
- Preprocessing
- Manipulating and Searching ASTs
- What are "parsing" and "preprocessing"?
- State of this library
- Local Development
Usage
Installation
npm install --save @shaderfrog/glsl-parser
Parsing
import { parser, generate } from '@shaderfrog/glsl-parser';
// To parse a GLSL program's source code into an AST:
const program = parser.parse('float a = 1.0;');
// To turn a parsed AST back into a source program
const transpiled = generate(program);
The parser accepts an optional second options argument:
parser.parse('float a = 1.0;', options);
Where options is:
type ParserOptions = {
// Hide warnings. If set to false or not set, then the parser logs warnings
// like undefined functions and variables. If `failOnWarn` is set to true,
// warnings will still cause the parser to raise an error. Defaults to false.
quiet: boolean;
// An optional string representing the origin of the GLSL, for debugging and
// error messages. For example, "main.js". If the parser raises an error, the
// grammarSource shows up in the error.source field. If you format the error
// (see the errors section), the grammarSource shows up in the formatted error
// string. Defaults to undefined.
grammarSource: string;
// If true, sets location information on each AST node, in the form of
// { column: number, line: number, offset: number }. Defaults to false.
includeLocation: boolean;
// If true, causes the parser to raise an error instead of log a warning.
// The parser does limited type checking, and things like undeclared variables
// are treated as warnings. Defaults to false.
failOnWarn: boolean;
}
The program type
parser.parse() returns a Program, which is a special AST Node:
interface Program {
// Hard coded AST node type of "program"
type: 'program';
// The AST itself is an array of the top level statements
program: AstNode[];
// All of the scopes found during parsing
scopes: Scope[];
// Leading whitespace of the source code
wsStart?: string;
// Trailing whitespace of the source code
wsEnd?: string;
}
Scope
parse() returns a Program, which has a scopes array on it. A scope looks like:
type Scope = {
name: string;
parent?: Scope;
bindings: ScopeIndex;
types: TypeScopeIndex;
functions: FunctionScopeIndex;
location?: LocationObject;
}
The name of a scope is either "global", the name of the function that introduced the scope, or in anonymous blocks, "{". In each scope, bindings represents variables, types represents user-created types (structs in GLSL), and functions represents functions.
For bindings and types, the scope index looks like:
type ScopeIndex = {
[name: string]: {
declaration?: AstNode;
references: AstNode[];
}
}
Where name is the name of the variable or type. declaration is the AST node where the variable was declared. In the case the variable is used without being declared, declaration won't be present. If you set the failOnWarn parser option to true, the parser will throw an error when encountering an undeclared variable, rather than allow a scope entry without a declaration.
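As a minimal sketch of how the optional declaration field can be used, the snippet below lists names that are referenced but never declared. The AstNode and ScopeIndex types are local stand-ins that mirror the shapes documented above (a real AstNode carries more fields), and findUndeclared is a hypothetical helper, not something the library exports.

```typescript
// Local stand-ins for the documented shapes; not imported from the library.
type AstNode = { type: string };
type ScopeIndex = {
  [name: string]: { declaration?: AstNode; references: AstNode[] };
};

// Hypothetical helper: names with references but no declaration node.
function findUndeclared(index: ScopeIndex): string[] {
  return Object.entries(index)
    .filter(([, entry]) => entry.declaration === undefined)
    .map(([entryName]) => entryName);
}

const decl: AstNode = { type: 'declaration' };
const ref: AstNode = { type: 'identifier' };
const bindings: ScopeIndex = {
  a: { declaration: decl, references: [decl, ref] },
  b: { references: [ref] }, // used without being declared
};

const undeclared = findUndeclared(bindings);
console.log(undeclared); // only "b" is listed
```

With a real parse result, the same function could be applied to each scope's bindings or types index, assuming failOnWarn is left at its default of false.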
For functions, the scope index is slightly different:
type FunctionScopeIndex = {
[name: string]: {
[signature: string]: {
returnType: string;
parameterTypes: string[];
declaration?: FunctionNode;
references: AstNode[];
}
}
};
Where name is the name of the function, and signature is a string representing the function's return and parameter types, in the form of "returnType: paramType1, paramType2, ..." or "returnType: void" in the case of no arguments. Each signature in this index represents an "overloaded" function in GLSL, as in:
void someFunction(int x) {};
void someFunction(int x, int y) {};
With this source code, there will be two entries under name, one for each overload signature. The references are the uses of that specific overloaded version of the function. references also contains the function prototypes for the overloaded function, if present.
In the case there is only one declaration for a function, there will still be a single entry under name with the function's signature.
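To make the signature key format concrete, here is a small sketch that builds a key from a return type and a list of parameter types. signatureKey is a hypothetical helper written for illustration. The exact spacing is taken from the format string quoted above, so treat it as an assumption and check it against real parser output before relying on it.

```typescript
// Hypothetical helper, not part of the library: builds a key in the
// "returnType: paramType1, paramType2" form, or "returnType: void" when
// there are no parameters.
function signatureKey(returnType: string, parameterTypes: string[]): string {
  const params = parameterTypes.length > 0 ? parameterTypes.join(', ') : 'void';
  return `${returnType}: ${params}`;
}

const oneArg = signatureKey('void', ['int']);
const twoArgs = signatureKey('void', ['int', 'int']);
const noArgs = signatureKey('float', []);

console.log(oneArg);  // void: int
console.log(twoArgs); // void: int, int
console.log(noArgs);  // float: void
```

For the two someFunction overloads above, this would give two distinct keys under the same function name.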
⚠️ Caution! This parser does very limited type checking. This leads to a known case where a function call can match to the wrong overload in scope:
void someFunction(float, float);
void someFunction(bool, bool);
someFunction(true, true); // This will be attributed to the wrong scope entry
The parser doesn't know the type of the operands in the function call, so it matches based on the name and arity of the functions.
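The sketch below illustrates why matching on arity alone can pick the wrong overload. It is not the parser's code: the Overload type and matchByArity are hypothetical, and choosing the first candidate with a matching parameter count is an assumption made for the example.

```typescript
// Illustrative only: a toy overload table and an arity-based lookup.
type Overload = { returnType: string; parameterTypes: string[] };

// Assumed strategy: return the first overload with the right parameter count.
function matchByArity(overloads: Overload[], argCount: number): Overload | undefined {
  return overloads.find((o) => o.parameterTypes.length === argCount);
}

const overloads: Overload[] = [
  { returnType: 'void', parameterTypes: ['float', 'float'] },
  { returnType: 'void', parameterTypes: ['bool', 'bool'] },
];

// someFunction(true, true) has two arguments. Without operand types, both
// overloads are candidates, and this lookup settles on the float version.
const matched = matchByArity(overloads, 2);
console.log(matched?.parameterTypes);
```

If your program relies on overloads that share an arity, treat the references in the function scope index with caution.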
See also Utility Functions for renaming scope references.
Errors
If you have invalid GLSL, the parser throws a GlslSyntaxError. The parser uses Peggy under the hood, so GlslSyntaxError is a convenience type alias for a peggy.SyntaxError.
import { parser, GlslSyntaxError } from '@shaderfrog/glsl-parser';
let error: GlslSyntaxError | undefined;
try {
// Line without a semicolon
parser.parse(`float a`);
} catch (e) {
error = e as GlslSyntaxError;
}
If you want to use instanceof to check whether a caught error is a GlslSyntaxError, you need to use the Peggy SyntaxError error object, which lives on the parser:
console.log(error instanceof parser.SyntaxError)
// true
The only error the parser intentionally throws is a GlslSyntaxError. You should be safe to cast a caught error to GlslSyntaxError with the as keyword in TypeScript.
The error message string is automatically generated by Peggy:
console.log(error.message)
// 'Expected ",", ";", "=", or array specifier but end of input found'
The error object includes the location of the error. It is not printed in the error message by default.
console.log(error.location)
/*
{
source: undefined,
start: { offset: 7, line: 1, column: 8 },
end: { offset: 7, line: 1, column: 8 }
}
*/
Note the source field on the error object is the grammarSource string provided to the parser options, which is undefined by default. If you pass in a grammarSource to parser.parse(), it shows up in the error object. This is meant to help you track which source file you're parsing. For example, you could pass "myfile.glsl" as the grammarSource option to parser.parse() so that the error records that your source GLSL came from your application's myfile.glsl file.
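Because the location is not part of the error message, you may want to render it yourself. The following is a minimal sketch that assumes only the location shape printed above. ErrorLocation is a local stand-in type, and describeLocation and the "<unknown>" fallback are hypothetical choices, not library behavior.

```typescript
// Local stand-in for the shape of error.location shown above.
type ErrorLocation = {
  source?: string;
  start: { offset: number; line: number; column: number };
  end: { offset: number; line: number; column: number };
};

// Hypothetical helper: render "source:line:column" from the start position.
function describeLocation(loc: ErrorLocation): string {
  const origin = loc.source ?? '<unknown>';
  return `${origin}:${loc.start.line}:${loc.start.column}`;
}

const errorLocation: ErrorLocation = {
  source: 'myfile.glsl',
  start: { offset: 7, line: 1, column: 8 },
  end: { offset: 7, line: 1, column: 8 },
};

const described = describeLocation(errorLocation);
console.log(described); // myfile.glsl:1:8
```

In real use you would pass error.location from a caught GlslSyntaxError instead of the hand-built object.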
The error object also has a fairly confusing format() method, which comes from the underlying Peggy error object. It produces an ASCII formatted string with arrows and underlines. The source option passed to .format() must match your grammarSource in parser options (which is undefined by default). This API is awkward and I might override it in future versions of the parser.
// `text` is the GLSL source string that was passed to parser.parse()
console.log(error.format([{ text, source: undefined }]));
/*
Error: Expected ",", ";", "=", or array specifier but "f" found.
--> undefined:2:1
|
2 | float b
| ^
*/
Preprocessing
The parser also ships with a preprocess() function.
The preprocessor takes in a program source code string and produces a preprocessed program source code string. If you want to access and manipulate the AST produced by preprocessing, see the next sections.
import preprocess from '@shaderfrog/glsl-parser/preprocessor';
// Preprocess a program
console.log(preprocess(`
#define a 1
float b = a;
`, options));
Where options is:
type PreprocessorOptions = {
// Don't strip comments before preprocessing
preserveComments: boolean,
// Macro definitions to use when preprocessing
defines: {
SOME_MACRO_NAME: 'macro body'
},
// A list of callbacks evaluated for each node type, and returns whether or not
// this AST node is subject to preprocessing
preserve: {
ast_node_name: (path) => boolean
}
}
A preprocessed program string can be handed off to the main GLSL parser. Preprocessing is optional, but a program string may not be valid until it is preprocessed.
If you want more control over preprocessing, the preprocess function above is a convenience method for approximately the following:
import {
preprocessAst,
preprocessComments,
generate,
parser,
} from '@shaderfrog/glsl-parser/preprocessor';
// Remove comments before preprocessing
const commentsRemoved = preprocessComments(`float a = 1.0;`);
// Parse the source text into an AST
const ast = parser.parse(commentsRemoved);
// Then preprocess it, expanding #defines, evaluating #ifs, etc
preprocessAst(ast);
// Then convert it back into a program string, which can be passed to the
// core glsl parser
const preprocessed = generate(ast);
Manipulating and Searching ASTs
Visitors
The Shaderfrog parser provides an AST visitor function for manipulating and searching an AST. The visitor API loosely follows the Babel visitor API. A visitor object looks like:
const visitors = {
function_call: {
enter: (path) => {},
exit: (path) => {},
}
}
Where every key in the object is a node type, and every value is an object with optional enter and exit functions. What's passed to each function is not the AST node itself, instead it's a "path" object, which gives you information about the node's parents, methods to manipulate the node, and the node itself. The path object:
{
// Properties:
// The node itself
node: AstNode;
// The parent of this node
parent: AstNode | null;
// The parent path of this path
parentPath: Path | null;
// The key of this node in the parent object, if node parent is an object
key: string | null;
// The index of this node in the parent array, if node parent is an array
index: number | null;
// Methods:
// Don't visit any children of this node
skip: () => void;
// Stop traversal entirely
stop: () => void;
// Remove this node from the AST
remove: () => void;
// Replace this node with another AST node. See replaceWith() documentation.
replaceWith: (replacer: any) => void;
// Search for parents of this node's parent using a test function
findParent: (test: (p: Path) => boolean) => Path | null;
}
Visit an AST by calling the visit method with an AST and visitors:
import { visit } from '@shaderfrog/glsl-parser/ast';
visit(ast, visitors);
The visit function doesn't return a value. If you want to collect data from the AST, use a variable in the outer scope to collect data. For example:
let numberOfFunctionCalls = 0;
visit(ast, {
function_call: {
enter: (path) => {
numberOfFunctionCalls += 1;
},
}
});
console.log('There are ', numberOfFunctionCalls, 'function calls');
You can also visit the preprocessed AST with visitPreprocessedAst. Visitors follow the same convention outlined above.
import {
parser,
visitPreprocessedAst,
} from '@shaderfrog/glsl-parser/preprocessor';
// Parse the source text into an AST
const ast = parser.parse(`float a = 1.0;`);
visitPreprocessedAst(ast, visitors);
Stopping traversal
To skip all children of a node, call path.skip().
To stop traversal entirely, call path.stop() in either enter() or exit(). No further enter() or exit() callbacks will fire.
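The difference between skip() and stop() is easiest to see on a toy tree. The walker below is a self-contained sketch of the semantics described above, not the library's implementation. ToyNode, ToyPath, ToyVisitors, and toyVisit are all hypothetical names, and the real path object has more properties.

```typescript
// Toy tree and walker for illustration; the library's visit() is more capable.
type ToyNode = { type: string; children?: ToyNode[] };
type ToyPath = { node: ToyNode; skip: () => void; stop: () => void };
type ToyVisitors = {
  [type: string]: { enter?: (p: ToyPath) => void; exit?: (p: ToyPath) => void };
};

function toyVisit(root: ToyNode, visitors: ToyVisitors): void {
  let stopped = false;
  const walk = (node: ToyNode): void => {
    if (stopped) return;
    let skipped = false;
    const path: ToyPath = {
      node,
      skip: () => { skipped = true; },
      stop: () => { stopped = true; },
    };
    const visitor = visitors[node.type];
    visitor?.enter?.(path);
    if (stopped) return; // stop(): no further callbacks at all
    if (!skipped) {
      // skip(): children are not visited, but traversal continues elsewhere
      for (const child of node.children ?? []) walk(child);
    }
    if (stopped) return;
    visitor?.exit?.(path);
  };
  walk(root);
}

const tree: ToyNode = {
  type: 'root',
  children: [
    { type: 'block', children: [{ type: 'leaf' }] },
    { type: 'leaf' },
    { type: 'leaf' },
  ],
};

const entered: string[] = [];
toyVisit(tree, {
  block: { enter: (p) => { entered.push('block'); p.skip(); } },
  leaf: { enter: (p) => { entered.push('leaf'); p.stop(); } },
});
// The leaf inside the block is skipped, and traversal stops at the first
// top-level leaf, so the last leaf is never entered.
console.log(entered);
```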
Visitor .replaceWith() Behavior
When you visit a node and call path.replaceWith(otherNode) inside the visitor's enter() method:
- otherNode and its children are visited by the same visitors.
- The exit() function of the visitor is not called.
Notes:
- Calling .replaceWith() in a visitor's exit() method is undefined behavior.
- Replacing a node with the same type can cause infinite recursion, as the visitor will continue to visit the replaced node of the same type. You must handle this case manually.
These rules apply to all visitors, both GLSL AST visitors and preprocessor AST visitors.
Utility Functions
Rename variables / identifiers in a program
You can rename bindings (aka variables), functions, and types (aka structs) with renameBindings, renameFunctions, and renameTypes respectively.
The signature for these methods:
const renameBindings = (
// The scope index to rename the bindings in. ast.scopes[0] is the global scope,
// so passing ast.scopes[0].bindings renames all global variables
bindings: ScopeIndex,
// The rename function. This is called once per scope entry with the original
// name in the scope, to generate the renamed variable.
mangle: (name: string) => string
): ScopeIndex
These scope renaming functions, renameBindings, renameFunctions, and renameTypes, do two things:
- Each function mutates the AST to rename identifiers in place.
- They return an immutable new ScopeIndex where the scope references themselves are renamed.
If you want your ast.scopes array to stay in sync with your AST, you need to re-assign it to the output of the functions! Examples:
import { renameBindings, renameFunctions, renameTypes } from '@shaderfrog/glsl-parser/parser/utils';
// Suffix top level variables with _x, and update the scope
ast.scopes[0].bindings = renameBindings(ast.scopes[0].bindings, (name) => `${name}_x`);
// Suffix function names with _x
ast.scopes[0].functions = renameFunctions(ast.scopes[0].functions, (name) => `${name}_x`);
// Suffix struct names and usages (including constructors) with _x
ast.scopes[0].types = renameTypes(ast.scopes[0].types, (name) => `${name}_x`);
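To see why the re-assignment matters, here is a toy model of the two effects described above: the nodes are mutated in place, and a new index keyed by the new names is returned while the old index keeps its old keys. This is a sketch and not the library's code. ToyIdent, ToyScopeIndex, and toyRenameBindings are hypothetical, and the identifier field name is an assumption about node shape.

```typescript
// Toy shapes: real AST nodes differ, and `identifier` is an assumed field name.
type ToyIdent = { identifier: string };
type ToyScopeIndex = { [name: string]: { references: ToyIdent[] } };

// Sketch of the two effects: mutate nodes in place, return a re-keyed index.
function toyRenameBindings(
  bindings: ToyScopeIndex,
  mangle: (original: string) => string
): ToyScopeIndex {
  const renamed: ToyScopeIndex = {};
  for (const oldName of Object.keys(bindings)) {
    const newName = mangle(oldName);
    bindings[oldName].references.forEach((n) => { n.identifier = newName; });
    renamed[newName] = bindings[oldName];
  }
  return renamed;
}

const identNode: ToyIdent = { identifier: 'a' };
const oldIndex: ToyScopeIndex = { a: { references: [identNode] } };
const newIndex = toyRenameBindings(oldIndex, (original) => `${original}_x`);

console.log(identNode.identifier);  // a_x (the node itself was mutated)
console.log(Object.keys(oldIndex)); // still keyed by "a"
console.log(Object.keys(newIndex)); // keyed by "a_x"
```

Without assigning the returned index back to the scope, later lookups by the new name would miss, which mirrors the re-assignment shown in the examples above.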
There are also functions to rename only one variable/identifier in a given scope. Use these if you know specifically which variable you want to rename.
import { renameBinding, renameFunction, renameType } from '@shaderfrog/glsl-parser/parser/utils';
// Replace all instances of "oldVar" with "newVar" in the global scope, and
// creates a new global scope entry named "newVar"
ast.scopes[0].bindings.newVar = renameBinding(
ast.scopes[0].bindings.oldVar,
'newVar',
);
// You need to manually delete the old scope entry if you want the scope to stay
// in sync with your program AST
delete ast.scopes[0].bindings.oldVar;
// Rename a specific function
ast.scopes[0].functions.newFn = renameFunction(
ast.scopes[0].functions.oldFn,
'newFn',
);
delete ast.scopes[0].functions.oldFn;
// Rename a specific type/struct
ast.scopes[0].types.newType = renameType(
ast.scopes[0].types.oldType,
'newType',
);
delete ast.scopes[0].types.oldType;
Debugging utility functions
The parser also exports a debugging function, useful for logging information about the AST.
import { debugScopes } from '@shaderfrog/glsl-parser/parser/utils';
// Print a condensed representation of the AST scopes to the console
debugScopes(ast);
What are "parsing" and "preprocessing"?
In general, a parser is a computer program that analyzes source code and turns it into a data structure called an "abstract syntax tree" (AST). The AST is a tree representation of the source program, which can be analyzed or manipulated. A use of this GLSL parser could be to parse a program into an AST, find all variable names in the AST, rename them, and generate new GLSL source code with renamed variables.
Preprocessing is a text manipulation step supported in shader source code. One way to think about preprocessing is it's a glorified find and replace language that's also part of your program's source code. Special lines starting with # tell the preprocessor how to manipulate other text in the program source code. Some programs are not parsable until they are preprocessed, because the source code may be invalid until the text find and replacements are done. During preprocessing, the source code is treated purely as text. Said another way, there is no consideration for the GLSL source code structure, like float, vec2, etc. Preprocessing only handles special lines and rules starting with #.
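The "glorified find and replace" idea can be shown with a toy expander for simple "#define NAME body" lines. This is a deliberately naive sketch and not how the Shaderfrog preprocessor works. expandDefines is a hypothetical function that ignores function-like macros, #if, #undef, comments, and every other directive.

```typescript
// Toy text-level expansion of object-like "#define NAME body" lines.
function expandDefines(source: string): string {
  const macros = new Map<string, string>();
  const output: string[] = [];
  for (const line of source.split('\n')) {
    const match = /^\s*#define\s+(\w+)\s+(.*)$/.exec(line);
    if (match) {
      macros.set(match[1], match[2].trim());
      continue; // in this toy, the directive line itself is dropped
    }
    let expanded = line;
    macros.forEach((body, macroName) => {
      // Whole-word replacement, so the "a" inside "float" is left alone
      expanded = expanded.replace(new RegExp(`\\b${macroName}\\b`, 'g'), () => body);
    });
    output.push(expanded);
  }
  return output.join('\n');
}

const expandedSource = expandDefines('#define a 1\nfloat b = a;');
console.log(expandedSource); // float b = 1;
```

The expander never looks at GLSL structure: it only sees lines of text, which is why a program can be unparsable before this step and valid after it.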
See the GLSL Language Spec to learn more about GLSL preprocessing. Some notable differences from the C++ preprocessor are no "stringize" operator (#), no #include directive, and #if expressions can only operate on integer constants, not other types of data. The Shaderfrog GLSL preprocessor can't be used as a C/C++ preprocessor without modification.
Parsing, preprocessing, and code generation are all phases of a compiler. This library is technically a source code to source code compiler, also known as a "transpiler." The input and output source code are both GLSL.
State of this library
The Shaderfrog compiler has tests for the more complex parts of the GLSL ES 3.00 grammar. This library is definitively the most complete GLSL compiler written in Javascript.
This library is used by the experimental Shaderfrog 2.0 shader composer. The compiler has wide exposure to different GLSL programs.
This library also exposed:
This library doesn't support full "semantic analysis" required by the Khronos GLSL specification. For example, some tokens are only valid in GLSL 1.00 vs 3.00, like texture() vs texture2D(). This parser considers both valid as they're both part of the grammar. However, if you send compiled source code off to a native compiler like ANGLE with the wrong texture function, it will fail to compile.
This library is mainly for manipulating ASTs before handing off a generated program to a downstream compiler such as ANGLE.
The preprocessor supports full macro evaluations and expansions, with the exception of __LINE__. Additional control lines like #error, #pragma, and #extension have no effect, and can be fully preserved as part of parsing.
Limitations of the Parser and Preprocessor
Known Issues
- There are probably some bugs in the preprocessor logic. I haven't yet verified all of the evaluations of "binary" expressions in preprocessor.ts
- preprocessor.ts has lots of yucky typecasting
Known missing semantic analysis compared to the specification
- Compilers are supposed to raise an error if a switch body ends in a case or default label.
- Currently no semantic analysis of vertex vs fragment shaders
Deviations from the Khronos Grammar
- The parser allows for programs to contain top level preprocessor statements, like #define X, as a convenience to avoid preprocessing some programs, and for simple programs lets you preserve preprocessor lines in the AST if you want to do something specific with them. However, preprocessor statements at any other part of the program are not allowed. The Khronos grammar does not include preprocessor statements.
- selection_statement is renamed to if_statement
- The grammar specifies declaration itself ends with a semicolon. I moved the semicolon into the declaration_statement rule.
- The grammar has a left paren "(" in the function_call production. Due to how I de-left-recursed the function_call -> postfix_expression loop, I moved the left paren into the function_identifier production.
- Function calls in the grammar are TYPE_NAME LEFT_PAREN, in my grammar they're IDENTIFIER LEFT_PAREN, because TYPE_NAME is only used for structs, and function names are stored in their own separate place in the scope.
Local Development
To work on the tests, run npx jest --watch.
The GLSL grammar definition lives in src/parser/glsl-grammar.pegjs. Peggy supports inlining JavaScript code in the .pegjs file to define utility functions, but that means you have to write in vanilla JavaScript, which is terrible. Instead, I've pulled out utility functions into the grammar.ts entrypoint. Some functions need access to Peggy's local variables, like location(s), so the makeLocals() function uses a closure to provide that access.
To submit a change, please open a pull request. Tests are appreciated!
See the GitHub workflow for the checks run against each PR.