cs-parser

v1.4.0

Context-sensitive parser framework

CS Parser


Write Your Own Parser

CS Parser gives you the power to easily write a parser for your code or data in any language or format. It can also help you develop your own languages or data formats.

The mechanics of CS Parser are simple and straightforward. If you have a basic knowledge of JavaScript, you can write a clean and readable parser for your specific needs in around 120 lines or fewer, using the easy-to-use APIs that CS Parser provides.

Getting Started

First, you have to install cs-parser via npm.

npm i cs-parser

Next, import (ESM) or require (CJS) it in your JS.

// ES Module
import csp from 'cs-parser'

// CommonJS
const csp = require('cs-parser')

Then, call the create method to get a Parser object, which is the main API provider.

let parser = csp.create()

Examples

Before we proceed to explain the basics, here are some quick, working examples if you want to take a look first:

Basics

The workflow is as follows:

1. Define rules with the addRule method.

parser.addRule({ /* 1st rule definition */ })
parser.addRule({ /* 2nd rule definition */ })
// You can add as many rules as you want.

2. Parse data with the parse or parseFile method.

// For data as a string
let results = parser.parse(data)

// Or, for data in a file
let results = await parser.parseFile('file/to/parse')

3. Use the results by scanning them with the traverse method.

results.traverse(each => {
  console.log(each.data)
})

Defining a rule

A rule definition that you pass to addRule() must be an object with certain properties and methods.

parser.addRule({
  from: '{',
  to:   '}'
})

The from and to properties determine where in the data the rule applies. So the above rule means:

  • Activate this rule when the parser reaches {
  • Deactivate this rule when the parser reaches }

You can also use a regex, like this:

parser.addRule({
  from: /(\w).* {/,
  to:   '}'
})

This rule will be activated when the current reading buffer matches a pattern like: something {
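To check what such a pattern actually captures, the following sketch uses only JavaScript's built-in RegExp and no CS Parser APIs. The sample line 'Alice {' is made up for illustration. Note that (\w) captures a single word character, while (\w+) captures the whole word, which matters if you plan to read the captured group in a callback.

```javascript
// Plain JavaScript RegExp behavior; no CS Parser APIs are used here.
const line = 'Alice {' // hypothetical sample chunk

const single = /(\w).* {/.exec(line) // group 1 is one word character
const whole  = /(\w+) {/.exec(line)  // group 1 is the whole word

console.log(single[0]) // 'Alice {'
console.log(single[1]) // 'A'
console.log(whole[1])  // 'Alice'
```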

Callback methods

A rule must have at least one of the init, parse, and fin callback methods.

parser.addRule({
  from: /(\w).* {/,
  to:   '}',
  init(cx, chunk, matches) { ... },
  parse(cx, chunk) { ... },
  fin(cx, chunk, matches) { ... }
})

  • init will be called once when the rule is activated.
  • parse will be called for every chunk when the rule is active.
  • fin will be called once when the rule is deactivated.

The 1st parameter, cx, is the context object (explained later) currently associated with this rule. The 2nd parameter, chunk, is the chunk (explained later) of data that the parser has just processed. The optional 3rd parameter of init / fin, matches, holds the regex match results for from / to when they are regexes.
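The call order described above can be pictured with a toy driver. This is a minimal sketch, not CS Parser's implementation: runRule is a hypothetical helper, it handles a single rule with a regex from and a string to, and it assumes the activating and deactivating chunks are not passed to parse (the real library may behave differently).

```javascript
// Toy model of the rule lifecycle; NOT CS Parser's actual implementation.
// runRule and its matching logic are hypothetical, for illustration only.
function runRule(rule, data) {
  let cx = null // context of the currently active rule, if any
  const finished = []
  for (const chunk of data.split('\n')) {
    if (!cx) {
      const matches = rule.from.exec(chunk)
      if (matches) {
        cx = { data: {} }
        if (rule.init) rule.init(cx, chunk, matches) // once, on activation
      }
    } else if (chunk.includes(rule.to)) {
      if (rule.fin) rule.fin(cx, chunk) // once, on deactivation
      finished.push(cx)
      cx = null
    } else if (rule.parse) {
      rule.parse(cx, chunk) // for every chunk while the rule is active
    }
  }
  return finished
}

const calls = []
runRule({
  from: /(\w+) {/,
  to: '}',
  init(cx, chunk, matches) { calls.push('init:' + matches[1]) },
  parse(cx, chunk) { calls.push('parse:' + chunk.trim()) },
  fin(cx, chunk) { calls.push('fin') }
}, 'Alice {\n  age: 30\n  role: dev\n}')

console.log(calls)
// [ 'init:Alice', 'parse:age: 30', 'parse:role: dev', 'fin' ]
```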

Chunk

By default, the parser processes the data line by line, and each line is passed as the 2nd parameter of the callback methods as a "chunk". You can change this behavior by setting the splitter property to any string other than \n (linebreak), which is the default value.
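As a rough picture of what a different splitter means for the chunks your callbacks receive, the sketch below approximates chunking with String.prototype.split. This is an assumption made for illustration: CS Parser's internal splitting may differ, and the input string is made up.

```javascript
// Approximates chunking with String.prototype.split; CS Parser's internals may differ.
const data = 'a=1;b=2;c=3' // hypothetical input with no linebreaks

const byLine      = data.split('\n') // default splitter: the whole input is one chunk
const bySemicolon = data.split(';')  // custom splitter ';'

console.log(byLine)      // [ 'a=1;b=2;c=3' ]
console.log(bySemicolon) // [ 'a=1', 'b=2', 'c=3' ]
```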

Context object

When a rule is activated, the parser generates a context object for it and adds it to the context stack. The rule can manipulate its associated context object in its callback methods however you want. The relationship between a rule and a context object is similar to the one between a class and its instance.

For convenience, a context object has a data property, which is just a plain object, so you can store any kind of data in it, like this:

init(cx, chunk, matches) {
  cx.data.name  = matches[1]
  cx.data.lines = []

  // Or you can just reassign a new value
  cx.data = {
    name:  matches[1],
    lines: []
  }
}

Through their 1st parameter, the init, parse, and fin methods share the same context object instance, like this:

parse(cx, chunk) {
  cx.data.lines.push(chunk)
},
fin(cx) {
  console.log('Block: ' + cx.data.name)
  console.log('Total Lines: ' + cx.data.lines.length)
}
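Put together, the three callbacks form one rule object that reads and writes the same cx.data. The sketch below calls the callbacks by hand on a plain stand-in object to make the shared state visible. The stand-in context and the summary property are assumptions for illustration; in real use the parser creates the context and invokes the callbacks for you.

```javascript
// The rule object follows the shape shown in this README; driving it by hand
// is only for illustration.
const rule = {
  from: /(\w+) {/,
  to: '}',
  init(cx, chunk, matches) {
    cx.data.name = matches[1]
    cx.data.lines = []
  },
  parse(cx, chunk) {
    cx.data.lines.push(chunk)
  },
  fin(cx) {
    // Store a summary instead of logging, so it can be inspected afterwards
    cx.data.summary = cx.data.name + ': ' + cx.data.lines.length + ' lines'
  }
}

// Stand-in for the context object that the parser would create (assumption)
const cx = { data: {} }
rule.init(cx, 'Bob {', rule.from.exec('Bob {'))
rule.parse(cx, 'age: 41')
rule.parse(cx, 'role: ops')
rule.fin(cx)

console.log(cx.data.summary) // 'Bob: 2 lines'
```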

Let's parse!

Now that the rules are defined, this section explains how to actually parse data and use the results.

If you have data as a string or a Buffer object, pass it to the parse() method.

let data = '...' // Data to parse
let results = parser.parse(data)

As a result, it returns the "root" context (explained later) which contains all the contexts that were generated throughout the entire parsing process.

There is another option: parseFile(), which parses the content of a file asynchronously.

let results = await parser.parseFile('path/to/file')

Because it processes the file as a stream, it is recommended over parse() when the data is large.

Root context

Root context is a top-level context object that contains all the context objects generated throughout the entire parsing process.

To access each context individually, pass a callback to the traverse method of the root context.

let results = parser.parse(data)
results.traverse(each => {
  console.log('Block: ' + each.data.name)
})

Each context is passed as the 1st parameter of the callback you provide.
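If it helps to picture the traversal, here is a toy model of a root context holding nested contexts. It is a sketch under stated assumptions, not CS Parser's implementation: the children property, the standalone traverse function, and the depth-first visiting order (skipping the root itself) are all hypothetical.

```javascript
// Toy context tree; the `children` property and the visiting order are assumptions.
function traverse(cx, fn) {
  for (const child of cx.children) {
    fn(child)
    traverse(child, fn) // descend into nested contexts
  }
}

const root = {
  data: {},
  children: [
    { data: { name: 'Alice' }, children: [] },
    { data: { name: 'Bob' }, children: [
      { data: { name: 'Bob.address' }, children: [] }
    ] }
  ]
}

const names = []
traverse(root, each => names.push(each.data.name))
console.log(names) // [ 'Alice', 'Bob', 'Bob.address' ]
```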

Basic example

Now is a good time to take a closer look at the 1st example: employees.js. We also recommend downloading the file (or cloning this repo), running it with node, and experimenting with it yourself.

node employees.js

There are more advanced features that we cannot cover in this README. If you are interested, see this example: docblocks.js. Also, please check the full documentation.

How do I debug my parser?

Use the outline() method of Context, which outputs an outline of the structure of a context and all of its sub-contexts.

This helps you check whether your parser has analyzed the structure of the data correctly. Let's see the outline of the result of employees.js.

let result = parser.parse(data)
console.debug(result.outline())

The output:

root
  anonymous
  anonymous
  anonymous
  anonymous

It shows anonymous because the rule associated with these contexts doesn't have a name property.

Let's add name: 'employee' to the rule and see how the output of outline() changes.

parser.addRule({
	name: 'employee', // <-Added
	from: /(\w+) {/, // Starts with '(word) {'
	to:   '}',       //   Ends with '}'
...

The output:

root
  employee
  employee
  employee
  employee

It's somewhat better. But you can improve this output even more.

With the express callback, you can fully customize how a context is expressed by outline().

parser.addRule({
	from: /(\w+) {/, // Starts with '(word) {'
	to:   '}',       //   Ends with '}'
	...
	express(cx) { // <-Added
		return 'employee: ' + cx.data.name
	}
})

The output:

root
  employee: Alice
  employee: Bob
  employee: Charlie
  employee: Dolly

Now you get a much better outline!
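The three outputs above suggest a simple labeling order: use express if the rule has it, otherwise name, otherwise anonymous. The toy function below sketches that order. It is not CS Parser's implementation: the standalone outline function, the rule and children properties, the precedence of express over name, and the sample data are assumptions for illustration.

```javascript
// Toy outline(); NOT CS Parser's implementation. Property names are assumptions.
function outline(cx, depth = 0) {
  const rule = cx.rule || {}
  // Assumed precedence: express callback, then name, then 'anonymous'
  const label = rule.express ? rule.express(cx) : (rule.name || 'anonymous')
  const lines = ['  '.repeat(depth) + label]
  for (const child of cx.children || []) lines.push(outline(child, depth + 1))
  return lines.join('\n')
}

const employee = { express: cx => 'employee: ' + cx.data.name }
const root = {
  rule: { name: 'root' },
  children: [
    { rule: employee, data: { name: 'Alice' } },
    { rule: { name: 'note' }, data: {} },
    { rule: {}, data: {} }
  ]
}

console.log(outline(root))
// root
//   employee: Alice
//   note
//   anonymous
```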


© 2018 Satoshi Soma (amekusa.com)
CS Parser is licensed under the Apache License, Version 2.0