dracoql

v0.4.1

Published 2 years ago

DracoQL is a TypeScript-based DSL for web scraping and data manipulation. It includes tooling to pull data from files, the web, or databases and pipe it to different destinations.

aadv1k

DracoQL 🐉

DracoQL is an embeddable query language for processing and transforming data from web resources and writing it to files and databases.

The language is in active development. Please report any bugs under Issues.

Install

npm install dracoql

Usage

import * as draco from "dracoql";

draco.eval(`PIPE "Hello world!" TO STDOUT`);

You can also read a script's runtime variables from the caller. Pass a callback, which receives the evaluation context:

import * as draco from "dracoql";

draco.eval(`VAR data = FETCH "https://jsonplaceholder.typicode.com/todos/" AS JSON`, (ctx) => {
  // ctx gives access to the variables defined by the script
  console.log(ctx.getVar("data"));
});

Syntax

Variables

A variable can hold an INT_LITERAL, a STRING_LITERAL, or an expression. Draco does not support string escaping. If a string needs to contain double quotes, wrap it in single quotes ('...') instead.

VAR foo = 1
VAR bar = "hello world!"
VAR baz = FETCH "https://example.org"
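There is no escaping, so a string literal presumably runs from its opening quote to the next matching quote character. The TypeScript below is a hypothetical illustration of that rule. The function name `readStringLiteral` is made up, and this is not Draco's actual lexer. It shows why a single-quoted literal can contain double quotes, as in the BODY JSON example further down.

```typescript
// Hypothetical sketch (not dracoql's lexer): read a quoted literal that
// starts at `start`. There is no escape handling: the literal ends at the
// next quote character that matches the opening one.
function readStringLiteral(src: string, start: number): { value: string; end: number } {
  const quote = src[start];
  if (quote !== '"' && quote !== "'") {
    throw new Error(`expected a quote at position ${start}`);
  }
  const close = src.indexOf(quote, start + 1);
  if (close === -1) {
    throw new Error("unterminated string literal");
  }
  // `end` is the index just past the closing quote.
  return { value: src.slice(start + 1, close), end: close + 1 };
}

// Double quotes inside a single-quoted literal need no escaping.
const body = readStringLiteral(`'{"name": "morpheus"}'`, 0);
console.log(body.value); // {"name": "morpheus"}
```

The flip side of this design is that a literal can never contain its own delimiter. That is why the choice of quote character matters.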

Networking

Draco provides FETCH as the primary way to interact with a URL.

Fetch Response

VAR data = FETCH "https://example.org"
      HEADER "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0"
      HEADER "Content-type: application/json"
      METHOD "GET"

Here, data holds a response object with the following shape:

{
  headers: any,
  status: number,
  redirected: boolean,
  url: string
}
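When you read such a variable from the caller with `ctx.getVar`, you may want to check its shape before using it. Below is a minimal sketch. It assumes the value has exactly the fields listed above, and the names `FetchResponse` and `isFetchResponse` are hypothetical, not exports of dracoql.

```typescript
// Hypothetical description of the documented response object.
interface FetchResponse {
  headers: unknown; // documented as `any`
  status: number;
  redirected: boolean;
  url: string;
}

// Runtime type guard that checks the fields listed above.
function isFetchResponse(value: unknown): value is FetchResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    "headers" in v &&
    typeof v.status === "number" &&
    typeof v.redirected === "boolean" &&
    typeof v.url === "string"
  );
}

// Example with a hand-written value of that shape.
const sample: unknown = { headers: {}, status: 200, redirected: false, url: "https://example.org/" };
if (isFetchResponse(sample)) {
  console.log(sample.status); // 200
}
```

Inside the eval callback you would pass `ctx.getVar("data")` through `isFetchResponse` before reading `status` or `url`.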

You can also make POST requests and attach a request body with BODY:

VAR data = FETCH "https://reqres.in/api/users" METHOD "POST"
  HEADER "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0"
  HEADER "Content-type: application/json"
  BODY JSON '{"name": "morpheus", "job": "leader"}'
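Each HEADER clause is a single "Name: value" string. Header values can themselves contain colons (the User-Agent above includes rv:109.0), so a safe split happens at the first colon only. The helper below is a hypothetical sketch of how HEADER, METHOD and BODY JSON clauses could map onto a fetch-style request options object. `RequestOptions`, `parseHeader` and `buildRequest` are illustrative names, not dracoql's implementation.

```typescript
// Hypothetical options object in the style of the Fetch API's RequestInit.
interface RequestOptions {
  method: string;
  headers: Record<string, string>;
  body?: string;
}

// Split "Name: value" at the FIRST colon, since values may contain colons.
function parseHeader(line: string): [string, string] {
  const idx = line.indexOf(":");
  if (idx === -1) throw new Error(`malformed header: ${line}`);
  return [line.slice(0, idx).trim(), line.slice(idx + 1).trim()];
}

function buildRequest(method: string, headerLines: string[], jsonBody?: string): RequestOptions {
  const headers: Record<string, string> = {};
  for (const line of headerLines) {
    const [name, value] = parseHeader(line);
    headers[name] = value;
  }
  const options: RequestOptions = { method, headers };
  if (jsonBody !== undefined) {
    // Fail early if the BODY JSON literal is not valid JSON.
    JSON.parse(jsonBody);
    options.body = jsonBody;
  }
  return options;
}

const req = buildRequest(
  "POST",
  [
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0",
    "Content-type: application/json",
  ],
  '{"name": "morpheus", "job": "leader"}',
);
console.log(req.headers["Content-type"]); // application/json
```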

Fetch JSON

VAR data = FETCH "https://reqres.in/api/users" 
  HEADER "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0"
  AS JSON

Here, data holds the parsed JSON object.

Fetch HTML

VAR data = FETCH "https://reqres.in" 
  HEADER "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0"
  AS HTML

Here, data holds the parsed HTML object, which has the following shape:

{
  tag: string,
  attributes: any,
  children: [...]
}
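Each node's children are nodes of the same shape, so the parsed page forms a tree. The sketch below walks such a tree depth-first to collect every node with a given tag. It is hypothetical: the `HtmlNode` interface narrows `attributes` to a string map and assumes children are only nodes. That may not match how dracoql represents text content.

```typescript
// Hypothetical typing of the node shape above. `attributes` is narrowed
// to a string map here, while the documented type is `any`.
interface HtmlNode {
  tag: string;
  attributes: Record<string, string>;
  children: HtmlNode[];
}

// Depth-first, pre-order search for all nodes with the given tag name.
function findByTag(root: HtmlNode, tag: string): HtmlNode[] {
  const found: HtmlNode[] = [];
  const stack: HtmlNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.tag === tag) found.push(node);
    // Push children in reverse so they are visited in document order.
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push(node.children[i]);
    }
  }
  return found;
}

const page: HtmlNode = {
  tag: "body",
  attributes: {},
  children: [
    { tag: "h2", attributes: { class: "tagline" }, children: [] },
    {
      tag: "div",
      attributes: {},
      children: [{ tag: "h2", attributes: { class: "other" }, children: [] }],
    },
  ],
};
console.log(findByTag(page, "h2").map((n) => n.attributes.class)); // [ 'tagline', 'other' ]
```

An explicit stack avoids recursion depth limits on deeply nested pages.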

Caching HTML

Draco also has a CACHE keyword. It takes a time in milliseconds and, optionally, a path for the HTML cache directory.

Caching only works with the HTML data type. For example:

VAR data = FETCH "https://example.org"
  CACHE 10000
  AS HTML
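CACHE 10000 presumably means a cached copy is reused for up to 10,000 ms (10 seconds) before the page is fetched again. The sketch below is a hypothetical in-memory illustration of that time-to-live rule, keyed by URL. The real keyword works with an HTML cache directory on disk. The class name `TtlCache` and its injectable clock are assumptions made so expiry can be tested without waiting.

```typescript
// Hypothetical in-memory TTL cache keyed by URL (illustration only).
class TtlCache {
  private readonly entries = new Map<string, { value: string; storedAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested deterministically.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(url: string): string | undefined {
    const entry = this.entries.get(url);
    if (entry === undefined) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      // Expired: drop it so the caller fetches a fresh copy.
      this.entries.delete(url);
      return undefined;
    }
    return entry.value;
  }

  set(url: string, value: string): void {
    this.entries.set(url, { value, storedAt: this.now() });
  }
}

let clock = 0;
const cache = new TtlCache(10000, () => clock);
cache.set("https://example.org", "<html>...</html>");
clock = 9999;
console.log(cache.get("https://example.org") !== undefined); // true
clock = 10000;
console.log(cache.get("https://example.org")); // undefined
```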

Headless HTML mode

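To scrape HTML from single-page applications (SPAs), Draco offers an optional HEADLESS flag. When it is enabled, Draco uses Puppeteer to load the page and fetch its HTML. The example below also caches the result for 6e5 ms (10 minutes).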

VAR data = FETCH "https://bloomberg.com"
  CACHE 6e5
  AS HTML HEADLESS

Piping

To get data out of the evaluator, use the PIPE keyword:

PIPE "hello world" TO STDOUT

You can also write data to a file:

PIPE "Draco was here" TO FILE "draco.txt"
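PIPE sends a value to a sink, either STDOUT or a named file. Below is a minimal Node.js sketch of those two sinks. The `pipe` function and `Sink` type are hypothetical. Serializing non-string values as JSON is an assumption about how objects might be written, not documented dracoql behavior.

```typescript
import { writeFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sink description: STDOUT or a file path.
type Sink = { kind: "stdout" } | { kind: "file"; path: string };

// Strings are written as-is; other values are serialized as JSON (assumption).
function pipe(value: unknown, sink: Sink): void {
  const text = typeof value === "string" ? value : JSON.stringify(value, null, 2);
  if (sink.kind === "stdout") {
    process.stdout.write(text + "\n");
  } else {
    // Overwrites the file if it already exists.
    writeFileSync(sink.path, text, "utf8");
  }
}

pipe("hello world", { kind: "stdout" });
const out = join(tmpdir(), "draco.txt");
pipe("Draco was here", { kind: "file", path: out });
console.log(readFileSync(out, "utf8")); // Draco was here
```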

Extraction

Draco has built-in support for extracting data with JSON queries (such as data.0.id) and HTML selectors (such as h2.tagline:nth-child(1)):

VAR res = FETCH "https://reqres.in/api/users" AS JSON
VAR data = EXTRACT "data.0.id" FROM res
PIPE data TO STDOUT

VAR res = FETCH "https://reqres.in" AS HTML
VAR headline = EXTRACT "h2.tagline:nth-child(1)" FROM res
PIPE headline TO STDOUT
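A JSON query such as "data.0.id" reads as a path: take the data property, then element 0 of that array, then its id. The sketch below is a hypothetical resolver for paths of that form. dracoql's actual query syntax may support more than plain dot-separated keys and indices, and `extractPath` is an illustrative name.

```typescript
// Hypothetical resolver for dot-separated paths such as "data.0.id".
// Numeric segments index arrays; other segments read object keys.
// Returns undefined if any step along the path is missing.
function extractPath(root: unknown, path: string): unknown {
  let current: unknown = root;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    if (Array.isArray(current)) {
      if (!/^\d+$/.test(segment)) return undefined;
      current = current[Number(segment)];
    } else {
      current = (current as Record<string, unknown>)[segment];
    }
  }
  return current;
}

// A made-up object with a `data` array, for illustration only.
const res = { page: 1, data: [{ id: 1, name: "a" }, { id: 2, name: "b" }] };
console.log(extractPath(res, "data.0.id")); // 1
```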

Examples

Fetch data and log it to the console

VAR data = FETCH "https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/male.txt"
PIPE data TO STDOUT

Fetch data and write it to a file

VAR data = FETCH "https://jsonplaceholder.typicode.com/users/1"
  AS JSON 
  OR DIE 

PIPE data TO FILE "user.json"

Scrape data from a website

VAR data = FETCH "https://www.cnet.com/"

VAR headline = EXTRACT 
  ".c-pageHomeHightlights>div:nth-child(1)>div:nth-child(2)>div:nth-child(1)>a:nth-child(1)>div:nth-child(1)>div:nth-child(2)>div:nth-child(1)>h3:nth-child(1)>span:nth-child(1)"
  FROM data 
  AS HTML

VAR txt = EXTRACT innerText FROM headline 
  AS JSON

PIPE txt TO STDOUT

API

The draco module exports the lexer, the parser, and the interpreter, so you can run each stage yourself:

import * as draco from "dracoql";

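// Tokenize the source, parse the tokens into an AST, then interpret the AST.
const lexer = new draco.lexer(`PIPE "hello world" TO STDOUT`);
const parser = new draco.parser(lexer.lex());
const interpreter = new draco.interpreter(parser.parse());

// run() is asynchronous, so await it inside an async function.
(async () => {
  await interpreter.run();
})();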
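The toy pipeline below shows how the three stages hand off to each other. It handles only the `PIPE "..." TO STDOUT` statement. It is a self-contained, hypothetical sketch: the token and node types, and the choice to collect output in an array instead of printing it, are assumptions. dracoql's real lexer, parser and interpreter classes are far more complete.

```typescript
// Toy pipeline for a single statement form: PIPE "<string>" TO STDOUT.
type Token = { type: "keyword" | "string"; value: string };
type PipeNode = { kind: "pipe"; value: string; target: "STDOUT" };

// Stage 1 (lexer): turn source text into keyword and string tokens.
function lex(src: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (/\s/.test(ch)) {
      i++;
    } else if (ch === '"') {
      const close = src.indexOf('"', i + 1);
      if (close === -1) throw new Error("unterminated string");
      tokens.push({ type: "string", value: src.slice(i + 1, close) });
      i = close + 1;
    } else {
      let j = i;
      while (j < src.length && /[A-Z]/.test(src[j])) j++;
      if (j === i) throw new Error(`unexpected character '${ch}' at ${i}`);
      tokens.push({ type: "keyword", value: src.slice(i, j) });
      i = j;
    }
  }
  return tokens;
}

// Stage 2 (parser): check the token sequence and build an AST node.
function parse(tokens: Token[]): PipeNode {
  const [kw, str, to, target] = tokens;
  if (
    tokens.length !== 4 ||
    kw.type !== "keyword" || kw.value !== "PIPE" ||
    str.type !== "string" ||
    to.type !== "keyword" || to.value !== "TO" ||
    target.type !== "keyword" || target.value !== "STDOUT"
  ) {
    throw new Error('expected: PIPE "<string>" TO STDOUT');
  }
  return { kind: "pipe", value: str.value, target: "STDOUT" };
}

// Stage 3 (interpreter): execute the AST; "STDOUT" lines are collected
// in `output` so they can be inspected.
function run(node: PipeNode, output: string[]): void {
  if (node.target === "STDOUT") output.push(node.value);
}

const output: string[] = [];
run(parse(lex(`PIPE "hello world" TO STDOUT`)), output);
console.log(output); // [ 'hello world' ]
```

Keeping the stages separate is what lets a caller inspect tokens or the AST before anything runs.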