dracoql
v0.4.1
DracoQL is a TypeScript-based DSL for web scraping and data manipulation, with tooling to pull data from files, the web, or databases and pipe it to different destinations.
DracoQL 🐉
DracoQL is an embeddable query language for processing and transforming data from web resources and writing it to files and databases.
The language is in active development; please report any bugs under issues.
Install
npm install dracoql
Usage
import * as draco from "dracoql";
draco.eval(`PIPE "Hello world!" TO STDOUT`);
Additionally, you can read runtime variables from the caller through the context passed to the callback:
import * as draco from "dracoql";
draco.eval(`VAR data = FETCH "https://jsonplaceholder.typicode.com/todos/" AS JSON`, (ctx) => {
  console.log(ctx.getVar("data"));
});
Syntax
Variables
A variable can hold either an INT_LITERAL, a STRING_LITERAL or an expression. Draco does not support string escaping; to put double quotes inside a string, delimit the string with single quotes ('') instead.
VAR foo = 1
VAR bar = "hello world!"
VAR baz = FETCH "https://example.org"
Networking
Draco provides FETCH as the primary method for interacting with a URL.
Fetch Response
VAR data = FETCH "https://example.org"
HEADER "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0"
HEADER "Content-type: application/json"
METHOD "GET"
Here the data variable holds a response object, which looks like so:
{
  headers: any,
  status: number,
  redirected: boolean,
  url: string
}
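For callers that read this value from TypeScript, the shape above can be written down as an interface. This is only a sketch: the names DracoFetchResponse and isSuccess are hypothetical and are not exported by dracoql, and the sample object is made up for illustration.

```typescript
// Hypothetical type name; the fields mirror the response shape listed above.
interface DracoFetchResponse {
  headers: any;
  status: number;
  redirected: boolean;
  url: string;
}

// Hypothetical helper: treats any 2xx status code as success.
function isSuccess(res: DracoFetchResponse): boolean {
  return res.status >= 200 && res.status < 300;
}

// Made-up sample value, not real output from Draco.
const sampleResponse: DracoFetchResponse = {
  headers: { "content-type": "text/html" },
  status: 200,
  redirected: false,
  url: "https://example.org",
};

console.log(isSuccess(sampleResponse)); // true
```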
Additionally, you can make POST requests:
VAR data = FETCH "https://reqres.in/api/users" METHOD "POST"
HEADER "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0"
HEADER "Content-type: application/json"
BODY JSON '{"name": "morpheus", "job": "leader"}'
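Each HEADER clause above takes a single "Name: value" string. How Draco splits that string internally is not documented here; the sketch below shows one plausible approach, splitting on the first colon only so that values such as "rv:109.0" stay intact. The function names parseHeaderLine and toHeaders are hypothetical.

```typescript
// Hypothetical helper: split "Name: value" on the FIRST colon only.
function parseHeaderLine(line: string): [string, string] {
  const idx = line.indexOf(":");
  if (idx === -1) throw new Error(`Malformed header: ${line}`);
  return [line.slice(0, idx).trim(), line.slice(idx + 1).trim()];
}

// Hypothetical helper: collect several header lines into one record.
function toHeaders(lines: string[]): Record<string, string> {
  const result: Record<string, string> = {};
  for (const line of lines) {
    const [headerName, headerValue] = parseHeaderLine(line);
    result[headerName] = headerValue;
  }
  return result;
}

const parsedHeaders = toHeaders([
  "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0",
  "Content-type: application/json",
]);

console.log(parsedHeaders["Content-type"]); // application/json
```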
Fetch JSON
VAR data = FETCH "https://reqres.in/api/users"
HEADER "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0"
AS JSON
Here, data holds the parsed JSON object.
Fetch HTML
VAR data = FETCH "https://reqres.in"
HEADER "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0"
AS HTML
Here, data holds the parsed HTML object, which looks like so:
{
tag: string,
attributes: any,
children: [...]
}
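Because children nests further nodes, the parsed HTML object is naturally processed recursively. The sketch below assumes every child has the same tag/attributes/children shape (text nodes may be represented differently in practice); HtmlNode and collectTags are hypothetical names, and the sample tree is made up.

```typescript
// Hypothetical type name; assumes all children share the same node shape.
interface HtmlNode {
  tag: string;
  attributes: any;
  children: HtmlNode[];
}

// Depth-first, pre-order walk that returns every tag name in the tree.
function collectTags(node: HtmlNode): string[] {
  const tags: string[] = [node.tag];
  for (const child of node.children) {
    tags.push(...collectTags(child));
  }
  return tags;
}

// Made-up sample tree, not real output from Draco.
const sampleTree: HtmlNode = {
  tag: "div",
  attributes: { class: "wrapper" },
  children: [
    { tag: "h2", attributes: {}, children: [] },
    {
      tag: "p",
      attributes: {},
      children: [{ tag: "a", attributes: { href: "#" }, children: [] }],
    },
  ],
};

console.log(collectTags(sampleTree).join(", ")); // div, h2, p, a
```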
Caching HTML
Additionally, Draco has a CACHE keyword, which requires a time in milliseconds and accepts an optional path for the html-cache directory.
Here is an example. NOTE: Caching only works with the HTML data type.
VAR data = FETCH "https://example.org"
CACHE 10000
AS HTML
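The time given to CACHE is a lifetime in milliseconds. As a rough illustration of that idea only, here is a minimal in-memory time-to-live cache. It is not Draco's implementation (Draco uses an html-cache directory on disk), and TtlCache and its injectable clock are hypothetical.

```typescript
// Minimal in-memory TTL cache sketch; NOT how Draco stores its HTML cache.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so the expiry logic can be demonstrated without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Entry has outlived its TTL: drop it and report a miss.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

// Fake clock for the demonstration.
let fakeNow = 0;
const demoCache = new TtlCache<string>(10000, () => fakeNow);
demoCache.set("https://example.org", "<html>...</html>");

fakeNow = 9999;
console.log(demoCache.get("https://example.org") !== undefined); // true (still fresh)

fakeNow = 10000;
console.log(demoCache.get("https://example.org") !== undefined); // false (expired)
```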
Headless HTML mode
To scrape HTML from SPAs, Draco offers an optional HEADLESS flag, which, when enabled, uses puppeteer to load and fetch the HTML page.
VAR data = FETCH "https://bloomberg.com"
CACHE 6e5
AS HTML HEADLESS
Piping
To get data out of the evaluator, you can use the PIPE keyword:
PIPE "hello world" TO STDOUT
You can also output data to a file:
PIPE "Draco was here" TO FILE "draco.txt"
Extraction
Draco has built-in support for HTML selectors and JSON queries:
VAR res = FETCH "https://reqres.in/api/users" AS JSON
VAR data = EXTRACT "data.0.id" FROM res
PIPE data TO STDOUT
VAR res = FETCH "https://reqres.in" AS HTML
VAR headline = EXTRACT "h2.tagline:nth-child(1)" FROM res
PIPE headline TO STDOUT
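The JSON query "data.0.id" above reads as a dot-separated path: the key data, then index 0, then the key id. Draco's actual resolver is not shown in this README, so the following is only a sketch of how such a path could be resolved; getByPath and the sample object are hypothetical.

```typescript
// Hypothetical helper: walk an object or array along a dot-separated path.
function getByPath(source: unknown, path: string): unknown {
  let current: unknown = source;
  for (const key of path.split(".")) {
    // Stop early if we can no longer descend.
    if (current === null || typeof current !== "object") return undefined;
    // Array indices work too, since "0" is a valid property key on arrays.
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Made-up sample data, not a real API response.
const apiResult = { data: [{ id: 7, email: "a@example.com" }] };

console.log(getByPath(apiResult, "data.0.id")); // 7
console.log(getByPath(apiResult, "data.5.id")); // undefined
```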
Examples
Fetch data and log it to the console
VAR data = FETCH "https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/male.txt"
PIPE data TO STDOUT
Fetch data and put it to file
VAR data = FETCH "https://jsonplaceholder.typicode.com/users/1"
AS JSON
OR DIE
PIPE data TO FILE "user.json"
Scrape data from a website
VAR data = FETCH "https://www.cnet.com/"
VAR headline = EXTRACT
".c-pageHomeHightlights>div:nth-child(1)>div:nth-child(2)>div:nth-child(1)>a:nth-child(1)>div:nth-child(1)>div:nth-child(2)>div:nth-child(1)>h3:nth-child(1)>span:nth-child(1)"
FROM data
AS HTML
VAR txt = EXTRACT innerText FROM headline
AS JSON
PIPE txt TO STDOUT
API
The draco module exports the lexer, the parser and the interpreter.
import * as draco from "dracoql";
const lexer = new draco.lexer(`PIPE "hello world" TO STDOUT`);
const parser = new draco.parser(lexer.lex());
const interpreter = new draco.interpreter(parser.parse());
(async () => {
await interpreter.run();
})()