sql-fingerprint
v1.0.2
Published
Convert queries into fingerprints.
Downloads
4
Readme
sql-fingerprint
Converts a SQL into a fingerprint, the abstracted form of a query, which makes it possible to classify similar queries.
Actually, this module is a JavaScript port of pt-fingerprint. Therefore is primarily intended to use for MySQL queries.
Description
The fingerprint is the abstracted form of a query. What "abstracting query" is string conversion such that, for instance, replace values to ?
, collapse whitespaces and so on.
Let's say that there are such queries:
SELECT * FROM t WHERE i = 1 ORDER BY a, b ASC, d DESC, e ASC;
select * from t where i = 1
order by a, b ASC, d DESC, e asc;
A fingerprint of both of these will be below:
select * from t where i = ? order by a, b, d desc, e;
Usage
CLI
npm install -g sql-fingerprint
fingerprint --query="your query"
Module
npm install sql-fingerprint
import fingerprint from 'sql-fingerprint';
console.log(fingerprint('SELECT * FROM users WHERE id = 1', false, false));
BigQuery UDF
The initial motivation I wrote this was that classifies similar queries stored in my dataset of BigQuery. BigQuery supports user-defined functions written in JavaScript. Therefore the approach of pt-fingerprint, which is generate fingerprint by a set of RegExps, is more fit than other approaches such as parsing SQL, building AST and then interpreting it. RegExps approach seems a little bit harder to maintain but easier to use as UDF because of the relatively simple source code.
CREATE TEMP FUNCTION fingerprint(sql STRING, matchMD5Checksum BOOL, matchEmbeddedNumbers BOOL)
RETURNS STRING
LANGUAGE js AS r"""
// copy and paste fingerprint function here from fingerprint.js
// ...
return query;
""";
SELECT
fingerprint(textPayload, true, true) fp,
count(*) as num,
max(query) as raw_query_sample
FROM
`your_table`
WHERE
DATE(timestamp, "Asia/Tokyo") = "2022-05-22"
group by
fp
order by
num desc
See also: Standard SQL user-defined functions | BigQuery | Google Cloud
Options
matchMD5Checksum
- Replace md5 string to single
?
- Replace md5 string to single
-- original
SELECT * FROM db.fbc5e685a5d3d45aa1d0347fdb7c4d35_temp where id=1
-- fingerprint with matchMD5Checksum=true
select * from db.?_temp where id=?
-- fingerprint with matchMD5Checksum=false (default)
select * from db.fbc?_temp where id=?
matchEmbeddedNumbers
- Preserve numbers within words. Useful for the case like that table name contains a number
-- original
SELECT * FROM prices.rt_5min WHERE id = 1
-- fingerprint with matchEmbeddedNumbers=true
select * from prices.rt_5min where id = ?
-- fingerprint with matchEmbeddedNumbers=false (default)
select * from prices.rt_?min where id = ?