javascript-clone-detection

v0.6.0

Published

3 years ago

Academic study project on duplication of Javascript code using AST syntactic analysis

felipelealdefaria

Javascript Clone Detection AST Machine Learning

JavaScript Clone Detection - (v0.6.0)

Academic study project on JavaScript code duplication using AST parsing with text similarity.

Usage

Run:

make init
clone-analisys <PATH> <SIMILARITY INDEX>
# Example: clone-analisys src/api-server 0.85

Current Process

We take a piece of code and convert it into an Abstract Syntax Tree (AST). A cleaning and normalization phase follows: unwanted attributes are removed and equivalent structures are standardized, for example by rewriting an arrow function as a regular function.

// both code snippets are classified as a Type-2 clone

const arrowFunction = (value) => {
  const { type } = value
  return type
}

function regularFunction(value) {
  // this is a regular function
  const { type } = value
  return type
};
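The cleaning and normalization phase can be sketched as follows. This is a minimal illustration, not the project's actual code: the hand-written ESTree-style nodes stand in for parser output and use a simplified function body, and the names `NOISE_KEYS`, `mapTree`, `normalizeNode` and `normalize` are hypothetical. Replacing function names with the placeholder `FN` is also an assumption, added so that two snippets differing only in their names end up with the same tree.

```javascript
// Keys that describe position or raw text rather than structure (assumed list).
const NOISE_KEYS = new Set(["start", "end", "loc", "range", "raw"]);

// Copy a tree bottom-up, dropping noise keys and passing every object to `visit`.
function mapTree(node, visit) {
  if (Array.isArray(node)) return node.map((child) => mapTree(child, visit));
  if (node === null || typeof node !== "object") return node;
  const copy = {};
  for (const [key, value] of Object.entries(node)) {
    if (!NOISE_KEYS.has(key)) copy[key] = mapTree(value, visit);
  }
  return visit(copy);
}

// Rewrite `const f = (...) => { ... }` as a function declaration,
// then replace every declared function name with a placeholder.
function normalizeNode(node) {
  const decl =
    node.type === "VariableDeclaration" && node.declarations.length === 1
      ? node.declarations[0]
      : null;
  if (
    decl &&
    decl.init &&
    decl.init.type === "ArrowFunctionExpression" &&
    decl.init.body.type === "BlockStatement"
  ) {
    node = {
      type: "FunctionDeclaration",
      id: decl.id,
      params: decl.init.params,
      body: decl.init.body,
    };
  }
  if (node.type === "FunctionDeclaration") {
    node = { ...node, id: { type: "Identifier", name: "FN" } };
  }
  return node;
}

const normalize = (ast) => mapTree(ast, normalizeNode);

// Simplified, hand-written nodes (a real parser emits more fields).
const param = () => ({ type: "Identifier", name: "value" });
const body = () => ({
  type: "BlockStatement",
  body: [{ type: "ReturnStatement", argument: { type: "Identifier", name: "value" } }],
});

const arrowAst = {
  type: "VariableDeclaration",
  kind: "const",
  start: 0,
  end: 40,
  declarations: [
    {
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "arrowFunction" },
      init: { type: "ArrowFunctionExpression", params: [param()], body: body() },
    },
  ],
};

const regularAst = {
  type: "FunctionDeclaration",
  start: 42,
  end: 90,
  id: { type: "Identifier", name: "regularFunction" },
  params: [param()],
  body: body(),
};

// Compare the serialized trees before and after normalization.
console.log(JSON.stringify(arrowAst) === JSON.stringify(regularAst));
console.log(JSON.stringify(normalize(arrowAst)) === JSON.stringify(normalize(regularAst)));
```

Note that an arrow function and a regular function bind `this` differently, so this rewrite is only suitable for syntactic comparison, not as a behavior-preserving transformation.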

Several good libraries can parse code snippets into an AST:

| Library | Version |
|----------------------|:-------:|
| espree | 7.3.1 |
| @babel/parser | 7.14.7 |
| abstract-syntax-tree | 2.19.1 |

This project uses abstract-syntax-tree because it offers more facilities for manipulating an AST.

Similarity between ASTs

To compare ASTs, the current version still has two options: (i) compare the ASTs directly, which only tells us whether they are identical or not; or (ii) convert the ASTs to text (strings) and use libraries that measure the textual similarity between the code snippets.

| Library | Version | Type |
|-------------------|:-------:|:---------------:|
| ast-compare | 2.1.0 | Compare ASTs |
| string-similarity | 4.0.4 | Compare strings |
| string-comparison | 1.0.9 | Compare strings |

Comparing ASTs directly seems the most coherent choice, but so far the ast-compare library can only tell whether two pieces are identical or not. Even so, the AST representation remains useful: it is uniform and easy to manipulate for pre-processing and normalization, and it can then be converted to text and compared as a textual element.
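The difference between the two options can be illustrated with a small sketch. It does not call ast-compare or string-similarity; `deepEqual`, `bigrams` and `diceSimilarity` are hypothetical helpers written for this example, and the Dice coefficient here is computed over raw character bigrams of the `JSON.stringify` output, which may differ in detail from what the libraries do.

```javascript
// Option (i): strict structural equality. The answer is only true or false.
function deepEqual(a, b) {
  if (a === b) return true;
  if (a === null || b === null || typeof a !== "object" || typeof b !== "object") {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key])
  );
}

// Count how often each character bigram occurs in a string.
function bigrams(text) {
  const counts = new Map();
  for (let i = 0; i < text.length - 1; i++) {
    const pair = text.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

// Option (ii): Sorensen-Dice coefficient on text,
// 2 * (shared bigrams) / (bigrams in a + bigrams in b), a score in [0, 1].
function diceSimilarity(a, b) {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const countsA = bigrams(a);
  const countsB = bigrams(b);
  let shared = 0;
  for (const [pair, countA] of countsA) {
    shared += Math.min(countA, countsB.get(pair) || 0);
  }
  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

// Two hand-written nodes that differ in a single identifier name.
const leftNode = { type: "ReturnStatement", argument: { type: "Identifier", name: "type" } };
const rightNode = { type: "ReturnStatement", argument: { type: "Identifier", name: "kind" } };

console.log(deepEqual(leftNode, rightNode)); // boolean only
console.log(diceSimilarity(JSON.stringify(leftNode), JSON.stringify(rightNode))); // graded score
```

The strict comparison rejects the pair outright, while the textual score still reflects that most of the structure is shared.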

Results

Using the example code snippets above, we get:

No pre-processing and normalization

ast-compare:  false
string-similarity (Dice):  0.925351071692535
string-comparison (Cosine):  0.9672041516493517
string-comparison (Levenshtein):  0.9072164948453608
string-comparison (Longest Common Subsequence):  0.9357933579335793
string-comparison (Metric Longest Common Subsequence):  0.9337260677466863

With pre-processing and normalization (v.0.3.1)

ast-compare:  true
string-similarity (Dice):  1
string-comparison (Cosine):  1
string-comparison (Levenshtein):  1
string-comparison (Longest Common Subsequence):  1
string-comparison (Metric Longest Common Subsequence):  1
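To give an intuition for how an edit-distance score in the range 0 to 1 can arise, here is a minimal sketch of one common normalization: one minus the Levenshtein distance divided by the length of the longer string. Whether string-comparison uses exactly this formula is an assumption, as is the reading of the CLI's similarity index as a minimum score; `levenshtein`, `levenshteinSimilarity` and `isClone` are hypothetical names.

```javascript
// Classic dynamic-programming edit distance, keeping one previous row in memory.
function levenshtein(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + cost // substitution (or match)
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Normalize the distance into a similarity score between 0 and 1.
function levenshteinSimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Assumed semantics: a pair counts as a clone when the score reaches the threshold.
function isClone(a, b, threshold) {
  return levenshteinSimilarity(a, b) >= threshold;
}

console.log(levenshteinSimilarity("return type", "return kind"));
console.log(isClone("return type", "return kind", 0.85));
```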

To learn more about the issues addressed, read: ESTUDO EMPÍRICO SOBRE DUPLICAÇÃO DE CÓDIGO EM APLICAÇÕES REACT.JS (in English: Empirical Study on Code Duplication in React.js Applications).
