npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

pl-copyfind

v0.10.1

Published

ideas are based on those found at plagiarism.bloomfieldmedia.com's CopyFind/WCopyFind

Downloads

15

Readme

pl-copyfind

A plagarism comparing function

This project was inspired by the work of Dr Lou Bloomfield's CopyFind/WCopyFind windows programs (http://plagiarism.bloomfieldmedia.com/z-wordpress/software/wcopyfind/). Algorithmically there is an equivalence, however there are marked differences.

Key Differences:

pl-copyfind does not:

  • ~download and extract the 'text' from various file formats. Depending upon the platform (either nodejs or a browser), there are a few solutions to this:

    • mozilla's 'readbility' functions, which generally does excellent work at extracting only the main article html from web pages
    • npm package textract. This is a 'one stop shop' for reading in a plethora of file formats. Note that this cannot be used in a browser solution, as it requires external binaries to be installed (although a server-side solution can be used for this).
    • the demo package illustrates a poor man's method of converting html to text, using purely regex's.
  • ~generate output files, although an optional html output is available.

pl-copyfind does:

  • ~have equivalent switches that the original program uses. You can ignore all of them and run your own sanitisers however.
  • ~allow the 'hashes' to be stored in a cache. This is up to you to implement (although easy using one of many npm file caching packages).
  • ~allow multple inputs (comparators and comparatees).

Default Options:

	PhraseLength: 6, // Shortest Phrase to Match
	WordThreshold: 100, // Fewest Matches to Report
	SkipLength: 20, // Needs bSkipLongWords. words this long are skipped
	MismatchTolerance: 2, // #Most Imperfections to Allow
	MismatchPercentage: 80, // Minimum % of Matching Words
	bIgnoreCase: false, // Ignore Letter Case
	bIgnoreNumbers: false, // Ignore Numbers
	bIgnoreOuterPunctuation: false, // Ignore Outer Punctuation
	bIgnorePunctuation: false, // Ignore Punctuation
	bSkipLongWords: false, // Skip Long Words
	bSkipNonwords: false, // Skip Non-Words
	bBuildReport: true, // generate html output
	bBriefReport: true, // show a html report of matches with lead in and out text, for context (otherwise shows full source text). Needs bBuildReport
	bTerseReport: false // show ONLY the matching text. Needs bBuildReport

Usage:

See the demos folder for a complete working example that does not require a web server to execute (just open index.html from your local file system to try it out).

Example 1. Single input comparison

var copyfind = require('pl-copyfind');
...
var options = { PhraseLength: 3, WordThreshold: 3, bIgnoreCase:true}; 
var src_text = "original text is here. lorem ipsum dolorem est";
var check_text = "I plagiarised lorem ipsum DOLOREM est and I reckon I can get away with it";
 
copyfind(src_text, check_text, options, function(err, data) {
	if (err) 
		throw "Failed to compare: " + err.toString();
 
	if (!data.matches.length)
		return false; // no comparison found
 
 	console.log("Found " + data.matches.length + " matches"); // expect 1
	for (var i=0; i<data.matches.length; i++) {
		var match = data.matches[i]; 
		var orig_text = src_text.substr(match.textL.pos, match.textL.length);
		var copied_text = check_text.substr(match.textR.pos, match.textR.length);
		console.log("Match found: \n" + orig_text + "\nvs. \n" + copied_text + "\nat position : " + match.textR.pos);
	}
});

Example 2. Multiple input comparisons

var copyfind = require('pl-copyfind');
...
var options = { PhraseLength: 3, WordThreshold: 3 }; 
var src_texts = ["original text is here. lorem ipsum dolorem est","This is another original text that is also dolorem est"];
var check_texts = ["I plagiarised lorem ipsum dolorem est and I reckon I can get away with it","I didn't do lorem est this time"];

copyfind(src_texts, check_texts, options, function(err, data) {
	if (err) 
		throw "Failed to compare: " + err.toString();

	if (!data.matches.length)
		return false; // no comparison found on any text

    for (var l=0; l<src_texts.length; l++) {
        for (var r=0; r<check_texts.length; r++) {
        	for (var i=0; i<data.matches[l][r].length; i++) {
        		var match = data.matches[l][r][i]; 
        		var orig_text = src_texts[l].substr(match.textL.pos, match.textL.length);
        		var copied_text = check_texts[r].substr(match.textR.pos, match.textR.length);
        		console.log("Match found: #["+l+"]\n" + orig_text + "\nvs. #["+r+"]\n" + copied_text + "\nat position : " + match.textR.pos);
        	}
        }
	}
});

Example 3. Render html reports

var copyfind = require('pl-copyfind');
...
var options = { bBuildReport:  true }; 
var src_text = "original text is here. lorem ipsum dolorem est";
var check_text = "I plagiarised lorem ipsum dolorem est and I reckon I can get away with it";

copyfind(src_texts, check_text, options, function(err, data) {
	if (err) 
		throw "Failed to compare: " + err.toString();
    alert(data.html);
});

Example 4. Cache results for faster re-comparisons

var copyfind = require('pl-copyfind');
...
var options = {  }; 
var src_text = "original text is here. lorem ipsum dolorem est";
var check_text1 = "I plagiarised lorem ipsum dolorem est and I reckon I can get away with  with it";
var check_text2 = "Another plagiarised lorem ipsum dolorem est and I reckon I can get away with it";

copyfind(src_texts, check_text1, options, function(err, data) {
	if (err) 
		throw "Failed to compare: " + err.toString();
    alert("execution took " + data.executionTime + " ms");
    options.hashesL = data.hashesL; // save the hashdata. You *could* store this in a file cache too
});

// re-uses hashesL for better performance
copyfind(src_texts, check_text2, options, function(err, data) {
	if (err) 
		throw "Failed to compare: " + err.toString();
    alert("execution took " + data.executionTime + " ms");
});

Licensing:

This module and all its source is licensed under GPL, which is the original licensing of WCopyFind/CopyFind source. The license file can be found at [https://github.com/cmroanirgo/pl-copyfind/blob/master/LICENSE.md].

Please note that if you use this library, as-is, then your project need not be subject to what is commonly called 'GPL cancer'. It is only if you embrace and extend the module that you must also release your source code, also under a GPL license. However, as all things go, it would be appreciated if attribution for the work done in this project was acknowledged in your source and information pages.