dedupe-files

v1.4.1

Finds all duplicate files across a set of paths and then prints them out, moves them to a directory, or deletes them. Duplicates are identified by their actual content, not by their name or other attributes.

Install / Requirements

Requires Node.js >=16. Run it with npx (as shown below), or install it globally with npm install -g dedupe-files.

Usage

Usage: dedupe-files <command> [options]

Finds all duplicate files across the set of paths and then will **print** them out, **move** them to a directory, or **delete** them. Duplicates are identified by their actual content not their name or other attributes.

Options:
  -h, --help                        display help for command

Commands:
  print [options] <input_paths...>  print out duplicates
  move [options] <input_paths...>   move duplicates to a directory
  delete <input_paths...>           delete duplicate files
  help [command]                    display help for command

Examples:

The following prints out a line to duplicates.txt for each duplicate file found in /Volumes/photos and /Volumes/backups/photos:

  $ dedupe-files print --out "duplicates.txt" "/Volumes/photos" "/Volumes/backups/photos"

The following moves each duplicate file found in ~/Downloads/one and ~/Downloads/two to ~/Downloads/duplicates.
The files in ~/Downloads/one are considered more "original" than those in ~/Downloads/two since that path appears earlier on the command line:

  $ dedupe-files move --out "~/Downloads/duplicates" "~/Downloads/one" "~/Downloads/two"

print command

Usage: dedupe-files print [options] <input_paths...>

Prints duplicate files to terminal or to a file.

Options:
  -n, --names       include files with duplicate names, but different content
  -o, --out <file>  A file path to output the duplicate file paths to. If not specified, file paths are written to stdout.
  -h, --help        display help for command

move command

Usage: dedupe-files move [options] <input_paths...>

Moves duplicate files to a designated directory.

Options:
  -o, --out <path>  Directory to output the duplicate files to.
  -h, --help        display help for command

Remarks:

Files in *input_paths* that appear earlier on the command line are considered more "original".
That is, the duplicates that are moved are the ones that are rooted in the last-most *input_paths* argument.

delete command

Usage: dedupe-files delete [options] <input_paths...>

Deletes duplicate files.

Options:
  -h, --help  display help for command

Remarks:

Files in *input_paths* that appear earlier on the command line are considered more "original".
That is, the duplicates that are deleted are the ones that are rooted in the last-most *input_paths* argument.
If duplicates are in the same directory tree, then which one is deleted is not deterministic (but it will leave one behind).
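The priority rule described in the remarks above can be sketched in TypeScript as follows. This is only an illustration under stated assumptions, not the package's actual code: the FoundFile shape and the splitOriginalAndDuplicates name are hypothetical, and the priority number is assumed to be the zero-based position of the input path on the command line (as the "priority 0" / "priority 1" log lines suggest).

```typescript
// Hypothetical shape: a file path plus the index of the input_paths
// argument it was found under (0 = first path on the command line).
interface FoundFile {
  path: string;
  priority: number;
}

// Given files already known to have identical content, treat the one with
// the lowest priority number as the "original" and the rest as duplicates.
function splitOriginalAndDuplicates(identical: FoundFile[]): {
  original: FoundFile;
  duplicates: FoundFile[];
} {
  if (identical.length === 0) {
    throw new Error("expected at least one file");
  }
  // Copy before sorting so the caller's array is left untouched.
  // Files with the same priority (same directory tree) have no meaningful
  // order here, which mirrors the "not deterministic" remark above.
  const sorted = [...identical].sort((a, b) => a.priority - b.priority);
  const [original, ...duplicates] = sorted;
  return { original, duplicates };
}

// Usage: the copy under the first input path is kept as the original.
const result = splitOriginalAndDuplicates([
  { path: "/Downloads/two/pic.jpg", priority: 1 },
  { path: "/Downloads/one/pic.jpg", priority: 0 },
]);
```

In this sketch only result.duplicates would be handed to a move or delete step, so one copy always stays behind.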

Examples

print

$ npx dedupe-files print \
  --out "duplicates.txt" \
  '/Volumes/scott-photos/photos/2014' \
  '/Volumes/home/!Backups/PHOTOS/Photos - backup misc/2014'

Searching /Volumes/scott-photos/photos/2014 (priority 0)
Searching /Volumes/home/!Backups/PHOTOS/Photos - backup misc/2014 (priority 1)
Searching /Volumes/scott-photos/photos/2014/08 (priority 0)
Searching /Volumes/scott-photos/photos/2014/02 (priority 0)
Searching /Volumes/scott-photos/photos/2014/12 (priority 0)
Searching /Volumes/scott-photos/photos/2014/11 (priority 0)
Searching /Volumes/scott-photos/photos/2014/10 (priority 0)
Searching /Volumes/scott-photos/photos/2014/09 (priority 0)
Searching /Volumes/scott-photos/photos/2014/07 (priority 0)
Searching /Volumes/scott-photos/photos/2014/06 (priority 0)
Searching /Volumes/scott-photos/photos/2014/05 (priority 0)
Searching /Volumes/scott-photos/photos/2014/04 (priority 0)
Searching /Volumes/scott-photos/photos/2014/03 (priority 0)
Searching /Volumes/scott-photos/photos/2014/01 (priority 0)
Found 2,829 files...
Writing to output file /Users/scott/src/activescott/files-and-folders/packages/dedupe-files/tests/integration/print-photos-small.sh.out.
Comparing files with identical sizes...
Hashing 1,749 files in batches of 64: 4%...7%...11%...15%...18%...22%...26%...29%...33%...37%...40%...44%...48%...51%...55%...59%...62%...66%...70%...73%...77%...81%...84%...88%...91%...95%...
Hashing files complete.
Duplicate files consume 5.4 GB.
872 duplicate files written to output file /Users/scott/src/activescott/files-and-folders/packages/dedupe-files/tests/integration/print-photos-small.sh.out.
print took 16.659 seconds.

# duplicates.txt:

content identical: /Volumes/scott-photos/photos/2014/10/IMG_0533.JPG and /Volumes/home/!Backups/PHOTOS/Photos - backup misc/2014/IMG_0070.jpg
content identical: /Volumes/scott-photos/photos/2014/03/IMG_0881.JPG and /Volumes/home/!Backups/PHOTOS/Photos - backup misc/2014/IMG_0881.JPG
...
content identical: /Volumes/scott-photos/photos/2014/08/IMG_1285.MOV and /Volumes/home/!Backups/PHOTOS/Photos - backup misc/2014/IMG_1285.MOV

move

$ npx dedupe-files@beta move \
  --out "~/Downloads/duplicates" \
  "~/Downloads/one" \
  "~/Downloads/two"

Searching /Users/scott/Downloads/dedupe-files-temp/one (priority 0)
Searching /Users/scott/Downloads/dedupe-files-temp/two (priority 1)
Searching /Users/scott/Downloads/dedupe-files-temp/two/toodeep (priority 1)
Found 8 files...
Hashing 6 files in batches of 64...
Hashing files complete.
Moving /Users/scott/Downloads/dedupe-files-temp/two/not-the-eye-test-pic.jpg to /Users/scott/Downloads/dedupe-files-temp/duplicates/not-the-eye-test-pic.jpg
Moving /Users/scott/Downloads/dedupe-files-temp/two/tv-test-pattern.png to /Users/scott/Downloads/dedupe-files-temp/duplicates/tv-test-pattern.png

Large Example with 8,978 duplicates across 61,347 files

$ npx dedupe-files print \
  --out "dupe-photos.out" \
  /Volumes/scott-photos/photos \
  /Volumes/home/\!Backups/PHOTOS/

Searching /Volumes/scott-photos/photos (priority 0)
...
Searching /Volumes/home/!Backups/PHOTOS/Scotts Photos Library2020-12-21.photoslibrary/resources/cloudsharing/resources/derivatives/masters/5 (priority 1)
Found 61,347 files...
Writing to output file /Users/scott/src/activescott/files-and-folders/packages/dedupe-files/tests/integration/print-photos-all.sh.out.
Comparing files with identical sizes...
Hashing 29,244 files in batches of 64: 0%...0%...1%...1%...1%... [...] 96%...96%...96%...97%...
Hashing files complete.
Duplicate files consume 117.7 GB.
8,978 duplicates written to output file /Users/scott/src/activescott/files-and-folders/packages/dedupe-files/tests/integration/print-photos-all.sh.out.
print took 2264 seconds.

Features

  • Actions to be taken on duplicates:
    • print
    • move
    • delete
  • Priorities: Each file gets a priority from the position of its input path on the command line. Files under earlier input paths are considered the "original".
  • Algorithm: Compares file sizes first, then hashes only the files whose sizes match. It could possibly be made faster by computing partial hashes of same-size files first.
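
The size-then-hash idea from the Algorithm bullet can be sketched roughly like this. It is a minimal illustration, not the package's implementation: FileInfo, findIdenticalGroups, and the injected hashFile callback are all hypothetical names. The callback stands in for whatever content hash is used, so the sketch stays self-contained and does not touch the file system.

```typescript
// Hypothetical record for a discovered file.
interface FileInfo {
  path: string;
  size: number; // size in bytes
}

// Returns groups of paths whose content hashes match.
// `hashFile` is injected by the caller (an assumption of this sketch);
// in practice it might read the file and compute a digest such as SHA-256.
function findIdenticalGroups(
  files: FileInfo[],
  hashFile: (path: string) => string
): string[][] {
  // Pass 1: bucket by size. A file with a unique size cannot have a duplicate.
  const bySize = new Map<number, FileInfo[]>();
  for (const file of files) {
    const bucket = bySize.get(file.size) ?? [];
    bucket.push(file);
    bySize.set(file.size, bucket);
  }

  // Pass 2: hash only files that share a size with at least one other file.
  const groups: string[][] = [];
  bySize.forEach((bucket) => {
    if (bucket.length < 2) return; // unique size, skip hashing entirely
    const byHash = new Map<string, string[]>();
    for (const file of bucket) {
      const hash = hashFile(file.path);
      const paths = byHash.get(hash) ?? [];
      paths.push(file.path);
      byHash.set(hash, paths);
    }
    byHash.forEach((paths) => {
      if (paths.length > 1) groups.push(paths);
    });
  });
  return groups;
}
```

Skipping the hash for files with a unique size is what keeps the expensive step small: in the first example above, 2,829 files were found but only 1,749 needed hashing.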

Roadmap

  • [x] feat: searches multiple input paths

  • [x] feat: compares files by content, not merely name or size

    • name same, content different
    • name same, content different, size same
    • name different, content same
  • [x] chore: typescript. because types help (proven to myself yet again)

  • [x] chore: fix git paths (git dir needs to be moved up to parent monorepo dir)

  • [x] feat: optimize perf by tracking all files w/ size instead of hash, and only hash files where size is the same

  • [x] feat: use deterministic algorithm to classify the "original" and the "duplicate" (i.e. first argument passed in is a higher priority for "original")

    • [x] test: test-dir/one test-dir/two
    • [x] test: test-dir/two test-dir/one
  • actions to be taken for each duplicate:

  • [x] feat: print/dry-run action (no action) for found duplicate: prints to output file or stdout

  • [x] feat: move file action for found duplicate to specified path

    • [x] feat: ensure move command never loses a file by using the "original" file's name with a counter appended for uniqueness
  • [x] feat: delete action for found duplicate
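
As an illustration of the "counter appended for uniqueness" item above, here is one way such a naming step could look. The exact naming format dedupe-files uses is not documented here, so the underscore-plus-number suffix, the uniqueName function, and the taken set are assumptions made purely for this sketch.

```typescript
// Returns a file name that is not in `taken`, appending _1, _2, ... before
// the extension when the plain name is already used.
// The "_N" suffix format is an assumption, not the package's documented behavior.
function uniqueName(fileName: string, taken: ReadonlySet<string>): string {
  if (!taken.has(fileName)) return fileName;

  const dot = fileName.lastIndexOf(".");
  // A leading dot (e.g. ".gitignore") is part of the name, not an extension.
  const hasExtension = dot > 0;
  const stem = hasExtension ? fileName.slice(0, dot) : fileName;
  const extension = hasExtension ? fileName.slice(dot) : "";

  // Count upward until an unused name is found.
  for (let counter = 1; ; counter++) {
    const candidate = `${stem}_${counter}${extension}`;
    if (!taken.has(candidate)) return candidate;
  }
}

// Usage: two files named pic.jpg are already in the output directory.
const existing = new Set(["pic.jpg", "pic_1.jpg"]);
const nextName = uniqueName("pic.jpg", existing);
```

Because an existing name is never reused, moving two different duplicates that share a file name cannot overwrite one with the other.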

Alternatives

  • rdfind does this quite well, but it isn't very flexible about what to do with the duplicates, and it's slow despite being written in C++ (which also makes it harder for others to maintain or contribute to).

Below are more, but I found them after I was already well down the road of writing this one:

  • https://github.com/mixu/file-dedupe ?
  • https://qarmin.github.io/czkawka/ ?
  • https://www.pixelbeat.org/fslint/ ?