@jrc03c/html-diff

v0.0.3


Intro

This tool helps to find differences between HTML files on a per-element basis in addition to finding differences on a per-line or per-character basis. This makes it easier to discover if two elements are basically identical except that one lacks a class name that the other has, or that one has slightly different textContent than the other, or that they have the same children but in different orders, or that one element has a particular child as an immediate descendant whereas another element has the same child as a deeply-nested descendant, etc.

Installation

For use in Node, bundlers, and the browser:

npm install @jrc03c/html-diff

For use at the command line:

npm install -g @jrc03c/html-diff

Usage

CLI

html-diff file1.html file2.html

Optionally, pass the "simple" flag (--simple or -s) to print the output in a YAML-ish format, which is sometimes a little easier to read than JS objects. For example:

html-diff -s file1.html file2.html

JS

In Node or bundlers:

const { getDifferences } = require("@jrc03c/html-diff")

Or in the browser:

<!--
  This defines all of the relevant functions, variables, and objects in the
  global scope.
-->
<script src="path/to/dist/html-diff.js"></script>

Then:

console.log(getDifferences(element1, element2))

NOTE: Some of the functions in this library expect HTMLElement inputs. If you're using this library in Node, I recommend that you use jsdom to construct virtual DOMs, and then pass elements from those DOMs into this library's functions. For example:

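const { JSDOM } = require("jsdom")
const { getDifferences } = require("@jrc03c/html-diff")

// Build two virtual DOMs to compare
const dom1 = new JSDOM("<div>Hello, world!</div>")
const dom2 = new JSDOM("<div>Goodbye, world!</div>")

// Compare the <body> elements of the two documents
console.log(
  getDifferences(dom1.window.document.body, dom2.window.document.body)
)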

API

DEFAULT_OPTIONS

DEFAULT_OPTIONS is an object that holds all of the constants used in the library's calculations. It has these properties and default values:

  • attributeWeight = represents how much element attribute differences should be weighted relative to other differences; has a default value of 1
  • childDifferenceWeight = represents how much the total differences between child elements (excluding the order of the children) should be weighted relative to other differences; has a default value of 1
  • childOrderWeight = represents how much the child order differences should be weighted relative to other differences; has a default value of 1
  • classWeight = represents how much element class differences should be weighted relative to other differences; has a default value of 1
  • differencePenalty = represents the power to which all differences should be raised, which is useful for exaggerating differences; has a default value of 1
  • idWeight = represents how much element ID differences should be weighted relative to other differences; has a default value of 1
  • shouldScoreChildren = represents whether or not child scores should contribute to the overall score; has a default value of true, but can be set to false to compare the given elements as though their children don't exist
  • tagNameWeight = represents how much element tag name differences should be weighted relative to other differences; has a default value of 1
  • textContentWeight = represents how much element text content differences should be weighted relative to other differences; has a default value of 1

To adjust any of the above properties, reassign their values, and then pass the entire object (or a copy of it, or whatever) into the relevant functions below that take an options parameter. Note that the options parameter is optional everywhere it appears below.
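As a rough illustration of the "copy of it" route, the sketch below overrides two settings on a copy instead of mutating the shared object. The DEFAULT_OPTIONS literal here is a local stand-in that mirrors the documented defaults so the snippet runs on its own; in real use the object would presumably come from the library. The override values are arbitrary examples.

```javascript
// Local stand-in mirroring the documented defaults (an assumption made so
// this snippet is self-contained; normally this object comes from the library).
const DEFAULT_OPTIONS = {
  attributeWeight: 1,
  childDifferenceWeight: 1,
  childOrderWeight: 1,
  classWeight: 1,
  differencePenalty: 1,
  idWeight: 1,
  shouldScoreChildren: true,
  tagNameWeight: 1,
  textContentWeight: 1,
}

// Copy the defaults and override only what should change, leaving the
// original object untouched.
const options = {
  ...DEFAULT_OPTIONS,
  classWeight: 2, // arbitrary example: weight class differences more heavily
  shouldScoreChildren: false, // compare the elements as though childless
}

console.log(options)
```

The resulting options object can then be passed as the last argument to any function below that accepts one.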

getAttributes(el)

Returns a list of objects, each of which represents a single attribute on the element and which has properties of "name" and "value". Does not include "class" or "id" attributes because those are evaluated separately.
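To make the return shape concrete, here is a minimal sketch of the behavior described above. It is not the library's code: sketchGetAttributes is a hypothetical re-implementation, and fakeElement is a plain object standing in for an HTMLElement so the snippet runs without a DOM.

```javascript
// Hypothetical re-implementation for illustration only; not the library's code.
function sketchGetAttributes(el) {
  return Array.from(el.attributes)
    .filter(attr => attr.name !== "class" && attr.name !== "id")
    .map(attr => ({ name: attr.name, value: attr.value }))
}

// A plain object standing in for an HTMLElement. A real element's
// `attributes` collection is array-like, so Array.from works on it too.
const fakeElement = {
  attributes: [
    { name: "id", value: "main" },
    { name: "class", value: "card wide" },
    { name: "data-role", value: "panel" },
    { name: "title", value: "Hello" },
  ],
}

console.log(sketchGetAttributes(fakeElement))
```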

getDifferences(el1, el2, [options])

Returns a list of objects, each of which describes a difference between the two given elements. Each difference object has these properties:

  • el1 = the path from the document root to the first element in the relevant pair of conflicting elements
  • el2 = the path from the document root to the second element in the relevant pair of conflicting elements
  • type = the type of difference between the relevant pair of conflicting elements; can be one of:
    • ATTRIBUTE_DIFFERENCE
    • CHILD_CONTENT_DIFFERENCE
    • CHILD_ORDER_DIFFERENCE
    • CLASS_DIFFERENCE
    • ID_DIFFERENCE
    • ORDER_DIFFERENCE
    • TAG_NAME_DIFFERENCE
    • TEXT_CONTENT_DIFFERENCE
  • el1Value = the value in the first element where the difference occurred
  • el2Value = the value in the second element where the difference occurred
  • attribute = the name of the attribute where the difference occurred; this property is present only when the difference type is ATTRIBUTE_DIFFERENCE
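Since the result is a flat list, it can help to group it by type before reading it. The sketch below does that over a hand-written sample. Everything in sampleDifferences is made up for illustration: the path strings are placeholders (the library's real path format may look different), and it assumes the type values are strings spelled like the names listed above. groupByType is a hypothetical helper, not part of the library.

```javascript
// Hand-written sample mimicking the documented shape. The paths and values
// are placeholders, not real library output.
const sampleDifferences = [
  {
    el1: "(path to first element)",
    el2: "(path to second element)",
    type: "CLASS_DIFFERENCE",
    el1Value: "card wide",
    el2Value: "card",
  },
  {
    el1: "(path to first element)",
    el2: "(path to second element)",
    type: "ATTRIBUTE_DIFFERENCE",
    el1Value: "/home",
    el2Value: "/about",
    attribute: "href",
  },
  {
    el1: "(path to another element)",
    el2: "(path to its counterpart)",
    type: "CLASS_DIFFERENCE",
    el1Value: "",
    el2Value: "active",
  },
]

// Hypothetical helper: bucket difference objects by their `type` property.
function groupByType(differences) {
  const groups = {}
  for (const diff of differences) {
    if (!groups[diff.type]) groups[diff.type] = []
    groups[diff.type].push(diff)
  }
  return groups
}

const grouped = groupByType(sampleDifferences)
for (const [type, diffs] of Object.entries(grouped)) {
  console.log(`${type}: ${diffs.length}`)
}
```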

getDiffScore(el1, el2, [options])

Returns a score and list of differences (from getDifferences) between the two given elements. The lowest possible score is 1, in which case the elements are identical.

getMostSimilarElement(el, others, [options])

Given a list of elements called others, returns the element that's most similar to el (when compared using getDiffScore).
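The selection step can be pictured as a simple "keep the lowest score" loop. The sketch below is only that picture, not the library's implementation: sketchMostSimilar and toyScore are hypothetical, and scoreFn stands in for any function that returns a numeric difference score where lower means more similar (which matches the note above that identical elements get the lowest score).

```javascript
// Hypothetical sketch of "pick the lowest-scoring candidate"; not the
// library's code. `scoreFn(a, b)` must return a number where lower means
// more similar.
function sketchMostSimilar(el, others, scoreFn) {
  let best = null
  let bestScore = Infinity
  for (const other of others) {
    const score = scoreFn(el, other)
    if (score < bestScore) {
      best = other
      bestScore = score
    }
  }
  return best // null when `others` is empty
}

// Toy stand-in scorer: compares strings by length, offset so that equal
// lengths score 1.
const toyScore = (a, b) => 1 + Math.abs(a.length - b.length)

console.log(sketchMostSimilar("span", ["div", "article", "main"], toyScore))
```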

getNonChildTextContent(el)

Returns the given element's own text content, excluding any text content that comes from child elements.
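One way to get this kind of result from the DOM is to join only the element's direct text nodes. The sketch below takes that approach as an assumption; it is not the library's code, and details such as whitespace handling may differ. sketchNonChildTextContent is hypothetical, and fakeParagraph is a plain object standing in for the element <p>Hello, <b>big</b> world!</p>.

```javascript
const TEXT_NODE = 3 // same value as Node.TEXT_NODE in the DOM

// Hypothetical sketch; not the library's code. Joins the text of the
// element's direct text-node children and skips child elements entirely.
function sketchNonChildTextContent(el) {
  return Array.from(el.childNodes)
    .filter(node => node.nodeType === TEXT_NODE)
    .map(node => node.textContent)
    .join("")
}

// Stand-in for <p>Hello, <b>big</b> world!</p>
const fakeParagraph = {
  childNodes: [
    { nodeType: 3, textContent: "Hello, " },
    { nodeType: 1, textContent: "big" },
    { nodeType: 3, textContent: " world!" },
  ],
}

console.log(sketchNonChildTextContent(fakeParagraph))
```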

To do

  • Write unit tests for the main API functions.
  • Implement some dynamic programming features (like a dictionary that holds the differences between two elements so that they don't have to be recalculated multiple times). I'm not actually sure how big of a problem this is, but I do know that the functions recurse quite a bit, so it may make a difference in terms of performance.
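The dictionary idea in the last item could look something like the sketch below, which caches results per pair of elements. This is a hypothetical design, not something the library does today: memoizePair and slowCompare are made-up names, the cache ignores any options argument, and it assumes both arguments are objects (as DOM elements are), since WeakMap keys must be objects.

```javascript
// Hypothetical sketch of a pairwise cache; not part of the library.
function memoizePair(fn) {
  const cache = new WeakMap() // first element -> (second element -> result)
  return function (a, b) {
    let inner = cache.get(a)
    if (!inner) {
      inner = new WeakMap()
      cache.set(a, inner)
    }
    if (!inner.has(b)) {
      inner.set(b, fn(a, b))
    }
    return inner.get(b)
  }
}

// Toy comparison that counts how often it actually runs.
let calls = 0
const slowCompare = (a, b) => {
  calls++
  return a.tag === b.tag ? [] : ["TAG_NAME_DIFFERENCE"]
}

const cachedCompare = memoizePair(slowCompare)
const x = { tag: "div" }
const y = { tag: "span" }
cachedCompare(x, y)
cachedCompare(x, y) // served from the cache; slowCompare is not called again
console.log(calls)
```

Using WeakMap means cached entries don't keep elements alive after the rest of the program drops them. Note that the cache is order-sensitive: (x, y) and (y, x) are stored separately.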