ledger-publisher

v0.9.7

Routines to identify publishers for the Brave ledger.

Publisher Identities

A publisher identity is derived from a URL and is intended to correspond to the publisher associated with the URL.

var getPublisher = require('ledger-publisher').getPublisher

var publisher = getPublisher('URL')

Note that because some domains host multiple publishers, a publisher identity may contain both a domain and a path, separated by a solidus (/).

Also note that certain URLs aren't really appropriate for a publisher mapping. For example, if a URL returns a 302, don't bother mapping that URL.

Terminology

Consider this URL:

https://foo.bar.example.com/component1/...?query

The label com from the URL's domain is a top-level domain (TLD), and the string example.com is a second-level domain (SLD). By convention, the relative domain (RLD) is the string to the left of the SLD (e.g., foo.bar), and the qualifying label (QLD) is the right-most label of the RLD (e.g., bar).

There are two popular types of TLDs: infrastructure and international country code (ccTLD).

Although an SLD is normally thought of as ending at the next-to-last label (e.g., example), the convention differs for domains with a ccTLD. Consider this URL:

http://search.yahoo.co.jp/search?query

The string co.jp corresponds to the TLD, the string yahoo.co.jp corresponds to the SLD, and the QLD and RLD are both the string search.
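The terminology can be made concrete with a small sketch. The helper below, splitDomain, is hypothetical and not part of ledger-publisher. It assumes the caller already knows the TLD (for example, from a public-suffix lookup) and only shows how the remaining labels are divided.

```javascript
// Hypothetical helper, not part of ledger-publisher: split a hostname into
// the TLD, SLD, RLD, and QLD described above, given an already-known TLD.
function splitDomain (hostname, tld) {
  var suffix = '.' + tld
  if (hostname.slice(-suffix.length) !== suffix) return undefined

  // labels to the left of the TLD, e.g. [ 'foo', 'bar', 'example' ]
  var labels = hostname.slice(0, -suffix.length).split('.')
  var sld = labels.pop() + suffix   // right-most remaining label plus the TLD
  var rld = labels.join('.')        // everything to the left of the SLD, or ''
  var qld = labels.length > 0 ? labels[labels.length - 1] : ''

  return { TLD: tld, SLD: sld, RLD: rld, QLD: qld }
}

// RLD is 'foo.bar' and QLD is 'bar'
console.log(splitDomain('foo.bar.example.com', 'com'))
// RLD and QLD are both 'search'
console.log(splitDomain('search.yahoo.co.jp', 'co.jp'))
```

When there is no RLD (e.g., example.com), the sketch returns the empty string for both the RLD and the QLD.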

Syntax

The ABNF syntax for a publisher identity is:

publisher-identity = domain [ "/" segment ]

            domain = [ RLD "." ] SLD
               RLD = *[ label "." ] QLD
               QLD = label
               SLD = label "." TLD
               TLD = infraTLD / ccTLD
             ccTLD = label "." 2ALPHA                ; a two-letter country code, cf. ISO 3166
          infraTLD = label                           ; ".com", ".gov", etc.

             label = alphanum *62(alphanum / "-")    ; any octet encoded according to RFC 2181
          alphanum = ALPHA / DIGIT

           segment = *pchar                          ; as defined in Section 3.3 of RFC 3986

Note that a publisher identity must not include either a fragment (#...) or a query (?...).

var isPublisher = require('ledger-publisher').isPublisher

if (isPublisher('...')) ...
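As a rough illustration of the syntax, the sketch below checks a string against a simplified reading of the ABNF above. The function looksLikePublisher is hypothetical and is not the package's isPublisher. It does not consult a TLD list, and it checks the segment only for characters that are clearly disallowed rather than validating pchar in full.

```javascript
// Hypothetical syntax check, not the package's isPublisher.
// label = alphanum *62(alphanum / "-")
var LABEL = /^[a-z0-9][a-z0-9-]{0,62}$/i

function looksLikePublisher (identity) {
  if (typeof identity !== 'string') return false

  // a publisher identity carries neither a query nor a fragment
  if (identity.indexOf('?') !== -1 || identity.indexOf('#') !== -1) return false

  var slash = identity.indexOf('/')
  var domain = slash === -1 ? identity : identity.slice(0, slash)
  var segment = slash === -1 ? '' : identity.slice(slash + 1)

  // at most one path segment (simplification: pchar is not fully validated)
  if (segment.indexOf('/') !== -1 || /\s/.test(segment)) return false

  // the domain needs at least one label for the SLD and one for the TLD
  var labels = domain.split('.')
  return labels.length >= 2 && labels.every(function (label) {
    return LABEL.test(label)
  })
}

console.log(looksLikePublisher('example.com/user'))
console.log(looksLikePublisher('example.com/user?page=2'))
```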

Mapping

The package uses a rule set expressed as a JavaScript array.

Each rule in the array consists of an object with one mandatory property, condition, a JavaScript boolean expression. In addition, there is usually either a consequent property (a JavaScript expression returning either a string, null, or undefined), or a dom property.

To determine the publisher identity associated with a URL:

  1. If the TLD associated with the URL's domain is neither an infrastructure TLD nor a ccTLD, then the publisher identity is undefined.

  2. The URL is parsed into an object using the URL module.

  3. The parsed object is extended with the URL, TLD, SLD, RLD, and QLD objects. If there is no RLD, the empty string ("") is used for both the RLD and QLD.

  4. If the rule has a dom.publisher property, then the HTML associated with the URL must be available. One additional object, node, is present during evaluation; it is the result of jsdom(markup).body.querySelector(dom.publisher.nodeSelector). In Step 5.2, the dom.publisher.consequent property is used in place of the rule's consequent property.

  5. Each rule is examined, in order, starting from the first element:

    5.1. If the condition evaluates to false, then execution continues with the next rule.

    5.2. Otherwise, the consequent is evaluated.

    5.3. If the resulting value is the empty string (""), then execution continues with the next rule.

    5.4. If the resulting value is false, null or undefined, then the publisher identity is undefined.

    5.5. Otherwise, the resulting value is used as the publisher identity.

  6. If Step 5.5 is never executed, then the publisher identity is undefined.
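Steps 5 and 6 above can be sketched as a simple loop. This is an illustration only: the rules here carry functions for condition and consequent, whereas the package's rules carry JavaScript expressions, and every rule is assumed to have a consequent. The names applyRules, rules, and the context fields are hypothetical.

```javascript
// Sketch of Steps 5 and 6. Assumption: condition and consequent are
// functions of a context object, not expression strings.
function applyRules (rules, context) {
  for (var i = 0; i < rules.length; i++) {
    var rule = rules[i]

    // 5.1: a false condition skips the rule
    if (!rule.condition(context)) continue

    // 5.2: evaluate the consequent
    var result = rule.consequent(context)

    // 5.3: the empty string means "continue with the next rule"
    if (result === '') continue

    // 5.4: false, null, or undefined means the identity is undefined
    if (result === false || result === null || result === undefined) return undefined

    // 5.5: anything else is the publisher identity
    return result
  }

  // 6: no rule produced an identity
  return undefined
}

// Hypothetical rule set, for illustration only
var rules = [
  { condition: function (c) { return c.SLD === 'example.org' },
    consequent: function (c) { return null } },
  { condition: function (c) { return c.SLD === 'example.com' },
    consequent: function (c) {
      var user = c.pathname.split('/')[1]
      return user ? c.SLD + '/' + user : ''
    } },
  { condition: function (c) { return true },
    consequent: function (c) { return c.SLD } }
]

console.log(applyRules(rules, { SLD: 'example.com', pathname: '/alice/post' }))
console.log(applyRules(rules, { SLD: 'example.com', pathname: '/' }))
```

In the second call, the example.com rule returns the empty string, so evaluation falls through to the final catch-all rule.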

The initial rule set is built by an npm script:

npm run build-rules

An initial rule set is available as:

require('ledger-publisher').ruleset

NB: IN PREVIOUS VERSIONS OF THIS PACKAGE, THE PROPERTY WAS CALLED rules, NOT ruleset

Your Help is Needed!

Please submit a pull request with updates to the rule set.

If you are running the Brave Browser on your desktop, you can run

% node dump.js

in order to examine all the URLs you have visited in your current session (from the file session-store-1) and see the resulting publisher identities.

Page Visits

A page visit is just what you'd expect, but it requires both a URL and the duration of the focus (in milliseconds). A synopsis is a collection of page visits that have been reduced to a publisher and a score. The synopsis includes a rolling window so that older visits are removed.

var synopsis = new (require('ledger-publisher').Synopsis)()

// each time a page is unloaded, record the focus duration
// markup is an optional third parameter, cf., getPublisher above
synopsis.addVisit('URL', duration)

// addVisit is a wrapper around addPublisher
synopsis.addPublisher(publisher, props)

At present, these properties are examined:

  • duration - the number of milliseconds (mandatory)

  • markup - the HTML markup (optional)

In order to calculate the score, options can be provided when creating the object. The defaults are:

{ minPublisherDuration : 8 * 1000
, numFrames            : 30
, frameSize            : 24 * 60 * 60 * 1000
}

When addVisit is invoked, the duration must be at least minPublisherDuration milliseconds in length. If so, then one or more "scorekeepers" are run to calculate the score for the visit, using both the options and props. At present, there are two scorekeepers:

  • concave - courtesy of @dimitry-xyz

  • visits - the total number of visits

The Concave Scorekeeper

The concave scorekeeper rewards the publisher of a page according to:

  1. a fixed bonus for the page hit
  2. how much time the user spends on the page

The reward increases as the user spends more time on the page, but the model uses a concave quadratic (utility) function to provide diminishing returns as the time spent on the page increases. If we set the durationWeight parameter to zero, the model only takes into account the page hit and ignores the time spent on the page when calculating the reward.

Tuning

Scorekeepers may be "tuned" using options. At present, only the concave scorekeeper makes use of these. The defaults are:

{ _d : 1 / (30 * 1000)                       // 0.0000333...
, _a : (1 / (_d * 2)) - minPublisherDuration // 7000 with the default minPublisherDuration
, _b : minPublisherDuration - _a             // 1000 with the default minPublisherDuration
}
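The exact scoring formula is not given here, so the following is only a sketch of one concave form that fits the description. It assumes the score s is the positive root of _a * s^2 + _b * s = duration; the function names concaveOptions and concaveScore are hypothetical. Under that assumption, a visit of exactly minPublisherDuration milliseconds scores 1, because _a + _b equals minPublisherDuration.

```javascript
// Assumption: the score s is the positive root of
//   _a * s^2 + _b * s = duration
// This is one concave form that fits the description above; it is not
// confirmed to be the package's own formula.
function concaveOptions (minPublisherDuration) {
  var _d = 1 / (30 * 1000)
  var _a = (1 / (_d * 2)) - minPublisherDuration
  var _b = minPublisherDuration - _a
  return { _d: _d, _a: _a, _b: _b }
}

function concaveScore (duration, options) {
  var a = options._a
  var b = options._b
  // quadratic formula, taking the positive root
  return (-b + Math.sqrt((b * b) + (4 * a * duration))) / (2 * a)
}

var concaveDefaults = concaveOptions(8 * 1000)
var durations = [ 8, 16, 32, 64 ]
durations.forEach(function (seconds) {
  var score = concaveScore(seconds * 1000, concaveDefaults)
  console.log(seconds + 's -> ' + score.toFixed(3))
})
```

Each doubling of the duration adds less to the score than the previous one, which is the diminishing-returns behavior the scorekeeper is meant to provide.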

The sliding window consists of numFrames frames, each covering frameSize milliseconds. So, for the default values, the sliding window is 30 days long.
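A minimal sketch of such a window follows. The SlidingWindow constructor and its methods are hypothetical and only illustrate the idea: scores are bucketed into frames, and frames older than numFrames are dropped. It assumes timestamps are supplied in non-decreasing order.

```javascript
// Hypothetical sliding window, for illustration only.
function SlidingWindow (numFrames, frameSize) {
  this.numFrames = numFrames
  this.frameSize = frameSize
  this.frames = []   // newest first: [ { index: ..., score: ... }, ... ]
}

// Add a score at time `now` (milliseconds); assumes non-decreasing times.
SlidingWindow.prototype.add = function (score, now) {
  var index = Math.floor(now / this.frameSize)

  if (this.frames.length === 0 || this.frames[0].index !== index) {
    this.frames.unshift({ index: index, score: 0 })
  }
  this.frames[0].score += score

  // keep only the newest numFrames frames
  var oldest = index - this.numFrames + 1
  this.frames = this.frames.filter(function (frame) {
    return frame.index >= oldest
  })
}

SlidingWindow.prototype.total = function () {
  return this.frames.reduce(function (sum, frame) {
    return sum + frame.score
  }, 0)
}

var DAY = 24 * 60 * 60 * 1000
var slidingWindow = new SlidingWindow(30, DAY)

slidingWindow.add(1, 0 * DAY)
slidingWindow.add(2, 29 * DAY)
console.log(slidingWindow.total())   // both visits are inside the window

slidingWindow.add(3, 30 * DAY)
console.log(slidingWindow.total())   // the day-0 visit has been dropped
```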

Top Publishers

Once a synopsis is underway, the "top N" publishers can be determined. Each publisher has an associated weighted score, so that the sum of the scores "should approximate" 1.0:

// get the top "N" publishers

   console.log(JSON.stringify(synopsis.topN(20), null, 2))

// e.g., [ { publisher: "example.com", weight: 0.0123456789 } ... ]

The parameter to the topN method is optional.
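To show how raw scores might become weights that sum to roughly 1.0, here is a small sketch. The function normalizeTop is hypothetical, and it assumes the weights are normalized over the selected publishers only; the package may compute them differently.

```javascript
// Hypothetical helper: rank publishers by raw score, keep the top n,
// and normalize so the returned weights sum to (approximately) 1.0.
// Assumption: normalization is over the selected publishers only.
function normalizeTop (scores, n) {
  var ranked = Object.keys(scores).map(function (publisher) {
    return { publisher: publisher, score: scores[publisher] }
  }).sort(function (a, b) { return b.score - a.score })

  // n is optional; without it, every publisher is kept
  if (typeof n === 'number') ranked = ranked.slice(0, n)

  var total = ranked.reduce(function (sum, entry) {
    return sum + entry.score
  }, 0)
  if (total === 0) return []

  return ranked.map(function (entry) {
    return { publisher: entry.publisher, weight: entry.score / total }
  })
}

var rawScores = { 'example.com': 3, 'brave.com': 4, 'example.org': 1 }
console.log(JSON.stringify(normalizeTop(rawScores, 2), null, 2))
```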

Similarly, to pseudo-randomly select a single publisher, using the weighted score:

// select a single publisher

   console.log(synopsis.winner())

// e.g., "brave.com"

// or multiple winners

   console.log(synopsis.winners(n))
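Weighted pseudo-random selection can be sketched by walking the cumulative weights until a random point is passed. The function pickWinner is hypothetical and is not the package's winner method. The random source is passed in as an assumption of this sketch, so that a run can be reproduced.

```javascript
// Hypothetical weighted pick over [ { publisher, weight }, ... ].
// `random` is optional and defaults to Math.random.
function pickWinner (weighted, random) {
  var point = (random || Math.random)()
  var upto = 0

  for (var i = 0; i < weighted.length; i++) {
    upto += weighted[i].weight
    if (point < upto) return weighted[i].publisher
  }

  // rounding can leave the cumulative sum slightly below 1.0
  return weighted.length > 0 ? weighted[weighted.length - 1].publisher : undefined
}

var candidates = [
  { publisher: 'brave.com', weight: 0.75 },
  { publisher: 'example.com', weight: 0.25 }
]

console.log(pickWinner(candidates))
```

Over many calls, each publisher is selected in proportion to its weight.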

Acknowledgements

Many thanks to Elijah Insua for the excellent jsdom package, and to Thomas Parisot for the excellent tldjs package.