
urlinfo-server v0.6.1

url information for fighting abuse etc.

urlinfo

a service for providing information on any given url

urlinfo is architected very similarly to DNS, though it is designed for fetching information on full URL endpoints. Each urlinfo server is configured either to set and get records from disk or to proxy requests to another urlinfo server. Servers can proxy indefinitely, but they all have built-in LRU caches, so requests stay responsive regardless of how many hops away the origin is.
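The per-hop caching described above hinges on a bounded LRU cache. A minimal sketch of the idea (illustrative only, not the actual urlinfo implementation) using a JavaScript Map, which iterates in insertion order:

```javascript
// Minimal LRU cache sketch: a bounded Map where reads refresh recency
// and writes beyond `max` evict the least recently used key.
function createLRU(max) {
  var map = new Map()
  return {
    get: function (key) {
      if (!map.has(key)) return null
      var value = map.get(key)
      map.delete(key)       // re-insert to mark as most recently used
      map.set(key, value)
      return value
    },
    set: function (key, value) {
      if (map.has(key)) map.delete(key)
      map.set(key, value)
      if (map.size > max) {
        // Map iterates in insertion order, so the first key is the LRU one
        map.delete(map.keys().next().value)
      }
    }
  }
}

var cache = createLRU(2)
cache.set("foo.com", { abuse: false })
cache.set("bar.com", { abuse: true })
cache.get("foo.com")                   // touch foo.com so bar.com is now LRU
cache.set("baz.com", { abuse: false }) // evicts bar.com
```

A cache like this is what keeps a long proxy chain responsive: once a record is warm at a hop, later requests never travel past it.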

Runtime

Download and install Node.js from the official site (LTS recommended).

Installation

npm install -g urlinfo-server

Start Server

urlinfo server store.db --port 9000

Start Proxy

urlinfo proxy localhost:9000 --port 9001

Benchmark

urlinfo benchmark localhost:9001 --records 50 --requests 10000

CLI

urlinfo ships with a CLI that makes starting servers easy. Servers can talk to disk or proxy calls to an HTTP endpoint, providing the flexibility to stand servers up in multiple locations. Every server is a client under the hood, so they all save disk or network calls by using a built-in LRU cache.

  urlinfo - 0.6.0
  Service for storing and fetching url information.

  Commands:
    urlinfo server <file>               http server (stores records in database)
    urlinfo proxy <domain>              http server (proxies records to other urlinfo API)
    urlinfo benchmark <domain>          benchmarks the urlinfo server that is passed in.

  Options:
    -p, --port (9001)                   specify port for server
    -h, --help                          outputs this help message
    -v, --version                       outputs version

  Examples:
    urlinfo server store.db             starts server on port 9001. saves state in store.db
    urlinfo proxy example.com -p 9002   starts server on port 9002. proxies req. to example.com

Development

Cloning the repository...

git clone https://github.com/sintaxi/urlinfo-server.git

Installing dev dependencies...

npm install

Running the tests...

npm test

For debugging, include DEBUG=urlinfo when running from the CLI or when running the tests.

Lib

Using urlinfo as a client library has the benefit of a built-in LRU cache. It also gives you the option of doing lookups either to disk or over HTTP to a urlinfo server running on another machine. For disk lookups the LRU cache reduces the number of times you touch the disk; for lookups over HTTP it saves you trips over the network.

var disk = urlinfo.createClient({ disk: __dirname + "/store.db" })

// set record
disk.set("foo.com", {}, function(err, record){
  // returns record
})

// get record
disk.get("foo.com", function(record){
  // returns record or null
})

Each client has a built in listen() method for standing up an http server in front of the client.

disk.listen(9000, function(err){
  console.log("server is listening on port 9000")
})

To fetch records from our HTTP server, all we have to do is instantiate a client that speaks to that endpoint, and the library behaves the same way. This allows us to have one origin of truth but several servers set up to serve requests.

var network = urlinfo.createClient({ proxy: "https://localhost:9000" })

FAQ

The size of the URL list could grow infinitely; how does urlinfo scale beyond the memory capacity of a single system?

Although each instance of the urlinfo client/server contains an LRU cache for fast access, all requests eventually resolve to an instance that reads and writes to a disk k/v store. If the origin server ever needed to change storage mechanisms, all that would be required is to set up a new store and then configure the previous origin store to proxy requests to the new server.

Assuming that the number of requests will exceed the capacity of a single system, how might you solve this, and how might that change if you had to distribute the workload to an additional region, such as Europe?

urlinfo is architected much like DNS. Assuming the origin server is in North America, the best way to expand to Europe would be to stand up a pseudo-origin server in Europe that proxies requests to North America. In addition, it would be prudent to set up multiple urlinfo servers in Europe, depending on the volume of requests and the latency to the European origin server. Each urlinfo server reduces load on the origin server.
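The resolution path this implies, check the local LRU cache, then ask the next hop, bottoming out at a disk store, can be sketched as a chain of nodes. This is an illustrative model of the behavior, not the package's actual code; the disk store is stood in for by a plain object:

```javascript
// Illustrative resolution chain: each node checks its cache, then
// asks its upstream; the origin node reads a (mocked) disk k/v store.
function createNode(upstream, store) {
  var cache = new Map() // unbounded here for brevity; urlinfo bounds it (LRU)
  return {
    get: function (key) {
      if (cache.has(key)) return cache.get(key)   // cache hit
      var record = store ? (store[key] || null)   // origin: read disk
                         : upstream.get(key)      // proxy: ask next hop
      if (record !== null) cache.set(key, record) // warm this hop's cache
      return record
    }
  }
}

// origin reads from the store; two proxies chain back to it
var origin = createNode(null, { "foo.com": { abuse: false } })
var europe = createNode(origin, null)  // pseudo-origin for a second region
var edge   = createNode(europe, null)

edge.get("foo.com") // walks edge -> europe -> origin, caching at each hop
edge.get("foo.com") // now served from the edge cache, no upstream traffic
```

Once the European pseudo-origin is warm, requests from that region stop crossing the Atlantic entirely, which is the load reduction the answer above describes.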

What are some strategies used to update the service with new URLs? Updates may be as many as 5,000 URLs a day, with updates arriving every 10 minutes.

URLs can be updated with a PUT request to any of the servers, at the same URL used to get the data. This makes the various ways to update the URL list virtually endless, with tooling to do so widely available.
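As a hypothetical illustration, the host, path, and record shape below are assumptions, not documented by the package, an update from standard tooling could look like:

```shell
# Hypothetical update via curl: PUT the record to the same path used for reads.
# localhost:9000, the /foo.com path, and the JSON body are all assumptions.
curl -X PUT "http://localhost:9000/foo.com" \
  -H "Content-Type: application/json" \
  -d '{"abuse": false}'
```

A cron job or webhook consumer issuing requests like this every 10 minutes would comfortably cover 5,000 updates a day.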

You’re woken up at 3am, what are some of the things you’ll look for?

  • Check the health of processes.
  • Check that DNS is resolving and SSL certs are valid.
  • Check for memory or disk saturation on machines.
  • If there are data integrity issues, purge caches and reduce the time to expire in the LRU.
  • Check logs for useful error output.
  • Ensure the origin server is "available" and able to read/write records.
  • Check for data integrity in the database.

Does that change anything you’ve done in the app?

Yes. Flags for controlling LRU size and duration should be added to the CLI. Better logging output, such as HTTP errors or errors reading/writing to disk, should also be added.

What are some considerations for the lifecycle of the app?

  • Managing SSL certs for the APIs (unaddressed)
  • Token management for speaking to API (unaddressed)
  • Changing data return object and clients that consume the data (major concern).
  • Database backups. Ability to restore with older version of dataset.

You need to deploy new version of this application. What would you do?

Publish a new version of urlinfo to npm. Run an Ansible script that reaches out to all running instances, pulls the latest version, and triggers process restarts. Alternatively, code could be pulled from git if avoiding the npm registry were a goal.
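A hypothetical sketch of the Ansible route described above; the `community.general.npm` and `ansible.builtin.service` modules are standard Ansible, but the `urlinfo_servers` host group and the `urlinfo` service name are assumptions:

```yaml
# Hypothetical playbook tasks: upgrade the package, then restart the process.
# Host group and service name are assumptions for illustration.
- hosts: urlinfo_servers
  tasks:
    - name: Install latest urlinfo-server from npm
      community.general.npm:
        name: urlinfo-server
        global: true
        state: latest

    - name: Restart urlinfo process
      ansible.builtin.service:
        name: urlinfo
        state: restarted
```

Rolling this out host by host (e.g. with `serial`) would keep proxies answering from their caches while each origin or hop restarts.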