extract-from-document

v1.1.0

Simplify data extraction from document

A utility function that simplifies extracting data from a document.

Installing

Via npm:

$ npm install [-g] extract-from-document

Usage

To use this function you need two things: a document or element instance (IScope), for example the one available inside a browser, and a recipe (IRecipe) that configures what you want to extract from the document. A recipe can be a Source, a Scope, or an IMap.

Source is the simplest one: it has an element selector, an optional attribute name (default: 'innerText'), and an optional isSingle flag (default: true).

export class Source {
  constructor(public selector: string, public attribute: string = 'innerText', public isSingle: boolean = true) {}
}

IMap is an object of key-value pairs where each key is a string and each value is an IRecipe.

export interface IMap {
  [key: string]: IRecipe;
}

Scope is a map (IMap) combined with a selector that specifies a context. It is useful, for example, when you want to extract data from a specific table or row.

export class Scope {
  constructor(public map: IMap, public selector: string, public isSingle: boolean = true) {}
}
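The three recipe kinds compose. The sketch below re-declares the classes shown above locally so that it runs on its own (in a real project they would be imported from 'extract-from-document'). The IRecipe union is assumed from the description above, and the selectors and the priceTable name are made up for illustration.

```typescript
// Local copies of the declarations shown above, so the snippet is self-contained.
class Source {
  constructor(public selector: string, public attribute: string = 'innerText', public isSingle: boolean = true) {}
}
interface IMap {
  [key: string]: IRecipe;
}
class Scope {
  constructor(public map: IMap, public selector: string, public isSingle: boolean = true) {}
}
// Assumed from the README text: a recipe is a Source, a Scope, or an IMap.
type IRecipe = Source | Scope | IMap;

// Source with defaults: attribute is 'innerText', isSingle is true.
const heading = new Source('h1');

// Source with an explicit attribute and isSingle set to false.
const allLinks = new Source('a', 'href', false);

// Scope: a map of recipes tied to a context selector (hypothetical selectors).
const rows = new Scope(
  {
    name: new Source('td.name'),
    price: new Source('td.price'),
  },
  'table#prices tr',
  false,
);

// IMap: a plain object that groups recipes under result keys.
const priceTable: IMap = { heading, allLinks, rows };
```

Because an IMap value can itself be any recipe, maps and scopes can be nested to whatever depth the page structure calls for.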

The example below uses a function getDocument(), which you must implement or replace yourself, that returns a document or HTML element.

import { extractFromDocument, IRecipe, Source } from 'extract-from-document';

const document: HTMLElement | Document = getDocument(); // obtain a document instance somehow
const recipe: IRecipe = new Source('.some-class-selector');
const result = extractFromDocument(recipe, document);
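To make the recipe semantics more concrete, here is a rough sketch of how a recipe could be resolved against a document. It is not the package's source code: sketchExtract, NodeLike and fakeNode are hypothetical names, the classes are local copies, and the behavior (null for a missing element, an array when isSingle is false) is an assumption based on the descriptions above.

```typescript
// Illustration only: a hypothetical resolver, not the package's implementation.
class Source {
  constructor(public selector: string, public attribute: string = 'innerText', public isSingle: boolean = true) {}
}
interface IMap {
  [key: string]: IRecipe;
}
class Scope {
  constructor(public map: IMap, public selector: string, public isSingle: boolean = true) {}
}
type IRecipe = Source | Scope | IMap;

// Minimal structural stand-in for Document / HTMLElement (assumed shape).
interface NodeLike {
  querySelector(selector: string): NodeLike | null;
  querySelectorAll(selector: string): ArrayLike<NodeLike>;
  [attribute: string]: unknown;
}

function sketchExtract(recipe: IRecipe, scope: NodeLike): unknown {
  if (recipe instanceof Source) {
    // Read one attribute from the first match, or from every match.
    const { selector, attribute, isSingle } = recipe;
    const read = (el: NodeLike | null): unknown => (el ? el[attribute] : null);
    return isSingle
      ? read(scope.querySelector(selector))
      : Array.from(scope.querySelectorAll(selector), (el) => read(el));
  }
  if (recipe instanceof Scope) {
    // Resolve the inner map relative to each matched context element.
    const { map, selector, isSingle } = recipe;
    const walk = (el: NodeLike | null): unknown => (el ? sketchExtract(map, el) : null);
    return isSingle
      ? walk(scope.querySelector(selector))
      : Array.from(scope.querySelectorAll(selector), (el) => walk(el));
  }
  // Plain IMap: resolve every key against the same scope.
  const result: Record<string, unknown> = {};
  for (const key of Object.keys(recipe)) {
    result[key] = sketchExtract(recipe[key], scope);
  }
  return result;
}

// Tiny fake nodes standing in for a real DOM, just to exercise the sketch.
const fakeNode = (props: Record<string, unknown>, children: Record<string, NodeLike[]> = {}): NodeLike => ({
  ...props,
  querySelector: (selector: string) => (children[selector] ?? [])[0] ?? null,
  querySelectorAll: (selector: string) => children[selector] ?? [],
});

const row1 = fakeNode({}, { a: [fakeNode({ innerText: 'First', href: '/1' })] });
const row2 = fakeNode({}, { a: [fakeNode({ innerText: 'Second', href: '/2' })] });
const page = fakeNode({}, { li: [row1, row2] });

const links = sketchExtract(
  new Scope({ title: new Source('a'), url: new Source('a', 'href') }, 'li', false),
  page,
);
// links is [{ title: 'First', url: '/1' }, { title: 'Second', url: '/2' }]
```

The recursion mirrors the recipe structure: a Scope narrows the context, an IMap fans out into keys, and a Source is the leaf that actually reads a value.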

Example usage with Puppeteer

This section shows how to use the function with Puppeteer.

Implement a helper function called extract in the ./util/extract.ts file. It launches a browser, opens a page, and passes extractFromDocument together with the provided recipe to the page's evaluate function, which extracts the data from the given URL.

import { launch } from 'puppeteer';

import { extractFromDocument, IRecipe } from 'extract-from-document';

export async function extract(recipe: IRecipe, url: string) {
  // Launch a browser with the sandbox disabled.
  const browser = await launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });

  try {
    const page = await browser.newPage();
    await page.goto(url);

    // Run extractFromDocument inside the page, passing the recipe as its argument.
    return await page.evaluate(extractFromDocument, recipe);
  } finally {
    // Close the browser even if navigation or extraction throws.
    await browser.close();
  }
}
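One thing worth keeping in mind with this helper: Puppeteer sends the arguments of page.evaluate to the browser in serialized form, so a recipe built from class instances arrives there as plain data. The sketch below uses a JSON round trip as a rough stand-in for that transfer (an approximation, not Puppeteer's exact mechanism) and a local copy of Source to show what survives.

```typescript
// Local copy of the Source class shown earlier, so the snippet is self-contained.
class Source {
  constructor(public selector: string, public attribute: string = 'innerText', public isSingle: boolean = true) {}
}

const original = new Source('a', 'href');

// JSON round trip as an approximation of crossing the Node/browser boundary.
const transferred = JSON.parse(JSON.stringify(original));

// The data fields survive the trip...
const keepsFields =
  transferred.selector === 'a' && transferred.attribute === 'href' && transferred.isSingle === true; // true

// ...but the result is a plain object, no longer an instance of Source.
const keepsPrototype = transferred instanceof Source; // false
```

In practice this means a recipe should hold only plain, serializable values (strings, booleans, nested objects), which is the case for Source, Scope and IMap as declared above.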

Now we import the extract function and specify a recipe describing what we want to extract, along with the URL of the page to extract it from. The stringified result is logged to the console.

import { Scope, Source } from 'extract-from-document';
import { extract } from './util/extract';

const recipe = {
  hotNetworkQuestions: new Scope({
    title: new Source('a'),
    url: new Source('a', 'href'),
  }, '#hot-network-questions li', false),
  related: new Scope({
    answer: {
      url: new Source('a[title^="Vote score"]', 'href'),
      votes: new Source('.answer-votes'),
    },
    title: new Source('.question-hyperlink'),
    url: new Source('.question-hyperlink', 'href'),
  }, '.module.sidebar-related .spacer', false),
};
const url = 'https://stackoverflow.com/questions/24825860/code-coverage-for-jest';

extract(recipe, url).then((result: any) => console.info(JSON.stringify(result, null, 2)));

As a result we get:

{
  "hotNetworkQuestions": [
    {
      "title": "What computer would be fastest for Mathematica Home Edition?",
      "url": "https://mathematica.stackexchange.com/questions/195184/what-computer-would-be-fastest-for-mathematica-home-edition"
    },
    {
      "title": "Slither Like a Snake",
      "url": "https://codegolf.stackexchange.com/questions/183153/slither-like-a-snake"
    },
    {
      "title": "How is simplicity better than precision and clarity in prose?",
      "url": "https://writing.stackexchange.com/questions/44589/how-is-simplicity-better-than-precision-and-clarity-in-prose"
    }
  ],
  "related": [
    {
      "answer": {
        "url": "https://stackoverflow.com/q/336859?rq=1",
        "votes": "6394"
      },
      "title": "var functionName = function() {} vs function functionName() {}",
      "url": "https://stackoverflow.com/questions/336859/var-functionname-function-vs-function-functionname?rq=1"
    },
    {
      "answer": {
        "url": "https://stackoverflow.com/q/40465047?rq=1",
        "votes": "173"
      },
      "title": "How can I mock an ES6 module import using Jest?",
      "url": "https://stackoverflow.com/questions/40465047/how-can-i-mock-an-es6-module-import-using-jest?rq=1"
    }
  ]
}
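Every extracted value in this output is a string, including votes. A possible follow-up step, sketched here with hypothetical names (RelatedQuestion, parseVotes) and one entry copied from the output above, is to give the result a type instead of any and convert the vote count to a number.

```typescript
// Shape of one entry under "related" in the output above (names are illustrative).
interface RelatedQuestion {
  answer: { url: string; votes: string };
  title: string;
  url: string;
}

// Convert the scraped vote count to a number; returns null if it is not numeric.
function parseVotes(question: RelatedQuestion): number | null {
  const votes = Number.parseInt(question.answer.votes, 10);
  return Number.isNaN(votes) ? null : votes;
}

const sample: RelatedQuestion = {
  answer: { url: 'https://stackoverflow.com/q/40465047?rq=1', votes: '173' },
  title: 'How can I mock an ES6 module import using Jest?',
  url: 'https://stackoverflow.com/questions/40465047/how-can-i-mock-an-es6-module-import-using-jest?rq=1',
};

const votes = parseVotes(sample); // 173
```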