news-datarepo-core

v1.0.0

core of news.datarepo.io; tools to enable simple retrieval, caching, and querying of news data

Readme

News.DataRepo Core

Usage

installation

  1. set up the database and database config
  2. define configs for the sources you want to use

retrieve / subscribe - to a request from a source

  • core wraps sources, which you can initialize from a config.json object, and caches the requests that you send through it
    • a subscription runs retrievals on a cron and asks the cache not to associate duplicate articles
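
The retrieve / subscribe flow above can be sketched roughly as follows. This is not the package's actual API: `ArticleCache`, `retrieve`, `subscribe`, and `fakeSource` are hypothetical names, the cache is an in-memory stand-in for the database, duplicates are assumed to be detected by URL, and `setInterval` stands in for the cron.

```javascript
// Hypothetical in-memory stand-in for the database-backed cache.
class ArticleCache {
  constructor() {
    this.requests = new Map(); // request key -> Map(url -> article)
  }

  // Associate articles with a request, skipping URLs already associated with it.
  // Returns the number of newly associated articles.
  store(requestKey, articles) {
    if (!this.requests.has(requestKey)) this.requests.set(requestKey, new Map());
    const known = this.requests.get(requestKey);
    let added = 0;
    for (const article of articles) {
      if (!known.has(article.url)) {
        known.set(article.url, article);
        added += 1;
      }
    }
    return added;
  }

  // Return every article associated with a request (empty if none).
  get(requestKey) {
    const known = this.requests.get(requestKey);
    return known ? [...known.values()] : [];
  }
}

// Wrap a source so that every retrieval passes through the cache.
async function retrieve(source, cache, query) {
  const articles = await source.fetch(query);
  cache.store(query, articles);
  return cache.get(query);
}

// Subscription stand-in: re-run the retrieval on a fixed interval.
// Assumption: a real deployment would use a cron scheduler instead.
function subscribe(source, cache, query, intervalMs) {
  const timer = setInterval(() => {
    retrieve(source, cache, query).catch((err) => console.error(err));
  }, intervalMs);
  return () => clearInterval(timer); // call the returned function to unsubscribe
}

// Fake source that returns the same article every time, to show deduplication.
const fakeSource = {
  async fetch() {
    return [{ title: 'Example', url: 'https://example.com/a' }];
  },
};

const cache = new ArticleCache();
retrieve(fakeSource, cache, 'NYSE:F')
  .then(() => retrieve(fakeSource, cache, 'NYSE:F'))
  .then((articles) => console.log(articles.length)); // prints 1
```

Retrieving the same query twice leaves a single cached article, because the second retrieval finds the URL already associated with that request.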

Source Config

  • cache_identifier :
    • defined per source, once for each set of defaults; should be derived from [model_name, defaults]
    • "auto" can be used to define the cache_identifier as md5(model_name+"-"+JSON.stringify(init_args))
      • NOTE: this is dangerous: if your init_args include an API key, that key ends up stored as part of an MD5 hash, which is cryptographically insecure
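
For illustration, the "auto" formula above could be computed with Node's built-in crypto module as below. The function name `autoCacheIdentifier` and the example arguments are made up for this sketch; only the formula itself comes from this README.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: md5(model_name + "-" + JSON.stringify(init_args)).
function autoCacheIdentifier(modelName, initArgs) {
  const payload = modelName + '-' + JSON.stringify(initArgs);
  return crypto.createHash('md5').update(payload).digest('hex');
}

// Example with placeholder values; 'PLACEHOLDER' is not a real key.
const id = autoCacheIdentifier('newsapi', { apiKey: 'PLACEHOLDER', country: 'us' });
console.log(id.length); // prints 32 (MD5 digests are 32 hex characters)
```

One thing to watch: JSON.stringify serializes keys in insertion order, so the same init_args written in a different key order yield a different identifier.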

Notes

Purpose:

REST API interface for retrieving news about any stock index (e.g., NYSE:F, NYSE:SP500) or other query (e.g., trucks, cars, ...)

key features

  • send queries to sources, cache the response, return all data
  • "subscribe" to a data feed
    • define what queries to run for the subscription, define title of subscription, return all news results for subscription
    • run "every X"
  • TODO : search all articles based on
    • key words (in titles)(in text?)
    • date
  • TODO: enable users to scrape the data easily into their own servers
    • should give plenty of warning that they should not share this data, as it is copyrighted

difference:

Difference between this and newsapi.org is:

  • this is not a service that aggregates all news data; it is software that enables developers to set up their own aggregation servers
    • not as in depth as newsapi.org
  • e.g., launch the software on your own server and retrieve data

can enable

  • "subscribe to reports and store data" service where users don't have to set up their own databases

usage

Launch on your own server, or use NewsOracle.Org to keep data cached (db service) and get data from their cache

Data Stored

    {
        timestamp : __, // retrieved from sources
        title : __, // retrieved from sources
        description : __, // retrieved from sources
        url : __, // retrieved from sources
    }

Data Sources:

  • aggregators
    • newsapi
    • google_news_rss
  • publishing companies
    • reuters
    • ...

interfacing with sources

  • handled as configurable "plugins"
    • e.g., plugin sourcing from reuters
    • e.g., plugin sourcing from newsapi, where you plug in your API key
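
One possible shape for such configurable plugins is a small registry of factories. Everything here (`registerPlugin`, `createSource`, the `fetch` method, the `apiKey` argument name) is a hypothetical illustration rather than the package's interface, and the example plugin returns canned data instead of calling newsapi.

```javascript
// Hypothetical plugin registry: maps a source name to a factory taking init args.
const plugins = new Map();

function registerPlugin(name, factory) {
  plugins.set(name, factory);
}

function createSource(name, initArgs) {
  const factory = plugins.get(name);
  if (!factory) throw new Error(`unknown source plugin: ${name}`);
  return factory(initArgs);
}

// Example plugin that requires an API key. It returns canned data in the
// stored-article shape (timestamp, title, description, url); no network call is made.
registerPlugin('newsapi', ({ apiKey } = {}) => {
  if (!apiKey) throw new Error('newsapi plugin requires an apiKey');
  return {
    async fetch(query) {
      return [
        {
          timestamp: new Date().toISOString(),
          title: `Result for ${query}`,
          description: '',
          url: 'https://example.com/1',
        },
      ];
    },
  };
});

createSource('newsapi', { apiKey: 'PLACEHOLDER' })
  .fetch('trucks')
  .then((articles) => console.log(articles[0].title)); // prints "Result for trucks"
```

Keeping each source behind the same `fetch` shape means the caching layer does not need to know which plugin produced an article.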

should support:

  • api access to data
  • manual access to data
  • free access to data from past month
    • up to N requests
  • cheap access to one bulk request (e.g., for research)

should be able to:

be initialized from a config file:

  • sources to be used
    • including all initialization params for each source
  • db credentials
    • which are used to ensure the tables are set up, and later for caching
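
As a rough sketch of what such a config could look like, and how it might be checked before use: the `model_name`, `cache_identifier`, and `init_args` names come from the Source Config section, while the top-level `sources` / `db` keys, the db field names, and `validateConfig` are assumptions made for this example.

```javascript
// Hypothetical config shape; the top-level keys and db fields are assumptions.
const exampleConfig = {
  sources: [
    {
      model_name: 'newsapi',
      cache_identifier: 'newsapi-default',
      init_args: { apiKey: 'PLACEHOLDER' },
    },
    { model_name: 'google_news_rss', cache_identifier: 'auto', init_args: {} },
  ],
  db: { host: 'localhost', user: 'news', password: 'PLACEHOLDER', database: 'news' },
};

// Returns a list of human-readable problems; an empty list means the config passed.
function validateConfig(config) {
  const { sources, db } = config || {};
  const errors = [];
  if (!Array.isArray(sources) || sources.length === 0) {
    errors.push('sources must be a non-empty array');
  } else {
    sources.forEach((source, i) => {
      if (typeof source.model_name !== 'string') {
        errors.push(`sources[${i}].model_name must be a string`);
      }
      if (source.init_args === null || typeof source.init_args !== 'object') {
        errors.push(`sources[${i}].init_args must be an object`);
      }
    });
  }
  if (db === null || typeof db !== 'object') {
    errors.push('db credentials are required');
  }
  return errors;
}

console.log(validateConfig(exampleConfig).length); // prints 0
```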