github-indexer

v1.0.0

Grabs data from GitHub and pushes it to an Elasticsearch instance

Introduction

This script was created to easily export data from GitHub and import it into an Elasticsearch instance.

Whenever possible (issues, milestones, projects), it loads data sorted by updated date in descending order (most recent first) and stops as soon as it finds a node that is already in Elasticsearch. This way, the first load takes some time, but afterwards you can simply run it from cron to keep your Elasticsearch instance up to date.
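
The incremental behavior described above can be sketched as follows. This is a minimal illustration, not the tool's actual code. `GhNode`, `collectUpdatedNodes` and the `fetchPage` callback are hypothetical names. The sketch also assumes that "the same node" means the same id with the same updatedAt value. Each `fetchPage` call stands in for one GitHub API request, so stopping early is what keeps later runs cheap.

```typescript
// Hypothetical shape of a GitHub node (issue, milestone, project...).
interface GhNode {
  id: string;
  updatedAt: string; // ISO 8601 timestamp returned by GitHub
}

// Pages are requested lazily, most recently updated first. Collection stops at
// the first node already stored with the same updatedAt value: everything after
// it is older and assumed to be up to date in Elasticsearch.
function collectUpdatedNodes(
  fetchPage: (page: number) => GhNode[] | null, // one call = one API request
  indexed: Map<string, string>, // node id -> updatedAt already in Elasticsearch
): { nodes: GhNode[]; pagesFetched: number } {
  const nodes: GhNode[] = [];
  let pagesFetched = 0;
  for (let page = 0; ; page++) {
    const batch = fetchPage(page);
    if (batch === null || batch.length === 0) break; // no more data on GitHub
    pagesFetched++;
    for (const node of batch) {
      if (indexed.get(node.id) === node.updatedAt) {
        return { nodes, pagesFetched }; // reached already-indexed data
      }
      nodes.push(node);
    }
  }
  return { nodes, pagesFetched };
}

// Example: two pages on GitHub; "b" and "a" were indexed during a previous run.
const pages: GhNode[][] = [
  [
    { id: "c", updatedAt: "2024-03-03T00:00:00Z" },
    { id: "b", updatedAt: "2024-03-02T00:00:00Z" },
  ],
  [{ id: "a", updatedAt: "2024-01-01T00:00:00Z" }],
];
const indexed = new Map<string, string>([
  ["b", "2024-03-02T00:00:00Z"],
  ["a", "2024-01-01T00:00:00Z"],
]);
const result = collectUpdatedNodes((page) => (page < pages.length ? pages[page] : null), indexed);
// Only "c" is new, and the second page is never requested.
console.log(result.nodes.map((n) => n.id), result.pagesFetched);
```

On a first load the `indexed` map is empty, so every page is fetched.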

The overall process has three stages:

  • Identify the repositories to load data from (github-indexer ghRepos)
  • [OPTIONAL] Select which repositories to load data from by editing ~/.config/github-indexer/repositories.yml and applying the changes with github-indexer cfRepos
    • or force all fetched repositories to be active during the initial fetch by running ghRepos with the -f flag: github-indexer ghRepos YOUR_OPTIONS -f
  • Load data from the selected repositories (for example, github-indexer ghIssues to load issues)

You can then re-run the scripts at regular intervals to fetch updated nodes.

Note: GitHub doesn't provide a mechanism to fetch only new or updated labels, so the script will flush the index and load all labels every time ghLabels is executed.
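
The full label refresh can be sketched as follows. An in-memory `Map` stands in for one labels index. `Label`, `LabelIndex` and `reloadLabels` are hypothetical names, not github-indexer's API.

```typescript
interface Label {
  id: string;
  name: string;
}

// Hypothetical in-memory stand-in for one gh_labels_ORG_REPO index.
type LabelIndex = Map<string, Label>;

// No incremental fetch is possible for labels, so the only safe option is a
// full refresh: flush the index, then write every label fetched from GitHub.
function reloadLabels(index: LabelIndex, fetched: Label[]): void {
  index.clear(); // also drops labels that were deleted on GitHub
  for (const label of fetched) {
    index.set(label.id, label);
  }
}

const index: LabelIndex = new Map([["old", { id: "old", name: "wontfix" }]]);
reloadLabels(index, [
  { id: "l1", name: "bug" },
  { id: "l2", name: "documentation" },
]);
console.log(Array.from(index.values()).map((l) => l.name)); // bug, documentation
```

Flushing first also removes labels deleted on GitHub, which a purely additive update would never detect.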

Quick start with Docker

You can use the github-indexer Docker image to get started quickly.

For example, to pull all repositories from an org:

Fetch the latest image

docker pull zencrepes/github-indexer:latest

Run

docker run -it --rm \
-e ES_NODE='https://username:password@localhost:9200' \
-e GITHUB_TOKEN='YOUR TOKEN HERE' \
zencrepes/github-indexer:latest github-indexer ghRepos -g org -o YOUR_ORG -f

Or open a shell in the container, from which you can run github-indexer commands:

docker run -it --rm \
-e ES_NODE='https://username:password@localhost:9200' \
-e GITHUB_TOKEN='YOUR TOKEN HERE' \
zencrepes/github-indexer:latest /bin/ash

Local installation

You can also install github-indexer locally, although Docker is probably easier if you just want to use github-indexer.

npm install -g github-indexer

Configuration

A configuration file with default settings is automatically generated in ~/.config/github-indexer/config.yml the first time you run the indexer.

Environment variables are also available for some of the configuration settings:

  • ES_NODE: Elasticsearch node (for example: https://username:password@localhost:9200)
  • ES_CA: Path to the ES CA public key (for example: ./cacert.pem)
  • ES_CLOUD_ID: Elastic cloud id
  • ES_CLOUD_USERNAME: Elastic cloud username
  • ES_CLOUD_PASSWORD: Elastic cloud password
  • ES_REPO: Elasticsearch index containing the repository configuration
  • GITHUB_TOKEN: GitHub token for fetching data.
  • GITHUB_LOGIN: GitHub user login to fetch data from (for affiliated mode)
  • GITHUB_INCREMENT: Number of nodes to fetch at a time (max 100)

Environment variables take precedence over the corresponding settings in the configuration file.
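
The precedence rule can be sketched as a merge in which any variable that is set wins over the file value. This is a minimal illustration, not github-indexer's code. `IndexerConfig` is a hypothetical flattened view of the nested config.yml, and `applyEnv` is a hypothetical helper. The environment is passed as a plain object instead of `process.env` so the sketch runs on its own. Clamping the increment to 100 is an assumption based on GitHub's page-size limit.

```typescript
// Hypothetical flattened view of config.yml (the real file is nested).
interface IndexerConfig {
  esNode: string | null;
  esCa: string | null;
  esCloudId: string | null;
  esCloudUsername: string | null;
  esCloudPassword: string | null;
  esRepo: string;
  githubToken: string | null;
  githubLogin: string | null;
  maxNodes: number;
}

type Env = Record<string, string | undefined>;

// Each environment variable, when set, overrides the value read from the file.
function applyEnv(file: IndexerConfig, env: Env): IndexerConfig {
  const maxNodes =
    env.GITHUB_INCREMENT !== undefined ? Number(env.GITHUB_INCREMENT) : file.maxNodes;
  if (!Number.isInteger(maxNodes) || maxNodes < 1) {
    throw new Error(`Invalid max_nodes / GITHUB_INCREMENT value: ${maxNodes}`);
  }
  return {
    esNode: env.ES_NODE ?? file.esNode,
    esCa: env.ES_CA ?? file.esCa,
    esCloudId: env.ES_CLOUD_ID ?? file.esCloudId,
    esCloudUsername: env.ES_CLOUD_USERNAME ?? file.esCloudUsername,
    esCloudPassword: env.ES_CLOUD_PASSWORD ?? file.esCloudPassword,
    esRepo: env.ES_REPO ?? file.esRepo,
    githubToken: env.GITHUB_TOKEN ?? file.githubToken,
    githubLogin: env.GITHUB_LOGIN ?? file.githubLogin,
    maxNodes: Math.min(maxNodes, 100), // assumed clamp to GitHub's maximum
  };
}

const fromFile: IndexerConfig = {
  esNode: "https://username:password@localhost:9200",
  esCa: null,
  esCloudId: null,
  esCloudUsername: null,
  esCloudPassword: null,
  esRepo: "gh_repos",
  githubToken: "TOKEN_HERE",
  githubLogin: null,
  maxNodes: 30,
};
// In a real CLI this object would be process.env.
const effective = applyEnv(fromFile, { GITHUB_TOKEN: "ghp_example", GITHUB_INCREMENT: "50" });
console.log(effective.githubToken, effective.maxNodes); // ghp_example 50
```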

Authentication to the Elasticsearch cluster is possible through Basic Auth (using ES_NODE only), with SSL (using ES_NODE and ES_CA), or against an Elastic Cloud cluster (using ES_CLOUD_ID, ES_CLOUD_USERNAME and ES_CLOUD_PASSWORD).
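
Choosing among the three modes can be sketched as follows. `EsSettings`, `EsAuth` and `selectAuth` are hypothetical names. The priority order (Elastic Cloud, then SSL, then Basic Auth) is an assumption. The result only describes the chosen mode; it is not an actual Elasticsearch client configuration object.

```typescript
interface EsSettings {
  node: string | null; // ES_NODE
  ca: string | null; // ES_CA
  cloudId: string | null; // ES_CLOUD_ID
  cloudUsername: string | null; // ES_CLOUD_USERNAME
  cloudPassword: string | null; // ES_CLOUD_PASSWORD
}

// Descriptor of the chosen connection mode (not a client config object).
type EsAuth =
  | { mode: "cloud"; cloudId: string; username: string; password: string }
  | { mode: "ssl"; node: string; caPath: string }
  | { mode: "basic"; node: string }; // credentials are embedded in the node URL

function selectAuth(s: EsSettings): EsAuth {
  // Assumed priority: Elastic Cloud first, then SSL, then Basic Auth.
  if (s.cloudId) {
    if (!s.cloudUsername || !s.cloudPassword) {
      throw new Error("Elastic Cloud needs ES_CLOUD_USERNAME and ES_CLOUD_PASSWORD");
    }
    return { mode: "cloud", cloudId: s.cloudId, username: s.cloudUsername, password: s.cloudPassword };
  }
  if (!s.node) {
    throw new Error("ES_NODE is required when ES_CLOUD_ID is not set");
  }
  return s.ca ? { mode: "ssl", node: s.node, caPath: s.ca } : { mode: "basic", node: s.node };
}

const none: EsSettings = { node: null, ca: null, cloudId: null, cloudUsername: null, cloudPassword: null };
console.log(selectAuth({ ...none, node: "https://username:password@localhost:9200" }).mode); // basic
console.log(selectAuth({ ...none, node: "https://localhost:9200", ca: "./cacert.pem" }).mode); // ssl
```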

The configuration is stored in ~/.config/github-indexer/config.yml and contains the following settings:

elasticsearch:
  node: 'https://username:password@localhost:9200' # Elasticsearch node
  sslca: './cacert.pem'             # Path to the public CA cert, or null
  cloud:                            # Elastic Cloud credentials
    id: null
    username: null
    password: null
  indices:
    repos: 'gh_repos'               # Elasticsearch index containing repository configuration
    issues: 'gh_issues_'            # Prefix for the Elasticsearch index containing issues, one index is created per repository, e.g. gh_issues_ORG_REPO
    projects: 'gh_projects_'        # Prefix for the Elasticsearch index containing projects, one index is created for org-level projects and one per repository, e.g. gh_projects_ORG_REPO
    labels: 'gh_labels_'            # Prefix for the Elasticsearch index containing labels, one index is created per repository, e.g. gh_labels_ORG_REPO
    milestones: 'gh_milestones_'    # Prefix for the Elasticsearch index containing milestones, one index is created per repository, e.g. gh_milestones_ORG_REPO
    prs: 'gh_prs_'                  # Prefix for the Elasticsearch index containing pull requests, one index is created per repository, e.g. gh_prs_ORG_REPO
fetch:
  max_nodes: 30                     # Number of nodes to request from the GitHub GraphQL API (max: 100), avoid values that are too high on large repositories
github:
  token: 'TOKEN_HERE'               # GitHub authorization token
  login: 'YOUR_USERNAME'            # GitHub user login (for affiliated mode)
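
The per-repository index names described in the comments above (PREFIX + ORG + "_" + REPO) can be sketched as follows. The prefixes are the defaults from the file. `repoIndexName` is a hypothetical helper. It lowercases the result because Elasticsearch rejects uppercase index names. Whether github-indexer normalizes names the same way is an assumption. The name of the org-level projects index is not documented, so it is not covered here.

```typescript
// Default prefixes from config.yml (elasticsearch.indices).
const indexPrefixes = {
  issues: "gh_issues_",
  projects: "gh_projects_",
  labels: "gh_labels_",
  milestones: "gh_milestones_",
  prs: "gh_prs_",
} as const;

type DataType = keyof typeof indexPrefixes;

// Per-repository index name: PREFIX + ORG + "_" + REPO.
// Lowercased because Elasticsearch index names must be lowercase (assumed normalization).
function repoIndexName(type: DataType, org: string, repo: string): string {
  return `${indexPrefixes[type]}${org}_${repo}`.toLowerCase();
}

console.log(repoIndexName("issues", "zencrepes", "github-indexer")); // gh_issues_zencrepes_github-indexer
console.log(repoIndexName("prs", "Microsoft", "VSCode")); // gh_prs_microsoft_vscode
```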

All configuration settings should be self-explanatory except max_nodes, which sets how many root nodes are fetched per call to GitHub's GraphQL API. GitHub supports a maximum of 100, but its GraphQL API can be unstable with large repositories, so it is recommended to keep this value between 30 and 50. A smaller number results in more, smaller calls; a larger number results in fewer, larger calls.
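
The trade-off is simple arithmetic: a full load of N root nodes needs ceil(N / max_nodes) paginated calls. The sketch below assumes a hypothetical repository with 1,000 issues, and `callsNeeded` is an illustrative helper. The cap of 100 reflects GitHub's limit on page size.

```typescript
// GitHub's GraphQL API returns at most 100 nodes per connection request.
const GITHUB_MAX_PAGE_SIZE = 100;

// Number of paginated calls needed to fetch `total` root nodes with a given max_nodes.
function callsNeeded(total: number, maxNodes: number): number {
  const pageSize = Math.min(Math.max(Math.floor(maxNodes), 1), GITHUB_MAX_PAGE_SIZE);
  return Math.ceil(total / pageSize);
}

// Full first load of a hypothetical repository with 1,000 issues.
for (const maxNodes of [30, 50, 100]) {
  console.log(`max_nodes=${maxNodes}: ${callsNeeded(1000, maxNodes)} calls`);
}
// max_nodes=30: 34 calls, max_nodes=50: 20 calls, max_nodes=100: 10 calls
```

Incremental runs usually stop after the first page or two, so max_nodes matters most for the initial load.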

You also need a GitHub token. To obtain one, visit https://github.com/settings/tokens and generate a personal access token with the following scopes: public_repo, read:org, user.

Then replace TOKEN_HERE with the token you just generated.
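
For classic personal access tokens, GitHub's REST API reports the granted scopes in the `X-OAuth-Scopes` response header. Fine-grained tokens use permissions instead of scopes. The sketch below checks such a header value against the three scopes listed above. It accounts for broader scopes that include a narrower one: `repo` includes `public_repo`, and `write:org` or `admin:org` include `read:org`. `missingScopes` is a hypothetical helper and is not part of github-indexer.

```typescript
// Scopes listed in this README for the personal access token.
const REQUIRED_SCOPES = ["public_repo", "read:org", "user"];

// Broader classic-token scopes that also grant a required one.
const IMPLIED_BY: Record<string, string[]> = {
  public_repo: ["repo"],
  "read:org": ["write:org", "admin:org"],
};

// Takes the comma-separated value of the X-OAuth-Scopes header and returns
// the required scopes the token is missing.
function missingScopes(header: string): string[] {
  const granted = new Set(
    header.split(",").map((s) => s.trim()).filter((s) => s.length > 0),
  );
  return REQUIRED_SCOPES.filter(
    (scope) => !granted.has(scope) && !(IMPLIED_BY[scope] ?? []).some((p) => granted.has(p)),
  );
}

console.log(missingScopes("repo, read:org, user")); // nothing missing
console.log(missingScopes("public_repo")); // read:org and user are missing
```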

Usage

$ npm install -g github-indexer
$ github-indexer COMMAND
running command...
$ github-indexer (-v|--version|version)
github-indexer/0.1.5 darwin-x64 node-v10.16.0
$ github-indexer --help [COMMAND]
USAGE
  $ github-indexer COMMAND
...

Commands

github-indexer cfRepos

Enable/disable repositories by reading the configuration file

USAGE
  $ github-indexer cfRepos

OPTIONS
  -h, --help                         show CLI help
  --esca=esca                        Path to the ES CA public key (for example: ./cacert.pem)
  --escloudid=escloudid              Elastic cloud id
  --escloudpassword=escloudpassword  Elastic cloud password
  --escloudusername=escloudusername  Elastic cloud username
  --esnode=esnode                    Elasticsearch node (for example: https://username:password@localhost:9200)
  --esrepo=esrepo                    Elastic index containing the GitHub repository
  --gincrement=gincrement            GitHub API query increment (max nodes to fetch at a time)
  --glogin=glogin                    GitHub user Login (for fetching user repos)
  --gtoken=gtoken                    GitHub user Token

EXAMPLE
  $ github-indexer cfRepos

See code: src/commands/cfRepos.ts
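
What cfRepos does can be pictured as merging the active flags from repositories.yml into the stored repository documents. The layout of repositories.yml is not documented here. `RepoDoc`, `RepoSelection` and `applySelection` are therefore hypothetical shapes, not the tool's real data model. The assumption that data commands then only process active repositories follows from the stage description in the Introduction.

```typescript
// Hypothetical repository document stored in the gh_repos index.
interface RepoDoc {
  nameWithOwner: string; // e.g. "microsoft/vscode"
  active: boolean;
}

// Hypothetical entry of repositories.yml; the real file layout may differ.
interface RepoSelection {
  nameWithOwner: string;
  active: boolean;
}

// Apply the selection from the file; repositories not listed keep their state.
// Returns new objects and leaves the input array untouched.
function applySelection(repos: RepoDoc[], selection: RepoSelection[]): RepoDoc[] {
  const wanted = new Map(selection.map((s) => [s.nameWithOwner, s.active] as [string, boolean]));
  return repos.map((r) => ({ ...r, active: wanted.get(r.nameWithOwner) ?? r.active }));
}

const stored: RepoDoc[] = [
  { nameWithOwner: "microsoft/vscode", active: false },
  { nameWithOwner: "microsoft/typescript", active: false },
];
const updated = applySelection(stored, [{ nameWithOwner: "microsoft/vscode", active: true }]);
// Data commands such as ghIssues would presumably only process active repositories.
console.log(updated.filter((r) => r.active).map((r) => r.nameWithOwner)); // microsoft/vscode
```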

github-indexer ghIssues

Fetch issues from GitHub

USAGE
  $ github-indexer ghIssues

OPTIONS
  -h, --help                         show CLI help
  --esca=esca                        Path to the ES CA public key (for example: ./cacert.pem)
  --escloudid=escloudid              Elastic cloud id
  --escloudpassword=escloudpassword  Elastic cloud password
  --escloudusername=escloudusername  Elastic cloud username
  --esnode=esnode                    Elasticsearch node (for example: https://username:password@localhost:9200)
  --esrepo=esrepo                    Elastic index containing the GitHub repository
  --gincrement=gincrement            GitHub API query increment (max nodes to fetch at a time)
  --glogin=glogin                    GitHub user Login (for fetching user repos)
  --gtoken=gtoken                    GitHub user Token

EXAMPLE
  $ github-indexer ghIssues

See code: src/commands/ghIssues.ts
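
Fetching issues newest-updated first maps onto GitHub's GraphQL `issues` connection with `orderBy: { field: UPDATED_AT, direction: DESC }` and cursor pagination through `pageInfo`. The query below is an illustration only. The fields github-indexer actually requests may differ, and `issuesPage` is a hypothetical helper that builds the request body without sending it.

```typescript
// Illustrative query; not necessarily the one github-indexer sends.
const ISSUES_QUERY = `
  query ($owner: String!, $name: String!, $first: Int!, $after: String) {
    repository(owner: $owner, name: $name) {
      issues(first: $first, after: $after, orderBy: { field: UPDATED_AT, direction: DESC }) {
        pageInfo { hasNextPage endCursor }
        nodes { id number title updatedAt }
      }
    }
  }`;

interface GraphqlRequest {
  query: string;
  variables: { owner: string; name: string; first: number; after: string | null };
}

// Builds the JSON body that would be POSTed to https://api.github.com/graphql
// with an "Authorization: bearer <token>" header. `after` is the endCursor of
// the previous page, or null for the first page.
function issuesPage(owner: string, name: string, first: number, after: string | null): GraphqlRequest {
  return { query: ISSUES_QUERY, variables: { owner, name, first: Math.min(first, 100), after } };
}

const firstPage = issuesPage("zencrepes", "github-indexer", 30, null);
console.log(JSON.stringify(firstPage.variables));
// {"owner":"zencrepes","name":"github-indexer","first":30,"after":null}
```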

github-indexer ghLabels

Fetch labels from GitHub

USAGE
  $ github-indexer ghLabels

OPTIONS
  -h, --help                         show CLI help
  --esca=esca                        Path to the ES CA public key (for example: ./cacert.pem)
  --escloudid=escloudid              Elastic cloud id
  --escloudpassword=escloudpassword  Elastic cloud password
  --escloudusername=escloudusername  Elastic cloud username
  --esnode=esnode                    Elasticsearch node (for example: https://username:password@localhost:9200)
  --esrepo=esrepo                    Elastic index containing the GitHub repository
  --gincrement=gincrement            GitHub API query increment (max nodes to fetch at a time)
  --glogin=glogin                    GitHub user Login (for fetching user repos)
  --gtoken=gtoken                    GitHub user Token

EXAMPLE
  $ github-indexer ghLabels

See code: src/commands/ghLabels.ts

github-indexer ghMilestones

Fetch milestones from GitHub

USAGE
  $ github-indexer ghMilestones

OPTIONS
  -h, --help                         show CLI help
  --esca=esca                        Path to the ES CA public key (for example: ./cacert.pem)
  --escloudid=escloudid              Elastic cloud id
  --escloudpassword=escloudpassword  Elastic cloud password
  --escloudusername=escloudusername  Elastic cloud username
  --esnode=esnode                    Elasticsearch node (for example: https://username:password@localhost:9200)
  --esrepo=esrepo                    Elastic index containing the GitHub repository
  --gincrement=gincrement            GitHub API query increment (max nodes to fetch at a time)
  --glogin=glogin                    GitHub user Login (for fetching user repos)
  --gtoken=gtoken                    GitHub user Token

EXAMPLE
  $ github-indexer ghMilestones

See code: src/commands/ghMilestones.ts

github-indexer ghProjects

Fetch projects from GitHub

USAGE
  $ github-indexer ghProjects

OPTIONS
  -h, --help                         show CLI help
  --esca=esca                        Path to the ES CA public key (for example: ./cacert.pem)
  --escloudid=escloudid              Elastic cloud id
  --escloudpassword=escloudpassword  Elastic cloud password
  --escloudusername=escloudusername  Elastic cloud username
  --esnode=esnode                    Elasticsearch node (for example: https://username:password@localhost:9200)
  --esrepo=esrepo                    Elastic index containing the GitHub repository
  --gincrement=gincrement            GitHub API query increment (max nodes to fetch at a time)
  --glogin=glogin                    GitHub user Login (for fetching user repos)
  --gtoken=gtoken                    GitHub user Token

EXAMPLE
  $ github-indexer ghProjects

See code: src/commands/ghProjects.ts

github-indexer ghPullrequests

Fetch Pull Requests (PRs) from GitHub

USAGE
  $ github-indexer ghPullrequests

OPTIONS
  -h, --help                         show CLI help
  --esca=esca                        Path to the ES CA public key (for example: ./cacert.pem)
  --escloudid=escloudid              Elastic cloud id
  --escloudpassword=escloudpassword  Elastic cloud password
  --escloudusername=escloudusername  Elastic cloud username
  --esnode=esnode                    Elasticsearch node (for example: https://username:password@localhost:9200)
  --esrepo=esrepo                    Elastic index containing the GitHub repository
  --gincrement=gincrement            GitHub API query increment (max nodes to fetch at a time)
  --glogin=glogin                    GitHub user Login (for fetching user repos)
  --gtoken=gtoken                    GitHub user Token

EXAMPLE
  $ github-indexer ghPullrequests

See code: src/commands/ghPullrequests.ts

github-indexer ghRepos

Fetch repositories from GitHub (FIRST STEP, start HERE)

USAGE
  $ github-indexer ghRepos

OPTIONS
  -f, --force                        Make all fetched repositories active by default
  -g, --grab=affiliated|org|repo     (required) Select how to fetch repositories
  -h, --help                         show CLI help
  -o, --org=org                      GitHub organization login
  -r, --repo=repo                    GitHub repository name
  --esca=esca                        Path to the ES CA public key (for example: ./cacert.pem)
  --escloudid=escloudid              Elastic cloud id
  --escloudpassword=escloudpassword  Elastic cloud password
  --escloudusername=escloudusername  Elastic cloud username
  --esnode=esnode                    Elasticsearch node (for example: https://username:password@localhost:9200)
  --esrepo=esrepo                    Elastic index containing the GitHub repository
  --gincrement=gincrement            GitHub API query increment (max nodes to fetch at a time)
  --glogin=glogin                    GitHub user Login (for fetching user repos)
  --gtoken=gtoken                    GitHub user Token

EXAMPLES
  $ github-indexer ghRepos -g affiliated
  $ github-indexer ghRepos -g org -o jetbrains
  $ github-indexer ghRepos -g repo -o microsoft -r vscode

See code: src/commands/ghRepos.ts
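
The three -g modes need different companion flags: org needs -o, and repo needs both -o and -r. The sketch below validates a hypothetical parsed-flags object and describes what would be fetched. Requiring a login for affiliated mode is an assumption based on GITHUB_LOGIN being documented "for affiliated mode". The same goes for repositories starting inactive unless -f is given. `GhReposFlags` and `planGhRepos` are not github-indexer's API.

```typescript
type GrabMode = "affiliated" | "org" | "repo";

// Hypothetical parsed flags for ghRepos (-g, -o, -r, --glogin, -f).
interface GhReposFlags {
  grab: GrabMode;
  org?: string;
  repo?: string;
  login?: string;
  force?: boolean;
}

// Checks that each -g mode received the flags it needs and describes what
// would be fetched. Repositories are marked active only when -f is passed.
function planGhRepos(flags: GhReposFlags): { target: string; active: boolean } {
  const active = flags.force === true;
  switch (flags.grab) {
    case "affiliated":
      if (!flags.login) throw new Error("-g affiliated needs a login (--glogin or GITHUB_LOGIN)");
      return { target: `repositories affiliated with ${flags.login}`, active };
    case "org":
      if (!flags.org) throw new Error("-g org requires -o/--org");
      return { target: `all repositories of ${flags.org}`, active };
    case "repo":
      if (!flags.org || !flags.repo) throw new Error("-g repo requires -o/--org and -r/--repo");
      return { target: `${flags.org}/${flags.repo}`, active };
    default:
      throw new Error(`Unknown grab mode: ${String(flags.grab)}`);
  }
}

console.log(planGhRepos({ grab: "repo", org: "microsoft", repo: "vscode" }).target); // microsoft/vscode
console.log(planGhRepos({ grab: "org", org: "jetbrains", force: true }).active); // true
```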

github-indexer help [COMMAND]

display help for github-indexer

USAGE
  $ github-indexer help [COMMAND]

ARGUMENTS
  COMMAND  command to show help for

OPTIONS
  --all  see all commands in CLI

See code: @oclif/plugin-help

github-indexer init

Initialize the configuration file

USAGE
  $ github-indexer init

OPTIONS
  --esca=esca                        Path to the ES CA public key (for example: ./cacert.pem)
  --escloudid=escloudid              Elastic cloud id
  --escloudpassword=escloudpassword  Elastic cloud password
  --escloudusername=escloudusername  Elastic cloud username
  --esnode=esnode                    Elasticsearch node (for example: https://username:password@localhost:9200)
  --esrepo=esrepo                    Elastic index containing the GitHub repository
  --gincrement=gincrement            GitHub API query increment (max nodes to fetch at a time)
  --glogin=glogin                    GitHub user Login (for fetching user repos)
  --gtoken=gtoken                    GitHub user Token

EXAMPLE
  $ github-indexer init

See code: src/commands/init.ts

Develop

git clone https://github.com/zencrepes/github-indexer.git
cd github-indexer
npm install

Build and publish

tsc -b
npm publish