
@enterprise_search/indexer

v0.8.37

cli for indexer

Downloads: 2,611

Readme

Index

This is a tool to index content for Elasticsearch or other search engines, with support for Document Level Security (DLS). This means we ensure that only people who can see the original document can see the search result.

Installation

npm i -g @itsmworkspace/indexer # Needs to be run as an administrator or superuser

Requirements:

  • Node.js 16 or higher (it might work with lower versions, but it is not tested)

Example usage

Indexing

The main command is index, and we use the indexer subcommand.

index indexer index  # Just indexes all the data
index indexer index --api --keep # indexes all the data and launches an api for looking at metrics and keeps running when finished
index indexer index --since 1d # Just index things that have changed in the last day
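The --since flag takes a duration such as 1d. As a rough illustration of how such a value could be turned into a cutoff (a hypothetical sketch; the real flag's grammar and units may differ), consider:

```typescript
// Hypothetical parser for a --since value like "1d", "6h" or "30m".
// Illustrative only; not the indexer's actual implementation.
function parseSince(s: string): number {
  const m = /^(\d+)([smhd])$/.exec(s);
  if (!m) throw new Error(`invalid --since value: ${s}`);
  const unit = { s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 }[m[2] as 's' | 'm' | 'h' | 'd'];
  return Number(m[1]) * unit; // duration in milliseconds
}

// parseSince('1d') → 86400000
```

The indexer would then select only documents whose change timestamp falls within that window.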

API Key Management

Remember to use --help to get more information on the commands, especially command configuration (which user, the URL, the username and password, ...).

Also be aware that API keys are scoped to an environment. See --help for more information.

index apikey add [email protected]    # Generates a DLS API key for the user, invalidating all other DLS API keys
index apikey api                  # Starts an API with endpoint /apikey/[email protected] that returns the API key for the user (invalidating all others)
index apikey remove [email protected] # Invalidates all the DLS API keys for the user
index apikey id [email protected]     # Shows all the data that the user can access with their API key

Pushing data to Elasticsearch

index es push # Pushes all the data to Elasticsearch
index es push --help # Shows options for the push command
index es push --elastic-search http://localhost:9200 # Pushes all the data to the Elasticsearch instance at the given URL

Configuration

There is a file called indexer.yaml that is used to configure the indexer. It is located in the same directory as the indexer executable.

The file is in YAML format and has the following structure:

Defaults

The first section of the file sets the defaults:

defaults:
  # Default values, for example retry policies and throttles.
  query: # Defaults for queries: the 'getting of data out of the source', for example out of Jira, Confluence or GitLab
    retryPolicy:
      initialInterval: 2000    # in milliseconds
      maximumInterval: 10000   # in milliseconds
      maximumAttempts: 5
      multiplier: 2            # for exponential backoff; optional, default is 2
      nonRecoverableErrors: [] # optional list of errors that should not be retried; 'Not Found' is the one commonly used
    throttle:
      max: 100                   # Imagine tokens in a jar: this is the size of the jar. To do a request you need to take a token from the jar.
      tokensPer100ms: 0.1        # how many tokens are added to the jar every 100ms
      throttlingDelay: 50        # max random delay in ms before retrying when out of tokens; default 50ms
      countOnTooManyErrors: -500 # on a 429 error the token count is set to this, and tokensPer100ms is reduced a bit
    auth: # see below for other options
      method: ApiKey
      credentials:
        apiKey: "{source}_APIKEY"
  target: # Defaults for the target. This is currently just the file system, but I expect to add more.
    retry: { }    # same as above
    throttle: { } # same as above; probably not needed for the file system
    file: "target/index/{index}/{name}_{num}.json" # the filename we write to
    max: 10000000 # the maximum number of documents in a file; when this is reached, a new file is created
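The retryPolicy fields describe exponential backoff. A minimal sketch of how they combine (illustrative only, not the indexer's actual code):

```typescript
// Exponential backoff: wait initialInterval, multiply the wait by
// `multiplier` after each failed attempt, and cap it at maximumInterval.
interface RetryPolicy {
  initialInterval: number; // ms
  maximumInterval: number; // ms
  maximumAttempts: number; // includes the first try
  multiplier: number;
}

// Returns the wait before each retry; maximumAttempts includes the
// initial attempt, so there are maximumAttempts - 1 retries.
function backoffIntervals(p: RetryPolicy): number[] {
  const intervals: number[] = [];
  let wait = p.initialInterval;
  for (let retry = 1; retry < p.maximumAttempts; retry++) {
    intervals.push(Math.min(wait, p.maximumInterval));
    wait *= p.multiplier;
  }
  return intervals;
}

// With the defaults above: [2000, 4000, 8000, 10000]
```

So with the default settings a failing query waits 2s, 4s, 8s, then 10s (capped) before giving up.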

Sources

The next section defines where we get data from. Here is a sample:

index: # Must be the word index
  jiraAcl: # This is the name of the source. Other things like index name and type default to this
    scan: # Where we get the data from
      groupMembersFile: 'group.members.csv'
      index: '.search-acl-filter-jira-prod'
  jiraProd:
    type: jira
    scan:
      auth:
        method: 'ApiKey'
        credentials:
          apiKey: 'JIRA_PROD'
      index: jira-prod
      projects: "*"
      baseurl: "https://jira.eon.com/"
      apiVersion: "2"

Authorisation

The auth sections above accept one of the following methods:

export type EntraIdAuthentication = {
  method: 'EntraId';
  credentials: {
    tenantId?: string
    clientId: string;            // Public identifier for the app
    clientSecret: string;        // Secret used to authenticate the app and obtain tokens
    scope: string
  };
};
export type BasicAuthentication = {
  method: 'Basic';
  credentials: {
    username: string;
    password: string;
  };
};
export type ApiKeyAuthentication = {
  method: 'ApiKey';
  credentials: {
    apiKey: string;
  };
};
export type PrivateTokenAuthentication = {
  method: 'PrivateToken';
  credentials: {
    token: string;
  };
};
export type NoAuthentication = {
  method: 'none';
};
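As an illustration of how these discriminated unions might be consumed, here is a sketch mapping an auth config to HTTP headers. The header formats (e.g. a GitLab-style PRIVATE-TOKEN header) are assumptions for illustration, and EntraId is omitted because it needs a token exchange first:

```typescript
// Sketch: map an authentication config to request headers.
// Header formats here are assumptions, not the indexer's actual behaviour.
type Authentication =
  | { method: 'Basic'; credentials: { username: string; password: string } }
  | { method: 'ApiKey'; credentials: { apiKey: string } }
  | { method: 'PrivateToken'; credentials: { token: string } }
  | { method: 'none' };

function authHeaders(auth: Authentication): Record<string, string> {
  switch (auth.method) {
    case 'Basic': {
      const raw = `${auth.credentials.username}:${auth.credentials.password}`;
      return { Authorization: `Basic ${Buffer.from(raw).toString('base64')}` };
    }
    case 'ApiKey':
      return { Authorization: `ApiKey ${auth.credentials.apiKey}` };
    case 'PrivateToken':
      return { 'PRIVATE-TOKEN': auth.credentials.token }; // GitLab-style header
    case 'none':
      return {}; // no authentication
  }
}
```

The exhaustive switch on the method discriminant means the compiler flags any authentication type that is added but not handled.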

Tika

Apache Tika is used to process files such as PDFs, Word documents, etc. The configuration for Tika is as follows:

tika:
  jar: "../../../tika-server-standard-2.9.2.jar"
  protocol: http
  host: 127.0.0.1
  port: 9997

The meanings should be obvious. Note that during indexing and similar operations, the indexer launches the Tika server using the jar, and kills it at the end.
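For example, the config above would translate into a launch command roughly like this (a hypothetical sketch; Tika's server jar accepts -h for the host and -p for the port):

```typescript
// Build the command line used to launch the Tika server from the config.
// Illustrative sketch only, not the indexer's actual code.
interface TikaConfig { jar: string; protocol: string; host: string; port: number; }

function tikaCommand(cfg: TikaConfig): string[] {
  return ['java', '-jar', cfg.jar, '-h', cfg.host, '-p', String(cfg.port)];
}

// tikaCommand({ jar: 'tika-server-standard-2.9.2.jar', protocol: 'http', host: '127.0.0.1', port: 9997 })
// → ['java', '-jar', 'tika-server-standard-2.9.2.jar', '-h', '127.0.0.1', '-p', '9997']
```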

Pipelines.yaml

Currently this is a second config file; we may merge it into indexer.yaml in the near future. It is used by commands like:

index es makeIndexes                      # Creates the indexes in Elasticsearch
index es pipeline                         # Creates the pipelines in Elasticsearch
index es remakeIndicies --all             # Deletes all the indexes and recreates them
index es remakeIndicies --index jira-prod # Deletes the index jira-prod and recreates it
index es remakeIndicies --all --noPush    # Deletes all the indexes and recreates them, but does not push the data

This controls the 'digest pipeline' for Elasticsearch.

Here is a sample section

jira-prod-pipeline: # Known issue (my mistake): this name must be the index name with -pipeline appended, otherwise it doesn't work properly
  index: jira-prod
  fields:           # These fields will be included in the searches
    - issue
    - comments
    - description
    - priority
    - status
  shorten:         # clip the length of this field. This is so that you can store a summary but not all the data
     description: 200
  remove:          # And remove these fields from the index
    - comments
    - full_text
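The shorten and remove settings can be read as a simple transform applied to each document before it is stored. A sketch of that transform (illustrative only; the real work happens inside the Elasticsearch pipeline):

```typescript
// Apply shorten/remove settings to a document: clip string fields to a
// maximum length, then drop removed fields entirely.
// Illustrative sketch, not the indexer's actual implementation.
type Doc = Record<string, unknown>;

function applyPipeline(doc: Doc, shorten: Record<string, number>, remove: string[]): Doc {
  const out: Doc = { ...doc };
  for (const [field, len] of Object.entries(shorten)) {
    const v = out[field];
    if (typeof v === 'string') out[field] = v.slice(0, len); // keep only a summary
  }
  for (const field of remove) {
    delete out[field]; // neither stored nor searchable
  }
  return out;
}
```

With the sample section above, a Jira document would keep issue, priority and status intact, have description clipped to 200 characters, and lose comments and full_text entirely.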