@s2s/language-service-client

v0.0.5

Client for the language service

Docker Microservice for Language Detection and Text Extraction

The service

The service uses Compact Language Detector 2 (CLD2) by CLD2Owners, through commoncrawl's Java wrapper, for language detection, and Apache Tika for text extraction.

It is distributed as a Docker image hosted at scotti2scotti/language-service. To pull it:

docker pull scotti2scotti/language-service

The web service listens on 0.0.0.0:5656 by default. You can configure the host and port via two environment variables, LS_HOST and LS_PORT:

docker run -d -e LS_HOST=${LS_HOST} -e LS_PORT=${LS_PORT} -p 5656:${LS_PORT} --rm --name language-service scotti2scotti/language-service

The web service has two POST endpoints:

POST /?url=
POST /stream

The first accepts the URL of the resource you want to analyze as the value of the url query string parameter.

The second accepts the content of the resource as the body of the request.
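As a rough illustration of the two endpoints, the sketch below builds the request for each one without sending it. The helper names (RequestSpec, buildUrlRequest, buildStreamRequest) and the base URL http://localhost:5656 are assumptions made for this example and are not part of the service or the client. Percent-encoding the url parameter is also an assumption about how the service parses its query string.

```typescript
// Hypothetical request description; only the two endpoint shapes come from the README.
interface RequestSpec {
  method: "POST";
  url: string;
  body?: string;
}

// POST /?url=<resource>: the resource URL is percent-encoded so that its own
// query string cannot be confused with the service's query string.
function buildUrlRequest(baseUrl: string, resourceUrl: string): RequestSpec {
  const endpoint = new URL("/", baseUrl);
  endpoint.searchParams.set("url", resourceUrl);
  return { method: "POST", url: endpoint.toString() };
}

// POST /stream: the content of the resource travels as the request body.
function buildStreamRequest(baseUrl: string, content: string): RequestSpec {
  return {
    method: "POST",
    url: new URL("/stream", baseUrl).toString(),
    body: content,
  };
}
```

Either description could then be handed to an HTTP client of your choice, for example fetch(spec.url, { method: spec.method, body: spec.body }) on a runtime that provides fetch.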

Both endpoints return a JSON object of the following form (long values are truncated with [...] for brevity):

POST /?url=https://scotti2scotti.com/about
{
  "text": " scotti&scotti home notes Web Applications NLP DevOps Web Applications are at the core of scotti&scotti [...] ",
  "language": {
    "language": 0,
    "languages": ["ENGLISH"],
    "reliable": true,
    "languageName": "ENGLISH",
    "languageCode": "en",
    "languageCodeISO639_3": "eng",
    "languageCodes": ["en"]
  },
  "meta": {
    "og:image": "https://scotti2scotti.com/images/s2s-logo-300.png",
    "X-Parsed-By": "org.apache.tika.parser.DefaultParser",
    "og:type": "website",
    "keywords": "web applications, natural language processing, software, database",
    "og:title": "scotti&scotti software house: the craft of knowledge",
    "description": "scotti&scotti llc is the software house of web applications [...] ",
    "title": "scotti&scotti software house: the craft of knowledge",
    "og:description": "scotti&scotti llc is the software house of web [...]",
    "X-UA-Compatible": "ie=edge",
    "viewport": "width=device-width, initial-scale=1.0",
    "dc:title": "scotti&scotti software house: the craft of knowledge",
    "Content-Encoding": "UTF-8",
    "og:url": "https://scotti2scotti.com",
    "Content-Language": "en",
    "Content-Type": "text/html; charset=UTF-8",
    "format-detection": "telephone=no"
  }
}

where:

  • text is the extracted text
  • language is an object whose keys are defined by CLD2
  • meta contains the metadata extracted by Tika
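If you call the service directly, it may be worth checking the three top-level keys described above before using the response. The following is a minimal sketch of such a check. DetectionResult, isRecord and parseDetectionResult are hypothetical names introduced here, and the sketch validates only a subset of the language keys.

```typescript
// Minimal view of the response: only the fields this sketch checks.
interface DetectionResult {
  text: string;
  language: { reliable: boolean; languageCode: string };
  meta: Record<string, string>;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parse a raw response body and verify the text, language and meta keys.
function parseDetectionResult(body: string): DetectionResult {
  const data: unknown = JSON.parse(body);
  if (!isRecord(data)) throw new Error("response is not a JSON object");

  const { text, language, meta } = data;
  if (typeof text !== "string") throw new Error("text is missing");
  if (!isRecord(language)) throw new Error("language is missing");
  if (!isRecord(meta)) throw new Error("meta is missing");

  const reliable = language["reliable"];
  const languageCode = language["languageCode"];
  if (typeof reliable !== "boolean" || typeof languageCode !== "string") {
    throw new Error("language has an unexpected shape");
  }

  // Keep only string-valued meta entries, matching the sample response.
  const metaStrings: Record<string, string> = {};
  for (const [key, value] of Object.entries(meta)) {
    if (typeof value === "string") metaStrings[key] = value;
  }
  return { text, language: { reliable, languageCode }, meta: metaStrings };
}
```

Dropping non-string meta values is a design choice of this sketch; a stricter caller might prefer to reject the response instead.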

The client

A very simple JavaScript client, written in TypeScript, is available as an npm package:

npm i -S @s2s/language-service-client

It defines a few interfaces and a class:

import { Stream } from "stream"; // assumption: Stream is Node's stream type
export interface LanguageInfo {
  text: string;
  language: {
    language: number;
    languages: string[];
    languageName: string;
    languageCode: string;
    reliable: boolean;
    languageCodeISO639_3: string;
    languageCodes: string[];
  };
  meta: {
    [key: string]: string;
  };
}
export interface ClientOptions {
  protocol: "http" | "https";
  host: string;
  port: number;
}
export declare const DefautlOptions: ClientOptions;
export declare class Client {
  constructor(options?: ClientOptions);
  readonly url: (url: string) => Promise<LanguageInfo>;
  readonly stream: (stream: Stream) => Promise<LanguageInfo>;
}
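One possible way to use the client is sketched below: it analyzes several URLs concurrently and groups them by detected language code. To keep the example standalone, it declares two structural stand-ins (LanguageSummary and UrlAnalyzer, both hypothetical names) instead of importing the package. Based on the declarations above, a Client instance should satisfy UrlAnalyzer, but that has not been verified against the published package. The "unknown" bucket for unreliable detections is also an assumption of this sketch.

```typescript
// Structural stand-ins for the declarations above, so the sketch compiles on its own.
interface LanguageSummary {
  language: { languageCode: string; reliable: boolean };
}
interface UrlAnalyzer {
  url(url: string): Promise<LanguageSummary>;
}

// Analyze all URLs concurrently and group them by detected language code.
// Detections flagged as unreliable are collected under the key "unknown".
async function groupByLanguage(
  client: UrlAnalyzer,
  urls: string[]
): Promise<Map<string, string[]>> {
  const results = await Promise.all(
    urls.map(async (u) => ({ url: u, info: await client.url(u) }))
  );
  const groups = new Map<string, string[]>();
  for (const { url, info } of results) {
    const key = info.language.reliable ? info.language.languageCode : "unknown";
    const bucket = groups.get(key) ?? [];
    bucket.push(url);
    groups.set(key, bucket);
  }
  return groups;
}
```

With the real package, the call might look like groupByLanguage(new Client({ protocol: "http", host: "localhost", port: 5656 }), urls), where the host and port are assumed to match your running container.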

License

The source code and the Docker image are licensed under Apache 2.0. CLD2, weslang, commoncrawl and all other dependencies also use the Apache 2.0 license.