
curtiz-japanese-nlp v2.0.1

Annotate Curtiz2 Markdown files with Japanese natural language parsing

curtiz-japanese-nlp — WIP ☣️☢️🧪🧫🧬⚗️☢️☣️

N.B. All references to “Curtiz” are to version 2 of the Curtiz format (using @ symbols), and not version 1 (using lozenges).

Curtiz version 2 soft-specification

The Curtiz Markdown format for defining Japanese flashcards uses Markdown headers as flashcards, e.g., the following header:

@ 私 @ わたし

which is ### @ 私 @ わたし in the original Markdown. That is, a flashcard-header has one or more # symbols, whitespace, an @ symbol, whitespace, and then arbitrary text separated by one or more @ separators. (@ was chosen because it is easy to type on mobile and real keyboards, in Japanese and English.) The first piece of text is treated as the prompt: that's what the flashcard app will show. Text after the prompt is taken as acceptable answers.

So the following header accepts three answers for the same prompt:

@ 私 @ わたし @ わたくし @ あたし
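
For illustration only, here is a minimal TypeScript sketch of how such a raw header line could be split into a prompt and its acceptable answers. The parseFlashcardHeader name is hypothetical and is not part of this library's API:

// Hypothetical helper (not part of curtiz-japanese-nlp): split a raw flashcard
// header like "### @ 私 @ わたし" into a prompt and acceptable answers.
function parseFlashcardHeader(line: string): {prompt: string, answers: string[]} | undefined {
  const match = line.match(/^#+\s+@\s+(.*)$/); // one or more #, whitespace, @, whitespace, rest
  if (!match) { return undefined; }            // not a flashcard-header
  const pieces = match[1].split('@').map(s => s.trim()).filter(s => s.length > 0);
  if (pieces.length === 0) { return undefined; }
  const [prompt, ...answers] = pieces;         // first piece is the prompt, the rest are answers
  return {prompt, answers};
}

console.log(parseFlashcardHeader('### @ 私 @ わたし @ わたくし @ あたし'));
// → { prompt: '私', answers: [ 'わたし', 'わたくし', 'あたし' ] }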

Next, any bullets immediately following the at-header that themselves begin with @ are treated as Curtiz-specific metadata.

Example:

@ 僕の花の色 @ ぼくのはなのいろ

  • @fill 僕[の]花
  • @fill 花[の]色
  • @ 僕 @ ぼく @pos pronoun
  • @ 花 @ はな @pos noun-common-general
  • @ 色 @ いろ @pos noun-common-general

This example demonstrates both sub-quizzes that are currently supported:

  • @fill allows for a fill-in-the-blank (perhaps where the prompt is shown, minus the text to be filled in), and
  • @ indicates a flashcard just like the @-headers: prompt @ response. These work as plain flashcards on their own, as well as fill-in-the-blank quizzes within the sentence. If the sub-prompt (in this bullet) cannot be found or uniquely determined in the header's prompt, an @omit adverb can optionally be used to indicate the portion of the header prompt to be hidden. The optional @pos adverb contains the part of speech (as determined by MeCab) and facilitates disambiguation of flashcards.

Both these optional adverbs are demonstrated below.

@ このおはなしを話す @ このおはなしをはなす

  • @fill を
  • @ 話 @ はなし @pos noun-common-verbal_suru @omit はなし
  • @ 話す @ はなす @pos verb-general
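
One way to picture the information carried by these bullets is as a small TypeScript type. The names below (FillQuiz, CardQuiz, SubQuiz) are illustrative only and do not come from this library:

// Illustrative shape only (not this library's types): the data in @fill and @ bullets.
type FillQuiz = {kind: 'fill', cloze: string};   // e.g. "@fill 僕[の]花"
type CardQuiz = {
  kind: 'card',
  prompt: string,        // e.g. 話す
  responses: string[],   // e.g. ['はなす']
  pos?: string,          // optional @pos part of speech, e.g. 'verb-general'
  omit?: string,         // optional @omit text to hide in the header prompt
};
type SubQuiz = FillQuiz | CardQuiz;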

This module's features

This module uses MeCab with the UniDic dictionary, plus the J.DepP bunsetsu chunker, to add readings, @fill-in-the-blank quizzes, and @ flashcards to a Curtiz Markdown file.

Make sure you have these three applications installed before attempting to use this!

It will add a reading to the header if none exists.

It will add sub-quizzes (@fill and @) if there is a special bullet under the header: - @pleaseParse.
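
For example, a raw block like the following (the sentence here is only an illustration) asks for both: the tool would add the missing reading to the header and append @fill and @ sub-quiz bullets beneath it:

### @ 私は学生です
- @pleaseParse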

Usage

This package provides:

  1. a command-line utility that will consume an input file or standard input, and spit out the Markdown file annotated with readings and sub-quizzes; and
  2. a JavaScript library to do this programmatically.

Command-line utility

The command-line utility can be invoked on a file or can consume standard input. Make sure you have Node.js installed, then in your terminal (Terminal.app in macOS, Command Prompt in Windows, xterm in Linux, etc.), run either of the following:

$ npx curtiz-japanese-nlp README.md

and replace README.md with the path to your Markdown file, or

$ cat README.md | npx curtiz-japanese-nlp
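
Assuming the annotated Markdown is written to standard output (consistent with the pipe usage above), ordinary shell redirection can capture it in a new file, e.g.:

$ npx curtiz-japanese-nlp README.md > README-annotated.md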

Library API

Install this package into your JavaScript/TypeScript package via

$ npm install curtiz-japanese-nlp

Then in your JavaScript code, you may:

const curtiz = require('curtiz-japanese-nlp'); 

In TypeScript or with ES modules, you may:

import * as curtiz from 'curtiz-japanese-nlp';

The following functions will then be available under the curtiz namespace.

async function parseHeaderBlock(block: string[]): Promise<string[]>

A block is an array of strings, one line per element, with the first line assumed to contain a Markdown header block (something starting with one or more # hash symbols).

parseHeaderBlock returns a promise of an array of strings, which will contain annotated Markdown.

This is the core function provided by this library.
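
A minimal usage sketch, assuming the curtiz require shown above (the block contents are just an example):

const block = [
  '### @ 私は学生です', // raw Markdown flashcard-header: # symbols, whitespace, @, prompt
  '- @pleaseParse',      // ask for readings and sub-quizzes to be added
];
curtiz.parseHeaderBlock(block).then(lines => console.log(lines.join('\n')));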

The remaining functions below are helper utility functions.

function splitAtHeaders(text: string): string[][]

This is intended to split the contents of a file (a single string text) into an array of blocks (each block being an array of strings itself, each string being a line of Markdown).
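
For instance (an illustrative sketch), a string containing two flashcard-headers should come back as two blocks:

const text = '### @ 私 @ わたし\n- @pleaseParse\n### @ 花 @ はな\n';
const blocks = curtiz.splitAtHeaders(text);
console.log(blocks.length); // presumably 2: one block per header and its bullets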

async function parseAllHeaderBlocks(blocks: string[][], concurrentLimit: number = 8)

This is intended to annotate an array of blocks (blocks), each block being an array of strings and each string being a line of Markdown.

The concurrentLimit argument allows you to limit the number of concurrent system calls to mecab and jdepp that are being made.

A promise for an array of blocks is returned.
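
For example, to allow at most four concurrent parses (a sketch; blocks is assumed to be an array of blocks as produced by splitAtHeaders):

const annotated = await curtiz.parseAllHeaderBlocks(blocks, 4); // at most 4 concurrent mecab/jdepp invocations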

You can use both these helper functions along with the primary function as follows, assuming you are inside an async function:

const fs = require('fs'); // Node's built-in filesystem module, for reading README.md
let annotated = await curtiz.parseAllHeaderBlocks(curtiz.splitAtHeaders(fs.readFileSync('README.md', 'utf8')));
console.log(annotated.map(s => s.join('\n')).join('\n'));

The parseAllHeaderBlocks line slurps the contents of README.md, splits it into blocks at Markdown header boundaries, and then annotates them all.

The final line logs the entire annotated Markdown.