npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

cham-jp-deck

v4.0.2

Published

Japanese flashcard deck generator from JMdict, KANJIDIC, and KanjiVG for Anki in CSV format

Downloads

12

Readme

Cham JP Deck

This simple code generates CSV table from JMdict, KANJIDIC, and other sources. The CSV can be imported as a deck in Anki.

Usage

  1. Download JMdict_e.gz and kanjidic2.xml.gz file.

  2. Create a new folder and open it.

  3. Extract the JMdict and KANJIDIC file. Your folder structure should look like this:

    your_new_folder
    ├── JMdict_e
    └── kanjidic2.xml

    The file names must be the same as above.

  4. Open the folder in terminal and run following commands:

    npm install -g cham-jp-deck
    chamjpdeck-csv
  5. If it runs successfully, you'll get the result as cham_jp_deck.csv.

FAQ

Which words and kanji characters are included?

  • Words from JMdict that contains has frequency tags and contains kanji.
  • Words from JMdict that don't have frequency tags but appear in ManyThings examples, Wanikani vocabs, Core 6K, Core 10K, Tanos JLPT vocabs, and Netflix Frequency 12K.
  • Kanji letters from KANJIDIC, starting from kanken 10 to kanken 1.

How the items are sorted?

First, all kanji letters are sorted by kanji frequency list in KANJIDIC. After each kanji, sample vocabs are added based on kanji that appeared previously.

What is JMdict frequency tag

Taken from https://www.edrdg.org/jmdict/edict_doc.html

Word Priority Marking

The ke_pri and equivalent re_pri fields in the JMdict file are provided to record information about the relative commonness or priority of the entry, and consist of codes indicating the word appears in various references which can be taken as an indication of the frequency with which the word is used. This field is intended for use either by applications which want to concentrate on entries of a particular priority, or to generate subset files. The current values in this field are:

  • news1/2: appears in the "wordfreq" file compiled by Alexandre Girardi from the Mainichi Shimbun. (See the Monash ftp archive for a copy.) Words in the first 12,000 in that file are marked "news1" and words in the second 12,000 are marked "news2". ichi1/2: appears in the "Ichimango goi bunruishuu", Senmon Kyouiku Publishing, Tokyo, 1998. (The entries marked "ichi2" were demoted from ichi1 because they were observed to have low frequencies in the WWW and newspapers.)
  • spec1 and spec2: a small number of words use this marker when they are detected as being common, but are not included in other lists.
  • gai1/2: common loanwords, also based on the wordfreq file. nfxx: this is an indicator of frequency-of-use ranking in the wordfreq file. "xx" is the number of the set of 500 words in which the entry can be found, with "01" assigned to the first 500, "02" to the second, and so on.

Entries with news1, ichi1, spec1/2 and gai1 values are marked with a "(P)" in the EDICT and EDICT2 files.

What are the columns in the csv file?

  • word
  • index
  • index_kanjidic_freq
  • index_netflix
  • index_core6k
  • sources
  • kanjidic_details
  • kanjidic_misc
  • jmdict_details
  • jmdict_freq
  • ccd_details
  • kanji_id
  • kanji_ids
  • tags

30k+ cards are too much, how do I reduce the items?

Yeah, I don't recommend to learn all of them. You can always suspend and unsuspend using the tags and sources field.

If you are trying to learn the joyo kanji, try this steps:

  • Suspend all the cards "deck:Cham JP Deck".
  • Unsuspend the joyo kanji "deck:Cham JP Deck" (tag:kanji_kanshudo_joyo or tag:kanji_jitenon_kanken2).
  • Unsuspend the first 500 vocabs from frequency-of-use ranking by using this filter "deck:Cham JP Deck" tag:nf01.
  • Up until this steps, you will have around 2583 items to learn. I think these are not sufficient.
  • Add more samples by unsuspending even more frequency-of-use vocabs using this filter "deck:Cham JP Deck" (tag:nf02 or tag:nf03 or tag:nf04 or tag:nf05 or tag:nf06 or tag:nf07 or tag:nf08 or tag:nf09 or tag:nf10 or tag:nf11 or tag:nf12 or tag:nf13 or tag:nf14 or tag:nf15).
  • Check the result "deck:Cham JP Deck" -is:suspended. You'll have around 8910 items.
  • Assuming you study 15 new items everyday, you will complete all items in around 1 year 8 months.

As comparison, Wanikani has 8330 total items (excluding radicals).

TLDR; Suspend eveything, unsuspend "deck:Cham JP Deck" ((tag:kanji_kanshudo_joyo or tag:kanji_jitenon_kanken2) or (tag:nf01 or tag:nf02 or tag:nf03 or tag:nf04 or tag:nf05 or tag:nf06 or tag:nf07 or tag:nf08 or tag:nf09 or tag:nf10 or tag:nf11 or tag:nf12 or tag:nf13 or tag:nf14 or tag:nf15)).

I also use Wanikani or iKnow Core 6000, how do I remove the items to avoid overlapping?

For Wanikani: use "deck:Cham JP Deck" sources:*wk* filter and suspend the items.

For Core 6K: use "deck:Cham JP Deck" tag:vocab_core6k filter and suspend the items.

I don't want to include Kanji Kentei level 1.

Filter the deck "deck:Cham JP Deck" (tag:kanji_jitenon_kanken1jyun or tag:kanji_jitenon_kanken1 or tag:kanji_jitenon_kanken1alt) and suspend them.

I found an entry that doesn't have any details JMDict and KANIDIC.

Yep.

  • Some letters in Kanji Kentei level 1 are not found in KANJIDIC. Use this filter "deck:Cham JP Deck" kanjidic_details:"" tag:kanji and suspend them.
  • Some words exist in external sources (e.g. Wanikani or ManyThings) but they don't exist in JMdict. Use this filter "deck:Cham JP Deck" jmdict_details:[] -tag:kanji and suspend them.

Is there audio in this deck?

Nope. But I'm planning to use this WaniKani open souce pronounciation audio.

Development

git clone https://github.com/echamudi/cham-jp-deck
cd cham-jp-deck

# Put JMdict_e and kanjidic2.xml in the current folder

node cli-csv.js     # run this to generate CSV
node cli-json.js    # run this to generate JSON

Acknowledgment

The items are collected from following sources:

  • JLPT Study by Peter van der Woude https://jlptstudy.net
  • JMdict https://www.edrdg.org/jmdict/j_jmdict.html
  • Japanese Core 6000 (Core 6K) https://iknow.jp/content/japanese
  • Japanese Sensei (Core 10K) http://en.colezhu.com/jsensei/
  • Jonathan Waller JLPT Lists http://www.tanos.co.uk/jlpt/
  • KANJIDIC Project http://www.edrdg.org/wiki/index.php/KANJIDIC_Project
  • KanjiVG by Ulrich Apel https://github.com/KanjiVG/kanjivg
  • Kanjium by Uros Ozvatic https://github.com/mifunetoshiro/kanjium
  • ManyThings Kanji Dictionary http://www.manythings.org/kanji/d/index.html
  • Netflix 12K list https://www.youtube.com/watch?v=DwJWld8hW0M
  • Wanikani https://wanikani.com
  • 日本漢字能力検定級別漢字表 https://www.kanken.or.jp/kanken/outline/degree.html

Authors

License

Copyright © 2020 Ezzat Chamudi

Cham JP Deck code is licensed under MPL-2.0. Images, logos, docs, and articles in this project are released under CC-BY-SA-4.0.

Libraries, dependencies, and tools used in this project are tied with their licenses.