@digitallinguistics/concordance

v0.4.0


A Node.js library for concordancing a corpus formatted according to the Data Format for Digital Linguistics (DaFoDiL)


Concordance


The Digital Linguistics (DLx) Concordance library is a Node.js library for creating a concordance of words in a corpus (a collection of texts in a language) which is formatted according to the Data Format for Digital Linguistics (DaFoDiL) (a JSON-based format). It is useful for anybody doing research involving linguistic corpora. If your data are not yet in DaFoDiL format, there are several converters available here.

This library produces a tab-delimited file containing information about each token (instance) of the words specified. When the KWIC option is enabled, the concordance is generated in Keyword in Context (KWIC) format, where each token is listed along with its immediately preceding and following context. An example of a partial concordance of the word little in The Three Little Pigs is shown in KWIC format below.

| text | utterance | word | pre | token | post |
| ---- | --------- | ---- | -------------------------: | :----: | ------------------------ |
| 3LP | 1 | 14 | mother pig who had three | little | pigs and not enough food |
| 3LP | 3 | 3 | The first | little | pig was very lazy. |
| 3LP | 5 | 3 | The second | little | pig worked a little bit |
| 3LP | 5 | 7 | second little pig worked a | little | bit harder but he was |
| 3LP | 7 | 3 | The third | little | pig worked hard all day |
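To make the KWIC layout concrete, here is a minimal sketch of how the `word`, `pre`, `token`, and `post` columns could be derived from a list of words. It is not the library's implementation: the `kwic` function, its parameters, and the plain array of strings standing in for an utterance are assumptions made for illustration, and the sample utterance contains only the words visible in the table above.

```javascript
// Illustrative only: build KWIC rows for every occurrence of a wordform.
// `words` is a plain array of strings standing in for one utterance.
function kwic(words, wordform, context = 5) {
  const rows = [];
  words.forEach((word, i) => {
    if (word.toLowerCase() !== wordform.toLowerCase()) return;
    rows.push({
      word: i + 1, // 1-based position of the token within the utterance
      pre: words.slice(Math.max(0, i - context), i).join(' '),
      token: word,
      post: words.slice(i + 1, i + 1 + context).join(' '),
    });
  });
  return rows;
}

// Only the words of utterance 5 that appear in the table above
const utterance = 'The second little pig worked a little bit harder but he was'.split(' ');
const rows = kwic(utterance, 'little');
console.log(rows);
```

With a context of 5 words, the two rows produced for this utterance match the `pre` and `post` cells shown for utterance 5 in the table.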

NOTE: This project is still in initial development phases, but should be ready for initial release by the end of September 2019.

Basic Usage

The following examples process any JSON files in the current directory and output a concordance file to concordance.tsv in Keyword in Context format. At a minimum, the concordance function requires a single argument: a wordform or list of wordforms to concordance.

As a module:

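    // Load the library by its scoped package name
    const concordance = require(`@digitallinguistics/concordance`);

    // Wordforms to concordance
    const wordforms = [`little`, `big`];

    // KWIC is off by default, so enable it explicitly
    concordance({ KWIC: true, wordforms });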

On the command line:

    dlx-conc -k --wordforms=little,big

Note: The Keyword in Context format is not enabled by default. It must be enabled by passing the -k or --KWIC flag.
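Once a concordance file has been generated, it can be read back for further analysis. The sketch below parses tab-delimited text into row objects. It assumes the file begins with a header row naming the columns (`text`, `utterance`, `word`, `pre`, `token`, `post`), which is not confirmed here. The `parseConcordance` function and the sample data are hypothetical.

```javascript
// Illustrative only: parse tab-delimited concordance text into row objects.
// Assumes the first non-empty line is a header row naming the columns.
function parseConcordance(tsv) {
  const [header, ...lines] = tsv.split(/\r?\n/).filter(line => line.length > 0);
  const columns = header.split('\t');
  return lines.map(line => {
    const cells = line.split('\t');
    return Object.fromEntries(columns.map((name, i) => [name, cells[i] ?? '']));
  });
}

// Hypothetical sample standing in for the contents of concordance.tsv
const sample = [
  'text\tutterance\tword\tpre\ttoken\tpost',
  '3LP\t3\t3\tThe first\tlittle\tpig was very lazy.',
  '3LP\t7\t3\tThe third\tlittle\tpig worked hard all day',
].join('\n');

const tokens = parseConcordance(sample);
console.log(tokens.length);
console.log(tokens[0].token);
```

For a real file, the text could be loaded with `fs.readFileSync('concordance.tsv', 'utf8')` before being passed to the parser.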

Options

The available options are listed below.

| Module | Command Line | Default | Description |
| ------------ | ------------------ | ------------------- | ----------- |
| `context` | `-c, --context` | `10` | the number of words to show to either side of the token (if the KWIC option is set to true) |
| `dir` | `-d, --dir` | `"."` | the directory where the corpus is located |
| `KWIC` | `-k, --KWIC` | `false` | whether to create the concordance in Keyword in Context format; adds pre and post columns to the concordance if true |
| `outputPath` | `-o, --outputPath` | `"concordance.tsv"` | path where the concordance file should be generated |
| `wordforms` | `-w, --wordforms` | `[]` | a string or list of strings of words to concordance (formatted as an array when using as a module, and as a comma-separated list when using on the command line) |
| `wordlist` | `-l, --wordlist` | `undefined` | path to a file containing a JSON array of words to concordance |
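As a rough illustration of the `wordlist` option, the sketch below writes a JSON array of words to a file and builds an options object using the module-side names from the table. The `writeWordlist` helper and the temporary file location are assumptions for the example. Whether `wordlist` can be used on its own, without `wordforms`, is not stated here, so the final call is left commented out.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write a wordlist file, i.e. a JSON array of words
function writeWordlist(filePath, words) {
  fs.writeFileSync(filePath, JSON.stringify(words, null, 2), 'utf8');
  return filePath;
}

const wordlistPath = writeWordlist(
  path.join(os.tmpdir(), 'wordlist.json'),
  ['little', 'big']
);

// Options object using the module-side names from the table above
const options = {
  context: 5,                    // words shown on either side of the token
  dir: '.',                      // directory containing the corpus
  KWIC: true,                    // add the pre and post columns
  outputPath: 'concordance.tsv', // where the concordance is written
  wordlist: wordlistPath,        // file holding a JSON array of words
};

// With the package installed, the call would presumably look like this:
// const concordance = require('@digitallinguistics/concordance');
// concordance(options);
console.log(options);
```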

Contributing

Report an issue or suggest a feature here.

Pull requests are very welcome. Please make sure you've opened an issue for your change first.

No test suite was written for this library, but you can test the results with npm test. A test concordance will be generated at test/concordance.tsv.

About

This library is authored and maintained by Daniel W. Hieber. Please consider citing this library following the model below:

Hieber, Daniel W. 2019. digitallinguistics/concordance. DOI:10.5281/zenodo.3464144