npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@kikinit/mddoc-toolkit

v2.1.2

Published

A TypeScript library for extracting key sections from Markdown-based README files, offering customizable extensions (contexts) for specific use cases like code repositories.

Downloads

340

Readme

Markdown Documents Toolkit (mddoc-toolkit)

A Typescript library for extracting key sections from Markdown-based files. It works as a general-purpose parser, with customizable extensions (contexts) for specific types of markdown files, such as READMEs in code repositories. The toolkit helps retrieve important information like installation instructions, usage examples, and more, while being adaptable to different README formats.

Features

  • General-purpose Markdown processor with support for multiple heading styles (#, ##, etc.) and underline notation (===, ---).
  • Context-based functionality: Extend the processor for specific document types, such as repository README files or changelogs.
  • Extracts key sections such as:
    • Project title and description
    • Installation instructions
    • Usage examples
    • Contribution guidelines
    • License information
  • Easily customizable for specialized data extraction tasks and additional metadata processing.
  • Supports both file-based and direct markdown content processing.

Planned Features

  • Detects and parses links to external resources (e.g., GitHub repositories, websites).
  • Detects and parses badges and shields from README files.
  • Additional context-specific processors for diverse documentation types (e.g., API docs, blogs, Open Source Project Readmes).
  • Better support for npm-specific README formats, including package metadata extraction.
  • Better support for changelog files, including version comparison and release notes extraction.

Installation

To import the toolkit as a module in your project, you can use npm:

npm install @kikinit/mddoc-toolkit

To install the Readme Toolkit manually, you can clone this repository or include it as part of your project.

git clone https://github.com/kikinit/mddoc-toolkit.git

Compile the Typescript Code

npm run build

Quick Start

A quick start guide to get an idea of how the library works and how you could use it in your project.

Example Usage with RepoReadmeProcessor

import { RepoReadmeProcessor } from 'mddoc-toolkit'

// Example input: The file path of a README.md file
const readmePath = './readme.md'

// Initialize the RepoReadmeProcessor using file path
const repoProcessor = new RepoReadmeProcessor(readmePath, true) // 'true' indicates it's a file path; 'false' for direct markdown content.

// Extract installation instructions
const installationSections = repoProcessor.installationInstructions
console.log('Installation Instructions:', installationSections)

// Extract usage examples
const usageSections = repoProcessor.usageExamples
console.log('Usage Examples:', usageSections)

Example Output:

{
  "installationInstructions": [
    {
      "level": 2,
      "heading": "Installation",
      "body": "To install, run the following commands:\n```bash\nnpm install\n```"
    }
  ],
  "usageExamples": [
    {
      "level": 2,
      "heading": "Usage",
      "body": "To use the project, run:\n```bash\nnpm start\n```"
    }
  ]
}

Extended Usage Examples

Using the Markdown Parser without Contexts for Basic Markdown Parsing

Using the Markdown Parser using filepath

import { MarkdownParser } from 'mddoc-toolkit'

// Example markdown file path
const markdownFilePath = './README.md'

// Initialize the parser using a file path
const parser = new MarkdownParser(markdownFilePath, true) // 'true' indicates it is a file path

// Get all sections
const sections = parser.sections
console.log('Parsed sections:', sections)

// Get the title of the markdown (first h1)
const title = parser.title
console.log('Title of the markdown:', title)

Using MarkdownParser for Direct Markdown Content

import { MarkdownParser } from 'mddoc-toolkit'

// Example markdown content
const markdownContent = `
# My Project
## Installation
Run npm install
`

// Initialize the parser with direct markdown content
const parser = new MarkdownParser(markdownContent, false) // 'false' indicates it's direct content

// Get all sections
const sections = parser.sections
console.log('Parsed sections:', sections)

// Get the title of the markdown (first h1)
const title = parser.title
console.log('Title of the markdown:', title)

Using the Markdown Parser with Contexts

By instantiating the parser with a specific context, you can extract additional information from the markdown file. The toolkit includes several built-in contexts, such as RepoReadmeProcessor, NpmReadmeProcessor and ChangelogProcessor, which provide specialized functionality for README and changelog files.

Using the Repo Readme Processor Context for README Files

import { RepoReadmeProcessor } from 'mddoc-toolkit'

// Example repo README file path
const repoReadmePath = './mock-repo-readme.md'

// Initialize the RepoReadmeProcessor using file path
const repoProcessor = new RepoReadmeProcessor(repoReadmePath, true) // 'true' indicates it is a file path

// Get installation instructions
const installation = repoProcessor.installationInstructions
console.log('Installation Instructions:', installation)

// Get usage examples
const usage = repoProcessor.usageExamples
console.log('Usage Examples:', usage)

Using the Changelog Processor Context for Changelog Files

import { ChangelogProcessor } from 'mddoc-toolkit'

// Example Changelog file path
const changelogPath = './CHANGELOG.md'

// Initialize the ChangelogProcessor using file path
const changelogProcessor = new ChangelogProcessor(changelogPath, true) // 'true' indicates it is a file path

// Get unreleased changes
const unreleasedChanges = changelogProcessor.unreleasedChanges
console.log('Unreleased Changes:', unreleasedChanges)

// Get added features
const addedFeatures = changelogProcessor.addedFeatures
console.log('Added Features:', addedFeatures)

// Get changes between versions
const updates = changelogProcessor.getUpdatesBetweenVersions('1.2.0', '1.4.0')
console.log('Updates between versions:', updates)

Using a Custom Dictionary

With a custom dictionary, you can define additional keywords and phrases to extract specific sections from the markdown file. This is useful for README files with non-standard formats or specialized content that requires custom parsing logic. The custom dictionary should be a JSON file with key-value pairs mapping the section names to their corresponding keywords or phrases. Passing a custom dictionary to the processor will override the default dictionary.

import { RepoReadmeProcessor } from 'mddoc-toolkit'

// Example repo README with custom dictionary
const repoReadmePath = './mock-repo-readme.md'
const customDict = './custom-dictionary.json'

// Initialize the processor with a custom dictionary
const processorWithCustomDict = new RepoReadmeProcessor(repoReadmePath, true, [
  // 'true' indicates it is a file path
  customDict,
])

// Extract installation instructions using the custom dictionary
const installation = processorWithCustomDict.installationInstructions
console.log('Installation Instructions with custom dictionary:', installation)

API Documentation

This section provides an overview of the main public methods and properties of the MarkdownParser, RepoReadmeProcessor, ChangelogProcessor, and other key classes within the toolkit.

MarkdownParser

The MarkdownParser class is responsible for parsing and extracting structured sections from a Markdown file. This can be extended by context-specific processors (e.g., RepoReadmeProcessor, ChangelogProcessor) to handle specialized data extraction tasks.

Public Methods

  • sections

    Type: Section[]

    Description: Returns all sections parsed from the markdown content. Each section contains the heading level, title (heading), and body.

  • title

    Type: string

    Description: Returns the first h1 title from the markdown file.

    Throws: An error if no h1 heading is found.

  • getHeadingLevels(level: number | null)

    Type: number | Record<number, number>

    Description: Returns the count of headings for a specific level. If no level is provided, it returns a count of all heading levels.

    Parameters:

    • level: The level of headings to count. If null, counts for all levels are returned.
  • getSectionsByHeading(keyword: string)

    Type: Section[]

    Description: Retrieves an array of sections whose headings contain the specified keyword (case-insensitive).

    Parameters:

    • keyword: The keyword to search for in section headings.

    Throws: An error if no sections with the provided keyword are found.

  • getSectionsByKeywordsInDictionary(sectionType: string)

    Type: Section[]

    Description: Retrieves all sections based on keywords from the dictionary provided during initialization. This method returns all sections that match any keyword associated with the given sectionType.

    Parameters:

    • sectionType: The type of section to look for (based on the dictionary).

    Throws: An error if no dictionary is provided.

RepoReadmeProcessor

The RepoReadmeProcessor extends MarkdownParser and provides additional methods specifically for parsing repository README files. It can extract key sections like installation instructions, usage examples, and more.

Public Methods

  • installationInstructions

    Type: Section[]

    Description: Extracts all sections related to installation instructions.

  • usageExamples

    Type: Section[]

    Description: Extracts all sections related to usage examples, including subheadings.

  • api

    Type: Section[]

    Description: Extracts all sections related to API documentation.

  • dependencies

    Type: Section[]

    Description: Extracts all sections related to dependencies.

  • contributionGuidelines

    Type: Section[]

    Description: Extracts all sections related to contribution guidelines.

  • licenseInfo

    Type: Section[]

    Description: Extracts all sections related to the license information.

  • titleAndDescription

    Type: Section

    Description: Extracts the project title (first h1) and a brief description (text under the title) from the README file.

ChangelogProcessor

The ChangelogProcessor extends MarkdownParser to handle changelog-specific sections like added features, changes, and fixes between version ranges.

Public Methods

  • unreleasedChanges

    Type: Section[]

    Description: Extracts the sections related to unreleased changes from the changelog.

  • addedFeatures

    Type: Section[]

    Description: Extracts sections related to added features.

  • changedFeatures

    Type: Section[]

    Description: Extracts sections related to changed features.

  • getUpdatesBetweenVersions(startVersion: string, endVersion: string)

    Type: Section[]

    Description: Extracts all changelog sections between two specified versions.

    Parameters:

    • startVersion: The version to start collecting updates from.
    • endVersion: The version to stop collecting updates at.

Headings Parsing Regex

The following regex is used to capture different heading styles from markdown: /^(#+)\s*(.*)|^(.+)\n(=+|-+)\s*$/gm

Capture groups explained:

  • Group 0: (.*)\n(=+|-+)
    • match[0]: The full match for the underline-style heading (e.g., Heading\n======).
  • Group 1: (#+)\s*(.*)
    • match[1]: The full match for the # heading line (e.g., ## Heading or #Heading).
  • Group 2: (#+)
    • match[2]: The # symbols, which determine the heading level (e.g., # is level 1, ## is level 2).
  • Group 3: (.*)
    • match[3]: The heading text following the # symbols (e.g., Heading).
  • Group 4: (.+)\n(=+|-+)
    • match[4]: The full match for the underline-style heading (e.g., Heading\n======).
  • Group 5: (.+)
    • match[5]: The heading text in underline-style headings (e.g., Heading).
  • Group 6: (=+|-+)
    • match[6]: The underline, which defines whether it is an h1 (=) or h2 (-) heading.