@openiti/markdown-parser

v1.2.2

Published

2 months ago

A library for parsing OpenITI's mARkdown syntax

ahmedriad1

OpenITI mARkdown Parser

A library for parsing OpenITI's mARkdown syntax into a friendly JSON format.

Features

Parses OpenITI mARkdown headers, paragraphs, verses, biographies, historical events, and more into JSON.
Extracts metadata and structural elements preserving their context and hierarchy.
Supports parsing of complex morphological patterns and riwāyāt units.
Handles pagination and block quotes within the text.

Installation

Using npm:

npm install @openiti/markdown-parser

Using yarn:

yarn add @openiti/markdown-parser

Usage

To use mARkdown-parser, import the parseMarkdown function from the package and pass your OpenITI mARkdown text to it. The function will return a JSON object containing the parsed content.

import { parseMarkdown } from '@openiti/markdown-parser';

const mARkdown = `
// ...
`;

const parsed = parseMarkdown(mARkdown);
console.log(parsed);

Sample Output

The following example shows how the parser structures different elements of OpenITI mARkdown as an array of blocks (the array described as the content property in the API reference):

[
  {
    "type": "title",
    "content": "رسالة في التوبة"
  },
  {
    "type": "pageNumber",
    "content": {
      "volume": "01",
      "page": "218"
    }
  },
  {
    "type": "paragraph",
    "content": "فصل"
  },
  {
    "type": "paragraph",
    "content": "قال الإمام العلامة شيخ الإسلام تقي الدين أبو العباس أحمد بن عبدالحليم ابن تيمية رحمه الله"
  }
  ...
]

API Reference

parseMarkdown(markdownText: string): ParseResult

Parses a string of OpenITI mARkdown into a structured JSON format.

Parameters

markdownText (string) - The OpenITI mARkdown text to be parsed.

Returns

ParseResult (Object) - A JSON object representing the parsed content. The ParseResult object includes metadata and content properties.

Types

Block

Represents the smallest unit of content, such as a title, header, paragraph, blockquote, etc.

ParseResult

An object containing metadata and content. metadata is an object of key-value pairs extracted from the mARkdown, while content is an array of Block objects representing the structured content of the document.
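As a rough illustration of the shapes described above, the sketch below declares local `Block` and `ParseResult` interfaces and counts blocks by type. The interface definitions, the `countBlockTypes` helper, and the hand-written `sampleResult` are assumptions made for this example; the types actually exported by the package may be more specific.

```typescript
// Hypothetical local types mirroring the documented shapes. The package's
// real definitions may differ, so treat these as an assumption.
interface Block {
  type: string;      // e.g. "title", "paragraph", "pageNumber"
  content: unknown;  // shape depends on the block type
}

interface ParseResult {
  metadata: Record<string, unknown>; // key-value pairs extracted from the text
  content: Block[];                  // structured content of the document
}

// Hypothetical helper: count how many blocks of each type a result contains.
function countBlockTypes(result: ParseResult): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const block of result.content) {
    counts[block.type] = (counts[block.type] ?? 0) + 1;
  }
  return counts;
}

// Hand-written stand-in for a parser result, based on the sample output.
// Metadata is left empty because its keys are not documented here.
const sampleResult: ParseResult = {
  metadata: {},
  content: [
    { type: "title", content: "رسالة في التوبة" },
    { type: "pageNumber", content: { volume: "01", page: "218" } },
    { type: "paragraph", content: "فصل" },
    { type: "paragraph", content: "قال الإمام العلامة" },
  ],
};

const blockCounts = countBlockTypes(sampleResult);
console.log(blockCounts);
```

In real use, the value returned by `parseMarkdown` would take the place of `sampleResult`, provided its shape matches the description above.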

Blocks

The library defines several blocks to structure the parsed content. Here's a detailed look at the Block types:

| Type | Description |
|-----------------|---------------------------------------------------------------------------------------------------|
| title | Represents a title within the text. |
| header-1 | Denotes a level 1 header, the highest level, typically used for major sections. |
| header-2 | Denotes a level 2 header, used for subsections under a header-1. |
| header-3 | Denotes a level 3 header, used for sub-subsections under a header-2. |
| header-4 | Denotes a level 4 header, indicating further subdivision under a header-3. |
| header-5 | The lowest level header, indicating the most granular sectioning under a header-4. |
| paragraph | Represents a paragraph of text. |
| blockquote | Indicates a block of text that is quoted from another source. |
| category | A categorization label, used for organizing content into categories. |
| verse | Represents a verse, typically in poetry or Quranic verses. Each array item is a hemistich. |
| pageNumber | Denotes the page number. The content includes an object with volume and page strings. |
| year_of_birth | Indicates the year of birth of a person, in Hijri. |
| year_of_death | Indicates the year of death of a person, in Hijri. |
| year | General purpose year, used in various contexts, in Hijri. |
| age | Represents the age of a person, in Hijri years. |
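One way to consume these block types is to branch on `type` and check the shape of `content` before using it. The sketch below is a minimal example under stated assumptions: the `Block` interface, the `isPageNumberContent` guard, and `renderBlock` are hypothetical names defined locally, and the content shapes for `pageNumber` and `verse` follow the descriptions in the table, while string content is assumed for the other types.

```typescript
// Hypothetical local type; the package's own Block type may differ.
interface Block {
  type: string;
  content: unknown;
}

// Runtime check for the documented pageNumber content: volume and page strings.
function isPageNumberContent(
  content: unknown
): content is { volume: string; page: string } {
  if (typeof content !== "object" || content === null) return false;
  const record = content as Record<string, unknown>;
  return typeof record.volume === "string" && typeof record.page === "string";
}

// Hypothetical renderer: turn one block into a line of plain text.
function renderBlock(block: Block): string {
  switch (block.type) {
    case "pageNumber":
      return isPageNumberContent(block.content)
        ? `[vol. ${block.content.volume}, p. ${block.content.page}]`
        : "";
    case "verse":
      // Each array item is a hemistich; join them on one line.
      return Array.isArray(block.content) ? block.content.join(" / ") : "";
    default:
      // Assumption: other block types carry string content.
      return typeof block.content === "string" ? block.content : "";
  }
}

const renderedPage = renderBlock({
  type: "pageNumber",
  content: { volume: "01", page: "218" },
});
const renderedVerse = renderBlock({
  type: "verse",
  content: ["first hemistich", "second hemistich"],
});
console.log(renderedPage);
console.log(renderedVerse);
```

Checking shapes at runtime keeps the renderer from throwing if a block's content differs from what this example assumes.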

Contributing

Contributions are welcome! Please submit pull requests or open issues on the GitHub repository.

License

This project is licensed under the MIT License.

Acknowledgments

This library is built to support the work done by the OpenITI team and the larger community working on Arabic and Islamicate texts. For more information on OpenITI mARkdown conventions, visit Maxim Romanov's mARkdown guide.