mdx2xliff

v3.2.4

A utility for generating an XLIFF file and a skeleton file from an MDX file, and for reconstructing the MDX file from them.

Introduction

When translating an MDX document with an automatic tool such as Google Translate or DeepL, there is a significant chance that the tool will break some of the syntax. You have likely seen translated links that look like this, with a space inserted in the middle:

[Link text] (example.com)

Or, arguably worse because it breaks MDX compilation, altered HTML tags:

<Tabs>
  <TabItem>
    Somehow after translation both TabItem tags are opening! 
  <TabItem>
</Tabs>

The solution this package proposes is to separate text from the markup and translate only the text.

This is done using two file formats: xliff and skl. The former is just an XML file with all the text content, and the latter is essentially an MDX file with all the text replaced by placeholders.

We translate only the xliff and then combine the result of the translation with the existing skeleton.

For example, a file like this:

# My file

With a paragraph, that contains a [link](https://example.com/)

will be split into an XLIFF file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xliff xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:oasis:names:tc:xliff:document:1.2" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd" version="1.2">
    <file original="namespace" datatype="plaintext" source-language="ru" target-language="en-US">
        <body>
            <trans-unit id="0">
                <source>My file</source>
                <target></target>
            </trans-unit>
            <trans-unit id="1">
                <source>With a paragraph, that contains a</source>
                <target></target>
            </trans-unit>
            <trans-unit id="2">
                <source>link</source>
                <target></target>
            </trans-unit>
        </body>
    </file>
</xliff>

And a SKL file:

# %%%0%%%

%%%1%%%[%%%2%%%](https://example.com/)
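To make the split concrete, here is a minimal sketch of the idea. It is not the package's actual implementation, which works on the MDX syntax tree. It assumes the document has already been broken into `markup` and `text` tokens; the token shape and the `splitTokens` helper are hypothetical names used only for illustration.

```javascript
// Hypothetical token list for the example document above.
// A real implementation would derive this from the MDX AST.
const exampleTokens = [
  { kind: 'markup', value: '# ' },
  { kind: 'text', value: 'My file' },
  { kind: 'markup', value: '\n\n' },
  { kind: 'text', value: 'With a paragraph, that contains a ' },
  { kind: 'markup', value: '[' },
  { kind: 'text', value: 'link' },
  { kind: 'markup', value: '](https://example.com/)' },
]

// Replace every text token with a numbered %%%N%%% placeholder and
// collect the original strings as translation units.
function splitTokens(tokens) {
  const units = []
  let skeleton = ''
  for (const token of tokens) {
    if (token.kind === 'text') {
      skeleton += `%%%${units.length}%%%`
      units.push(token.value)
    } else {
      skeleton += token.value
    }
  }
  return { skeleton, units }
}

const split = splitTokens(exampleTokens)
console.log(split.skeleton)
console.log(split.units)
```

Only the `units` array would be sent to the translator; the skeleton, with all links and tags intact, never leaves your machine.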

API

This package provides two named exports: extract and reconstruct.

extract(options)

Generates a skeleton file and an XLIFF file from a given MDX file.

Parameters

{
  fileContents: string
  beforeDefaultRemarkPlugins?: Plugin[]
  skipNodes?: string[]
  sourceLanguage?: string
  targetLanguage?: string
  xliffVersion?: "1.2" | "2.0"
}

Default values

{
  beforeDefaultRemarkPlugins: []
  skipNodes: ["code", "inlineCode", "mdxjsEsm", "mdxFlowExpression", "mdxTextExpression"]
  sourceLanguage: "ru"
  targetLanguage: "en"
  xliffVersion: "2.0"
}

Returns

Promise<{
  skeleton: string
  xliff: string
}>

reconstruct(options)

Takes two files, skl and xliff, and replaces the placeholders in the skeleton file with the translations from the xliff.

If a translation is missing, reconstruct throws an error by default. Setting ignoreUntranslated to true changes this: any missing translation is then replaced with the source string.

Parameters

{
  skeleton: string
  xliff: string
  ignoreUntranslated?: boolean
  xliffVersion?: "1.2" | "2.0"
}

Default values

{
  ignoreUntranslated: false
  xliffVersion: "2.0"
}

Returns

string
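The missing-translation behaviour can be illustrated with a small sketch. This is an approximation under stated assumptions, not the package's code: it skips XLIFF parsing entirely and takes the units as a plain array of hypothetical `{ source, target }` objects, where an empty `target` counts as untranslated. `fillSkeleton` is an illustrative name.

```javascript
// Replace each %%%N%%% placeholder with the target of unit N.
// With ignoreUntranslated set, an empty target falls back to the source;
// otherwise a missing translation is an error.
function fillSkeleton(skeleton, units, ignoreUntranslated = false) {
  return skeleton.replace(/%%%(\d+)%%%/g, (placeholder, id) => {
    const unit = units[Number(id)]
    if (unit === undefined) {
      throw new Error(`No translation unit for placeholder ${placeholder}`)
    }
    if (unit.target) return unit.target
    if (ignoreUntranslated) return unit.source
    throw new Error(`Missing translation for unit ${id}`)
  })
}

const skl = '# %%%0%%%\n\n%%%1%%%[%%%2%%%](https://example.com/)'
const translatedUnits = [
  { source: 'My file', target: 'Mon fichier' },
  { source: 'With a paragraph, that contains a ', target: 'Avec un paragraphe, qui contient un ' },
  { source: 'link', target: '' }, // left untranslated on purpose
]

// Lenient mode keeps the source string "link" for the untranslated unit.
const lenient = fillSkeleton(skl, translatedUnits, true)
console.log(lenient)
```

Calling `fillSkeleton(skl, translatedUnits)` without the third argument throws, which mirrors the default described above.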

Example usage

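import { readFileSync, writeFileSync } from 'fs'
import { extract } from 'mdx2xliff'
import headingToHtml from 'mdx2xliff/remarkPlugins/headingToHtml'

;(async () => {
  // Read the MDX source that should be translated
  const fileContents = readFileSync('test.mdx', 'utf8')

  // Split it into a skeleton and an XLIFF file.
  // headingToHtml converts Markdown headings to HTML headings first, preserving their IDs.
  const { skeleton, xliff } = await extract({
    fileContents,
    sourceLanguage: 'en',
    targetLanguage: 'fr',
    beforeDefaultRemarkPlugins: [headingToHtml]
  })

  // Write both outputs; only test.xliff needs to go to the translator
  writeFileSync('test.skl', skeleton)
  writeFileSync('test.xliff', xliff)
})()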

Weak spots of this approach

Loss of context

Whatever app is responsible for translation will have to deal with very short chunks of text. In many cases these will be only one or two words, which leads to suboptimal machine translation quality.

MDX headings with IDs

Headings like this:

## Some heading {#some-id}

are not part of any Markdown spec, and their MDX AST representation is the same as for a normal Markdown heading, so the ID ends up in the text sent for translation. As a result, machine translation can change the ID or malform the curly-brace part so that the MDX will not even compile.

This can be worked around by using the built-in remark plugin mdx2xliff/remarkPlugins/headingToHtml. It replaces all Markdown headings with HTML headings, preserving the IDs.
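As a rough illustration of that transformation, the sketch below rewrites a single heading line with a regular expression. This is an assumption-laden approximation: the real plugin operates on the remark AST, and its exact output may differ. `headingIdToHtml` is a hypothetical helper, not an export of the package.

```javascript
// String-level approximation of turning "## Text {#id}" into an HTML heading.
// Lines that are not headings with an explicit ID are returned unchanged.
function headingIdToHtml(line) {
  const match = /^(#{1,6})\s+(.*?)\s*\{#([^}\s]+)\}\s*$/.exec(line)
  if (!match) return line
  const [, hashes, text, id] = match
  const level = hashes.length
  return `<h${level} id="${id}">${text}</h${level}>`
}

const convertedHeading = headingIdToHtml('## Some heading {#some-id}')
console.log(convertedHeading)
```

Once the ID lives in an attribute, it is part of the markup rather than the text, so it stays in the skeleton and is never shown to the translator.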

Frontmatter

Similar to the previous issue, frontmatter is easily malformed by machine translation. mdx2xliff does not yet provide a way of dealing with this.
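Until there is built-in support, one possible workaround is to cut the frontmatter off before extraction and put it back after reconstruction. The sketch below assumes the frontmatter is a leading YAML block delimited by `---` lines; `splitFrontmatter` is a hypothetical helper and not part of mdx2xliff.

```javascript
// Split a leading frontmatter block (--- ... ---) from the rest of the file.
// If no frontmatter is found, the whole input is returned as the body.
function splitFrontmatter(source) {
  const match = /^---\r?\n[\s\S]*?\r?\n---(?:\r?\n|$)/.exec(source)
  if (!match) return { frontmatter: '', body: source }
  return { frontmatter: match[0], body: source.slice(match[0].length) }
}

const mdxSource = '---\ntitle: Hello\n---\n\n# Heading\n'
const parts = splitFrontmatter(mdxSource)

// parts.body is what would be passed on for extraction;
// parts.frontmatter is kept aside and prepended to the reconstructed output.
console.log(parts)
```

Values that do need translating, such as a title, would still have to be handled separately with this approach.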

Similar projects

md2xliff

Pretty old: the last commit was in 2022. Uses unified version 6. Focuses on plain Markdown.

@diplodoc/markdown-translation

Actively maintained and developed. Focuses on YFM. No way to add support for MDX. Despite being new, uses xliff version 1.2.