@wearemothership/dicom-character-set-ii
v1.0.4-opt.1
Converts bytes of encoded DICOM text to a JavaScript DOMString
dicom-character-set
Converts DICOM text (as bytes) to a JavaScript string. Handles multiple character sets (single-byte and multi-byte, with and without extensions) within a single block of text according to the DICOM standard. All encodings specified in the standard are currently supported. For a complete list of all encodings, see here.
Install
Install via NPM:
npm install dicom-character-set
Or get a packaged source file:
Usage
Firefox/Chrome/Safari/Opera:
import { convertBytes } from 'dicom-character-set';
const str = convertBytes('ISO 2022 IR 149\\ISO 2022 IR 13', uint8ArrayBytes, {vr: 'LT'});
Backward compatibility (browsers that don't support TextDecoder, e.g. Internet Explorer and legacy Edge):
import { convertBytesPromise } from 'dicom-character-set';
convertBytesPromise('ISO 2022 IR 6\\ISO 2022 IR 13', uint8ArrayBytes, {vr: 'LT'}).then(str => {
  console.log(str);
});
Note: Make sure you're passing the text as a Uint8Array, not as a string. Also, only pass the bytes of the value you want converted, not the bytes for the entire DICOM file.
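As a minimal sketch of the note above, assuming you already know the byte offset and length of the value within the file buffer, you can create a view over only that value's bytes. The `valueBytes` helper and its parameters are hypothetical and not part of this library.

```javascript
// Hypothetical helper (not part of dicom-character-set): returns a view over
// only the bytes of one element value inside a larger DICOM file buffer.
function valueBytes(fileBuffer, valueOffset, valueLength) {
  // new Uint8Array(buffer, byteOffset, length) creates a view; no bytes are copied.
  return new Uint8Array(fileBuffer, valueOffset, valueLength);
}

// Example: a 16-byte buffer standing in for a file, with a 4-byte value at offset 8.
const exampleFileBuffer = new Uint8Array([
  0, 0, 0, 0, 0, 0, 0, 0, 0x4a, 0x6f, 0x68, 0x6e, 0, 0, 0, 0,
]).buffer;
const exampleValueBytes = valueBytes(exampleFileBuffer, 8, 4);
```

The resulting Uint8Array is what you would pass as the second argument to convertBytes or convertBytesPromise.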
Arguments
Both convertBytes and convertBytesPromise take the same arguments. They are, in order:
- Specific Character Set attribute value (0008,0005) from the DICOM file (either string or undefined)
- Text bytes as Uint8Array
- Options object (optional). Supported options are:
  - vr (string): the value representation of the text being converted. Gives the decoder a hint for properly handling delimiters. If not specified, the decoder assumes backslash, carriage return, line feed, form feed, and tab all reset the active character set to the first one specified (see the standard for details).
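To illustrate why the vr hint matters, here is a rough sketch of how the set of resetting delimiters could depend on the value representation. It is based on one reading of the DICOM standard and is not taken from this library's code; the `resetBytesFor` function and its constants are hypothetical.

```javascript
// Hypothetical sketch, not the library's implementation: which byte values
// reset the active character set for a given VR.
const CONTROL_RESETS = [0x09, 0x0a, 0x0c, 0x0d]; // TAB, LF, FF, CR
const BACKSLASH = 0x5c;

function resetBytesFor(vr) {
  switch (vr) {
    case 'LT':
    case 'ST':
    case 'UT':
      // Assumption: single-valued text VRs treat backslash as ordinary content.
      return CONTROL_RESETS;
    case 'PN':
      // Assumption: person names also reset at '^' and '=' (component delimiters).
      return [...CONTROL_RESETS, BACKSLASH, 0x5e, 0x3d];
    default:
      // No usable hint: fall back to the conservative set described above.
      return [...CONTROL_RESETS, BACKSLASH];
  }
}
```

Without a hint, a backslash inside a long text value would be treated as a value delimiter, which is why passing the vr can change the decoded result.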
Node.js
This library can be used as-is on Node.js versions 11 and up. You may need to use an ICU-enabled build of Node.js or else add the full-icu package to your app.
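One way to check whether your Node.js build can decode a particular encoding is to try constructing a TextDecoder for it, since the constructor throws a RangeError for unsupported labels. The sketch below is only a diagnostic aid; the `supportsEncoding` helper is hypothetical, and the labels shown are examples rather than a list of what this library uses internally.

```javascript
// Hypothetical diagnostic helper: reports whether the runtime's TextDecoder
// accepts the given encoding label.
function supportsEncoding(label) {
  if (typeof TextDecoder === 'undefined') {
    return false; // No TextDecoder at all in this runtime.
  }
  try {
    new TextDecoder(label);
    return true;
  } catch (err) {
    // Unsupported or unknown labels raise a RangeError.
    return false;
  }
}

// Example labels only; builds with limited ICU data may report false for these.
const encodingReport = ['utf-8', 'shift_jis', 'euc-kr'].map(label => ({
  label,
  supported: supportsEncoding(label),
}));
```

If multi-byte encodings are reported as unsupported, that suggests the build lacks full ICU data.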
Differences from DICOM standard
In the name of robustness, the behavior deviates from the DICOM standard in the following ways:
- If one of the multi-byte character sets not supporting extensions (e.g. GBK) appears first, all following character sets will be ignored; if it appears after any other character set, it will be ignored.
- If multiple character sets are specified, the non-extension character sets are switched to their extension equivalents where applicable (e.g. "ISO_IR 100\ISO_IR 101" would become "ISO 2022 IR 100\ISO 2022 IR 101")
- Control characters (in the CL and CR planes) are allowed, though most of them won't produce visible output
- A multi-byte character set supporting code extensions can be the first character set
- If the same character set appears multiple times, duplicate occurrences are ignored
- If a character is encountered in a code element that hasn't been assigned, it's printed using the currently active code element instead of throwing an error
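The rules above that concern the Specific Character Set value itself can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's actual code: the `normalizeCharacterSets` function, the order in which the rules are applied, and the list of multi-byte character sets without extensions are all assumptions.

```javascript
// Hypothetical sketch of the normalization rules listed above.
// Assumed set of multi-byte character sets that do not support code extensions.
const MULTI_BYTE_NO_EXTENSIONS = new Set(['ISO_IR 192', 'GB18030', 'GBK']);

function normalizeCharacterSets(specificCharacterSet) {
  // Values of (0008,0005) are separated by a single backslash.
  let sets = (specificCharacterSet || '').split('\\').map(s => s.trim());

  if (MULTI_BYTE_NO_EXTENSIONS.has(sets[0])) {
    // Appears first: every following character set is ignored.
    sets = [sets[0]];
  } else {
    // Appears later: that character set itself is ignored.
    sets = sets.filter(s => !MULTI_BYTE_NO_EXTENSIONS.has(s));
  }

  // Drop duplicate occurrences, keeping the first of each.
  sets = [...new Set(sets)];

  // With several character sets, switch to the extension equivalents.
  if (sets.length > 1) {
    sets = sets.map(s => s.replace(/^ISO_IR (\d+)$/, 'ISO 2022 IR $1'));
  }
  return sets;
}
```

For instance, under these assumptions `normalizeCharacterSets('ISO_IR 100\\ISO_IR 101')` yields the two values "ISO 2022 IR 100" and "ISO 2022 IR 101", matching the example in the list.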