@stenway/sml
v1.1.0
Published
Serialize and deserialze SML documents
Downloads
14
Maintainers
Readme
SML
About SML
The Simple Markup Language is an easy and fast to type markup language. It only uses a minimal set of special characters and therefor feels very natural. It is line-based, and if you are a touch typist you will love it.
SML was specifically designed to be understandable for even non-computer experts. It is human-friendly, while also being machine-friendly. It has multi-language support and offers a 100% reliable encoding and decoding.
SML is a lightweight markup language but still powerful and flexible. It is meant to be an alternative for XML, JSON and YAML.
It is especially suited for hierarchical data, but can also nest tabular data with ease. Through its support for comments and whitespace-preserving loading and saving techniques, it is the number one choice for configuration files. But it's not limited to that.
Here is a simple example of an SML document, a simple message data structure:
Message
From Andrew
To James
Timestamp 2021-01-02 15:43
Text "I'll be there at 5pm"
End
Learn more about SML on the official website www.simpleml.com where you can find the complete specification and can try out SML in an online editor.
SML builds the foundation for text file formats like S3D and TBL (see also the Stenway Text File Format Stack). All of these formats don't need to bother about encoding and decoding anymore, because they rely on ReliableTXT, which takes care of that aspect (see also the NPM packages reliabletxt, reliabletxt-io, and reliabletxt-browser). SML is also built upon WSV (see also the NPM packages for wsv, wsv-io, and wsv-browser).
Find out what can be done with WSV on the official YouTube channel from Stenway. Here is a selection of videos you might start with:
- SML Explained - The Simple Markup Language
- SML in 60 seconds
- Stop Using CSV !
- A Better Alternative to XML
- Writing SML on a #typewriter
- Static3D File Format - Sneak Peek (S3D)
- PoseML - Describe your dance moves (The Pose Markup Language)
- UI Development like in the 90's - With TypeScript and SML
About this package
This package provides functionality to handle the parsing and serialization of SML documents. This package works both in the browser and Node.js, because it does not require environment specific functionality. If you want to read and write SML files using Node.js's file system module, you can use the sml-io package. The sml-browser package on the other hand offers functionality to easily provide SML documents as downloadable files.
If you want to get a first impression on how to use this package, you can watch this video. But always check the changelog of the presented packages for possible changes, that are not reflected in the video.
Getting started
First get the SML package installed with a package manager of your choice. If you are using NPM just run the following command:
npm install @stenway/sml
As a start, we are going to recreate the example from the SML in 60 seconds video, which describes a touristic point of interest and is a good example to show the basic functionality of the package. First we create a new SmlElement object. We pass the name of the element and as a second step, convert the element to a string by using the toString method.
import { SmlElement } from "@stenway/sml"
const rootElement = new SmlElement("PointOfInterest")
const rootElementStr = rootElement.toString()
The resulting string is the name of the element followed by the end keyword on the next line.
PointOfInterest
End
We now want to add our first attribute to the element and thus create a new SmlAttribute object. We pass the name of the attribute and an array of string values containing a single string value.
const attribute = new SmlAttribute("City", ["Seattle"])
const attributeStr = attribute.toString()
The string we get by calling the toString method looks like this. The name and the single value are simply separated by a space character.
City Seattle
Let's add the attribute to our element and see how the serialized element looks like.
rootElement.addNode(attribute)
let result = rootElement.toString()
We can see that the element now encloses the attribute line, which is indented:
PointOfInterest
City Seattle
End
To add an attribute to an element, we can also use the comfort method addAttribute.
rootElement.addAttribute("Name", ["Space Needle"])
result = rootElement.toString()
Here our value itself contains a space character, and thus it needs to be enclosed in double quotes in our SML string.
PointOfInterest
City Seattle
Name "Space Needle"
End
We already saw, that an attribute takes an array as second parameter and we now want to pass two values as arguments. For that we add another attribute called GpsCoords and pass a value for the latitude and another value for the longitude.
rootElement.addAttribute("GpsCoords", ["47.6205", "-122.3493"])
result = rootElement.toString()
As you can see this simply results in line with the second value also being separated from the first value by a space character:
PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493
End
By default, the indentation character is a tab character. If we want to change that, we can create an SmlDocument object by passing our element and then change the used indentation string by setting the defaultIndentation property.
const document = new SmlDocument(rootElement)
document.defaultIndentation = " "
result = document.toString()
Here we change it to use two space characters, instead of the default tab character:
PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493
End
If we want to make the generated SML string visually more pleasing, we can use the alignAttributes method, to align the first values of the attributes nicely in the same row.
rootElement.alignAttributes(" ", 1)
result = document.toString()
This gives us:
PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493
End
To add a child element to an element, we use the addElement method and supply the name of the child element as argument. Here we describe the opening hours by adding attributes for every day, which we align as well as a last step.
const hoursElement = rootElement.addElement("OpeningHours")
hoursElement.addAttribute("Sunday", ["9am", "10pm"])
hoursElement.addAttribute("Monday", ["9am", "10pm"])
hoursElement.addAttribute("Tuesday", ["9am", "10pm"])
hoursElement.addAttribute("Wednesday", ["9am", "10pm"])
hoursElement.addAttribute("Thursday", ["9am", "10pm"])
hoursElement.addAttribute("Friday", ["9am", "11pm"])
hoursElement.addAttribute("Saturday", ["9am", "11pm"])
hoursElement.alignAttributes(" ", 1)
result = document.toString()
This gives us:
PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493
OpeningHours
Sunday 9am 10pm
Monday 9am 10pm
Tuesday 9am 10pm
Wednesday 9am 10pm
Thursday 9am 10pm
Friday 9am 11pm
Saturday 9am 11pm
End
End
To show the comment functionality, we remove the OpeningHours element from our root element and add an empty node with the addEmptyNode method. An empty node is essentially an empty line, when it's been serialized as an SML string. By providing a comment string to the comment property, we can create a line with just a comment.
rootElement.nodes.pop()
const commentLine = rootElement.addEmptyNode()
commentLine.comment = " Opening hours should go here"
result = document.toString()
This gives us:
PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493
# Opening hours should go here
End
Parsing and modifications
Now that we've recreated the example from the SML in 60 seconds video, we will try to parse that SML string with the static method parse of the SmlDocument class and will convert the parsed document back as string again, so that you can see, that all the indentation and alignment space characters will be preserved.
import { SmlDocument } from "@stenway/sml"
const content = `PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493
OpeningHours
Sunday 9am 10pm
Monday 9am 10pm
Tuesday 9am 10pm
Wednesday 9am 10pm
Thursday 9am 10pm
Friday 9am 11pm
Saturday 9am 11pm
End
End`
const document = SmlDocument.parse(content)
let result = document.toString()
We can access the root element of the document with the root property and it's name with the name property. Attributes of an element can be easily accessed with the attribute method, which takes the name of the attribute as argument. Here we get our three attributes and convert the first two arguments as single string values, and our third attribute as an array of floating point numbers.
const rootElement = document.root
const rootElementName = rootElement.name
const city = rootElement.attribute("City").asString()
const name = rootElement.attribute("Name").asString()
const gpsCoords = rootElement.attribute("GpsCoords").asFloatArray()
Values can be changed, by simply setting the values property. Here we change the GpsCoords. This also shows an important feature of SML, which is that names of attributes and elements are case insensitive, so we can write the GpsCoords attribute name in upper case, completely in lower case, or in any other variation, without getting an error.
rootElement.attribute("GPSCOORDS").values = ["12.345", "67.890"]
result = document.toString()
And here we see, that only the values were changed and the formatting remained:
PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 12.345 67.890
OpeningHours
Sunday 9am 10pm
Monday 9am 10pm
Tuesday 9am 10pm
Wednesday 9am 10pm
Thursday 9am 10pm
Friday 9am 11pm
Saturday 9am 11pm
End
End
We can also append a comment to an attribute line, and adjust the whitespace strings so that our comment will be preceeded by two space characters.
const nameAttribute = rootElement.attribute("name")
nameAttribute.comment = " Important comment"
nameAttribute.whitespaces = [" ", " ", " "]
result = document.toString()
This gives us:
PointOfInterest
City Seattle
Name "Space Needle" # Important comment
GpsCoords 12.345 67.890
OpeningHours
Sunday 9am 10pm
Monday 9am 10pm
Tuesday 9am 10pm
Wednesday 9am 10pm
Thursday 9am 10pm
Friday 9am 11pm
Saturday 9am 11pm
End
End
Let's do another parsing test with an SML string that contains a comment line. As you will see, the comment is also preserved.
const contentWithComment = `PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493
# Opening hours should go here
End`
const documentWithComment = SmlDocument.parse(contentWithComment)
result = documentWithComment.toString()
In case we don't need the formating and comments to be preserved, we can provide a second argument to the parse method and will see that the comment will be gone, as well as the indentation of two characters and the alignment spaces.
const documentWithoutWsAndComment = SmlDocument.parse(contentWithComment, false)
result = documentWithoutWsAndComment.toString()
This will give us:
PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493
End
Minification
SML is also very well suited to be minified. For that we can use the toMinifiedString method, which strips away all unneccessary whitespace characters like the indentation, all comments and additionally reduces the end keyword to a null string, which will be represented with a single minus character.
result = document.toMinifiedString()
This will give us:
PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 12.345 67.890
OpeningHours
Sunday 9am 10pm
Monday 9am 10pm
Tuesday 9am 10pm
Wednesday 9am 10pm
Thursday 9am 10pm
Friday 9am 11pm
Saturday 9am 11pm
-
-
If you want to know more about the minification, you can also watch this video.
Relationship to WSV
If you look at the minified version of the document and you've maybe already seen WSV, you might already have noticed, what the relationship between SML and WSV is. An SML document is a valid WSV document, which creates a hierarchy by counting the number of values per line. If the line has more than one value it's an attribute. If the line has no value, than it's an empty node, and if the line has exactly one value, it's either the starting line of an element or the closing line, if it matches the end keyword. And the end keyword is simply derived by looking at the last non empty line. And that's the beautiful concept of SML.
In order to demonstrate that relation, we will parse the SML string with the static parse method of the WsvDocument class from the WSV package, which will return a WsvDocument object. We then access the parsed Wsv lines and change the first value of line 6 to the capitalized word Sunday.
import { WsvDocument } from "@stenway/wsv"
const content = `PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493
OpeningHours
Sunday 9am 10pm
Monday 9am 10pm
Tuesday 9am 10pm
Wednesday 9am 10pm
Thursday 9am 10pm
Friday 9am 11pm
Saturday 9am 11pm
End
End`
const wsvDocument = WsvDocument.parse(content)
wsvDocument.lines[5].values[0] = "SUNDAY"
let result = wsvDocument.toString()
When we convert it back to a string, we can see that the first attribute of the opening hours element has changed it's name:
PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493
OpeningHours
SUNDAY 9am 10pm
Monday 9am 10pm
Tuesday 9am 10pm
Wednesday 9am 10pm
Thursday 9am 10pm
Friday 9am 11pm
Saturday 9am 11pm
End
End
We can also parse the SML document string without preserving whitespaces and comments.
const wsvDocumentWithoutWS = WsvDocument.parse(content, false)
result = wsvDocumentWithoutWS.toString()
And we will get this result:
PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493
OpeningHours
Sunday 9am 10pm
Monday 9am 10pm
Tuesday 9am 10pm
Wednesday 9am 10pm
Thursday 9am 10pm
Friday 9am 11pm
Saturday 9am 11pm
End
End
We could even parse the SML document string as a jagged array, which also demonstrates the relationship and the concept of SML quite nicely.
const jaggedArray = WsvDocument.parseAsJaggedArray(content)
And last but not least, the ultimate test. We parse the SML string as a WsvDocument, convert it to a string with the toString method and parse that result as an SmlDocument.
const wsvSmlDocument = SmlDocument.parse(
WsvDocument.parse(content).toString()
)
result = wsvSmlDocument.toString()
And it works as expected.
Parser errors
When we parse an SML document string, we always should consider that it might contain syntactical errors. In this example the end keyword is missing and the parse method will throw an SmlParserError that also tells us in which line the error occured:
const invalidContent = `PointOfInterest
City Seattle
Name "Space Needle"
GpsCoords 47.6205 -122.3493`
try {
const document = SmlDocument.parse(invalidContent)
} catch (error) {
const smlError = error as SmlParserError
console.log(`Parser error: ${smlError.message}`)
}
It's not the only type of error that could occur, especially because SML is based on WSV, we could also get a WsvParserError, like in this example where a value is missing a closing double quote:
SmlDocument.parse(`Root\nAttribute "My Value\nEnd`)
Another type of error is the NoReliableTxtPreambleError. Because WSV is based on ReliableTXT and thus SML is as well, we can get a NoReliableTxtPreambleError when a byte sequence is decoded and does not have a valid ReliableTXT preamble, like in the following example:
SmlDocument.fromBytes(new Uint8Array([0x31, 0x32, 0x33]))
Validation
We will now look at some built-in validation methods, which will help you, when you want to load an SML document in a reliable way, without using a schema. For this example we will use a simple file list, which has a root element called files, and uses attributes with the name file to represent the items of the file list.
const content = `#=============================
# My file list
# Copyright Steve Wilson 2022
#=============================
Files
File Readme.txt
File c:\\Directory\\File.txt
File "d:\\My directory\\Test.sml"
End`
We first check the root element's name, with the assureName method. We also make sure, that the root element has only attributes and not elements as child nodes, with the assureNoElements method. In order to restrict the child attributes to only attributes with the name file, we can use the assureAttributeNames method.
try {
const document = SmlDocument.parse(content)
console.log(document.toString())
const root = document.root
root.assureName("Files")
root.assureNoElements()
root.assureAttributeNames(["File"])
let filePaths: string[] = root.attributes().map(x => x.asString())
console.log(filePaths)
} catch (error) {
console.log(`Error: ${(error as Error).message}`)
}
To convert our SML file list to a string array, we use the attributes method which returns all child attributes of the specified element, in our case the root element. And we take this array of attributes and map each attribute to a string, by using the asString method. The asString method also makes sure, that the attribute only has one value and not multiple, and checks that the value is not null.
And that's all we need to get our file list.
As a side note, you can add comments before or after a root element, like the copyright information in our example.
We can now test our validation methods. We could for example change the root element name, or the first attribute's name, or we could add an element to the root element. All this would produce an error message. We can also test what happens, when we supply a null value as file path. That case is also handled. Also adding another value to the attribute, would not be allowed. This already gives us a nice set of methods, to load our SML document in a robust way.
Values with units
Of course there are many more of these helper and validation methods and we will now have a further look at some of them.
Here is an example where we want to specify a value that can have a unit. With SML that's pretty easy to do:
enum Unit {
Meter,
Centimeter,
Millimeter
}
const content = `Test
Length 10
End`
const document = SmlDocument.parse(content)
const root = document.root
try {
const lengthAttribute = root.attribute("Length")
lengthAttribute.assureValueCountMinMax(1, 2)
const lengthValue = lengthAttribute.getFloat(0)
const lengthUnit: Unit | null = null
if (lengthAttribute.valueCount === 2) {
lengthUnit = lengthAttribute.getEnum(["m", "cm", "mm"], 1)
}
console.log(`Value: ${lengthValue} Unit: ${lengthUnit}`)
} catch (error) {
if (error instanceof Error) { console.log(error.message) }
else { console.log(error) }
}
We use the assureValueCountMinMax method, to specify the minimum value count and the maximum value count. For an attribute with optional unit value, that's one and two. The getFloat method converts a value at the specified index of the attribute's value array into a number and also assures that the format of the value is correct. We then check, if a second value is available and convert it with the getEnum method to a unit. We pass an array of possible enum string values and the method will compare the value with them, returning actually an index number into the array where the method found a match. When we specify a not expected unit, we will get an error message.
Null values and required or optional nodes
In some situations we might want a null value as possible value or want to specify whether an attribute is optional or required. To specify that an attribute must exist and only has an occurrance of one, we can use the requiredAttribute method, which will throw an error, if the conditions are not met. Here the name of the player is a must, which we can express like this.
const content = `Player
Name SuperGamer123
FavoriteFood Cheeseburger
End`
try {
const document = SmlDocument.parse(content)
const root = document.root
const playerName = root.requiredAttribute("Name").asString()
console.log(`Player name: ${playerName}`)
const favoriteFoodAttribute = root.optionalAttribute("FavoriteFood")
if (favoriteFoodAttribute !== null) {
const favoriteFood = favoriteFoodAttribute.asNullableString()
console.log(`Favorite food: ${favoriteFood ?? `Eats everything`}`)
}
if (!root.hasAttribute("FavoriteFood")) {
console.log(`Favorite food was not yet specified.`)
}
} catch (error) {
if (error instanceof Error) { console.log(error.message) }
else { console.log(error) }
}
The FavoriteFood attribute does not need to be specified, so we can use the optionalAttribute method which will return either the attribute or null. If the attribute was specified, we can get its value. We want to allow null as a value, so we use the asNullableString method, which will either return a string or null.
With the hasAttribute method, we can check if the element has at least one attribute with the specified name.
We can play around with the SML document string and see the validation in action. For example we can comment out the required name attribute or could duplicate the name attribute. Both cases would produce errors that already tell us in a nice way, what the problem is.
If we would change the FavoriteFood attribute and would provide a null value, we would see the default string. When we comment out the optional attribute, we would get the message printed, for the case that no attribute with the name FavoriteFood was found.
Multi-line values
One really beautiful aspect about SML is how multi-line strings are handled. Here we create an SmlElement and add two attributes that both have multi-line strings as values:
const rootElement = new SmlElement("RootElement")
rootElement.addAttribute("Attribute1", ["Line1\nLine2\nLine3"])
rootElement.addAttribute("Attribute2", ["Line1\nLine2"])
const result = rootElement.toString()
And here is how the serialized SML element looks like:
RootElement
Attribute1 "Line1"/"Line2"/"Line3"
Attribute2 "Line1"/"Line2"
End
All line-feed characters are replaced with the special WSV line break syntax "/"
and therefor
an attribute with multi-line string values will always remain
on one line. The document structure or let's call it outline
therefor always remains nicely visible.
End keyword
In our last example we will change the end keyword, which we can individually choose if we want it to be different than the default English end keyword. This is helpful, when your requirement is to create a completely localized SML document like here in this example, where we create a completely Japanese SML document:
const japaneseElement = new SmlElement("契約")
const japaneseSubElement = japaneseElement.addElement("個人情報")
japaneseSubElement.addAttribute("名字", ["田中"])
japaneseSubElement.addAttribute("名前", ["蓮"])
japaneseElement.addAttribute("日付", ["2021-01-02"])
const japaneseDocument = new SmlDocument(japaneseElement, "エンド")
japaneseDocument.defaultIndentation = "\u3000"
const result = japaneseDocument.toString()
We also change our default indentation to a special space character, called ideographic space, which aligns Japanese, Chinese, Korean and other characters nicely:
契約
個人情報
名字 田中
名前 蓮
エンド
日付 2021-01-02
エンド
Fun fact: The following code would create a valid SML document:
import { SmlElement } from "@stenway/sml"
console.log(
new SmlElement("The").toString()
)
As you can see in the following video.
Encoding and decoding
The SmlDocument class offers the method toBytes and the static method fromBytes to directly serialize the document to a byte array and deserialize it again. This bytes would be the bytes of the text file written and because SML is based on ReliableTXT would be prefixed with a ReliableTXT preamble. Here is an example of the methods in use:
const document = SmlDocument.parse(`Root\nEnd`)
const bytes = document.toBytes()
const fromBytesDocument = SmlDocument.fromBytes(bytes)
The default encoding is UTF-8. If you want to specify another ReliableTXT encoding, import the ReliableTxtEncoding enum from the ReliableTXT package and change the encoding property of the SML document:
document.encoding = ReliableTxtEncoding.Utf16
Calling now the fromBytes method, would return a byte sequence using the UTF-16 encoding.
The toBytes method can be used, when you want to transfer SML documents via HTTP. For that see the related video Transferring ReliableTXT Documents Using HTTP.
BinarySML
BinarySML is the binary representation of SML documents. It starts with the magic code 'BS1'. BinarySML is made for scenarios, where parsing speed of the textual representation might be a limitation. Like BinaryWSV it uses invalid UTF-8 codepoints to separate elements, attributes, and values from each other, and to signal null values. The special bytes are:
11111111 = Element Start Byte
11111110 = Element End Byte
11111101 = Attribute End Byte
11111100 = Value Separator Byte
11111011 = Null Value Byte
It always produces smaller document size than the textual representation. It is also well suited for streaming.
The SmlDocument class offers the method toBinarySml and the static method fromBinarySml to comfortable get the byte sequences and decode those again as SmlDocument objects:
const document = SmlDocument.parse(`Root\nEnd`)
const bytes = document.toBinarySml()
const decodedDocument = SmlDocument.fromBinarySml(bytes)