mkql
v1.0.9
Query language for markdown documents
Query Language
Query a document tree with selectors
Extracts nodes using a selector syntax that is a subset of the CSS selectors specification.
Install
npm i mkql --save
For the command line interface, install mkdoc globally (npm i -g mkdoc).
Usage
Pass selectors when creating the stream:
var ql = require('mkql')
  , ast = require('mkast');
ast.src('Paragraph\n\n* 1\n* 2\n* 3\n\n```javascript\nvar foo;\n```')
  .pipe(ql('p, ul, pre[info^=javascript]'))
  .pipe(ast.stringify({indent: 2}))
  .pipe(process.stdout);
Example
mkcat README.md | mkql 'p, ul, pre[info^=javascript]' | mkout
printf 'Para 1\n\nPara 2\n\n* List item\n\n' | mkcat | mkql '*' | mkout -y
Selectors
Implemented selectors work like their CSS counterparts; in some cases extensions specific to markdown tree nodes have been added.
Type Selectors
Types are based on the equivalent HTML element name, so to select a node of paragraph type use p; the universal selector * will select nodes of any type.
The map of standard HTML tag names to node types is:
p: paragraph
ul: list
ol: list
li: item
h1-h6: heading
pre: code_block
blockquote: block_quote
hr: thematic_break
code: code
em: emph
strong: strong
a: link
br: linebreak
img: image
Extensions for markdown specific types:
nl: softbreak
text: text
html: html_block
inline: html_inline
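The two lists above can be pictured as a plain lookup table. This is a minimal sketch and not mkql's own code; the names TYPES and nodeType are hypothetical and exist only for illustration.

```javascript
// Hypothetical lookup table built from the two lists above;
// mkql's internal representation may differ.
var TYPES = {
  p: 'paragraph', ul: 'list', ol: 'list', li: 'item',
  pre: 'code_block', blockquote: 'block_quote', hr: 'thematic_break',
  code: 'code', em: 'emph', strong: 'strong', a: 'link',
  br: 'linebreak', img: 'image',
  // markdown specific extensions
  nl: 'softbreak', text: 'text', html: 'html_block', inline: 'html_inline'
};

// Resolve a selector tag name to a node type, or null when unknown.
function nodeType(tag) {
  // h1 through h6 all map to the heading type
  if (/^h[1-6]$/.test(tag)) {
    return 'heading';
  }
  return Object.prototype.hasOwnProperty.call(TYPES, tag) ? TYPES[tag] : null;
}

console.log(nodeType('ul'));   // list
console.log(nodeType('h3'));   // heading
console.log(nodeType('html')); // html_block
```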
Descendant Combinator
Use whitespace for a descendant combinator or, if you prefer, use the explicit >> notation from CSS4:
ol li
ol >> li
Child Combinator
A selector such as ol li will find all descendants; use the child combinator operator when you just want direct children:
ol > li
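The difference between the two combinators is easiest to see when the target node is nested more than one level deep. The sketch below uses a toy tree of plain objects (the {type, id, children} shape and all function names are assumptions for illustration, not the mkast node format) and compares the equivalents of blockquote p and blockquote > p.

```javascript
// Toy document: a block quote holding a paragraph and a list whose
// item holds another paragraph.
var doc = {type: 'document', children: [
  {type: 'block_quote', children: [
    {type: 'paragraph', id: 'p1', children: []},
    {type: 'list', children: [
      {type: 'item', children: [
        {type: 'paragraph', id: 'p2', children: []}
      ]}
    ]}
  ]}
]};

// Collect ids of nodes of `type` whose ancestor chain satisfies `test`.
function select(node, type, test, ancestors, out) {
  ancestors = ancestors || [];
  out = out || [];
  if (node.type === type && test(ancestors)) {
    out.push(node.id);
  }
  node.children.forEach(function(child) {
    select(child, type, test, ancestors.concat(node), out);
  });
  return out;
}

// Descendant combinator: any ancestor has the given type.
function hasAncestor(type) {
  return function(ancestors) {
    return ancestors.some(function(a) { return a.type === type; });
  };
}

// Child combinator: the immediate parent has the given type.
function hasParent(type) {
  return function(ancestors) {
    var parent = ancestors[ancestors.length - 1];
    return Boolean(parent) && parent.type === type;
  };
}

var descendantIds = select(doc, 'paragraph', hasAncestor('block_quote'));
var childIds = select(doc, 'paragraph', hasParent('block_quote'));
console.log(descendantIds); // [ 'p1', 'p2' ]
console.log(childIds);      // [ 'p1' ]
```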
Adjacent Sibling Combinator
The adjacent sibling combinator is supported; select all lists that are directly preceded by a paragraph:
p + ul
Following Sibling Combinator
The following sibling combinator is supported; select code that is preceded by a text node:
p text ~ code
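To contrast the two sibling combinators, here is a small sketch over a flat list of sibling node types. The data and the function names adjacent and following are hypothetical; the logic follows the standard CSS meaning of + and ~ rather than mkql's source.

```javascript
// Sibling node types in document order (toy data).
var siblings = ['paragraph', 'list', 'code_block', 'list'];

// prev + next: `next` nodes whose immediately preceding sibling is `prev`.
function adjacent(types, prev, next) {
  var out = [];
  types.forEach(function(type, i) {
    if (type === next && i > 0 && types[i - 1] === prev) {
      out.push(i);
    }
  });
  return out;
}

// prev ~ next: `next` nodes with a `prev` sibling anywhere before them.
function following(types, prev, next) {
  var out = [];
  types.forEach(function(type, i) {
    if (type === next && types.slice(0, i).indexOf(prev) !== -1) {
      out.push(i);
    }
  });
  return out;
}

console.log(adjacent(siblings, 'paragraph', 'list'));  // [ 1 ]
console.log(following(siblings, 'paragraph', 'list')); // [ 1, 3 ]
```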
Attribute Selectors
You can match on attributes in the usual way, but attributes are matched against tree nodes, not HTML elements, so the attribute names are often different.
a[href^=http://domain.com]
See attribute selectors (@mdn) for more information on the available operators.
The operator =~ (not to be confused with ~=) is a non-standard operator that may be used to match by regular expression pattern:
img[src=~\.(png|jpg)$]
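As a rough illustration of how these operators compare an attribute value, the sketch below implements the operators that appear in this document with their standard CSS meaning, plus the =~ extension as a regular expression test. The function name matchAttribute is hypothetical and this is not mkql's implementation.

```javascript
// Compare an attribute value against an expected value.
function matchAttribute(actual, operator, expected) {
  if (typeof actual !== 'string') {
    return false;
  }
  switch (operator) {
    case '=':  // exact match
      return actual === expected;
    case '^=': // starts with
      return actual.indexOf(expected) === 0;
    case '$=': // ends with
      return expected.length <= actual.length &&
        actual.lastIndexOf(expected) === actual.length - expected.length;
    case '~=': // one of the whitespace separated words
      return actual.split(/\s+/).indexOf(expected) !== -1;
    case '=~': // regular expression match (non-standard)
      return new RegExp(expected).test(actual);
    default:
      return false;
  }
}

console.log(matchAttribute('http://domain.com/page', '^=', 'http://domain.com')); // true
console.log(matchAttribute('logo.png', '=~', '\\.(png|jpg)$')); // true
console.log(matchAttribute('logo.gif', '=~', '\\.(png|jpg)$')); // false
```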
Literal Attribute
For all nodes that have a literal property you may match on the attribute.
p text[literal~=example]
Nodes that have a literal property include:
pre: code_block
code: code
text: text
html: html_block
inline: html_inline
Content Attribute
The content attribute is available for containers that can contain text nodes. This is a more powerful (but slower) method to match on the text content.
Consider the document:
Paragraph with some *emphasis* and *italic*.
If we select on the literal attribute we would get a text node, for example:
p [literal^=emph]
This results in the child text node with a literal value of emphasis. Often we may wish to match the parent element instead; to do so, use the content attribute:
p [content^=emph]
This returns the emph node containing the text node matched by the previous literal query.
The value for the content attribute is all the child text nodes concatenated together, which is why it will always be less performant than matching on the literal attribute.
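The relationship between literal and content can be sketched with a toy tree for the example document. The {type, literal, children} node shape and every function name here are assumptions for illustration; the sketch joins only direct child text nodes, as the description above states.

```javascript
// Toy tree for: Paragraph with some *emphasis* and *italic*.
function text(literal) {
  return {type: 'text', literal: literal, children: []};
}
function emph(literal) {
  return {type: 'emph', children: [text(literal)]};
}
var paragraph = {type: 'paragraph', children: [
  text('Paragraph with some '), emph('emphasis'),
  text(' and '), emph('italic'), text('.')
]};

// content: the literals of a node's child text nodes joined together.
function content(node) {
  return node.children
    .filter(function(child) { return child.type === 'text'; })
    .map(function(child) { return child.literal; })
    .join('');
}

function startsWith(value, prefix) {
  return typeof value === 'string' && value.indexOf(prefix) === 0;
}

// Walk all descendants, collecting the types of nodes accepted by `test`.
function find(node, test, out) {
  out = out || [];
  node.children.forEach(function(child) {
    if (test(child)) {
      out.push(child.type);
    }
    find(child, test, out);
  });
  return out;
}

// Equivalent of p [literal^=emph]: matches the text node itself
var byLiteral = find(paragraph, function(n) {
  return startsWith(n.literal, 'emph');
});
// Equivalent of p [content^=emph]: matches the container of that text node
var byContent = find(paragraph, function(n) {
  return startsWith(content(n), 'emph');
});
console.log(byLiteral); // [ 'text' ]
console.log(byContent); // [ 'emph' ]
```

The extra filter, map and join on every candidate node is the cost that makes content slower than a direct literal comparison.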
Anchor Attributes
Links support the href and title attributes.
a[href^=http://]
a[title^=Example]
Image Attributes
Images support the src and title attributes.
img[src$=.jpg]
img[title^=Example]
Code Block Attributes
Code blocks support the info and fenced attributes.
pre[info^=javascript]
pre[fenced]
List Attributes
The list and item types (ul, ol and li) support the bullet and delimiter attributes.
So you can select elements depending upon the bullet character used (unordered lists) or the delimiter (ordered lists). For the bullet attribute valid values are +, * and -; for the delimiter attribute valid values are . or ).
This selector will match lists declared using the * character:
ul[bullet=*]
Or for all ordered lists declared using the 1) style:
ol[delimiter=)]
Combine with a descendant selector to get list items:
ul li[bullet=+]
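To make the valid attribute values concrete, the sketch below derives a bullet or delimiter from the first line of a list item. The parser behind mkql produces these values itself; the function name listMarker and the regular expressions are hypothetical and only illustrate where the values come from.

```javascript
// Derive the bullet or delimiter from the first line of a list item.
function listMarker(line) {
  // unordered lists: one of + * - followed by whitespace
  var bullet = /^\s*([+*-])\s/.exec(line);
  if (bullet) {
    return {bullet: bullet[1]};
  }
  // ordered lists: digits followed by . or ) and whitespace
  var ordered = /^\s*\d+([.)])\s/.exec(line);
  if (ordered) {
    return {delimiter: ordered[1]};
  }
  return null;
}

console.log(listMarker('* item'));  // { bullet: '*' }
console.log(listMarker('1) item')); // { delimiter: ')' }
```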
Pseudo Classes
The pseudo classes :first-child, :last-child, :only-child and :nth-child are supported.
p a:first-child
p a:last-child
ul li:nth-child(5)
ul li:nth-child(2n+1)
ul li:nth-child(odd) /* same as above */
ul li:nth-child(2n)
ul li:nth-child(even) /* same as above */
ul li:only-child
See the :nth-child docs (@mdn) for more information.
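The an+b notation used by :nth-child can be worked through in code. The sketch below follows the standard CSS definition (a 1-based position matches when it equals a*n+b for some integer n >= 0); parseNth, matchesNth and positions are hypothetical names and this is not taken from mkql's source.

```javascript
// Parse an :nth-child() argument into the coefficients of an+b.
function parseNth(expr) {
  expr = expr.replace(/\s+/g, '').toLowerCase();
  if (expr === 'odd') {
    return {a: 2, b: 1};
  }
  if (expr === 'even') {
    return {a: 2, b: 0};
  }
  if (/^[+-]?\d+$/.test(expr)) {
    return {a: 0, b: parseInt(expr, 10)};
  }
  var m = /^([+-]?\d*)n([+-]\d+)?$/.exec(expr);
  if (!m) {
    throw new Error('invalid nth-child expression: ' + expr);
  }
  var a = m[1] === '' || m[1] === '+' ? 1
    : m[1] === '-' ? -1
    : parseInt(m[1], 10);
  return {a: a, b: m[2] ? parseInt(m[2], 10) : 0};
}

// True when the 1-based position equals a*n+b for some integer n >= 0.
function matchesNth(position, expr) {
  var nth = parseNth(expr);
  if (nth.a === 0) {
    return position === nth.b;
  }
  var diff = position - nth.b;
  return diff % nth.a === 0 && diff / nth.a >= 0;
}

// Positions 1..6 matched by an expression
function positions(expr) {
  return [1, 2, 3, 4, 5, 6].filter(function(p) {
    return matchesNth(p, expr);
  });
}

console.log(positions('2n+1')); // [ 1, 3, 5 ]
console.log(positions('odd'));  // [ 1, 3, 5 ]
console.log(positions('2n'));   // [ 2, 4, 6 ]
console.log(positions('5'));    // [ 5 ]
```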
Relational
The relational pseudo-class :has is useful for selecting parents based on a condition:
p:has(em)
a:has(> img)
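The two forms differ in how deep they look: :has(em) accepts a match at any depth, while :has(> img) only considers direct children. A minimal sketch on toy nodes follows; the {type, children} shape and the function names are assumptions for illustration.

```javascript
// Build a toy node.
function node(type, children) {
  return {type: type, children: children || []};
}

// :has(type): some descendant at any depth has the given type.
function hasDescendant(n, type) {
  return n.children.some(function(child) {
    return child.type === type || hasDescendant(child, type);
  });
}

// :has(> type): some direct child has the given type.
function hasChild(n, type) {
  return n.children.some(function(child) {
    return child.type === type;
  });
}

// A link wrapping emphasised content that wraps an image
var link = node('link', [node('emph', [node('image')])]);

console.log(hasDescendant(link, 'image')); // true
console.log(hasChild(link, 'image'));      // false
console.log(hasChild(link, 'emph'));       // true
```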
Negation
The negation pseudo-class :not is also available:
p:not(:first-child)
Empty
Use the :empty pseudo-class to select nodes with no children:
p :empty
Pseudo Elements
Use the pseudo element prefix :: to select elements not directly in the tree.
HTML
The pseudo elements used to select the html_block and html_inline nodes by type are:
::comment: Select comments <!-- -->
::pi: Select processing instructions <? ?>
::doctype: Select doctype declarations <!doctype html>
::cdata: Select CDATA declarations <![CDATA[]]>
::element: Select block and inline elements <div></div>
::doctype /* select doctype declarations */
p ::comment /* select inline html comments */
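One way to picture these pseudo elements is as a classification of the literal of an html_block or html_inline node by its opening characters. This document does not describe how mkql detects each kind, so the sketch below is only an assumption; htmlKind is a hypothetical name.

```javascript
// Classify an HTML literal by the characters it starts with.
function htmlKind(literal) {
  var s = literal.replace(/^\s+/, '');
  if (s.indexOf('<!--') === 0) {
    return 'comment';
  }
  if (s.indexOf('<![CDATA[') === 0) {
    return 'cdata';
  }
  if (/^<!doctype/i.test(s)) {
    return 'doctype';
  }
  if (s.indexOf('<?') === 0) {
    return 'pi';
  }
  // anything else is treated as a block or inline element
  return 'element';
}

console.log(htmlKind('<!-- note -->'));   // comment
console.log(htmlKind('<!doctype html>')); // doctype
console.log(htmlKind('<div></div>'));     // element
```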
Help
Usage: mkql [-dprmnh] [--delete] [--preserve] [--range] [--multiple]
[--newline] [--help] [--version] <selector...>
mkql [-dprmnh] [--delete] [--preserve] [--multiple] [--newline] [--help]
[--version] --range <start-selector> [end-selector]
Query documents with selectors.
Options
-d, --delete     Remove matched nodes
-p, --preserve   Preserve text when deleting
-r, --range      Execute a range query
-m, --multiple   Include multiple ranges
-n, --newline    Add line break between matches
-h, --help       Display help and exit
--version        Print the version and exit
API
compile
compile(source)
Compile a source selector string to a tree representation.
Returns Object result tree.
source: String input selector.
range
range(start[, end])
Compile a range query.
When an end selector is given it must have the same number of selectors in the list as the start selector.
If the end selector is not given the range will end when the start selector matches again or the end of file is reached.
start: String selector to start the range match.
end: String selector to end the range match.
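The behaviour when no end selector is given can be sketched over a flat list of nodes: the range opens at the first start match and closes when the start test matches again or the input runs out. The function name firstRange and the string stand-ins for nodes are hypothetical, and including the starting node in the result is an assumption of this sketch rather than documented behaviour.

```javascript
// Collect the first range from a flat list when only a start test is given.
function firstRange(nodes, isStart) {
  var out = [];
  var open = false;
  for (var i = 0; i < nodes.length; i++) {
    if (isStart(nodes[i])) {
      if (open) {
        break; // start matched again: the range ends here
      }
      open = true;
    }
    if (open) {
      out.push(nodes[i]); // assumption: the start node is part of the range
    }
  }
  return out; // the loop also ends when the input is exhausted
}

// Strings stand in for nodes: "type:label"
var nodes = ['p:intro', 'h2:Install', 'p:npm', 'pre:code', 'h2:Usage', 'p:more'];
var range = firstRange(nodes, function(n) {
  return n.indexOf('h2:') === 0;
});
console.log(range); // [ 'h2:Install', 'p:npm', 'pre:code' ]
```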
slice
slice(source[, opts])
Execute a range query on the input nodes.
Returns Range query execution object.
source: Object compiled range query.
opts: Object range query options.
query
query(markdown, source[, opts])
Query a markdown document tree with a source selector.
If the markdown parameter is a string it is parsed into a document tree.
If the given source selector is a string it is compiled otherwise it should be a previously compiled result tree.
If the source selector appears to be a range query the slice function is called with the range query.
Returns Array list of matched nodes.
markdown: Array|Object|String input data.
source: String|Object input selector.
opts: Object query options.
ql
ql([opts][, cb])
Run queries on an input stream.
Returns an output stream.
opts: Object processing options.
cb: Function callback function.
Options
input: Readable input stream.
output: Writable output stream.
License
MIT
Created by mkdoc on April 24, 2016