Advanced Topics
Intermediate output structure
The parser produces a dict (Python) parse tree, which is later transformed into XML. This intermediate step makes it easier to adapt the parser's output it AKN's sometimes finnicky requirements.
You can see an example of this structure by running bluebell act act.txt --json
{
type: 'type of the element, such as "hier", "block" etc.',
name: 'name of the element, such as "body", "part" or "p",
attribs: {
key1: value1,
key2: value2,
},
children: [],
}
Valid types:
element: simple elementhier: hierarchical elementblock: block elementcontent: content element, such asporlistIntroductioninline: inline element, such asborreftext: plain textmarker: a marker element that cannot contain children, such asimg
Children
Most elements can contain children. This is an array of elements.
These elements cannot contain children: text, marker.
Text elements (text)
The content of a text element is in the value attribute.
An element that contains a single, pure-text child, can instead have a text attribute containing the equivalent of the value
attribute of the text element.
Inline elements (inline)
Inline elements such as b and term may have other inline or text children,
or a single text attribute.
Block elements (block)
Block elements cannot have hier children.
num: number; inline element
Hierarchical elements (hier)
heading: heading element; a block elementsubheading: subheading element; a block elementnum: number; inline element
Footnotes
Footnotes are treated specially because they are a subflow element that appears inline. The are split into two parts: a reference to the footnote inline, and the footnote content:
This text[++FN 1++] has a footnote.
FOOTNOTE 1
the content of the footnote
When converting the parse tree into XML, the footnote content must be inserted into
the point where it is referenced. This is done by post-processing the XML. A special non-AKN
<displaced name="footnote" marker="1">...</displaced> element is inserted in the
XML where the footnote content appears. The reference to the footnote inline is an
appropriate authorialNote element with a displaced="footnote" attribute.
Post-processing code then matches the <displaced> element with its <authorialNote>
using the name and marker, moves the children of the displaced into the authorialNote,
and removes the displaced element.
Example
{
type: 'hier',
name: 'chapter',
num: '1',
heading: 'Definitions and fundamental principles',
children: [{
type: 'hier',
name: 'section',
num: '1.',
heading: 'Definitions',
children: [{
type: 'inline',
name: 'p',
text: 'In this By-law, unless the context indicates otherwise -',
}, {
type: 'inline',
name: 'p',
text: '"Air Quality" means:',
}, {
type: 'block',
name: 'list',
children: [{
type: 'block',
name: 'item',
num: '(a)',
children: [{
type: 'inline',
name: 'p',
children: [{
type: 'text',
value: 'something with '
}, {
type: 'inline',
name: 'b',
text: 'bold',
}, {
type: 'text',
value: ' in it.'
}],
}],
}, {
type: 'block',
name: 'item',
num: '(b)',
children: [{
type: 'inline',
name: 'p',
children: [{
type: 'text',
value: 'text with a',
}, {
type: 'marker',
name: 'eol',
}, {
type: 'text',
value: 'newline in it.',
}],
}],
}],
}],
}],
}