Phase II of the Markdown parser: fragmenting inline material into emphasis, links, backticked code and so forth.
§1. Disclaimer. Do not call functions in this section directly: use the API in Markdown.
§2. Inline code. Phase II of the parser consists entirely of walking the tree built in Phase I, with the following function.
void MDInlineParser::inline_recursion(markdown_variation *variation,
    md_links_dictionary *link_refs, markdown_item *at) {
    if (at == NULL) return;
    if ((at->type == PARAGRAPH_MIT) ||
        (at->type == HEADING_MIT) ||
        ((at->type == TABLE_COLUMN_MIT) && (Str::len(at->stashed) > 0))) {
        markdown_item *matter = MDInlineParser::inline(variation, link_refs, at->stashed);
        Markdown::add_to(matter, at);
    }
    for (markdown_item *c = at->down; c; c = c->next)
        MDInlineParser::inline_recursion(variation, link_refs, c);
}
§3. What it does, then, is to look at the text of paragraphs, headings and non-empty table columns, and to process it as "inline matter", which may contain emphasis, code excerpts, raw HTML, images or links. Since all these constructions can be nested, the result is a further tree, using a whole lot of new items never generated in Phase I.
To recap the syntaxes we're looking for:
This is a line of plain copy.
This one has `a backtick string`.
This one an autolink, <stanley.gibbons@heaven.org>, in email form.
This one an autolink, <https://www.stanleygibbons.com>, in web form.
This one [uses a link](https://www.stanleygibbons.com).
This one depicts .
This one uses **strong** and *weak* emphasis.
This one also uses __strong__ and _weak_ emphasis.
Every inline string becomes a subtree under a MATERIAL_MIT item. We then
proceed in three stages: make the "chain", which involves finding backtick
strings, raw HTML and autolinks, all constructions of equal precedence.
Then we look for links and images, images being essentially a form of link
for most purposes. Finally, but with some difficulty, we look for emphasis
notation.
markdown_item *MDInlineParser::inline(markdown_variation *variation, md_links_dictionary *link_refs, text_stream *text) { markdown_item *owner = Markdown::new_item(MATERIAL_MIT); MDInlineParser::make_inline_chain(variation, owner, text); MDInlineParser::links_and_images(variation, link_refs, owner, FALSE); int mask = 0; if (MarkdownVariations::supports(variation, ASTERISK_EMPHASIS_MARKDOWNFEATURE)) mask += ASTERISK_EMPHASIS_BIT; if (MarkdownVariations::supports(variation, UNDERSCORE_EMPHASIS_MARKDOWNFEATURE)) mask += UNDERSCORE_EMPHASIS_BIT; if (MarkdownVariations::supports(variation, STRIKETHROUGH_MARKDOWNFEATURE)) mask += TILDE_STRIKETHROUGH_BIT; if (MarkdownVariations::supports(variation, ALT_TEX_MARKDOWNFEATURE)) mask += DOLLAR_TEX_BIT; if (MarkdownVariations::supports(variation, ALT_INWEB_LINKS_MARKDOWNFEATURE)) mask += SLASHSLASH_LINK_BIT; if (mask) MDInlineParser::emphasis(variation, owner, mask); return owner; } markdown_item *MDInlineParser::make_inline_chain(markdown_variation *variation, markdown_item *owner, text_stream *text) { int i = 0; while (Str::get_at(text, i) == ' ') i++; int from = i, escaped = FALSE; for (; i<Str::len(text); i++) { if ((escaped == FALSE) && (Str::get_at(text, i) == '\\') && (Characters::is_ASCII_punctuation(Str::get_at(text, i+1)))) { escaped = TRUE; } else { if (escaped == FALSE) { Does a backtick begin here?3.2; Does TeX mathematics begin here?3.3; Does an Inweb link begin here?3.4; Does an index mark begin here?3.5; Does an autolink begin here?3.6; Does a raw HTML tag begin here?3.11; } Does a hard or soft line break occur here?3.12; ContinueOuter: ; escaped = FALSE; } } if (from <= Str::len(text)-1) { int to = Str::len(text)-1; while (Str::get_at(text, to) == ' ') to--; if (to >= from) { markdown_item *md = Markdown::new_slice(PLAIN_MIT, text, from, to); Markdown::add_to(md, owner); } } return owner; }
§3.1. See CommonMark 6.1: "A backtick string is a string of one or more backtick
characters that is neither preceded nor followed by a backtick." This returns
the length of the backtick string beginning at position at, or 0 if no backtick
string begins there.
int MDInlineParser::backtick_string(text_stream *text, int at, inchar32_t btc) {
    int count = 0;
    while (Str::get_at(text, at + count) == btc) count++;
    if (count == 0) return 0;
    if ((at > 0) && (Str::get_at(text, at - 1) == btc)) return 0;
    return count;
}
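To make the return values concrete, here is a minimal standalone sketch of the same test on an ordinary NUL-terminated C string. The name backtick_run_length is hypothetical and the bounds checks are an addition for plain C strings; it is an illustration only, not part of the parser.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the function above, working on a C string.
   Returns the length of the run of btc characters starting at index at,
   or 0 if no run starts there. An index in the middle of a longer run
   does not count as a start, because it is preceded by btc. */
int backtick_run_length(const char *text, int at, char btc) {
    assert(text != NULL);
    int len = (int) strlen(text);
    if ((at < 0) || (at >= len)) return 0;
    int count = 0;
    while ((at + count < len) && (text[at + count] == btc)) count++;
    if (count == 0) return 0;
    if ((at > 0) && (text[at - 1] == btc)) return 0;
    return count;
}
```

On the text a ``b`` c, index 2 begins a backtick string of length 2, whereas index 3 yields 0 because it lies inside that same run.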
§3.2. Does a backtick begin here?3.2 =
inchar32_t btc = 0; if (MarkdownVariations::supports(variation, BACKTICKED_CODE_MARKDOWNFEATURE)) btc = '`'; if (MarkdownVariations::supports(variation, STROKED_CODE_MARKDOWNFEATURE)) btc = '|'; if (btc) { int count = MDInlineParser::backtick_string(text, i, btc); if (count > 0) { for (int j=i+count+1; j<Str::len(text); j++) { if (MDInlineParser::backtick_string(text, j, btc) == count) { if (i-1 >= from) { markdown_item *md = Markdown::new_slice(PLAIN_MIT, text, from, i-1); Markdown::add_to(md, owner); } Insert an inline code item3.2.1; i = j+count; from = j+count; goto ContinueOuter; } } } }
- This code is used in §3.
§3.2.1. "The contents of the code span are the characters between these two backtick strings". Inside it, "line endings are converted to spaces", and "If the resulting string both begins and ends with a space character, but does not consist entirely of space characters, a single space character is removed from the front and back."
Insert an inline code item3.2.1 =
int start = i+count, end = j-1; text_stream *codespan = Str::new(); int all_spaces = TRUE; for (int k=start; k<=end; k++) { inchar32_t c = Str::get_at(text, k); if (c == '\n') c = ' '; if (c != ' ') all_spaces = FALSE; PUT_TO(codespan, c); } if ((all_spaces == FALSE) && (Str::get_first_char(codespan) == ' ') && (Str::get_last_char(codespan) == ' ')) { markdown_item *md = Markdown::new_slice(CODE_MIT, codespan, 1, Str::len(codespan)-2); Markdown::set_backtick_count(md, count); Markdown::add_to(md, owner); } else { markdown_item *md = Markdown::new_slice(CODE_MIT, codespan, 0, Str::len(codespan)-1); Markdown::set_backtick_count(md, count); Markdown::add_to(md, owner); }
- This code is used in §3.2.
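The normalisation rules quoted in §3.2.1 can be sketched in isolation as follows. This is a simplified illustration on a mutable C string, under the assumption that only '\n' counts as a line ending; the function name normalise_code_span is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Normalise raw code-span contents in place: each line ending becomes a
   space, and then one leading and one trailing space are removed, but only
   if both are present and the span is not made up entirely of spaces. */
void normalise_code_span(char *span) {
    assert(span != NULL);
    size_t len = strlen(span);
    int all_spaces = 1;
    for (size_t k = 0; k < len; k++) {
        if (span[k] == '\n') span[k] = ' ';
        if (span[k] != ' ') all_spaces = 0;
    }
    if ((all_spaces == 0) && (span[0] == ' ') && (span[len - 1] == ' ')) {
        /* A non-space character exists, so len is at least 3 here */
        memmove(span, span + 1, len - 2);
        span[len - 2] = '\0';
    }
}
```

So the contents of a span written with padding spaces around two backticks lose exactly one space from each end, while a span consisting only of spaces is left alone.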
§3.3. Until December 2025, TeX was handled as a form of inline emphasis markup, that
is, $ and $$ were treated as of equal precedence with _ and *. This
worked well in some ways, but had poor running time on really messy TeX code
because there were too many potential matches, and also meant that e.g.
now read~$N$ would fail to work because the non-breaking space ~ would
prevent $ from being recognised as a TeX opening. It seems cleaner to regard
TeX as on a par with backticked code in precedence, so that's what we now do,
and this removes both problems.
The code for the old way remains, though, as the ALT_TEX_MARKDOWNFEATURE.
They should not both be used.
Does TeX mathematics begin here?3.3 =
if (MarkdownVariations::supports(variation, TEX_MARKDOWNFEATURE)) { if (Str::get_at(text, i) == '$') { int j = i+1, count = 1; if (Str::get_at(text, j) == '$') j++, count++; for (; j<Str::len(text); j++) { if ((Str::get_at(text, j) == '$') && ((count == 1) || (Str::get_at(text, j+1) == '$'))) { if (i-1 >= from) { markdown_item *md = Markdown::new_slice(PLAIN_MIT, text, from, i-1); Markdown::add_to(md, owner); } Insert a TeX maths item3.3.1; i = j+count; from = j+count; goto ContinueOuter; } } } }
- This code is used in §3.
§3.3.1. Insert a TeX maths item3.3.1 =
int start = i+count, end = j-1; markdown_item *md = Markdown::new_slice((count == 1)?TEX_MIT:DISPLAYED_TEX_MIT, Str::duplicate(text), start, end); Markdown::add_to(md, owner);
- This code is used in §3.3.
§3.4. Does an Inweb link begin here?3.4 =
if (MarkdownVariations::supports(variation, INWEB_LINKS_MARKDOWNFEATURE)) { inchar32_t pc = 0; if (i > 0) pc = Str::get_at(text, i-1); if ((pc != '/') && (pc != ':') && (Str::get_at(text, i) == '/') && (Str::get_at(text, i+1) == '/')) { int j = i+2; for (; j<Str::len(text); j++) { if ((Str::get_at(text, j-1) != ':') && (Str::get_at(text, j) == '/') && (Str::get_at(text, j+1) == '/')) { if (i-1 >= from) { markdown_item *md = Markdown::new_slice(PLAIN_MIT, text, from, i-1); Markdown::add_to(md, owner); } Insert an Inweb link item3.4.1; i = j+2; from = j+2; goto ContinueOuter; } } } }
- This code is used in §3.
§3.4.1. Insert an Inweb link item3.4.1 =
int start = i+2, end = j-1; markdown_item *md = Markdown::new_item(INWEB_LINK_MIT); markdown_item *address_md = Markdown::new_slice(PLAIN_MIT, Str::duplicate(text), start, end); Markdown::add_to(address_md, md); Markdown::add_to(md, owner);
- This code is used in §3.4.
§3.5. This provides an extension borrowed from traditional TeX manual-indexing notation, also used by indoc.
Does an index mark begin here?3.5 =
if (MarkdownVariations::supports(variation, INDEXING_MARKS_MARKDOWNFEATURE)) { if (Str::get_at(text, i) == '^') { int j = i+1, count = 1; if (Str::get_at(text, j) == '^') j++, count++; if (Str::get_at(text, j) == '^') j++, count++; if (Str::get_at(text, j) == '{') { TEMPORARY_TEXT(lemma) j++; int escaped = FALSE, end_found = FALSE; while (Str::get_at(text, j)) { if ((escaped == FALSE) && (Str::get_at(text, j) == '\\')) escaped = TRUE; else { if ((escaped == FALSE) && (Str::get_at(text, j) == '}')) { end_found = TRUE; break; } PUT_TO(lemma, Str::get_at(text, j)); escaped = FALSE; } j++; } if (end_found) { if (i-1 >= from) { markdown_item *md = Markdown::new_slice(PLAIN_MIT, text, from, i-1); Markdown::add_to(md, owner); } Insert an index item3.5.1; i = j; from = j+1; } DISCARD_TEXT(lemma) if (end_found) goto ContinueOuter; } } }
- This code is used in §3.
§3.5.1. Insert an index item3.5.1 =
markdown_item *md = Markdown::new_item(INDEX_MARKER_MIT); md->stashed = Str::duplicate(lemma); md->details = count; Markdown::add_to(md, owner); if (count == 1) { markdown_item *md = Markdown::new_slice(PLAIN_MIT, Str::duplicate(lemma), 0, Str::len(lemma)-1); Markdown::add_to(md, owner); }
- This code is used in §3.5.
§3.6. Does an autolink begin here?3.6 =
if (Str::get_at(text, i) == '<') { for (int j=i+1; j<Str::len(text); j++) { inchar32_t c = Str::get_at(text, j); if (c == '>') { int link_from = i+1, link_to = j-1, count = j-i+1; if (tracing_Markdown_parser) { text_stream *OUT = STDOUT; WRITE("Investigating potential autolink: "); for (int k=i; k<=j; k++) PUT(Str::get_at(text, k)); WRITE("\n"); } if (MarkdownVariations::supports(variation, WEB_AUTOLINKS_MARKDOWNFEATURE)) Test for URI autolink3.6.1; if (MarkdownVariations::supports(variation, EMAIL_AUTOLINKS_MARKDOWNFEATURE)) Test for email autolink3.6.2; break; } if ((c == '<') || (Characters::is_Unicode_whitespace(c)) || (Characters::is_control_character(c))) break; } } if (MarkdownVariations::supports(variation, EXTENDED_AUTOLINKS_MARKDOWNFEATURE)) { if ((MDInlineParser::extended_autolink_domain_char(Str::get_at(text, i))) && ((i == 0) || (Str::get_at(text, i-1) == '\n') || (Str::get_at(text, i-1) == '*') || (Str::get_at(text, i-1) == '_') || (Str::get_at(text, i-1) == '~') || ((Str::get_at(text, i-1) == '(') && (Str::get_at(text, i-2) != ']')) || (Characters::is_Unicode_whitespace(Str::get_at(text, i-1))))) Test for extended autolink3.6.3; }
- This code is used in §3.
§3.6.1. "A URI autolink consists of... a scheme followed by a colon followed by zero or more characters other than ASCII control characters, space, <, and >... a scheme is any sequence of 2–32 characters beginning with an ASCII letter and followed by any combination of ASCII letters, digits, or the symbols plus, period, or hyphen."
Test for URI autolink3.6.1 =
int colon_at = -1; for (int k=link_from; k<=link_to; k++) if (Str::get_at(text, k) == ':') { colon_at = k; break; } if (colon_at >= 0) { int scheme_valid = TRUE; Vet the scheme3.6.1.1; int link_valid = TRUE; Vet the link3.6.1.2; if ((scheme_valid) && (link_valid)) { if (i-1 >= from) { markdown_item *md = Markdown::new_slice(PLAIN_MIT, text, from, i-1); Markdown::add_to(md, owner); } markdown_item *md = Markdown::new_slice(URI_AUTOLINK_MIT, text, link_from, link_to); Markdown::add_to(md, owner); i = link_to+1; from = link_to+2; if (tracing_Markdown_parser) PRINT("Found URI from = %c\n", Markdown::get_at(md, from)); goto ContinueOuter; } else if (tracing_Markdown_parser) { if (scheme_valid == FALSE) PRINT("Colon suggested URI but scheme invalid\n"); if (link_valid == FALSE) PRINT("Colon suggested URI but link invalid\n"); } } else { if (tracing_Markdown_parser) PRINT("Not a URI: no colon\n"); }
- This code is used in §3.6.
§3.6.1.1. Vet the scheme3.6.1.1 =
int scheme_length = colon_at - link_from; if ((scheme_length < 2) || (scheme_length > 32)) scheme_valid = FALSE; for (int i=link_from; i<colon_at; i++) { inchar32_t c = Str::get_at(text, i); if (!((Characters::is_ASCII_letter(c)) || ((i > link_from) && ((Characters::is_ASCII_digit(c)) || (c == '+') || (c == '-') || (c == '.'))))) scheme_valid = FALSE; }
- This code is used in §3.6.1.
§3.6.1.2. Vet the link3.6.1.2 =
for (int i=colon_at+1; i<=link_to; i++) { inchar32_t c = Str::get_at(text, i); if ((c == '<') || (c == '>') || (c == ' ') || (Characters::is_control_character(c))) link_valid = FALSE; }
- This code is used in §3.6.1.
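The two vetting steps can be combined into one standalone check on the text between the angle brackets. The sketch below follows the CommonMark wording quoted in §3.6.1; the function names are hypothetical, and it assumes that "control character" means the ASCII range below 0x20 plus 0x7F.

```c
#include <assert.h>
#include <string.h>

int is_ascii_letter(char c) {
    return (((c >= 'a') && (c <= 'z')) || ((c >= 'A') && (c <= 'Z')));
}

int is_ascii_digit(char c) {
    return ((c >= '0') && (c <= '9'));
}

/* Returns 1 if link, the text between < and >, is a valid URI autolink. */
int is_uri_autolink(const char *link) {
    assert(link != NULL);
    const char *colon = strchr(link, ':');
    if (colon == NULL) return 0;
    /* The scheme: 2 to 32 characters, a letter first, then letters,
       digits, plus, period or hyphen */
    int scheme_length = (int) (colon - link);
    if ((scheme_length < 2) || (scheme_length > 32)) return 0;
    for (const char *p = link; p < colon; p++) {
        if (is_ascii_letter(*p)) continue;
        if ((p > link) && ((is_ascii_digit(*p)) ||
            (*p == '+') || (*p == '-') || (*p == '.'))) continue;
        return 0;
    }
    /* The rest: no angle brackets, spaces or control characters */
    for (const char *p = colon + 1; *p; p++) {
        unsigned char c = (unsigned char) *p;
        if ((c == '<') || (c == '>') || (c == ' ') || (c < 0x20) || (c == 0x7F))
            return 0;
    }
    return 1;
}
```

Note that an email address such as stanley.gibbons@heaven.org fails this test simply because it contains no colon, which is why the email test is tried next.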
§3.6.2. Test for email autolink3.6.2 =
int atsign_at = -1;
for (int k=link_from; k<=link_to; k++)
    if (Str::get_at(text, k) == '@') { atsign_at = k; break; }
if (atsign_at >= 0) {
    int username_valid = TRUE;
    Vet the username3.6.2.1;
    int domain_valid = TRUE;
    Vet the domain name3.6.2.2;
    if ((username_valid) && (domain_valid)) {
        if (i-1 >= from) {
            markdown_item *md = Markdown::new_slice(PLAIN_MIT, text, from, i-1);
            Markdown::add_to(md, owner);
        }
        markdown_item *md = Markdown::new_slice(EMAIL_AUTOLINK_MIT, text, link_from, link_to);
        Markdown::set_add_protocol_state(md, TRUE);
        Markdown::add_to(md, owner);
        /* resume at the closing '>', as for URI autolinks: the loop then steps past it */
        i = link_to+1; from = link_to+2;
        if (tracing_Markdown_parser) PRINT("Found email\n");
        goto ContinueOuter;
    } else if (tracing_Markdown_parser) {
        if (username_valid == FALSE) PRINT("At suggested email but username invalid\n");
        if (domain_valid == FALSE) PRINT("At suggested email but domain invalid\n");
    }
} else {
    if (tracing_Markdown_parser) PRINT("Not an email: no at-sign\n");
}
- This code is used in §3.6.
§3.6.2.1. What constitutes a legal email address follows the HTML 5 regular expression,
according to CommonMark. Good luck using {{@1-x.2.z.w as your email address,
but you absolutely can.
Vet the username3.6.2.1 =
int username_length = atsign_at - link_from; if (username_length < 1) username_valid = FALSE; for (int i=link_from; i<atsign_at; i++) { inchar32_t c = Str::get_at(text, i); if (!((Characters::is_ASCII_letter(c)) || (Characters::is_ASCII_digit(c)) || (c == '.') || (c == '!') || (c == '#') || (c == '$') || (c == '%') || (c == '&') || (c == '\'') || (c == '*') || (c == '+') || (c == '/') || (c == '=') || (c == '?') || (c == '^') || (c == '_') || (c == '`') || (c == '{') || (c == '|') || (c == '}') || (c == '~') || (c == '-'))) username_valid = FALSE; }
- This code is used in §3.6.2.
§3.6.2.2. Vet the domain name3.6.2.2 =
int segment_length = 0; for (int i=atsign_at+1; i<=link_to; i++) { inchar32_t c = Str::get_at(text, i); if (segment_length == 0) { if (!((Characters::is_ASCII_letter(c)) || (Characters::is_ASCII_digit(c)))) domain_valid = FALSE; } else { if (c == '.') { segment_length = 0; continue; } if (c == '-') { if ((Str::get_at(text, i+1) == 0) || (Str::get_at(text, i+1) == '.')) domain_valid = FALSE; } else if (!((Characters::is_ASCII_letter(c)) || (Characters::is_ASCII_digit(c)))) domain_valid = FALSE; } segment_length++; if (segment_length >= 64) domain_valid = FALSE; } if (segment_length >= 64) domain_valid = FALSE;
- This code is used in §3.6.2.
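As a cross-check on the rules in §3.6.2.1 and §3.6.2.2, here is a standalone sketch which tests a whole address against the HTML 5 pattern that CommonMark cites: a non-empty username drawn from a fixed character set, then an at-sign, then dot-separated labels of 1 to 63 letters, digits or hyphens, none beginning or ending with a hyphen. It is written from that pattern rather than transcribed from the code above, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

int is_ascii_alnum(char c) {
    return (((c >= 'a') && (c <= 'z')) || ((c >= 'A') && (c <= 'Z')) ||
        ((c >= '0') && (c <= '9')));
}

/* Returns 1 if addr, the text between < and >, is a valid email autolink. */
int is_email_autolink(const char *addr) {
    assert(addr != NULL);
    const char *at = strchr(addr, '@');
    if ((at == NULL) || (at == addr)) return 0;
    for (const char *p = addr; p < at; p++)
        if ((is_ascii_alnum(*p) == 0) &&
            (strchr(".!#$%&'*+/=?^_`{|}~-", *p) == NULL)) return 0;
    const char *p = at + 1;
    while (1) {
        const char *label = p;
        while ((is_ascii_alnum(*p)) || (*p == '-')) p++;
        int label_length = (int) (p - label);
        if ((label_length < 1) || (label_length > 63)) return 0;
        if ((label[0] == '-') || (p[-1] == '-')) return 0;
        if (*p == '\0') return 1;
        if (*p != '.') return 0;
        p++; /* step over the dot to the next label */
    }
}
```

Under this sketch the unlikely address {{@1-x.2.z.w mentioned above is accepted, as is stanley.gibbons@heaven.org.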
§3.6.3. Extended autolinks are a GitHub-flavored Markdown extension, and allow (some) websites and email addresses to be turned into links without needing angle brackets around them. This part of GitHub's specification uses sloppily different criteria for what is a legal domain or email address, but I follow it as written.
Test for extended autolink3.6.3 =
int domain_from = i, to = i; int add_protocol = FALSE, email_address = FALSE, email_required = FALSE, xmpp = FALSE; Look for a domain prefix3.6.3.1; if ((email_required) || (to == domain_from)) Look for an email address opening3.6.3.2; if (((email_required == FALSE) || (email_address)) && (to > domain_from)) { if (tracing_Markdown_parser) { PRINT("Found valid extended autolink prefix: "); for (int j=domain_from; j<=to; j++) PUT_TO(STDOUT, Str::get_at(text, j)); PRINT("\n"); } Look for the rest of the domain3.6.3.3; int domain_name_invalid = FALSE; Test the domain name for validity3.6.3.4; if (domain_name_invalid == FALSE) { Push the link forwards through URL or resource3.6.3.5; Insert as an extended autolink3.6.3.6; } }
- This code is used in §3.6.
§3.6.3.1. Look for a domain prefix3.6.3.1 =
if ((Str::get_at(text, domain_from) == 'w') &&
    (Str::get_at(text, domain_from+1) == 'w') &&
    (Str::get_at(text, domain_from+2) == 'w') &&
    (Str::get_at(text, domain_from+3) == '.') &&
    (MDInlineParser::extended_autolink_domain_char(Str::get_at(text, domain_from+4)))) {
    add_protocol = TRUE;
    to = domain_from+4;
} else if ((Str::get_at(text, domain_from) == 'm') &&
    (Str::get_at(text, domain_from+1) == 'a') &&
    (Str::get_at(text, domain_from+2) == 'i') &&
    (Str::get_at(text, domain_from+3) == 'l') &&
    (Str::get_at(text, domain_from+4) == 't') &&
    (Str::get_at(text, domain_from+5) == 'o') &&
    (Str::get_at(text, domain_from+6) == ':') &&
    (MDInlineParser::extended_autolink_domain_char(Str::get_at(text, domain_from+7)))) {
    to = domain_from+7; email_required = TRUE;
} else if ((Str::get_at(text, domain_from) == 'x') &&
    (Str::get_at(text, domain_from+1) == 'm') &&
    (Str::get_at(text, domain_from+2) == 'p') &&
    (Str::get_at(text, domain_from+3) == 'p') &&
    (Str::get_at(text, domain_from+4) == ':') &&
    (MDInlineParser::extended_autolink_domain_char(Str::get_at(text, domain_from+5)))) {
    to = domain_from+5; email_required = TRUE; xmpp = TRUE;
} else if ((Str::get_at(text, domain_from) == 'h') &&
    (Str::get_at(text, domain_from+1) == 't') &&
    (Str::get_at(text, domain_from+2) == 't') &&
    (Str::get_at(text, domain_from+3) == 'p') &&
    (Str::get_at(text, domain_from+4) == ':') &&
    (Str::get_at(text, domain_from+5) == '/') &&
    (Str::get_at(text, domain_from+6) == '/') &&
    (MDInlineParser::extended_autolink_domain_char(Str::get_at(text, domain_from+7)))) {
    to = domain_from+7;
} else if ((Str::get_at(text, domain_from) == 'h') &&
    (Str::get_at(text, domain_from+1) == 't') &&
    (Str::get_at(text, domain_from+2) == 't') &&
    (Str::get_at(text, domain_from+3) == 'p') &&
    (Str::get_at(text, domain_from+4) == 's') &&
    (Str::get_at(text, domain_from+5) == ':') &&
    (Str::get_at(text, domain_from+6) == '/') &&
    (Str::get_at(text, domain_from+7) == '/') &&
    (MDInlineParser::extended_autolink_domain_char(Str::get_at(text, domain_from+8)))) {
    to = domain_from+8;
}
- This code is used in §3.6.3.
§3.6.3.2. Look for an email address opening3.6.3.2 =
int j = to; while ((MDInlineParser::extended_autolink_email_char(Str::get_at(text, j))) || (Str::get_at(text, j) == '+') || (Str::get_at(text, j) == '.')) j++; if ((j > to) && (Str::get_at(text, j) == '@')) { to = j+1; email_address = TRUE; if (email_required == FALSE) add_protocol = TRUE; }
- This code is used in §3.6.3.
§3.6.3.3. Look for the rest of the domain3.6.3.3 =
if (email_address) { while (TRUE) { if (MDInlineParser::extended_autolink_email_char(Str::get_at(text, to+1))) to++; else if ((Str::get_at(text, to+1) == '.') && (Str::get_at(text, to) != '.')) to++; else break; } } else { while (TRUE) { if (MDInlineParser::extended_autolink_domain_char(Str::get_at(text, to+1))) to++; else if ((Str::get_at(text, to+1) == '.') && (Str::get_at(text, to) != '.')) to++; else break; } } if (Str::get_at(text, to) == '.') to--;
- This code is used in §3.6.3.
§3.6.3.4. Test the domain name for validity3.6.3.4 =
int dot_count = 0; for (int j=domain_from; j<=to; j++) if (Str::get_at(text, j) == '.') dot_count++; if (dot_count == 0) domain_name_invalid = TRUE; if (email_address) { if ((Str::get_at(text, to) == '_') || (Str::get_at(text, to) == '-')) domain_name_invalid = TRUE; } else { for (int j=domain_from, dots_passed=0; j<=to; j++) { inchar32_t c = Str::get_at(text, j); if (c == '.') dots_passed++; if ((c == '_') && (dots_passed >= dot_count - 2)) domain_name_invalid = TRUE; } }
- This code is used in §3.6.3.
§3.6.3.5. Push the link forwards through URL or resource3.6.3.5 =
int domain_to = to; if (email_address == FALSE) { while ((Str::get_at(text, to+1) != 0) && (Str::get_at(text, to+1) != '<') && (Characters::is_Unicode_whitespace(Str::get_at(text, to+1)) == FALSE)) to++; while (MDInlineParser::extended_autolink_trailing_punctuation_char(Str::get_at(text, to))) to--; } else if (xmpp) { if (Str::get_at(text, to+1) == '/') { to++; while (MDInlineParser::extended_autolink_xmpp_resource_char(Str::get_at(text, to+1))) to++; } } while (TRUE) { int initial_to = to; if (Str::get_at(text, to) == ')') { int bl = 0; for (int j=domain_to+1; j<=to; j++) { if (Str::get_at(text, j) == '(') bl++; if (Str::get_at(text, j) == ')') bl--; if (bl < 0) { to = j-1; break; } } } else if (Str::get_at(text, to) == ';') { int b = to-1; while (Characters::isalnum(Str::get_at(text, b))) b--; if (Str::get_at(text, b) == '#') b--; if (Str::get_at(text, b) == '&') to = b-1; } if (to == initial_to) break; }
- This code is used in §3.6.3.
§3.6.3.6. Insert as an extended autolink3.6.3.6 =
if (i-1 >= from) { markdown_item *md = Markdown::new_slice(PLAIN_MIT, text, from, i-1); Markdown::add_to(md, owner); } int link_from = domain_from, link_to = to; i = link_to; from = link_to+1; int type = URI_AUTOLINK_MIT; if (email_address) type = EMAIL_AUTOLINK_MIT; if (xmpp) type = XMPP_AUTOLINK_MIT; markdown_item *md = Markdown::new_slice(type, text, link_from, link_to); Markdown::set_add_protocol_state(md, add_protocol); Markdown::add_to(md, owner); if (tracing_Markdown_parser) { PRINT("Found extended autolink "); for (int j=domain_from; j<=to; j++) PUT_TO(STDOUT, Str::get_at(text, j)); PRINT("\n"); } goto ContinueOuter;
- This code is used in §3.6.3.
§3.7. "Alphanumeric characters, underscores and hyphens." It's not entirely clear what the GitHub-flavored Markdown specification means by "alphanumeric" here: are arbitrary Unicode letters and numerals allowed? I'm going to say not.
int MDInlineParser::extended_autolink_domain_char(inchar32_t c) {
    if ((Characters::isalnum(c)) || (c == '_') || (c == '-')) return TRUE;
    return FALSE;
}
§3.8. "One ore [sic] more characters which are alphanumeric, or ., -, _, or +."
But we take care of + and . above.
int MDInlineParser::extended_autolink_email_char(inchar32_t c) {
    if ((Characters::isalnum(c)) || (c == '_') || (c == '-')) return TRUE;
    return FALSE;
}
§3.9.
int MDInlineParser::extended_autolink_xmpp_resource_char(inchar32_t c) {
    if ((Characters::isalnum(c)) || (c == '@') || (c == '.')) return TRUE;
    return FALSE;
}
§3.10. "Trailing punctuation (specifically, ?, !, ., ,, :, *, _, and ~) will not be considered part of the autolink."
int MDInlineParser::extended_autolink_trailing_punctuation_char(inchar32_t c) {
    if ((c == '?') || (c == '!') || (c == ',') || (c == '.') ||
        (c == ':') || (c == '*') || (c == '_') || (c == '~')) return TRUE;
    return FALSE;
}
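The effect of this rule on the right-hand end of a candidate link can be sketched on a plain C string as follows. The function names are hypothetical, and the lower bound from is a guard added for the illustration so that the scan cannot run off the front of the text.

```c
#include <assert.h>
#include <string.h>

/* The trailing punctuation characters named in the rule quoted above. */
int is_trailing_punctuation(char c) {
    switch (c) {
        case '?': case '!': case ',': case '.':
        case ':': case '*': case '_': case '~':
            return 1;
    }
    return 0;
}

/* Given a candidate link occupying text[from..to] inclusive, returns the
   new right-hand end once trailing punctuation is excluded. The result is
   from-1 if nothing but punctuation was there. */
int strip_trailing_punctuation(const char *text, int from, int to) {
    assert(text != NULL);
    while ((to >= from) && (is_trailing_punctuation(text[to]))) to--;
    return to;
}
```

For example, in www.commonmark.org/a.b. the final full stop is dropped from the link but the interior one is kept.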
§3.11. Does a raw HTML tag begin here?3.11 =
if (MarkdownVariations::supports(variation, INLINE_HTML_MARKDOWNFEATURE)) { int filter = NOT_APPLICABLE; if (Str::get_at(text, i) == '<') { switch (Str::get_at(text, i+1)) { case '?': Does a processing instruction begin here?3.11.1; break; case '!': if ((Str::get_at(text, i+2) == '-') && (Str::get_at(text, i+3) == '-')) Does an HTML comment begin here?3.11.2; if ((Str::get_at(text, i+2) == '[') && (Str::get_at(text, i+3) == 'C') && (Str::get_at(text, i+4) == 'D') && (Str::get_at(text, i+5) == 'A') && (Str::get_at(text, i+6) == 'T') && (Str::get_at(text, i+7) == 'A') && (Str::get_at(text, i+8) == '[')) Does a CDATA section begin here?3.11.4; if (Characters::is_ASCII_letter(Str::get_at(text, i+2))) Does an HTML declaration begin here?3.11.3; break; case '/': Does a close tag begin here?3.11.6; break; default: Does an open tag begin here?3.11.5; break; } NotATag: ; } }
- This code is used in §3.
§3.11.1. Does a processing instruction begin here?3.11.1 =
for (int j = i+3; j<Str::len(text); j++) if ((Str::get_at(text, j) == '?') && (Str::get_at(text, j+1) == '>')) { int tag_from = i, tag_to = j+1; Allow it as a raw HTML tag3.11.1.1; }
- This code is used in §3.11.
§3.11.2. Does an HTML comment begin here?3.11.2 =
int bad_start = FALSE; if (Str::get_at(text, i+4) == '>') bad_start = TRUE; if ((Str::get_at(text, i+4) == '-') && (Str::get_at(text, i+5) == '>')) bad_start = TRUE; if (bad_start == FALSE) for (int j = i+4; j<Str::len(text); j++) if ((Str::get_at(text, j) == '-') && (Str::get_at(text, j+1) == '-')) { if (Str::get_at(text, j+2) == '>') { int tag_from = i, tag_to = j+2; Allow it as a raw HTML tag3.11.1.1; } break; }
- This code is used in §3.11.
§3.11.3. Does an HTML declaration begin here?3.11.3 =
for (int j = i+2; j<Str::len(text); j++) if (Str::get_at(text, j) == '>') { int tag_from = i, tag_to = j; Allow it as a raw HTML tag3.11.1.1; }
- This code is used in §3.11.
§3.11.4. Does a CDATA section begin here?3.11.4 =
for (int j = i+10; j<Str::len(text); j++) if ((Str::get_at(text, j) == ']') && (Str::get_at(text, j+1) == ']') && (Str::get_at(text, j+2) == '>')) { int tag_from = i, tag_to = j+2; Allow it as a raw HTML tag3.11.1.1; }
- This code is used in §3.11.
§3.11.5. Does an open tag begin here?3.11.5 =
int at = i+1; filter = FALSE; Advance past tag name3.11.5.1; Advance past attributes3.11.5.2; Advance past optional tag-whitespace3.11.5.3; if (Str::get_at(text, at) == '/') at++; if (Str::get_at(text, at) == '>') { int tag_from = i, tag_to = at; Allow it as a raw HTML tag3.11.1.1; }
- This code is used in §3.11.
§3.11.6. Does a close tag begin here?3.11.6 =
int at = i+2; Advance past tag name3.11.5.1; Advance past optional tag-whitespace3.11.5.3; if (Str::get_at(text, at) == '>') { int tag_from = i, tag_to = at; Allow it as a raw HTML tag3.11.1.1; }
- This code is used in §3.11.
§3.11.5.1. Advance past tag name3.11.5.1 =
inchar32_t c = Str::get_at(text, at); if (Characters::is_ASCII_letter(c) == FALSE) goto NotATag; TEMPORARY_TEXT(tag) while ((c == '-') || (Characters::is_ASCII_letter(c)) || (Characters::is_ASCII_digit(c))) { PUT_TO(tag, c); c = Str::get_at(text, ++at); } if ((filter == FALSE) && (MarkdownVariations::supports(variation, DISALLOWED_RAW_HTML_MARKDOWNFEATURE)) && (Markdown::tag_should_be_filtered(tag))) filter = TRUE; DISCARD_TEXT(tag)
§3.11.5.2. Advance past attributes3.11.5.2 =
while (TRUE) { int start_at = at; Advance past optional tag-whitespace3.11.5.3; if (at == start_at) break; inchar32_t c = Str::get_at(text, at); if ((c == '_') || (c == ':') || (Characters::is_ASCII_letter(c))) { while ((c == '_') || (c == ':') || (c == '.') || (c == '-') || (Characters::is_ASCII_letter(c)) || (Characters::is_ASCII_digit(c))) c = Str::get_at(text, ++at); int start_value_at = at; Advance past optional tag-whitespace3.11.5.3; if (Str::get_at(text, at) != '=') { at = start_value_at; goto DoneValueSpecification; } at++; Advance past optional tag-whitespace3.11.5.3; Try for a single-quoted attribute value3.11.5.2.2; Try for a double-quoted attribute value3.11.5.2.3; Try for an unquoted attribute value3.11.5.2.1; DoneValueSpecification: ; } else { at = start_at; break; } }
- This code is used in §3.11.5.
§3.11.5.2.1. Try for an unquoted attribute value3.11.5.2.1 =
int k = at; while (TRUE) { inchar32_t c = Str::get_at(text, k); if ((c == ' ') || (c == '\t') || (c == '\n') || (c == '"') || (c == '\'') || (c == '=') || (c == '<') || (c == '>') || (c == '`') || (c == 0)) break; k++; } if (k == at) { at = start_value_at; goto DoneValueSpecification; } at = k; goto DoneValueSpecification;
- This code is used in §3.11.5.2.
§3.11.5.2.2. Try for a single-quoted attribute value3.11.5.2.2 =
if (Str::get_at(text, at) == '\'') { int k = at + 1; while ((Str::get_at(text, k) != '\'') && (Str::get_at(text, k) != 0)) k++; if (Str::get_at(text, k) == '\'') { at = k+1; goto DoneValueSpecification; } at = start_value_at; goto DoneValueSpecification; }
- This code is used in §3.11.5.2.
§3.11.5.2.3. Try for a double-quoted attribute value3.11.5.2.3 =
if (Str::get_at(text, at) == '"') { int k = at + 1; while ((Str::get_at(text, k) != '"') && (Str::get_at(text, k) != 0)) k++; if (Str::get_at(text, k) == '"') { at = k+1; goto DoneValueSpecification; } at = start_value_at; goto DoneValueSpecification; }
- This code is used in §3.11.5.2.
§3.11.5.3. Advance past optional tag-whitespace3.11.5.3 =
int line_ending_count = 0; while (TRUE) { inchar32_t c = Str::get_at(text, at++); if (c == '\n') { line_ending_count++; if (line_ending_count == 2) break; } if ((c != ' ') && (c != '\t') && (c != '\n')) break; } at--;
§3.11.1.1. Allow it as a raw HTML tag3.11.1.1 =
if (i-1 >= from) { markdown_item *md = Markdown::new_slice(PLAIN_MIT, text, from, i-1); Markdown::add_to(md, owner); } markdown_item *md = Markdown::new_slice(INLINE_HTML_MIT, text, tag_from, tag_to); if (filter == TRUE) Markdown::set_filtered_state(md, TRUE); Markdown::add_to(md, owner); i = tag_to; from = tag_to + 1; if (tracing_Markdown_parser) PRINT("Found raw HTML\n"); goto ContinueOuter;
§3.12. Does a hard or soft line break occur here?3.12 =
if (Str::get_at(text, i) == '\n') { int soak = 0; if (Str::get_at(text, i-1) == '\\') soak = 2; int preceding_spaces = 0; while (Str::get_at(text, i-1-preceding_spaces) == ' ') preceding_spaces++; if (preceding_spaces >= 2) soak = preceding_spaces+1; if (soak > 0) { if (i-soak >= from) { markdown_item *md = Markdown::new_slice(PLAIN_MIT, text, from, i-soak); Markdown::add_to(md, owner); } markdown_item *md = Markdown::new_slice(LINE_BREAK_MIT, I"\n\n", 0, 1); Markdown::add_to(md, owner); } else { if (i-preceding_spaces-1 >= from) { markdown_item *md = Markdown::new_slice(PLAIN_MIT, text, from, i-preceding_spaces-1); Markdown::add_to(md, owner); } markdown_item *md = Markdown::new_slice(SOFT_BREAK_MIT, I"\n", 0, 0); Markdown::add_to(md, owner); } i++; while (Str::get_at(text, i) == ' ') i++; from = i; i--; if (tracing_Markdown_parser) PRINT("Found line break\n"); goto ContinueOuter; }
- This code is used in §3.
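The distinction drawn in §3.12 can be sketched as a standalone classifier on a C string: a newline is a hard break if it follows a backslash or two or more spaces, and a soft break otherwise. The enum and function names are hypothetical, and the sketch does not consume the surrounding spaces as the real code does.

```c
#include <assert.h>
#include <string.h>

enum break_kind { NO_BREAK, SOFT_BREAK, HARD_BREAK };

/* Classify the character at index i of text. */
enum break_kind classify_line_break(const char *text, int i) {
    assert(text != NULL);
    if ((i < 0) || (i >= (int) strlen(text)) || (text[i] != '\n')) return NO_BREAK;
    if ((i > 0) && (text[i - 1] == '\\')) return HARD_BREAK;
    /* Count the run of spaces immediately before the newline */
    int preceding_spaces = 0;
    while ((i - 1 - preceding_spaces >= 0) &&
        (text[i - 1 - preceding_spaces] == ' ')) preceding_spaces++;
    if (preceding_spaces >= 2) return HARD_BREAK;
    return SOFT_BREAK;
}
```

A single space before the newline is therefore not enough for a hard break; it is simply discarded along with the soft break's surrounding whitespace.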
§4. Links and images. Well, so now we come to the middle-precedence items: links and images. These have equal precedence, so they ought to be read from left to right, but there's a twist: a link is not valid if its linked material contains another link.
This [looks like [a link within a link](somewhere.html)](nowhere.html).
results in only the inner link being processed as such. That restriction does not hold for images, so this is all legal:
This ](strip.jpg).
Indeed, image notations can contain other image notations. Of course, that
does no good, because rendering uses the textual part of an image as its "alt"
text, which is not allowed to contain tags, only plain text. But syntactically
it is legal Markdown.
void MDInlineParser::links_and_images(markdown_variation *variation, md_links_dictionary *link_refs, markdown_item *owner, int images_only) { if (owner == NULL) return; if ((MarkdownVariations::supports(variation, LINKS_MARKDOWNFEATURE) == FALSE) && (MarkdownVariations::supports(variation, IMAGES_MARKDOWNFEATURE) == FALSE) && (MarkdownVariations::supports(variation, FOOTNOTES_MARKDOWNFEATURE) == FALSE)) return; if (tracing_Markdown_parser) { PRINT("Beginning link/image pass:\n"); Markdown::debug_subtree(STDOUT, owner); } md_charpos leftmost_pos = Markdown::left_edge_of(owner->down); while (TRUE) { if (tracing_Markdown_parser) { if (Markdown::somewhere(leftmost_pos)) { PRINT("Link/image notation scan from %c\n", Markdown::get(leftmost_pos)); Markdown::debug_subtree(STDOUT, leftmost_pos.md); } else { PRINT("Link/image notation scan from start\n"); } } md_link_parse found = MDInlineParser::first_valid_link(variation, link_refs, leftmost_pos, Markdown::nowhere(), images_only, FALSE); if (found.is_link == NOT_APPLICABLE) break; md_link_dictionary_entry *ref = found.link_reference; if (tracing_Markdown_parser) { PRINT("Link matter: "); if (found.link_text_empty) PRINT("EMPTY\n"); else Markdown::debug_interval(STDOUT, found.link_text_from, found.link_text_to); if (ref) { PRINT("Link destination (reference): %S\n", ref->destination); PRINT("Link title (reference): %S\n", ref->title); } if (found.footnote_link != 0) { PRINT("Link is footnote cue %d\n", found.footnote_link); } else { PRINT("Link destination: "); if (found.link_destination_empty) PRINT("EMPTY\n"); else Markdown::debug_interval(STDOUT, found.link_destination_from, found.link_destination_to); PRINT("Link title: "); if (found.link_title_empty) PRINT("EMPTY\n"); else Markdown::debug_interval(STDOUT, found.link_title_from, found.link_title_to); } } Deal with a valid link or image4.2; } }
§4.1. The following structure holds the findings of MDInlineParser::first_valid_link
if it succeeds. Note that a link/image which makes a reference is only syntactically
valid if the reference matches something in the dictionary.
It might seem redundant to have the *_empty fields below, but there are some
awkward cases where, as a result of earlier splicing, it's difficult to represent
nonexistent text with md_charpos structs.
typedef struct md_link_parse {
    int is_link; /* TRUE for link, FALSE for image, NOT_APPLICABLE for fail */
    struct md_charpos first; /* leftmost character of the whole construct */
    struct md_charpos link_text_from; /* leftmost character inside the squares */
    struct md_charpos link_text_to; /* ...and so on */
    int link_text_empty; /* TRUE or FALSE */
    struct md_charpos link_destination_from;
    struct md_charpos link_destination_to;
    int link_destination_empty;
    struct md_charpos link_title_from;
    struct md_charpos link_title_to;
    int link_title_empty;
    struct md_link_dictionary_entry *link_reference; /* or NULL if it's not by reference */
    int footnote_link; /* or 0 if it's not a footnote cue */
    struct md_charpos last; /* rightmost character of the whole construct */
} md_link_parse;
- The structure md_link_parse is accessed in 4/fsm and here.
§4.2. See Markdown for how all this splicing is done.
Deal with a valid link or image4.2 =
markdown_item *chain = owner->down, *found_text = NULL, *remainder = NULL; Markdown::cut_interval(chain, found.first, found.last, &chain, &found_text, &remainder); markdown_item *link_text = NULL; markdown_item *link_destination = NULL; markdown_item *link_title = NULL; if (found.link_text_empty == FALSE) Markdown::cut_interval(found_text, found.link_text_from, found.link_text_to, NULL, &link_text, &found_text); if ((Markdown::somewhere(found.link_destination_from)) && (found.link_destination_empty == FALSE)) Markdown::cut_interval(found_text, found.link_destination_from, found.link_destination_to, NULL, &link_destination, &found_text); if ((Markdown::somewhere(found.link_title_from)) && (found.link_title_empty == FALSE)) Markdown::cut_interval(found_text, found.link_title_from, found.link_title_to, NULL, &link_title, &found_text); markdown_item *link_item = Markdown::new_item((found.is_link == TRUE)?LINK_MIT:IMAGE_MIT); markdown_item *matter = Markdown::new_item(MATERIAL_MIT); if (found.link_text_empty == FALSE) matter->down = link_text; Markdown::add_to(matter, link_item); if (found.is_link == TRUE) MDInlineParser::links_and_images(variation, link_refs, matter, TRUE); else MDInlineParser::links_and_images(variation, link_refs, matter, FALSE); if (ref) { if (Str::len(ref->destination) > 0) { markdown_item *dest_item = Markdown::new_item(LINK_DEST_MIT); dest_item->down = Markdown::new_slice(PLAIN_MIT, ref->destination, 0, Str::len(ref->destination)-1); Markdown::add_to(dest_item, link_item); } if (Str::len(ref->title) > 0) { markdown_item *title_item = Markdown::new_item(LINK_TITLE_MIT); title_item->down = Markdown::new_slice(PLAIN_MIT, ref->title, 0, Str::len(ref->title)-1); Markdown::add_to(title_item, link_item); } } else if (found.footnote_link > 0) { link_item->details = found.footnote_link; } else { if (link_destination) { markdown_item *dest_item = Markdown::new_item(LINK_DEST_MIT); if (found.link_destination_empty == FALSE) dest_item->down = 
link_destination; Markdown::add_to(dest_item, link_item); } if (link_title) { markdown_item *title_item = Markdown::new_item(LINK_TITLE_MIT); if (found.link_title_empty == FALSE) title_item->down = link_title; Markdown::add_to(title_item, link_item); } } if (chain) { owner->down = chain; while (chain->next) chain = chain->next; chain->next = link_item; } else { owner->down = link_item; } link_item->next = remainder; if (tracing_Markdown_parser) { PRINT("After link surgery:\n"); Markdown::debug_subtree(STDOUT, owner); } leftmost_pos = Markdown::left_edge_of(remainder);
- This code is used in §4.
§5. Okay, so it's time to go looking for the leftmost valid link or image in
the given part of a chain of nodes. Links begin with an unescaped [
and images with an unescaped ![.
md_link_parse MDInlineParser::first_valid_link(markdown_variation *variation, md_links_dictionary *link_refs, md_charpos from, md_charpos to, int images_only, int links_only) { md_link_parse result; Initialise the parse result to a no5.1; inchar32_t prev_c = 0; md_charpos prev_pos = Markdown::nowhere(); int escaped = FALSE; while ((Markdown::somewhere(from)) && (Markdown::plainish(from.md) == FALSE)) from = Markdown::advance_up_to(from, to); for (md_charpos pos = from; Markdown::somewhere(pos); pos = Markdown::advance_up_to(pos, to)) { inchar32_t c = Markdown::get(pos); if ((c == '\\') && (escaped == FALSE)) escaped = TRUE; else { if ((c == '[') && (escaped == FALSE)) { md_charpos pass_pos = pos; for (int pass=1; pass<=2; pass++) { if (tracing_Markdown_parser) { PRINT("Pass %d: at ", pass); Markdown::debug_pos(STDOUT, pos); PRINT("\n"); } if (pass == 2) pos = pass_pos; See if a link begins here5.2; AbandonHope: ; } } if (escaped == FALSE) { prev_c = c; prev_pos = pos; } escaped = FALSE; } } return result; }
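To make the scanning rule concrete, here is a minimal sketch on a plain C string rather than on a chain of nodes. It is not the function above: find_link_opener and its out-parameter are hypothetical names, and it only locates the first unescaped [ and notes whether an unescaped ! sits immediately before it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MDInlineParser: return the index of the
   first unescaped '[' in s, or -1 if there is none. *is_image is set to 1 if
   that bracket immediately follows an unescaped '!', and to 0 otherwise. */
int find_link_opener(const char *s, int *is_image) {
    assert(s != NULL && is_image != NULL);
    int escaped = 0;
    char prev_c = 0;                 /* most recent unescaped character */
    *is_image = 0;
    for (int i = 0; s[i] != 0; i++) {
        char c = s[i];
        if ((c == '\\') && (escaped == 0)) { escaped = 1; continue; }
        if ((c == '[') && (escaped == 0)) {
            *is_image = (prev_c == '!');
            return i;
        }
        if (escaped == 0) prev_c = c;
        escaped = 0;
    }
    return -1;
}
```

So in `pic ![alt](x)` the opener is reported at the bracket, flagged as an image, whereas in `\![x]` the escaped exclamation mark leaves it an ordinary link.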
§5.1. Initialise the parse result to a no5.1 =
result.is_link = NOT_APPLICABLE;
result.first = Markdown::nowhere();
result.link_text_from = Markdown::nowhere();
result.link_text_to = Markdown::nowhere();
result.link_text_empty = NOT_APPLICABLE;
result.link_destination_from = Markdown::nowhere();
result.link_destination_to = Markdown::nowhere();
result.link_destination_empty = NOT_APPLICABLE;
result.link_title_from = Markdown::nowhere();
result.link_title_to = Markdown::nowhere();
result.link_title_empty = NOT_APPLICABLE;
result.link_reference = NULL; /* a pointer, so NULL rather than FALSE */
result.footnote_link = 0;
result.last = Markdown::nowhere();
- This code is used in §5.
define ABANDON_LINK(reason) { if (tracing_Markdown_parser) { PRINT("Link abandoned: %s\n", reason); } pos = abandon_at; goto AbandonHope; }
See if a link begins here5.2 =
if (((links_only == FALSE) || (prev_c != '!')) && ((images_only == FALSE) || (prev_c == '!'))) { int link_rather_than_image = TRUE, uses = LINKS_MARKDOWNFEATURE; result.first = pos; if ((prev_c == '!') && (links_only == FALSE)) { link_rather_than_image = FALSE; uses = IMAGES_MARKDOWNFEATURE; result.first = prev_pos; } if ((MarkdownVariations::supports(variation, uses)) || ((uses == LINKS_MARKDOWNFEATURE) && (MarkdownVariations::supports(variation, FOOTNOTES_MARKDOWNFEATURE)))) { if (link_rather_than_image) { if (tracing_Markdown_parser) PRINT("Potential link found\n"); } else { if (tracing_Markdown_parser) PRINT("Potential image found\n"); } md_charpos abandon_at = pos; Work out the link text5.2.1; if (Markdown::get(pos) == '[') { Work out the reference5.2.2; } else { if ((Markdown::get(pos) != '(') || (pass == 2)) { TEMPORARY_TEXT(label) for (md_charpos pos = result.link_text_from; Markdown::somewhere(pos); pos = Markdown::advance(pos)) { PUT_TO(label, Markdown::get(pos)); if (Markdown::pos_eq(pos, result.link_text_to)) break; } Deal with escape characters in the label5.2.3; int valid = TRUE; if (MarkdownVariations::supports(variation, FOOTNOTES_MARKDOWNFEATURE)) { if (Str::get_at(label, 0) == '0') valid = FALSE; if ((Str::len(label) < 1) || (Str::len(label) > 9)) valid = FALSE; for (int i=0; i<Str::len(label); i++) if (Characters::isdigit(Str::get_at(label, i)) == FALSE) valid = FALSE; if (valid) result.footnote_link = Str::atoi(label, 0); } if (result.footnote_link == 0) { if (MarkdownVariations::supports(variation, uses) == FALSE) ABANDON_LINK("not a valid footnote number"); md_link_dictionary_entry *ref = Markdown::look_up(link_refs, label); if (ref == NULL) ABANDON_LINK("no '(' and not a valid reference"); #ifdef SUPERVISOR_MODULE DocumentationCompiler::notify_image_use(ref->destination); #endif result.link_reference = ref; } pos = result.link_text_to; pos = Markdown::advance_up_to(pos, to); DISCARD_TEXT(label) } else { pos = 
Markdown::advance_up_to_quasi_plainish_only(pos, to); Advance pos by optional small amount of white space5.2.6; if (Markdown::get(pos) != ')') Work out the link destination5.2.4; Advance pos by optional small amount of white space5.2.6; if (Markdown::get(pos) != ')') Work out the link title5.2.5; Advance pos by optional small amount of white space5.2.6; if (Markdown::get(pos) != ')') ABANDON_LINK("no ')'"); } } result.last = pos; result.is_link = link_rather_than_image; if (tracing_Markdown_parser) PRINT("Confirmed\n"); return result; } }
- This code is used in §5.
§5.2.1. Work out the link text5.2.1 =
inchar32_t c = Markdown::get(pos); md_charpos prev_pos = pos; result.link_text_from = Markdown::advance_up_to(pos, to); inchar32_t prev_c = 0; int bl = 0, count = 0, escaped = FALSE; while (c != 0) { if ((c == '\\') && (escaped == FALSE)) { escaped = TRUE; } else { count++; if ((c == '[') && (escaped == FALSE)) bl++; if ((c == ']') && (escaped == FALSE)) { bl--; if (bl == 0) break; } escaped = FALSE; } prev_pos = pos; prev_c = c; pos = Markdown::advance_up_to(pos, to); c = Markdown::get(pos); } if (c == 0) { pos = abandon_at; ABANDON_LINK("no end to linked matter"); } result.link_text_empty = (count<=2)?TRUE:FALSE; result.link_text_to = prev_pos; if (link_rather_than_image) { md_link_parse nested = MDInlineParser::first_valid_link(variation, link_refs, result.link_text_from, result.link_text_to, FALSE, TRUE); if (nested.is_link != NOT_APPLICABLE) return nested; } pos = Markdown::advance_up_to_plainish_only(pos, to);
- This code is used in §5.2.
§5.2.2. Work out the reference5.2.2 =
md_charpos prev_pos = pos; pos = Markdown::advance_up_to_plainish_only(pos, to); result.link_destination_from = pos; inchar32_t prev_c = 0; int bl = 1, escaping = FALSE; TEMPORARY_TEXT(label) inchar32_t c = Markdown::get(pos); while (c != 0) { if ((c == '\\') && (escaping == FALSE)) { escaping = TRUE; } else { if (escaping) { if ((c != '[') && (c != ']') && (c != '\\')) PUT_TO(label, '\\'); } else { if (c == '[') bl++; if (c == ']') { bl--; if (bl == 0) break; } } PUT_TO(label, c); escaping = FALSE; } prev_pos = pos; prev_c = c; pos = Markdown::advance_up_to_plainish_only(pos, to); c = Markdown::get(pos); } if (c == 0) { pos = abandon_at; ABANDON_LINK("no end to reference"); } if (Str::len(label) == 0) { for (md_charpos pos = result.link_text_from; Markdown::somewhere(pos); pos = Markdown::advance(pos)) { PUT_TO(label, Markdown::get(pos)); if (Markdown::pos_eq(pos, result.link_text_to)) break; } } md_link_dictionary_entry *ref = Markdown::look_up(link_refs, label); if (ref == NULL) ABANDON_LINK("unknown reference"); result.link_reference = ref; #ifdef SUPERVISOR_MODULE DocumentationCompiler::notify_image_use(ref->destination); #endif
- This code is used in §5.2.
§5.2.3. Note that only square brackets and backslashes can be escaped in a link or image reference label, not other ASCII punctuation.
Deal with escape characters in the label5.2.3 =
TEMPORARY_TEXT(to) for (int i=0; i<Str::len(label); i++) { if ((Str::get_at(label, i) == '\\') && ((Str::get_at(label, i+1) == '[') || (Str::get_at(label, i+1) == '\\') || (Str::get_at(label, i+1) == ']'))) i++; PUT_TO(to, Str::get_at(label, i)); } Str::clear(label); WRITE_TO(label, "%S", to); DISCARD_TEXT(to)
- This code is used in §5.2.
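The same rule can be sketched on ordinary C strings. This is an illustration only: unescape_label is a hypothetical name, and it writes to a caller-supplied buffer rather than using text_stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy label into out (capacity out_size), dropping a
   backslash only where it precedes '[', ']' or another backslash. All other
   backslashes are copied through unchanged. Returns the number of characters
   written, not counting the terminator. */
size_t unescape_label(const char *label, char *out, size_t out_size) {
    assert(label != NULL && out != NULL && out_size > 0);
    size_t n = 0;
    for (size_t i = 0; label[i] != 0 && n + 1 < out_size; i++) {
        char next = label[i+1];      /* at worst, the terminator */
        if ((label[i] == '\\') &&
            ((next == '[') || (next == ']') || (next == '\\')))
            i++;                     /* skip the backslash, keep what follows */
        out[n++] = label[i];
    }
    out[n] = 0;
    return n;
}
```

Thus `a\[b\]` becomes `a[b]`, but `a\*b` is left alone, since an asterisk is not one of the three escapable characters here.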
§5.2.4. Work out the link destination5.2.4 =
if (Markdown::get(pos) == '<') { pos = Markdown::advance_up_to_quasi_plainish_only(pos, to); result.link_destination_from = pos; int empty = TRUE; inchar32_t prev_c = 0; while ((Markdown::get(pos) != '>') || (prev_c == '\\')) { if (Markdown::get(pos) == 0) ABANDON_LINK("no end to destination in angles"); if (Markdown::get(pos) == '<') ABANDON_LINK("'<' in destination in angles"); if (Markdown::get(pos) == '\n') ABANDON_LINK("reference includes line end"); prev_pos = pos; prev_c = Markdown::get(pos); pos = Markdown::advance_up_to_quasi_plainish_only(pos, to); empty = FALSE; } result.link_destination_empty = empty; result.link_destination_to = prev_pos; pos = Markdown::advance_up_to_quasi_plainish_only(pos, to); if ((Markdown::get(pos) == '"') || (Markdown::get(pos) == '\'') || (Markdown::get(pos) == '(')) ABANDON_LINK("no gap between destination and title"); } else { result.link_destination_from = pos; int bl = 1; inchar32_t prev_c = 0; md_charpos prev_pos = pos; int empty = TRUE; while ((Markdown::get(pos) != ' ') && (Markdown::get(pos) != '\n') && (Markdown::get(pos) != '\t')) { inchar32_t c = Markdown::get(pos); if ((c == '(') && (prev_c != '\\')) bl++; if ((c == ')') && (prev_c != '\\')) { bl--; if (bl == 0) break; } if (c == 0) ABANDON_LINK("no end to destination"); if (Characters::is_control_character(c)) ABANDON_LINK("control character in destination"); prev_pos = pos; prev_c = c; pos = Markdown::advance_up_to_quasi_plainish_only(pos, to); empty = FALSE; } result.link_destination_empty = empty; result.link_destination_to = prev_pos; if ((Markdown::get(pos) == '"') || (Markdown::get(pos) == '\'') || (Markdown::get(pos) == '(')) ABANDON_LINK("no gap between destination and title"); }
- This code is used in §5.2.
§5.2.5. Work out the link title5.2.5 =
if (Markdown::get(pos) == '"') {
    pos = Markdown::advance_up_to_plainish_only(pos, to);
    result.link_title_from = pos;
    inchar32_t prev_c = 0;
    md_charpos prev_pos = pos;
    int empty = TRUE;
    inchar32_t c = Markdown::get(pos);
    while (c != 0) {
        if ((c == '"') && (prev_c != '\\')) break;
        prev_pos = pos;
        prev_c = c;
        pos = Markdown::advance_up_to_plainish_only(pos, to);
        c = Markdown::get(pos); /* update the loop variable, not a shadowing copy */
        empty = FALSE;
    }
    if (c == 0) ABANDON_LINK("no end to title");
    result.link_title_empty = empty;
    result.link_title_to = prev_pos;
    pos = Markdown::advance_up_to_plainish_only(pos, to);
} else if (Markdown::get(pos) == '\'') {
    pos = Markdown::advance_up_to_plainish_only(pos, to);
    result.link_title_from = pos;
    inchar32_t prev_c = 0;
    md_charpos prev_pos = pos;
    int empty = TRUE;
    inchar32_t c = Markdown::get(pos);
    while (c != 0) {
        if ((c == '\'') && (prev_c != '\\')) break;
        prev_pos = pos;
        prev_c = c;
        pos = Markdown::advance_up_to_plainish_only(pos, to);
        c = Markdown::get(pos);
        empty = FALSE;
    }
    if (c == 0) ABANDON_LINK("no end to title");
    result.link_title_empty = empty;
    result.link_title_to = prev_pos;
    pos = Markdown::advance_up_to_plainish_only(pos, to);
} else if (Markdown::get(pos) == '(') {
    pos = Markdown::advance_up_to(pos, to);
    result.link_title_from = pos;
    inchar32_t prev_c = 0;
    md_charpos prev_pos = pos;
    int empty = TRUE;
    inchar32_t c = Markdown::get(pos);
    while (c != 0) {
        if ((c == '(') && (prev_c != '\\')) ABANDON_LINK("unescaped '(' in title");
        if ((c == ')') && (prev_c != '\\')) break;
        prev_pos = pos;
        prev_c = c;
        pos = Markdown::advance_up_to(pos, to);
        c = Markdown::get(pos);
        empty = FALSE;
    }
    if (c == 0) ABANDON_LINK("no end to title");
    result.link_title_empty = empty;
    result.link_title_to = prev_pos;
    pos = Markdown::advance_up_to_plainish_only(pos, to);
}
- This code is used in §5.2.
§5.2.6. Advance pos by optional small amount of white space5.2.6 =
int line_endings = 0; inchar32_t c = Markdown::get(pos); while ((c == ' ') || (c == '\t') || (c == '\n')) { if (c == '\n') { line_endings++; if (line_endings >= 2) break; } pos = Markdown::advance_up_to_quasi_plainish_only(pos, to); c = Markdown::get(pos); }
- This code is used in §5.2 (three times).
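"Small" here means any number of spaces and tabs but at most one line ending. A minimal sketch of the same loop on a plain C string, with the hypothetical name skip_small_white_space and an index standing in for md_charpos:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: starting at index i in s (which must not lie beyond
   the terminator), skip spaces, tabs and at most one newline. Returns the
   index at which skipping stopped. */
size_t skip_small_white_space(const char *s, size_t i) {
    assert(s != NULL);
    int line_endings = 0;
    while ((s[i] == ' ') || (s[i] == '\t') || (s[i] == '\n')) {
        if (s[i] == '\n') {
            line_endings++;
            if (line_endings >= 2) break;   /* a second line ending stays put */
        }
        i++;
    }
    return i;
}
```

A second line ending would mean a blank line, which ends the paragraph, so the scan stops there and the caller's check for the next expected character fails.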
§6. Emphasis. Well, that was easy. Now for the hardest pass, in which we look for the use of asterisks and underscores for emphasis. This notation is deeply ambiguous on its face, and CommonMark's precise specification is a bit of an ordeal, but here goes.
define ASTERISK_EMPHASIS_BIT 1
define UNDERSCORE_EMPHASIS_BIT 2
define TILDE_STRIKETHROUGH_BIT 4
define DOLLAR_TEX_BIT 8
define SLASHSLASH_LINK_BIT 16
void MDInlineParser::emphasis(markdown_variation *variation, markdown_item *owner, int mask) { for (markdown_item *md = owner->down; md; md = md->next) if ((md->type == LINK_MIT) || (md->type == IMAGE_MIT)) MDInlineParser::emphasis(variation, md->down, mask); text_stream *OUT = STDOUT; if (tracing_Markdown_parser) { WRITE("Seeking emphasis (%d) in:\n", mask); INDENT; Markdown::debug_subtree(STDOUT, owner); } Seek emphasis6.4; if (tracing_Markdown_parser) { OUTDENT; WRITE("Emphasis search complete\n"); } }
§6.1. "A delimiter run is either a sequence of one or more * characters that is not preceded or followed by a non-backslash-escaped * character, or a sequence of one or more _ characters that is not preceded or followed by a non-backslash-escaped _ character."
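This function returns 0 unless a delimiter run begins at pos, and then returns
its length if this was asterisked, and minus its length if underscored.
The GitHub extension to CommonMark for tildes meaning strikethrough is restricted to runs of length 1 or 2, and we return those as 10000001 and 10000002. The optional TeX and Inweb link notations are coded in the same way: a single dollar sign returns 20000001, a double dollar sign 30000002, and a double slash 40000002.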
int MDInlineParser::delimiter_run(md_charpos pos, int mask) { int count = 0; if (mask & ASTERISK_EMPHASIS_BIT) { count = Markdown::unescaped_run(pos, '*'); if ((count > 0) && (Markdown::get_unescaped(pos, -1) != '*')) return count; } if (mask & UNDERSCORE_EMPHASIS_BIT) { count = Markdown::unescaped_run(pos, '_'); if ((count > 0) && (Markdown::get_unescaped(pos, -1) != '_')) return -count; } if (mask & TILDE_STRIKETHROUGH_BIT) { count = Markdown::unescaped_run(pos, '~'); if ((count >= 1) && (count <= 2) && (Markdown::get_unescaped(pos, -1) != '~')) return 10000000 + count; } if (mask & DOLLAR_TEX_BIT) { count = Markdown::unescaped_run(pos, '$'); if ((count == 1) && (Markdown::get_unescaped(pos, -1) != '$')) return 20000000 + count; if ((count == 2) && (Markdown::get_unescaped(pos, -1) != '$')) return 30000000 + count; } if (mask & SLASHSLASH_LINK_BIT) { count = Markdown::unescaped_run(pos, '/'); if ((count == 2) && (Markdown::get_unescaped(pos, -1) != '/')) return 40000000 + count; } return 0; }
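The return-value coding is easier to see in a cut-down form. The sketch below is an assumption-laden simplification, not the function above: it works on a plain C string, ignores backslash escapes and the feature mask, and covers only asterisks, underscores and tildes. The names run_length and delimiter_run_code are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many consecutive copies of ch (which must not be 0) start at s[i]. */
static int run_length(const char *s, size_t i, char ch) {
    int n = 0;
    while (s[i + n] == ch) n++;
    return n;
}

/* Hypothetical simplification of delimiter_run. Returns +n for a run of n
   asterisks beginning at s[i], -n for n underscores, 10000000+n for a run of
   one or two tildes, and 0 if no run begins here. */
int delimiter_run_code(const char *s, size_t i) {
    assert(s != NULL);
    char before = (i > 0) ? s[i-1] : 0;
    int n = run_length(s, i, '*');
    if ((n > 0) && (before != '*')) return n;
    n = run_length(s, i, '_');
    if ((n > 0) && (before != '_')) return -n;
    n = run_length(s, i, '~');
    if ((n >= 1) && (n <= 2) && (before != '~')) return 10000000 + n;
    return 0;
}
```

A caller can recover the width of a coded run with `code % 10000000` and the kind of run with `code / 10000000`, which is how the delimiter-finding code later in this section unpacks it.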
§6.2. "A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a Unicode punctuation character, or (2b) followed by a Unicode punctuation character and preceded by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace."
"A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a Unicode punctuation character, or (2b) preceded by a Unicode punctuation character and followed by Unicode whitespace or a Unicode punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace."
int MDInlineParser::left_flanking(md_charpos pos, int count) { if (count == 0) return FALSE; if (count < 0) count = -count; inchar32_t followed_by = Markdown::get_unescaped(pos, count); if ((followed_by == 0) || (Characters::is_Unicode_whitespace(followed_by))) return FALSE; if (Characters::is_Unicode_punctuation(followed_by) == FALSE) return TRUE; inchar32_t preceded_by = Markdown::get_unescaped(pos, -1); if ((preceded_by == 0) || (Characters::is_Unicode_whitespace(preceded_by)) || (Characters::is_Unicode_punctuation(preceded_by))) return TRUE; return FALSE; } int MDInlineParser::right_flanking(md_charpos pos, int count) { if (count == 0) return FALSE; if (count < 0) count = -count; inchar32_t preceded_by = Markdown::get_unescaped(pos, -1); if ((preceded_by == 0) || (Characters::is_Unicode_whitespace(preceded_by))) return FALSE; if (Characters::is_Unicode_punctuation(preceded_by) == FALSE) return TRUE; inchar32_t followed_by = Markdown::get_unescaped(pos, count); if ((followed_by == 0) || (Characters::is_Unicode_whitespace(followed_by)) || (Characters::is_Unicode_punctuation(followed_by))) return TRUE; return FALSE; }
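For experimenting with these two definitions outside the parser, here is a rough sketch on plain C strings. It assumes ASCII only, using isspace and ispunct from the C library as stand-ins for the Unicode whitespace and punctuation classes, and it takes no account of backslash escapes. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII stand-ins for the Unicode classes. A 0 character means we have run
   off either end of the line, which counts as whitespace. */
static int is_space_or_edge(char c) {
    return (c == 0) || isspace((unsigned char) c);
}
static int is_punct(char c) {
    return (c != 0) && ispunct((unsigned char) c);
}

/* In both functions the run occupies s[i] to s[i+count-1] inclusive. */
int is_left_flanking(const char *s, size_t i, size_t count) {
    assert(s != NULL && count > 0 && i + count <= strlen(s));
    char after = s[i + count];
    char before = (i > 0) ? s[i - 1] : 0;
    if (is_space_or_edge(after)) return 0;                /* fails (1) */
    if (!is_punct(after)) return 1;                       /* passes (2a) */
    return is_space_or_edge(before) || is_punct(before);  /* (2b) */
}

int is_right_flanking(const char *s, size_t i, size_t count) {
    assert(s != NULL && count > 0 && i + count <= strlen(s));
    char after = s[i + count];
    char before = (i > 0) ? s[i - 1] : 0;
    if (is_space_or_edge(before)) return 0;               /* fails (1) */
    if (!is_punct(before)) return 1;                      /* passes (2a) */
    return is_space_or_edge(after) || is_punct(after);    /* (2b) */
}
```

In `*abc*` the first asterisk is left-flanking only and the second right-flanking only; the underscore in `abc_def` is both; the asterisk in `a * b` is neither.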
int MDInlineParser::can_open_emphasis(md_charpos pos, int count) { if (MDInlineParser::left_flanking(pos, count) == FALSE) return FALSE; if (count > 0) return TRUE; if (MDInlineParser::right_flanking(pos, count) == FALSE) return TRUE; inchar32_t preceded_by = Markdown::get_unescaped(pos, -1); if (Characters::is_Unicode_punctuation(preceded_by)) return TRUE; return FALSE; } int MDInlineParser::can_close_emphasis(md_charpos pos, int count) { if (MDInlineParser::right_flanking(pos, count) == FALSE) return FALSE; if (count > 0) return TRUE; if (MDInlineParser::left_flanking(pos, count) == FALSE) return TRUE; inchar32_t followed_by = Markdown::get_unescaped(pos, -count); /* count < 0 here */ if (Characters::is_Unicode_punctuation(followed_by)) return TRUE; return FALSE; }
§6.4. This naive algorithm has every possibility of becoming computationally explosive if a really knotty tangle of nested emphasis delimiters comes along, though of course that is a rare occurrence. We're going to find every possible way to pair opening and closing delimiters, and then score the results with a system of penalties. Whichever solution has the least penalty is the winner.
In almost every example of normal Markdown written by actual human beings, there will be just one open/close option at a time.
define MAX_MD_EMPHASIS_PAIRS (MAX_MD_EMPHASIS_DELIMITERS*MAX_MD_EMPHASIS_DELIMITERS)
Seek emphasis6.4 =
int no_delimiters = 0;
md_emphasis_delimiter delimiters[MAX_MD_EMPHASIS_DELIMITERS];
Find the possible emphasis delimiters6.4.2;
markdown_item *options[MAX_MD_EMPHASIS_PAIRS]; /* one slot for each possible open/close pair */
int no_options = 0;
for (int open_i = 0; open_i < no_delimiters; open_i++) {
    md_emphasis_delimiter *OD = &(delimiters[open_i]);
    if (OD->can_open == FALSE) continue;
    for (int close_i = open_i+1; close_i < no_delimiters; close_i++) {
        md_emphasis_delimiter *CD = &(delimiters[close_i]);
        if (CD->can_close == FALSE) continue;
        Reject this as a possible closer if it cannot match the opener6.4.3;
        if (tracing_Markdown_parser) {
            WRITE("Option %d is to pair D%d with D%d\n", no_options, open_i, close_i);
        }
        Create the subtree which would result from this option being chosen6.4.4;
    }
}
if (no_options > 0) Select the option with the lowest penalty6.4.5;
- This code is used in §6.
§6.4.1. We don't want to find every possible delimiter, in case the source text is
absolutely huge: indeed, we never exceed MAX_MD_EMPHASIS_DELIMITERS.
A further optimisation is that (a) we needn't even record delimiters which can't open or close, (b) or delimiters which can only close and which occur before any openers, (c) or anything after a point where we can clearly complete at least one pair correctly.
For example, consider This is *emphatic* and **so is this**. Rule (c) makes
it unnecessary to look past the end of the word "emphatic", because by that
point we have seen an opener which cannot close and a closer which cannot open,
of equal widths. These can only pair with each other; so we can stop.
As a result, in almost all human-written Markdown, the algorithm below returns exactly two delimiters, one open, one close.
In other situations, it's harder to predict what will happen. We will contain
the possible explosion by restricting to cases where at least one pair can be
made within the first MAX_MD_EMPHASIS_DELIMITERS potential delimiters, and
we can pretty safely keep that number small.
define MAX_MD_EMPHASIS_DELIMITERS 10
classdef md_emphasis_delimiter in 1000s {
    struct md_charpos pos; /* first character in the run */
    int width; /* for example, 7 for a run of seven asterisks */
    int type; /* 1 for asterisks, -1 for underscores, 0 for tildes, 2 for TeX, 3 for displayed TeX, 4 for Inweb links */
    int can_open; /* result of MDInlineParser::can_open_emphasis on it */
    int can_close; /* result of MDInlineParser::can_close_emphasis on it */
}
- The structure md_emphasis_delimiter is accessed in 2/trs, 5/mrk, 5/mpi, 5/mr, 5/mv, 5/im and here.
§6.4.2. Find the possible emphasis delimiters6.4.2 =
int open_count[3] = { 0, 0, 0 }, close_count[3] = { 0, 0, 0 }, both_count[3] = { 0, 0, 0 }; md_charpos from = Markdown::left_edge_of(owner->down); while ((Markdown::somewhere(from)) && (Markdown::plainish(from.md) == FALSE)) from = Markdown::advance(from); for (md_charpos pos = from; Markdown::somewhere(pos); pos = Markdown::advance(pos)) { int run = MDInlineParser::delimiter_run(pos, mask); if (run != 0) { if (no_delimiters >= MAX_MD_EMPHASIS_DELIMITERS) break; int run_combined = run; if (run > 10000000) run_combined = run % 10000000; int can_open = MDInlineParser::can_open_emphasis(pos, run_combined); int can_close = MDInlineParser::can_close_emphasis(pos, run_combined); if ((no_delimiters == 0) && (can_open == FALSE)) continue; if ((can_open == FALSE) && (can_close == FALSE)) continue; md_emphasis_delimiter *P = &(delimiters[no_delimiters++]); P->pos = pos; P->width = (run_combined>0)?run_combined:(-run_combined); P->type = (run_combined>0)?1:-1; switch (run/10000000) { case 1: P->type = 0; break; case 2: P->type = 2; break; case 3: P->type = 3; break; case 4: P->type = 4; break; } P->can_open = can_open; P->can_close = can_close; if (tracing_Markdown_parser) { WRITE("DR%d at ", no_delimiters); Markdown::debug_pos(OUT, pos); WRITE(" width %d type %d", P->width, P->type); if (MDInlineParser::left_flanking(pos, run_combined)) WRITE(", left-flanking"); if (MDInlineParser::right_flanking(pos, run_combined)) WRITE(", right-flanking"); if (P->can_open) WRITE(", can-open"); if (P->can_close) WRITE(", can-close"); WRITE(", preceded by "); Markdown::debug_char(OUT, Markdown::get_unescaped(P->pos, -1)); WRITE(", followed by "); Markdown::debug_char(OUT, Markdown::get_unescaped(P->pos, P->width)); WRITE("\n"); } int x = (P->type>0)?0:1; if (P->type == 0) x = 2; if ((can_open) && (can_close == FALSE)) open_count[x] += P->width; if ((can_open == FALSE) && (can_close)) close_count[x] += P->width; if ((can_open) && (can_close)) both_count[x] += P->width; if ((both_count[0] == 
0) && (open_count[0] == close_count[0]) && (both_count[1] == 0) && (open_count[1] == close_count[1]) && (both_count[2] == 0) && (open_count[2] == close_count[2])) break; } }
- This code is used in §6.4.
§6.4.3. We vet OD and CD to see if it's possible to pair them together. We
already know that OD can open and CD can close, and that OD precedes
CD ("The opening and closing delimiters must belong to separate delimiter
runs."). They must have the same type: asterisks pair with asterisks, underscores
with underscores.
That's when the CommonMark specification becomes kind of hilarious: "If one of the delimiters can both open and close emphasis, then the sum of the lengths of the delimiter runs containing the opening and closing delimiters must not be a multiple of 3 unless both lengths are multiples of 3."
Reject this as a possible closer if it cannot match the opener6.4.3 =
if (CD->type != OD->type) continue; if ((CD->can_open) || (OD->can_close)) { int sum = OD->width + CD->width; if (sum % 3 == 0) { if (OD->width % 3 != 0) continue; if (CD->width % 3 != 0) continue; } }
- This code is used in §6.4.
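The "multiple of 3" rule is easier to trust after trying a few cases by hand. Here is a self-contained sketch of the same test, with a hypothetical cut-down struct delim standing in for md_emphasis_delimiter and a hypothetical function can_pair returning the verdict instead of using continue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for md_emphasis_delimiter, keeping only the fields
   that the pairing test consults. */
typedef struct delim {
    int width;      /* length of the run */
    int type;       /* 1 for asterisks, -1 for underscores */
    int can_open;
    int can_close;
} delim;

/* Could opener od pair with closer cd? The caller is assumed to have checked
   already that od can open and cd can close. */
int can_pair(const delim *od, const delim *cd) {
    assert(od != NULL && cd != NULL);
    if (cd->type != od->type) return 0;
    if ((cd->can_open) || (od->can_close)) {
        /* one of them can both open and close, so the rule of three applies */
        if ((od->width + cd->width) % 3 == 0) {
            if ((od->width % 3 != 0) || (cd->width % 3 != 0)) return 0;
        }
    }
    return 1;
}
```

For instance, a run of 1 cannot pair with a two-way run of 2, since 1 + 2 = 3, but two runs of 3 can pair even though they sum to 6.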
§6.4.4. Okay, so now OD and CD are a conceivable pair, and we
investigate the consequences. We need to copy the existing situation so
that we can alter it without destroying the original.
Note the two recursive uses of MDInlineParser::emphasis to continue
the process of pairing: this is where the computational fuse is lit, with
the explosion to follow. But since each subtree contains fewer delimiter runs
than the original, it does at least terminate.
Create the subtree which would result from this option being chosen6.4.4 =
markdown_item *option = Markdown::deep_copy(owner); options[no_options++] = option; markdown_item *OI = NULL, *CI = NULL; for (markdown_item *md = option->down; md; md = md->next) { if (md->copied_from == OD->pos.md) OI = md; if (md->copied_from == CD->pos.md) CI = md; } if ((OI == NULL) || (CI == NULL)) internal_error("copy accident"); int width; /* number of delimiter characters we will trim */ md_charpos first_trimmed_char_left; md_charpos last_trimmed_char_left; md_charpos first_trimmed_char_right; md_charpos last_trimmed_char_right; Draw the dotted lines where we will cut6.4.4.1; Deactivate the active characters being acted on6.4.4.2; markdown_item *em_top, *em_bottom; Make the chain of emphasis items from top to bottom6.4.4.3; Perform the tree surgery to insert the emphasis item6.4.4.4; MDInlineParser::emphasis(variation, em_bottom, mask); MDInlineParser::emphasis(variation, option, mask); if (tracing_Markdown_parser) { WRITE("Option %d is to fragment thus:\n", no_options); Markdown::debug_subtree(STDOUT, option); WRITE("Resulting in: "); Markdown::render_extended(STDOUT, option, variation); WRITE("\nWhich scores %d penalty points\n", MDInlineParser::penalty(option)); }
- This code is used in §6.4.
§6.4.4.1. This innocent-looking code is very tricky. The issue is that the two delimiters
may be of unequal width. We want to take as many asterisks/underscores away
as we can, so we set width to the minimum of the two lengths. But a complication
is that they need to be cropped to fit inside the slice of the node they belong
to first.
We then mark to remove width characters from the inside edges of each
delimiter, not the outside edges.
Draw the dotted lines where we will cut6.4.4.1 =
int O_start = OD->pos.at, O_width = OD->width; if (O_start < OI->from) { O_width -= (OI->from - O_start); O_start = OI->from; } int C_start = CD->pos.at, C_width = CD->width; if (C_start + C_width - 1 > CI->to) { C_width = CI->to - C_start + 1; } width = O_width; if (width > C_width) width = C_width; first_trimmed_char_left = Markdown::pos(OI, O_start + O_width - width); last_trimmed_char_left = Markdown::pos(OI, O_start + O_width - 1); first_trimmed_char_right = Markdown::pos(CI, C_start); last_trimmed_char_right = Markdown::pos(CI, C_start + width - 1); if (tracing_Markdown_parser) { WRITE(" first left = "); Markdown::debug_pos(OUT, first_trimmed_char_left); WRITE("\n last left = "); Markdown::debug_pos(OUT, last_trimmed_char_left); WRITE("\nfirst right = "); Markdown::debug_pos(OUT, first_trimmed_char_right); WRITE("\n last right = "); Markdown::debug_pos(OUT, last_trimmed_char_right); WRITE("\n"); }
- This code is used in §6.4.4.
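Setting aside the cropping to node slices, the arithmetic is just this. The sketch uses plain integer offsets in place of md_charpos, and the names trim and plan_trim are hypothetical.

```c
#include <assert.h>

/* Hypothetical record of the characters to trim, as inclusive offsets. */
typedef struct trim {
    int width;        /* how many characters are trimmed from each delimiter */
    int left_first;   /* first trimmed character of the opener */
    int left_last;    /* last trimmed character of the opener */
    int right_first;  /* first trimmed character of the closer */
    int right_last;   /* last trimmed character of the closer */
} trim;

/* The opener occupies o_start to o_start+o_width-1, the closer c_start to
   c_start+c_width-1. We trim from the inside edges: the END of the opener
   and the START of the closer. */
trim plan_trim(int o_start, int o_width, int c_start, int c_width) {
    assert(o_width > 0 && c_width > 0);
    trim t;
    t.width = (o_width < c_width) ? o_width : c_width;
    t.left_first = o_start + o_width - t.width;
    t.left_last = o_start + o_width - 1;
    t.right_first = c_start;
    t.right_last = c_start + t.width - 1;
    return t;
}
```

In `***a**`, for example, the opener has width 3 at offset 0 and the closer width 2 at offset 4, so we trim offsets 1 to 2 and 4 to 5, leaving the outermost asterisk at offset 0 as literal text or as material for a later pairing.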
§6.4.4.2. Deactivate the active characters being acted on6.4.4.2 =
for (int w=0; w<width; w++) { Markdown::put_offset(first_trimmed_char_left, w, ':'); Markdown::put_offset(first_trimmed_char_right, w, ':'); }
- This code is used in §6.4.4.
§6.4.4.3. Suppose we are peeling away 5 asterisks from the inside edges of each delimiter,
so that width is 5. There are only two strengths of emphasis in Markdown, so
this must be read as one of the various ways to add 1s and 2s to make 5.
CommonMark rule 13 reads "The number of nestings should be minimized.", so we
must use all 2s except for the 1 left over. Rule 14 says that left-over 1 must
be outermost. So this would give us:
    EMPHASIS_MIT <--- this is em_top
        STRONG_MIT
            STRONG_MIT <--- this is em_bottom
                ...the actual content being emphasised
Make the chain of emphasis items from top to bottom6.4.4.3 =
if (OD->type == 0) { em_top = Markdown::new_item(STRIKETHROUGH_MIT); em_bottom = em_top; } else if (OD->type == 2) { em_top = Markdown::new_item(TEX_MIT); em_bottom = em_top; } else if (OD->type == 3) { em_top = Markdown::new_item(DISPLAYED_TEX_MIT); em_bottom = em_top; } else if (OD->type == 4) { em_top = Markdown::new_item(INWEB_LINK_MIT); em_bottom = em_top; } else { em_top = Markdown::new_item(((width%2) == 1)?EMPHASIS_MIT:STRONG_MIT); if ((width%2) == 1) width -= 1; else width -= 2; em_bottom = em_top; while (width > 0) { markdown_item *g = Markdown::new_item(STRONG_MIT); width -= 2; em_bottom->down = g; em_bottom = g; } }
- This code is used in §6.4.4.
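As a check on the arithmetic, this little sketch spells out the nesting for any width, writing 'E' for EMPHASIS_MIT and 'S' for STRONG_MIT. It is only an illustration of rules 13 and 14, using the hypothetical name emphasis_chain and a character buffer in place of real tree items.

```c
#include <assert.h>
#include <stddef.h>

/* Write the nesting for a delimiter of the given width into out, outermost
   first: 'E' for EMPHASIS_MIT, 'S' for STRONG_MIT. Returns the depth.
   The buffer needs room for (width+1)/2 letters plus a terminator. */
size_t emphasis_chain(int width, char *out, size_t out_size) {
    assert(width > 0 && out != NULL && out_size > (size_t) (width + 1) / 2);
    size_t depth = 0;
    if (width % 2 == 1) { out[depth++] = 'E'; width -= 1; }  /* odd one out goes outermost */
    while (width > 0) { out[depth++] = 'S'; width -= 2; }    /* then as many 2s as needed */
    out[depth] = 0;
    return depth;
}
```

A width of 5 gives "ESS", matching the diagram above, and a width of 4 gives "SS".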
§6.4.4.4. Perform the tree surgery to insert the emphasis item6.4.4.4 =
markdown_item *chain = option->down; if (tracing_Markdown_parser) { Markdown::debug_chain_label(OUT, chain, I"Before surgery"); } markdown_item *before_emphasis = NULL, *emphasis = NULL, *after_emphasis = NULL; Markdown::cut_to_just_before(chain, first_trimmed_char_left, &before_emphasis, &emphasis); Markdown::cut_to_just_at(emphasis, last_trimmed_char_left, NULL, &emphasis); Markdown::cut_to_just_before(emphasis, first_trimmed_char_right, &emphasis, &after_emphasis); Markdown::cut_to_just_at(after_emphasis, last_trimmed_char_right, NULL, &after_emphasis); if (tracing_Markdown_parser) { Markdown::debug_chain_label(OUT, before_emphasis, I"Before emphasis"); Markdown::debug_chain_label(OUT, emphasis, I"Emphasis"); Markdown::debug_chain_label(OUT, after_emphasis, I"After emphasis"); } option->down = before_emphasis; if (option->down) { chain = option->down; while ((chain) && (chain->next)) chain = chain->next; chain->next = em_top; } else { option->down = em_top; } em_top->next = after_emphasis; em_bottom->down = emphasis;
- This code is used in §6.4.4.
§6.4.5. Select the option with the lowest penalty6.4.5 =
int best_is = 0, best_score = 100000000; /* default to option 0, which always exists here */
for (int pair_i = 0; pair_i < no_options; pair_i++) {
    int score = MDInlineParser::penalty(options[pair_i]);
    if (score < best_score) { best_score = score; best_is = pair_i; }
}
if (tracing_Markdown_parser) {
    WRITE("Selected option %d with penalty %d\n", best_is, best_score);
}
owner->down = options[best_is]->down;
- This code is used in §6.4.
§7. That just leaves the penalty scoring system: how unfortunate is a possible reading of the Markdown syntax?
We score a whopping penalty for any unescaped asterisks and underscores left over, because above all we want to pair as many delimiters as possible together. (Some choices of pairings preclude others: it's a messy dynamic programming problem to work this out in detail.)
We then impose a modest penalty on the width of a piece of emphasis, in order to achieve CommonMark's rule 16: "When there are two potential emphasis or strong emphasis spans with the same closing delimiter, the shorter one (the one that opens later) takes precedence."
int MDInlineParser::penalty(markdown_item *md) { if (md) { int penalty = 0; if (md->type == PLAIN_MIT) { for (int i=md->from; i<=md->to; i++) { md_charpos pos = Markdown::pos(md, i); inchar32_t c = Markdown::get_unescaped(pos, 0); if ((c == '*') || (c == '_')) penalty += 100000; } } if ((md->type == EMPHASIS_MIT) || (md->type == STRONG_MIT)) penalty += Markdown::width(md->down); for (markdown_item *c = md->down; c; c = c->next) penalty += MDInlineParser::penalty(c); return penalty; } return 0; }
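To see the scoring at work on a small case, here is a self-contained sketch over a toy tree. Everything in it is a stand-in: the node type and the names chain_width and penalty_of are hypothetical, escapes are ignored, and we assume that Markdown::width amounts to the number of characters of plain text in the chain, which this section doesn't confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { MATERIAL, PLAIN, EMPHASIS, STRONG } node_type;

/* Toy stand-in for markdown_item. */
typedef struct node {
    node_type type;
    const char *text;      /* used only by PLAIN nodes, where it must be non-NULL */
    struct node *down;     /* first child */
    struct node *next;     /* next sibling */
} node;

/* Assumed meaning of width: characters of plain text in this chain of
   siblings and everything beneath them. */
static int chain_width(const node *n) {
    int w = 0;
    for (; n; n = n->next) {
        if (n->type == PLAIN) w += (int) strlen(n->text);
        w += chain_width(n->down);
    }
    return w;
}

int penalty_of(const node *n) {
    if (n == NULL) return 0;
    assert((n->type != PLAIN) || (n->text != NULL));
    int penalty = 0;
    if (n->type == PLAIN)
        for (const char *p = n->text; *p; p++)
            if ((*p == '*') || (*p == '_')) penalty += 100000;   /* leftover delimiter */
    if ((n->type == EMPHASIS) || (n->type == STRONG))
        penalty += chain_width(n->down);                         /* prefer shorter spans */
    for (const node *c = n->down; c; c = c->next) penalty += penalty_of(c);
    return penalty;
}
```

On this assumption, a reading of `a *bc* *d` as plain "a ", emphasised "bc" and plain " *d" scores 2 for the width of the emphasis plus 100000 for the stray asterisk.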