Preprocessing pipeline that transforms raw wikitext before tokenization.
Wikidot applies two categories of text substitutions before the main parser
sees the input. This module orchestrates those substitutions in the correct
order: whitespace normalization first (to establish consistent line structure),
then typographic transformations (to convert ASCII quote/ellipsis patterns
into Unicode equivalents).
The preprocessing step is essential because the lexer and parser assume
normalized input (Unix newlines, no tabs, consistent whitespace).
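The two-stage ordering described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function names (`normalize_whitespace`, `apply_typography`, `preprocess`) and the specific quote/ellipsis regexes are assumptions for the example.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Stage 1: establish consistent line structure (assumed rules)."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # Unix newlines
    text = text.replace("\t", "    ")                      # no tabs
    # Drop trailing whitespace so later passes see uniform lines.
    return "\n".join(line.rstrip() for line in text.split("\n"))

def apply_typography(text: str) -> str:
    """Stage 2: ASCII quote/ellipsis patterns -> Unicode equivalents."""
    text = re.sub(r"\.\.\.", "\u2026", text)           # ... -> ellipsis
    text = re.sub(r'(?<!\S)"(?=\S)', "\u201c", text)   # opening double quote
    text = re.sub(r'"', "\u201d", text)                # remaining " close
    return text

def preprocess(text: str) -> str:
    # Order matters: typography assumes already-normalized lines.
    return apply_typography(normalize_whitespace(text))
```

Running typography after normalization means the quote heuristics only ever see Unix newlines and untabbed lines, so a pattern like `(?<!\S)"` behaves the same regardless of the source's original line endings.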