Whitespace normalization preprocessing for Wikidot markup.
This module ensures the lexer and parser receive input with consistent
whitespace conventions. It handles platform differences (DOS/Mac newlines),
normalizes exotic whitespace characters that users may paste from external
sources, and applies Wikidot-specific behaviors like backslash line continuation.
Substitutions are applied in a deliberate order:
1. Newline normalization (DOS \r\n and legacy Mac \r to Unix \n)
2. Non-standard leading whitespace replacement (nbsp, figure space to regular space)
3. Whitespace-only line stripping (collapse to empty lines)
4. Backslash line continuation (\\\n to line-break marker U+E000)
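
The ordered substitutions above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: the function name `normalize_whitespace`, the constant names, and the exact regular expressions are assumptions, and the set of "exotic" characters is limited to the two the description names (nbsp and figure space).

```python
import re

# The description names U+E000 as the line-break marker; the constant name is hypothetical.
LINE_BREAK_MARKER = "\ue000"

# Assumed patterns. Only nbsp (U+00A0) and figure space (U+2007) are
# treated as non-standard here, since those are the ones the text mentions.
_LEADING_NONSTANDARD = re.compile("^[ \u00a0\u2007]+", re.MULTILINE)
_WHITESPACE_ONLY_LINE = re.compile(r"^[ \t]+$", re.MULTILINE)


def normalize_whitespace(text: str) -> str:
    """Apply the four substitutions in the documented order (sketch)."""
    # 1. Newlines: DOS first, so "\r\n" does not turn into two newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # 2. Leading runs of nbsp / figure space become regular spaces,
    #    one space per original character.
    text = _LEADING_NONSTANDARD.sub(lambda m: " " * len(m.group(0)), text)

    # 3. Lines containing only spaces or tabs collapse to empty lines.
    text = _WHITESPACE_ONLY_LINE.sub("", text)

    # 4. A backslash directly before a newline becomes the marker.
    return text.replace("\\\n", LINE_BREAK_MARKER)
```

The order matters in this sketch: running step 1 first lets step 4 match a backslash followed by a DOS newline, and running step 2 before step 3 lets a line made up only of non-breaking spaces be recognized as whitespace-only.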