Tokenize source code using a language definition's state machine.
This is a faithful port of PEAR Text_Highlighter's _getToken algorithm.
The key difference from the PHP original is that JavaScript has no equivalent
of PHP's PREG_OFFSET_CAPTURE flag, so the position of each capture group is
computed from the match result instead.
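
A minimal sketch of how group positions might be derived from a match result follows. The helper name `groupOffsets` is hypothetical and not taken from the port, and the approach assumes capture groups are non-nested and appear left to right inside the overall match; the actual implementation may differ.

```javascript
// Hypothetical helper: estimates [start, end) offsets for each capture group
// of a RegExp match. Assumes groups are non-nested and ordered left to right;
// groups inside lookarounds or nested groups would need different handling.
function groupOffsets(match) {
  const offsets = [];
  let cursor = 0; // position inside match[0] where the next search begins
  for (let i = 1; i < match.length; i++) {
    const text = match[i];
    if (text === undefined) {
      offsets.push(null); // this group did not participate in the match
      continue;
    }
    // Locate the group's text within the full match, at or after the cursor.
    const rel = match[0].indexOf(text, cursor);
    const start = match.index + rel;
    offsets.push([start, start + text.length]);
    cursor = rel + text.length;
  }
  return offsets;
}

const m = /(\w+)\s*=\s*(\d+)/.exec("let x = 42;");
console.log(groupOffsets(m)); // [[4, 5], [8, 10]]
```
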
The input is preprocessed to normalize line endings, replace tabs with
spaces, and ensure empty lines have at least one space character
(matching PHP's behavior).
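
The preprocessing step could be sketched roughly as below. The function name `preprocess` and the default tab width of 4 are assumptions made for illustration, as is the choice to replace each tab with a fixed number of spaces rather than aligning to tab stops.

```javascript
// Hypothetical helper illustrating the three preprocessing steps.
// The tab width of 4 and fixed-width tab replacement are assumptions.
function preprocess(source, tabSize = 4) {
  return source
    .replace(/\r\n?/g, "\n")              // normalize CRLF and lone CR to LF
    .replace(/\t/g, " ".repeat(tabSize))  // replace each tab with spaces
    .split("\n")
    .map((line) => (line === "" ? " " : line)) // give empty lines one space
    .join("\n");
}

console.log(JSON.stringify(preprocess("a\r\n\r\n\tb"))); // "a\n \n    b"
```
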