wdpr
    • Tokenize source code using a language definition's state machine.

      This is a faithful port of PEAR Text_Highlighter's _getToken algorithm. The key difference from the PHP original is that JavaScript's RegExp API has no PREG_OFFSET_CAPTURE flag, so capture-group positions are computed from the match result instead.
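      One way positions can be recovered without PREG_OFFSET_CAPTURE is sketched below in TypeScript. This is not the wdpr code: `matchAlternatives`, `countGroups` and `AltMatch` are hypothetical names, and the sketch assumes several patterns are combined into a single alternation and that no alternative uses backreferences (wrapping shifts group numbers). Because only one alternative can match, the group wrapping it spans the whole match, so its start is simply `m.index`.

```typescript
// Hypothetical result type: which alternative matched, where, and its own groups.
interface AltMatch {
  alt: number;
  start: number;
  end: number;
  groups: (string | undefined)[];
}

// Count capturing groups in a regex source. The extra empty alternative lets
// the pattern match "", and exec() always returns one slot per group plus m[0].
function countGroups(source: string): number {
  return new RegExp(`(?:${source})|`).exec("")!.length - 1;
}

// Hypothetical helper: find the next match of any alternative at or after `from`.
function matchAlternatives(alternatives: string[], text: string, from: number): AltMatch | null {
  // Record the group number that will wrap each alternative.
  const groupIndex: number[] = [];
  let next = 1;
  for (const src of alternatives) {
    groupIndex.push(next);
    next += 1 + countGroups(src); // wrapper group + the alternative's own groups
  }
  const re = new RegExp(alternatives.map((s) => `(${s})`).join("|"), "g");
  re.lastIndex = from; // the "g" flag makes exec() start searching here
  const m = re.exec(text);
  if (m === null) return null;
  for (let alt = 0; alt < alternatives.length; alt++) {
    const g = groupIndex[alt];
    if (m[g] !== undefined) {
      // The wrapper group covers the entire match, so no offset API is needed.
      const inner = m.slice(g + 1, g + 1 + countGroups(alternatives[alt]));
      return { alt, start: m.index, end: m.index + m[0].length, groups: inner };
    }
  }
  return null; // not reached: some alternative must have produced the match
}
```

      For example, with alternatives `\d+` and `([a-z])([a-z]*)`, searching `"  abc 12"` from 0 reports alternative 1 at offsets 2 to 5 with inner groups `a` and `bc`.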

      The input is preprocessed to normalize line endings, replace tabs with spaces, and ensure that empty lines contain at least one space character (matching the behavior of the PHP original).
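      The three preprocessing steps can be sketched as follows. `preprocess` and its `tabWidth` parameter are hypothetical; the sketch assumes each tab becomes a fixed number of spaces, whereas the real code may use a different width or expand to tab stops.

```typescript
// Hypothetical sketch of the input normalization described above.
function preprocess(input: string, tabWidth = 4): string {
  // Normalize CRLF and lone CR line endings to LF.
  const normalized = input.replace(/\r\n?/g, "\n");
  // Assumption: every tab becomes exactly `tabWidth` spaces.
  const untabbed = normalized.replace(/\t/g, " ".repeat(tabWidth));
  // Pad each empty line with a single space.
  return untabbed
    .split("\n")
    .map((line) => (line === "" ? " " : line))
    .join("\n");
}
```

      Note that this sketch also pads the empty final line produced by a trailing newline (`"a\n"` becomes `"a\n "`); whether the original does the same is not stated here.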

      Parameters

      • def: LanguageDefinition

        The language definition describing the state machine.

      • input: string

        Raw source code string to tokenize.

      Returns Token[]

      Array of tokens, each with a CSS class and content string.
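      A heavily simplified sketch of a region-based state machine with this shape of input and output is shown below. The `Region`, `LanguageDefinition` and `Token` shapes here are hypothetical stand-ins, not the real types, and the sketch omits much of what Text_Highlighter supports (keywords, backreferences between start and end patterns). It assumes region patterns contain no capturing groups, so group numbers map directly to alternatives.

```typescript
// Hypothetical shapes for illustration; the real types differ.
interface Region {
  name: string;       // CSS class applied to the region and its delimiters
  start: string;      // regex source that opens the region (no capturing groups)
  end: string;        // regex source that closes it (no capturing groups)
  contains: string[]; // names of regions that may open inside this one
}
interface LanguageDefinition {
  defaultClass: string; // class for text outside any region
  topLevel: string[];   // regions that may open at the top level
  regions: Region[];
}
interface Token {
  cls: string;
  content: string;
}

// A regex source that never matches, used when there is no region to close.
const NEVER = "[^\\s\\S]";

function tokenize(def: LanguageDefinition, input: string): Token[] {
  const byName = new Map<string, Region>();
  for (const r of def.regions) byName.set(r.name, r);
  const lookup = (name: string): Region => {
    const r = byName.get(name);
    if (r === undefined) throw new Error(`unknown region: ${name}`);
    return r;
  };

  const tokens: Token[] = [];
  // Append text, merging it into the previous token when the class matches.
  const emit = (cls: string, content: string): void => {
    if (content === "") return;
    const last = tokens[tokens.length - 1];
    if (last !== undefined && last.cls === cls) last.content += content;
    else tokens.push({ cls, content });
  };

  const stack: Region[] = []; // current nesting of open regions (the state)
  let pos = 0;
  while (pos < input.length) {
    const current = stack.length > 0 ? stack[stack.length - 1] : undefined;
    const cls = current ? current.name : def.defaultClass;
    const children = (current ? current.contains : def.topLevel).map(lookup);
    // Group 1 closes the current region; groups 2.. open a child region.
    // Rebuilt every step for brevity; caching one regex per state would be faster.
    const alts = [current ? current.end : NEVER, ...children.map((c) => c.start)];
    const re = new RegExp(alts.map((a) => `(${a})`).join("|"), "g");
    re.lastIndex = pos;
    const m = re.exec(input);
    if (m === null || m[0] === "") {
      // No further transition (an empty match would loop forever).
      emit(cls, input.slice(pos));
      break;
    }
    emit(cls, input.slice(pos, m.index));
    let alt = 1;
    while (m[alt] === undefined) alt++; // first group that took part in the match
    if (alt === 1) {
      emit(cls, m[0]); // closing delimiter keeps the region's class
      stack.pop();
    } else {
      const child = children[alt - 2];
      emit(child.name, m[0]); // opening delimiter takes the child's class
      stack.push(child);
    }
    pos = m.index + m[0].length;
  }
  return tokens;
}
```

      Two design choices in this sketch are assumptions rather than documented behavior: delimiters take the class of the region they open or close, and adjacent text with the same class is merged into one token.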