Module lex

Module lex 

Source
Expand description

HLSL tokenizer.

Streaming lexer over a &str of HLSL source. Produces a Vec<Token> with position tracking (byte offset + line + column). The tokenizer is intentionally permissive: HLSL semantic quirks (e.g. typed swizzle on RHS, float2x2 matrix constructors, leading-zero integers, f/h/u numeric suffixes) survive as plain tokens, and the parser disambiguates them in context. Comments and preprocessor lines are skipped — MD2 user shaders don’t depend on them. Continuation back-ticks (`) that appear in .milk preset files are stripped one layer above (the .milk parser joins comp shader lines), so the lexer never sees them.

Output is consumed by crate::parse and downstream emitters. The existing regex-based pipeline in crate::translate_shader does not call into the lexer; it stays in place and continues to drive the current pass-rate on test-presets-200/.

Structs§

LexError
Lexer error. Position is preserved so the parser can wrap it with extra context (function name, surrounding text, etc.).
Span
Source span: byte offsets [start, end) plus 1-indexed line/column of start. Keeping the start position is enough for error reporting; the parser slices &source[start..end] to recover the original lexeme.
Token
One token: kind plus source span. The span is enough to recover the original lexeme via &source[span.start as usize..span.end as usize].

Enums§

Keyword
Reserved-word identity. Identifiers that match one of these are classified at lex time so the parser does a single match instead of a string comparison.
TokenKind
Token kinds produced by tokenize. Identifiers and keywords are distinguished here so the parser doesn’t have to re-lookup the table for every ident. Literal values are pre-parsed into i64 / f64 so the parser doesn’t have to re-parse the lexeme either.

Functions§

classify_ident 🔒
Recognise a reserved word. Returns None if the lexeme is a plain identifier. Type names (float, float2, mat3x3, …) are intentionally not classified as keywords — the parser treats them as identifiers in declaration contexts so vector/matrix variants can be added without touching the lexer.
lex_number 🔒
Lex one numeric literal starting at i. Returns the token kind plus the end byte offset. Accepts:
tokenize
Tokenize a complete HLSL source string. Returns the full token stream terminated by TokenKind::Eof, or the first error encountered.