Expand description
HLSL tokenizer.
Streaming lexer over a &str of HLSL source. Produces a Vec<Token> with
position tracking (byte offset + line + column). The tokenizer is
intentionally permissive: HLSL semantic quirks (e.g. typed swizzle on RHS,
float2x2 matrix constructors, leading-zero integers, f/h/u numeric
suffixes) survive as plain tokens, and the parser disambiguates them in
context. Comments and preprocessor lines are skipped — MD2 user shaders
don’t depend on them. Continuation back-ticks (`) that appear in
.milk preset files are stripped one layer above (the .milk parser
joins comp shader lines), so the lexer never sees them.
Output is consumed by crate::parse and downstream emitters. The
existing regex-based pipeline in crate::translate_shader does not
call into the lexer; it stays in place and continues to drive the current
pass-rate on test-presets-200/.
Structs§
- LexError
- Lexer error. Position is preserved so the parser can wrap it with extra context (function name, surrounding text, etc.).
- Span
- Source span: byte offsets
[start, end)plus 1-indexed line/column ofstart. Keeping the start position is enough for error reporting; the parser slices&source[start..end]to recover the original lexeme. - Token
- One token: kind plus source span. The span is enough to recover the
original lexeme via
&source[span.start as usize..span.end as usize].
Enums§
- Keyword
- Reserved-word identity. Identifiers that match one of these are
classified at lex time so the parser does a single
matchinstead of a string comparison. - Token
Kind - Token kinds produced by
tokenize. Identifiers and keywords are distinguished here so the parser doesn’t have to re-lookup the table for every ident. Literal values are pre-parsed intoi64/f64so the parser doesn’t have to re-parse the lexeme either.
Functions§
- classify_
ident 🔒 - Recognise a reserved word. Returns
Noneif the lexeme is a plain identifier. Type names (float,float2,mat3x3, …) are intentionally not classified as keywords — the parser treats them as identifiers in declaration contexts so vector/matrix variants can be added without touching the lexer. - lex_
number 🔒 - Lex one numeric literal starting at
i. Returns the token kind plus the end byte offset. Accepts: - tokenize
- Tokenize a complete HLSL source string. Returns the full token stream
terminated by
TokenKind::Eof, or the first error encountered.