I found two solutions, but neither solves the original problem, because the types become too complex or too deeply recursive. The second solution is definitely more scalable than the first one.
Solution 1: recursive parsing
This solution recursively parses the input string. The Split type splits the input string by whitespace and returns an array of the tokens (or words).
type EndOfInput = '';
// Splits the given `UnprocessedInput` string into tokens.
// It recursively iterates through each character in the string
// and appends characters to the second type parameter `Current` until the
// token has been consumed. When the token is fully consumed, it is added to
// `Result` and the `Current` memory is cleared.
//
// NOTE: Do not pass anything other than the first type parameter. The other type
// parameters are for internal tracking during the recursive loop.
//
// See https://github.com/microsoft/TypeScript/pull/40336 for more template literal
// examples.
type Split<UnprocessedInput extends string, Current extends string = '', Result extends string[] = []> =
  // Have we reached the end of the input string?
  UnprocessedInput extends EndOfInput
    // Yes. Is `Current` empty?
    ? Current extends EndOfInput
      // Yes, we're at the end of processing and there is no need to add new items to the result
      ? Result
      // No, add the last item to the result and return it
      : [...Result, Current]
    // No, use template literal inference to get the first char and the rest of the string
    : UnprocessedInput extends `${infer Head}${infer Rest}`
      // Is the next character whitespace?
      ? Head extends Whitespace
        // Yes. Is `Current` empty?
        ? Current extends EndOfInput
          // Yes, continue "eating" whitespace
          ? Split<Rest, Current, Result>
          // No, it means we went from a token to whitespace, meaning the token
          // is fully parsed and can be added to the result
          : Split<Rest, '', [...Result, Current]>
        // No, add the character to `Current`
        : Split<Rest, `${Current}${Head}`, Result>
      // This shouldn't happen since UnprocessedInput is restricted with
      // the `extends string` constraint.
      // For example ValidCssClassName<null> would be a `never` type if it didn't
      // already fail with "Type 'null' does not satisfy the constraint 'string'"
      : [never]
This works for smaller inputs, but not for larger strings, because of TypeScript's recursion limit:
type Result5 = Split<`
a
b
c`>
// Fails for larger string values, because of recursion limit
type Result6 = Split<`aaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbb`>
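For reference, here is a compact, self-contained sketch of the same character-by-character technique. The `Whitespace` union is an assumption (its definition is not shown above), and `SplitSketch` and `tokens` are hypothetical names used only for illustration. Annotating a constant with the resulting type is one way to let the compiler check the result instead of hovering over it.

```typescript
// Assumption: the real `Whitespace` type is not shown in the answer; this union is a guess.
type Whitespace = ' ' | '\n' | '\t';

// Hypothetical compact restatement of the recursive split:
// consume one character at a time, collect non-whitespace characters in
// `Current`, and push `Current` to `Result` whenever whitespace ends a token.
type SplitSketch<Input extends string, Current extends string = '', Result extends string[] = []> =
  Input extends `${infer Head}${infer Rest}`
    ? Head extends Whitespace
      ? Current extends ''
        ? SplitSketch<Rest, '', Result>
        : SplitSketch<Rest, '', [...Result, Current]>
      : SplitSketch<Rest, `${Current}${Head}`, Result>
    : Current extends ''
      ? Result
      : [...Result, Current];

// This assignment only compiles if the computed type is ['a', 'b', 'c'].
const tokens: SplitSketch<'\n a\n b\n c'> = ['a', 'b', 'c'];
```

Because every character costs one level of recursion, the depth grows with the length of the input, which is why longer strings run into the limit.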
Solution 2: known classes as tokens
Since we actually have the valid class names as a string union, we can use that as a part of the template literal type to consume whole class names.
To understand this solution, let's build it up from parts. First, let's use ValidClass in the template literal:
type SplitDebug1<T extends string> =
  T extends `${ValidClass}${Whitespace}${infer Tail}`
    ? [ValidClass, Whitespace, Tail]
    : never
// The grammar is not ambiguous anymore!
// [ValidClass, Whitespace, "b-class c-class"]
type Result1 = SplitDebug1<"a-class b-class c-class">
This solves the ambiguity issue, but now we can't access the parsed Head anymore, since ValidClass just refers to the type ValidClass = "a-class" | "b-class" | "c-class". Unfortunately, at the time of writing TypeScript didn't allow inferring and constraining the token at the same time, so this is not possible:
type SplitDebug2<T extends string> =
  T extends `${infer Head extends ValidClass ? infer Head : never}${Whitespace}${infer Tail}`
    ? [Head, Whitespace, Tail]
    : never
// Still just [ValidClass, Whitespace, "b-class c-class"]
type Result2 = SplitDebug2<"a-class b-class c-class">
But here comes the hack. We can use the known Tail as a way to reverse the matching and get access to the Head:
type SplitDebug3<T extends string> =
  T extends `${ValidClass}${Whitespace}${infer Tail}`
    ? T extends `${infer Head}${Whitespace}${Tail}`
      ? [Head, Whitespace, Tail]
      : never
    : never
// Now we know the first valid token, aka the class name!
// ["a-class", Whitespace, "b-class c-class"]
type Result3 = SplitDebug3<"a-class b-class c-class">
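The two-step match above can be tried in isolation with the sketch below. The `Whitespace` union is an assumption (its definition is not shown in this answer), and `FirstToken` and `parts` are hypothetical names. The first match only succeeds when the input starts with a valid class followed by whitespace, and it pins down Tail. In the second match Tail is already a concrete string, so Head is the only part left to infer.

```typescript
// Assumption: the real `Whitespace` type is not shown in the answer; this union is a guess.
type Whitespace = ' ' | '\n' | '\t';
type ValidClass = 'a-class' | 'b-class' | 'c-class';

// Hypothetical helper: returns the first class name and the remaining input.
type FirstToken<T extends string> =
  // Step 1: require "<valid class><whitespace><rest>" and capture the rest.
  T extends `${ValidClass}${Whitespace}${infer Tail}`
    // Step 2: match again with the now-known Tail to recover the head.
    ? T extends `${infer Head}${Whitespace}${Tail}`
      ? [Head, Tail]
      : never
    : never;

// This assignment only compiles if the computed type is
// ['a-class', 'b-class c-class'].
const parts: FirstToken<'a-class b-class c-class'> = ['a-class', 'b-class c-class'];
```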
This trick can be used to parse the valid class names. The full solution:
// Demonstrating with a large number of class names
// Breaks with a "too complex union type" error at 20k class names
type Digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';
type ValidClass1000 = `class-${Digit}${Digit}${Digit}`;
type SplitToValidClasses<T extends string> = SplitToValidClassesInner<Trim<T>>;
type SplitToValidClassesInner<T extends string> =
  T extends `${ValidClass1000}${Whitespace}${infer Tail}`
    ? T extends `${infer Head}${Whitespace}${Tail}`
      ? Trim<Head> extends ValidClass1000
        ? [Trim<Head>, ...SplitToValidClassesInner<Trim<Tail>>]
        : [Err<`'${Head}' is not a valid class`>]
      : never
    : T extends `${infer Tail}`
      ? Tail extends ValidClass1000
        ? [Tail]
        : [Err<`'${Tail}' is not a valid class`>]
      : [never];
// ["class-001", "class-002", "class-003", "class-004", "class-000"]
type Result4 = SplitToValidClasses<`
class-001 class-002
class-003
class-004 class-000
`>
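The full solution relies on `Whitespace`, `Trim` and `Err`, whose definitions are not shown here. The sketch below is one possible way to define them. All three definitions are assumptions for illustration and may differ from the ones used in the playground, in particular the shape of `Err`. `TrimLeft`, `TrimRight`, `trimmed` and `failure` are hypothetical names.

```typescript
// Assumption: a guess at the whitespace characters that separate class names.
type Whitespace = ' ' | '\n' | '\t';

// Hypothetical helpers: strip whitespace from one side, one character per step.
type TrimLeft<S extends string> =
  S extends `${Whitespace}${infer Rest}` ? TrimLeft<Rest> : S;
type TrimRight<S extends string> =
  S extends `${infer Rest}${Whitespace}` ? TrimRight<Rest> : S;

// Strip whitespace from both ends of a string literal type.
type Trim<S extends string> = TrimLeft<TrimRight<S>>;

// Assumption: an error marker that carries the message in its type,
// so the message shows up in compiler errors and hover tooltips.
type Err<Message extends string> = { readonly error: Message };

// These assignments only compile if the helper types resolve as expected.
const trimmed: Trim<'  class-001 \n'> = 'class-001';
const failure: Err<"'foo' is not a valid class"> = { error: "'foo' is not a valid class" };
```

Trimming recurses once per whitespace character, so it only adds noticeable depth when the input has long runs of leading or trailing whitespace.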
This is the best solution I could come up with, and it works for fairly large union types too. The error message could be polished, but it still hints at the correct location.
Although it supports a large number of choices in the union type, it didn't work for our real-world use case, where we have ~40k Tailwind class names in a single union type. That type represents all the possible class names one might add during development (unused ones are purged in production).