Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
249 views
in Technique[技术] by (71.8m points)

ambiguous - How to avoid ambiquity in TypeScript template literal type inference?

I'm trying to write a type that validates that a given input string has valid class names separated by 1 or more whitespace characters. The input may also have leading or trailing whitespace.

The typing I have now is quite close, but the template literal can be inferred by TS compiler in multiple ways, meaning the grammar is ambiguous. This leads to unwanted results.

First we define primitive types:

// To avoid recursion as much as possible
type Spaces = (
  | "     "
  | "    "
  | "   "
  | "  "
  | " "
);
type Whitespace = Spaces | "
" | "";
type ValidClass = 'a-class' | 'b-class' | 'c-class';

Then the utility types

// Utility type to provide nicer error messages
type Err<Message extends string> = `Error: ${Message}`;

type TrimEnd<T extends string> = (
  T extends `${infer Rest}${Whitespace}`
  ? TrimEnd<Rest>
  : T
);
type TrimStart<T extends string> = (
  T extends `${Whitespace}${infer Rest}`
  ? TrimStart<Rest>
  : T
);
type Trim<T extends string> = TrimEnd<TrimStart<T>>;

and finally the actual type that checks the input string:

// Forces the string to be trimmed before starting recursive loop.
type SplitToValidClasses<T extends string> = SplitToValidClassesInner<Trim<T>>;

// Splits the input string into array of `Array<Token | 'Error: ...'>`
// strings. The input is converted to an array format mostly because I found it
// easier to work with arrays in other TS generics, instead of e.g space separated
// values.
type SplitToValidClassesInner<T extends string> =
  // Does `T` contain more than one string? For example 'aaaa

  bbbb'
  T extends `${infer Head}${Whitespace}${infer Tail}`
    // Yes, `T` could be infered into three parts.
    // Is `Head` a valid class name?
    ? Trim<Head> extends ValidClass
        // Yes, it's a valid name. Continue recursively with rest of the string
        // but trim white space from both sides.
        ? [Trim<Head>, ...SplitToValidClassesInner<Trim<Tail>>]
        : [Err<`'${Head}' is not a valid class`>]
    : T extends `${infer Tail}`
      ? Tail extends ValidClass
        ? [Tail]
        : [Err<`'${Tail}' is not a valid class`>]
      : [never];

// This works
type CorrectResult = SplitToValidClasses<'a-class b-class c-class'>

But when testing with different inputs, we can notice incorrect results:

// Should be ["a-class", "b-class", "c-class"]
type Input1 = `a-class b-class  c-class`;
type Result = SplitToValidClasses<Input1>;

// Should be ["a-class", "b-class", "c-class", "a-class"]
type Result2 = SplitToValidClasses<`

  a-class    b-class
c-class

    a-class
`>;

// Should be ["a-class", "Error: 'wrong-class' is not a valid class"]
type Result3 = SplitToValidClasses<`
  a-class
  wrong-class
  c-class
`>;

The issue happens in the template inference:

type SplitToValidClassesInnerFirstLevelDebug<T extends string> =
  T extends `${infer Head}${Whitespace}${infer Tail}`
    ? [Head, Whitespace, Tail]
    : never

// The grammar is ambiguous, leading to 
// "["a-class b-class" | "a-class", Whitespace, "c-class" | "b-class  c-class"]
// Removing the ambiguousity should fix the issue
type Result4 = SplitToValidClassesInnerFirstLevelDebug<Input1>

Playground link

I couldn't find much documentation of the details how template literals are inferred, except what Anders Hejlsberg explained in his PR:

For inference to succeed the starting and ending literal character spans (if any) of the target must exactly match the starting and ending spans of the source. Inference proceeds by matching each placeholder to a substring in the source from left to right: A placeholder followed by a literal character span is matched by inferring zero or more characters from the source until the first occurrence of that literal character span in the source. A placeholder immediately followed by another placeholder is matched by inferring a single character from the source.

How could this typing be achieved, without ambiguous results? One approach I thought of was to recursively parse the input character by character, but it very quickly hits the recursion limit in TS.

question from:https://stackoverflow.com/questions/65844206/how-to-avoid-ambiquity-in-typescript-template-literal-type-inference

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There are two solutions I figured out but neither solves the original problem because the types become too complex or recursive. The second solution is definitely more scalable than the first one.

Solution 1: recursive parsing

This solution recursively parses the input string. type Split splits the input string by whitespace and returns an array of the tokens (or words).

type EndOfInput = '';

// Validates given `UnprocessedInput` input string
// It recursively iterates through each character in the string,
// and appends characters into the second type parameter `Current` until the
// token has been consumed. When the token is fully consumed, it is added to 
// `Result` and `Current` memory is cleared.
//
// NOTE: Do not pass anything else than the first type parameter. Other type
//       parameters are for internal tracking during recursive loop
//
// See https://github.com/microsoft/TypeScript/pull/40336 for more template literal
// examples.
type Split<UnprocessedInput extends string, Current extends string = '', Result extends string[] = []> =
  // Have we reached to the end of the input string ?
  UnprocessedInput extends EndOfInput
    // Yes. Is the `Current` empty?
    ? Current extends EndOfInput
      // Yes, we're at the end of processing and no need to add new items to result
      ? Result
      // No, add the last item to results, and return result
      : [...Result, Current]
    // No, use template literal inference to get first char, and the rest of the string
    : UnprocessedInput extends `${infer Head}${infer Rest}`
      // Is the next character whitespace?
      ? Head extends Whitespace
        // No, and is the `Current` empty?
        ? Current extends EndOfInput
          // Yes, continue "eating" whitespace
          ? Split<Rest, Current, Result>
          // No, it means we went from a token to whitespace, meaning the token
          // is fully parsed and can be added to the result
          : Split<Rest, '', [...Result, Current]>
        // No, add the character to Current 
        : Split<Rest, `${Current}${Head}`, Result>
      // This shouldn't happen since UnprocessedInput is restricted with
      // `extends string` type narrowing.
      // For example ValidCssClassName<null> would be a `never` type if it didn't
      // already fail to "Type 'null' does not satisfy the constraint 'string'"
      : [never] 

This works for smaller inputs, but not for larger strings because of TS recursion limit:

type Result5 = Split<`
a   


b 

c`>

// Fails for larger string values, because of recursion limit
type Result6 = Split<`aaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbb`

Playground link

Solution 2: known classes as tokens

Since we actually have the valid class names as a string union, we can use that as a part of the template literal type to consume whole class names.

To understand this solution, let's build it from parts. First let's use the ValidClass in the template literal:

type SplitDebug1<T extends string> =
  T extends `${ValidClass}${Whitespace}${infer Tail}`
  ? [ValidClass, Whitespace, Tail]
  : never

// The grammar is not ambiguous anymore!
// [ValidClass, Whitespace, "b-class c-class"]
type Result1 = SplitDebug1<"a-class b-class c-class">

This solves the ambiguity issue, but now we can't access the parsed Head anymore, since ValidClass is just referring to the type type ValidClass = "a-class" | "b-class" | "c-class". Unfortunately TypeScript doesn't allow to infer and restrict the token at the same time, so this is not possible:

type SplitDebug2<T extends string> =
  T extends `${infer Head extends ValidClass ? infer Head : never}${Whitespace}${infer Tail}`
  ? [Head, Whitespace, Tail]
  : never

// Still just [ValidClass, Whitespace, "b-class c-class"]
type Result2 = SplitDebug1<"a-class b-class c-class">

But here come's the hack. We can use the known Tail as a way to reverse the matching to get access to the Head:

type SplitDebug3<T extends string> =
  T extends `${ValidClass}${Whitespace}${infer Tail}`
    ? T extends `${infer Head}${Whitespace}${Tail}` 
      ? [Head, Whitespace, Tail]
      : never
    : never

// Now we now the first valid token aka class name!
// ["a-class", Whitespace, "b-class c-class"]
type Result3 = SplitDebug3<"a-class b-class c-class">

This trick can be used to parse the valid class names, the full solution:


// Demonstrating with large amount of class names
// Breaks to "too complex union type" with 20k class names
type Digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';
type ValidClass1000 = `class-${Digit}${Digit}${Digit}`;

type SplitToValidClasses<T extends string> = SplitToValidClassesInner<Trim<T>>;
type SplitToValidClassesInner<T extends string> =
  T extends `${ValidClass1000}${Whitespace}${infer Tail}`
    ? T extends `${infer Head}${Whitespace}${Tail}` 
      ? Trim<Head> extends ValidClass1000
          ? [Trim<Head>, ...SplitToValidClassesInner<Trim<Tail>>]
          : [Err<`'${Head}' is not a valid class`>]
      : never
    : T extends `${infer Tail}`
      ? Tail extends ValidClass1000
        ? [Tail]
        : [Err<`'${Tail}' is not a valid class`>]
      : [never];

// ["class-001", "class-002", "class-003", "class-004", "class-000"]
type Result4 = SplitToValidClasses<`

    class-001 class-002 
  class-003
      class-004 class-000

  `>

Playground link

This is the best solution I could come up with, and works for fairly large union type too. The error message could be polished but it still hints to the correct location.

While supporting large amount of choices in the union type, this didn't work for our real world use case where we have ~40k Tailwind class names in a single type union. That type represents all the possible class names one might add during the development time (unused are purged in prod).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...