Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - How do I make toLowerCase() and toUpperCase() consistent across browsers

Are there JavaScript polyfill implementations of String.toLowerCase() and String.toUpperCase(), or other methods in JavaScript that can work with Unicode characters and are consistent across browsers?

Background info

Performing the following gives different results in different browsers, or even between versions of the same browser (e.g. Firefox 54 vs. 55):

document.write(String.fromCodePoint(223).normalize("NFKC").toLowerCase().toUpperCase().toLowerCase())

In Firefox 55 it gives you ss; in Firefox 54 it gives you ß.
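One way to check which behavior a given engine implements is to probe a few characters whose full case mappings come from SpecialCasing.txt. This is a minimal sketch: `hasFullCaseMapping` is a hypothetical name, and it samples only three characters rather than every special casing.

```javascript
// Hypothetical feature test: does this engine apply the full
// (SpecialCasing.txt) case mappings, or only simple one-to-one ones?
function hasFullCaseMapping() {
  return (
    '\xDF'.toUpperCase() === 'SS' &&        // ß → SS
    '\u0130'.toLowerCase() === 'i\u0307' && // İ → i + COMBINING DOT ABOVE
    '\uFB00'.toUpperCase() === 'FF'         // ﬀ ligature → FF
  );
}

console.log(hasFullCaseMapping());
```

Based on the example above, Firefox 54 would fail at least the ß check.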

Generally this is fine, and mechanisms such as locales handle a lot of the cases you'd want. However, when you need consistent behavior across platforms, such as when talking to BaaS (backend-as-a-service) systems, it can greatly simplify interactions where you're essentially processing internal data on the client.


Whoever fights dragons too long becomes a dragon; gaze too long into the abyss, and the abyss gazes back…

1 Reply


Note that this issue only seems to affect outdated versions of Firefox, so unless you explicitly need to support those old versions, you could choose not to bother at all. The behavior for your example has been the same in all modern browsers since Firefox 55 changed it. This can be verified using jsvu + eshost:

$ jsvu # Update installed JavaScript engine binaries to the latest version.

$ eshost -e '"\xDF".normalize("NFKC").toLowerCase().toUpperCase().toLowerCase()'
#### Chakra
ss

#### V8 --harmony
ss

#### JavaScriptCore
ss

#### V8
ss

#### SpiderMonkey
ss

#### xs
ss

But you asked how to solve this problem, so let’s continue.

Step 4 of https://tc39.github.io/ecma262/#sec-string.prototype.tolowercase states:

Let cuList be a List where the elements are the result of toLowercase(cpList), according to the Unicode Default Case Conversion algorithm.

This Unicode Default Case Conversion algorithm is specified in section 3.13, "Default Case Algorithms", of the Unicode Standard:

The full case mappings for Unicode characters are obtained by using the mappings from SpecialCasing.txt plus the mappings from UnicodeData.txt, excluding any of the latter mappings that would conflict. Any character that does not have a mapping in these files is considered to map to itself.

[…]

The following rules specify the default case conversion operations for Unicode strings. These rules use the full case conversion operations, Uppercase_Mapping(C), Lowercase_Mapping(C), and Titlecase_Mapping(C), as well as the context-dependent mappings based on the casing context, as specified in Table 3-17.

For a string X:

  • R1 toUppercase(X): Map each character C in X to Uppercase_Mapping(C).
  • R2 toLowercase(X): Map each character C in X to Lowercase_Mapping(C).
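Rules R1 and R2 boil down to a per-code-point table lookup with an identity fallback. The sketch below illustrates R1 with a tiny hand-picked table; `UPPER` is a hypothetical three-entry excerpt, not the real Unicode data. Note that one code point can map to several, so the output can be longer than the input.

```javascript
// Illustration of R1 toUppercase(X): map each code point C to
// Uppercase_Mapping(C), falling back to C itself when no mapping exists.
// UPPER is a tiny hypothetical excerpt, not the full Unicode tables.
const UPPER = new Map([
  [0x61, [0x41]],         // a → A (simple mapping, UnicodeData.txt)
  [0xdf, [0x53, 0x53]],   // ß → SS (full mapping, SpecialCasing.txt)
  [0xfb01, [0x46, 0x49]], // ﬁ → FI (full mapping, SpecialCasing.txt)
]);

function toUppercase(x) {
  let result = '';
  // for...of iterates by code point, not by UTF-16 code unit.
  for (const ch of x) {
    const cp = ch.codePointAt(0);
    const mapped = UPPER.get(cp) ?? [cp]; // unmapped characters map to themselves
    result += String.fromCodePoint(...mapped);
  }
  return result;
}

console.log(toUppercase('a\xDF\uFB01!')); // "ASSFI!"
```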

Here’s an example from SpecialCasing.txt, with my annotation added below:

00DF  ; 00DF   ; 0053 0073; 0053 0053;                      # LATIN SMALL LETTER SHARP S
<code>; <lower>; <title>  ; <upper>  ; (<condition_list>;)? # <comment>

This line says that U+00DF ('ß') lowercases to U+00DF ('ß') and uppercases to U+0053 U+0053 ('SS').
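Following that annotation, one data line of SpecialCasing.txt can be split into its fields. A minimal sketch, assuming the input is a single non-comment data line; `parseSpecialCasingLine` and the shape of the object it returns are hypothetical.

```javascript
// Parse one data line of SpecialCasing.txt:
// <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
function parseSpecialCasingLine(line) {
  // Drop the trailing "# comment", then split the fields on ';'.
  const data = line.split('#')[0];
  const fields = data.split(';').map((field) => field.trim());
  // Each mapping is a space-separated list of hex code points;
  // an empty mapping means the character maps to nothing.
  const hexListToString = (hexList) =>
    hexList === ''
      ? ''
      : String.fromCodePoint(...hexList.split(/\s+/).map((hex) => parseInt(hex, 16)));
  return {
    code: parseInt(fields[0], 16),
    lower: hexListToString(fields[1]),
    title: hexListToString(fields[2]),
    upper: hexListToString(fields[3]),
    // Conditional entries (e.g. Final_Sigma, tr) carry a fifth field.
    conditions: fields[4] ? fields[4].split(/\s+/) : [],
  };
}

const entry = parseSpecialCasingLine(
  '00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S'
);
console.log(entry.upper); // "SS"
```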

Here’s an example from UnicodeData.txt, with my annotation added below:

0041  ; LATIN CAPITAL LETTER A; Lu;0;L;;;;;N;;;        ; 0061   ;
<code>; <name>                ; <ignore>      ; <upper>; <lower>; <title>

This line says that U+0041 ('A') lowercases to U+0061 ('a'). It doesn’t have an explicit uppercase mapping, meaning it uppercases to itself.

Here’s another example from UnicodeData.txt:

0061  ; LATIN SMALL LETTER A; Ll;0;L;;;;;N;;; 0041   ;        ; 0041
<code>; <name>              ; <ignore>      ; <upper>; <lower>; <title>

This line says that U+0061 ('a') uppercases to U+0041 ('A'). It doesn’t have an explicit lowercase mapping, meaning it lowercases to itself.
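The same idea works for UnicodeData.txt, whose simple uppercase, lowercase, and titlecase mappings are fields 12, 13, and 14 (counting from 0). A minimal sketch; `parseUnicodeDataLine` is a hypothetical helper that returns numeric code points.

```javascript
// Parse one line of UnicodeData.txt (15 semicolon-separated fields).
// Field 12 = simple uppercase, 13 = simple lowercase, 14 = simple titlecase.
function parseUnicodeDataLine(line) {
  const fields = line.split(';');
  const code = parseInt(fields[0], 16);
  // An empty upper/lower field means the character maps to itself.
  const mapping = (hex) => (hex === '' ? code : parseInt(hex, 16));
  const upper = mapping(fields[12]);
  return {
    code,
    name: fields[1],
    upper,
    lower: mapping(fields[13]),
    // Per UAX #44, an empty titlecase field defaults to the uppercase mapping.
    title: fields[14] === '' ? upper : parseInt(fields[14], 16),
  };
}

const a = parseUnicodeDataLine('0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041');
console.log(a.upper.toString(16), a.lower.toString(16)); // "41 61"
```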

You could write a script that parses these two files, reads each line following these examples, and builds lowercase/uppercase mappings. You could then turn those mappings into a small JavaScript library that provides spec-compliant toLowerCase/toUpperCase functionality.
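Combining the two files follows the rule quoted earlier: start from the simple UnicodeData.txt mappings and let the SpecialCasing.txt entries override any conflicting ones. A minimal sketch, assuming both files were already parsed into records of the hypothetical shape `{ code, lower, upper, conditions }` with mappings as strings; `buildFullMappings` and `applyMapping` are hypothetical too. Conditional entries (Final_Sigma, the language-specific tr/lt/az rules) are skipped because they depend on context or locale, not on the character alone.

```javascript
// Build full lowercase/uppercase mappings from pre-parsed records.
// unicodeData:   [{ code, lower, upper }] with simple mappings as strings
// specialCasing: [{ code, lower, upper, conditions }] with full mappings
function buildFullMappings(unicodeData, specialCasing) {
  const lower = new Map();
  const upper = new Map();
  // 1. Simple one-to-one mappings from UnicodeData.txt.
  for (const { code, lower: lo, upper: up } of unicodeData) {
    lower.set(code, lo);
    upper.set(code, up);
  }
  // 2. Unconditional SpecialCasing.txt entries override conflicting ones.
  for (const { code, lower: lo, upper: up, conditions } of specialCasing) {
    if (conditions.length > 0) continue; // context- or locale-dependent
    lower.set(code, lo);
    upper.set(code, up);
  }
  return { lower, upper };
}

// Apply a mapping code point by code point; unmapped characters map to themselves.
function applyMapping(map, string) {
  let result = '';
  for (const ch of string) {
    const mapped = map.get(ch.codePointAt(0));
    result += mapped === undefined ? ch : mapped;
  }
  return result;
}

const { upper } = buildFullMappings(
  [{ code: 0xdf, lower: '\xDF', upper: '\xDF' }],             // no simple uppercase for ß
  [{ code: 0xdf, lower: '\xDF', upper: 'SS', conditions: [] }] // full mapping wins
);
console.log(applyMapping(upper, '\xDF')); // "SS"
```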

This seems like a lot of work. Depending on the old behavior in Firefox and what exactly changed (?), you could probably limit the work to just the special mappings in SpecialCasing.txt. (I’m assuming that only the special casings changed in Firefox 55, based on the example you provided.)

// Instead of…
function normalize(string) {
  const normalized = string.normalize('NFKC');
  const lowercased = normalized.toLowerCase();
  return lowercased;
}

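// …one could do something like:
// Lowercase mappings from SpecialCasing.txt. Only two entries are shown here;
// the generated library below contains the complete list.
const specialLowercase = new Map([
  ['\u0130', 'i\u0307'], // LATIN CAPITAL LETTER I WITH DOT ABOVE
  ['\u1FBC', '\u1FB3'],  // GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI
]);
const reSpecialLowercase = /[\u0130\u1FBC]/g;
function lowerCaseSpecialCases(string) {
  // Replace every SpecialCasing.txt character with its lowercase mapping.
  return string.replace(reSpecialLowercase, (match) => specialLowercase.get(match));
}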
function normalize(string) {
  const normalized = string.normalize('NFKC');
  const fixed = lowerCaseSpecialCases(normalized); // Workaround for old Firefox 54 behavior.
  const lowercased = fixed.toLowerCase();
  return lowercased;
}

I wrote a script that parses SpecialCasing.txt and generates a JS library that implements the lowerCaseSpecialCases functionality mentioned above (as toLower) as well as toUpper. Here it is: https://gist.github.com/mathiasbynens/a37e3f3138069729aa434ea90eea4a3c Depending on your exact use case, you might not need the toUpper and its corresponding regex and map at all. Here’s the full generated library:

const reToLower = /[\u0130\u1F88-\u1F8F\u1F98-\u1F9F\u1FA8-\u1FAF\u1FBC\u1FCC\u1FFC]/g;
const toLowerMap = new Map([
  ['\u0130', 'i\u0307'],
  ['\u1F88', '\u1F80'],
  ['\u1F89', '\u1F81'],
  ['\u1F8A', '\u1F82'],
  ['\u1F8B', '\u1F83'],
  ['\u1F8C', '\u1F84'],
  ['\u1F8D', '\u1F85'],
  ['\u1F8E', '\u1F86'],
  ['\u1F8F', '\u1F87'],
  ['\u1F98', '\u1F90'],
  ['\u1F99', '\u1F91'],
  ['\u1F9A', '\u1F92'],
  ['\u1F9B', '\u1F93'],
  ['\u1F9C', '\u1F94'],
  ['\u1F9D', '\u1F95'],
  ['\u1F9E', '\u1F96'],
  ['\u1F9F', '\u1F97'],
  ['\u1FA8', '\u1FA0'],
  ['\u1FA9', '\u1FA1'],
  ['\u1FAA', '\u1FA2'],
  ['\u1FAB', '\u1FA3'],
  ['\u1FAC', '\u1FA4'],
  ['\u1FAD', '\u1FA5'],
  ['\u1FAE', '\u1FA6'],
  ['\u1FAF', '\u1FA7'],
  ['\u1FBC', '\u1FB3'],
  ['\u1FCC', '\u1FC3'],
  ['\u1FFC', '\u1FF3']
]);
const toLower = (string) => string.replace(reToLower, (match) => toLowerMap.get(match));

const reToUpper = /[\xDF\u0149\u01F0\u0390\u03B0\u0587\u1E96-\u1E9A\u1F50\u1F52\u1F54\u1F56\u1F80-\u1FAF\u1FB2-\u1FB4\u1FB6\u1FB7\u1FBC\u1FC2-\u1FC4\u1FC6\u1FC7\u1FCC\u1FD2\u1FD3\u1FD6\u1FD7\u1FE2-\u1FE4\u1FE6\u1FE7\u1FF2-\u1FF4\u1FF6\u1FF7\u1FFC\uFB00-\uFB06\uFB13-\uFB17]/g;
const toUpperMap = new Map([
  ['\xDF', 'SS'],
  ['\uFB00', 'FF'],
  ['\uFB01', 'FI'],
  ['\uFB02', 'FL'],
  ['\uFB03', 'FFI'],
  ['\uFB04', 'FFL'],
  ['\uFB05', 'ST'],
  ['\uFB06', 'ST'],
  ['\u0587', '\u0535\u0552'],
  ['\uFB13', '\u0544\u0546'],
  ['\uFB14', '\u0544\u0535'],
  ['\uFB15', '\u0544\u053B'],
  ['\uFB16', '\u054E\u0546'],
  ['\uFB17', '\u0544\u053D'],
  ['\u0149', '\u02BCN'],
  ['\u0390', '\u0399\u0308\u0301'],
  ['\u03B0', '\u03A5\u0308\u0301'],
  ['\u01F0', 'J\u030C'],
  ['\u1E96', 'H\u0331'],
  ['\u1E97', 'T\u0308'],
  ['\u1E98', 'W\u030A'],
  ['\u1E99', 'Y\u030A'],
  ['\u1E9A', 'A\u02BE'],
  ['\u1F50', '\u03A5\u0313'],
  ['\u1F52', '\u03A5\u0313\u0300'],
  ['\u1F54', '\u03A5\u0313\u0301'],
  ['\u1F56', '\u03A5\u0313\u0342'],
  ['\u1FB6', '\u0391\u0342'],
  ['\u1FC6', '\u0397\u0342'],
  ['\u1FD2', '\u0399\u0308\u0300'],
  ['\u1FD3', '\u0399\u0308\u0301'],
  ['\u1FD6', '\u0399\u0342'],
  ['\u1FD7', '\u0399\u0308\u0342'],
  ['\u1FE2', '\u03A5\u0308\u0300'],
  ['\u1FE3', '\u03A5\u0308\u0301'],
  ['\u1FE4', '\u03A1\u0313'],
  ['\u1FE6', '\u03A5\u0342'],
  ['\u1FE7', '\u03A5\u0308\u0342'],
  ['\u1FF6', '\u03A9\u0342'],
  ['\u1F80', '\u1F08\u0399'],
  ['\u1F81', '\u1F09\u0399'],
  ['\u1F82', '\u1F0A\u0399'],
  ['\u1F83', '\u1F0B\u0399'],
  ['\u1F84', '\u1F0C\u0399'],
  ['\u1F85', '\u1F0D\u0399'],
  ['\u1F86', '\u1F0E\u0399'],
  ['\u1F87', '\u1F0F\u0399'],
  ['\u1F88', '\u1F08\u0399'],
  ['\u1F89', '\u1F09\u0399'],
  ['\u1F8A', '\u1F0A\u0399'],
  ['\u1F8B', '\u1F0B\u0399'],
  ['\u1F8C', '\u1F0C\u0399'],
  ['\u1F8D', '\u1F0D\u0399'],
  ['\u1F8E', '\u1F0E\u0399'],
  ['\u1F8F', '\u1F0F\u0399'],
  ['\u1F90', '\u1F28\u0399'],
  ['\u1F91', '\u1F29\u0399'],
  ['\u1F92', '\u1F2A\u0399'],
  ['\u1F93', '\u1F2B\u0399'],
  ['\u1F94', '\u1F2C\u0399'],
  ['\u1F95', '\u1F2D\u0399'],
  ['\u1F96', '\u1F2E\u0399'],
  ['\u1F97', '\u1F2F\u0399'],
  ['\u1F98', '\u1F28\u0399'],
  ['\u1F99', '\u1F29\u0399'],
  ['\u1F9A', '\u1F2A\u0399'],
  ['\u1F9B', '\u1F2B\u0399'],
  ['\u1F9C', '\u1F2C\u0399'],
  ['\u1F9D', '\u1F2D\u0399'],
  ['\u1F9E', '\u1F2E\u0399'],
  ['\u1F9F', '\u1F2F\u0399'],
  ['\u1FA0', '\u1F68\u0399'],
  ['\u1FA1', '\u1F69\u0399'],
  ['\u1FA2', '\u1F6A\u0399'],
  ['\u1FA3', '\u1F6B\u0399'],
  ['\u1FA4', '\u1F6C\u0399'],
  ['\u1FA5', '\u1F6D\u0399'],
  ['\u1FA6', '\u1F6E\u0399'],
  ['\u1FA7', '\u1F6F\u0399'],
  ['\u1FA8', '\u1F68\u0399'],
  ['\u1FA9', '\u1F69\u0399'],
  ['\u1FAA', '\u1F6A\u0399'],
  ['\u1FAB', '\u1F6B\u0399'],
  ['\u1FAC', '\u1F6C\u0399'],
  ['\u1FAD', '\u1F6D\u0399'],
  ['\u1FAE', '\u1F6E\u0399'],
  ['\u1FAF', '\u1F6F\u0399'],
  ['\u1FB3', '\u0391\u0399'],
  ['\u1FBC', '\u0391\u0399'],
  ['\u1FC3', '\u0397\u0399'],
  ['\u1FCC', '\u0397\u0399'],
  ['\u1FF3', '\u03A9\u0399'],
  ['\u1FFC', '\u03A9\u0399'],
  ['\u1FB2', '\u1FBA\u0399'],
  ['\u1FB4', '\u0386\u0399'],
  ['\u1FC2', '\u1FCA\u0399'],
  ['\u1FC4', '\u0389\u0399'],
  ['\u1FF2', '\u1FFA\u0399'],
  ['\u1FF4', '\u038F\u0399'],
  ['\u1FB7', '\u0391\u0342\u0399'],
  ['\u1FC7', '\u0397\u0342\u0399'],
  ['\u1FF7', '\u03A9\u0342\u0399']
]);
const toUpper = (string) => string.replace(reToUpper, (match) => toUpperMap.get(match));
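The generated regular expressions list single characters and collapse longer runs of consecutive code points into ranges (for example U+1F88 through U+1F8F). Below is a minimal sketch of how such a character class could be built; `toCharClass` is a hypothetical helper, not code from the linked gist. It assumes BMP code points only (astral ones would need the `u` flag and `\u{…}` escapes), writes `\u00DF` where the library writes `\xDF`, and, like the library's output, keeps runs of two as separate escapes.

```javascript
// Build a regex character class such as "[\u0130\u1F88-\u1F8A]" from BMP
// code points. Runs of three or more consecutive code points become ranges.
function toCharClass(codePoints) {
  const sorted = [...new Set(codePoints)].sort((a, b) => a - b);
  const escape = (cp) => '\\u' + cp.toString(16).toUpperCase().padStart(4, '0');
  const parts = [];
  for (let i = 0; i < sorted.length; i++) {
    const start = sorted[i];
    // Extend the run while the next code point is consecutive.
    while (i + 1 < sorted.length && sorted[i + 1] === sorted[i] + 1) i++;
    const end = sorted[i];
    if (end === start) parts.push(escape(start));
    else if (end === start + 1) parts.push(escape(start) + escape(end));
    else parts.push(escape(start) + '-' + escape(end));
  }
  return '[' + parts.join('') + ']';
}

const source = toCharClass([0x1f89, 0x0130, 0x1f88, 0x1f8a, 0x1fbc]);
console.log(source); // [\u0130\u1F88-\u1F8A\u1FBC]
const re = new RegExp(source, 'g');
console.log('\u0130\u1F89'.replace(re, '*')); // "**"
```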
