Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

c# - Unexpected behavior when sorting strings with letters and dashes

If I have a list of strings that contain only digits and dashes, they sort in ascending order like so:

s = s.OrderBy(t => t).ToList();

66-0616280-000
66-0616280-100
66-06162801000
66-06162801040

This is as expected.

However, if the strings contain letters, the sort order is somewhat unexpected. For example, here is the same list of strings with a trailing A replacing the final 0 of each one, and yes, it is sorted:

66-0616280-00A
66-0616280100A
66-0616280104A
66-0616280-10A

I would have expected them to sort like so:

66-0616280-00A
66-0616280-10A
66-0616280100A
66-0616280104A

Why does the sort behave differently on the string when it contains letters vs. when it contains only numbers?

Thanks in advance.

1 Reply

It's because the default comparer for strings is culture-sensitive. As far as I can tell, Comparer<string>.Default delegates to string.CompareTo(string), which uses the current culture:

This method performs a word (case-sensitive and culture-sensitive) comparison using the current culture. For more information about word, string, and ordinal sorts, see System.Globalization.CompareOptions.

Then the page for CompareOptions includes:

The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases. Therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.

("Small weight" isn't quite the same as "ignored" as quoted in Andrei's answer, but the effects are similar here.)

If you specify StringComparer.Ordinal, you get the order you expected:

66-0616280-00A
66-0616280-10A
66-0616280100A
66-0616280104A

Specify it as the second argument to OrderBy:

s = s.OrderBy(t => t, StringComparer.Ordinal).ToList();

You can see the difference here:

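// Culture-sensitive word sort (the default comparer for strings)
Console.WriteLine(Comparer<string>.Default.Compare("66-0616280104A", "66-0616280-10A"));
// Ordinal sort (compares the Unicode values of the characters)
Console.WriteLine(StringComparer.Ordinal.Compare("66-0616280104A", "66-0616280-10A"));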
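
To compare the three sort modes from the CompareOptions quote side by side, something like the following sketch may help. The class and method names (SortDemo, SortWord, SortString, SortOrdinal) are made up for illustration. The word-sort and string-sort results depend on the current culture and on the runtime's globalization implementation, so only the ordinal result is shown as a comment.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper class for illustration; not part of any library.
public static class SortDemo
{
    // Word sort: the default culture-sensitive comparison, where the hyphen
    // may carry a very small weight.
    public static List<string> SortWord(IEnumerable<string> items) =>
        items.OrderBy(t => t, Comparer<string>.Default).ToList();

    // String sort: still culture-sensitive, but requested via
    // CompareOptions.StringSort so symbols are not given special treatment.
    public static List<string> SortString(IEnumerable<string> items)
    {
        CompareInfo compareInfo = CultureInfo.CurrentCulture.CompareInfo;
        var comparer = Comparer<string>.Create(
            (a, b) => compareInfo.Compare(a, b, CompareOptions.StringSort));
        return items.OrderBy(t => t, comparer).ToList();
    }

    // Ordinal sort: compares the Unicode values of the characters,
    // independent of culture.
    public static List<string> SortOrdinal(IEnumerable<string> items) =>
        items.OrderBy(t => t, StringComparer.Ordinal).ToList();

    public static void Main()
    {
        var items = new[]
        {
            "66-0616280104A",
            "66-0616280-10A",
            "66-0616280100A",
            "66-0616280-00A",
        };

        Console.WriteLine("Word:    " + string.Join(", ", SortWord(items)));
        Console.WriteLine("String:  " + string.Join(", ", SortString(items)));
        // Ordinal: 66-0616280-00A, 66-0616280-10A, 66-0616280100A, 66-0616280104A
        Console.WriteLine("Ordinal: " + string.Join(", ", SortOrdinal(items)));
    }
}
```

If the strings are identifiers such as part numbers rather than text meant for display, passing StringComparer.Ordinal explicitly keeps the order the same regardless of the machine's culture settings.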
