Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c# - Why does Char.IsDigit return true for chars which can't be parsed to int?

I often use Char.IsDigit to check whether a char is a digit. It is especially handy in LINQ queries as a pre-check before int.Parse, for example: "123".All(Char.IsDigit).

But some chars count as digits for Char.IsDigit and still can't be parsed to int, such as ۵ (U+06F5).

// true
bool isDigit = Char.IsDigit('۵');

var cultures = CultureInfo.GetCultures(CultureTypes.SpecificCultures);
int num;
// false
bool isIntForAnyCulture = cultures
    .Any(c => int.TryParse('۵'.ToString(), NumberStyles.Any, c, out num));

Why is that? Is my int.Parse-precheck via Char.IsDigit thus incorrect?

There are 310 chars which are digits:

List<char> digitList = Enumerable.Range(0, UInt16.MaxValue)
   .Select(i => Convert.ToChar(i))
   .Where(c => Char.IsDigit(c))
   .ToList(); 

Here's the implementation of Char.IsDigit in .NET 4 (ILSpy):

public static bool IsDigit(char c)
{
    if (char.IsLatin1(c))
    {
        return c >= '0' && c <= '9';
    }
    return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber;
}

So why are there chars in the DecimalDigitNumber category ("Decimal digit character, that is, a character in the range 0 through 9...") that can't be parsed to an int in any culture?


1 Reply


It's because Char.IsDigit accepts any character in the Unicode "Number, Decimal Digit" (Nd) category, as listed here:

http://www.fileformat.info/info/unicode/category/Nd/list.htm

Membership in that category doesn't mean the character is a valid numeric character in the current locale. In fact, int.Parse() can only parse the ASCII digits 0 through 9, regardless of the locale setting.

For example, this doesn't work (it throws a FormatException):

int test = int.Parse("٣", CultureInfo.GetCultureInfo("ar"));

Even though ٣ (U+0663) is a valid Arabic digit character, and "ar" is the Arabic locale identifier.

The Microsoft article "How to: Parse Unicode Digits" states that:

The only Unicode digits that the .NET Framework parses as decimals are the ASCII digits 0 through 9, specified by the code values U+0030 through U+0039. The .NET Framework parses all other Unicode digits as characters.
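
Given that, a pre-check for int.Parse can be limited to the ASCII range instead of using Char.IsDigit. The following is a minimal sketch; the class name AsciiDigitCheck and both method names are made up for illustration and are not part of the .NET Framework.

```csharp
using System.Linq;

public static class AsciiDigitCheck
{
    // True only for U+0030..U+0039, the digits int.Parse accepts.
    public static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    // Hypothetical helper: true when s is non-empty and made up
    // exclusively of ASCII digits.
    public static bool IsAsciiDigitString(string s)
    {
        return !string.IsNullOrEmpty(s) && s.All(IsAsciiDigit);
    }
}
```

Note that a string passing this check can still overflow an int (for example "99999999999"), so int.TryParse remains the safer call for the actual conversion.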

However, note that you can use char.GetNumericValue() to convert a Unicode numeric character to its numeric equivalent as a double.

The return value is a double rather than an int because of characters like this:

Console.WriteLine(char.GetNumericValue('¼')); // Prints 0.25

You could use something like this to convert all numeric characters in a string into their ASCII equivalent:

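// Requires: using System.Text;
// Replaces each Unicode decimal digit with its ASCII equivalent;
// every other character is copied through unchanged.
public string ConvertNumericChars(string input)
{
    StringBuilder output = new StringBuilder();

    foreach (char ch in input)
    {
        if (char.IsDigit(ch))
        {
            double value = char.GetNumericValue(ch);

            // Defensive check: only whole values 0..9 map to an ASCII digit.
            if ((value >= 0) && (value <= 9) && (value == (int)value))
            {
                output.Append((char)('0' + (int)value));
                continue;
            }
        }

        output.Append(ch);
    }

    return output.ToString();
}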
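
If the goal is an int rather than a normalized string, the same idea can be sketched with CharUnicodeInfo.GetDecimalDigitValue, which returns the digit as an int, or -1 when the character is not a decimal digit. The class and method names below are hypothetical, and the sketch assumes non-negative input with no sign, whitespace, or group separators.

```csharp
using System.Globalization;

public static class UnicodeDigitParser
{
    // Hypothetical helper: parses a string made up only of Unicode
    // decimal digits (any script) into a non-negative int.
    public static bool TryParseUnicodeDigits(string input, out int result)
    {
        result = 0;
        if (string.IsNullOrEmpty(input))
        {
            return false;
        }

        long total = 0; // long, so the overflow check below cannot itself overflow
        foreach (char ch in input)
        {
            int digit = CharUnicodeInfo.GetDecimalDigitValue(ch);
            if (digit < 0)
            {
                return false; // not a decimal digit
            }

            total = total * 10 + digit;
            if (total > int.MaxValue)
            {
                return false; // value does not fit in an int
            }
        }

        result = (int)total;
        return true;
    }
}
```

This accepts digits from mixed scripts in one string, which may or may not be desirable. Like the char-based loop above, it does not handle digits outside the Basic Multilingual Plane, since those are encoded as surrogate pairs.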
