Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c# - someString.IndexOf(someString) returns 1 instead of 0 under .NET 4

We have recently upgraded all our projects from .NET 3.5 to .NET 4. I have come across a rather strange issue with respect to string.IndexOf().

My code obviously does something slightly different, but in the process of investigating the issue, I found that calling IndexOf() on a string with itself returned 1 instead of 0. In other words:

string text = "\xAD\x2D";          // problem happens with "--dely N.China", too;
int index = text.IndexOf(text);    // see update note below.

Gave me an index of 1, instead of 0. A couple of things to note about this problem:

  • The problem seems related to these hyphens (the first character is the Unicode soft hyphen, the second is a regular hyphen).

  • I have double-checked: this does not happen in .NET 3.5 but does in .NET 4.

  • Changing the IndexOf() to do an ordinal compare fixes the issue, so for some reason that first character is ignored with the default IndexOf.

Does anyone know why this happens?

EDIT

Sorry guys, I made a bit of a mess of the original post and got the hidden dash in there twice. I have updated the string; it should now return an index of 1 instead of 2, as long as you paste it in the correct editor.

Update:

Changed the original problem string to one where every actual character is clearly visible (using escaping). This simplifies the question a bit.


1 Reply


Your string consists of two characters: a soft hyphen (Unicode code point 173) and a hyphen (Unicode code point 45).

Wiki: According to the Unicode standard, a soft hyphen is not displayed if the line is not broken at that point.
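To see that both characters really are in the string even though only one is displayed, you can dump the code values. This is just a small sketch; the class name SoftHyphenInspect and the helper CodeUnits are made up for illustration and are not part of the framework.

```csharp
using System;

static class SoftHyphenInspect
{
    // Hypothetical helper: returns the UTF-16 code unit value of each character.
    public static int[] CodeUnits(string s)
    {
        var result = new int[s.Length];
        for (int i = 0; i < s.Length; i++)
            result[i] = s[i];   // implicit char -> int conversion
        return result;
    }

    static void Main()
    {
        string text = "\xAD\x2D";   // soft hyphen followed by a regular hyphen

        Console.WriteLine(text.Length);                         // 2
        Console.WriteLine(string.Join(", ", CodeUnits(text)));  // 173, 45
    }
}
```

The length is 2 regardless of how the string is rendered, so the difference in IndexOf results comes from the comparison rules, not from the contents of the string.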

When using "\xAD\x2D".IndexOf("\xAD\x2D") in .NET 4, it seems to ignore that you're looking for the soft hyphen, returning a starting index of 1 (the index of \x2D). In .NET 3.5, this returns 0.

More fun, if you run this code (so when only looking for the soft hyphen):

string text = "\xAD\x2D";
string shy = "\xAD";
int i1 = text.IndexOf(shy);

then i1 becomes 0, regardless of the .NET version used. The result of text.IndexOf(text); varies indeed, which at a glance looks like a bug to me.

As far as I can track back through the framework, older .NET versions use an InternalCall to IndexOfString() (I can't figure out to which API call that goes), while from .NET 4 a QCall to InternalFindNLSStringEx() is made, which in turn calls FindNLSStringEx().

The issue (I really can't figure out if this is intended behaviour) indeed occurs when calling FindNLSStringEx:

LPCWSTR lpStringSource = L"\xAD\x2D";
LPCWSTR lpStringValue = L"\xAD";

int length;

int i = FindNLSStringEx(
    LOCALE_NAME_SYSTEM_DEFAULT,
    FIND_FROMSTART,
    lpStringSource,
    -1,
    lpStringValue,
    -1,
    &length,
    NULL,
    NULL,
    1);

Console::WriteLine(i);

i = FindNLSStringEx(
    LOCALE_NAME_SYSTEM_DEFAULT,
    FIND_FROMSTART,
    lpStringSource,
    -1,
    lpStringSource,
    -1,
    &length,
    NULL,
    NULL,
    1);

Console::WriteLine(i);

Console::ReadLine();

Prints 0 and then 1. Note that length, an out parameter indicating the length of the found string, is 0 after the first call and 1 after the second; the soft hyphen is counted as having a length of 0.

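The workaround is to use an ordinal comparison, text.IndexOf(text, StringComparison.OrdinalIgnoreCase), as you've noted. This makes a QCall to InternalCompareStringOrdinalIgnoreCase(), which in turn calls FindStringOrdinal(), which returns 0 for both cases. If the search should stay case-sensitive, StringComparison.Ordinal ignores culture rules in the same way.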
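A minimal sketch of the ordinal workaround, for anyone who wants to try it. It uses StringComparison.Ordinal (the case-sensitive variant) rather than OrdinalIgnoreCase. The class name OrdinalIndexOfDemo and the wrapper OrdinalIndexOf are made up for this example. The last line prints the culture-sensitive result for comparison without stating an expected value, since that depends on the runtime version.

```csharp
using System;

static class OrdinalIndexOfDemo
{
    // Hypothetical wrapper: searches by raw UTF-16 code unit values,
    // so no character (including the soft hyphen) is treated as ignorable.
    public static int OrdinalIndexOf(string source, string value)
    {
        return source.IndexOf(value, StringComparison.Ordinal);
    }

    static void Main()
    {
        string text = "\xAD\x2D";   // soft hyphen followed by a regular hyphen

        Console.WriteLine(OrdinalIndexOf(text, text));    // 0
        Console.WriteLine(OrdinalIndexOf(text, "\xAD"));  // 0
        Console.WriteLine(OrdinalIndexOf(text, "\x2D"));  // 1

        // Culture-sensitive search for comparison; the result varies by
        // runtime, as discussed above, so no expected value is given here.
        Console.WriteLine(text.IndexOf(text, StringComparison.CurrentCulture));
    }
}
```

If the strings are identifiers, paths, or other non-linguistic data, ordinal comparison is usually what you want anyway.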

