Your string consists of two characters: a soft hyphen (Unicode code point 173) and a hyphen (Unicode code point 45).
Wiki: According to the Unicode standard, a soft hyphen is not displayed if the line is not broken at that point.
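Since the soft hyphen is normally invisible, it can help to dump the code units of the string before debugging any further. The following is only a minimal sketch; the class and the Describe helper are hypothetical names made up for illustration, not framework APIs.

```csharp
using System;
using System.Linq;

static class CodePointDump
{
    // Hypothetical helper: formats each UTF-16 code unit of s as U+XXXX.
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")).ToArray());
    }

    static void Main()
    {
        // Soft hyphen (173 = 0xAD) followed by hyphen-minus (45 = 0x2D).
        Console.WriteLine(Describe("\u00AD\u002D")); // prints "U+00AD U+002D"
    }
}
```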
When using "\xAD\x2D".IndexOf("\xAD\x2D") in .NET 4, it seems to ignore that you're looking for the soft hyphen, returning a starting index of 1 (the index of \x2D). In .NET 3.5, this returns 0.
More fun, if you run this code (so when only looking for the soft hyphen):
string text = "\xAD\x2D";
string shy = "\xAD";
int i1 = text.IndexOf(shy);
then i1 becomes 0, regardless of the .NET version used. The result of text.IndexOf(text) does vary between versions, which at a glance looks like a bug to me.
As far as I can trace through the framework, older .NET versions use an InternalCall to IndexOfString() (I can't figure out which API call that maps to), while from .NET 4 onward a QCall to InternalFindNLSStringEx() is made, which in turn calls FindNLSStringEx().
The issue (I really can't figure out whether this is intended behaviour) indeed occurs when calling FindNLSStringEx:
LPCWSTR lpStringSource = L"\xAD\x2D";
LPCWSTR lpStringValue = L"\xAD";
int length;
int i = FindNLSStringEx(
LOCALE_NAME_SYSTEM_DEFAULT,
FIND_FROMSTART,
lpStringSource,
-1,
lpStringValue,
-1,
&length,
NULL,
NULL,
1);
Console::WriteLine(i);
i = FindNLSStringEx(
LOCALE_NAME_SYSTEM_DEFAULT,
FIND_FROMSTART,
lpStringSource,
-1,
lpStringSource,
-1,
&length,
NULL,
NULL,
1);
Console::WriteLine(i);
Console::ReadLine();
This prints 0 and then 1. Note that length, an out parameter indicating the length of the found string, is 0 after the first call and 1 after the second; the soft hyphen is counted as having a length of 0.
The workaround is to use text.IndexOf(text, StringComparison.OrdinalIgnoreCase), as you've noted. This makes a QCall to InternalCompareStringOrdinalIgnoreCase(), which in turn calls FindStringOrdinal(), which returns 0 for both cases.
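As a rough sketch of the workaround, the snippet below puts the culture-sensitive and ordinal searches side by side. The class and the OrdinalIndexOf wrapper are hypothetical names used only for illustration. StringComparison.Ordinal is shown alongside OrdinalIgnoreCase on the assumption that case-insensitivity is not needed here; both compare code units rather than applying linguistic rules.

```csharp
using System;

static class SoftHyphenSearch
{
    // Hypothetical wrapper: ordinal search compares raw UTF-16 code units,
    // so U+00AD is treated like any other character instead of being skipped.
    public static int OrdinalIndexOf(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal);
    }

    static void Main()
    {
        string text = "\u00AD\u002D"; // soft hyphen followed by hyphen-minus
        string shy = "\u00AD";

        // Culture-sensitive overload: the result depends on the runtime version,
        // so no expected value is given here.
        Console.WriteLine(text.IndexOf(text));

        Console.WriteLine(OrdinalIndexOf(text, text)); // 0
        Console.WriteLine(OrdinalIndexOf(text, shy));  // 0
        Console.WriteLine(text.IndexOf(text, StringComparison.OrdinalIgnoreCase)); // 0
    }
}
```

If the search really should be linguistic (for example, to match accented variants), ordinal comparison is not a drop-in replacement, so it is worth checking which behaviour the calling code actually needs.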