It looks like the glob() function depends on how your copy of PHP was built and whether it was compiled against the Unicode-aware Win32 API (I don't believe the standard build is).
Philippe Verdy 2010-09-26 8:53 am
The output from your PHP installation on Windows is easy to explain:
you installed the wrong build of PHP, one that was not compiled to use
the Unicode version of the Win32 API. As a result, the filesystem calls
made by PHP go through the legacy "ANSI" API, so the C/C++ libraries
linked into this build of PHP first try to convert your UTF-8-encoded
PHP string into the local "ANSI" codepage selected in the running
environment (see the CHCP command before starting PHP from a command
line window).
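To make the failure concrete, here is a small sketch in Python (used only as a neutral illustration, not PHP's actual code path). It assumes cp1252 as the "ANSI" codepage of a Western-European Windows install; characters the codepage cannot represent are replaced with "?", roughly what the ANSI conversion does with unmappable characters.

```python
# Sketch only: cp1252 is an ASSUMED stand-in for the local "ANSI" codepage.
ANSI_CODEPAGE = "cp1252"


def utf8_to_ansi(utf8_name: bytes, codepage: str = ANSI_CODEPAGE) -> bytes:
    """Convert a UTF-8 filename to a legacy codepage, as an ANSI build must."""
    text = utf8_name.decode("utf-8")
    # Characters with no mapping in the codepage become b"?" (information is lost).
    return text.encode(codepage, errors="replace")


if __name__ == "__main__":
    php_string = "日本語-résumé.txt".encode("utf-8")  # what the PHP string holds
    print(utf8_to_ansi(php_string))  # b'???-r\xe9sum\xe9.txt'
```

The accented letters survive because cp1252 happens to contain them; the Japanese characters do not, so the filename PHP asks for no longer matches the one on disk.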
Your version of Windows is MOST PROBABLY NOT responsible for this weird
behavior. It is YOUR build of PHP that is not compiled correctly and
that uses the legacy ANSI version of the Win32 API (kept for
compatibility with the legacy 16-bit versions of Windows 95/98, whose
kernel filesystem support had no direct support for Unicode but used
an internal conversion layer to convert Unicode to the local ANSI
codepage before calling the actual ANSI version of the API).
Recompile PHP using the compiler option that selects the UNICODE
version of the Win32 API (which should be the default today, and in
any case always the default for PHP installed on a server, which will
NEVER run Windows 95 or Windows 98...).
Then Windows will be able to store UTF-16-encoded filenames (even on
FAT32 volumes, although on those volumes it will also generate an
aliased 8.3 short name using the filesystem's default codepage,
something that can be avoided on NTFS volumes).
Everything you describe is a problem with PHP (incorrect porting to
Windows, or incorrect detection of the system version at runtime):
reread the README files that come with the PHP sources, which explain
the compilation flags. I really think the Windows makefile should be
able to configure itself and autodetect whether it really needs to use
ONLY the ANSI version of the API. If you are compiling it for a server,
make sure that the configure script actually detects full support for
the UNICODE version of the Win32 API and uses it when compiling PHP
and when selecting the runtime libraries to link.
I use PHP on Windows, correctly compiled, and I have absolutely NEVER
encountered the problems you cite in your article.
Let's forget forever these non-UNICODE versions of the Win32 API
(which inconsistently use the local ANSI codepage for the Windows
graphical UI, and the OEM codepage for the filesystem APIs, the
DOS/BIOS-compatible APIs and the Console APIs): these non-Unicode
versions of the APIs are even MUCH slower and more costly than the
Unicode versions, because they actually translate from the codepage
to Unicode before calling the core Unicode APIs (the situation on
Windows NT-based kernels is exactly the reverse of the situation on
versions of Windows built on a virtual DOS extender, such as Windows
95/98/ME).
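The ANSI/OEM inconsistency can be shown with a similar sketch. Here cp1252 is assumed as the ANSI codepage and cp850 as the OEM codepage (both common Western-European defaults, but your system may use others); the function name read_back is purely illustrative. The same bytes decode to different text depending on which codepage a given API assumes.

```python
# Sketch only: cp1252 (ANSI) and cp850 (OEM) are ASSUMED example codepages.
def read_back(raw: bytes) -> dict:
    """Decode the same filename bytes the way an ANSI API and an OEM API would."""
    return {"ansi": raw.decode("cp1252"), "oem": raw.decode("cp850")}


if __name__ == "__main__":
    raw = "café.txt".encode("cp1252")  # bytes produced by an ANSI-codepage caller
    views = read_back(raw)
    print(views["ansi"])  # café.txt
    print(views["oem"])   # different text: byte 0xE9 maps to another character in cp850
```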
When you don't use the native version of the API, your call passes
through a thunking layer that transcodes strings between Unicode and
one of the legacy codepages (the ANSI codepage, the CHCP-selected OEM
codepage, or the OEM codepage hinted by the filesystem). This requires
additional temporary memory allocation within the non-native version
of the Win32 API, and it takes additional time to convert everything
before doing the actual work by calling the native API.
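A rough model of that thunking layer, again in Python and with entirely hypothetical function names (open_file_w and open_file_a are not real Win32 entry points; they only mimic the "W"/"A" split). The "A" wrapper has to build a temporary UTF-16 copy of every string before it can delegate to the native Unicode function.

```python
# Hypothetical model of the A -> W thunk; names are illustrative, not Win32 APIs.
def open_file_w(name_utf16: bytes) -> str:
    """Stand-in for the native Unicode call: just reports the name it received."""
    return name_utf16.decode("utf-16-le")


def open_file_a(name_bytes: bytes, codepage: str = "cp1252") -> str:
    """Stand-in for the legacy 'A' call (cp1252 is an ASSUMED ANSI codepage)."""
    # Extra work done on every call: decode from the codepage and build a
    # temporary UTF-16 buffer before the real work can start.
    temp_utf16 = name_bytes.decode(codepage).encode("utf-16-le")
    return open_file_w(temp_utf16)


if __name__ == "__main__":
    print(open_file_a(b"r\xe9sum\xe9.txt"))  # résumé.txt
```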
In summary: the PHP binary you install on Windows MUST differ
depending on whether it was compiled for Windows 95/98/SE (or the old
Win32s emulation layer for Windows 3.x, which had very minimal support
of UTF-8, covering only the subsets of Unicode used by the ANSI and
OEM codepages selected when starting Windows from a DOS extender) or
for any other version of Windows based on the NT kernel.
The best proof that this is a problem with PHP and not with Windows is
that your weird results will NOT occur in other languages such as C#,
JavaScript, VB, Perl, Ruby... PHP has a very bad history of tracking
versions (with too many historical source-code quirks and wrong
assumptions that should be disabled today, and an inconsistent library
that has inherited all the quirks originally introduced in old
versions of PHP for old versions of Windows that are no longer
officially supported, either by Microsoft or even by PHP itself!).
In other words: RTM! Or download and install a binary version of PHP
for Windows precompiled with the correct settings. I really think PHP
should distribute Windows binaries compiled by default for the Unicode
version of the Win32 API, using the Unicode version of the C/C++
libraries: internally, the PHP code would convert its UTF-8 strings to
UTF-16 before calling the Win32 API, and back from UTF-16 to UTF-8
when retrieving Win32 results, instead of converting PHP's internal
UTF-8 strings to and from the local OEM codepage (for the filesystem
calls) or the local ANSI codepage (for all other Win32 APIs, including
the registry and process APIs).
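The conversion recommended above can be sketched the same way (Python again, for illustration only; in PHP itself, mb_convert_encoding() performs this kind of conversion). The point is that UTF-8 to UTF-16 and back is lossless for any filename, while a detour through a legacy codepage (cp1252 assumed here) is not.

```python
# Sketch: UTF-8 <-> UTF-16LE round trip versus a lossy legacy-codepage detour.
def utf8_to_utf16(utf8: bytes) -> bytes:
    return utf8.decode("utf-8").encode("utf-16-le")   # before calling a "W" API


def utf16_to_utf8(utf16: bytes) -> bytes:
    return utf16.decode("utf-16-le").encode("utf-8")  # after a "W" API returns


def via_codepage(utf8: bytes, codepage: str = "cp1252") -> bytes:
    # ASSUMED ANSI codepage; unmappable characters are replaced with "?".
    legacy = utf8.decode("utf-8").encode(codepage, errors="replace")
    return legacy.decode(codepage).encode("utf-8")


if __name__ == "__main__":
    name = "日本語-résumé.txt".encode("utf-8")
    print(utf16_to_utf8(utf8_to_utf16(name)) == name)  # True
    print(via_codepage(name) == name)                  # False
```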