Long story short: Maybe, depends on the system the app is deployed to, depends how PHP was compiled, welcome to the CF of localization and internationalization.
The underlying PCRE engine takes locale into account when determining what "a-z" means. In a Spanish based locale, ? would be caught by a-z). The semantic meaning of a-z is "all the letters between a and z, and ? is a separate letter in Spanish.
However, the way PHP blindly handles strings as collections of bytes rather than a collection of UTF code points means you have a situation where a-z MIGHT match an accented character. Given the variety of different systems Drupal gets deployed to, it makes sense that they would choose to be explicit about the allowed characters rather than just trust a-z to do the right thing.
I'd also conjecture that the existence of this regular expression is the result of a bug report being filed about German umlauts not being filtered.
Update in 2014: Per JimmiTh's answer below, it looks like (despite some "confusing-to-non-pcre-core-developers" documentation) that [a-z]
will only match the characters abcdefghijklmnopqrstuvwxyz
a proverbial 99% of the time. That said —?framework developers tend to get twitchy about vagueness in their code, especially when the code relies on systems (locale specific strings) that PHP doesn't handle as gracefully as you'd like, and servers the developers have no control over. While the anonymous Drupal developer's comments are incorrect — it wasn't a matter of "getting [a-z]
confused with w
", but instead a Drupal developer being unclear/unsure of how PCRE handled [a-z]
, and choosing the more specific form of abcdefghijklmnopqrstuvwxyz
to ensure the specific behavior they wanted.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…