Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

php - UTF-8 all the way through

I'm setting up a new server and want to support UTF-8 fully in my web application. I have tried this in the past on existing servers and always seem to end up having to fall back to ISO-8859-1.

Where exactly do I need to set the encoding/charsets? I'm aware that I need to configure Apache, MySQL, and PHP to do this. Is there some standard checklist I can follow, or a way to troubleshoot where the mismatches occur?

This is for a new Linux server, running MySQL 5, PHP 5, and Apache 2.

Asked by mercutio; translated from Stack Overflow.

1 Reply

Data Storage:

  • Specify the utf8mb4 character set on all tables and text columns in your database. This makes MySQL physically store and retrieve values encoded natively in UTF-8. Note that MySQL will implicitly use utf8mb4 encoding if a utf8mb4_* collation is specified (without any explicit character set).

  • In older versions of MySQL (< 5.5.3), you'll unfortunately be forced to use plain utf8, which supports only a subset of Unicode: the characters of the Basic Multilingual Plane, stored in at most three bytes each. I wish I were kidding.
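
To check whether the utf8 limitation would bite you, here is a minimal sketch (assuming PHP 7 or later; the function name needs_utf8mb4() is hypothetical) that detects characters above U+FFFF, such as most emoji. Those take four bytes in UTF-8 and can be stored in a utf8mb4 column but not in a legacy utf8 one.

```php
<?php
// Hypothetical helper: returns true if $s contains any code point above
// U+FFFF. Such characters need 4 bytes in UTF-8, which MySQL's legacy
// 3-byte "utf8" character set cannot store (utf8mb4 can).
function needs_utf8mb4(string $s): bool
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8;
    // preg_match() returns false (not 0) if $s is not valid UTF-8.
    $result = preg_match('/[\x{10000}-\x{10FFFF}]/u', $s);
    if ($result === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result === 1;
}

var_dump(needs_utf8mb4('Grüße'));           // bool(false): all BMP characters
var_dump(needs_utf8mb4("smile \u{1F600}")); // bool(true): U+1F600 needs 4 bytes
```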

Data Access:

  • In your application code (e.g. PHP), in whatever DB access method you use, you'll need to set the connection charset to utf8mb4. This way, MySQL does no conversion from its native UTF-8 when it hands data off to your application, and vice versa.

  • Some drivers provide their own mechanism for configuring the connection character set, which both updates the driver's internal state and informs MySQL of the encoding to be used on the connection. This is usually the preferred approach.

    In PHP:

    • If you're using the PDO abstraction layer with PHP ≥ 5.3.6, you can specify charset in the DSN:

       $dbh = new PDO('mysql:charset=utf8mb4');
    • If you're using mysqli, you can call set_charset():

       $mysqli->set_charset('utf8mb4');       // object-oriented style
       mysqli_set_charset($link, 'utf8mb4');  // procedural style
    • If you're stuck with plain mysql but happen to be running PHP ≥ 5.2.3, you can call mysql_set_charset.

  • If the driver does not provide its own mechanism for setting the connection character set, you may have to issue a query to tell MySQL how your application expects data on the connection to be encoded: SET NAMES 'utf8mb4'.

  • The same utf8mb4/utf8 caveat described above for MySQL < 5.5.3 applies here too.
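
A minimal PDO sketch of the above, assuming PHP 7 or later and the pdo_mysql extension. The helper names, host, and database name are hypothetical placeholders; the charset=utf8mb4 DSN parameter and the @@character_set_connection server variable are the real mechanisms. The last helper is handy when troubleshooting where a mismatch occurs.

```php
<?php
// Hypothetical helper: a PDO MySQL DSN whose charset= parameter sets the
// connection character set (honoured by PDO on PHP >= 5.3.6).
function mysql_utf8mb4_dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Hypothetical helper: open a connection that talks utf8mb4, matching the
// utf8mb4 columns, so MySQL converts nothing in either direction.
function connect_utf8mb4(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(mysql_utf8mb4_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Hypothetical troubleshooting helper: ask the server which character set it
// believes this connection uses; expect "utf8mb4".
function connection_charset(PDO $pdo): string
{
    return (string) $pdo->query('SELECT @@character_set_connection')->fetchColumn();
}

echo mysql_utf8mb4_dsn('localhost', 'app'), "\n";
// mysql:host=localhost;dbname=app;charset=utf8mb4
```

Only the DSN builder runs without a database; connect_utf8mb4() and connection_charset() need a reachable MySQL server.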

Output:

  • If your application transmits text to other systems, they will also need to be informed of the character encoding. With web applications, the browser must be informed of the encoding in which data is sent (through HTTP response headers or HTML metadata).

  • In PHP, you can use the default_charset php.ini option, or manually issue the Content-Type MIME header yourself (for example header('Content-Type: text/html; charset=utf-8');), which is just more work but has the same effect.

  • When encoding the output using json_encode(), add JSON_UNESCAPED_UNICODE as the second (flags) parameter so that non-ASCII characters are written as literal UTF-8 instead of \uXXXX escapes.
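
Putting the output points together, a minimal sketch assuming PHP 7 or later. utf8_json_response() is a hypothetical name; header(), headers_sent(), json_encode(), JSON_UNESCAPED_UNICODE and json_last_error_msg() are standard PHP.

```php
<?php
// Hypothetical helper: declare UTF-8 in the Content-Type header and
// emit JSON with non-ASCII characters left as literal UTF-8.
function utf8_json_response(array $data): string
{
    if (!headers_sent()) {
        // Declare the encoding explicitly so the client does not have to guess.
        header('Content-Type: application/json; charset=utf-8');
    }
    $json = json_encode($data, JSON_UNESCAPED_UNICODE);
    if ($json === false) {
        // json_encode() fails on invalid UTF-8 input, among other errors.
        throw new RuntimeException(json_last_error_msg());
    }
    return $json;
}

echo utf8_json_response(['city' => 'Zürich']), "\n"; // {"city":"Zürich"}
echo json_encode(['city' => 'Zürich']), "\n";        // {"city":"Z\u00fcrich"}
```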

Input:

  • Unfortunately, you should verify every received string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() (e.g. mb_check_encoding($str, 'UTF-8')) does the trick, but you have to use it religiously. There's really no way around this, as malicious clients can submit data in whatever encoding they want, and I haven't found a trick to get PHP to do this for you reliably.

  • From my reading of the current HTML spec, the following sub-bullets are not necessary or even valid anymore for modern HTML. My understanding is that browsers will work with and submit data in the character set specified for the document. However, if you're targeting older versions of HTML (XHTML, HTML4, etc.), these points may still be useful:

    • For HTML before HTML5 only: you want all data sent to you by browsers to be in UTF-8. Unfortunately, the only way to reliably do this is to add the accept-charset attribute to all your <form> tags: <form ... accept-charset="UTF-8">.

    • For HTML before HTML5 only: note that the W3C HTML spec says that clients "should" default to sending forms back to the server in whatever charset the server served, but this is apparently only a recommendation, hence the need for being explicit on every single <form> tag.
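
A minimal sketch of that validation, assuming PHP 7 or later with the mbstring extension; is_valid_utf8_input() is a hypothetical name. It walks nested arrays because PHP turns fields like name[]=... into arrays, and it checks keys too, since clients control those as well.

```php
<?php
// Hypothetical helper: returns true only if every key and string value in
// $input (recursing into nested arrays) is valid UTF-8. Requires mbstring.
function is_valid_utf8_input(array $input): bool
{
    foreach ($input as $key => $value) {
        // Field names come from the client too, so check them as well.
        if (is_string($key) && !mb_check_encoding($key, 'UTF-8')) {
            return false;
        }
        if (is_array($value)) {
            if (!is_valid_utf8_input($value)) {
                return false;
            }
        } elseif (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            return false;
        }
    }
    return true;
}

// Typical use at the top of a request handler, before storing anything:
if (!is_valid_utf8_input($_GET) || !is_valid_utf8_input($_POST)) {
    http_response_code(400);
    exit('Request is not valid UTF-8');
}
```

A value such as "Jos\xE9" (é encoded in ISO-8859-1) fails this check, which is exactly the kind of input a misconfigured or malicious client sends.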

Other Code Considerations:

  • Obviously enough, all files you'll be serving (PHP, HTML, JavaScript, etc.) should be encoded in valid UTF-8.

  • You need to make sure that every time you process a UTF-8 string, you do so safely. This is, unfortunately, the hard part. You'll probably want to make extensive use of PHP's mbstring extension.

  • PHP's built-in string operations are not UTF-8 safe by default. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.

  • To know what you're doing (read: not mess it up), you really need to know UTF-8 and how it works at the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.
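
To make the byte-versus-character problem concrete, here is a small sketch assuming PHP 7 or later with mbstring; truncate_utf8() is a hypothetical helper name. The expected var_dump() results are shown in the comments.

```php
<?php
// Byte-oriented string functions count and cut bytes, not characters.
$s = 'naïve café'; // 10 characters; ï and é are 2 bytes each in UTF-8

var_dump(strlen($s));             // int(12): bytes
var_dump(mb_strlen($s, 'UTF-8')); // int(10): characters

// substr() can split a multi-byte character and leave invalid UTF-8.
$cut = substr($s, 0, 3);          // "na" plus only the first byte of "ï"
var_dump(mb_check_encoding($cut, 'UTF-8')); // bool(false)

// Hypothetical helper: truncate to at most $max characters safely.
function truncate_utf8(string $s, int $max): string
{
    return mb_substr($s, 0, $max, 'UTF-8');
}

var_dump(truncate_utf8($s, 3));   // string(4) "naï"
```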
