Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
130 views
in Technique[技术] by (71.8m points)

java - HTML : Form does not send UTF-8 format inputs

I've visited each one of the questions about UTF-8 encoding in HTML and nothing seems to be making it work like expected.

I added the meta tag : nothing changed.
I added the accept-charset attribute in form : nothing changed.


JSP File

<%@ page pageEncoding="UTF-8" %>
<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<title>Editer les sous-titres</title>
</head>
<body>
    <form method="post" action="/Subtitlor/edit" accept-charset="UTF-8"> 
        <h3 name="nameOfFile"><c:out value="${ nameOfFile }"/></h3> 
        <input type="hidden" name="nameOfFile" id="nameOfFile" value="${ nameOfFile }"/>
        <c:if test="${ !saved }">
            <input value ="Enregistrer le travail" type="submit" style="position:fixed; top: 10px; right: 10px;" />
        </c:if>
        <a href="/Subtitlor/" style="position:fixed; top: 50px; right: 10px;">Retour à la page d'accueil</a>
        <c:if test="${ saved }">
            <div style="position:fixed; top: 90px; right: 10px;">
                <c:out value="Travail enregistré dans la base de donnée"/>
            </div>
        </c:if>
        <table border="1">
            <c:if test="${ !saved }">
                <thead>
                    <th style="weight:bold">Original Line</th>
                    <th style="weight:bold">Translation</th>
                    <th style="weight:bold">Already translated</th>
                </thead>
            </c:if>
            <c:forEach items="${ subtitles }" var="line" varStatus="status">
                <tr>
                    <td style="text-align:right;"><c:out value="${ line }" /></td>
                    <td><input type="text" name="line${ status.index }" id="line${ status.index }" size="35" /></td>
                    <td style="text-align:right"><c:out value="${ lines[status.index].content }"/></td>
                </tr>
            </c:forEach>
        </table>
    </form>
</body>
</html>

Servlet

for (int i = 0 ; i < 2; i++){
    System.out.println(request.getParameter("line"+i));
}

Output

Et ton p?¨re et sa soeur
Il ne sera jamais parti.
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I added the meta tag : nothing changed.

It indeed doesn't have any effect when the page is served over HTTP instead of e.g. from local disk file system (i.e. the page's URL is http://... instead of e.g. file://...). In HTTP, the charset in HTTP response header will be used. You've already set it as below:

<%@page pageEncoding="UTF-8"%>

This will not only write out the HTTP response using UTF-8, but also set the charset attribute in the Content-Type response header.

This one will be used by the webbrowser to interpret the response and encode any HTML form params.


I added the accept-charset attribute in form : nothing changed.

It has only effect in Microsoft Internet Explorer browser. Even then it is doing it wrongly. Never use it. All real webbrowsers will instead use the charset attribute specified in the Content-Type header of the response. Even MSIE will do it the right way as long as you do not specify the accept-charset attribute. As said before, you have already properly set it via pageEncoding.


Get rid of both the meta tag and accept-charset attribute. They do not have any useful effect and they will only confuse yourself in long term and even make things worse when enduser uses MSIE. Just stick to pageEncoding. Instead of repeating the pageEncoding over all JSP pages, you could also set it globally in web.xml as below:

<jsp-config>
    <jsp-property-group>
        <url-pattern>*.jsp</url-pattern>
        <page-encoding>UTF-8</page-encoding>
    </jsp-property-group>
</jsp-config>

As said, this will tell the JSP engine to write HTTP response output using UTF-8 and set it in the HTTP response header too. The webbrowser will use the same charset to encode the HTTP request parameters before sending back to server.

Your only missing step is to tell the server that it must use UTF-8 to decode the HTTP request parameters before returning in getParameterXxx() calls. How to do that globally depends on the HTTP request method. Given that you're using POST method, this is relatively easy to achieve with the below servlet filter class which automatically hooks on all requests:

@WebFilter("/*")
public class CharacterEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig config) throws ServletException {
        // NOOP.
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // NOOP.
    }
}

That's all. In Servlet 3.0+ (Tomcat 7 and newer) you don't need additional web.xml configuration.

You only need to keep in mind that it's very important that setCharacterEncoding() method is called before the POST request parameters are obtained for the first time using any of getParameterXxx() methods. This is because they are parsed only once on first access and then cached in server memory.

So e.g. below sequence is wrong:

String foo = request.getParameter("foo"); // Wrong encoding.
// ...
request.setCharacterEncoding("UTF-8"); // Attempt to set it.
String bar = request.getParameter("bar"); // STILL wrong encoding!

Doing the setCharacterEncoding() job in a servlet filter will guarantee that it runs timely (at least, before any servlet).


In case you'd like to instruct the server to decode GET (not POST) request parameters using UTF-8 too (those parameters you see after ? character in URL, you know), then you'd basically need to configure it in the server end. It's not possible to configure it via servlet API. In case you're using for example Tomcat as server, then it's a matter of adding URIEncoding="UTF-8" attribute in <Connector> element of Tomcat's own /conf/server.xml.

In case you're still seeing Mojibake in the console output of System.out.println() calls, then chances are big that the stdout itself is not configured to use UTF-8. How to do that depends on who's responsible for interpreting and presenting the stdout. In case you're using for example Eclipse as IDE, then it's a matter of setting Window > Preferences > General > Workspace > Text File Encoding to UTF-8.

See also:


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...