如何使 UTF-8在 Java 网络应用程序中工作?

I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic alphabets like ЦжФ for special cases.

My setup is the following:

  • Development environment: Windows XP
  • Production environment: Debian

Database used: MySQL 5.x

Users mainly use Firefox2 but also Opera 9.x, FF3, IE7 and Google Chrome are used to access the site.

How to achieve this?

转载于:https://stackoverflow.com/questions/138948/how-to-get-utf-8-working-in-java-webapps

I think you summed it up quite well in your own answer.

In the process of UTF-8-ing(?) from end to end you might also want to make sure java itself is using UTF-8. Use -Dfile.encoding=utf-8 as parameter to the JVM (can be configured in catalina.bat).

This is for Greek Encoding in MySql tables when we want to access them using Java:

Use the following connection setup in your JBoss connection pool (mysql-ds.xml)

<connection-url>jdbc:mysql://192.168.10.123:3308/mydatabase</connection-url>
<driver-class>com.mysql.jdbc.Driver</driver-class>
<user-name>nts</user-name>
<password>xaxaxa!</password>
<connection-property name="useUnicode">true</connection-property>
<connection-property name="characterEncoding">greek</connection-property>

If you don't want to put this in a JNDI connection pool, you can configure it as a JDBC-url like the next line illustrates:

jdbc:mysql://192.168.10.123:3308/mydatabase?characterEncoding=greek

For me and Nick, so we never forget it and waste time anymore.....

In case you have specified in connection pool (mysql-ds.xml), in your Java code you can open the connection as follows:

DriverManager.registerDriver(new com.mysql.jdbc.Driver());
Connection conn = DriverManager.getConnection(
    "jdbc:mysql://192.168.1.12:3308/mydb?characterEncoding=greek",
    "Myuser", "mypass");

Nice detailed answer. just wanted to add one more thing which will definitely help others to see the UTF-8 encoding on URLs in action .

Follow the steps below to enable UTF-8 encoding on URLs in firefox.

  1. type "about:config" in the address bar.

  2. Use the filter input type to search for "network.standard-url.encode-query-utf8" property.

  3. the above property will be false by default, turn that to TRUE.
  4. restart the browser.

UTF-8 encoding on URLs works by default in IE6/7/8 and chrome.

I want also to add from here this part solved my utf problem:

runtime.encoding=<encoding>

I'm with a similar problem, but, in filenames of a file I'm compressing with apache commons. So, i resolved it with this command:

convmv --notest -f cp1252 -t utf8 * -r

it works very well for me. Hope it help anyone ;)

For my case of displaying Unicode character from message bundles, I don't need to apply "JSP page encoding" section to display Unicode on my jsp page. All I need is "CharsetFilter" section.

To add to kosoant's answer, if you are using Spring, rather than writing your own Servlet filter, you can use the class org.springframework.web.filter.CharacterEncodingFilter they provide, configuring it like the following in your web.xml:

 <filter>
    <filter-name>encoding-filter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
       <param-name>encoding</param-name>
       <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
       <param-name>forceEncoding</param-name>
       <param-value>FALSE</param-value>
    </init-param>
 </filter>
 <filter-mapping>
    <filter-name>encoding-filter</filter-name>
    <url-pattern>/*</url-pattern>
 </filter-mapping>

One other point that hasn't been mentioned relates to Java Servlets working with Ajax. I have situations where a web page is picking up utf-8 text from the user sending this to a JavaScript file which includes it in a URI sent to the Servlet. The Servlet queries a database, captures the result and returns it as XML to the JavaScript file which formats it and inserts the formatted response into the original web page.

In one web app I was following an early Ajax book's instructions for wrapping up the JavaScript in constructing the URI. The example in the book used the escape() method, which I discovered (the hard way) is wrong. For utf-8 you must use encodeURIComponent().

Few people seem to roll their own Ajax these days, but I thought I might as well add this.

About CharsetFilter mentioned in @kosoant answer ....

There is a build in Filter in tomcat web.xml (located at conf/web.xml). The filter is named setCharacterEncodingFilter and is commented by default. You can uncomment this ( Please remember to uncomment its filter-mapping too )

Also there is no need to set jsp-config in your web.xml (I have test it for Tomcat 7+ )

Some time you can solve problem through MySQL Administrator wizard. In

Startup variables > Advanced >

and set Def. char Set:utf8

Maybe this config need restart MySQL.

Previous responses didn't work with my problem. It was only in production, with tomcat and apache mod_proxy_ajp. Post body lost non ascii chars by ? The problem finally was with JVM defaultCharset (US-ASCII in a default instalation: Charset dfset = Charset.defaultCharset();) so, the solution was run tomcat server with a modifier to run the JVM with UTF-8 as default charset:

JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=UTF-8" 

(add this line to catalina.sh and service tomcat restart)

Maybe you must also change linux system variable (edit ~/.bashrc and ~/.profile for permanent change, see https://perlgeek.de/en/article/set-up-a-clean-utf8-environment)

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

export LANGUAGE=en_US.UTF-8