A question about Java character encoding

The situation is as follows:

String str = "中";
String newStr = new String(str.getBytes("GBK"), "UTF-8");

Can newStr still be restored to str? My feeling is that it cannot.

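[color=red]UTF-8 is a variable-length encoding: an English (ASCII) character takes one byte, while a Chinese character in the Basic Multilingual Plane takes three bytes.[/color]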
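To make the variable length concrete, here is a small sketch (the class and method names are made up for illustration). It also shows that three bytes is not the only case: some characters take two bytes, and characters outside the Basic Multilingual Plane take four.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of the compiler's encoding.
        System.out.println(utf8Length("A"));            // 1 (ASCII letter)
        System.out.println(utf8Length("\u00e9"));       // 2 (Latin letter e with acute accent)
        System.out.println(utf8Length("\u4e2d"));       // 3 (the Chinese character 中)
        System.out.println(utf8Length("\ud83d\ude00")); // 4 (U+1F600, a surrogate pair in Java)
    }
}
```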

str.getBytes("GBK") means: [color=red]encode the string str into a byte sequence using GBK.[/color]

String(byte[] bytes, String charsetName) means:
construct a new String by decoding the specified byte array using the specified charset.
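A minimal sketch of the encode/decode pair described above. The class name and the hex helper are only for illustration. When the same charset is used on both sides, the string comes back unchanged.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeDecodeDemo {
    // Format bytes as upper-case hex separated by spaces, e.g. "E4 B8 AD".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String str = "\u4e2d"; // the character 中

        // getBytes ENCODES: characters -> bytes
        byte[] gbkBytes = str.getBytes(gbk);
        byte[] utf8Bytes = str.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(gbkBytes));  // D6 D0
        System.out.println(hex(utf8Bytes)); // E4 B8 AD

        // new String(bytes, charset) DECODES: bytes -> characters
        System.out.println(str.equals(new String(gbkBytes, gbk)));                     // true
        System.out.println(str.equals(new String(utf8Bytes, StandardCharsets.UTF_8))); // true
    }
}
```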

Once these three points are clear, one example should make everything fall into place.

The example:

[code="java"]public class A {
    public static void main(String[] args) throws Exception {
        String str = "中";
        System.out.println(str.getBytes("UTF-8").length);
        System.out.println(str.getBytes("GBK").length);
        // Decode the UTF-8 bytes with the wrong charset (GBK) on purpose.
        String str1 = new String(str.getBytes("UTF-8"), "GBK");
        System.out.println(str1);

        System.out.println("=====================");
        System.out.println(str1.getBytes("UTF-8").length);
        System.out.println(str1.getBytes("GBK").length);
        // Try to undo it: re-encode as GBK, then decode as UTF-8.
        String str2 = new String(str1.getBytes("GBK"), "UTF-8");
        System.out.println(str2);

        System.out.println("**********************");
        // Same experiment with two characters.
        String s = "中国";
        System.out.println(s.getBytes("UTF-8").length);
        System.out.println(s.getBytes("GBK").length);
        String str3 = new String(s.getBytes("UTF-8"), "GBK");
        System.out.println(str3);

        System.out.println("=====================");
        System.out.println(str3.getBytes("UTF-8").length);
        System.out.println(str3.getBytes("GBK").length);
        String str4 = new String(str3.getBytes("GBK"), "UTF-8");
        System.out.println(str4);
    }
}[/code]

Output:
3
2
涓?
=====================
6
3
??
**********************
6
4
涓浗
=====================
9
6
中国

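From the output you can see that with an odd number of Chinese characters, encoding and then decoding this way produces garbled text: the UTF-8 byte count is odd, so decoding as GBK leaves a dangling byte that is replaced and cannot be recovered. In this example the even-length string came back intact. An even count is not a guarantee, though; it only works when every pair of bytes happens to be a valid GBK code.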
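The round trip from the example can be wrapped in a small helper to test other strings. This is only a sketch; the names roundTrip and survives are made up here. Whether an even-length string survives depends on the JDK's GBK mapping table, so only the odd case is annotated.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class OddEvenRoundTrip {
    static final Charset GBK = Charset.forName("GBK");

    // UTF-8 bytes -> decoded as GBK -> re-encoded as GBK -> decoded as UTF-8.
    static String roundTrip(String s) {
        String garbled = new String(s.getBytes(StandardCharsets.UTF_8), GBK);
        return new String(garbled.getBytes(GBK), StandardCharsets.UTF_8);
    }

    // True only if the string comes back unchanged.
    static boolean survives(String s) {
        return s.equals(roundTrip(s));
    }

    public static void main(String[] args) {
        String one = "\u4e2d";       // 中: 3 UTF-8 bytes, one is left dangling
        String two = "\u4e2d\u56fd"; // 中国: 6 UTF-8 bytes, three complete pairs

        System.out.println(survives(one)); // false: the dangling byte is lost
        System.out.println(survives(two)); // the thread's output shows this one restored
    }
}
```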

Heh, nice, that was a very clear explanation. I had always assumed UTF-8 was two bytes long; lesson learned.

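What you are doing amounts to trying to convert GBK characters to UTF-8, so it comes down to whether the characters being converted are also directly supported by UTF-8.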
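If the goal really is to turn GBK-encoded bytes into UTF-8-encoded bytes, the usual approach is to decode with the charset the bytes were written in and then encode with the target charset. A Java String is Unicode internally and UTF-8 can encode every Unicode code point, so the second step does not lose anything. A minimal sketch, where gbkToUtf8 is a hypothetical helper name:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GbkToUtf8 {
    // Decode with the SOURCE charset, then encode with the TARGET charset.
    static byte[] gbkToUtf8(byte[] gbkBytes) {
        String text = new String(gbkBytes, Charset.forName("GBK"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0}; // the character 中 in GBK
        byte[] utf8Bytes = gbkToUtf8(gbkBytes);

        System.out.println(utf8Bytes.length); // 3
        System.out.println("\u4e2d".equals(new String(utf8Bytes, StandardCharsets.UTF_8))); // true
    }
}
```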

[code="java"]
import java.io.UnsupportedEncodingException;

/**
 * 2007-8-10 jyin at gomez dot com
 */
public class CharsetConvertor {

    public static void main(String[] args) {
        String str = "This is a test for *中网!@#$。，？";
        try {
            String s = gbToUtf8(str);
            System.out.println(s);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Builds the three UTF-8 bytes of each character in U+0800..U+FFFF by hand,
    // then decodes them back into a String. All other characters are copied as-is.
    public static String gbToUtf8(String str) throws UnsupportedEncodingException {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < str.length(); i++) {
            String s = str.substring(i, i + 1);
            char c = s.charAt(0);
            // The three-byte form only applies from U+0800 up; surrogate halves are skipped.
            if (c >= 0x800 && !Character.isSurrogate(c)) {
                // UTF-16BE yields exactly two bytes, high byte first, with no BOM
                byte[] bytes = s.getBytes("UTF-16BE");
                String binaryStr = "";
                for (int j = 0; j < bytes.length; j += 2) {
                    // the high byte
                    String hexStr = getHexString(bytes[j]);
                    String binStr = getBinaryString(Integer.valueOf(hexStr, 16));
                    binaryStr += binStr;
                    // the low byte
                    hexStr = getHexString(bytes[j + 1]);
                    binStr = getBinaryString(Integer.valueOf(hexStr, 16));
                    binaryStr += binStr;
                }
                // convert unicode to utf-8: 1110xxxx 10xxxxxx 10xxxxxx
                String s1 = "1110" + binaryStr.substring(0, 4);
                String s2 = "10" + binaryStr.substring(4, 10);
                String s3 = "10" + binaryStr.substring(10, 16);
                byte[] bs = new byte[3];
                bs[0] = Integer.valueOf(s1, 2).byteValue();
                bs[1] = Integer.valueOf(s2, 2).byteValue();
                bs[2] = Integer.valueOf(s3, 2).byteValue();
                String ss = new String(bs, "UTF-8");
                sb.append(ss);
            } else {
                sb.append(s);
            }
        }
        return sb.toString();
    }

    // Two-digit hex representation of a byte (sign extension is cut off).
    private static String getHexString(byte b) {
        String hexStr = Integer.toHexString(b);
        int m = hexStr.length();
        if (m < 2) {
            hexStr = "0" + hexStr;
        } else {
            hexStr = hexStr.substring(m - 2);
        }
        return hexStr;
    }

    // Binary representation of a value in 0..255, left-padded to 8 digits.
    private static String getBinaryString(int i) {
        String binaryStr = Integer.toBinaryString(i);
        int length = binaryStr.length();
        for (int l = 0; l < 8 - length; l++) {
            binaryStr = "0" + binaryStr;
        }
        return binaryStr;
    }
}
[/code]
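The three-byte layout that the code above assembles from binary strings (1110xxxx 10xxxxxx 10xxxxxx) can also be written with shifts and masks. This is just a sketch for characters in U+0800..U+FFFF; encode3 is a name made up for illustration, and in practice String.getBytes does this for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8ThreeByte {
    // Hand-encode one non-surrogate char in U+0800..U+FFFF as three UTF-8 bytes.
    static byte[] encode3(char c) {
        if (c < 0x800 || Character.isSurrogate(c)) {
            throw new IllegalArgumentException("not a three-byte UTF-8 character");
        }
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),         // 1110 + top 4 bits
            (byte) (0x80 | ((c >> 6) & 0x3F)), // 10 + middle 6 bits
            (byte) (0x80 | (c & 0x3F))         // 10 + low 6 bits
        };
    }

    public static void main(String[] args) {
        char c = '\u4e2d'; // the character 中
        byte[] manual = encode3(c);
        byte[] library = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(manual, library)); // true
    }
}
```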

It will definitely be garbled.
These are two different encoding schemes, and the bytes they produce are different.
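A quick way to check the expression from the original question directly. This is a sketch with made-up helper names (misdecode, tryRestore). The GBK bytes of 中 are not a valid UTF-8 sequence, so the decoder substitutes the replacement character U+FFFD and the original bytes are gone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripCheck {
    static final Charset GBK = Charset.forName("GBK");

    // The expression from the question: encode with GBK, decode as UTF-8.
    static String misdecode(String s) {
        return new String(s.getBytes(GBK), StandardCharsets.UTF_8);
    }

    // Attempt to reverse it: re-encode as UTF-8, decode as GBK.
    static String tryRestore(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.UTF_8), GBK);
    }

    public static void main(String[] args) {
        String str = "\u4e2d"; // the character 中
        String newStr = misdecode(str);

        System.out.println(newStr.indexOf('\uFFFD') >= 0);   // true: replacement character present
        System.out.println(str.equals(tryRestore(newStr))); // false: str cannot be recovered
    }
}
```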