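As the title says:
I have a Chinese string "随便一做" that I want to encode and then decode back. Could anyone offer some suggestions, plus some general guidance on character encodings?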
String temp = "随便一做";
String tempUTF = null;
tempUTF = new String(temp.getBytes("utf-8"),"gbk");
System.out.println("gbk to utf8 : "+tempUTF);
System.out.println("utf8 to gbk : "+new String(tempUTF.getBytes("gbk"),"utf-8"));

BASE64Encoder encoder = new BASE64Encoder();
BASE64Decoder decoder = new BASE64Decoder();
String t1 = encoder.encode(tempUTF.getBytes());
String t2 = new String(decoder.decodeBuffer(t1)).toString();
String t3 = new String(t2.getBytes("gbk"),"UTF-8");
System.out.println("utf8 to base64 : "+t1);
System.out.println("base64 to utf8 : "+t2);
System.out.println("utf8 to gbk : "+t3);

String t4 = encoder.encode(temp.getBytes());
String t5 = new String(decoder.decodeBuffer(t4)).toString();
System.out.println("gbk to base64 : "+t4);
System.out.println("base64 to gbk : "+t5);
The output is:
1) Not the expected result:
gbk to utf8 : 闅忎究涓?鍋?
utf8 to gbk : 随便????
2) Same problem:
utf8 to base64 : 6ZqP5L6/5Lg/5YE/
base64 to utf8 : 闅忎究涓?鍋?
utf8 to gbk : 随便????
3) The desired result:
gbk to base64 : y+ax49K71/Y=
base64 to gbk : 随便一做
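For reference, result 1) can be reproduced with explicit charsets. This is a minimal sketch, assuming Java 7+ (for `StandardCharsets`), a runtime that provides the GBK charset (`Charset.forName("GBK")` throws if it does not), and a source file compiled as UTF-8 (e.g. `javac -encoding UTF-8`). The class and method names are made up for illustration. It shows why the first round trip cannot work: UTF-8 bytes that GBK cannot decode are replaced, so re-encoding can never restore them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MojibakeDemo {
    // Hypothetical helper mirroring the question's first round trip:
    // UTF-8 bytes are (wrongly) decoded as GBK, then encoded back to GBK.
    static byte[] utf8BytesThroughGbk(String s) {
        Charset gbk = Charset.forName("GBK");
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // Byte sequences that are not valid GBK become U+FFFD here.
        String misread = new String(utf8, gbk);
        // U+FFFD has no GBK mapping, so getBytes substitutes '?' (0x3F).
        return misread.getBytes(gbk);
    }

    public static void main(String[] args) {
        String temp = "随便一做";
        byte[] original = temp.getBytes(StandardCharsets.UTF_8);
        byte[] roundTripped = utf8BytesThroughGbk(temp);
        System.out.println("original bytes:       " + Arrays.toString(original));
        System.out.println("after GBK round trip: " + Arrays.toString(roundTripped));
        System.out.println("identical? " + Arrays.equals(original, roundTripped));
        System.out.println("decoded as UTF-8:     " + new String(roundTripped, StandardCharsets.UTF_8));
    }
}
```

It should print `identical? false`: the final byte of "做" comes back as 0x3F, the same kind of '?' byte that shows up in the question's Base64 output `6ZqP5L6/5Lg/5YE/`.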
BASE64Decoder decoder = new BASE64Decoder();
String str;
try {
    // s is the Base64 string; this only works if it was produced from UTF-8 bytes
    str = new String(decoder.decodeBuffer(s),"utf-8");
    System.out.println(str);
} catch (IOException e) {
    // TODO auto-generated catch block
    e.printStackTrace();
}
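A sketch of the same decode using `java.util.Base64` (Java 8+), since `sun.misc.BASE64Decoder` is an internal class that was removed in JDK 9. The names are hypothetical. The string `6ZqP5L6/5LiA5YGa` is my own computation of the Base64 for the correct UTF-8 bytes of "随便一做", not something from the thread. Note that decoding as UTF-8 only helps if the Base64 was built from intact UTF-8 bytes; the question's `t1` was built from bytes that were already damaged.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Utf8Decode {
    // Decodes Base64, then interprets the resulting bytes as UTF-8.
    static String decodeUtf8(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Base64 of the intact UTF-8 bytes of "随便一做"
        System.out.println(decodeUtf8("6ZqP5L6/5LiA5YGa"));
        // The question's t1: two bytes were already replaced by '?' before
        // encoding, so UTF-8 decoding cannot bring the last two characters back.
        System.out.println(decodeUtf8("6ZqP5L6/5Lg/5YE/"));
    }
}
```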
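BASE64Decoder decoder = new BASE64Decoder();
// No charset given: decodes with the platform default (GBK on the asker's machine)
String str = new String(decoder.decodeBuffer(baseStr)).toString();
System.out.println(str);
// Re-encode as GBK and decode as UTF-8; this only restores the text
// if the previous decode did not replace any bytes
str = new String(str.getBytes("gbk"),"utf-8");
System.out.println(str);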
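Why this re-encoding trick fails on the question's data: it relies on the intermediate charset turning every byte into a character and back unchanged. ISO-8859-1 does that (each byte 0x00 to 0xFF maps to exactly one char); GBK does not. The sketch below contrasts the two. ISO-8859-1 is my substitution for illustration, not what the reply used, and `repair` is a hypothetical name; it assumes the GBK charset is available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RepairDemo {
    // Decode UTF-8 bytes with the wrong charset, turn the result back into
    // bytes with that same charset, then decode those bytes as UTF-8.
    static String repair(String original, Charset wrongCharset) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, wrongCharset);
        byte[] recovered = misread.getBytes(wrongCharset);
        return new String(recovered, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String temp = "随便一做";
        // Lossless: ISO-8859-1 maps every byte to one char and back.
        System.out.println(repair(temp, StandardCharsets.ISO_8859_1));
        // Lossy: GBK replaces sequences it cannot decode.
        System.out.println(repair(temp, Charset.forName("GBK")));
    }
}
```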
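Also, UTF-8 is variable-length (1 to 4 bytes per character), and GBK is variable-length too: 1 byte for ASCII, 2 bytes for a Chinese character. So the same character can take a different number of bytes in each: a Chinese character is 2 bytes in GBK but 3 bytes in UTF-8, while ASCII is 1 byte in both.
For "随便一做", getBytes("gbk") returns an 8-byte array and getBytes("utf-8") returns a 12-byte array, which is why bytes from one encoding cannot simply be read back with the other.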
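A quick way to check these byte counts yourself. A sketch assuming the runtime provides the GBK charset; `byteLength` is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengths {
    // Number of bytes the string occupies once encoded with cs.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        for (String s : new String[] {"a", "随", "随便一做"}) {
            System.out.println(s + ": GBK=" + byteLength(s, gbk)
                    + " bytes, UTF-8=" + byteLength(s, StandardCharsets.UTF_8) + " bytes");
        }
    }
}
```

Expected: "a" is 1 byte in both, "随" is 2 bytes in GBK and 3 in UTF-8, and "随便一做" is 8 and 12.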
[code="java"]import sun.misc.BASE64Decoder; // internal JDK classes, removed in JDK 9
import sun.misc.BASE64Encoder;

String temp = "随便一做";
BASE64Encoder encoder = new BASE64Encoder();
BASE64Decoder decoder = new BASE64Decoder();

// GBK round trip: encode with GBK, decode with GBK
String t1 = encoder.encode(temp.getBytes("gbk"));
String t2 = new String(decoder.decodeBuffer(t1),"gbk");
System.out.println("gbk to base64 : "+t1);
System.out.println("base64 to gbk : "+t2);

// UTF-8 round trip: encode with UTF-8, decode with UTF-8
String t3 = encoder.encode(temp.getBytes("utf-8"));
String t4 = new String(decoder.decodeBuffer(t3),"utf-8");
System.out.println("utf-8 to base64 : "+t3);
System.out.println("base64 to utf-8 : "+t4); [/code]
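The same two round trips with `java.util.Base64` (Java 8+) instead of the internal `sun.misc` classes; a sketch with hypothetical names, assuming the GBK charset is available. The key point is unchanged: `decode` must be given the same charset that `encode` used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64RoundTrip {
    // text -> bytes in cs -> Base64 text
    static String encode(String text, Charset cs) {
        return Base64.getEncoder().encodeToString(text.getBytes(cs));
    }

    // Base64 text -> bytes -> text, using the same cs that encode() used
    static String decode(String base64, Charset cs) {
        return new String(Base64.getDecoder().decode(base64), cs);
    }

    public static void main(String[] args) {
        String temp = "随便一做";
        for (Charset cs : new Charset[] {Charset.forName("GBK"), StandardCharsets.UTF_8}) {
            String b64 = encode(temp, cs);
            System.out.println(cs.name() + " to base64 : " + b64);
            System.out.println("base64 to " + cs.name() + " : " + decode(b64, cs));
        }
    }
}
```

With GBK this should give `y+ax49K71/Y=` (the same as the question's "desired result"), and with UTF-8 `6ZqP5L6/5LiA5YGa`; both decode back to "随便一做".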
First you need to understand what str.getBytes("gbk") actually does,
and then what new String(byte[],"utf-8") does; after that the problem becomes clear.
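Internally, Java stores a String as Unicode, specifically UTF-16:
each char is a 16-bit code unit, and a String has no GBK or UTF-8 bytes at all until you encode it.
Under different encodings, the same Chinese character can take a different number of bytes, and the byte values themselves differ as well.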
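str.getBytes("gbk") converts the string into a byte array using the GBK encoding; to turn that array back into a string, you must use new String(byte[],"gbk").
You can think of this as a pair of encrypt/decrypt operations: if you "encrypt" with GBK, you can only "decrypt" with GBK.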
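To see that a Java `String` holds UTF-16 `char` values rather than GBK or UTF-8 bytes, here is a small sketch (hypothetical names; assumes the GBK charset is available). All four characters are in the Basic Multilingual Plane, so each is exactly one `char`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsVsBytes {
    // Lists the UTF-16 code units stored in the string, e.g. "U+968F U+4FBF".
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String temp = "随便一做";
        // Same 4 chars no matter which encoding you later choose.
        System.out.println(temp.length() + " chars: " + codeUnits(temp));
        // Encoding and decoding with the same charset gives the text back.
        Charset gbk = Charset.forName("GBK");
        System.out.println(new String(temp.getBytes(gbk), gbk));
        System.out.println(new String(temp.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8));
    }
}
```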