I am creating an MD5 digest in Java, which I need in order to calculate a 4-byte hex hash of the input string. Here is the Java code:
    public static String hashString(String s) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes("US-ASCII"));
            byte[] output = new byte[digest.length / 4];
            for (int i = 0; i < output.length; i++) {
                for (int j = 0; j < digest.length; j += 4) {
                    System.out.print(digest[j]);
                    output[i] ^= digest[i + j];
                }
            }
            return getHexString(output);
        } catch (NoSuchAlgorithmException | UnsupportedEncodingException e) {
            return null;
        }
    }
I want to use the same code in Go; however, the MD5 output differs from what I get in Java. Below is the Go code:
    func hashString(s string) string {
        md := md5.New()
        md.Write([]byte(s))
        data := md.Sum(nil)
        fmt.Println(data)
        output := make([]byte, len(data)/4)
        for i := 0; i < len(output); i++ {
            for j := 0; j < len(data); j++ {
                output[i] ^= data[i+j]
                fmt.Print(output[i])
            }
        }
        return getHexString(output)
    }
I have added print statements to both code samples. As I am new to Go, I do not know whether there are other libraries or ways to do this; I just followed what I found on the internet. It would be really great if someone could help with this.
Your inner loops are different.
In Java:
    for (int j = 0; j < digest.length; j += 4) {
        System.out.print(digest[j]);
        output[i] ^= digest[i + j];
    }
In Go:
    for j := 0; j < len(data); j++ {
        output[i] ^= data[i+j]
        fmt.Print(output[i])
    }
Notice in Java you increment the loop variable by 4, and in Go only by 1. Use:
    for j := 0; j < len(data); j += 4 {
        output[i] ^= data[i+j]
        fmt.Print(output[i])
    }
UPDATE: The asker clarified that this was just a typo in the posted code; it has since been edited out.
Also your Java solution returns the hex representation of the output:
return getHexString(output);
But in Go you return the hex representation of the (full) MD5 digest:
return getHexString(md.Sum(nil))
So in Go also do:
return getHexString(output)
One last note. In Java you convert your input string to a sequence of bytes using the US-ASCII encoding, while in Go you use the UTF-8 encoded sequence of your input string, because this is how Go stores strings naturally (so you get the UTF-8 byte sequence when you do []byte("some text")).
This will result in the same input data for texts using only characters of the ASCII table (whose code is less than 128), but will result in different data for texts that contain characters above that (as they will translate into multi-byte sequences in UTF-8). Something to keep in mind!
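To make the encoding point concrete, here is a small sketch of a guard you could run before hashing. The helper name isASCII is hypothetical (it is not part of the question's code); it only reports whether the Go string would produce the same bytes under US-ASCII and UTF-8.

```go
package main

import "unicode/utf8"

// isASCII is a hypothetical helper: it reports whether every byte of s is
// below 128 (utf8.RuneSelf). For such strings, []byte(s) in Go and a
// US-ASCII encoding in Java yield the same byte sequence.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}
```

For example, isASCII("hello") is true, while isASCII("h\u00e9llo") is false: the "é" takes two bytes in UTF-8, so []byte("h\u00e9llo") has 6 bytes rather than 5, and the MD5 input no longer matches what Java produces.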
Also note that to calculate the MD5 digest of some data, you may simply use the md5.Sum() function since you're throwing away the created hash.Hash anyway:
    func hashString(s string) string {
        dataArr := md5.Sum([]byte(s))
        data := dataArr[:]
        fmt.Println(data)
        output := make([]byte, len(data)/4)
        // Same strides as the Java version: i advances by 1, j by 4.
        for i := 0; i < len(output); i++ {
            for j := 0; j < len(data); j += 4 {
                output[i] ^= data[i+j]
                fmt.Print(output[i])
            }
        }
        return getHexString(output)
    }
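If you want to try this end to end, here is a minimal self-contained sketch with the debugging prints left out. The question does not show getHexString, so the version below is an assumption (lowercase hex via encoding/hex); adjust it if your Java helper formats differently.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
)

// getHexString is a stand-in for the helper the question does not show.
// Assumption: it renders the bytes as lowercase hex.
func getHexString(b []byte) string {
	return hex.EncodeToString(b)
}

// hashString folds the 16-byte MD5 digest of s into 4 bytes by XOR-ing
// bytes i, i+4, i+8 and i+12 together, then returns those 4 bytes as hex.
func hashString(s string) string {
	digest := md5.Sum([]byte(s)) // [16]byte
	output := make([]byte, len(digest)/4)
	for i := range output {
		for j := 0; j < len(digest); j += 4 {
			output[i] ^= digest[i+j]
		}
	}
	return getHexString(output)
}
```

Whatever the input, hashString returns 8 hex characters, since the 16-byte digest is folded down to 4 bytes.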
You said the contents of the result arrays are different. This is because the byte type in Java is signed, with a range of -128..127, while in Go byte is an alias of uint8 and has a range of 0..255. So if you want to compare the results, you have to shift negative Java values by 256 (add 256).
If you convert the byte arrays (or slices) to a hex representation, it will be the same (hex representation has no "signedness" property).
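When comparing the printed values side by side, it can help to view Go's bytes the way Java prints them. The following is a small sketch; the helper names javaBytes and goByte are made up for illustration.

```go
package main

// javaBytes reinterprets Go's unsigned bytes as signed 8-bit values,
// which is how Java prints a byte. Values 128..255 become -128..-1.
func javaBytes(data []byte) []int8 {
	out := make([]int8, len(data))
	for i, b := range data {
		out[i] = int8(b) // same bit pattern, signed interpretation
	}
	return out
}

// goByte goes the other way: a negative Java value v maps to v + 256.
func goByte(v int8) byte {
	return byte(v)
}
```

For example, the Go byte 212 corresponds to -44 in Java (212 - 256), and goByte(-44) gives 212 again. The bit pattern, and therefore the hex form d4, is identical on both sides.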
You can read more about this here: Java vs. Golang for HOTP (rfc-4226)