开始:符文,Int32和二进制编码

I get that a rune is an alias for int32 because it's supposed to hold all valid Unicode code points. There are apparently about 1,114,112 Unicode code points currently, so it would make sense that they would have to be stored in four bits, or an int32-sized register, which can store an integer up to 2,147,483,647.

I have a few questions about binary encoding of UTF-8 characters and integers, however.

  • It appears that both rune and int32 both occupy four bytes. If 2147483647 is the highest integer able to be represented in four bytes (four bit octets), why is its binary representation 1111111111111111111111111111111, i.e., 31 1's instead of 32? Is there a bit reserved for its sign? It's possible that there's a bug in the binary converter I used, because -2147483648 should still be able to be represented in 4 bytes (as it's still able to be represented in the int32 type), but it is output there as 1111111111111111111111111111111110000000000000000000000000000000, i.e., 33 1's and 31 0's which clearly overruns a four byte allowance. What's the story there?
  • In the binary conversion, how would the compiler differentiate between a rune like 'C' (01000011, according to the unicode to binary table and the integer 67 (also 01000011, according to the binary to decimal converter I used). Intuition tells me that some of the bits must be reserved for that information. Which ones?

I've done a fair amount of Googling, but I'm obviously missing the resources that explain this well, so feel free to explain like I'm 5. Please also feel free to correct terminology misuses.