为什么我们需要一个恒定的时间*单字节*比较功能?

Looking at Go standard library, there's a ConstantTimeByteEq function that looks like this:

func ConstantTimeByteEq(x, y uint8) int {
    z := ^(x ^ y)
    z &= z >> 4
    z &= z >> 2
    z &= z >> 1

    return int(z)
}

Now, I understand the need for constant time string (array, etc.) comparison, as a regular algorithm could short-circuit after the first unequal element. But in this case, isn't a regular comparison of two fixed-sized integers a constant time operation at the CPU level already?

Not necessarily. And it is hard to tell what the compiler will emit after doing its optimizations. You could end up with different machine code for the highlevel "compare one byte". Leaking just a tiny bit in a side channel might change your encryption from "basically unbreakable" to "hopefully not worth the money needed for a crack".

The point is likely to avoid branch mispredictions, in addition to having the result as 1 or 0 instead of true or false (allowing follow ups as bitwise operations).

Compare how this compiles:

var a, b, c, d byte
_ =  a == b && c == d

=>

0017 (foo.go:15) MOVQ    $0,BX
0018 (foo.go:15) MOVQ    $0,DX
0019 (foo.go:15) MOVQ    $0,CX
0020 (foo.go:15) MOVQ    $0,AX
0021 (foo.go:16) JMP     ,24
0022 (foo.go:16) MOVQ    $1,AX
0023 (foo.go:16) JMP     ,30
0024 (foo.go:16) CMPB    BX,DX
0025 (foo.go:16) JNE     ,29
0026 (foo.go:16) CMPB    CX,AX
0027 (foo.go:16) JNE     ,29
0028 (foo.go:16) JMP     ,22
0029 (foo.go:16) MOVQ    $0,AX

With this:

var a, b, c, d byte
_ =  subtle.ConstantTimeByteEq(a, b) & subtle.ConstantTimeByteEq(c, d)

=>

0018 (foo.go:15) MOVQ    $0,DX
0019 (foo.go:15) MOVQ    $0,AX
0020 (foo.go:15) MOVQ    $0,DI
0021 (foo.go:15) MOVQ    $0,SI
0022 (foo.go:16) XORQ    AX,DX
0023 (foo.go:16) XORQ    $-1,DX
0024 (foo.go:16) MOVQ    DX,BX
0025 (foo.go:16) SHRB    $4,BX
0026 (foo.go:16) ANDQ    BX,DX
0027 (foo.go:16) MOVQ    DX,BX
0028 (foo.go:16) SHRB    $2,BX
0029 (foo.go:16) ANDQ    BX,DX
0030 (foo.go:16) MOVQ    DX,AX
0031 (foo.go:16) SHRB    $1,DX
0032 (foo.go:16) ANDQ    DX,AX
0033 (foo.go:16) MOVBQZX AX,DX
0034 (foo.go:16) MOVQ    DI,BX
0035 (foo.go:16) XORQ    SI,BX
0036 (foo.go:16) XORQ    $-1,BX
0037 (foo.go:16) MOVQ    BX,AX
0038 (foo.go:16) SHRB    $4,BX
0039 (foo.go:16) ANDQ    BX,AX
0040 (foo.go:16) MOVQ    AX,BX
0041 (foo.go:16) SHRB    $2,BX
0042 (foo.go:16) ANDQ    BX,AX
0043 (foo.go:16) MOVQ    AX,BX
0044 (foo.go:16) SHRB    $1,BX
0045 (foo.go:16) ANDQ    BX,AX
0046 (foo.go:16) MOVBQZX AX,BX

Although the latter version is longer, it's also linear -- there are no branches.

If the code which called the function were to immediately branch based upon the result, the use of the constant-time method wouldn't provide much extra security. On the other hand, if one were to call the function on a bunch of different pairs of bytes, keeping a running total of the results, and only branch based upon the final result, then an outside snooper might be able to determine whether that last branch was taken, but wouldn't know which of the previous byte comparisons was responsible for it.

That having been said, I'm not sure I see a whole lot of advantages in most usage cases for going through the trouble of distilling the method's output to a zero or one; simply keeping a running tally of notEqual = (A0 ^ B0); notEqual |= (A1 ^ B1); notEqual |= (A2 ^ B2); ... would achieve the same effect and be much faster.