去扫描仪-空格是否正确?

The Go scanner package in text/scanner/scanner.go uses trick to find whitespace:

const GoWhitespace = 1<<'\t' | 1<<'
' | 1<<'' | 1<<' '

And then:

// skip white space
for s.Whitespace&(1<<uint(ch)) != 0 {
    ch = s.next()
}

Since character values shift left by more than 31, can there be cases where this is not unique? I mean, when some char is the same as tab modulo 32, it will be recognized as whitespace?

Fully answer:

Spec explicitly say that for operations on unsigned, we get high bits masked out so the low bits are really "wrap around".

The reason it works is:

  1. Scanner.Whitespace is actually uint64 so value of GoWhitespace fully fit
  2. Operation s.Whitespace&(1<<uint(ch)) on unsigned integer at runtime can have intermediate values arbitrary large and will wrap around. So if say char is "a" (96) we have 1 << 96 which overflows so modulo size of 64-bit int it's 0.

As mentioned in the spec, the << is a shift operation:

The shift operators shift the left operand by the shift count specified by the right operand. They implement arithmetic shifts if the left operand is a signed integer and logical shifts if it is an unsigned integer. There is no upper limit on the shift count. Shifts behave as if the left operand is shifted n times by 1 for a shift count of n. As a result, x << 1 is the same as x*2 and x >> 1 is the same as x/2 but truncated towards negative infinity.

For large values of ch, 1<<uint(ch) will cause an overflow:

For unsigned integer values, the operations +, -, *, and << are computed modulo 2n, where n is the bit width of the unsigned integer's type. Loosely speaking, these unsigned integer operations discard high bits upon overflow, and programs may rely on ``wrap around''.

For signed integers, the operations +, -, *, and << may legally overflow and the resulting value exists and is deterministically defined by the signed integer representation, the operation, and its operands. No exception is raised as a result of overflow. A compiler may not optimize code under the assumption that overflow does not occur. For instance, it may not assume that x < x + 1 is always true.

So implementing << with a bitwise rotation operator (what you seem to be describing) would violate the spec. 1<<uint(ch) will evaluate to zero for values of ch larger than the size of the int type, so won't cause any false positives.