Determining whitespace in Go

From the documentation of Go's unicode package:

func IsSpace

func IsSpace(r rune) bool

IsSpace reports whether the rune is a space character as defined by Unicode's White Space property; in the Latin-1 space this is

'\t', '\n', '\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).

Other definitions of spacing characters are set by category Z and property Pattern_White_Space.
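To see the documented Latin-1 list in practice, you can scan every rune up to unicode.MaxLatin1 and keep the ones IsSpace accepts. This is a minimal sketch; the helper name latin1Spaces is hypothetical and not part of the unicode package.

```go
package main

import (
	"fmt"
	"unicode"
)

// latin1Spaces (hypothetical helper) returns every rune in the Latin-1
// range U+0000..U+00FF that unicode.IsSpace reports as white space.
func latin1Spaces() []rune {
	var out []rune
	for r := rune(0); r <= unicode.MaxLatin1; r++ {
		if unicode.IsSpace(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// Print each accepted rune in U+XXXX notation.
	for _, r := range latin1Spaces() {
		fmt.Printf("%U\n", r)
	}
}
```

This should print the eight code points U+0009 through U+000D, U+0020, U+0085 and U+00A0, matching the documented list.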

My question is: What does it mean that "other definitions" are set by the Z category and Pattern_White_Space? Does this mean that calling unicode.IsSpace(), checking whether a character is in the Z category, and checking whether a character is in Pattern_White_Space will all yield different results? If so, what are the differences? And why are there differences?

The IsSpace function first checks whether your rune is in the Latin-1 range (up to unicode.MaxLatin1, i.e. U+00FF). If it is, it compares the rune against the space characters you listed.

If not, it calls isExcludingLatin (http://golang.org/src/unicode/letter.go?h=isExcludingLatin#L170), which looks like this:

    func isExcludingLatin(rangeTab *RangeTable, r rune) bool {
        r16 := rangeTab.R16
        if off := rangeTab.LatinOffset; len(r16) > off && r <= rune(r16[len(r16)-1].Hi) {
            return is16(r16[off:], uint16(r))
        }
        r32 := rangeTab.R32
        if len(r32) > 0 && r >= rune(r32[0].Lo) {
            return is32(r32, uint32(r))
        }
        return false
    }

The *RangeTable being passed in is White_Space, which is defined here:

http://golang.org/src/unicode/tables.go?h=White_Space#L6069

    var _White_Space = &RangeTable{
        R16: []Range16{
            {0x0009, 0x000d, 1},
            {0x0020, 0x0020, 1},
            {0x0085, 0x0085, 1},
            {0x00a0, 0x00a0, 1},
            {0x1680, 0x1680, 1},
            {0x2000, 0x200a, 1},
            {0x2028, 0x2029, 1},
            {0x202f, 0x202f, 1},
            {0x205f, 0x205f, 1},
            {0x3000, 0x3000, 1},
        },
        LatinOffset: 4,
    }
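Putting the two steps together, the dispatch inside IsSpace can be mirrored using only exported API. This is a sketch, not the standard library's code: isSpaceSketch is a hypothetical name, and because isExcludingLatin is unexported it substitutes unicode.Is(unicode.White_Space, r). That checks the same table, and its Latin-1 entries cannot match here because that branch only runs for runes above U+00FF.

```go
package main

import (
	"fmt"
	"unicode"
)

// isSpaceSketch (hypothetical) mirrors the structure described above:
// a hard-coded switch for Latin-1, a White_Space table lookup otherwise.
func isSpaceSketch(r rune) bool {
	if uint32(r) <= unicode.MaxLatin1 {
		switch r {
		case '\t', '\n', '\v', '\f', '\r', ' ', 0x85, 0xA0:
			return true
		}
		return false
	}
	// Outside Latin-1, consult the Unicode White_Space property table.
	return unicode.Is(unicode.White_Space, r)
}

// mismatches counts runes in [0, unicode.MaxRune] where the sketch and
// unicode.IsSpace disagree.
func mismatches() int {
	n := 0
	for r := rune(0); r <= unicode.MaxRune; r++ {
		if isSpaceSketch(r) != unicode.IsSpace(r) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println("mismatches:", mismatches())
}
```

If the sketch is faithful to IsSpace, it prints `mismatches: 0`.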

So, to answer your main question: the IsSpace check is not limited to Latin-1. For runes above U+00FF it falls back to the Unicode White_Space property table.

EDIT
For clarification: if the character you are testing is not in the Latin-1 range, the range-table lookup is used. Each Range16 entry in the table describes a range of 16-bit code points as {Lo, Hi, Stride}. isExcludingLatin calls is16 with the R16 slice starting at index LatinOffset (4 in this case), so it only checks whether the rune falls in one of the ranges that come after the Latin-1 entries.

So it is checking these ranges:

    {0x1680, 0x1680, 1},
    {0x2000, 0x200a, 1},
    {0x2028, 0x2029, 1},
    {0x202f, 0x202f, 1},
    {0x205f, 0x205f, 1},
    {0x3000, 0x3000, 1},

Those six ranges correspond to the following Unicode code points:

http://www.fileformat.info/info/unicode/char/1680/index.htm
http://www.fileformat.info/info/unicode/char/2000/index.htm
http://www.fileformat.info/info/unicode/char/2001/index.htm
http://www.fileformat.info/info/unicode/char/2002/index.htm
http://www.fileformat.info/info/unicode/char/2003/index.htm
http://www.fileformat.info/info/unicode/char/2004/index.htm
http://www.fileformat.info/info/unicode/char/2005/index.htm
http://www.fileformat.info/info/unicode/char/2006/index.htm
http://www.fileformat.info/info/unicode/char/2007/index.htm
http://www.fileformat.info/info/unicode/char/2008/index.htm
http://www.fileformat.info/info/unicode/char/2009/index.htm
http://www.fileformat.info/info/unicode/char/200a/index.htm
http://www.fileformat.info/info/unicode/char/2028/index.htm
http://www.fileformat.info/info/unicode/char/2029/index.htm
http://www.fileformat.info/info/unicode/char/202f/index.htm
http://www.fileformat.info/info/unicode/char/205f/index.htm
http://www.fileformat.info/info/unicode/char/3000/index.htm

All of the above are considered white space.
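You can also produce this list locally instead of following the links: loop over every code point above U+00FF and collect what IsSpace accepts. A minimal sketch, with nonLatin1Spaces as a hypothetical helper name; with the Unicode data in recent Go releases it should report the same 17 code points as the links above.

```go
package main

import (
	"fmt"
	"unicode"
)

// nonLatin1Spaces (hypothetical helper) returns every rune above U+00FF
// that unicode.IsSpace reports as white space.
func nonLatin1Spaces() []rune {
	var out []rune
	for r := rune(unicode.MaxLatin1) + 1; r <= unicode.MaxRune; r++ {
		if unicode.IsSpace(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	spaces := nonLatin1Spaces()
	for _, r := range spaces {
		fmt.Printf("%U\n", r)
	}
	fmt.Println("count:", len(spaces))
}
```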