I have apps in Go and Swift which process strings, such as finding substrings and their indices. At first it worked nicely even with multi-byte characters (e.g. emojis), using Go's utf8.RuneCountInString()
and Swift's native String.
But there are some UTF-8 characters that break the string length and indices for substrings, e.g. a string "Lorem
In Swift a Character
is an “extended grapheme cluster,” and each of "
A rune
in Go identifies a specific Unicode code point (UTF-8 is only the byte encoding of that code point); that does not necessarily mean it maps 1:1 to visually distinct characters. Some characters are made up of multiple runes/code points, so counting runes may not give you what you'd expect from a visual inspection of the string. I don't know what "some text".count
actually counts in Swift so I can't offer any comparison there.
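To make the rune-versus-visible-character gap concrete, here is a minimal Go sketch using only the standard library. The helper names counts and runeIndex are hypothetical, made up for this illustration, and the sample strings are my own rather than the one from the question. Each sample renders as a single visible character in most fonts, yet contains more than one rune.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// counts (hypothetical helper) returns the number of bytes and the number
// of runes (Unicode code points) in s.
func counts(s string) (byteLen, runeLen int) {
	return len(s), utf8.RuneCountInString(s)
}

// runeIndex (hypothetical helper) returns the index of substr in s measured
// in runes rather than bytes, or -1 if substr is not present.
func runeIndex(s, substr string) int {
	i := strings.Index(s, substr) // byte offset
	if i < 0 {
		return -1
	}
	return utf8.RuneCountInString(s[:i])
}

func main() {
	samples := []struct {
		label string
		text  string
	}{
		{"plain ASCII letter", "e"},
		{"e + combining acute accent", "e\u0301"},
		{"flag (two regional indicators)", "\U0001F1E9\U0001F1EA"},
		{"thumbs up + skin tone modifier", "\U0001F44D\U0001F3FD"},
	}
	for _, s := range samples {
		b, r := counts(s.text)
		fmt.Printf("%-32s bytes=%d runes=%d\n", s.label, b, r)
	}

	// The "x" is the second visible character, but it sits at rune index 2
	// because the accented "e" before it is built from two code points.
	fmt.Println(runeIndex("e\u0301x", "x"))
}
```

If the Go side has to agree with indices counted in visible characters, rune counts alone will not be enough for strings like these. The standard library has no grapheme-cluster segmentation as far as I know, so that would need either a third-party package or a shared convention between both apps, such as exchanging byte or code-point offsets instead.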