I don't know Go; I am just studying the syntax of various languages.
From Go FAQ: "Go borrows a trick from BCPL: the semicolons that separate statements are in the formal grammar but are injected automatically, without lookahead, by the lexer at the end of any line that could be the end of a statement."
I wondered how this is done, so I took a look at lex.go, but I didn't find any reference to "statement" or "semicolon" (admittedly, I know very little Go, so I may have missed it).
So how can a lexer detect the end of a valid statement without any lookahead?
You can look in the language specification:
The formal grammar uses semicolons ";" as terminators in a number of productions. Go programs may omit most of these semicolons using the following two rules:
1. When the input is broken into tokens, a semicolon is automatically inserted into the token stream at the end of a non-blank line if the line's final token is
   - an identifier
   - an integer, floating-point, imaginary, rune, or string literal
   - one of the keywords break, continue, fallthrough, or return
   - one of the operators and delimiters ++, --, ), ], or }
2. To allow complex statements to occupy a single line, a semicolon may be omitted before a closing ")" or "}".
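To see why no lookahead is needed, here is a minimal sketch of rule 1 in a toy lexer. It is not the real Go lexer: the names `Token`, `canEnd`, `Lex` and `Render` are hypothetical, and it handles only ASCII identifiers, keywords, integers and operators (no string, rune or float literals, no comments). The lexer keeps one flag recording whether the token it just emitted could end a statement; when it reaches a newline, the decision has already been made from tokens it has seen.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a deliberately reduced, hypothetical token type: Kind is one of
// "ident", "number", "keyword", "op" or "semi". No strings, floats, comments.
type Token struct{ Kind, Text string }

var keywords = map[string]bool{}

func init() {
	for _, k := range strings.Fields("break case chan const continue default defer else " +
		"fallthrough for func go goto if import interface map package range return " +
		"select struct switch type var") {
		keywords[k] = true
	}
}

// canEnd applies rule 1: would a line ending in t get a semicolon?
func canEnd(t Token) bool {
	switch t.Kind {
	case "ident", "number":
		return true
	case "keyword":
		return t.Text == "break" || t.Text == "continue" || t.Text == "fallthrough" || t.Text == "return"
	case "op":
		return t.Text == "++" || t.Text == "--" || t.Text == ")" || t.Text == "]" || t.Text == "}"
	}
	return false
}

func isWordByte(c byte) bool {
	return c == '_' || c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'
}

// Lex tokenizes src (ASCII only). The only extra state is insertSemi, computed
// from the token just emitted, so at '\n' the decision needs no lookahead.
func Lex(src string) []Token {
	var toks []Token
	insertSemi := false
	emit := func(t Token) {
		toks = append(toks, t)
		insertSemi = canEnd(t) // false after a "semi", so the flag resets
	}
	for i := 0; i < len(src); {
		c, j := src[i], i+1 // j: end of the current token or skipped byte
		switch {
		case c == '\n':
			if insertSemi {
				emit(Token{"semi", "\n"})
			}
		case c == ' ' || c == '\t' || c == '\r':
			// whitespace other than newline is simply skipped
		case c == ';':
			emit(Token{"semi", ";"})
		case isWordByte(c):
			for j < len(src) && isWordByte(src[j]) {
				j++
			}
			w := src[i:j]
			switch {
			case keywords[w]:
				emit(Token{"keyword", w})
			case c >= '0' && c <= '9':
				emit(Token{"number", w})
			default:
				emit(Token{"ident", w})
			}
		default:
			// "++" and "--" are two bytes; every other operator is one byte here.
			if (c == '+' || c == '-') && j < len(src) && src[j] == c {
				j++
			}
			emit(Token{"op", src[i:j]})
		}
		i = j
	}
	if insertSemi { // end of input behaves like a final newline
		emit(Token{"semi", "\n"})
	}
	return toks
}

// Render joins the token texts with spaces, printing every semicolon as ";".
func Render(toks []Token) string {
	parts := make([]string, len(toks))
	for k, t := range toks {
		parts[k] = t.Text
		if t.Kind == "semi" {
			parts[k] = ";"
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(Render(Lex("x = 1\nx++\nif x > 0 {\nreturn\n}\n")))
}
```

Running it prints `x = 1 ; x ++ ; if x > 0 { return ; } ;`. No semicolon follows `{`, because `{` is not in the list, so the block can continue on the next line.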
The Go parser recognizes syntactic structures (e.g., statements and expressions) according to the Go grammar, using the tokens produced by the scanner (lexical analyzer).
Semicolons are inserted into the token stream automatically by the scanner, so the parser has no extra work to do. The semicolon insertion code can be found in the Go scanner.
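The behaviour of the standard library's `go/scanner` package can be observed directly: `Scanner.Scan` reports a semicolon it inserted itself with the literal `"\n"`, and one written in the source with `";"`. The sketch below (the helper name `scanTokens` is just for illustration) prints the token stream with inserted semicolons marked `<;>`.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// scanTokens runs the standard go/scanner over src and returns the tokens as
// one string. Semicolons the scanner inserted itself (reported with the
// literal "\n") are shown as "<;>"; semicolons written in the source stay ";".
func scanTokens(src string) string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // nil error handler; mode 0 skips comments
	var out []string
	for {
		_, tok, lit := s.Scan()
		switch {
		case tok == token.EOF:
			return strings.Join(out, " ")
		case tok == token.SEMICOLON && lit == "\n":
			out = append(out, "<;>")
		case lit != "": // identifiers, literals, keywords and source ";" carry text
			out = append(out, lit)
		default: // operators and delimiters: use the token's own spelling
			out = append(out, tok.String())
		}
	}
}

func main() {
	fmt.Println(scanTokens("x := f()\nreturn\n"))
	fmt.Println(scanTokens("if x {\n\ty++; z--\n}\n"))
}
```

It prints `x := f ( ) <;> return <;>` and `if x { y ++ ; z -- <;> } <;>`: the explicit `;` after `y++` is kept as written, while the ones after `)`, `return`, `--` and `}` come from the newline rule.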
The Go language specification defines how the scanner inserts semicolons as follows:
The formal grammar uses semicolons ";" as terminators in a number of productions. Go programs may omit most of these semicolons using the following two rules:
1. When the input is broken into tokens, a semicolon is automatically inserted into the token stream at the end of a non-blank line if the line's final token is
   - an identifier
   - an integer, floating-point, imaginary, rune, or string literal
   - one of the keywords break, continue, fallthrough, or return
   - one of the operators and delimiters ++, --, ), ], or }
2. To allow complex statements to occupy a single line, a semicolon may be omitted before a closing ")" or "}".
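One visible consequence of inserting semicolons without lookahead is that an opening brace cannot go on the next line: after `func f()` the line ends in `)`, so the scanner inserts a semicolon before the parser ever sees the `{`. Rule 2, by contrast, is what lets a whole body sit on one line without a semicolon before `}`. A small check using the standard `go/parser` package (the `parses` helper is hypothetical, just for this sketch):

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// parses reports whether src is a syntactically valid Go source file.
func parses(src string) bool {
	_, err := parser.ParseFile(token.NewFileSet(), "example.go", src, 0)
	return err == nil
}

func main() {
	// Rule 2: no semicolon is needed before the closing "}" on one line.
	oneLine := "package p\nfunc f() int { return 1 }\n"
	// Rule 1 with no lookahead: ";" is inserted after ")" at the end of the
	// func line, so the "{" on the next line is a syntax error.
	braceOnNextLine := "package p\nfunc f()\n{\n}\n"
	fmt.Println(parses(oneLine), parses(braceOnNextLine))
}
```

This prints `true false`.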