I am trying to create a simple HOCON parser, starting from the existing JSON grammar.
The grammar is defined as:
/** Taken from "The Definitive ANTLR 4 Reference" by Terence Parr */
// Derived from http://json.org
grammar HOCON;
hocon
: value
| pair
;
obj
: object_begin pair (','? pair)* object_end
| object_begin object_end
;
pair
: STRING KV? value {fmt.Println("pairstr",$STRING.GetText())}
| KEY KV? value {fmt.Println("pairkey",$KEY.GetText())}
;
array
: array_begin value (',' value)* array_end
| array_begin array_end
;
value
: STRING {fmt.Println($STRING.GetText())}
| REFERENCE {fmt.Println($REFERENCE.GetText())}
| RAWSTRING {fmt.Println($RAWSTRING.GetText())}
| NUMBER {fmt.Println($NUMBER.GetText())}
| obj
| array
| 'true'
| 'false'
| 'null'
;
COMMENT
: '#' ~( '\r' | '\n' )* -> skip
;
STRING
: '"' (ESC | ~ ["\\])* '"'
| '\'' (ESC | ~ ['\\])* '\''
;
RAWSTRING
: (ESC | ALPHANUM)+
;
KEY
: ( '.' | ALPHANUM | '-')+
;
REFERENCE
: '${' (ALPHANUM|'.')+ '}'
;
fragment ESC
: '\\' (["\\/bfnrt] | UNICODE)
;
fragment UNICODE
: 'u' HEX HEX HEX HEX
;
fragment ALPHANUM
: [0-9a-zA-Z]
;
fragment HEX
: [0-9a-fA-F]
;
KV
: [=:]
;
array_begin
: '[' { fmt.Println("BEGIN [") }
;
array_end
: ']' { fmt.Println("] END") }
;
object_begin
: '{' { fmt.Println("OBJ {") }
;
object_end
: '}' { fmt.Println("} OBJ") }
;
NUMBER
: '-'? INT '.' [0-9] + EXP? | '-'? INT EXP | '-'? INT
;
fragment INT
: '0' | [1-9] [0-9]*
;
// no leading zeros
fragment EXP
: [Ee] [+\-]? INT
;
// \- since - means "range" inside [...]
WS
: [ \t\n\r] + -> skip
;
The error is:
line 2:2 no viable alternative at input '{journal'
pairkey akka.persistence
The sample input that gives the error is:
akka.persistence {
journal {
# Absolute path to the journal plugin configuration entry used by
# persistent actor or view by default.
# Persistent actor or view can override `journalPluginId` method
# in order to rely on a different journal plugin.
plugin = ""
}
}
However, if I update it to use quoted strings:
akka.persistence {
'journal' {
# Absolute path to the journal plugin configuration entry used by
# persistent actor or view by default.
# Persistent actor or view can override `journalPluginId` method
# in order to rely on a different journal plugin.
'plugin' = ""
}
}
everything works as expected.
It looks like I am missing something in the KEY definition, but I can't figure out what exactly.
The Go code to test it out is:
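package main

import (
	"log"

	"github.com/antlr/antlr4/runtime/Go/antlr"

	"go-hocon/parser"
)

func main() {
	// Read the HOCON sample from disk and stop early if it cannot be opened.
	is, err := antlr.NewFileStream("test/simple1.conf")
	if err != nil {
		log.Fatal(err)
	}
	lex := parser.NewHOCONLexer(is)
	p := parser.NewHOCONParser(antlr.NewCommonTokenStream(lex, 0))
	p.BuildParseTrees = true
	p.Hocon() // start parsing at the grammar's entry rule
}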
Your first input makes journal lex as a RAWSTRING:
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]
line 2:2 no viable alternative at input '{journal'
On the other hand, 'journal' lexes as a string, but has those single quotes which you clearly don't want:
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:30=''journal'',<STRING>,2:2] <-- now it's a STRING token
[@3,32:32='{',<'{'>,2:12]
[@4,279:284='plugin',<RAWSTRING>,7:4]
[@5,286:286='=',<KV>,7:11]
[@6,288:289='""',<STRING>,7:13]
[@7,294:294='}',<'}'>,8:2]
[@8,297:297='}',<'}'>,9:0]
[@9,300:299='<EOF>',<EOF>,10:0]
line 7:4 no viable alternative at input '{plugin'
line 8:2 mismatched input '}' expecting {'true', 'false', 'null', '[', '{', STRING, RAWSTRING, REFERENCE, KV, NUMBER}
Why? Because lexer rules bind in the following way:
1. Match the longest input first.
2. Match implicit tokens (literals that appear in parser rules, like 'true' or '{').
3. If the length of the input match is equal, match based on the order of the lexer rules.
In your case, quoting 'journal' makes it lex as STRING, so it seems to work okay, but only because of those single quotes. Without the quotes, journal and plugin match both RAWSTRING and KEY at the same length, and RAWSTRING is defined first, so by rule 3 these two tokens are lexed as RAWSTRING, which doesn't fit the rule
pair
: STRING KV? value //{fmt.Println("pairstr",$STRING.GetText())}
Hence the error.
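To make the tie-breaking concrete, here is a minimal sketch in plain Go that imitates the "longest match, then rule order" decision with regular expressions. It is not how ANTLR is implemented; the rule type and the lexOne helper are hypothetical names made up for this illustration, and the ESC alternative of RAWSTRING is left out to keep it short.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a simplified stand-in for one ANTLR lexer rule (hypothetical type).
type rule struct {
	name string
	re   *regexp.Regexp
}

// Same order as the grammar in the question: RAWSTRING comes before KEY.
// Assumption: ESC sequences are ignored in this sketch.
var rules = []rule{
	{"RAWSTRING", regexp.MustCompile(`^[0-9a-zA-Z]+`)},
	{"KEY", regexp.MustCompile(`^[.0-9a-zA-Z-]+`)},
}

// lexOne returns the rule name and the text matched at the start of input.
// The longest match wins; on a tie the rule listed first wins.
func lexOne(input string) (string, string) {
	bestName, bestText := "", ""
	for _, r := range rules {
		m := r.re.FindString(input)
		if len(m) > len(bestText) { // strict '>' keeps the earlier rule on ties
			bestName, bestText = r.name, m
		}
	}
	return bestName, bestText
}

func main() {
	for _, in := range []string{"akka.persistence {", "journal {", "plugin = \"\""} {
		name, text := lexOne(in)
		fmt.Printf("%s -> %s\n", text, name)
	}
}
```

With this ordering, akka.persistence comes out as KEY (only KEY can consume the dot, so it is the longer match), while journal and plugin come out as RAWSTRING because both rules match the same length and RAWSTRING is listed first.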
How to fix? Well, I reversed the lexer rules:
RAWSTRING
: (ESC | ALPHANUM)+
;
STRING
: '"' (ESC | ~ ["\\])* '"'
| '\'' (ESC | ~ ['\\])* '\''
;
And changed pair:
pair
: RAWSTRING KV? value //{fmt.Println("pairstr",$STRING.GetText())}
Now it parses fine:
[@0,0:15='akka.persistence',<KEY>,1:0]
[@1,17:17='{',<'{'>,1:17]
[@2,22:28='journal',<RAWSTRING>,2:2]
[@3,30:30='{',<'{'>,2:10]
[@4,277:282='plugin',<RAWSTRING>,7:4]
[@5,284:284='=',<KV>,7:11]
[@6,286:287='""',<STRING>,7:13]
[@7,292:292='}',<'}'>,8:2]
[@8,295:295='}',<'}'>,9:0]
[@9,298:297='<EOF>',<EOF>,10:0]
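As a rough cross-check of the dumps above, the following Go sketch compares the token types found in key position against what each version of pair accepts. It only models the first token of a pair, not the full parser; token, originalPair, revisedPair and firstRejected are hypothetical names used for this illustration, and the revised set assumes the KEY alternative of pair is left unchanged.

```go
package main

import "fmt"

// token holds the text and lexer type of one key-position token,
// copied from the token dumps above.
type token struct{ text, typ string }

// Token types each version of the pair rule accepts before the optional KV.
// Assumption: the second alternative (KEY KV? value) stays as in the question.
var (
	originalPair = map[string]bool{"STRING": true, "KEY": true}
	revisedPair  = map[string]bool{"RAWSTRING": true, "KEY": true}
)

// firstRejected returns the text of the first key token whose type is not
// accepted, or the empty string when every key token is accepted.
func firstRejected(accepted map[string]bool, keys []token) string {
	for _, k := range keys {
		if !accepted[k.typ] {
			return k.text
		}
	}
	return ""
}

func main() {
	keys := []token{
		{"akka.persistence", "KEY"},
		{"journal", "RAWSTRING"},
		{"plugin", "RAWSTRING"},
	}
	fmt.Printf("original pair rejects: %q\n", firstRejected(originalPair, keys))
	fmt.Printf("revised pair rejects:  %q\n", firstRejected(revisedPair, keys))
}
```

The original rule stops at journal, which is where the parser reported "no viable alternative", while the revised rule accepts all three keys. Note that under the same assumption the revised rule no longer lists STRING in key position, so quoted keys would need their own alternative if you want to keep them.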