How should I parse email headers? Is there a Go library that parses email headers correctly? I'm developing an email client, and for some reason the standard mail package doesn't seem to handle all kinds of MIME types and headers. It fails on approximately 20% of the emails I've tested.
header := imap.AsBytes(rsp.MessageInfo().Attrs["RFC822.HEADER"])
msg, err := mail.ReadMessage(bytes.NewReader(header))
if err != nil {
	err = fmt.Errorf("msg %s, err", err)
	log.Error(err)
}
mg.From, err = msg.Header.AddressList("From")
if err != nil {
	log.Error(err)
}
The code seems to fail on the following header. I've also started to use enmime for cases not handled by mail, but there are still headers that it doesn't handle either (see below). I'm not sure how to approach this issue. Should I throw a regex at it?
E0722 17:01:33.876922 89702 imap.go:146] header map[X-Gmx-Antivirus:[0 (no virus found)] Received:[from smtp2.ono.com ([62.42.230.179]) by mx-ha.gmx.net (mxgmxus002) with ESMTP (Nemesis) id 0MQRb8-1XLJaM3en6-00Tmzn for ; Mon, 09 Jun 2014 11:27:43 +0200 from PCRafaelRomero (85.137.226.62) by smtp2.ono.com (8.6.122.04) (authenticated as sanromero) id 5385D185003E1FAE for admin@xax.com; Mon, 9 Jun 2014 11:27:40 +0200] Cc:[]
Content-Type:[multipart/alternative; boundary="----=_NextPart_000_3F41_01CF83D5.D1B66CE0"] Content-Language:[es] X-Antivirus-Status:[Clean] Return-Path:[romero@ono.com] X-Gmx-Antispam:[0 (Mail was not recognized as spam); Detail=V3;] From:[] X-Mailer:[Microsoft Office Outlook 12.0] X-Antivirus:[avast! (VPS 140608-1, 08/06/2014), Outbound message] X-Ui-Filterresults:[notjunk:1;V01:K0:nbs2GLFYQvI=:2WsgrcdLWXLFAMJ3EjKYwQVnkC oiOf739mgPzbtBEXoW8E51lMNdd8vNfEFb0+OkeNCBh8OsnZap9qjj8b+hzWGsIEHvhFW1W5j 0h4k3ZxERUU3vVKNgAG+//QA3GnXL67cHvc0rLbyytAtv2ydIdsQVp1wG/IkJ3p9bscQVKKd/ TE9Jfqg7YxyPDlS3zXIYql4IQQ8MMG8T+pCqUQ+SNDZ/hcr2otZNk729nQMHlw0I2B5CZ6N99 FRmFvfhUn67ZPjLZVzKrfk2cRVGISw8/GMrrrm2zggVrlS2GhpzIchxD1TR14fYZ3qz2M4UCI S86WLbTAaQZp3PuIlhAqx1K13DV1IUTdlEs+J6QF1UdJUthb7IGQXCYzIogA6OWOdXzybYpeI foiqJDkSXyBDmHiDi1dwBS6W0u7+nBW9zlhc26rDXImEcbAv+wrdMyUXxlJi3Tqnd0cZ8BuZY IhrarB4/fFFuVdnCz970O7PyoC6+O5g+QoFU9LJRx0O2U6sgjXXe8c21EysIyqCg73M53Z0EM hbcZ5xk/6Bc880+yKrfB2w42kZg6bZVMKFStHPhZsgJFvZftB9/AmG08zp1O0uQBGlULFE4+k DhwCfEAWKkKJClvXPo1Svu9Qw9K59jwPqQVlqLGwdgzE7vscfkj/PomuUUkWRkIwaS7o/WrxR pWyWB/xMVm2ysvUV3Obur2a5J4jKIDLCJNX2grtz7mCjI3DzSL8g6i2qqUn/wueyPxWJGAE3M /93MRR6Vq4Dh/xUWLi6Z0sOtxjVyymBzL93EWXfzkKmTT3Kk4Fl130S/dJlZL9BRbmQo1/nfB yIhOjS9CZz1O3XGbCEq1Rl2EWDQoircWoLLV2I40S00qvXlSBgXvbqpr8oGEIrHl5Y2JwyoM0 T+7h0/zly8R4UKMizQ4Kh08finfDmTawxI9oD+ap60wB7I4elcntWBA4dUix0DpKd4wYuQxmD RAZdTOflVzi2rrftPMHpWqZ9Qr4LdKqs4fvnI9VHvmfqD68uZN4NVncmUo+xuN0koEWqLOcGM 30niY/2rtAFOdi10v1dPVJXya/tssEfwhTjT7BFa01jZIcx/IK/I7FkmDAHfIIBiTKcZNaTch XEiOEAX3GR1YnQdcT5Upb87syLJwM8OdwvWVmq4UaVw3Ogrq2t5ZTG/98/7A7aPASMFd2jTVV 97LZ6iJGGbkRkzxZhb0VPjhq2rJFxihpHcCHe7exqicy2+FLbFetRYaI1JthWjj5PTSdsqmGH pBo+vg1S147tH4vPBii25Op5f3JXr5OUX2uSmcDhSrG3og4hWTZI6zMUtaJVE5IZsuGuVuInJ TWCLzuqkpJ9g3uIA6ECoKW7ODibg4evlXJp9VMEgxyOqBRg=] X-Ctch-Refid:[str=0001.0A0B0203.53957E0C.00D9,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0] Date:[Mon, 9 Jun 2014 11:27:27 +0200] Thread-Index:[Ac+DxFZYas1DTYB/S/anWwcW8Cz4Ag==] 
X-Ctch-Spam:[Unknown] X-Ctch-Vod:[Unknown] Message-Id:[<3f4001cf83c5$0e2d9ce0$2a88d6a0$@com>] Envelope-To:[] Subject:[Reservation.] Mime-Version:[1.0]], err mail: header not in message
Edit: I've slightly modified the code and brought in the enmime package to check a few cases that fail with mail. However, I'm still getting an error: multipart: NextPart: EOF
so I'm wondering what I should try next. I've also noticed the errors below, which may be directly related to the multipart error. Basically, the message is not parsed by the mail package, so the enmime package reports end of file. These are the kinds of errors I get when parsing emails:
missing word in phrase: charset not supported: "gb18030"
charset not supported: "koi8-r"
missing word in phrase: charset not supported: "ks_c_5601-1987"
header := imap.AsBytes(rsp.MessageInfo().Attrs["RFC822.HEADER"])
msg, err := mail.ReadMessage(bytes.NewReader(header))
if err != nil {
	err = fmt.Errorf("msg %s, err", err)
	log.Error(err)
	// return mgs, err
}
mg.From, err = msg.Header.AddressList("From")
if err != nil {
	mime, err := enmime.ParseMIMEBody(msg)
	if err != nil {
		log.Error(err)
		return mgs, err
	}
	mg.From[0].Address = mime.GetHeader("From")
	if mg.From[0].Address == "" {
		log.Error(fmt.Errorf("from is empty %v", header))
		return mgs, err
	}
	mg.From[0].Name = mime.GetHeader("From")
	log.Infof("mime FROM is %v", mg.From[0].Address)
}
You're not failing to parse any header in particular; all headers are initially parsed the same way.
None of the information you've presented is relevant to the actual error message you have:
mail: header not in message
mail.ErrHeaderNotPresent is returned by your call to Header.AddressList("From") (the only other place mail returns that error is Header.Date).
This isn't fatal; just check for mail.ErrHeaderNotPresent and move on if you don't need it.
If you're using Go 1.5 or later, you can use the new functions of the mime package.
If you're using an older version of Go, you can use my drop-in replacement.
Example:
dec := new(mime.WordDecoder)
// DecodeHeader takes a string, so use Header.Get rather than indexing the map.
from, err := dec.DecodeHeader(msg.Header.Get("From"))
if err != nil {
	panic(err)
}
fmt.Println(from)
// from now contains the decoded header