There is an http.DetectContentType([]byte) function in the net/http package, but it supports only a limited number of types. How can I add support for docx, doc, xls, xlsx, ppt, pps, odt, ods and odp files, detecting them not by extension but by content? As far as I know, this is tricky because docx/xlsx/pptx/odp/odt files have the same signature as a zip file (50 4B 03 04).
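To reproduce the problem, here is a minimal sketch (sniffFakeDocx is a hypothetical helper). It builds an in-memory zip whose single entry is named like a Word part and passes the bytes to http.DetectContentType. The standard sniffer only compares leading bytes, so it sees the 50 4B 03 04 signature and reports a plain zip.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"net/http"
)

// sniffFakeDocx zips a single Word-style entry in memory and returns what
// http.DetectContentType reports for the resulting bytes.
func sniffFakeDocx() (string, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	// The entry name is only illustrative: the sniffer never looks inside.
	w, err := zw.Create("word/document.xml")
	if err != nil {
		return "", err
	}
	if _, err := w.Write([]byte("<w:document/>")); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	// The archive begins with a local file header whose signature is
	// 50 4B 03 04 ("PK\x03\x04"), which matches the generic zip rule.
	return http.DetectContentType(buf.Bytes()), nil
}

func main() {
	ct, err := sniffFakeDocx()
	if err != nil {
		panic(err)
	}
	fmt.Println(ct) // application/zip
}
```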
Disclaimer: I'm the author of mimetype.
For anyone having the same problem three years later: nowadays, the packages for content-based MIME type detection are the following:
man magic
Files whose extension ends in x are relatively easy to detect. Just unzip the file and read its _rels/.rels part. It contains the path to the main part of the document, identified by the relationship type http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument. Then just check that part's file name: it's document.xml for docx, workbook.xml for xlsx and presentation.xml for pptx.
More info can be found in ECMA-376.
Binary formats are harder to detect. Basically, you need to read the MS-CFB file system and check for these entries:
- WordDocument for doc
- Workbook or Book for xls
- PowerPoint Document for ppt
- EncryptedPackage, which means the file is encrypted.

There's currently no way to extend http.DetectContentType, as it uses a fixed, unexported slice of "sniffers": https://golang.org/src/net/http/sniff.go (sniffSignatures on line 49 at the time of writing).
Also, I took a quick look through godoc.org for a better package but didn't find any that is both extensible and content-oriented, as you require.
My advice would be: build your own package, guided by Go's content sniffer implementation (which follows https://mimesniff.spec.whatwg.org/).
Edit: if you're willing to use cgo and you're on *nix, you could use libmagic bindings, for example https://github.com/jteeuwen/magic.