I would like to write Hadoop Map/Reduce jobs in Go (and not via the Streaming API!).
I tried to get a grasp of hortonworks/gohadoop and colinmarc/hdfs, but I still don't see how to write real jobs. I searched GitHub for code importing these modules, but apparently there is nothing relevant. Is there a WordCount.go somewhere?
Here's a simple implementation of Map/Reduce written in Go. The GitHub project https://github.com/vistarmedia/gossamr is a good starting point for running a Go job on Hadoop.

Gist:
package main

import (
	"log"
	"strings"

	"github.com/vistarmedia/gossamr"
)

type WordCount struct{}

// Map receives each input line (with its byte offset) and emits a
// (word, 1) pair for every whitespace-separated token.
func (wc *WordCount) Map(p int64, line string, c gossamr.Collector) error {
	for _, word := range strings.Fields(line) {
		c.Collect(strings.ToLower(word), int64(1))
	}
	return nil
}

// Reduce sums the counts collected for each word. Note the output pair
// is (sum, word), so the final records are keyed by count.
func (wc *WordCount) Reduce(word string, counts chan int64, c gossamr.Collector) error {
	var sum int64
	for v := range counts {
		sum += v
	}
	c.Collect(sum, word)
	return nil
}

func main() {
	wordcount := gossamr.NewTask(&WordCount{})
	if err := gossamr.Run(wordcount); err != nil {
		log.Fatal(err)
	}
}
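First, compile the program into a standalone binary so Hadoop streaming can ship it to the task nodes. (A minimal sketch: it assumes the source above is saved as wordcount.go, and the output name wordcount is chosen to match the -file flag below.)

go build -o wordcount wordcount.go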
Kicking off the job:
./bin/hadoop jar ./contrib/streaming/hadoop-streaming-1.2.1.jar \
  -input /mytext.txt \
  -output /output.15 \
  -mapper "./wordcount -task 0 -phase map" \
  -reducer "./wordcount -task 0 -phase reduce" \
  -io typedbytes \
  -file ./wordcount \
  -numReduceTasks 6
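A few notes on the flags, as I understand them: -io typedbytes tells Hadoop streaming to exchange typed binary records with the Go process rather than plain text lines; -task 0 selects the first (here, the only) task passed to gossamr.Run, and -phase chooses whether that invocation runs the Map or the Reduce side; -file ships the compiled wordcount binary to each task node. So while this still launches through the streaming jar, your Map and Reduce functions execute as native Go code instead of text-based streaming scripts.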