How do I write a MapReduce job in Python that lists the lines each word appears in?

Write a MapReduce program that takes a CSV file of comma-separated words as input and outputs, for each word, the lines in which that word appears.

For example, given this input:

goat,chicken,horse

cat,horse

dog,cat,sheep

buffalo,dolphin,cat

sheep

The corresponding output is:

"buffalo" ["buffalo,dolphin,cat"]

"cat" ["buffalo,dolphin,cat", "cat,horse", "dog,cat,sheep"]

"chicken" ["goat,chicken,horse"] "dog" ["dog,cat,sheep"]

"dolphin" ["buffalo,dolphin,cat"] "goat" ["goat,chicken,horse"]

"horse" ["cat,horse", "goat,chicken,horse"]

"sheep" ["dog,cat,sheep", "sheep"]

My code so far, which outlines the approach:

from mrjob.job import MRJob
from mrjob.step import MRStep


class part2(MRJob):

    def steps(self):
        return [MRStep(mapper=self.mapper, reducer=self.reducer)]

    def mapper(self, key, document):
        # key is unused for plain text input; document is one line of the file
        line = document.strip()
        for word in line.split(','):
            if word:
                # emit the whole line instead of a count of 1,
                # so the reducer can collect the lines for each word
                yield word, line

    def reducer(self, word, lines):
        # lines iterates over every line emitted for this word;
        # set() drops repeats, sorted() gives the order shown in the expected output
        yield word, sorted(set(lines))


if __name__ == '__main__':
    part2.run()
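To check the map and reduce logic without installing mrjob or Hadoop, the same idea can be tried in plain Python first. This is only a rough sketch: `map_line` and `run_local` are made-up helper names, the grouping dictionary stands in for the shuffle phase, and the sample list is the input from the question.

```python
from collections import defaultdict


def map_line(line):
    """Map phase: emit (word, line) for every non-empty word on the line."""
    line = line.strip()
    for word in line.split(','):
        if word:
            yield word, line


def run_local(lines):
    """Stand-in for shuffle + reduce: group lines by word, then sort each group."""
    groups = defaultdict(set)
    for line in lines:
        for word, value in map_line(line):
            groups[word].add(value)
    # sort the words and the lines of each word, as in the expected output
    return {word: sorted(values) for word, values in sorted(groups.items())}


sample = [
    "goat,chicken,horse",
    "cat,horse",
    "dog,cat,sheep",
    "buffalo,dolphin,cat",
    "sheep",
]
index = run_local(sample)
for word, found in index.items():
    print(word, found)
```

If the printed lists match the expected output, the mapper only has to yield the line itself as the value, and the reducer only has to sort what it receives.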

 

Your lines are in an array, and an array has indices: the index at which the word is found, plus 1, is the line number.
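A small sketch of that index-plus-one idea, assuming all the lines are already in one in-memory list (the function name `line_numbers_by_word` is made up for illustration). Note that this gives line numbers rather than the line text the question asks for, and it relies on having the whole list in one place, which a mapper that sees one line at a time does not have.

```python
from collections import defaultdict


def line_numbers_by_word(lines):
    """Return {word: [1-based line numbers]} for a list of comma-separated lines."""
    numbers = defaultdict(list)
    for index, line in enumerate(lines):
        line_number = index + 1  # list index is 0-based, line numbers start at 1
        for word in line.strip().split(','):
            # skip empty fields and avoid listing the same line twice for one word
            if word and line_number not in numbers[word]:
                numbers[word].append(line_number)
    return dict(numbers)


sample = [
    "goat,chicken,horse",
    "cat,horse",
    "dog,cat,sheep",
    "buffalo,dolphin,cat",
    "sheep",
]
result = line_numbers_by_word(sample)
print(result["cat"])
```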
