How do I write a MapReduce job in Python that finds the lines each word appears in?

Write a MapReduce program that takes a CSV file of comma-separated words as input and outputs, for each word, every line in which that word appears.

For example, given this input:

goat,chicken,horse
cat,horse
dog,cat,sheep
buffalo,dolphin,cat
sheep

the corresponding output is:

"buffalo" ["buffalo,dolphin,cat"]
"cat" ["buffalo,dolphin,cat", "cat,horse", "dog,cat,sheep"]
"chicken" ["goat,chicken,horse"]
"dog" ["dog,cat,sheep"]
"dolphin" ["buffalo,dolphin,cat"]
"goat" ["goat,chicken,horse"]
"horse" ["cat,horse", "goat,chicken,horse"]
"sheep" ["dog,cat,sheep", "sheep"]

My code is not finished yet; this is the approach so far:

from mrjob.job import MRJob
from mrjob.step import MRStep
import csv


class part2(MRJob):

    def steps(self):
        return [MRStep(mapper=self.mapper, reducer=self.reducer)]
        #return [MRStep(mapper=self.mapper)]

    def mapper(self, key, document):
        for word in document.split(','):
            yield word, 1

    def reducer(self, word, line):
        yield word, line


part2.run()

 

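from mrjob.job import MRJob


class FrequencyCount(MRJob):
    def mapper(self, _, line):
        # Emit the whole line once for every word it contains,
        # so the word is the key and the line is the value.
        for word in line.split(','):
            word = word.strip()
            if word:  # skip empty fields and blank lines
                yield word, line

    def reducer(self, key, values):
        # Collect the lines for this word into a sorted list
        # so the output order is stable.
        yield key, sorted(values)


if __name__ == '__main__':
    FrequencyCount.run()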

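Tested and working. If you run into anything else, feel free to message me.

If this solves your problem, please remember to accept the answer.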
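
For anyone who wants to follow the data flow without mrjob installed, here is a rough plain-Python sketch of the same idea. It is only an illustration: the names `map_line`, `reduce_word`, and `run_local` are made up for this sketch, and the in-memory dictionary stands in for the shuffle step that a real MapReduce runtime performs between the mapper and the reducer.

```python
from collections import defaultdict


def map_line(line):
    """Map step: emit one (word, line) pair per word on the line."""
    for word in line.split(","):
        word = word.strip()
        if word:  # ignore empty fields
            yield word, line


def reduce_word(word, lines):
    """Reduce step: gather every line that contained the word."""
    return word, sorted(lines)


def run_local(lines):
    """Stand-in for the shuffle: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        for word, value in map_line(line):
            grouped[word].append(value)
    return [reduce_word(word, grouped[word]) for word in sorted(grouped)]


if __name__ == "__main__":
    data = [
        "goat,chicken,horse",
        "cat,horse",
        "dog,cat,sheep",
        "buffalo,dolphin,cat",
        "sheep",
    ]
    for word, found in run_local(data):
        print(word, found)
```

The point to notice is that the mapper already attaches the full line to each word. The reducer never sees the input file, only the values grouped under its key.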

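I'm still a bit confused: is the CSV file I created treated as an array? And is it OK to use each individual word as the key here, given that the task requires outputting both the word and the lines it appears in? One more question: if I use line numbers, is it the line or the word that counts as the value? If the line should be the value, do I have to emit the line in the mapper, or can I handle it separately in the reduce function?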
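
On these questions, as far as I understand the MapReduce model: the file is not handed over as one array. The mapper is called once per input line, and the reducer only receives what the mappers emitted. So the word is the key, and whatever should appear next to it (the line text or a line number) has to be emitted as the value in the mapper; the reducer cannot look it up afterwards. Below is a small plain-Python sketch of the line-number variant. The names `map_numbered` and `lines_by_word` are hypothetical, and numbering with `enumerate` is an assumption that only works because the whole file is read in one process here. To my knowledge mrjob's default input protocol passes only the line text to the mapper, not its number.

```python
from collections import defaultdict


def map_numbered(line_no, line):
    """Map step: emit (word, line_no) for each word on the line."""
    for word in line.strip().split(","):
        word = word.strip()
        if word:
            yield word, line_no


def lines_by_word(lines):
    """Group the emitted line numbers by word (local stand-in for shuffle + reduce)."""
    index = defaultdict(list)
    # Assumption: 1-based line numbers assigned while reading sequentially.
    for line_no, line in enumerate(lines, start=1):
        for word, value in map_numbered(line_no, line):
            index[word].append(value)
    # Deduplicate in case a word occurs twice on the same line.
    return {word: sorted(set(nums)) for word, nums in index.items()}


if __name__ == "__main__":
    data = [
        "goat,chicken,horse",
        "cat,horse",
        "dog,cat,sheep",
        "buffalo,dolphin,cat",
        "sheep",
    ]
    for word, nums in sorted(lines_by_word(data).items()):
        print(word, nums)
```

Either way the shape is the same: the key is the word, and the value is whatever identifies the line.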