I am testing the Hadoop Streaming interface, and today I ran the following command on Linux:
cat test.txt|php wc_mapper.php|python Reducer.py
It failed with this error:
Traceback (most recent call last):
  File "Reducer.py", line 7, in <module>
    word,count = line.split()
ValueError: need more than 0 values to unpack
The content of test.txt is as follows:
hello world
hello world
hello world
The content of wc_mapper.php, which is written in PHP, is:
#!/usr/bin/php
<?php
error_reporting(E_ALL ^ E_NOTICE);
$word2count = array();
while (($line = fgets(STDIN)) !== false) {
    $line = trim($line);
    $words = preg_split('/\W/', $line, 0, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        echo $word, chr(9), "1", PHP_EOL;
    }
}
?>
The content of Reducer.py, which is written in Python, is:
#!/usr/bin/python
from operator import itemgetter
import sys
word2count = {}
for line in sys.stdin:
    line = line.strip()
    word,count = line.split()
    try:
        count = int(count)
        word2count[word] = word2count.get(word, 0) + count
    except ValueError:
        pass
sorted_word2count = sorted(word2count.items(), key=itemgetter(0))
for word,count in sorted_word2count:
    print '%s\t%s'%(word,count)
Does anyone know what causes this error and how to fix it? When I run only the first part of the command,
cat test.txt|php wc_mapper.php|sort
I get the following output:
hello 1
hello 1
hello 1
world 1
world 1
world 1
The first line of the output is empty, but it still takes up a line.
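Pass a delimiter to split() and wrap the unpacking in try/except, so that a line which does not contain exactly two fields no longer stops the script: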
try:
    word,count = line.split(" ")
except ValueError:
    print("Error")
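I have used a single space as the delimiter here; change it to match your data. Note that wc_mapper.php writes chr(9), a tab, between the word and the count, so for that output the delimiter should be "\t".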
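As for why the error appears at all: line.split() with no argument returns an empty list for a blank line, and unpacking an empty list into word,count raises exactly this ValueError. I cannot tell from the post where the empty line comes from (PHP prints anything that sits outside the <?php ... ?> tags, so a stray newline there is one possibility), but the reducer can simply skip such lines. Below is a minimal sketch of that idea, not a definitive implementation. The function name count_words is my own and does not appear in the original script, and the code is written to run under Python 3 as well as Python 2.

```python
import sys


def count_words(lines):
    """Sum the counts per word, skipping blank or malformed lines.

    count_words is a hypothetical helper name used for illustration.
    """
    word2count = {}
    for line in lines:
        # split() with no argument splits on any run of whitespace (tabs
        # included) and returns [] for a blank line, so check the number
        # of fields before unpacking.
        fields = line.split()
        if len(fields) != 2:
            continue
        word, count = fields
        try:
            word2count[word] = word2count.get(word, 0) + int(count)
        except ValueError:
            # The second field was not an integer; ignore the line.
            continue
    return word2count


def main():
    # Print the totals sorted by word, in the same word<TAB>count format.
    for word, count in sorted(count_words(sys.stdin).items()):
        sys.stdout.write('%s\t%d\n' % (word, count))
```

To use this as Reducer.py, add a call to main() at the bottom of the file. The length check is what prevents the crash; the try/except only guards the int() conversion, as in the original script.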