I've tried to convert a bz2 dump to text with WikiExtractor (https://github.com/attardi/wikiextractor).
I downloaded a Wikipedia dump with the .bz2 extension, then ran this command on the command line:
python Wikiextractor.py -b 85M -o extracted D:\wikiextractor-master\wikiextractor\zhwiki-latest-pages-articles.xml.bz2
After it finished preprocessing the pages, I got this error:
INFO: Preprocessed 3700000 pages
INFO: Loaded 974903 templates in 589.2s
INFO: Starting page extraction from D:\wikiextractor-master\wikiextractor\zhwiki-latest-pages-articles.xml.bz2.
Traceback (most recent call last):
File "Wikiextractor.py", line 641, in <module>
main()
File "Wikiextractor.py", line 636, in main
process_dump(input_file, args.templates, output_path, file_size,
File "Wikiextractor.py", line 364, in process_dump
reduce.start()
File "D:\anoconda\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "D:\anoconda\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\anoconda\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "D:\anoconda\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
reduction.dump(process_obj, to_child)
File "D:\anoconda\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\anoconda\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "D:\anoconda\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
How can I fix this? Thanks!
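When unpacking a Wikipedia dump with wikiextractor, I get the error "EOFError: Ran out of input". I've searched around and couldn't find anyone who has hit the same problem. Any answer would be greatly appreciated!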
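A note on what the traceback itself shows: the first error is raised inside multiprocessing's popen_spawn_win32.py, where Windows starts a child process by pickling the Process object and sending it to the child. Something reachable from that object is an open text stream (_io.TextIOWrapper), and open streams cannot be pickled. The second traceback (EOFError: Ran out of input) is only a consequence: the child tries to read the pickled object that the parent never finished writing. The sketch below reproduces the first error without wikiextractor. The names args_with_handle, args_with_path and try_pickle are made up for illustration and are not part of wikiextractor.

```python
import os
import pickle
import tempfile


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except (TypeError, pickle.PicklingError) as exc:
        return str(exc)


# Create a throwaway file so there is a real open text stream to work with.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(path, "w", encoding="utf-8") as handle:
    # Hypothetical process arguments that carry an open file object.
    # Under the "spawn" start method (the default on Windows) these would
    # have to be pickled, and that fails for an open text stream.
    args_with_handle = {"output": handle}
    error_with_handle = try_pickle(args_with_handle)

os.remove(path)

# Passing the path instead, and opening the file inside the child,
# leaves nothing unpicklable in the arguments.
args_with_path = {"output": path}
error_with_path = try_pickle(args_with_path)

print("with open handle:", error_with_handle)
print("with path only:  ", error_with_path)
```

On Linux, multiprocessing starts children with fork by default on most Python versions, which does not pickle the Process object, so running the same command on Linux or under WSL is one thing to try. I have not verified whether any particular wikiextractor version avoids this on Windows, so treat that as a guess rather than a confirmed fix.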
Ugh, I ran into the same error too. Did you manage to solve it?