Unable to use the sortByKey function in a PySpark environment:
rdd = sc.parallelize([("hello", 1), ("world", 2), ("china", 3), ("Beijing", 4)])
print("Result of sortByKey on the rdd dataset:\n")
print(rdd.sortByKey(False).collect())
The error is:
2022-10-18 05:01:07,861 ERROR executor.Executor: Exception in task 0.0 in stage 4.0 (TID 3)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 668, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 85, in read_command
command = serializer._read_with_length(file)
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 173, in _read_with_length
return self.loads(obj)
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 452, in loads
return pickle.loads(obj, encoding=encoding)
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle.py", line 590, in _create_parametrized_type_hint
return origin[args]
File "/usr/lib64/python3.6/typing.py", line 682, in inner
return func(*args, **kwds)
File "/usr/lib64/python3.6/typing.py", line 1131, in getitem
_check_generic(self, params)
File "/usr/lib64/python3.6/typing.py", line 662, in _check_generic
("many" if alen > elen else "few", repr(cls), alen, elen))
TypeError: Too many parameters for typing.Iterable; actual 2, expected 1
This is most likely a problem elsewhere in your code, not with sortByKey itself. The traceback shows the worker failing while deserializing a parametrized type hint (`_create_parametrized_type_hint` → "Too many parameters for typing.Iterable; actual 2, expected 1"), i.e. somewhere in the code shipped to the executors a single-parameter generic such as `typing.Iterable` is being subscripted with two type parameters.
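The arity check that fires in the traceback can be reproduced without Spark: `typing.Iterable` is generic in exactly one type parameter, and cloudpickle rebuilds hints on the worker via `origin[args]`, which runs the same check. A minimal sketch, assuming the offending annotation looks something like the hypothetical `Iterable[str, int]` below:

```python
from typing import Iterable

# typing.Iterable accepts exactly one type parameter; subscripting it with
# two triggers the same arity TypeError the Spark worker reported.
try:
    bad_hint = Iterable[str, int]  # hypothetical offending annotation
except TypeError as exc:
    # The exact message wording varies by Python version, but the
    # exception type is always TypeError.
    print(type(exc).__name__)  # prints: TypeError
```

If a hint like this exists in any function that gets pickled and sent to the executors, fixing the annotation (e.g. `Iterable[tuple[str, int]]` for key/value pairs) should make the `sortByKey` job run.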