When using the pyspark module in Python, with the environment variables configured via os.environ, the script fails at runtime with an error saying no usable Python executable can be found.
The runtime error is:
Missing Python executable 'D:\Python\Python 3.11.1\python.exe', defaulting to 'D:\Python\Python 3.11.1\Lib\site-packages\pyspark\bin\..' for SPARK_HOME environment variable. Please install Python or specify the correct Python executable in PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable to detect SPARK_HOME safely.
The code is:
from pyspark import SparkContext, SparkConf
import os
os.environ['PYSPARK_PYTHON'] = "D:/Python/Python 3.11.1/python.exe"
os.environ['HADOOP_HOME'] = "D:/hadoop-3.0.0"
conf = SparkConf().setMaster("local[*]").setAppName("test_spark_app")
sc = SparkContext(conf=conf)
rdd = sc.parallelize([1, 2, 3, 4, 5], 1)
rdd.saveAsTextFile("D:/output")
Python is installed and its environment variables are already set.
You could try setting the runtime environment before launching:
set PYSPARK_PYTHON=D:\Python\Python 3.11.1\python.exe
REM This should be the right path; double-check it before running.
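If setting the variable in the shell doesn't help, a common workaround is to point both PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON at the interpreter that runs the script, via sys.executable, and to do so before the SparkContext is created. This avoids typos in a hand-written path and sidesteps the space in "Python 3.11.1", which paths-with-spaces handling on Windows is a frequent culprit for. A minimal sketch, reusing the HADOOP_HOME and output path from the question:

import os
import sys

# Point both the worker and driver Python at the interpreter running this script.
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable
os.environ['HADOOP_HOME'] = "D:/hadoop-3.0.0"

from pyspark import SparkContext, SparkConf

conf = SparkConf().setMaster("local[*]").setAppName("test_spark_app")
sc = SparkContext(conf=conf)
rdd = sc.parallelize([1, 2, 3, 4, 5], 1)
rdd.saveAsTextFile("D:/output")
sc.stop()

If this version runs, the original error was most likely caused by the path string rather than by a missing Python installation.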