Error when using Jupyter to develop PySpark remotely

# Error message



```
23/04/22 18:48:21 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: 
Error from python worker:
  usage: jupyter [-h] [--version] [--config-dir] [--data-dir] [--runtime-dir]
                 [--paths] [--json] [--debug]
                 [subcommand]
  
  Jupyter: Interactive Computing
  
  positional arguments:
    subcommand     the subcommand to launch
  
  options:
    -h, --help     show this help message and exit
    --version      show the versions of core jupyter packages and exit
    --config-dir   show Jupyter config dir
    --data-dir     show Jupyter data dir
    --runtime-dir  show Jupyter runtime dir
    --paths        show all Jupyter paths. Add --json for machine-readable
                   format.
    --json         output paths as machine-readable json
    --debug        output debug information about paths
  
  Available subcommands: bundlerextension console contrib dejavu execute kernel
  kernelspec lab labextension labhub migrate nbclassic nbconvert nbextension
  nbextensions_configurator notebook qtconsole run server serverextension
  troubleshoot trust
  
  Jupyter command `jupyter-pyspark.daemon` not found.
PYTHONPATH was:
  /opt/module/spark/python/lib/pyspark.zip:/opt/module/spark/python/lib/py4j-0.10.9-src.zip:/opt/module/spark/jars/spark-core_2.12-3.1.3.jar:/opt/module/spark/python/lib/py4j-0.10.9-src.zip:/opt/module/spark/python/:
org.apache.spark.SparkException: EOFException occurred while reading the port number from pyspark.daemon's stdout
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:217)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:145)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
```
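Two lines in this log tell the whole story. "Jupyter command `jupyter-pyspark.daemon` not found" shows that the Python worker was started with the `jupyter` launcher instead of a Python interpreter, and the later "EOFException occurred while reading the port number from pyspark.daemon's stdout" is just the consequence: the daemon never started, so it never wrote a port number for Spark to read. The failure is easy to reproduce outside Spark (a sketch; it only assumes `jupyter` is on the PATH):

```bash
# The jupyter launcher treats its first positional argument as a
# subcommand name and looks for an executable called
# jupyter-<subcommand> on the PATH; no jupyter-pyspark.daemon exists.
jupyter pyspark.daemon
# => Jupyter command `jupyter-pyspark.daemon` not found.
```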

# .bashrc file


```bash
export JAVA_HOME=/opt/module/java
export PYSPARK_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
```
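This configuration is the root cause. Spark executors launch their Python workers with the interpreter named in `PYSPARK_PYTHON`; setting it to `jupyter` makes Spark ask the Jupyter launcher to run the `pyspark.daemon` module, which fails as shown above. Only the driver should run Jupyter. A minimal corrected `.bashrc` sketch (the `python3` path is an assumption; point it at whatever interpreter actually exists on the worker nodes):

```bash
export JAVA_HOME=/opt/module/java
# Workers need a real Python interpreter, not the jupyter launcher.
# /usr/bin/python3 is an assumed path; use an interpreter that is
# present (same version) on every node of the cluster.
export PYSPARK_PYTHON=/usr/bin/python3
# The driver side may still be Jupyter, so `pyspark` opens a notebook.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
```

After editing, run `source ~/.bashrc` and restart the PySpark session (or the notebook kernel) so the new environment variables are picked up.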
Not sure whether you have solved this problem yet; if not:
  • The article 使用jupyter交互pyspark may contain the answer you are looking for; please take a look.
  • The following answer is from ChatGPT:

    I need more specific error information, as well as the operating system, PySpark version, and Jupyter Notebook version you are using, before I can give an accurate answer. Please provide this information.


If you have already solved this problem, please consider sharing your solution as a blog post and leaving the link in the comments to help more people ^-^