Submitting to YARN from PyCharm reports an error. The code:
```python
# coding:utf8
from pyspark import SparkContext, SparkConf
import os

# Point Spark at the cluster's Hadoop configuration
os.environ['HADOOP_CONF_DIR'] = "/export/server/hadoop/etc/hadoop"

if __name__ == '__main__':
    conf = SparkConf().setMaster('yarn').setAppName("test")
    sc = SparkContext(conf=conf)

    # Create RDDs by parallelizing a local collection
    rdd1 = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9])
    rdd2 = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9], 4)

    print("default partitions rdd1", rdd1.getNumPartitions(), rdd1.collect())
    print("default partitions rdd2", rdd2.getNumPartitions(), rdd2.collect())
```
However, the same cluster works fine when I submit from a FinalShell terminal:
```
[hadoop@node1 spark]$ bin/spark-submit --master yarn /export/server/spark/examples/src/main/python/pi.py 1000
23/06/17 22:03:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/06/17 22:03:37 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
23/06/17 22:03:38 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Pi is roughly 3.140840
[hadoop@node1 spark]$ bin/spark-submit --master yarn /helloworld.py
23/06/17 22:10:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/06/17 22:10:16 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
23/06/17 22:10:17 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
['hello', 'tom', 'hello', 'allen', 'hello', 'allen', 'tom', 'mac', 'apple', 'hello', 'allen', 'apple', 'hello', 'apple', 'spark', 'allen', 'hadoop', 'spark']
[('hello', 1), ('tom', 1), ('hello', 1), ('allen', 1), ('hello', 1), ('allen', 1), ('tom', 1), ('mac', 1), ('apple', 1), ('hello', 1), ('allen', 1), ('apple', 1), ('hello', 1), ('apple', 1), ('spark', 1), ('allen', 1), ('hadoop', 1), ('spark', 1)]
[('allen', 4), ('hadoop', 1), ('tom', 2), ('mac', 1), ('hello', 5), ('apple', 3), ('spark', 2)]
```
yarn-site.xml configuration:
```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
    <description>Run the ResourceManager on node1</description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/nm-local</value>
    <description>Local directory for NodeManager intermediate data</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/nm-log</value>
    <description>Local directory for NodeManager logs</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Enable the shuffle service for MapReduce</description>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://node1:19888/jobhistory/logs</value>
    <description>History server URL</description>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>node1:8089</value>
    <description>Proxy server host and port</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Enable log aggregation</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
    <description>HDFS path where application logs are stored</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    <description>Use the FairScheduler</description>
  </property>
</configuration>
```
Error message:
```
23/06/17 22:13:10 ERROR SparkContext: Error initializing SparkContext.
org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1687010107759_0006 to YARN : root is not a leaf queue
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:336)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:225)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:235)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:590)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
23/06/17 22:13:10 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to send shutdown message before the AM has registered!
23/06/17 22:13:10 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
23/06/17 22:13:10 WARN MetricsSystem: Stopping a MetricsSystem that is not running
Traceback (most recent call last):
File "/tmp/pycharm_project_602/01_rdd/01_rdd.py", line 10, in <module>
sc = SparkContext(conf=conf)
File "/export/server/anaconda3/envs/pyspark/lib/python3.10/site-packages/pyspark/context.py", line 200, in __init__
self._do_init(
File "/export/server/anaconda3/envs/pyspark/lib/python3.10/site-packages/pyspark/context.py", line 287, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "/export/server/anaconda3/envs/pyspark/lib/python3.10/site-packages/pyspark/context.py", line 417, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "/export/server/anaconda3/envs/pyspark/lib/python3.10/site-packages/py4j/java_gateway.py", line 1587, in __call__
return_value = get_return_value(
File "/export/server/anaconda3/envs/pyspark/lib/python3.10/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1687010107759_0006 to YARN : root is not a leaf queue
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:336)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:225)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:235)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:590)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
```
---
The Spark application is being submitted to YARN's root queue, but root is not a leaf queue, so applications cannot be submitted to it directly.
Create a dedicated queue for your user on YARN and make it a leaf queue; you can then submit applications straight to that queue. This approach requires YARN administrator privileges.
Also check that the queue specified in SparkConf is correct and that it has enough resources available:
```python
conf = (SparkConf()
        .setMaster('yarn')
        .setAppName("test")
        .set('spark.yarn.queue', 'your_queue_name'))
```
Here, `your_queue_name` should be replaced with a queue name that actually exists in your YARN configuration.
Another possible solution is to add a default child queue under the root queue in your YARN configuration. For the FairScheduler this is usually done in its allocation file, fair-scheduler.xml, as sketched below.
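A minimal sketch of such an allocation file, assuming `yarn.scheduler.fair.allocation.file` points at it; the queue name, weight, and placement rules are placeholders to adapt to your cluster:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml (illustrative sketch): define a leaf queue under root -->
<allocations>
  <!-- Applications can then be submitted to root.default -->
  <queue name="default">
    <weight>1.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
  <!-- Send jobs that name a queue to it, everything else to root.default -->
  <queuePlacementPolicy>
    <rule name="specified"/>
    <rule name="default" queue="default"/>
  </queuePlacementPolicy>
</allocations>
```

After editing the file, refresh the queues (for example with `yarn rmadmin -refreshQueues`) or restart the ResourceManager so the change takes effect.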
This error is caused by an incorrect YARN configuration or an unavailable YARN cluster. Specifically, it means the YARN queue the application was submitted to (root) is not a leaf queue, while YARN requires applications to be submitted to a leaf queue.
Suggestions:
1. Run `yarn application -list` to check whether YARN applications can run at all.
2. In PyCharm's "Edit Configurations" menu, specify the correct YARN settings, such as the cluster address and port and the main class of the application to submit.
See also this post on Spark-to-YARN submission errors and their solutions:
https://blog.csdn.net/qq_44509920/article/details/105454355
Answer:
How do you fix errors when submitting Spark to YARN from PyCharm?
Start from the error message to locate the problem. Common errors when submitting Spark to YARN include, but are not limited to, the following:
- java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx--x--x
- SparkException: Application application_xxx failed 2 times due to AM Container for appattempt_xxx exited with exitCode: -1000
- java.io.IOException: Failed to run spark-submit command. Reason: org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:710)
The corresponding solution for each of these errors is as follows:
java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx--x--x
Solution:
Make the /tmp/hive directory on HDFS writable by changing its permissions. The steps are:
```shell
# Check the current permissions of /tmp/hive
hadoop fs -ls /tmp/

# Change the permissions of /tmp/hive; "-chmod 1777" sets them to "rwxrwxrwt"
hadoop fs -chmod 1777 /tmp/hive
```
SparkException: Application application_xxx failed 2 times due to AM Container for appattempt_xxx exited with exitCode: -1000
Solution:
This problem is usually caused by insufficient disk space or insufficient permissions on the node where the YARN application runs. Fix it as follows:
Grant sufficient permissions (and free up enough disk space) on the node where the YARN application runs.
java.io.IOException: Failed to run spark-submit command. Reason: org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:710)
Solution:
First, make sure the configuration in yarn-site.xml is correct.
If the configuration is correct but the problem persists, YARN may not have enough memory; increase the memory available to YARN or adjust its configuration, for example along the lines of the sketch below.
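A hedged yarn-site.xml excerpt showing the memory-related properties one would typically raise; the values below are placeholders for illustration, not recommendations:

```xml
<!-- Illustrative excerpt only; tune the values for your cluster -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
  <description>Total memory a NodeManager may hand out to containers</description>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
  <description>Largest single container the scheduler will allocate</description>
</property>
```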
Also, whenever a failure occurs, record the exact error message so it can be analyzed and resolved more easily.
The above are some common fixes for errors when submitting Spark to YARN from PyCharm; I hope they help.
Judging from the error message, YARN cannot accept the application because it is being submitted to the "root" queue. Make sure the job lands in a leaf queue instead.
The error "root is not a leaf queue" means that no usable leaf queue is configured. One fix is to change the scheduler in the yarn-site.xml configuration so that yarn.resourcemanager.scheduler.class is set to:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
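For reference, a sketch of how that property would look in yarn-site.xml, replacing the FairScheduler entry shown in the question (the CapacityScheduler ships with a root.default leaf queue out of the box):

```xml
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>Use the CapacityScheduler, which provides a default leaf queue (root.default)</description>
</property>
```

After changing the scheduler, restart YARN so the new queue configuration is picked up.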