hive使用Tez引擎失败

相关环境描述:
1.java版本:jdk1.8_212
2.hadoop版本:3.1.3
3.hive版本:3.1.2
4.tez版本:0.9.2

问题描述:
在hive中使用tez引擎报错,就连更换回mr也使用不了了,具体原因如下:

0: jdbc:hive2://node01:10000> select count(1) from emp;
INFO  : Compiling command(queryId=root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0): select count(1) from emp
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0); Time taken: 0.146 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0): select count(1) from emp
INFO  : Query ID = root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
WARN  : The session: sessionId=c7f2b41a-0426-40ae-993b-b3f17640520b, queueName=null, user=root, doAs=true, isOpen=false, isDefault=false has not been opened
INFO  : Subscribed to counters: [] for queryId: root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0
INFO  : Tez session hasn't been created yet. Opening session
ERROR : Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1629168275110_0004 failed 2 times due to AM Container for appattempt_1629168275110_0004_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2021-08-17 11:34:23.036]Exception from container-launch.
Container id: container_1629168275110_0004_02_000001
Exit code: 1

[2021-08-17 11:34:23.038]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :


[2021-08-17 11:34:23.039]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :


For more detailed output, check the application tracking page: http://node02:8088/cluster/app/application_1629168275110_0004 Then click on links to logs of each attempt.
. Failing the application.
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:1013) ~[tez-api-0.9.2.jar:0.9.2]
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:982) ~[tez-api-0.9.2.jar:0.9.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:453) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:368) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.openInternal(TezSessionPoolSession.java:124) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:245) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezTask.java:368) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:195) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) ~[hive-exec-3.1.2.jar:3.1.2]
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224) ~[hive-service-3.1.2.jar:3.1.2]
    at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) ~[hive-service-3.1.2.jar:3.1.2]
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316) ~[hive-service-3.1.2.jar:3.1.2]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_212]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_212]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[hadoop-common-3.1.3.jar:?]
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329) ~[hive-service-3.1.2.jar:3.1.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_212]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_212]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_212]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_212]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
INFO  : Completed executing command(queryId=root_20210817113418_58b6d06e-46d9-47d6-a893-fca739db7ea0); Time taken: 4.709 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)


相关配置信息:
1.hadoop core-site

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:8020</value>
    </property>
    <!-- 文件存储目录 -->
    <property>
        <name>hadoop.data.dir</name>
        <value>/opt/servers/hadoop-3.1.3/datas</value>
    </property>
    <!-- 临时文件存储目录 -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/servers/hadoop-3.1.3/datas/tmp</value>
    </property>
    <property>
            <name>hadoop.proxyuser.root.hosts</name>
            <value>*</value>
     </property>
    <property>
            <name>hadoop.proxyuser.root.groups</name>
            <value>*</value>
    </property>
        <property>
            <name>hadoop.http.staticuser.user</name>
            <value>root</value>
        </property>

</configuration>

2.hadoop capacity-schedular

<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,MyLine</value>
    <description>
      The queues at the this level (root is the root queue).
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
    <description>Default queue target capacity.</description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>60</value>
    <description>
      The maximum capacity of the default queue. 
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
    <value>*</value>
    <description>
      The ACL of who can submit applications with configured priority.
      For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
    </description>
  </property>

   <property>
     <name>yarn.scheduler.capacity.root.default.maximum-application-lifetime
     </name>
     <value>-1</value>
     <description>
        Maximum lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        This will be a hard time limit for all applications in this
        queue. If positive value is configured then any application submitted
        to this queue will be killed after exceeds the configured lifetime.
        User can also specify lifetime per application basis in
        application submission context. But user lifetime will be
        overridden if it exceeds queue maximum lifetime. It is point-in-time
        configuration.
        Note : Configuring too low value will result in killing application
        sooner. This feature is applicable only for leaf queue.
     </description>
   </property>

   <property>
     <name>yarn.scheduler.capacity.root.default.default-application-lifetime
     </name>
     <value>-1</value>
     <description>
        Default lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        If the user has not submitted application with lifetime value then this
        value will be taken. It is point-in-time configuration.
        Note : Default lifetime can't exceed maximum lifetime. This feature is
        applicable only for leaf queue.
     </description>
   </property>

    <property>
    <name>yarn.scheduler.capacity.root.MyLine.capacity</name>
    <value>60</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.user-limit-factor</name>
    <value>1</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.maximum-capacity</name>
    <value>80</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.state</name>
    <value>RUNNING</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.acl_submit_applications</name>
    <value>*</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.acl_administer_queue</name>
    <value>*</value>
  </property>

  <property>
    <name>yarn.scheduler.capacity.root.MyLine.acl_application_max_priority</name>
    <value>*</value>
  </property>

   <property>
     <name>yarn.scheduler.capacity.root.MyLine.maximum-application-lifetime
     </name>
     <value>-1</value>
   </property>

   <property>
     <name>yarn.scheduler.capacity.root.MyLine.default-application-lifetime
     </name>
     <value>-1</value>
   </property>
    
  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler 
      attempts to schedule rack-local containers.
      When setting this parameter, the size of the cluster should be taken into account.
      We use 40 as the default value, which is approximately the number of nodes in one rack.
      Note, if this value is -1, the locality constraint in the container request
      will be ignored, which disables the delay scheduling.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
    <value>-1</value>
    <description>
      Number of additional missed scheduling opportunities over the node-locality-delay
      ones, after which the CapacityScheduler attempts to schedule off-switch containers,
      instead of rack-local ones.
      Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
      attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
      after 40+20=60 missed opportunities.
      When setting this parameter, the size of the cluster should be taken into account.
      We use -1 as the default value, which disables this feature. In this case, the number
      of missed opportunities for assigning off-switch containers is calculated based on
      the number of containers and unique locations specified in the resource request,
      as well as the size of the cluster.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>
      A list of mappings that will be used to assign jobs to queues
      The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
      Typically this list will be used to map users to queues,
      for example, u:%user:%user maps all users to queues with the same name
      as the user.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a queue mapping is present, will it override the value specified
      by the user? This can be used by administrators to place jobs in queues
      that are different than the one specified by the user.
      The default is false.
    </description>
  </property>

  <property>
    <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
    <value>1</value>
    <description>
      Controls the number of OFF_SWITCH assignments allowed
      during a node's heartbeat. Increasing this value can improve
      scheduling rate for OFF_SWITCH containers. Lower values reduce
      "clumping" of applications on particular nodes. The default is 1.
      Legal values are 1-MAX_INT. This config is refreshable.
    </description>
  </property>


  <property>
    <name>yarn.scheduler.capacity.application.fail-fast</name>
    <value>false</value>
    <description>
      Whether RM should fail during recovery if previous applications'
      queue is no longer valid.
    </description>
  </property>

</configuration>


  1. hadoop hdfs-site
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.data.dir}/namenode</value>
    </property>
    <!-- datanode数据的存放路径 -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.data.dir}/datanode</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file://${hadoop.data.dir}/namesecondry</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node02:9868</value>
    </property>
    <property>
        <name>dfs.client.datanode-restart.timeout</name>
        <value>30</value>
    </property>

</configuration>


4.hadoop mapred-site


```html
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn-tez</value>
    </property>

    <!-- 历史服务器端地址 -->
    <property>
            <name>mapreduce.jobhistory.address</name>
            <value>node01:10020</value>
    </property>
    <!-- 历史服务器web端地址 -->
    <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>node01:19888</value>
    </property>
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx2048M</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx4096M</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*, 
        $HADOOP_COMMON_HOME/lib/*, 
        $HADOOP_HDFS_HOME/*, 
        $HADOOP_HDFS_HOME/lib/*, 
        $HADOOP_MAPRED_HOME/*, 
        $HADOOP_MAPRED_HOME/lib/*, 
        $HADOOP_YARN_HOME/*, 
        $HADOOP_YARN_HOME/lib/*</value>
    </property>
</configuration>

  1. hadoop yarn -site
<configuration>
    <!-- 设置不检查虚拟内存的值,不然内存不够会报错 -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property> 
    <!--检查每个任务正使用的物理内存量,如果任务超出分配值,则直接将其杀掉 -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property> 
    
    <!-- yarn上面运行一个任务,最少需要1.5G内存,虚拟机没有这么大的内存就调小这个值,不然会报错 -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node02</value>
    </property>
    <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
    <!-- 开启日志聚集 -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>  
        <name>yarn.log.server.url</name>  
        <value>http:/node01:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.scheduler.fair.user-as-default-queue</name>
        <value>MyLine</value>
    </property>


</configuration>

  1. hive-site
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node01:3306/metastore?useSSL=false</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>

    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>

    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value> 
    </property>

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node01:9083</value>
    </property>

    <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    </property>

    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node01</value>
    </property>

    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>tez</value>
    </property>

</configuration>

  1. tez-site
<configuration>
<property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez/tez.tar.gz</value>
</property>
<property>
     <name>tez.use.cluster.hadoop-libs</name>
     <value>false</value>
</property>
<property>
     <name>tez.history.logging.service.class</name>
     <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>

<property>
    <name>tez.am.resource.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>tez.am.resource.cpu.vcores</name>
    <value>1</value>
</property>
<property>
    <name>tez.container.max.java.heap.fraction</name>
    <value>0.4</value>
</property>
<property>
    <name>tez.task.resource.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>tez.task.resource.cpu.vcores</name>
    <value>1</value>
</property>
</configuration>


  1. hadoop hadoop-env
export TEZ_CONF_DIR=$HADOOP_HOME/etc/hadoop
export TEZ_JARS=/opt/servers/tez
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*

9.环境变量相关设置

#JAVA_HOME
export JAVA_HOME=/opt/servers/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin

##HADOOP_HOME
export HADOOP_HOME=/opt/servers/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

#ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/opt/servers/zookeeper-3.5.7
export PATH=$PATH:$ZOOKEEPER_HOME/bin

#HIVE_HOME
export HIVE_HOME=/opt/servers/hive
export PATH=$PATH:$HIVE_HOME/bin
救救孩子吧,要被折磨疯了

关闭所有进程,将配置文件里的所有的中文注释全部删除,记得全部删除,然后保存,完成后从新启动,再次尝试你的操作

这里有相应版本的Tez安装的各种踩坑总结,可以对照参考一下:
Hive3.1.2+大数据引擎Tez0.9.2安装部署到使用测试(踩坑详情)_后来X大数据-CSDN博客 安装tez的过程可谓是坑有点多,编译还是相对简单的。现在复盘一下,以下是我的版本号框架版本号Hadoop3.1.3Hive3.1.2Tez0.10.1能看到这篇文章的,说明各位也能知道tez是干啥的,这里就不介绍了,直接开始安装我们可以在官网看到,Hadoop3.X版本要使用Tez引擎是需要自己编译的(对于0.8.3和更高版本的Tez,Tez需要Apache Hadoop的版本为2.6.0或更高。对于0.9.0及更高版本的Tez,Tez需要Apache Had https://blog.csdn.net/weixin_38586230/article/details/106029127
如有帮助,望及时采纳