In a Hadoop HA setup, I manually killed the active NameNode (nn1) and then ran hdfs haadmin -transitionToActive nn2 to make nn2 the active NameNode. It fails with the error below. What puzzles me in particular is why the message says "centos01/192.168.10.131 to centos01:8020" — I would have expected it to connect to centos02.
How should I handle this?
[hadoop@centos01 tmp]$ jps
8868 NameNode
9413 Jps
8990 DataNode
9199 JournalNode
[hadoop@centos01 tmp]$ kill -9 8868
[hadoop@centos01 tmp]$ hdfs haadmin -transitionToActive nn2
21/07/18 08:54:30 INFO ipc.Client: Retrying connect to server: centos01/192.168.10.131:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
21/07/18 08:54:30 WARN ipc.Client: Failed to connect to server: centos01/192.168.10.131:8020: retries get failed due to exceeded maximum allowed retries number: 1
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:682)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:778)
at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1544)
at org.apache.hadoop.ipc.Client.call(Client.java:1375)
at org.apache.hadoop.ipc.Client.call(Client.java:1339)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy8.getServiceStatus(Unknown Source)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.getServiceStatus(HAServiceProtocolClientSideTranslatorPB.java:122)
at org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:179)
at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:147)
at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:451)
at org.apache.hadoop.hdfs.tools.DFSHAAdmin.runCmd(DFSHAAdmin.java:121)
at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:384)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.hdfs.tools.DFSHAAdmin.main(DFSHAAdmin.java:135)
Unexpected error occurred Call From centos01/192.168.10.131 to centos01:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Usage: haadmin [-ns &lt;nameserviceId&gt;] [-transitionToActive [--forceactive] &lt;serviceId&gt;]
[hadoop@centos01 tmp]$
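Looking at the stack trace, the failure happens inside HAAdmin.isOtherTargetNodeActive: before transitioning nn2 to active, haadmin first calls getServiceStatus on the other NameNode (nn1, which resolves to centos01:8020) to make sure two NameNodes never become active at once. Since nn1 was just killed with kill -9, that status check gets "Connection refused" — which is why the message mentions centos01 rather than centos02. If skipping the check is acceptable for this manual test (accepting the split-brain risk it guards against), the usage line above suggests the --forceactive flag; a possible invocation (a sketch, not verified on this cluster) would be:

    hdfs haadmin -transitionToActive --forceactive nn2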
-- hdfs-site.xml (already synced to the centos02 and centos03 nodes)
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>centos01:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>centos02:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>centos01:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>centos02:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://centos01:8485;centos02:8485;centos03:8485/mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/opt/modules/hadoop-2.8.2/tmp/dfs/jn</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>
    sshfence
    shell(/bin/true)
  </value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.permission.enable</name>
  <value>false</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/opt/modules/hadoop-2.8.2/tmp/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/opt/modules/hadoop-2.8.2/tmp/dfs/data</value>
</property>
-- core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/opt/modules/hadoop-2.8.2/tmp</value>
</property>
For reference, I followed this write-up: https://www.cnblogs.com/nucdy/p/5707914.html
My steps were essentially the same as described there, yet I still get this error.
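To narrow things down, it may help to query each NameNode's HA state directly — this uses the same getServiceStatus RPC that appears in the stack trace. For example:

    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2

With nn1 killed, the first command would be expected to fail with the same "Connection refused", while the second should report standby if nn2 is reachable; if the nn2 query also fails, the problem is with nn2 itself rather than with the transition command.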