GBase 8a数据库通过LOAD方式加载时,如果出现功能报错,如何排查?

GBase 8a数据库通过LOAD方式加载时,如果出现功能报错,或者性能问题,如何排查?

前提参数设置
// 默认值是3,这个是session级别的,无需恢复
gbase> set gcluster_log_level=7;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)

// 默认值是0,建议先show一下原始值,排查完了记得改回去。
gbase> set global gbase_sql_trace_level=15;
Query OK, 0 rows affected (Elapsed: 00:00:00.12)

// 默认值是0,这个是session级别,无需恢复
gbase> set gbase_sql_trace=1;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)

执行语句
gbase> load data infile ‘ftp://gbase:gbase1234@192.168.163.100/t1.txt’ into table testdb.t1;
Query OK, 4 rows affected (Elapsed: 00:00:00.35)
Task 394392 finished, Loaded 4 records, Skipped 0 records

查看日志
/opt/gbase/gcluster/log/gcluster/express.log

SQL处于check permission状态

获得表的锁

如果长时间无法拿到锁,通过gcadmin showlock 看看对应的表的锁在哪个节点哪个ID持有

2019-07-03 15:02:41.352 [LOCK][INFO ][S:2108][Q:13221]:acquired WRITE lock: testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA, lwp:35693
2019-07-03 15:02:41.352 [LOCK][INFO ][S:2108][Q:13221]:acquired READ lock: testdb.t1, lwp:35693
2019-07-03 15:02:41.353 [LOCK][INFO ][S:2108][Q:13221]:acquired READ lock: testdb.t1.rsync, lwp:35693

读取数据源ftp文件信息

检查数据库和FTP之间的性能问题,包括网络和FTP服务器的设置

2019-07-03 15:02:41.354 [LOAD][INFO ][S:2108][Q:13221]:Start cluster load
2019-07-03 15:02:41.358 == Information : Trying 192.168.163.100...
2019-07-03 15:02:41.362 == Information : Connected to 192.168.163.100 (192.168.163.100) port 21 (#0)
2019-07-03 15:02:41.370 <= Recv header : 220 (vsFTPd 3.0.2)
2019-07-03 15:02:41.371 => Send header : USER gbase
2019-07-03 15:02:41.371 <= Recv header : 331 Please specify the password.
2019-07-03 15:02:41.371 => Send header : PASS gbase1234
2019-07-03 15:02:41.419 <= Recv header : 230 Login successful.
2019-07-03 15:02:41.419 => Send header : PWD
2019-07-03 15:02:41.419 <= Recv header : 257 "/home/gbase"
2019-07-03 15:02:41.419 == Information : Entry path is '/home/gbase'
2019-07-03 15:02:41.419 => Send header : TYPE I
2019-07-03 15:02:41.419 == Information : ftp_perform ends with SECONDARY: 0
2019-07-03 15:02:41.420 <= Recv header : 200 Switching to Binary mode.
2019-07-03 15:02:41.420 => Send header : SIZE t1.txt
2019-07-03 15:02:41.420 <= Recv header : 213 8
2019-07-03 15:02:41.420 => Send header : REST 0
2019-07-03 15:02:41.420 <= Recv header : 350 Restart position accepted (0).
2019-07-03 15:02:41.421 == Information : Remembering we are in dir ""
2019-07-03 15:02:41.421 == Information : Connection #0 to host 192.168.163.100 left intact
2019-07-03 15:02:41.422 == Information : Trying 192.168.163.100...
2019-07-03 15:02:41.422 == Information : Connected to 192.168.163.100 (192.168.163.100) port 21 (#0)
2019-07-03 15:02:41.424 <= Recv header : 220 (vsFTPd 3.0.2)
2019-07-03 15:02:41.424 => Send header : USER gbase
2019-07-03 15:02:41.425 <= Recv header : 331 Please specify the password.
2019-07-03 15:02:41.425 => Send header : PASS gbase1234
2019-07-03 15:02:41.465 <= Recv header : 230 Login successful.
2019-07-03 15:02:41.465 => Send header : PWD
2019-07-03 15:02:41.465 <= Recv header : 257 "/home/gbase"
2019-07-03 15:02:41.465 == Information : Entry path is '/home/gbase'
2019-07-03 15:02:41.465 => Send header : TYPE I
2019-07-03 15:02:41.465 == Information : ftp_perform ends with SECONDARY: 0
2019-07-03 15:02:41.465 <= Recv header : 200 Switching to Binary mode.
2019-07-03 15:02:41.465 => Send header : SIZE t1_2.txt
2019-07-03 15:02:41.466 <= Recv header : 213 8
2019-07-03 15:02:41.466 => Send header : REST 0
2019-07-03 15:02:41.466 <= Recv header : 350 Restart position accepted (0).
2019-07-03 15:02:41.466 == Information : Remembering we are in dir ""
2019-07-03 15:02:41.466 == Information : Connection #0 to host 192.168.163.100 left intact

#准备分派任务
2019-07-03 15:02:41.467 [LOAD][INFO ][S:2108][Q:13221]:AddNodeLoadFileNum host[::ffff:192.168.163.101]add num [1].
2019-07-03 15:02:41.474 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb.t1
2019-07-03 15:02:41.474 [LOCK][INFO ][S:2108][Q:13221]:acquired WRITE lock: testdb.t16ef6f8a9-87f0-4d6c-8043-899367d02df3, lwp:35693
2019-07-03 15:02:41.475 [LOAD][INFO ][S:2108][Q:13221]:AddNodeLoadFileNum cur node state:[ip:::ffff:192.168.163.100:num:0][ip:::ffff:192.168.163.101:num:1].

此时SQL已经在loading状态,

开始下发任务。一般是某个节点性能导致,或者数据非常倾斜。

a) 可以测试从每个数据库的数据节点,用ftp直接获取数据文件的性能

比如 curl ftp://XXXXX/XX.csv > 1.txt

多跑几次,看看哪个节点比较慢

b) 排查机器的内存是否出现SWAP, 操作系统/var/log/messages是否有磁盘报错

c) 查看CPUINFO是否有异常降频

d) 用top、iotop、nmon等工具观察一段时间系统资源,是否有资源紧张

2019-07-03 15:02:41.476 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:set scn = 395447.
2019-07-03 15:02:41.521 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:LOAD DATA INFILE 'ftp://gbase:*********@192.168.163.100/t1.txt#offset=0&length=8&firstblock&ffsize=8,ftp://gbase:*********@192.168.163.100/t1_2.txt#offset=0&length=8&firstblock&ffsize=8' INTO TABLE testdb.t1_n1 DATA_FORMAT 3 FILE_FORMAT UNDEFINED HOST '::ffff:192.168.163.101' CURRENT_TIMESTAMP 1562137361 SCN_NUMBER 395447 GCLUSTER_PORT 5258 INTO SERVER (HOST '::ffff:192.168.163.100, ::ffff:192.168.163.101', PORT '5050', USER 'root', DATABASE 'testdb', TABLE 't1_n1, t1_n2', COMMENT 'table_host 0 1 0 # 1 0 1, scn 395447, group -1').
2019-07-03 15:02:42.004 [LOAD][INFO ][S:2108][Q:13221]:ReduceNodeLoadFileNum host[::ffff:192.168.163.100] reduce num [0].
2019-07-03 15:02:42.004 [LOAD][INFO ][S:2108][Q:13221]:ReduceNodeLoadFileNum host[::ffff:192.168.163.101] reduce num [1].
2019-07-03 15:02:42.004 [LOAD][INFO ][S:2108][Q:13221]:ReduceNodeLoadFileNum cur node state:[ip:::ffff:192.168.163.100:num:0][ip:::ffff:192.168.163.101:num:0].
2019-07-03 15:02:42.004 [LOAD][INFO ][S:2108][Q:13221]:Task 395447 finished, Loaded 8 records, Skipped 0 records

拿到排它锁,要切换版本了。

2019-07-03 15:02:42.008 [LOCK][INFO ][S:2108][Q:13221]:acquired WRITE lock: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F, lwp:35693

设置SCN

2019-07-03 15:02:42.008 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:SET SELF SCN = 395447.
2019-07-03 15:02:42.009 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:SET SELF SCN = 395447.
2019-07-03 15:02:42.009 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:SET SELF SCN = 395447.
2019-07-03 15:02:42.009 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:SET SELF SCN = 395447.

以指定的SCN来刷新数据

2019-07-03 15:02:42.011 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:flush commit "testdb"."t1_n1" scn_number 395447.
2019-07-03 15:02:42.011 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:flush commit "testdb"."t1_n2" scn_number 395447.
2019-07-03 15:02:42.019 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:flush commit "testdb"."t1_n2" scn_number 395447.
2019-07-03 15:02:42.055 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:flush commit "testdb"."t1_n1" scn_number 395447.

验证SCN

2019-07-03 15:02:42.060 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:select scn from information_schema.tables where table_schema = 'testdb' and table_name = 't1_n1'.
2019-07-03 15:02:42.061 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:select scn from information_schema.tables where table_schema = 'testdb' and table_name = 't1_n1'.
2019-07-03 15:02:42.061 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:select scn from information_schema.tables where table_schema = 'testdb' and table_name = 't1_n2'.
2019-07-03 15:02:42.061 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:select scn from information_schema.tables where table_schema = 'testdb' and table_name = 't1_n2'.
2019-07-03 15:02:42.063 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:set autocommit=1.
2019-07-03 15:02:42.063 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:set autocommit=1.
2019-07-03 15:02:42.063 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:set autocommit=1.
2019-07-03 15:02:42.063 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:set autocommit=1.

SQL处于commit阶段 或者 rollback阶段

提交commit成功

2019-07-03 15:02:42.067 [COMMIT][INFO ][S:2108][Q:13221]:commit table(testdb.t1) success for scn:395447.
2019-07-03 15:02:42.068 [LOAD][INFO ][S:2108][Q:13221]:successful to cluster load

释放锁

2019-07-03 15:02:42.068 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb
2019-07-03 15:02:42.069 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F
2019-07-03 15:02:42.070 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb.t1.rsync
2019-07-03 15:02:42.071 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA
2019-07-03 15:02:42.072 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb.t16ef6f8a9-87f0-4d6c-8043-899367d02df3

你好,我是有问必答小助手,非常抱歉,本次您提出的有问必答问题,技术专家团超时未为您做出解答


本次提问扣除的有问必答次数,将会以问答VIP体验卡(1次有问必答机会、商城购买实体图书享受95折优惠)的形式为您补发到账户。


因为有问必答VIP体验卡有效期仅有1天,您在需要使用的时候【私信】联系我,我会为您补发。