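The software in question currently runs on CentOS 7.7.1908. According to the previous administrator, it was first installed when the system was still CentOS 7.1. After the OS was set up, only this software and a VNC service were installed. The software now takes far too long to start. Tracing its startup command with strace shows that startup stalls at the following step:
(Below is part of the output shown in the screenshot; process IDs differ because the captures come from different runs.)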
...
mmap(NULL, 2101248, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x2b84b04e4000
mprotect(0x2b84b04e4000, 4096, PROT_NONE) = 0
clone(child_stack=0x2b84b06e3fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2b84b06e49d0, tls=0x2b84b06e4700, child_tidptr=0x2b84b06e49d0) = 44948
futex(0x2b84b06e49d0, FUTEX_WAIT, 44948, NULL
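Running pstack on the ID that appears in the futex call (44948, the value returned by clone with CLONE_THREAD, so it identifies a thread) gives: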
[soi@soi217 ~]$ pstack 44948
Thread 1 (process 44948):
#0 0x00002b84a5a44bed in poll () from /lib64/libc.so.6
#1 0x0000000000673a58 in ?? ()
#2 0x0000000000671e16 in ?? ()
#3 0x000000000066c964 in ?? ()
#4 0x00000000006856ea in tGNGsTqIW9kHrIA ()
#5 0x00002b84a4af2e65 in start_thread () from /lib64/libpthread.so.0
#6 0x00002b84a5a4f88d in clone () from /lib64/libc.so.6
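For readers unfamiliar with these traces: `start_thread` in the backtrace suggests that 44948 is a thread of the program rather than a separate process, and a `futex(..., FUTEX_WAIT, <tid>, NULL)` on the `child_tidptr` address right after `clone` is what waiting for a thread to finish typically looks like on Linux. The following is a minimal Python sketch of that relationship, not the software's actual code. `slow_worker` and `start_and_wait` are hypothetical names, and the sleep only stands in for whatever the helper thread is really waiting on.

```python
import threading
import time


def slow_worker(delay_s: float) -> None:
    """Stand-in for the helper thread; it sleeps instead of polling a socket."""
    time.sleep(delay_s)


def start_and_wait(delay_s: float) -> float:
    """Start a worker thread, join it, and return how long the caller was blocked."""
    started = time.monotonic()
    worker = threading.Thread(target=slow_worker, args=(delay_s,))
    worker.start()
    # The caller cannot continue until the worker exits, which is the
    # situation the main thread appears to be in at the futex() call.
    worker.join()
    return time.monotonic() - started


if __name__ == "__main__":
    print(f"main thread blocked for {start_and_wait(0.5):.2f}s")
```

The point of the sketch is that the main thread itself is idle; how long it stays blocked is decided entirely by what the helper thread is doing.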
Attaching strace -p to the same ID shows:
...
restart_syscall(<... resuming interrupted poll ...>) = 0
poll([{fd=4, events=POLLOUT}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 1000) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 1000) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 1000) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 1000) = 0 (Timeout)
...
The poll calls in this thread keep timing out until, several minutes later:
...
poll([{fd=4, events=POLLOUT}], 1, 1000) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 1000) = 1 ([{fd=4, revents=POLLOUT|POLLERR|POLLHUP}])
poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT|POLLERR|POLLHUP}])
getsockopt(4, SOL_SOCKET, SO_ERROR, [110], [4]) = 0
close(4) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [ALRM], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [ALRM], NULL, 8) = 0
nanosleep({1, 0}, 0x2b84b06e3cf0) = 0
...
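At that moment the terminal running the original strace stops hanging and the software starts running (the +-+-+ in the output below marks the earlier stall point).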
...
clone(child_stack=0x2b84b06e3fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2b84b06e49d0, tls=0x2b84b06e4700, child_tidptr=0x2b84b06e49d0) = 44948
futex(0x2b84b06e49d0, FUTEX_WAIT, 44948, NULL+-+-+) = 0
clone(child_stack=0x2b84b06e3fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2b84b06e49d0, tls=0x2b84b06e4700, child_tidptr=0x2b84b06e49d0) = 50072
rt_sigprocmask(SIG_BLOCK, [CHLD], [ALRM], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [ALRM], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffc6071fd30) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [ALRM], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [ALRM], NULL, 8) = 0
nanosleep({1, 0}, 0x7ffc6071fd30) = 0
...
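The poll on `POLLOUT` followed by `getsockopt(SO_ERROR)` and `close` in the thread trace above matches the usual pattern for a non-blocking TCP connect, and on Linux errno 110 is `ETIMEDOUT` ("Connection timed out"). The sketch below reproduces that pattern in Python under stated assumptions: `probe_connect` is a hypothetical helper, it uses a single poll with an overall timeout instead of the software's repeated 0 ms and 1000 ms polls, it expects an IP address rather than a hostname, and `select.poll` is not available on Windows.

```python
import errno
import os
import select
import socket
import time
from typing import Tuple


def probe_connect(host: str, port: int, timeout_s: float) -> Tuple[int, float]:
    """Non-blocking TCP connect. Returns (errno value, seconds waited); 0 means connected."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    started = time.monotonic()
    try:
        rc = sock.connect_ex((host, port))
        if rc not in (0, errno.EINPROGRESS):
            # Failed immediately, e.g. connection refused or network unreachable.
            return rc, time.monotonic() - started
        poller = select.poll()
        poller.register(sock.fileno(), select.POLLOUT)
        if not poller.poll(int(timeout_s * 1000)):
            # Our own deadline expired; reported as ETIMEDOUT for simplicity,
            # although the kernel has not given up on the connection yet.
            return errno.ETIMEDOUT, time.monotonic() - started
        # Same check as getsockopt(4, SOL_SOCKET, SO_ERROR, ...) in the trace.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        return err, time.monotonic() - started
    finally:
        sock.close()


if __name__ == "__main__":
    # 127.0.0.1:80 is only a placeholder target for trying the helper out.
    err, waited = probe_connect("127.0.0.1", 80, timeout_s=2.0)
    name = errno.errorcode.get(err, "OK" if err == 0 else "?")
    print(f"{name}: {os.strerror(err)} after {waited:.2f}s")
```

Called with the address and port of a suspect endpoint, a result of `ETIMEDOUT` after the full timeout would indicate that packets to that endpoint are being silently dropped rather than refused.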
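I do not work in a computing-related field and cannot fully interpret what this output is saying, so I would be very grateful if someone familiar with the subject could point me toward a way to resolve the problem.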
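Follow-up:
A closer reading of the strace -p output shows that the software tries to connect to two addresses, 198.182.50.26 and 149.117.73.80, located in Tokyo and California respectively. With the machine disconnected from the network, the software starts normally.
Because the machine involved is production equipment, it will stay offline for now. If anyone knows how to actually solve this problem, I would greatly appreciate the advice.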
...
poll([{fd=4, events=POLLIN}], 1, 4996) = 1 ([{fd=4, revents=POLLIN}])
ioctl(4, FIONREAD, [43]) = 0
recvfrom(4, "\366C\201\200\0\1\0\0\0\0\0\0\4srv1\7updates\10synops"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.50.1")}, [16]) = 43
close(4) = 0
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
fcntl(4, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("198.182.50.26")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=4, events=POLLOUT}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 0) = 0 (Timeout)
...
and
...
poll([{fd=4, events=POLLOUT}], 1, 199) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 0) = 0 (Timeout)
poll([{fd=4, events=POLLOUT}], 1, 1000) = 1 ([{fd=4, revents=POLLOUT}])
poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
getsockopt(4, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getpeername(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("149.117.73.80")}, [16]) = 0
getsockname(4, {sa_family=AF_INET, sin_port=htons(54164), sin_addr=inet_addr("192.168.50.157")}, [16]) = 0
sendto(4, "POST /swupdate/ HTTP/1.1\r\nHost: "..., 228, MSG_NOSIGNAL, NULL, 0) = 228
...
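The `recvfrom` line above comes from port 53 on 192.168.50.1, so the buffer is presumably a DNS reply, and its visible bytes can be decoded to see which name was being looked up. This sketch parses only the prefix that strace printed (the rest is truncated, so the full name cannot be recovered from this capture). `CAPTURED` and `parse_dns_prefix` are hypothetical names, and DNS name compression pointers are not handled.

```python
import struct
from typing import Dict, List, Optional, Tuple

# Visible prefix of the recvfrom() buffer, copied from the strace line.
# strace cut the string off after "synops", so the data is incomplete.
CAPTURED = b"\366C\201\200\0\1\0\0\0\0\0\0\4srv1\7updates\10synops"


def parse_dns_prefix(data: bytes) -> Tuple[Dict[str, int], List[str], Optional[str]]:
    """Return (header fields, complete labels, partial trailing label or None)."""
    # A DNS header is six big-endian 16-bit fields (12 bytes in total).
    ident, flags, questions, answers, _ns, _ar = struct.unpack("!6H", data[:12])
    header = {"id": ident, "flags": flags, "questions": questions, "answers": answers}
    labels: List[str] = []
    pos = 12
    while pos < len(data):
        length = data[pos]
        if length == 0:  # a zero-length label terminates the name
            return header, labels, None
        chunk = data[pos + 1 : pos + 1 + length]
        if len(chunk) < length:  # the capture ends inside this label
            return header, labels, chunk.decode("ascii", "replace")
        labels.append(chunk.decode("ascii", "replace"))
        pos += 1 + length
    return header, labels, None


if __name__ == "__main__":
    header, labels, partial = parse_dns_prefix(CAPTURED)
    print(f"flags=0x{header['flags']:04x} questions={header['questions']} "
          f"answers={header['answers']}")
    print("complete labels:", labels, "partial label:", partial)
```

On the captured prefix this yields the complete labels `srv1` and `updates`, a partial third label beginning with `synops`, one question and zero answer records; the flags value 0x8180 marks a standard response with no error code.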
https://blog.csdn.net/weixin_30681615/article/details/95199844