Server is logging errors; how can I resolve them? (Keyword: reboot)

Symptoms and background

The server keeps rebooting for no apparent reason.
Hardware: R4950 G3 12LFF ROME
CPU: AMD EPYC 7502 32-Core Processor
OS: Red Hat 7.8

Output and error messages

[root@data42 ~]# grep -E 'error|Error|ERROR|fail|FAIL' /var/log/messages
Nov 21 16:15:04 data42 kernel: pci 0000:42:00.0: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
Nov 21 16:15:04 data42 kernel: pci 0000:42:00.1: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
Nov 21 16:15:04 data42 kernel: pci 0000:42:00.1: BAR 11: failed to assign [mem size 0x00040000 64bit pref]
Nov 21 16:15:04 data42 kernel: pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
Nov 21 16:15:04 data42 kernel: pci 0000:01:00.1: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
Nov 21 16:15:04 data42 kernel: pci 0000:01:00.1: BAR 11: failed to assign [mem size 0x00040000 64bit pref]
Nov 21 16:15:04 data42 kernel: pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
Nov 21 16:15:04 data42 kernel: pci 0000:01:00.1: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
Nov 21 16:15:04 data42 kernel: pci 0000:01:00.1: BAR 11: failed to assign [mem size 0x00040000 64bit pref]
Nov 21 16:15:04 data42 kernel: ERST: Error Record Serialization Table (ERST) support is initialized.
Nov 21 16:15:04 data42 systemd-udevd: Change current work dir to /etc/ndctl/keys failed: No such file or directory
Nov 21 16:15:07 data42 kernel: bnxt_en 0000:42:00.0: bnxt_re: probe error: RoCE is not supported on this device
Nov 21 16:15:07 data42 kernel: bnxt_en 0000:42:00.1: bnxt_re: probe error: RoCE is not supported on this device
Nov 21 16:15:07 data42 kernel: bnxt_en 0000:01:00.0: bnxt_re: probe error: RoCE is not supported on this device
Nov 21 16:15:07 data42 kernel: bnxt_en 0000:01:00.1: bnxt_re: probe error: RoCE is not supported on this device
Nov 21 16:15:09 data42 augenrules: failure 1
Nov 21 16:15:09 data42 augenrules: failure 1
Nov 21 16:15:09 data42 mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor. Please use the edac_mce_amd module instead.
Nov 21 16:15:09 data42 mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor. Please use the edac_mce_amd module instead.
Nov 21 16:15:09 data42 systemd: Unit mcelog.service entered failed state.
Nov 21 16:15:09 data42 systemd: mcelog.service failed.
Nov 21 16:15:12 data42 teamd_Mdcnteam0[2067]: Loop callback failed with: No such file or directory
Nov 21 16:15:17 data42 rsyslogd: imjournal: fscanf on state file `/var/lib/rsyslog/imjournal.state' failed [v8.24.0-52.el7 try http://www.rsyslog.com/e/2027 ]
Nov 21 16:15:33 data42 gnome-session: libEGL warning: DRI2: failed to authenticate
Nov 21 16:15:34 data42 journal: failed to get edid data: EDID length is too small
Nov 21 16:15:34 data42 journal: Error looking up permission: GDBus.Error:org.freedesktop.portal.Error.NotFound: No entry for geolocation
Nov 21 16:15:34 data42 journal: failed to get edid: unable to get EDID for output

[root@data42 ~]# grep -E 'error|ERROR|Error|fail|Fail|FAIL' /var/log/dmesg
[ 4.517733] pci 0000:42:00.0: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
[ 4.517737] pci 0000:42:00.1: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
[ 4.517741] pci 0000:42:00.1: BAR 11: failed to assign [mem size 0x00040000 64bit pref]
[ 4.517811] pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
[ 4.517815] pci 0000:01:00.1: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
[ 4.517819] pci 0000:01:00.1: BAR 11: failed to assign [mem size 0x00040000 64bit pref]
[ 4.517823] pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
[ 4.517826] pci 0000:01:00.1: BAR 9: failed to assign [mem size 0x01000000 64bit pref]
[ 4.517830] pci 0000:01:00.1: BAR 11: failed to assign [mem size 0x00040000 64bit pref]
[ 4.933457] ERST: Error Record Serialization Table (ERST) support is initialized.
[ 9.274408] bnxt_en 0000:42:00.0: bnxt_re: probe error: RoCE is not supported on this device
[ 9.274414] bnxt_en 0000:42:00.1: bnxt_re: probe error: RoCE is not supported on this device
[ 9.274417] bnxt_en 0000:01:00.0: bnxt_re: probe error: RoCE is not supported on this device
[ 9.274421] bnxt_en 0000:01:00.1: bnxt_re: probe error: RoCE is not supported on this device

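First, run lspci -vv to list the PCI memory sizes of the devices. Increasing the memory available to them should fix the "BAR ...: failed to assign" errors.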
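
To narrow the lspci -vv output down to the devices named in the log, something like the following sketch may help. The helper names failed_bar_devices and show_bar_regions are made up for illustration. The grep patterns assume the usual "Region N:" and "... behind bridge" wording of lspci -vv, which can vary between pciutils versions.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of any tool.

# Print the unique PCI addresses that reported "failed to assign".
# Reads kernel log text on stdin (for example, the output of dmesg).
failed_bar_devices() {
    grep 'failed to assign' |
        grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' |
        sort -u
}

# For each PCI address on stdin, print the memory-related lines from lspci -vv.
# Assumes lspci (pciutils) is installed; run as root to see full details.
show_bar_regions() {
    while read -r addr; do
        echo "== $addr =="
        lspci -vv -s "$addr" | grep -E 'Region [0-9]+:|behind bridge'
    done
}

# Usage on the affected host:
#   dmesg | failed_bar_devices | show_bar_regions
```

With the log above, the first function should yield the four addresses 0000:01:00.0, 0000:01:00.1, 0000:42:00.0 and 0000:42:00.1. The sizes lspci reports for them can then be compared against the sizes in the "failed to assign" messages (0x01000000 and 0x00040000).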