The PID of the process is 1996291. There are 65534 fds in /proc/1996291/fd, and most of them are sockets, like this:
lrwx------ 1 root root 64 Dec 30 13:59 10000 -> socket:[952574733]
lrwx------ 1 root root 64 Dec 30 13:59 10001 -> socket:[952566188]
I know that the number in brackets is the inode of the socket. I expected every socket to have a matching inode in /proc/net/tcp. However, some inodes can be found there and some can't:
cat /proc/net/tcp | grep 952574733
When the inode is found, the output looks like this:
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
336: 4114C80A:271A 1914C80A:0CEA 01 00000000:00000000 02:0000BE1B 00000000 0 0 962759319 2 ffff88035a20cb00 20 4 30 10 16
This is a real connection.
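The manual grep above can be automated. The following is a minimal sketch, not a definitive tool: the function names are hypothetical, and checking /proc/net/tcp6 in addition to /proc/net/tcp is an assumption (the process may hold IPv6 sockets, which are listed in a separate table). It lists the socket inodes held by a PID that appear in neither table. A socket missing from both tables may simply not be connected at the moment, even though its fd is still open.

```python
import os
import re

_SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def socket_inode(link_target):
    """Return the inode from a link target like 'socket:[952574733]', else None."""
    match = _SOCKET_LINK.match(link_target)
    return int(match.group(1)) if match else None


def tcp_table_inodes(table_text):
    """Collect the inode column (10th field) from /proc/net/tcp-style text."""
    inodes = set()
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows start with a slot number such as '336:'; the header does not.
        if len(fields) > 9 and fields[0].endswith(":") and fields[9].isdigit():
            inodes.add(int(fields[9]))
    return inodes


def unmatched_socket_inodes(pid, tables=("/proc/net/tcp", "/proc/net/tcp6")):
    """Socket inodes held by `pid` that appear in none of the given tables."""
    known = set()
    for path in tables:
        try:
            with open(path) as table:
                known |= tcp_table_inodes(table.read())
        except FileNotFoundError:
            pass  # e.g. IPv6 is disabled, so tcp6 does not exist
    fd_dir = f"/proc/{pid}/fd"
    held = set()
    for name in os.listdir(fd_dir):
        try:
            inode = socket_inode(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # the fd was closed while we were scanning
        if inode is not None:
            held.add(inode)
    return sorted(held - known)
```

Calling unmatched_socket_inodes(1996291) as root on the affected machine would return the inodes that grep cannot find.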
I ran netstat -tnp to list the connections and got a great many TIME_WAIT connections. I don't know whether they are related to my problem.
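To see how the connections are distributed across TCP states, the st column of /proc/net/tcp can be tallied. This is a small sketch with a hypothetical function name; the hex codes are the kernel's TCP state numbers. Note that a TIME_WAIT entry is kept by the kernel after the socket has been closed and no longer belongs to any process fd, so TIME_WAIT connections would not by themselves account for the entries under /proc/1996291/fd. CLOSE_WAIT, by contrast, means the peer has closed and the local application has not yet closed its side.

```python
from collections import Counter

# Kernel TCP state codes as they appear in the 'st' column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(table_text):
    """Count rows per TCP state in /proc/net/tcp-style text."""
    counts = Counter()
    for line in table_text.splitlines():
        fields = line.split()
        # Skip the header row; data rows start with a slot number like '336:'.
        if len(fields) > 3 and fields[0].endswith(":"):
            code = fields[3].upper()
            counts[TCP_STATES.get(code, code)] += 1
    return counts
```

For example, count_tcp_states(open("/proc/net/tcp").read()) gives the per-state totals for IPv4 connections.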
I also ran lsof -p 1996291. The output shows a great many sockets, like this:
app 1996291 root *520u sock 0,8 0t0 953021420 protocol: TCP
app 1996291 root *521u sock 0,8 0t0 953027193 protocol: TCP
app 1996291 root *522u sock 0,8 0t0 953021422 protocol: TCP
app 1996291 root *523u sock 0,8 0t0 953038715 protocol: TCP
These three kernel options have been set to 1:
net.ipv4.tcp_tw_reuse
net.ipv4.tcp_tw_recycle
net.ipv4.tcp_syncookies
I haven't been able to solve this problem for several days. Can anyone help me?
Every open socket in a process holds a file descriptor. When the process has too many open connections, it reaches its open-file limit, further attempts to open sockets or files fail, and the application can crash.
You can prevent this by limiting the number of connections open at the same time, or by properly closing the fds, for example by closing the body of every response you receive. Quickly recycling sockets may also help.
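The question does not say what language the application is written in, so the following is only an illustration in Python of the "always close what you open" advice. The function fetch_once and its parameters are hypothetical. The with block guarantees that the socket's fd is released even if sending or receiving raises an exception.

```python
import socket


def fetch_once(host, port, payload=b"ping"):
    """Open a TCP connection, exchange one message, and always close the socket."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
        reply = conn.recv(4096)
    # Leaving the 'with' block closed the socket; fileno() is -1 once closed.
    return reply, conn.fileno()
```

HTTP client libraries generally offer the same pattern for responses: read the body inside a with block, or call close on it explicitly, so the underlying connection is released or reused instead of leaking.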
Another, hackier approach is to raise the limit on open files with:
ulimit -n [new limit]
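The same limit can be inspected, and the soft limit raised, from inside a process. This is a sketch using Python's standard resource module; the helper name raise_soft_nofile is hypothetical. An unprivileged process can raise its soft limit only up to the hard limit, and raising the limit only postpones the failure if fds are leaking.

```python
import resource


def raise_soft_nofile(target):
    """Raise the soft open-file limit toward `target`, capped at the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if hard != unlimited:
        target = min(target, hard)
    # Only ever raise the limit; never lower an existing (or unlimited) one.
    if soft != unlimited and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

The function returns the (soft, hard) pair in effect after the call, so the caller can check how far the limit was actually raised.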