How to determine that a server is nearing its full potential load

My server running PHP/MySQL has been producing thousands of 502 Bad Gateway errors per minute during peak hours.

The server configuration is:

  • Dedicated server
  • 24 virtual cores
  • 120 GB RAM
  • 1.5 TB HDD

The first fix was switching from FastCGI to Nginx.

The second fix, some time later, was adding a few more database indexes and optimizing the code.

Some time later, during peak hours, several thousand connections again end in 502 Bad Gateway, while the connections that do not fail are fast and smooth.

My error log is filled with the following:

 1708#0: *66048341 connect() to unix:***/php-fpm.sock failed (11: Resource temporarily unavailable) while connecting to upstream

[alert] 1708#0: *66066227 socket() failed (24: Too many open files) while connecting to upstream
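To put numbers on these two error types, the log lines can be tallied by errno. On Linux, errno 11 is EAGAIN, which for a non-blocking connect() to a unix socket usually means the php-fpm socket's listen backlog is full; errno 24 is EMFILE, meaning the process has hit its open-file limit. The following is a minimal sketch, assuming lines in the format shown above where the errno and message appear in parentheses; the function name `count_upstream_errors` is hypothetical.

```python
import re
from collections import Counter

# Matches the "(errno: message)" part of nginx error-log lines such as
# "connect() to unix:.../php-fpm.sock failed (11: Resource temporarily unavailable)".
ERRNO_RE = re.compile(r"\((\d+): ([^)]+)\)")


def count_upstream_errors(lines):
    """Count error-log lines by (errno, message); lines without an errno are skipped."""
    counts = Counter()
    for line in lines:
        match = ERRNO_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts
```

Passing an open log file works as well as a list, e.g. `count_upstream_errors(open(path)).most_common()`, where `path` is whatever the nginx `error_log` directive points to. Running it over logs from peak and off-peak hours shows which limit is hit first.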

During peak hours, phpMyAdmin shows at most 20% CPU load and 50% RAM usage.
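A CPU percentage alone does not show whether work is queuing. A common complementary measure is the load average divided by the number of cores: on this 24-core machine, sustained values near 1.0 per core mean runnable work is waiting, and on Linux the figure also counts processes blocked in uninterruptible disk I/O, which can matter on an HDD. A minimal sketch, assuming a Unix system (`os.getloadavg()` is not available on Windows); `load_per_core` is a hypothetical helper name.

```python
import os


def load_per_core(load_avg=None, cores=None):
    """Return the 1-, 5- and 15-minute load averages divided by the core count.

    Defaults read the live values (Unix only); both can be passed in for testing.
    """
    if load_avg is None:
        load_avg = os.getloadavg()
    if cores is None:
        cores = os.cpu_count()
    return tuple(round(x / cores, 2) for x in load_avg)
```

For example, `load_per_core((12.0, 6.0, 24.0), 24)` gives `(0.5, 0.25, 1.0)`; the last value would indicate that over 15 minutes the cores were fully occupied on average.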

So the question is: how can I detect weak spots in the code and poor index usage, and how can I determine that a server is reaching its full potential load?
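One resource that CPU and RAM figures do not show is file descriptors, which the errno 24 lines point to: each proxied request holds at least a client socket and an upstream socket. On Linux, a process's open descriptors are listed under `/proc/<pid>/fd` and its limits in `/proc/<pid>/limits`, so usage can be compared against the limit for, say, the nginx worker whose PID appears before `#0` in the log lines. A minimal sketch, assuming Linux and sufficient permissions to read another process's `/proc` entries; `parse_max_open_files` and `fd_usage` are hypothetical names. In nginx the relevant settings are `worker_rlimit_nofile` and `worker_connections`.

```python
import os


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None when the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # columns: Max open files <soft> <hard> <units>
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for a process, read from /proc (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        return open_fds, parse_max_open_files(f.read())
```

Sampling `fd_usage(worker_pid)` during peak hours shows how close the worker gets to its limit before the "Too many open files" errors start.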