For my current web development project I'm implementing a back-end system that will flag errors and automatically email the administrator with details about what occurred. Trapping the error and generating the email with the appropriate error information is fairly straightforward, but a problem arises with certain groups of error types, especially if the site is being visited frequently.
Consider a couple of examples: if the database connection fails, every page request triggers the same error; likewise, a bug on a frequently visited page fires on every hit. Either way, the administrator's inbox gets flooded with thousands of near-identical emails.
What are some approaches/strategies I can use to prevent this scenario from occurring? (I am only interested in monitoring errors generated by the script; infrastructure issues are beyond the scope of this solution.)
I'm going to assume that I can almost always uniquely identify an error using a digest of some of the values passed to the error-handler callback registered with set_error_handler.
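For the record, here's a minimal sketch of the kind of digest I have in mind (the hashing scheme is just an illustration):

```php
<?php
// Minimal sketch: derive a stable identifier from the values PHP passes
// to the set_error_handler() callback, so repeated occurrences of the
// same error map to the same key.
function error_digest(int $errno, string $errstr, string $errfile, int $errline): string
{
    return md5($errno . '|' . $errstr . '|' . $errfile . ':' . $errline);
}

set_error_handler(function ($errno, $errstr, $errfile, $errline) {
    $id = error_digest($errno, $errstr, $errfile, $errline);
    // ... decide whether to notify based on $id ...
    return false; // let PHP's built-in error handling continue
});
```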
The first and probably most obvious solution is to record each error in a database and only send the email if a reasonable minimum period of time has passed since that error last occurred. This isn't ideal, especially if the database itself is causing the problem. Another solution would be to write a file to disk when an error occurs and check whether a reasonable minimum period of time has passed since the file was last modified. Is there any mechanism for solving this problem beyond the two methods I have described?
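To make the second idea concrete, here's a rough sketch of the file-on-disk variant (the marker location and quiet period are placeholders):

```php
<?php
// Rough sketch: one marker file per error digest; only send the email
// when the marker is older than the quiet period.
function should_notify(string $digest, int $quietPeriod = 300): bool
{
    $marker = sys_get_temp_dir() . '/error-' . $digest; // placeholder location

    if (file_exists($marker) && (time() - filemtime($marker)) < $quietPeriod) {
        return false; // emailed about this error recently; stay quiet
    }
    touch($marker); // record this occurrence and reset the clock
    return true;
}
```

Inside the handler above, the email would only go out when should_notify($id) returns true; touch() doubles as the record of the last notification, so no database is involved.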
Have you tried looking into monitoring software like SiteScope?
Why not simply allow them all to be sent out, then collect and store them in a database on the recipient's end? That way you bypass the possibility of the database being the problem on the server.
Also, a greater advantage in my opinion is that you don't arbitrarily throw out valuable forensic data. Post hoc analysis is very important, and any kind of filtering could make it incredibly difficult, or even impossible.
What I did was monitor the error log and send a digest every 5 minutes. I'd like to think it's because of my high-quality code (rather than an unpopular app!), but I don't get hassled too much :P I basically read the log file from end to start, parse the error messages, stop when a timestamp is older than the last time the job ran, and then send a simple email.
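Roughly like this (the paths and log format are assumptions, and for brevity this loads the whole file rather than seeking backwards from the end):

```php
<?php
// Sketch of the cron digest job: read the PHP error log newest-first,
// keep entries since the last run, then mail them as a single digest.
$logFile   = '/var/log/php_errors.log';    // illustrative path
$stateFile = '/tmp/error-digest.last';     // stores last run timestamp
$lastRun   = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

$entries = [];
foreach (array_reverse(file($logFile, FILE_IGNORE_NEW_LINES)) as $line) {
    // Default PHP log lines start with "[21-Feb-2024 10:00:00 UTC] ..."
    // (continuation lines without a timestamp are simply skipped here).
    if (preg_match('/^\[([^\]]+)\]\s*(.*)/', $line, $m)) {
        $ts = strtotime($m[1]);
        if ($ts !== false && $ts < $lastRun) {
            break; // everything older was handled on the previous run
        }
        $entries[] = $line;
    }
}

if ($entries) {
    // Restore chronological order before sending.
    mail('admin@example.com', 'Error digest', implode("\n", array_reverse($entries)));
}
file_put_contents($stateFile, (string) time());
```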
This works well enough. However, if you use POST a lot, there is only a limited amount of information you can recover by correlating the Apache access log with your PHP error log, since POST bodies aren't logged by default. I remember reading about a module to log POSTs to a file from within Apache, but I don't remember the specifics.
However, if you're willing to use the error handler to write somewhere, that might be best, as you have access to much more information: IP address, session ID (and any user information, which might affect settings like pagination or whatever), function arguments (via debug_backtrace()), and so on. Write every error, but only send messages when a new error occurs, or after an error has been acknowledged (if you care to build such a system).
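A rough sketch of what such a handler could look like (file locations are illustrative, and the "new error" test is just a marker file per digest):

```php
<?php
// Sketch of "write everything, mail only new errors".
set_error_handler(function ($errno, $errstr, $errfile, $errline) {
    $digest = md5("$errno|$errstr|$errfile:$errline");

    // Capture the richer context only available inside the handler.
    $record = [
        'time'    => date('c'),
        'error'   => "$errstr in $errfile:$errline",
        'ip'      => $_SERVER['REMOTE_ADDR'] ?? 'cli',
        'session' => session_id(),
        'trace'   => debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS),
    ];

    // Log every occurrence...
    file_put_contents('/var/log/app-errors.log', json_encode($record) . "\n", FILE_APPEND);

    // ...but only mail the first time we see this particular error.
    $marker = sys_get_temp_dir() . '/seen-' . $digest;
    if (!file_exists($marker)) {
        touch($marker);
        mail('admin@example.com', 'New error: ' . $errstr, print_r($record, true));
    }
    return false; // let normal error handling continue
});
```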
You should go ahead and generate whatever log files you want, but instead of sending the emails yourself, hook the logs up to a monitoring system like Nagios. Let the monitoring solution decide when to alert the admins, and how often.
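For example, a trivial Nagios-style check script: plugins just print one status line and exit 0 for OK, 1 for WARNING, 2 for CRITICAL (the path and threshold here are placeholders, and a real check would probably count new entries rather than testing the file's age):

```php
#!/usr/bin/env php
<?php
// Sketch of a Nagios-style check against the application error log.
$logFile = '/var/log/app-errors.log'; // illustrative path
$window  = 300;                       // look at the last 5 minutes

if (!is_file($logFile)) {
    echo "UNKNOWN - $logFile not found\n";
    exit(3);
}

// Crude freshness test: has the log been written to inside the window?
$age = time() - filemtime($logFile);
if ($age < $window) {
    echo "CRITICAL - errors logged {$age}s ago\n";
    exit(2);
}

echo "OK - no errors in the last {$window}s\n";
exit(0);
```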