Blocking robots from an API URL using PHP

I know I can disallow robots using robots.txt, but some search engines do not honor it. I have an API that my users call to send transactional data (insert, update, delete, etc.) through request parameters. When I look at my logs, though, I see a huge number of hits on my .php page. I searched Google for a way to block these from within my PHP API page and found nothing.

So I've come to SO for help from the experts: is there any way to block or disallow search engine robots from accessing my base API URL?

The main approaches that I know of for dealing with bots that are ignoring robots.txt are to either:

  1. Blacklist them via your firewall or server
  2. Only allow whitelisted users to access your API
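If you go the whitelist route inside PHP itself, a minimal sketch could look like the following. The `ALLOWED_IPS` list and the `isAllowedIp()` helper are hypothetical names for illustration, the addresses are documentation placeholders, and it assumes `$_SERVER['REMOTE_ADDR']` holds the real client IP (which may not be true behind a proxy or load balancer).

```php
<?php
// Hypothetical allowlist: replace with the addresses of your real API clients.
const ALLOWED_IPS = ['203.0.113.10', '198.51.100.7'];

/**
 * Returns true when the given client IP is on the allowlist.
 */
function isAllowedIp(string $ip, array $allowed = ALLOWED_IPS): bool
{
    // Strict comparison so only exact string matches pass.
    return in_array($ip, $allowed, true);
}

// Possible usage at the very top of the API script:
// if (!isAllowedIp($_SERVER['REMOTE_ADDR'] ?? '')) {
//     http_response_code(403);
//     exit('Forbidden');
// }
```

Doing the same thing in the firewall or web server is usually cheaper, since rejected requests never reach PHP at all.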

However, you should ask yourself whether they're having any impact on your website. If they're not flooding you with requests (which would amount to a denial-of-service attack) then you can probably safely ignore them and filter them out of your logs if you need to analyse real traffic.

If you're running a service that people use and you don't want it to be wide open to spam then here are a few more options on how to limit usage:

  1. Restrict access to your API just to your users by assigning them an API token
  2. Rate limit your API (either via the server and/or via your application)
  3. Read the User Agent (UA) of your visitors: a lot of bots identify themselves as bots or send obviously fake UAs, although the malicious ones will pretend to be regular users
  4. Implement more advanced measures such as limiting access to a region if a lot of requests suddenly come from there in a short period of time
  5. Use DDoS protection services such as CloudFlare
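As a rough illustration of options 1 and 3, here is a sketch of a request gate in PHP. Everything named here is an assumption for the example: the `X-API-Token` header, the `isValidToken()`, `looksLikeBot()` and `gateRequest()` helpers, and the UA pattern, which only catches bots that announce themselves.

```php
<?php
/**
 * Checks a client-supplied token against the issued tokens.
 * $validTokens is a hypothetical list; in practice it would come from your database.
 */
function isValidToken(string $token, array $validTokens): bool
{
    foreach ($validTokens as $valid) {
        // hash_equals() compares strings in a timing-safe way.
        if (hash_equals($valid, $token)) {
            return true;
        }
    }
    return false;
}

/**
 * Heuristic only: flags empty user agents and ones that announce themselves as crawlers.
 */
function looksLikeBot(string $userAgent): bool
{
    if (trim($userAgent) === '') {
        return true;
    }
    return preg_match('/bot|crawl|spider|slurp/i', $userAgent) === 1;
}

/**
 * Returns the HTTP status to answer with: 200 to proceed, 401 or 403 to reject.
 * $server is expected to look like $_SERVER.
 */
function gateRequest(array $server, array $validTokens): int
{
    if (looksLikeBot($server['HTTP_USER_AGENT'] ?? '')) {
        return 403;
    }
    // An "X-API-Token" request header shows up as HTTP_X_API_TOKEN in $_SERVER.
    if (!isValidToken($server['HTTP_X_API_TOKEN'] ?? '', $validTokens)) {
        return 401;
    }
    return 200;
}
```

In the API script you could call `gateRequest($_SERVER, $tokens)` first and stop with `http_response_code()` when the result isn't 200. The token check is what actually keeps crawlers out; the UA check mostly just reduces log noise, since a UA is trivial to fake.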

There's no perfect solution and each option involves trade-offs. If you're worried about DDoS then you could start by looking into your server's capabilities; for example, here's an introduction to how NGINX can control traffic: https://www.nginx.com/blog/rate-limiting-nginx/
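If you can't rate limit at the server level, the same idea can be approximated in the application. The sketch below is a simple fixed-window limiter; `allowRequest()`, its parameters and the default limits are hypothetical. It keeps its counters in a plain array to stay self-contained, but a plain array does not survive between PHP requests, so in a real deployment the state would have to live in shared storage such as APCu, Redis or a database.

```php
<?php
/**
 * Fixed-window rate limiter (illustrative sketch).
 *
 * $store maps a client key (an IP or API token) to [windowStart, count].
 * It is passed by reference so the caller decides where the state lives.
 */
function allowRequest(
    array &$store,
    string $clientKey,
    int $now,
    int $limit = 60,
    int $windowSeconds = 60
): bool {
    $entry = $store[$clientKey] ?? null;

    // First request from this client, or the previous window has expired:
    // start a new window and count this request.
    if ($entry === null || $now - $entry[0] >= $windowSeconds) {
        $store[$clientKey] = [$now, 1];
        return true;
    }

    // Still inside the window and the quota is used up.
    if ($entry[1] >= $limit) {
        return false;
    }

    // Still inside the window with quota left.
    $store[$clientKey][1]++;
    return true;
}
```

A caller would typically respond with HTTP 429 when `allowRequest()` returns false. Fixed windows allow short bursts at window boundaries, which is one of the trade-offs compared with NGINX's leaky-bucket approach described in the linked article.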

In a nutshell, any IP hitting your site can be a bot so you should defend by imposing limits and analysing behaviour, since there's no way to know for sure who is a malicious visitor and who isn't until they start using your service.