I know I can disallow robots using robots.txt, but some bots do not respect it. I have an API that my users send transactional info to (insert/update/delete etc.) via the API request parameters. When I look at my logs, I see a huge number of hits to my .php page. I googled for a way to block bots from my PHP API page and found nothing.
So I landed on SO to get help from experts: is there any way I can block/disallow search engine robots from accessing my base API URL?
The main approaches I know of for dealing with bots that ignore robots.txt are to block them outright (for example by IP address or User-Agent) or to limit how often they can hit you, as sketched below.
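For instance, a minimal PHP sketch of the User-Agent approach might look like this (the agent names and the 403 response are only placeholders, and determined bots can fake or omit the header entirely):

```
<?php
// Sketch: reject requests whose User-Agent matches a known crawler.
// The list is illustrative, not exhaustive.
$blockedAgents = ['Googlebot', 'bingbot', 'AhrefsBot', 'SemrushBot'];

$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

foreach ($blockedAgents as $agent) {
    if (stripos($userAgent, $agent) !== false) {
        http_response_code(403);
        exit('Forbidden');
    }
}

// ... continue handling the API request ...
```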
However, you should ask yourself whether they're actually having any impact on your website. If they're not flooding you with requests (which would effectively be a DDoS attack), you can probably safely ignore them and filter them out of your logs when you need to analyse real traffic.
If you're running a service that people use and you don't want it to be wide open to spam, there are a few more options for limiting usage, such as requiring an API key or authentication on every request and rate-limiting each client.
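As a rough illustration, an API-key check at the top of the PHP endpoint could look something like this (the X-Api-Key header name and the hard-coded keys are placeholders; in practice you'd look keys up in your user database):

```
<?php
// Sketch: require an API key on every request. The hard-coded keys stand
// in for a real server-side lookup of keys issued to your users.
$validKeys = ['example-key-1', 'example-key-2'];

$providedKey = $_SERVER['HTTP_X_API_KEY'] ?? '';

if (!in_array($providedKey, $validKeys, true)) {
    http_response_code(401);
    exit('Missing or invalid API key');
}

// ... authenticated request handling continues here ...
```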
There's no perfect solution, and each option involves trade-offs. If you're worried about DDoS, you could start by looking into your server's capabilities; for example, here's an introduction to how NGINX can control traffic: https://www.nginx.com/blog/rate-limiting-nginx/
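Based on that article, a minimal NGINX sketch might look roughly like this (the zone name, rate and location are placeholders for your own setup):

```
# Sketch: throttle each client IP to 10 requests per second on the API,
# allowing short bursts of up to 20 extra queued requests.
http {
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

    server {
        location /api/ {
            limit_req zone=api_limit burst=20;
            # ... fastcgi/proxy config for the PHP endpoint ...
        }
    }
}
```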
In a nutshell, any IP hitting your site could be a bot, so you should defend yourself by imposing limits and analysing behaviour: there's no way to know for sure whether a visitor is malicious until they start using your service.