How can I prevent spiders and crawlers from consuming an expensive API?

I am using an API that is pretty expensive: each call costs about 1 cent. I noticed that visits from spiders and crawlers generate thousands of calls to that API, and I am being charged for them. Is there a way to block the section of the webpage that shows content generated by that API, so that only actual visitors see it and no API calls are made when the page is crawled?

You could trigger the API call from the front end instead of making it while the page is rendered server-side. For example, on page load, send an AJAX request to your server, which then calls the API and returns the data.

Presumably, most spiders and crawlers just parse the HTML source and do not execute JavaScript, so they will not trigger the AJAX request and you will not be charged. Be aware that some crawlers, Googlebot among them, can render JavaScript, so this will not stop every bot. Also, if some of your visitors have JavaScript disabled, you should provide them with another way to get the results.
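As a rough illustration of the server side of that AJAX request, here is a minimal sketch. The file name `api-proxy.php`, the function `buildAjaxResponse()`, and the `callPaidApi()` helper mentioned in the comment are all hypothetical names, not part of the original setup; the paid API call is passed in as a callable so the sketch does not assume how you reach it.

```php
<?php
// api-proxy.php (hypothetical file name): requested by the page's JavaScript
// after load, so the paid API is only hit when a client actually runs the script.

/**
 * Calls the supplied fetcher and wraps its result in a JSON string.
 * $fetchExpensiveData is a stand-in for whatever code calls the paid API.
 */
function buildAjaxResponse(callable $fetchExpensiveData): string
{
    try {
        $payload = ['ok' => true, 'data' => $fetchExpensiveData()];
        return json_encode($payload, JSON_THROW_ON_ERROR);
    } catch (Throwable $e) {
        // Do not leak upstream error details to the browser.
        return json_encode(['ok' => false, 'data' => null]);
    }
}

// Possible use inside the endpoint (callPaidApi() is hypothetical):
// header('Content-Type: application/json');
// echo buildAjaxResponse(function () { return callPaidApi($_GET['q'] ?? ''); });
```

The page itself would then contain only a placeholder element, and a small script would request this endpoint and fill the placeholder in once the response arrives.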

Apart from this, you can reduce your cost by caching API responses so that you do not make the same call to the API several times in a row. Choose the cache lifetime according to how fresh the data needs to be.
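The caching idea could look roughly like the following file-based sketch. The function name `cachedCall()`, the cache file naming, and the use of the system temp directory are assumptions made for illustration; a real site might prefer APCu, Memcached, or Redis instead.

```php
<?php
/**
 * Returns the cached value for $key if it is younger than $ttlSeconds;
 * otherwise calls $fetch, stores its result as JSON, and returns it.
 * $fetch stands in for the code that calls the paid API.
 */
function cachedCall(string $key, int $ttlSeconds, callable $fetch, ?string $cacheDir = null)
{
    $cacheDir = $cacheDir ?? sys_get_temp_dir();
    $file = $cacheDir . '/api_cache_' . sha1($key) . '.json';

    // Serve from cache while the file is still fresh.
    if (is_file($file) && (time() - filemtime($file)) < $ttlSeconds) {
        $cached = json_decode(file_get_contents($file), true);
        if ($cached !== null) {
            return $cached;
        }
    }

    // Cache miss or stale entry: pay for one API call and remember the result.
    $fresh = $fetch();
    file_put_contents($file, json_encode($fresh), LOCK_EX);
    return $fresh;
}
```

With a lifetime of, say, 3600 seconds, repeated crawler hits on the same page within an hour would cost a single API call rather than one call per hit.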

There are many ways to keep crawlers away from your site or from specific pages. The problem is that you first need to decide which crawlers you want to block, as there are many types of them. As a starting point, note that Google does not honor the Crawl-delay directive in robots.txt, whereas Bing does; Bing also lets you adjust its crawl rate manually in its webmaster dashboard.

As you mentioned, you are working with PHP. If you are also using Apache, look at the Apache access log, which records every request Apache receives. By analyzing the log files you can see which crawlers are generating the traffic you are talking about. Once you know which crawlers are responsible for the heavy traffic, you can block them in your .htaccess file: requests coming from specific IP addresses or user agents can be answered with a 403 HTTP error or redirected anywhere you like.
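To make the log analysis concrete, here is a small sketch that tallies requests per user agent. It assumes the log uses Apache's "combined" format, where the user agent is the last quoted field on each line; the function name `countUserAgents()` and the log path in the usage comment are assumptions, and user agents containing escaped quotes are not handled.

```php
<?php
/**
 * Tallies requests per user agent from Apache "combined" log lines and
 * returns [userAgent => count], sorted with the busiest agent first.
 */
function countUserAgents(iterable $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // The combined format ends with: "referer" "user agent"
        if (preg_match('/"[^"]*" "([^"]*)"\s*$/', $line, $m)) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }
    arsort($counts); // highest counts first, keys preserved
    return $counts;
}

// Possible use (the path is an assumption and varies by system):
// print_r(array_slice(countUserAgents(file('/var/log/apache2/access.log')), 0, 20, true));
```

The agents at the top of the resulting list are the candidates for a block rule.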

I came up with this, but I am still looking for better ideas:

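<?php
// Treat a missing User-Agent header as an empty string to avoid a PHP notice.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

// Case-insensitive match, so capitalized names such as "Slurp" also match.
if (preg_match('/slurp|inktomisearch|grub|bot|archiver|sqworm/i', $userAgent)) {
    include 'no-api-call.php';   // likely a crawler: skip the paid API
} else {
    include 'yes-api-call.php';  // likely a real visitor
}
?>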