We have an App Engine app that handles an average .5 requests per second, and seemingly all those requests can be handled by the same instance running a Go app as the main version.
However, sometimes App Engine kicks off a second instance (and sometimes even a third one), that doesn't seem to do anything past handling one or two requests. Here's an example.
Shutting down that instance manually doesn't seem to cause any harm, so my question is, why does App Engine not kill the instance after it did not get any requests for a while? (The above example had four requests in the past hour, often the requests/age ratio gets even lower).
A similar situation is when an instance is started on a different version. App Engine only seems to kill the instance after hours of not getting any requests.
I wish I knew what controls if/when it kills idle instances, but I can't see any documentation of it.
To avoid excess instances starting, I think the main thing you can do here is increase the pending latency:
The Pending Latency slider controls how long requests spend in the pending queue before being served by an Instance of the default version of your application. If the minimum pending latency is high App Engine will allow requests to wait rather than start new Instances to process them. This can reduce the number of instance hours your application uses, but can result in more user-visible latency.
Even if you only average 4 requests/hour, if you happen to get two closely spaced I suppose it's possible it would start a new instance.
You can also see some small amount of information in the logs about why it started a new instance.
The "How Applications Scale" section of the Google App Engine documentation states:
Scaling in Instances
Each instance has its own queue for incoming requests. App Engine monitors the number of requests waiting in each instance's queue. If App Engine detects that queues for an application are getting too long due to increased load, it automatically creates a new instance of the application to handle that load.
App Engine also scales instances in reverse when request volumes decrease. This scaling helps ensure that all of your application's current instances are being used to optimal efficiency and cost effectiveness.
It also states you can "specify a minimum number of idle instances", and to "optimize for high performance or low cost" in the administration console.
Try setting the "Idle instances" field to something like 3 - 5, and "optimize for low cost" and see if that affects the instance kill time.