I have written a Go program that scales an Auto Scaling group (ASG) in AWS. It reads two environment variables, ASG_MAX_SCALE and ASG_MIN_SCALE, and checks the Desired Capacity of the ASG named in ASG_NAME. I run it as a CronJob in Kubernetes, and unfortunately the AWS API sometimes returns the wrong value.

It should work as follows: if the current Desired Capacity equals ASG_MAX_SCALE, set the Desired Capacity to ASG_MIN_SCALE, and vice versa. The problem is that sometimes, even though the current Desired Capacity is 2 (which I can see in the AWS console), the API answers with 3, so the ASG is not scaled up. It has also happened the other way around: the real Desired Capacity was 3, but the API answered with 2.

The problem is odd because it appears to happen randomly. When I run the Desired Capacity check locally (on my laptop), it always gives the right answer.
Here is the code:
    var newASGsize int64

    // Read the two capacities to toggle between.
    maxScaleASG, err := strconv.Atoi(os.Getenv("ASG_MAX_SCALE"))
    if err != nil {
        panic(err)
    }
    minScaleASG, err := strconv.Atoi(os.Getenv("ASG_MIN_SCALE"))
    if err != nil {
        panic(err)
    }

    svc := autoscaling.New(session.New())

    // Look up the ASG by name.
    describeInput := &autoscaling.DescribeAutoScalingGroupsInput{
        AutoScalingGroupNames: []*string{
            aws.String(os.Getenv("ASG_NAME")),
        },
    }
    fmt.Println("Instance Group: " + os.Getenv("ASG_NAME"))
    describeResult, err := svc.DescribeAutoScalingGroups(describeInput)
    awsErr(err)

    // Toggle: at the maximum, scale down to the minimum; otherwise scale up to the maximum.
    if *describeResult.AutoScalingGroups[0].DesiredCapacity == int64(maxScaleASG) {
        newASGsize = int64(minScaleASG)
    } else {
        newASGsize = int64(maxScaleASG)
    }

    updateInput := &autoscaling.UpdateAutoScalingGroupInput{
        AutoScalingGroupName: aws.String(os.Getenv("ASG_NAME")),
        DesiredCapacity:      aws.Int64(newASGsize),
    }
    _, err = svc.UpdateAutoScalingGroup(updateInput)
    awsErr(err)

    // Poll every 5 seconds until the group reports the new size as ready.
    for {
        describeResult, err := svc.DescribeAutoScalingGroups(describeInput)
        awsErr(err)
        if asgReadinessCheck(describeResult.AutoScalingGroups[0].Instances, newASGsize) {
            break
        }
        fmt.Println("The instance group is not ready. Sleeping for 5 seconds...")
        time.Sleep(5 * time.Second)
    }
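To make the intended behaviour explicit, here is the toggle rule isolated from the AWS calls. This is only a minimal sketch: `nextCapacity` is a name made up for illustration and does not exist in my program, and the values 2 and 3 are the example capacities from the description above.

```go
package main

import "fmt"

// nextCapacity is a hypothetical helper (not part of the real program) that
// mirrors the toggle rule: if the group is at the maximum, go to the minimum;
// for any other value, go to the maximum.
func nextCapacity(current, minScale, maxScale int64) int64 {
	if current == maxScale {
		return minScale
	}
	return maxScale
}

func main() {
	// Example capacities from the question: minimum 2, maximum 3.
	for _, current := range []int64{2, 3} {
		fmt.Printf("desired=%d -> new=%d\n", current, nextCapacity(current, 2, 3))
	}
}
```

With this rule, a read of 3 when the real Desired Capacity is 2 makes the job set the capacity to 2 again, which is exactly the "no upscaling" behaviour I am seeing.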
Can someone please tell me if I am doing something wrong?
Thanks in advance.