Go的连续MySQL查询在某些时候变慢了

I'm writing a job in go that goes through some MySQL tables, selects some of the rows based on some critera, extracts email addresses from them and sends an email to each.

The filtering process looks at a table (let's call it storage) which is pretty big (~6gb dumped) and looks like this:

Columns:
id      varchar(64) PK
path    varchar(64) PK
game    varchar(64)
guid    varchar(64)
value   varchar(512)
timestamp   timestamp

There are two indices: (id, path) (the PK as seen above) and guid.

The job first retrieves a long list of guids from one table, then batches them and performs consecutive queries like this on the storage table:

SELECT guid, timestamp FROM storage 
WHERE game = 'somegame' 
AND path = 'path' AND value = 'value' AND timestamp >= '2015-04-22 00:00:00.0' AND timestamp <= '2015-04-29T14:53:07+02:00' 
AND guid IN ( ... )

Where the IN clause contains a list of guids.

I need to retrieve timestamps to be able to filter further.

When running against my local MySQL, everything works as expected, the query takes about 180ms with batches of 1000 guids.

When running against the same DB on Amazon RDS, the queries begin quick, but after some point, they suddenly start taking around 30 seconds, and continue to do so until the job ends.

I have tried many many things to fix this, but can't figure out the reason. Some notes:

  • The job uses only one sql.DB object. Also, I prepare the above statement once and reuse it quite heavily.
  • At first, I thought it was because the RDS DB was running MySQL 5.5, and I was running 5.6. I made a replica of the RDS DB, upgraded to 5.6, ran the job again. The problem happened again.
  • The volume of data in the two databases is the same: I dumped the production database and imported it into my local database and ran the job. Same behaviour (it still ran quickly locally).
  • The AWS monitoring of the RDS nodes doesn't show any significant spikes. The CPU usage jumps from 1% to up to 10%, and the job seems to open just a few connections (~4).
  • I had a colleague run the job on their PC, pointing to my MySQL DB, just to make sure the great performance didn't stem from the fact that the connection was local. It ran just as quickly as on my PC (admittedly, over LAN).
  • I ran the job against RDS both from my local PC and from an Amazon EC2 node, which is considerably closer to RDS. From EC2, it performed better, but the problem still appeared.
  • The job is very concurrent, each step has input and output channels (with buffer sizes of 1000), and the work is performed by goroutines. Between the steps, I have other goroutines that batch the output of the previous goroutine.
  • The slowdown is sudden, one query takes milliseconds, and the next one takes tens of seconds.

I haven't the faintest idea why this happens. Any suggestions would be appreciated.

So, after lots and lots of experimentation, I found the solution.

I am using Magnetic Storage on the RDS instances involved, which guarantees approximately 100 IOPS. This limited the speed with which we could query the data.

I tested using 2000 Provisioned IOPS, and the job ran quickly all the way.