I'm writing a job in go that goes through some MySQL tables, selects some of the rows based on some critera, extracts email addresses from them and sends an email to each.
The filtering process looks at a table (let's call it storage
) which is pretty big (~6gb dumped) and looks like this:
Columns:
id varchar(64) PK
path varchar(64) PK
game varchar(64)
guid varchar(64)
value varchar(512)
timestamp timestamp
There are two indices: (id, path)
(the PK as seen above) and guid
.
The job first retrieves a long list of guids from one table, then batches them and performs consecutive queries like this on the storage
table:
SELECT guid, timestamp FROM storage
WHERE game = 'somegame'
AND path = 'path' AND value = 'value' AND timestamp >= '2015-04-22 00:00:00.0' AND timestamp <= '2015-04-29T14:53:07+02:00'
AND guid IN ( ... )
Where the IN
clause contains a list of guids.
I need to retrieve timestamps to be able to filter further.
When running against my local MySQL, everything works as expected, the query takes about 180ms with batches of 1000 guids.
When running against the same DB on Amazon RDS, the queries begin quick, but after some point, they suddenly start taking around 30 seconds, and continue to do so until the job ends.
I have tried many many things to fix this, but can't figure out the reason. Some notes:
sql.DB
object. Also, I prepare the above statement once and reuse it quite heavily.I haven't the faintest idea why this happens. Any suggestions would be appreciated.
So, after lots and lots of experimentation, I found the solution.
I am using Magnetic Storage on the RDS instances involved, which guarantees approximately 100 IOPS. This limited the speed with which we could query the data.
I tested using 2000 Provisioned IOPS, and the job ran quickly all the way.