I wanted to get some numbers to back up my reading that DynamoDB's key-value storage has better read performance compared to a relational DB (MySQL, PostgreSQL, Aurora). So I decided to compare the read latencies of DynamoDB and AWS Aurora (which, according to the AWS website, is "up to five times faster than standard MySQL databases and three times faster than standard PostgreSQL databases").
Step 1: Created a table in Aurora with the following schema and added 1.02 million records to it.
CREATE TABLE gift_log (
    gift_uuid BINARY(16) NOT NULL,
    user_uuid BINARY(16) NOT NULL,
    parent_uuid BINARY(16),
    operation_time TIMESTAMP,
    operation VARCHAR(20) NOT NULL,
    gift_type VARCHAR(20) NOT NULL,
    parent_type VARCHAR(20),
    relation_type VARCHAR(20),
    PRIMARY KEY (gift_uuid)
);
Queried the table from a Go client using the MySQL driver for the database/sql package.
Step 2: Created a DynamoDB table with the following attributes and added 1 million items to it. I did NOT use a sort key; all queries used the partition key.
Table: GiftLog {
    gift_uuid Binary (Partition Key),
    user_uuid Binary,
    operation_time Number,
    operation String,
    gift_type String,
    parent_type String
}
Used Golang Client that uses AWS Go-SDK to query DynamoDB table.
Aurora
startTime := time.Now().UnixNano()
rows, err := db.Query("SELECT * FROM gift_log WHERE gift_uuid=?", giftIDsToRead[i])
endTime := time.Now().UnixNano()
DynamoDB
queryInput := &dynamodb.QueryInput{
    TableName: aws.String(tableName),
    KeyConditions: map[string]*dynamodb.Condition{
        "GiftUUID": {
            ComparisonOperator: aws.String("EQ"),
            AttributeValueList: []*dynamodb.AttributeValue{
                {B: giftIDsToRead[i]},
            },
        },
    },
}
startTime := time.Now().UnixNano()
resp, err := svc.Query(queryInput)
endTime := time.Now().UnixNano()
Aurora Latency: 543.89 usec, DynamoDB Latency: 2934.96 usec
These numbers don't seem right. Am I not comparing apples to apples?
I'd say you're comparing apples to oranges. If you know the primary key of a DynamoDB item, you should be using GetItem(), not Query().
Using GetItem(), you should see "single-digit millisecond" response times, excluding the network/HTTP latency.
That last bit is important, but should hopefully be similar for the requests to Aurora.
I think you're missing a few very important points:
- DynamoDB is a web service: every operation is an HTTP(S) request to a remote endpoint.
- A single timed query is not a benchmark; you need many samples and some statistics.
- DynamoDB's value proposition is consistent performance as your data grows, not raw speed on a small table.
The first point is important because it means that in order to get data from DynamoDB you're making web requests which have some degree of overhead compared to a more traditional database. This overhead may well be on the order of 1-2 milliseconds per request. But this is presumably fine in the context of most applications as long as the application is well architected and doesn't make a ton of unnecessary requests.
The second point is important because if you don't measure correctly you could be looking at outliers: results that are not representative of typical performance, which can send you chasing red herrings. Instead of measuring the performance of one query, measure the performance of many queries of the same type and compute some statistics, such as the mean and standard deviation, or the Nth percentile (50th, 90th, and 99th are typical).
The last point is really the one that motivates using DynamoDB over a classic database engine. You're looking at the happiest of happy cases: a (presumably) small table where you're retrieving a single item using its primary key. DynamoDB is really all about what happens as your data grows over time. You want the same performance retrieving that item when your table has 1,000 items as when it has 100,000,000 items. And things get more interesting with more complex queries.
With DynamoDB you're trading a little performance overhead in the simplest of cases for stability.
But DynamoDB is not a panacea! There are situations where a relational database will always beat DynamoDB.