Aurora vs. DynamoDB latency doesn't match expectations

I wanted to get some numbers to back up my reading that DynamoDB's key-value storage has better read performance compared to a relational DB (MySQL, PostgreSQL, Aurora). So I decided to compare the read latencies of DynamoDB and AWS Aurora (which, according to the AWS website, is "up to five times faster than standard MySQL databases and three times faster than standard PostgreSQL databases").

Step 1: Created a table in Aurora with the following schema and added 1.02 million records to it.

CREATE TABLE gift_log (
  gift_uuid               BINARY(16) NOT NULL,
  user_uuid               BINARY(16) NOT NULL,
  parent_uuid             BINARY(16),
  operation_time          TIMESTAMP,
  operation               VARCHAR(20) NOT NULL,
  gift_type               VARCHAR(20) NOT NULL,
  parent_type             VARCHAR(20),
  relation_type           VARCHAR(20),
  PRIMARY KEY (gift_uuid)
);

Used a Go client that queries the table through the MySQL driver for the database/sql package.

Step 2: Created a DynamoDB table with the following attributes and added 1 million items to it. I did NOT use a sort key; all queries used the partition key.


Table: GiftLog {
    gift_uuid               Binary (Partition Key),
    user_uuid               Binary,
    operation_time          Number,
    operation               String,
    gift_type               String,
    parent_type             String
}

Used a Go client that queries the DynamoDB table through the AWS SDK for Go.

Aurora

startTime := time.Now().UnixNano()

rows, err := db.Query("SELECT * FROM gift_log WHERE gift_uuid=?", giftIDsToRead[i])

endTime := time.Now().UnixNano()

DynamoDB

queryInput := &dynamodb.QueryInput{
    TableName: aws.String(tableName),
    KeyConditions: map[string]*dynamodb.Condition{
        "GiftUUID": {
            ComparisonOperator: aws.String("EQ"),
            AttributeValueList: []*dynamodb.AttributeValue{
                {B: giftIDsToRead[i]},
            },
        },
    },
}

startTime := time.Now().UnixNano()

resp, err := svc.Query(queryInput)

endTime := time.Now().UnixNano()

Aurora latency: 543.89 usec; DynamoDB latency: 2934.96 usec

These numbers don't seem right. Am I not comparing apples to apples?

I'd say you're comparing apples to oranges: if you know the primary key of a DynamoDB item, you should be using GetItem(), not Query().

Using GetItem(), you should see "single-digit millisecond" response times, excluding the network/HTTP latency.

That last bit is important, but should hopefully be similar for the requests to Aurora.

I think you're missing a few very important points.

  1. DynamoDB is a "database as a service" whereas Aurora is a more traditional database
  2. Whenever you do performance benchmarking of any kind, you can't just run a single test: you need to perform many runs and then compute a statistic, such as the mean, or better yet a top percentile (say, the 99th)
  3. DynamoDB shines when you need "predictable performance at any scale"

The first point is important because it means that in order to get data from DynamoDB you're making web requests which have some degree of overhead compared to a more traditional database. This overhead may well be on the order of 1-2 milliseconds per request. But this is presumably fine in the context of most applications as long as the application is well architected and doesn't make a ton of unnecessary requests.

The second point is important because if you don't measure correctly you could be looking at outliers: results that are not representative of typical performance, which could send you chasing red herrings. Instead of measuring the performance of one query, measure many queries of the same type and compute some statistics: the mean and standard deviation, or the Nth percentile (50th, 90th, and 99th are typical).
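As an illustration of that aggregation step, here's a small sketch using nearest-rank percentiles; the sample latencies are made-up numbers, with one deliberate outlier to show how it inflates the mean and p99 while leaving the median untouched:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the pth percentile (0-100) of a sorted slice,
// using the nearest-rank method.
func percentile(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return math.NaN()
	}
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}

// stats computes the mean, standard deviation, median, and 99th
// percentile of a set of latency samples.
func stats(samples []float64) (mean, stddev, p50, p99 float64) {
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	for _, s := range sorted {
		mean += s
	}
	mean /= float64(len(sorted))
	for _, s := range sorted {
		stddev += (s - mean) * (s - mean)
	}
	stddev = math.Sqrt(stddev / float64(len(sorted)))
	return mean, stddev, percentile(sorted, 50), percentile(sorted, 99)
}

func main() {
	// Latencies in microseconds from repeated identical queries
	// (made-up values; note the single 2900 outlier).
	samples := []float64{540, 560, 530, 2900, 555, 545, 550, 548, 552, 560}
	mean, sd, p50, p99 := stats(samples)
	fmt.Printf("mean=%.1f stddev=%.1f p50=%.1f p99=%.1f\n", mean, sd, p50, p99)
}
```

One slow request pulls the mean up to 784 µs even though the median stays around 550 µs, which is exactly why a single-shot measurement can mislead you.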

The last point is really the one that motivates using DynamoDB over a classic database engine. You're looking at the happiest of happy cases: a table where you retrieve a single item by its primary key. DynamoDB is really all about what happens as your data grows over time: you want the same performance retrieving that item when your table has 1,000 items as when it has 100,000,000 items. And things do get more interesting with more complex queries.

With DynamoDB you're trading a little performance overhead in the simplest of cases for stability.

But DynamoDB is not a panacea! There are situations where a relational database will always beat DynamoDB.