I'm having some strange issues with Datastore queries in Go. The query itself executes fine, but pulling the total (cli.Count()) consistently fails by timing out. Can someone explain whether I'm doing something wrong, or what the correct way to do this is? The strange part is that it only fails on the query count, and I don't understand why. I've tried creating a new query and initializing a new client, but neither helps.
// GetListByID returns a page of items filtered by master ID, plus the total count.
func GetListByID(theID, limit, offset int) ([]Item, int, error) {
	var itemList []Item
	cli, ctx, err := getClient()
	if err != nil {
		return itemList, 0, err
	}
	if limit <= 0 {
		limit = 20
	}
	q := datastore.NewQuery("SomeKind").Filter("MasterID =", theID)
	ql := q.Limit(limit).Offset(offset)
	if _, err = cli.GetAll(ctx, ql, &itemList); err != nil {
		return itemList, 0, err
	}
	// If the first page came back short, the total is just what we got.
	if len(itemList) < limit && offset == 0 {
		return itemList, len(itemList), nil
	}
	total, err := cli.Count(ctx, q) // this is the call that times out
	return itemList, total, err
}
Please note that this does not run on App Engine. It runs on instances in AWS and in a datacenter. (Don't ask.)
The answer is actually doing what I expect it to; there was a bug that was recently addressed by OkDave on GitHub (I submitted an issue), available here: https://github.com/GoogleCloudPlatform/gcloud-golang/issues/268. I apologize for forgetting to come back here and post an update, and I appreciate the detail in the answer, as it explains a lot of things that most people may miss. For my needs I don't want to use cursors, as that would break the intended behavior: if I select items 50-100 and then execute the query a second time, items 50-100 should not be the same, because additional items were inserted into the database before the second query. There are reasons to use cursors and reasons not to; in my case I don't want them. The issue was really with the count, which has since been fixed.
Note: there is no server-side count aggregate for Cloud Datastore. This is important to know, as it shapes how clients implement functions like Count.
total, err := cli.Count(ctx, q)
isn't returning the count from the query you already executed; it takes your query, makes it keys-only, then issues a new call to retrieve all those keys. It then pages through them with cursors as needed and counts all the results.
You can see the implementation on GitHub.
Offset - unlike cursors (which you should switch to using), offsets don't work by merely resuming from where the query left off last time. To handle an offset, Cloud Datastore needs to manually count through offset entities that match your query, then resume. This means every call of your function re-processes all preceding entities before processing the new ones - a classic O(N^2) access pattern.
Once again: Switch to cursors :D
This actually relates to why you are getting timeouts (or, in this case, an infinite loop). The implementation of Count preserves the offset. Depending on how many entities you have and how you call your function, that offset could get very large.
This leads us to the final answer:
The Go client available when you wrote this question was broken. I just looked at the commit history and noticed it didn't handle offsets and skipped results correctly until recently. This would have resulted in it continually issuing a query to Cloud Datastore that made no progress (it just kept trying to skip the first entities of the offset). The Count function would then never return, as it kept calling nextBatch() without making progress.
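Here is a loose, self-contained model of that failure mode (plain Go; the function, parameters, and call cap are invented for illustration and do not mirror the library's actual code). Each round-trip can skip a limited number of results; the fixed client subtracts what was skipped from the remaining offset, while the pre-fix client, roughly speaking, never reduced it and so never made progress:

```go
package main

import "fmt"

// countWithOffset models the client's Count loop. Each round-trip can skip
// at most skipPerCall results. trackSkips mirrors the post-fix behavior:
// the client reduces the remaining offset by the skipped results it got back.
// With trackSkips=false (modeling the bug), the remaining offset never
// shrinks, so we cap the loop at 100 calls to avoid spinning forever.
func countWithOffset(total, offset, skipPerCall int, trackSkips bool) (count, calls int) {
	remaining := offset
	for calls = 0; calls < 100; calls++ {
		if remaining > 0 {
			skipped := skipPerCall
			if skipped > remaining {
				skipped = remaining
			}
			if trackSkips {
				remaining -= skipped
			}
			continue
		}
		return total - offset, calls
	}
	return 0, calls // gave up: no call ever made progress
}

func main() {
	fmt.Println(countWithOffset(500, 120, 50, true))  // 380 3: offset consumed, count returned
	fmt.Println(countWithOffset(500, 120, 50, false)) // 0 100: stuck re-skipping the same entities
}
```

In the real client the symptom was exactly the stuck case: requests kept going out, bandwidth and time were spent, and Count never came back.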
This was fixed in early June via the following commit: https://github.com/GoogleCloudPlatform/gcloud-golang/commit/0bf7a0795c591c70a17505c1123bc3ef3d30f426
If you pick up the new version of the Go client, you shouldn't hit this infinite-loop issue any more.