MongoDB的性能各不相同

i have a mongo collection like this:


{
"A2_AboutMe": "",
"A2_Attributes": "|av|nv|",
"A2_Birthday": "",
"A2_DateCreated": "2010-11-25 22: 59: 00",
"A2_DateLast": "2011-11-18 12: 09: 36",
"A2_FK_A1_IDPerson": "0",
"A2_Firstname": "José Luis",
"A2_FirstnameC": "Jose Luis",
"A2_Gender": "m",
"A2_IDProfile": "1",
"A2_Keywords": "...|..",
"A2_Lastname": "test - test",
"A2_LastnameC": "_test test",
"A2_Locale": "",
"A2_Middlename": "",
"A2_Name": "José Luis test",
"A2_NameC": "Jose Luis test",
...
}

with indexies on A2_LastnameC and A2_FirstnameC 3.000.000 docs in this collection, 8 GB data storage

following query(PHP) in done in 3-4 sec

$collection->find(array(«A2_FirstnameC» => new MongoRegex("/jose/i")))->sort(array(«A2_LastnameC» => -1))->limit(10)

but sometimes the similar queries are done in less than 100 msec.

what can i do to get this performance each time?

test computer is i7, 8GB Ram(7 is used by mongo), Windows 7

First of all index won't be used for non-prefix-like, case-insensitive regular expressions. But in the query above index can be used for sorting by A2_LastnameC field so this is fast. Now having the sorted data MongoDB will need to get A2_FirstnameC value and match it against the regexp stopping when there's 10 matches ready (it will be also relatively fast because it will use index to retrieve the data instead of reading whole documents from disk). Depending on data order it can happen to match the first 10 documents - this is the best case and it will be very fast, the worst case would be the matches to occur on the last 10 docs having to scan all the previous index entries.

How to speed this up? Either use query that can use index, like: «A2_FirstnameC» => new MongoRegex("/^jose/"). Or you have to use some kind of full-text search. A simple way would be to split the field (A2_Firstname in your case) into words, normalize them (convert to lower case, replace accents) and store as an array. Now an index for the array field will be used to do fast searches.

Indexes can't be used for case-insensitive regular expression queries, nor for non-rooted regular expressions (those not beginning with "^"). Since you already have the A2_Firstname field denormalized as A2_FirstnameC, you could also store that field case-normalized (i.e. either all lower or all upper case), and avoid needing to use case insensitive regular expressions; however even in this case, you will still be doing a full scan of the collection if you are not using a a rooted regular expression. Whether or not you can afford to use one in this case depends on your exact use case.