PHP/MySQL SELECT query takes a long time

I have a PHP script which is sending queries to an Amazon RDS instance using the mysqli method. I'm noticing the below code is taking about a minute to execute. I wanted to see where it was getting hung up.

The table is very large - over 30 million rows. It is about 8GB according to phpMyAdmin. It is running on a db.r3.large RDS instance in the same availability zone and region as the webserver. I figure db.r3.large is overkill for this but wanted to make sure it wasn't an issue.

My script does a search on usernames (whole or partial) and returns matches to a jQuery frontend. Nothing is timing out - the client browser holds on "waiting for [sitename]..." then returns the timing info as well as the result. Results are generally in the vicinity of a dozen to a couple hundred matched rows.

Is the long execution time just due to the size of the database? Am I retrieving and processing the matches correctly?

When I run the query manually, phpMyAdmin makes my browser wait about the same time (a minute or so) with the yellow "Loading" box then returns the same matches, along with "Showing rows 0 - 8 (9 total, Query took 53.1656 sec)".

Here is my code:

$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $dbname);
$output = array();

if (mysqli_connect_errno()) {
  printf("Connect failed: %s\n", mysqli_connect_error());
  exit();
}

echo "Connected at " . getCurrentTime() . "<br><br>";

if ($result = $mysqli->query("SELECT * FROM tablename WHERE last_name LIKE \"%$query%\"")) {

  echo "Loaded result at " . getCurrentTime() . "<br><br>";

  $selected = $result->num_rows;

  echo "Results ready at " . getCurrentTime() . "<br><br>";

  while ($row = $result->fetch_array(MYSQLI_ASSOC)) {
    $output[] = $row;

    echo "Loaded into array at " . getCurrentTime() . "<br><br>";

    /* close result set */
    $result->close();

    echo "Closed result at " . getCurrentTime() . "<br><br>";
  }

} else {
  echo "No result at " . getCurrentTime() . "<br><br>";
}

/* close connection */
$mysqli->close();

echo "Closed mysqli at " . getCurrentTime() . "<br><br>";

Here is what my script is outputting:

>Started at Thu Aug 20 19:56:08 2015
>
>Connected at Thu Aug 20 19:56:08 2015
>
>Loaded result at Thu Aug 20 19:57:01 2015
>
>Results ready at Thu Aug 20 19:57:01 2015
>
>Loaded into array at Thu Aug 20 19:57:01 2015
>
>Closed result at Thu Aug 20 19:57:01 2015
>
>Closed mysqli at Thu Aug 20 19:57:01 2015

(The script then returns a JSON-encoded object of the results.)

I have access to the RDS console and phpMyAdmin for troubleshooting.

Your query is running long because the leading wildcard in the LIKE comparison prevents MySQL from using an index:

LIKE "%$query%"

Read more here: http://dev.mysql.com/doc/refman/5.6/en/index-btree-hash.html

If it is acceptable you may change your query to

LIKE "$query%"

Although this will produce different results (prefix matches only), it should make the query much quicker, because MySQL can then use an index on last_name.
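You can see the difference with EXPLAIN. A hypothetical sketch, assuming an index on last_name exists (the index name here is made up):

```sql
-- Hypothetical: add an index on the searched column
CREATE INDEX idx_last_name ON tablename (last_name);

-- Can use the index: matches are confined to one range of the B-tree
EXPLAIN SELECT * FROM tablename WHERE last_name LIKE 'smi%';

-- Cannot use the index: the leading wildcard forces a full table scan
EXPLAIN SELECT * FROM tablename WHERE last_name LIKE '%smi%';
```

The first EXPLAIN should report a `range` access type using the index; the second will show `ALL` (full scan).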

Wildcards are far from ideal!

If you use LIKE "%..%", MySQL has to do a full comparison against all 30 million rows every time you run the query. Only a trailing wildcard, LIKE "...%", can use an index.

I don't think you can speed up your query if you want to keep LIKE "%..%" in it; however, I have some suggestions:

  • Use WHERE last_name = :query instead. Do you really need "Alex" to match both "Alex" and "Alexander"?
  • Build your own index. Create a table that contains the most common last names and/or partial names, along with their IDs. Instead of scanning 30 million rows on every search while the user waits a minute, write a background script (it can run for hours) that builds a table of perhaps 30,000 rows. Against that table you can use a simple WHERE field = :query, which can be indexed and will be much faster - I guess.
  • Transferring lots of data takes time. Make sure your table doesn't have 100 columns you won't need, and avoid SELECT * if you only want a few of them.
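A hypothetical sketch of that "own index" idea - all table and column names here are made up, and the background job that fills the lookup table is not shown:

```sql
-- Small, indexed lookup table built offline by a background job
CREATE TABLE name_lookup (
  prefix    VARCHAR(32)  NOT NULL,   -- common name or name fragment
  person_id INT UNSIGNED NOT NULL,   -- id of the matching row in tablename
  INDEX (prefix)
);

-- The search then becomes an indexed equality match plus a join:
SELECT t.*
FROM name_lookup n
JOIN tablename t ON t.id = n.person_id
WHERE n.prefix = 'alex';
```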

Please don't interpolate user input directly into ->query("...$query...") - that is vulnerable to SQL injection. PHP's mysqli API supports prepared statements with bound values via bind_param.
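A minimal sketch of the prepared-statement version, assuming $mysqli is the open connection and the table/column names from the question. Note that the % wildcards go into the bound value, not the SQL string:

```php
// Sketch: bind the LIKE pattern instead of interpolating $query into the SQL
$stmt = $mysqli->prepare("SELECT * FROM tablename WHERE last_name LIKE ?");
$pattern = "%" . $query . "%";    // wildcards belong in the bound value
$stmt->bind_param("s", $pattern); // "s" = string parameter
$stmt->execute();
$result = $stmt->get_result();    // requires the mysqlnd driver
while ($row = $result->fetch_assoc()) {
    $output[] = $row;
}
$stmt->close();
```

This doesn't make the query any faster (the leading wildcard still blocks the index), but it does make it safe.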

You cannot use LIKE "%...%" queries in SQL and expect good performance. A leading wildcard like that means the database has to scan every single record in the table to find matches. If there are a lot of matches, it may also have to spill the result set to disk. It will never be quick: probably too slow even on a medium-sized DB, and on a large DB like yours it will be painfully slow.

You need a different approach.

There are a number of ways to approach this, and it depends what you're trying to do. If you're looking for keywords in a string, then you might consider pulling all the words out into their own records on a separate table and searching that. You would end up with effectively a tagging system.

But more often than not, searches like this need more power than that. The best solution then is usually to switch to a dedicated data indexing tool like Sphinx or Lucene. These two products work slightly differently from each other, but effectively they do the same job: they do a deep run through of your database, and produce a comprehensive index which you can run searches against much much quicker than anything the database can offer.

They can be complex to set up and configure, but if you want that kind of flexible searching without the performance problems of a LIKE query, they are really the only way to go.