有没有办法优化在页面上查找文本项(不是正则表达式)

After seeing several threads rubbishing the regexp method of finding a term to match within an HTML document, I've used the Simple HTML DOM PHP parser (http://simplehtmldom.sourceforge.net/) to get the bits of text I'm after, but I want to know if my code is optimal. It feels like I'm looping too many times. Is there a way to optimise the following loop?

//Get the HTML and look at the text nodes
   $html = str_get_html($buffer);
   //First we match the <body> tag as we don't want to change the <head> items
   foreach($html->find('body') as $body) {
    //Then we get the text nodes, rather than any HTML
    foreach($body->find('text') as $text) {
     //Then we match each term
     foreach ($terms as $term) {
      //Match to the terms within the text nodes
      $text->outertext = str_replace($term, '<span class="highlight">'.$term.'</span>', $text->outertext);
     }       
    }
   }

For example, would it make a difference to determine check if I have any matches before I start the loop maybe?

You don't need the outer foreach loop; there's generally only one body tag in a well-formed document. Instead, just use $body = $html->find('body',0);.

However, since a loop with only a single iteration is essentially equivalent in run time to not looping at all, it probably won't have much performance impact either way. So in reality, you really just have 2 nested loops even in your original code, not 3.

Speaking out of ignorance, does find take arbitrary XPath expressions? If it does, you can fold the two outer loops into one:

foreach($html->find('body/text') as $body) {
    ...
}