I'm searching through database search results on a website and trying to highlight the term in the returned results that matches the search term. Below is what I have so far (in PHP):
$highlight = trim($highlight);
if(preg_match('|\b(' . $highlight . ')\b|i', $str_content))
{
$str_content = preg_replace('|\b(' . $highlight . ')(?!["\'])|i', "<span class=\"highlight\">$1</span>", $str_content);
break;
}
The downside of this approach is that if my search term also shows up in the URL permalink, the replacement inserts the span into the href attribute and breaks the anchor tag. Is there any way in my regex to exclude text that appears inside an HTML tag (between its opening < and closing >)?
I know I could use the strip_tags() function and just output the results as plain text, but I'd rather not do that if I don't have to.
I ended up going this route, which, so far, works well for this specific situation:
<?php
if(preg_match('|\b(' . $term . ')\b|i', $str_content))
{
$str_content = strip_tags($str_content);
$str_content = preg_replace('|\b(' . $term . ')(?!["\'])|i', "<span class=\"highlight\">$1</span>", $str_content);
$str_content = preg_replace('|\n[^<]+|', '</p><p>', $str_content);
break;
}
?>
The content is still HTML-encoded, but it's easier to parse now that the HTML tags are gone.
I think lookahead assertions are what you're looking for.
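To make the assertion idea concrete, here is a minimal sketch, assuming the markup is reasonably well formed and no attribute value contains a literal < or >. The function name highlight_outside_tags and the sample string are made up for illustration. The negative lookahead (?![^<>]*>) rejects a match when the next angle bracket after it is a closing >, which is what happens when the match sits inside a tag. preg_quote() is added so that special characters in the search term can't break the pattern.

```php
<?php
// Hypothetical helper: wrap whole-word matches of $term in a highlight span,
// skipping matches that sit inside an HTML tag (for example in an href).
function highlight_outside_tags(string $html, string $term): string
{
    $term = trim($term);
    if ($term === '') {
        return $html; // nothing to highlight
    }
    // preg_quote() escapes regex metacharacters and the '~' delimiter.
    $quoted = preg_quote($term, '~');
    // (?![^<>]*>) fails when the next angle bracket after the match is '>',
    // i.e. when the match lies between '<' and '>'.
    $pattern = '~\b(' . $quoted . ')\b(?![^<>]*>)~i';
    return preg_replace($pattern, '<span class="highlight">$1</span>', $html);
}

$sample = '<a href="/kitty-toys">Hello Kitty toys</a>';
echo highlight_outside_tags($sample, 'kitty'), "\n";
// prints: <a href="/kitty-toys">Hello <span class="highlight">Kitty</span> toys</a>
```

This is still a regex looking at HTML, so it can be fooled by unusual markup (comments, script blocks, a stray > in text). For search snippets you control, it may be good enough.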
DO NOT try to parse HTML with regular expressions; see "RegEx match open tags except XHTML self-contained tags".
Try something like PHP Simple HTML DOM.
<?php
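// load the PHP Simple HTML DOM library (adjust the path to wherever your copy lives)
include 'simple_html_dom.php';
// get DOM
$html = file_get_html('http://www.google.com/search?q=hello+kitty');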
// ensure this is properly sanitized.
$term = trim($term);
// highlight $term in all <div class="result">...</div> elements
foreach($html->find('div.result') as $e){
echo str_replace($term, '<span class="highlight">'.$term.'</span>', $e->plaintext);
}
?>
Note: this is not an exact solution because I don't know what your HTML looks like, but this should put you pretty close to being on track.
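If you want to keep the links intact instead of falling back to plain text, the same idea can be sketched with PHP's built-in DOM extension rather than Simple HTML DOM (that is a swap on my part, not what the snippet above uses). The helper name highlight_text_nodes and the sample input are hypothetical. It only rewrites text nodes, so href values are never touched. It assumes the DOM extension is enabled and that the input is ASCII or that encoding is handled separately, since loadHTML() does not assume UTF-8 by default.

```php
<?php
// Hypothetical helper (not from the answers above): highlight $term only in
// text nodes, so tag names and attributes such as href are never modified.
function highlight_text_nodes(string $html, string $term): string
{
    $term = trim($term);
    if ($term === '') {
        return $html;
    }

    // Wrap the fragment in a div so it has a single, easy-to-find root.
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $root = $doc->getElementsByTagName('div')->item(0);

    $pattern = '~\b(' . preg_quote($term, '~') . ')\b~iu';
    $xpath = new DOMXPath($doc);
    // Copy the node list to an array first, because the loop edits the tree.
    foreach (iterator_to_array($xpath->query('.//text()', $root)) as $text) {
        $parts = preg_split($pattern, $text->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($parts === false || count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                // Odd entries are the captured matches.
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'highlight');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($fragment, $text);
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlight_text_nodes('<a href="/kitty-toys">Hello Kitty toys</a>', 'kitty'), "\n";
```

With the sample input, the href should come back unchanged and only the visible "Kitty" gets wrapped in the span. It is more code than a single regex, but the parser decides what counts as a tag, so you don't have to.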