I have a string like this.
$dot_prod = "at the coast will reach the Douglas County coast";
I'd like to get this result using a regex: at the <b>coast</b> will reach <b>the</b> Douglas County coast
Specifically, I want to bold the words "coast" and "the", but "coast" only when it is not preceded by the word "county", and "the" only when it is not preceded by the word "at". So, essentially, I want one array of words or phrases to be bolded (matched case-insensitively, keeping the case each word/phrase originally had) and a second array of words or phrases that must not be bolded. For instance, the array of words/phrases that I want bolded is:
$bold = array("coast", "the", "pass");
and the array of phrases that I want to keep unbolded is:
$unbold = array("county coast", "at the", "grants pass");
I'm able to do the bolding with this:
$bold = array("coast", "the", "pass");
$dot_prod = preg_replace("/(" . implode("|", $bold) . ")/i", "<b>$1</b>", $dot_prod);
However, I've been unsuccessful at unbolding afterwards, and I definitely couldn't figure out how to do it all in one expression. Can you offer any help please? Thank you.
You may match and skip the patterns you want to "unbold" and match those you want to bold in any other context.
Build a regex like this (I added word boundaries to match whole words; you probably do not have to use them, but they seem a good idea for your current input):
'~\b(?:county coast|at the|grants pass)\b(*SKIP)(*F)|\b(?:coast|the|pass)\b~i'
See the regex demo.
Details
- \b - a word boundary
- (?:county coast|at the|grants pass) - any of the alternatives
- \b - a word boundary
- (*SKIP)(*F) - PCRE verbs to skip the current match and proceed looking for a match from the end of the current match
- | - or
- \b - a word boundary
- (?:coast|the|pass) - any of the alternatives
- \b - a word boundary.

The $0 in the replacement is the reference to the whole match value.
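To see which occurrences the pattern above actually reports, here is a minimal sketch that lists each match with its offset. It only uses preg_match_all with the PREG_OFFSET_CAPTURE flag; the variable names are chosen for illustration.

```php
<?php
// Pattern from the answer: skip the "unbold" phrases, match the bold words elsewhere.
$rx = '~\b(?:county coast|at the|grants pass)\b(*SKIP)(*F)|\b(?:coast|the|pass)\b~i';
$text = "at the coast will reach the Douglas County coast";

// PREG_OFFSET_CAPTURE stores each match as [matched text, byte offset].
preg_match_all($rx, $text, $m, PREG_OFFSET_CAPTURE);

foreach ($m[0] as [$word, $offset]) {
    echo $offset, ": ", $word, PHP_EOL;
}
// 7: coast
// 24: the
```

The "the" inside "at the" and the "coast" inside "County coast" never show up, because the first branch consumes those phrases and (*SKIP)(*F) discards them before the second branch gets a chance to match there.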
$dot_prod = "at the coast will reach the Douglas County coast";
$bold = array("coast", "the", "pass");
$unbold = array("county coast", "at the", "grants pass");
$rx = "~\b(?:" . implode("|", $unbold) . ")\b(*SKIP)(*F)|\b(?:" . implode("|", $bold) . ")\b~i";
echo preg_replace($rx, "<b>$0</b>", $dot_prod);
// => at the <b>coast</b> will reach <b>the</b> Douglas County coast
One caveat: since your search terms can include whitespace, it is a good idea to sort the $bold and $unbold arrays by length in descending order before building the pattern, so that a longer phrase is tried before any shorter term it starts with:
usort($unbold, function($a, $b) { return strlen($b) - strlen($a); });
usort($bold, function($a, $b) { return strlen($b) - strlen($a); });
See another PHP demo.
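A small sketch of why the order matters. The lists here are hypothetical (they are not the ones from the question): "grants" is a prefix of "grants pass", so whichever alternative comes first wins. build_rx is an illustrative helper name.

```php
<?php
$text = "Grants Pass";

// Hypothetical lists: "grants" is a prefix of "grants pass".
$unbold = ["grants", "grants pass"];
$bold   = ["pass"];

// Illustrative helper that builds the same pattern shape as in the answer.
function build_rx(array $unbold, array $bold): string {
    return '~\b(?:' . implode('|', $unbold) . ')\b(*SKIP)(*F)|\b(?:' . implode('|', $bold) . ')\b~i';
}

// Unsorted: "grants" matches first, only that word is skipped, "Pass" gets bolded.
$unsorted = preg_replace(build_rx($unbold, $bold), '<b>$0</b>', $text);

// Sorted longest first: "grants pass" is skipped as a whole.
usort($unbold, function ($a, $b) { return strlen($b) - strlen($a); });
$sorted = preg_replace(build_rx($unbold, $bold), '<b>$0</b>', $text);

echo $unsorted, PHP_EOL; // Grants <b>Pass</b>
echo $sorted, PHP_EOL;   // Grants Pass
```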
In case these items can contain special regex metacharacters, also use preg_quote on them (pass the ~ delimiter as its second argument so that it gets escaped, too).
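Putting the pieces together, one possible sketch that sorts, quotes, and builds the pattern in one place. The function name bold_terms and its signature are made up for illustration; it assumes both arrays are non-empty and that every term starts and ends with a word character (otherwise \b will not behave as intended).

```php
<?php
// Hypothetical helper; name and signature are illustrative only.
// Assumes $bold and $unbold are both non-empty.
function bold_terms(string $text, array $bold, array $unbold): string {
    $prepare = function (array $terms): string {
        // Longest first, so a longer phrase wins over its own prefix.
        usort($terms, function ($a, $b) { return strlen($b) - strlen($a); });
        // Escape regex metacharacters and the "~" delimiter.
        $quoted = array_map(function ($t) { return preg_quote($t, '~'); }, $terms);
        return implode('|', $quoted);
    };

    $rx = '~\b(?:' . $prepare($unbold) . ')\b(*SKIP)(*F)|\b(?:' . $prepare($bold) . ')\b~i';
    return preg_replace($rx, '<b>$0</b>', $text);
}

echo bold_terms(
    "at the coast will reach the Douglas County coast",
    ["coast", "the", "pass"],
    ["county coast", "at the", "grants pass"]
);
// => at the <b>coast</b> will reach <b>the</b> Douglas County coast
```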