
I wrote this script so that I can use BBCode-like tags in my code. After spending a lot of time deciding on the best method, and asking in "How to dynamically replace bb tags?", I decided to use preg_replace_callback.

There is a problem, however, with multiple items on the same line.

Man [forvo=man,nl]
Mannen [forvo=mannen,nl]

The part above works, but the part below does not.

Kip [forvo=kip,nl] - Kippen [forvo=kippen,nl]
Bal [forvo=bal,nl] - Ballen [forvo=ballen,nl]
Vrouw [forvo=vrouw,nl] - Vrouwen [forvo=vrouwen,nl]

I know that using file_get_contents() is not a recommended option, but should I also look for an alternative to preg_replace_callback if I want to use multiple Forvo tags per line?

<?php
// Replace forvos in the lesson
$lesson_body = $lesson['Lesson']['body'];

function forvize($match) {
    $word = $match[1];
    $language = $match[2];
    $link = "http://apifree.forvo.com/action/word-pronunciations/format/js-tag/word/".$word."/language/".$language."/order/rate-desc/limit/1/key/API_KEY/";
    $link = file_get_contents($link);
    return $link;
}

//URL's
$lesson_body = preg_replace_callback("/\[forvo\=(.*),(.*)\]/", 'forvize', $lesson_body);

?>

The mistake you made is a common one. By default the FULL STOP (., also known as dot) matches any character except a newline, and the * quantifier is greedy. So if you have only a single bbcode on a line, this pattern is no problem:

/\[forvo\=(.*),(.*)\]/

But if you've got more than a single bbcode on a line, the greedy .* takes as much as it can while still letting the pattern match, so the first group runs all the way to the last comma on the line:

[forvo=aaaa,bbbbb] ... [forvo=cccc,dddd]
       (.*                       ),(.*)
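
To see this on one of the lines from the question, here is a small sketch (the variable names $line and $match are just for illustration) that runs the original pattern with preg_match and dumps what each group captured:

```php
<?php
// A line from the question that contains two tags.
$line = 'Kip [forvo=kip,nl] - Kippen [forvo=kippen,nl]';

// The original, greedy pattern.
preg_match('/\[forvo\=(.*),(.*)\]/', $line, $match);

// $match[0] spans from the first "[" to the last "]".
// $match[1] is "kip,nl] - Kippen [forvo=kippen" (everything up to the last comma).
// $match[2] is "nl".
var_dump($match);
```

Both tags are swallowed by a single match, so the callback is only called once for the whole line and receives a "word" that is not a word at all.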

What you can do instead is to limit the first parameter to all characters but COMMA (,) and the second parameter to all characters but RIGHT SQUARE BRACKET (]).

Example:

/\[forvo\=([^,]*),([^\]]*)\]/
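
As a rough sketch of how that pattern behaves with preg_replace_callback, the snippet below uses a hypothetical forvize_stub() callback that returns a marker string instead of calling the Forvo API, so it can be run without an API key. The sample text in $lesson_body is made up from the lines in the question.

```php
<?php
// Hypothetical stand-in for forvize(): returns a marker string instead of
// fetching anything from the Forvo API.
function forvize_stub(array $match): string
{
    return '<' . $match[1] . '|' . $match[2] . '>';
}

// Sample text: one tag on the first line, two tags on the second.
$lesson_body = "Man [forvo=man,nl]\nKip [forvo=kip,nl] - Kippen [forvo=kippen,nl]";

// Group 1 stops at the first comma, group 2 at the first "]".
$result = preg_replace_callback(
    '/\[forvo=([^,]*),([^\]]*)\]/',
    'forvize_stub',
    $lesson_body
);

echo $result, "\n";
```

Each tag is now matched on its own, so the callback runs three times for this input, once per tag.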

A way to catch those is to make the quantifier ungreedy (lazy) by appending a question mark, so that it matches only up to the first occurrence of the character that follows it in the pattern.

~\[forvo=(.*?),(.*?)\]~

  • First parenthesized group matches everything after the equals sign up to the first comma
  • Second one matches everything after the comma up to the closing square bracket
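
A quick way to check the lazy pattern is to collect all matches on a line that has two tags. This is only a sketch; $line, $matches and $m are illustrative names, and preg_match_all is used here just to inspect the captures:

```php
<?php
// A line from the question that contains two tags.
$line = 'Bal [forvo=bal,nl] - Ballen [forvo=ballen,nl]';

// Lazy quantifiers: each group stops at the first comma / closing bracket.
preg_match_all('~\[forvo=(.*?),(.*?)\]~', $line, $matches, PREG_SET_ORDER);

// One entry per tag: [full match, word, language].
foreach ($matches as $m) {
    echo $m[1], ' / ', $m[2], "\n";
}
```

The same pattern can be passed to preg_replace_callback unchanged, and the callback will then be invoked once per tag.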

Another option is to make it catch anything but specific characters (comma and closing bracket respectively in this case). Check @hakre's answer to see how it is done.