Convert text inside delimiters into a valid URL

I have to convert an old website to a CMS. One of the challenges is that there are currently over 900 folders, each containing up to 9 text files. I need to combine the text files in each folder into a single file and then use that file as the import into the CMS.

The file concatenation and import are working perfectly.

The challenge that I have is parsing some of the text in the text file.

The text file contains URLs in the form:

Some text [http://xxxxx.com|About something] some more text

I am converting this with the following code:

if (substr ($line1, 0, 7) !=="Replace") {
    $pattern = '/\\[/';
    $pattern2 = '/\\]/';
    $pattern3 = '/\\|/';
    $replacement = '<a href="';
    $replacement3 = '">';
    $replacement2='</a><br>';

    $subject = $line1;
    $i=preg_replace($pattern, $replacement, $subject, -1 );
    $i=preg_replace($pattern3, $replacement3, $i, -1 );
    $i=preg_replace($pattern2, $replacement2, $i, -1 );

    $line .= '<div class="'.$folders[$x].'">'.$i.'</div>' ;
}

It may not be the most efficient code, but it works, and as this is a one-off exercise, execution time is not an issue.

Now to the problem that I cannot seem to code around. Some of the URLs in the text files are in this format:

Some text [http://xxxx.com] some more text

The code above matches $pattern and $pattern2, but because nothing matches $pattern3 (the |), the href attribute is never closed and the URL is malformed in the output.
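To make the failure concrete, here is a minimal sketch that runs the same three replacements on a line without a `|`. The helper name `convertNaive` is hypothetical and only wraps the replacements from the code above; it assumes PHP 7 or later for the type hints.

```php
<?php
// Hypothetical helper: the same three replacements as above, in the same order.
function convertNaive(string $text): string
{
    $text = preg_replace('/\[/', '<a href="', $text);
    $text = preg_replace('/\|/', '">', $text);
    return preg_replace('/\]/', '</a><br>', $text);
}

echo convertNaive('Some text [http://xxxx.com] some more text'), "\n";
// Some text <a href="http://xxxx.com</a><br> some more text
```

Because the `">` that closes the href attribute is only ever inserted in place of a `|`, the attribute stays open when the `|` is missing.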

Regular expressions are not my forte. Is there a way to modify what I have above, or another way to get a correctly formatted URL in my output? Or will I need to parse the output a second time, looking for the malformed URLs and correcting them before writing the output file?

You can use preg_replace_callback() to achieve this:

  • Find any string of the format [...]
  • Try to split them by the delimiter | using explode()
    • If the split array contains two pieces, the [...] string holds both the link href and the link anchor text
    • If not, the [...] string contains only the link href
  • Format and return the link

Code:

$input = <<<EOD
Some text [http://xxxxx.com|About something] some more text
Some text [http://xxxx.com] some more text
EOD;

$output = preg_replace_callback('#\[([^\]]+)\]#', function($m)
{
    $parts = explode('|', $m[1]);
    if (count($parts) == 2)
    {
        return sprintf('<a href="%s">%s</a>', $parts[0], $parts[1]);
    }
    else
    {
        return sprintf('<a href="%1$s">%1$s</a>', $m[1]);
    }
}, $input);

echo $output;

Output:

Some text <a href="http://xxxxx.com">About something</a> some more text
Some text <a href="http://xxxx.com">http://xxxx.com</a> some more text
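
If the URLs or link texts may contain characters such as `&` or `"`, or the link text itself may contain a `|`, a slightly more defensive variant might be worth considering. The sketch below is one possible approach and an assumption on my part, not something the input data is known to require: the function name `convertLinks` is hypothetical, it splits on the first `|` only by passing a limit of 2 to `explode()`, and it escapes both parts with `htmlspecialchars()`.

```php
<?php
// Hypothetical helper: turns [url|text] and [url] into HTML links.
function convertLinks(string $text): string
{
    return preg_replace_callback('#\[([^\]]+)\]#', function (array $m): string {
        // Limit of 2: split on the first "|" only, so the text may contain "|".
        $parts = explode('|', $m[1], 2);
        $href  = trim($parts[0]);
        // Fall back to the URL itself when no link text was given.
        $label = isset($parts[1]) ? trim($parts[1]) : $href;

        return sprintf(
            '<a href="%s">%s</a>',
            htmlspecialchars($href, ENT_QUOTES),
            htmlspecialchars($label, ENT_QUOTES)
        );
    }, $text);
}

echo convertLinks('Some text [http://xxxx.com] some more text'), "\n";
echo convertLinks('See [http://xxxxx.com|About something] here'), "\n";
```

In the loop from the question, this could stand in for the three preg_replace() calls, e.g. `$i = convertLinks($line1);`. The question's code also appended `<br>` after each link; add it to the format string if that is still wanted.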
