Wrap each letter in a tag, skipping HTML tags

I'd like to build a function that takes a string and wraps each of its letters in a <span>, except spaces and HTML tags (in my case, <br> tags).

So:

"Hi <br> there."

... should become

"<span>H</span><span>i</span> <br> <span>t</span><span>h</span><span>e</span><span>r</span><span>e</span><span>.</span>"

I had no luck coming up with my own solution so I looked around and I found it surprisingly hard to find exactly what I was looking for.

The closest I found was Neverever's answer here.

However, it didn't work well for me: each character of the <br> tags was wrapped in a <span>, and it didn't match accented characters such as éèàï.

How should I proceed with this? And why does parsing HTML tags with regex seem so wrong?

You may consider using DOMDocument to parse HTML and wrap only chars within the value of DOMText nodes. See comments in code.

// Define source
$source = 'H&iuml; <br/> thérè.';

// Create DOM document and load HTML string, hinting that it is UTF-8 encoded.
// We need a root element for this so we wrap the source in a temporary <div>.
$hint = '<meta http-equiv="content-type" content="text/html; charset=utf-8">';
$dom = new DOMDocument();
$dom->loadHTML($hint . "<div>" . $source . "</div>");

// Get contents of temporary root node
$root = $dom->getElementsByTagName('div')->item(0);

// Loop through children
$next = $root->firstChild;
while ($node = $next) {
    $next = $node->nextSibling; // Save for next while iteration

    // We are only interested in text nodes (not <br/> etc)
    if ($node->nodeType == XML_TEXT_NODE) {
        // Wrap each character of the text node (e.g. "Hi ") in a <span> of
        // its own, e.g. "<span>H</span><span>i</span><span> </span>"
        foreach (preg_split('/(?<!^)(?!$)/u', $node->nodeValue) as $char) {
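            // Add the character as a text node so that "&" and "<" are escaped
            // (the value argument of createElement() is not escaped)
            $span = $dom->createElement('span');
            $span->appendChild($dom->createTextNode($char));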
            $root->insertBefore($span, $node);
        }
        // Drop text node (e.g. "Hi ") leaving only <span> wrapped chars
        $root->removeChild($node);
    }
}

// Back to string via SimpleXMLElement (so that the output is more similar to
// the source than would be the case with $root->C14N() etc), removing temporary
// root <div> element and space-only spans as well.
$withSpans = simplexml_import_dom($root)->asXML();
$withSpans = preg_replace('#^<div>|</div>$#', '', $withSpans);
$withSpans = preg_replace('#<span> </span>#', ' ', $withSpans);

echo $withSpans, PHP_EOL;

Output:

<span>H</span><span>ï</span> <br/> <span>t</span><span>h</span><span>é</span><span>r</span><span>è</span><span>.</span>
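
The loop above only visits the direct children of the temporary <div>, so text inside nested elements such as <b> would stay unwrapped. Here is a minimal sketch of a recursive variant, assuming the same DOMDocument setup. The helper name wrapTextNodes and the sample input are hypothetical, not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer): wraps every non-whitespace
// character of every text node below $node in its own <span>.
function wrapTextNodes(DOMDocument $dom, DOMNode $node): void
{
    // Snapshot the live child list first, because it is modified below.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // Split into UTF-8 characters, not bytes.
            $chars = preg_split('//u', $child->nodeValue, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($chars as $char) {
                if (trim($char) === '') {
                    // Keep whitespace as plain text.
                    $new = $dom->createTextNode($char);
                } else {
                    $new = $dom->createElement('span');
                    $new->appendChild($dom->createTextNode($char));
                }
                $node->insertBefore($new, $child);
            }
            $node->removeChild($child);
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            // Descend into nested elements such as <b>.
            wrapTextNodes($dom, $child);
        }
    }
}

// Assumed sample input with a nested element.
$source = 'Hi <b>thérè</b> <br>';
$hint = '<meta http-equiv="content-type" content="text/html; charset=utf-8">';
$dom = new DOMDocument();
$dom->loadHTML($hint . '<div>' . $source . '</div>');
$root = $dom->getElementsByTagName('div')->item(0);

wrapTextNodes($dom, $root);

// Serialize only the children of the temporary root <div>.
$html = '';
foreach ($root->childNodes as $child) {
    $html .= $dom->saveHTML($child);
}
echo $html, PHP_EOL;
```

Because whitespace is written back as plain text nodes, this variant does not need the final preg_replace that strips space-only spans.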

You could try something like ...

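<?php

  $str = "Hi <br> there.";
  $newstr = "";
  $notintag = true;
  // Split into UTF-8 characters (not bytes) so that multibyte letters
  // such as é or è stay intact
  foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $char) {
    // A "<" opens a tag: stop wrapping until the matching ">"
    if ($char == "<") {
      $notintag = false;
    }
    if ($notintag && $char != " ") {
      $newstr .= "<span>" . $char . "</span>";
    } else {
      $newstr .= $char;
    }
    // A ">" closes the tag: resume wrapping with the next character
    if ($char == ">") {
      $notintag = true;
    }
  }
  echo $newstr;

?>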

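You can achieve the result with the regex ([^\s>])(?!(?:[^<>]*)?>). It matches any character that is neither whitespace nor > and that is not followed by a > before the next <, in other words any character outside a tag. To enable Unicode support, add the u modifier: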

<?php
   $re = "/([^\\s>])(?!(?:[^<>]*)?>)/u"; 
   $str = "Hi <br> there."; 
   $subst = "<span>$1</span>"; 
   $result = preg_replace($re, $subst, $str);
   echo $result;
?>

Here you can find the regex explanation and demo.

See the sample program without Unicode support, and the one with Unicode support (the only difference is the u modifier).
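
Since the question asks for a function, the regex above could be packaged as follows. This is only a sketch: the function name wrapLetters is hypothetical, and it inherits the regex's assumption that < and > appear only as tag delimiters in the input.

```php
<?php
// Hypothetical wrapper around the regex from this answer.
function wrapLetters(string $html): string
{
    // Match a non-whitespace character that is not inside a tag;
    // the u modifier treats the input as UTF-8.
    $re = '/([^\s>])(?!(?:[^<>]*)?>)/u';
    // preg_replace() returns null on failure (e.g. invalid UTF-8);
    // fall back to the unchanged input in that case.
    return preg_replace($re, '<span>$1</span>', $html) ?? $html;
}

echo wrapLetters("Hi <br> there."), PHP_EOL;
echo wrapLetters("é <br/> à"), PHP_EOL;
```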