Detecting and splitting out HTML special character codes in a string with PHP?

In PHP, suppose the data I read (a chunk of string) contains HTML special character codes in decimal or hex form, like:
This is a sample string with < &#x153; < and &#x161;

What I want is a way to detect and split out the decimal or hex codes (of any special characters) inside a chunk of string.

For example, the string above contains:

  • Two occurrences of <
  • One occurrence of &#x153;
  • One occurrence of &#x161;

How can I programmatically detect them (the occurrences of any HTML special characters)?
(Collecting the results in an array would be best.)

I think this is what you are after:

$s = 'This is a sample string with &#x153; and &#x161;';

// [0-9a-fA-F] accepts every hex digit; \d alone would miss codes such as &#x3C;
$pattern = '/&#x[0-9a-fA-F]+;/';

preg_match_all($pattern, $s, $matches);   

var_dump( $matches );

This will output:

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(7) "&#x153;"
    [1]=>
    string(7) "&#x161;"
  }
}
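
The question also asks how often each code occurs. A minimal sketch of that, assuming the matches are then tallied with array_count_values(); the helper name countHexEntities and the sample input (including &#x3C;) are made up for illustration and are not from the question:

```php
<?php
// Hypothetical helper: count each distinct hexadecimal character reference.
function countHexEntities(string $text): array
{
    // Collect every "&#x...;" sequence; [0-9a-fA-F]+ covers all hex digits.
    preg_match_all('/&#x[0-9a-fA-F]+;/', $text, $matches);

    // array_count_values() maps each matched string to how often it appeared.
    return array_count_values($matches[0]);
}

// Assumed sample input, not the asker's exact data.
$counts = countHexEntities('with &#x3C; &#x153; &#x3C; and &#x161;');
print_r($counts);
```

The keys of the returned array are the entities themselves and the values are their counts, so it can be looped over directly.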

If you mean to decode the entities, use html_entity_decode. Here is an example:

<?php
$a = "I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt;";

$b = html_entity_decode($a);

echo $b; // I'll "walk" the <b>dog</b>
?>
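
html_entity_decode should also handle numeric codes like the ones in the question. A small sketch, assuming UTF-8 output is wanted; the flags and charset are passed explicitly here so the result does not depend on the PHP version or ini defaults:

```php
<?php
// Sample input using the two hex codes from the question.
$encoded = 'This is a sample string with &#x153; and &#x161;';

// Decode named and numeric references into UTF-8 characters.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, PHP_EOL;
```

Note that this replaces the codes rather than listing them, so it only helps if the decoded text is what you need.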

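You should use preg_match_all() - http://www.php.net/manual/en/function.preg-match-all.php with a pattern like this: '/&#?[0-9a-zA-Z]+;/'. PHP's PCRE functions have no g modifier; preg_match() stops at the first match, while preg_match_all() returns every one. The optional # lets the pattern match numeric codes such as &#x153; as well as named ones.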

[Updated]: Decide which entities you need. Is it just &#x[number][number][number]; or all possible HTML entities (like &nbsp;, &lt;, etc.)?

Above I described the most common case.
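
If you do need all three kinds, something along these lines might work. It is only a sketch: findEntities is a hypothetical helper name, and the pattern checks syntax only, so a made-up name like &foo; would still be reported as a named entity.

```php
<?php
// Hypothetical helper: find character references and group them by kind.
function findEntities(string $text): array
{
    // Alternatives: hex (&#x3C;), decimal (&#60;), named (&lt;).
    $pattern = '/&(?:#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/';
    preg_match_all($pattern, $text, $matches);

    $found = ['hex' => [], 'decimal' => [], 'named' => []];
    foreach ($matches[0] as $entity) {
        if (strncasecmp($entity, '&#x', 3) === 0) {
            $found['hex'][] = $entity;       // starts with &#x or &#X
        } elseif (strncmp($entity, '&#', 2) === 0) {
            $found['decimal'][] = $entity;   // starts with &# but no x
        } else {
            $found['named'][] = $entity;     // everything else, e.g. &nbsp;
        }
    }
    return $found;
}

// Assumed sample input mixing all three forms.
$result = findEntities('a &lt; b &#60; c &#x3C; d &nbsp; e & f;');
print_r($result);
```

A bare & followed by a space is not matched, since every alternative requires a character right after the ampersand.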

You could use substr and strpos to find &# and skip to the next ;:

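$string = "This is a sample string with &#x153; and &#x161;";
$hexCodes = array();
while (strlen($string) > 0) {
  // strpos() takes the haystack first and returns false when nothing is found
  $start = strpos($string, "&#");
  if ($start === false) { break; }
  $string = substr($string, $start);   // drop the text before the code
  $end = strpos($string, ";");
  if ($end === false) { break; }       // unterminated "&#": stop
  array_push($hexCodes, substr($string, 0, $end + 1));
  $string = substr($string, $end + 1); // continue after the ";"
}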
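
If "split out" means you want the surrounding text as well as the codes, preg_split might be a shorter route than the loop. A sketch under that assumption, limited to hex codes:

```php
<?php
$string = 'This is a sample string with &#x153; and &#x161;';

// The capturing parentheses plus PREG_SPLIT_DELIM_CAPTURE keep the codes in
// the result; PREG_SPLIT_NO_EMPTY drops empty pieces at the edges.
$parts = preg_split(
    '/(&#x[0-9a-fA-F]+;)/',
    $string,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($parts);
```

The result alternates between plain text and codes in their original order, so the string can be rebuilt with implode('', $parts).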