I have a strange PHP issue.
I'm using this code to read an HTML page:
$fh = fopen('html_page.htm', 'r+');
$html_page = '';
while (!feof($fh))
{
$html_page .= fread($fh, 1024);
}
fclose($fh);
And within that page I have something like this:
<span> </span>
And like this:
<span> 324.85 SGD </span>
So I want to strip all of the &nbsp; from the content of those tags, so that the first example turns into an empty string and the second example into this:
324.85 SGD
My solution was this ($str holds the content of the tags, just the content, not the tags themselves):
$str = trim(preg_replace('/[^\w+ .,:;]/', ' ', $str));
This worked well when I loaded my script through the browser, even though I was getting this:
324.85 SGD // Inner extra spaces not removed
Note: it is my script that is loaded in the browser, not the HTML page; the page is still read in through the fread() call.
I display the output in the browser (and yes, I'm looking at the HTML source) and it behaves well. However, when I run the script through the console, it still reads the same HTML page the same way. Everything is the same except that I save the output to a .txt file or display it in the console. Then I get this.
First example, with all the &nbsp;:
    Â
And the second, with values mixed in with &nbsp;:
  324.85 SGDÂ
And it is not that these characters were there but simply not displayed when I ran it through the browser, because in the program I check for an empty string value (first example), and it really is empty for the first example.
Solution I found is this:
$str = trim(preg_replace('/[\x00-\x1F\x80-\xFF]/', ' ', $str));
Works in both cases. Outputs: 324.85 SGD
So the question is, why does PHP behave so differently when run through the browser and through the console in this case?
And what is the best way to normalize the string to remove extra inner spaces?
From this:
324.85     SGD
to this:
324.85 SGD
But of course I would like it to work on all strings no matter how long they are.
Thanks.
It seems to have something to do with character encoding. I'd reckon that your HTML is UTF-8 while your console does not support that, or something like that.
Character encoding is a very important thing to understand when working with text.
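To make the encoding point concrete, here is a small sketch that prints the raw bytes involved. It assumes the page is UTF-8 and that the &nbsp; ends up in $str as the character U+00A0; hex_bytes() and $value are made-up names for illustration. In UTF-8 that character is the two bytes C2 A0, and in Latin-1 the byte C2 is "Â", which would match the stray character seen in the console output.

```php
<?php
// A non-breaking space (U+00A0) is two bytes in UTF-8.
$nbsp = "\u{00A0}";          // PHP 7+ Unicode escape, same bytes as "\xC2\xA0"
echo bin2hex($nbsp), "\n";   // c2a0
echo strlen($nbsp), "\n";    // 2 (strlen counts bytes, not characters)

// Hypothetical helper: show a string as space-separated hex bytes.
function hex_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// Assumed sample value, modelled on the second example in the question.
$value = $nbsp . '324.85' . $nbsp . 'SGD' . $nbsp;
echo hex_bytes($value), "\n";
```

Running hex_bytes() on the real $str in both the browser and the console would show whether the bytes themselves differ or only the way they are displayed.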
I think what could work is to change the output to Latin-1, but this is a pretty wild guess: try wrapping utf8_decode() around whatever you are trying to output.
Edit: The above was my first guess, but after a little Googling I found that fread() is probably your problem. Please look at "set utf-8 encoding for fread fwrite" and http://php.net/manual/en/function.fopen.php#104325
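As for the second part of the question (removing the extra inner spaces), here is a minimal sketch, assuming $str is valid UTF-8. normalize_spaces() is a name I made up, not a built-in. The /u modifier makes PCRE treat the pattern and the subject as UTF-8, so the two-byte non-breaking space is matched as one character instead of byte by byte.

```php
<?php
// Hypothetical helper: collapse every run of whitespace, including
// non-breaking spaces (U+00A0), into one plain space, then trim the ends.
function normalize_spaces(string $str): string
{
    // \x{00A0} is listed explicitly so the result does not depend on
    // whether \s covers the non-breaking space in this PCRE build.
    $collapsed = preg_replace('/[\s\x{00A0}]+/u', ' ', $str);

    // preg_replace() returns null on error, e.g. when the input is not
    // valid UTF-8; fall back to the untouched input in that case.
    if ($collapsed === null) {
        $collapsed = $str;
    }

    return trim($collapsed);
}

echo normalize_spaces("\u{00A0}324.85\u{00A0}\u{00A0} SGD\u{00A0}"), "\n"; // 324.85 SGD
```

Unlike replacing the whole \x80-\xFF range, this leaves other non-ASCII characters (accented letters, currency symbols) intact, and it works on strings of any length.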