I'm doing some data scraping: I fetch a webpage using curl, extract the data, and check whether it already exists in my database.
I was looking for Beijing Guoan (Chn)
in a webpage's source code and couldn't find it, even though it was there and I could see it in the browser.
$result = phpQuery::newDocument( file_get_contents('www.site.com/page'), 'text/html');
foreach($result->find('td.table-participant-teams') as $t )
{
list( $host , $guest ) = explode( ' - ' , pq($t)->text());
echo $host.' == Beijing Guoan (Chn) ==> ';
echo $host == 'Beijing Guoan (Chn)' ? ' found it ' : ' false ';
}
Result:
Beijing Guoan (Chn) == Beijing Guoan (Chn) ==> false
I ran strlen($host)
and found that $host
is 20 characters long, while Beijing Guoan (Chn)
has 19, so there is a hidden character in $host.
So I added
for($i = 0 ; $i < strlen($host) ; $i++)
{
echo $i.' - '.$host[$i];
echo '<br />';
}
and I got
0 - B
1 - e
2 - i
3 - j
4 - i
5 - n
6 - g
7 -
8 - G
9 - u
10 - o
11 - a
12 - n
13 -
14 -
15 - (
16 - C
17 - h
18 - n
19 - )
As you can see, at indexes 13 and 14 I get 2 spaces, but when I print out $host I only see 1, and that's what is causing all the trouble.
So why is there an extra space in $host that doesn't show when I print it on the screen, and how can I get rid of it?
Please note that I don't want to just remove the extra space from this specific string. There might be other cases with different lengths, so I want a solution that works for all of them.
HTML renders multiple consecutive spaces as one. If you view the page source, you will see the actual data.
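If you would rather inspect the string from PHP than from the browser's source view, one option is to print each byte's hex code so that invisible bytes stand out. This is only a sketch: dumpBytes is a hypothetical helper name, and the sample string is made up for illustration.

```php
<?php
// Hypothetical helper: list each byte of a string with its hex code so
// whitespace and other invisible bytes become visible.
function dumpBytes(string $s): array
{
    $lines = [];
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $lines[] = sprintf('%d - 0x%02X', $i, ord($s[$i]));
    }
    return $lines;
}

// Sample value (an assumption) with two ordinary spaces before "(Chn)".
echo implode(PHP_EOL, dumpBytes("Guoan  (Chn)")), PHP_EOL;
```

An ordinary space shows up as 0x20. Anything else in that position, for example the byte pair 0xC2 0xA0 of a UTF-8 non-breaking space, would mean the hidden character is not a plain space at all.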
To collapse multiple consecutive spaces into a single space, you can use the following:
echo preg_replace('/ +/', ' ', 'he    llo   test');
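For a version that covers more than plain spaces, a possible sketch is below. It assumes the scraped text is UTF-8 and that the hidden character could also be a tab, a newline, or a non-breaking space. normalizeSpaces is a hypothetical helper name, not part of phpQuery.

```php
<?php
// Hypothetical helper: collapse every run of whitespace into one space and trim.
// The /u flag treats the input as UTF-8; \x{00A0} explicitly adds the
// non-breaking space in case \s does not match it.
function normalizeSpaces(string $s): string
{
    $collapsed = preg_replace('/[\s\x{00A0}]+/u', ' ', $s);
    // preg_replace returns null on failure (e.g. invalid UTF-8); fall back to the input.
    return trim($collapsed ?? $s);
}

$host = normalizeSpaces("Beijing Guoan  (Chn)");
echo $host === 'Beijing Guoan (Chn)' ? 'found it' : 'false', PHP_EOL;
```

Running both the scraped value and the value you compare against through the same function before the comparison or the database lookup keeps the two sides consistent.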