How to parse website content received with cURL

I am trying to read the content of a website using cURL to compare some data. I managed to fetch the page with cURL, but extracting data from the content does not work. I parse the content with DOMDocument, but characters such as & and € do not seem to be converted properly, so it breaks. That is why I added htmlentities, but that does not work either.

This is one of the errors I receive: Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 37 in URL on line 40

Can anyone suggest what I should do differently?

This is how I get the content of a website:

function get_web_page( $url ) {
$user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';

$options = array(
    CURLOPT_CUSTOMREQUEST  =>"GET",        //set request type post or get
    CURLOPT_POST           =>false,        //set to GET
    CURLOPT_USERAGENT      => $user_agent, //set user agent
    CURLOPT_COOKIEFILE     =>"cookie.txt", //set cookie file
    CURLOPT_COOKIEJAR      =>"cookie.txt", //set cookie jar
    CURLOPT_RETURNTRANSFER => true,     // return web page
    CURLOPT_HEADER         => false,    // don't return headers
    CURLOPT_FOLLOWLOCATION => false,    // do not follow redirects (set to true to follow them)
    CURLOPT_ENCODING       => "",       // handle all encodings
    CURLOPT_AUTOREFERER    => true,     // set referer on redirect
    CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
    CURLOPT_TIMEOUT        => 120,      // timeout on response
    CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
);

$ch      = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err     = curl_errno( $ch );
$errmsg  = curl_error( $ch );
$header  = curl_getinfo( $ch );
curl_close( $ch );

$header['errno']   = $err;
$header['errmsg']  = $errmsg;
$header['content'] = $content;
return $header;

}

$html = get_web_page("url of a website");

And this is how I thought I should parse it:

$dom = new DOMDocument;
$dom->loadHTML(mb_convert_encoding($html["content"], 'HTML-ENTITIES', 'UTF-8'));

foreach($dom->getElementsByTagName('div') as $div){
    echo $div->nodeValue."<br>";
}

But what I am actually looking for is the value of one specific div with a given class, and only that value. Do you know how I can get it?

I use SimpleHTMLDom; it is quite easy to use and well documented.

You can even find a bunch of questions about it here on Stack Overflow.
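
If you would rather stay with the built-in DOM extension, something along these lines might work. This is only a rough sketch, not SimpleHTMLDom and not the asker's code: it uses DOMDocument together with DOMXPath, and it silences the entity warnings with libxml_use_internal_errors(). The helper name first_div_text_by_class, the class name "price" and the sample markup are made up for illustration; in the real script you would pass $html["content"] instead.

```php
<?php
// Hypothetical helper (not from the original post): returns the trimmed text
// of the first <div> whose class list contains $class, or null if none match.
// $class is assumed to be a plain class name without quotes.
function first_div_text_by_class(string $html, string $class): ?string
{
    // Collect libxml parse warnings (e.g. for a bare "&") instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Prefixing an XML declaration is a common workaround that makes
    // libxml treat the input as UTF-8 (an assumption about the page encoding).
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match the class as a whole word inside a space-separated class attribute.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Sample markup standing in for $html["content"]; note the unescaped "&".
$sample = '<html><body><div class="box price">€ 12 & more</div>'
        . '<div class="other">x</div></body></html>';
echo first_div_text_by_class($sample, 'price'), "\n";
```

The XPath expression is used instead of a plain @class="price" comparison so that a div with several classes (class="box price") still matches, while a class such as "prices" does not.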