I have the following code:
<?php
header('Content-Type: text/html; charset=utf-8');
function getSource($url)
{
    if (!function_exists('curl_init'))
    {
        die('CURL is not installed!');
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_ENCODING, "UTF-8");
    $output = curl_exec($ch);
    curl_close($ch);
    return $output;
}
$source = getSource('http://www.website.com/');
var_dump($source); die();
The file itself is saved as UTF-8. The problem is that the UTF-8 characters in the output are not displayed properly; they show up as question marks or other garbage.
The only fix I have found is to save the file as ISO-8859-1, but I don't want to do that. What's wrong here?
The value you pass to CURLOPT_ENCODING is (a) invalid, and (b) meaningless for this purpose: it doesn't force curl to translate the content it fetches into the encoding you want. If the remote site returns ISO-8859-1, you have to convert that to UTF-8 yourself.
CURLOPT_ENCODING is used to set the Accept-Encoding: header when fetching a page. Valid values are "identity", "deflate", and "gzip". As you can see, it controls content (compression) encoding and has nothing to do with the character-set encoding.
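One way to do the conversion yourself is sketched below. It is only a rough illustration, assuming the mbstring extension is available and that the remote server declares its charset in the Content-Type response header. The helper names (charsetFromContentType, toUtf8, getSourceAsUtf8) are made up for this example, and the ISO-8859-1 fallback is an assumption that may not suit every site.

```php
<?php
// Hypothetical helper: extract the charset from a Content-Type value.
// Falls back to ISO-8859-1 (an assumption) when no charset is declared.
function charsetFromContentType(?string $contentType, string $default = 'ISO-8859-1'): string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([\w.:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return $default;
}

// Hypothetical helper: convert a response body to UTF-8 based on its declared charset.
function toUtf8(string $body, ?string $contentType): string
{
    $charset = charsetFromContentType($contentType);
    if (strcasecmp($charset, 'UTF-8') === 0) {
        return $body; // already UTF-8, nothing to convert
    }
    // Requires mbstring; an unknown charset name raises an error here.
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

// Hypothetical replacement for getSource() that returns UTF-8 text.
function getSourceAsUtf8(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // An empty string asks for every compression encoding this curl build supports.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    // Content-Type as reported by the server, e.g. "text/html; charset=ISO-8859-1".
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    return toUtf8($body, is_string($contentType) ? $contentType : null);
}
```

Calling getSourceAsUtf8('http://www.website.com/') in place of getSource() should then give you a string that matches the charset=utf-8 header your script sends. If the page declares its charset only in a meta tag, or declares it wrongly, this sketch will not catch that and you would need to inspect the HTML as well.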