I am trying to read a page using file_get_contents() but I cannot get the character encoding to work.
This is my code:
$username = "masked";
$password = "maskedPass";
$remote_url = 'https://utfws.utfpr.edu.br/aluno01/sistema/mplistahorario.inicio?p_curscodnr=212';
// Create a stream
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => array(
            "Authorization: Basic " . base64_encode("$username:$password"),
            'Accept-Charset: iso-8859-1'
        )
    )
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents($remote_url, false, $context);
echo $file;
I tried to change the character encoding to utf-8 but I always get a page with question marks instead of áéíóúãõç.
When I open the page directly in my browser it works just fine. Why is this happening?
It sounds to me like this might just be a problem of lost encoding details.
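What you're describing is:
1. The remote server sends the page as ISO-8859-1 bytes.
2. file_get_contents() returns those bytes unchanged.
3. You echo the bytes without saying which encoding they are in.
See where the encoding specification was lost, there in step 3?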
The data can be decoded correctly as 8859-1, but that only happens if the viewer is configured to use that encoding by default. Some apps may default to 8859-1, but UTF-8 is a lot more common these days.
If you load the data into a different storage engine, say MySQL, the problem may compound. MySQL associates a charset with text data. If your database defaults to utf-8 and you don't tell it the data is actually in 8859-1, you're feeding it data that is assumed to be utf-8, and it will be treated as such in the database going forward. Even if you ask the database for 8859-1 in the future, the data will be re-encoded from utf-8 to 8859-1, but it was never valid utf-8, so the result is yet another incorrect set of bytes.
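To make the "not valid utf-8" point concrete, here is a small sketch. It assumes the mbstring extension is available, and the sample string is made up for illustration rather than taken from your page.

```php
<?php
// "São Paulo" encoded as ISO-8859-1: the "ã" is the single byte 0xE3.
// (Illustrative sample, not data from the actual page.)
$latin1 = "S\xE3o Paulo";

// Interpreted as UTF-8, these bytes are malformed: 0xE3 announces a
// multi-byte sequence, but the next byte is a plain ASCII "o".
$validAsUtf8 = mb_check_encoding($latin1, 'UTF-8');

// Converting with the correct source encoding yields well-formed UTF-8.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$validAfterConversion = mb_check_encoding($utf8, 'UTF-8');

var_dump($validAsUtf8, $validAfterConversion);
```

Anything that trusts the utf-8 label on the unconverted bytes, whether a browser or a database, has to guess or substitute characters, and that is where the question marks come from.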
To address this problem, specify the encoding when you view the data, or when you save it to a database.
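A minimal sketch of both options follows. It assumes the remote page really is ISO-8859-1 and that the mbstring extension is installed. The $file value below is a stand-in for what file_get_contents() returns in your script.

```php
<?php
// Stand-in for the bytes returned by file_get_contents():
// "áéíóúãõç" encoded as ISO-8859-1 (assumed source encoding).
$file = "\xE1\xE9\xED\xF3\xFA\xE3\xF5\xE7";

// Option 1: leave the bytes alone and tell the viewer what they are.
// header('Content-Type: text/html; charset=ISO-8859-1');
// echo $file;

// Option 2: convert once to UTF-8 and label the output as UTF-8.
$utf8 = mb_convert_encoding($file, 'UTF-8', 'ISO-8859-1');

if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo $utf8;
```

Option 2 is usually the easier one to live with if the text later goes into a database whose connection charset is utf-8, since the bytes and the label then agree everywhere downstream.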