I am trying to display content from a website with a text/html object, like this:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<head>
<meta charset="UTF-8">
</head>
<body>
<object type="text/html" width="735" height="1000"
data="http://www.meteo.physik.uni-muenchen.de/dokuwiki/phpincludes/publicationstest.php?abteilung=alle&rev=ja&ajahr=2006&mim=ja">
<p>you should have seen my other page here, but something broke.</p>
</object>
</body>
</html>
However, special characters are not displayed correctly. I can see, but not edit, the PHP script that creates the output on the server side:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta name="GENERATOR" content="Microsoft FrontPage 4.0"; charset=charset=ISO-8859-1>
<title>Publikationen</title>
</head>
<body>
<?php
if ($query['rev'] == "alle") {$OK = true;};
...
?>
</p>
</body>
</html>
Is it possible to get the special characters to display correctly?
This is a flagrant example of mojibake.
The original data come from the original website, where all data are UTF-8 encoded; note the explicit HTML meta attribute `charset=utf-8` there.
However, those data appear inside an embedded HTML document that is bogusly, and only seemingly, attributed `charset=ISO-8859-1`, as in the following scenario (invalid HTML markup):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...
</head>
<body>
...
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta name="GENERATOR" content="Microsoft FrontPage 4.0"; charset=charset=ISO-8859-1>
<title>Publikationen</title>
</head>
<body>
...
... UTF-8 encoded data bogusly attributed `charset=ISO-8859-1`
...
</body>
</html>
...
</body>
</html>
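One quick way to compare the two declarations in this scenario is a small Python sketch (an illustration only, not part of either page) that extracts whatever token follows `charset=` using a deliberately naive regular expression; the names `declared_charsets`, `OUTER_META` and `INNER_META` are hypothetical. It finds `utf-8` for the outer document but only the stray word `charset` for the embedded one, because the FrontPage line repeats `charset=` and never cleanly names ISO-8859-1. Browsers apply the HTML encoding-sniffing rules rather than a regex like this, so treat it purely as a way of reading the markup.

```python
import re

# The two <meta> lines copied verbatim from the scenario above.
OUTER_META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
INNER_META = '<meta name="GENERATOR" content="Microsoft FrontPage 4.0"; charset=charset=ISO-8859-1>'

# Deliberately naive (hypothetical helper): capture the token right after
# "charset=". Real browsers follow the HTML encoding-sniffing algorithm.
CHARSET_RE = re.compile(r"charset=([A-Za-z0-9_-]+)", re.IGNORECASE)


def declared_charsets(markup: str) -> list[str]:
    """Return every token that directly follows 'charset=' in the markup."""
    return CHARSET_RE.findall(markup)


for label, meta in (("outer", OUTER_META), ("inner", INNER_META)):
    print(label, declared_charsets(meta))
# outer ['utf-8']
# inner ['charset']
```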
You, on the other hand, are displaying only part of that content (the embedded HTML document) extracted from the original website. Unfortunately, it contains some UTF-8 characters (see e.g. the sample data in your comment) that are misinterpreted as follows:
char  encoding    code       name
ü     UTF-8       0xC3 0xBC  LATIN SMALL LETTER U WITH DIAERESIS
Ã     ISO-8859-1  0xC3       LATIN CAPITAL LETTER A WITH TILDE
¼     ISO-8859-1  0xBC       VULGAR FRACTION ONE QUARTER
ä     UTF-8       0xC3 0xA4  LATIN SMALL LETTER A WITH DIAERESIS
Ã     ISO-8859-1  0xC3       LATIN CAPITAL LETTER A WITH TILDE
¤     ISO-8859-1  0xA4       CURRENCY SIGN
This happens because `charset=UTF-8` encoded data are interpreted as `charset=ISO-8859-1`.
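The misinterpretation in the table can be reproduced in a few lines. The following is a minimal Python sketch (the helper names `misread_as_latin1` and `repair` are hypothetical): it encodes ü and ä as UTF-8, decodes those bytes as ISO-8859-1, and prints each character with its byte value and Unicode name, which yields the same rows as the table above. Since Python's `iso-8859-1` codec maps each of the 256 byte values to exactly one character, nothing is lost in the misread, so encoding the garbled text back to ISO-8859-1 and decoding it as UTF-8 restores the original.

```python
import unicodedata


def misread_as_latin1(text: str) -> str:
    """Decode the UTF-8 bytes of `text` as ISO-8859-1 (the mojibake step)."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo one such misread: recover the original bytes, then decode as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


for ch in "üä":
    utf8_bytes = ch.encode("utf-8")
    hex_bytes = " ".join(f"0x{b:02X}" for b in utf8_bytes)
    print(ch, "UTF-8", hex_bytes, unicodedata.name(ch))
    # Each UTF-8 byte becomes one ISO-8859-1 character whose code point equals the byte.
    for wrong in misread_as_latin1(ch):
        print(wrong, "ISO-8859-1", f"0x{ord(wrong):02X}", unicodedata.name(wrong))

garbled = misread_as_latin1("üä")
print(garbled)          # Ã¼Ã¤
print(repair(garbled))  # üä
```

Note that this repair only applies to text that went through exactly this one misreading; it does not change which charset the browser applies to the embedded object.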