I'm importing a csv-file with user-data another person sent me. The file was sent back-and-forth between multiple people, working with different programs on different OS. So some encoding issues occurred and I found no easy way to fix it (yes I converted the file to UTF-8 etc).
The data are stored in a postgresql database and I found a way to convert the broken characters.
For example:
â\u0088\u009Aº = ü
â\u0088\u009A§ = ä
â\u0088\u009Aâ\u0088\u0082 = ö
...
In psql I'm using the following statement to convert â\u0088\u009Aº
to ü
:
SELECT COUNT(*) FROM sbl_students WHERE lastname ~* E'.*â\\u0088\\u009Aº.*';
=> 61
UPDATE sbl_students SET lastname = regexp_replace(lastname, E'â\\u0088\\u009Aº', 'ü', 'g');
SELECT COUNT(*) FROM sbl_students WHERE lastname ~* E'.*â\\u0088\\u009Aº.*';
=> 0
I tried to implement this into a php script, but for some reason it doesn't work. Nothing is replaced and no error occurs.
$dbh = pg_connect("dbname=iserv user=sbl");
$query = "UPDATE sbl_students SET lastname = regexp_replace(lastname, E'â\\u0088\\u009Aº', 'ü', 'g');";
pg_query($dbh, $query);
pg_close($dbh);
I already tried multiple variations like regexp_replace(lastname, E'â\u0088\u009Aº', 'ü', 'g')
or regexp_replace(lastname, 'â\u0088\u009Aº', 'ü', 'g')
but non of them worked.
Can anybody help me out? I don't understand why the query works great in psql but doesn't do anything in php.
Thanks!
Postgresql 8.4
PHP 5.3.3-7+squeeze16 with Suhosin-Patch (cli)
Found the answer myself. You have to use pg_prepare():
$dbh = pg_connect("dbname=iserv user=sbl");
pg_prepare($dbh, "", "UPDATE sbl_students SET lastname = regexp_replace(lastname, $1, 'ü', 'g');");
pg_execute($dbh, "", array("â\u0088\u009Aº"));
pg_close($dbh);
Still don't understand why :/
You could try something like this?
$replacements = array(
'â\u0088\u009Aº' => 'ü',
'â\u0088\u009A§' => 'ä',
'â\u0088\u009Aâ\u0088\u0082' => 'ö',
// ...
);
$lastname = str_replace(array_keys($replacements), array_values($replacements), $lastname);