批量插入包含俄语的字符串

I am converting a spreadsheet using PHPExcel to a Database and the cell value happens to contain Russian. If I run mb_detect_encoding() I am told the text is UTF8 and if I set a header of UTF8 then I see the correct Russian characters.

However if I compile it into a string (with only addslashes involved in the process) and insert it into the table I see lots of ????. I have set the table characterset as utf8mb4 and also set the collation as utf8mb4_general_ci. I have also run $this->db->query("SET NAMES 'utf8mb4'"); on my DB connection.

I run PDO query() with my multi part insert and get the ???s but if I output the query to screen I get ÐŸÐ¾Ñ which would be valid UTF8. Why would this not be stored correctly in the database?

I have kept this question rather than deleting it so someone may find the answer helpful.

The reason I was struggling was because in SQLYog it doesn't show you the column Charset by default. There is an option which reads "Hide language options" on the Alter table view which will then reveal that when SQLyog creates a table it uses the default server Charset as opposed to what you define the table Charset to be. I'm not sure if thats correct - but the solution simply is to turn on the Column Charset settings and check they match what you are expecting.

По is Mojibake for По. Probably...

  • The bytes you have in the client are correctly encoded in utf8 (good).
  • You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
  • The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.

The question marks imply...

  • you had utf8-encoded data (good)
  • SET NAMES latin1 was in effect (default, but wrong)
  • the column was declared CHARACTER SET latin1 (default, but wrong)

One way to help diagnose the problem(s) is to run

SELECT col, HEX(col) FROM tbl WHERE ...

For По, the hex should be D09FD0BE. Each Cyrillic character, in utf8, is hex D0xx.