Is using the mb_convert_encoding function a good practice?

This question is different from "UTF-8 all the way through": it asks whether using the mb_convert_encoding function is safe and good practice.

Let's say that a user can upload files through a PHP API. Each filename and path is stored in a PostgreSQL database table that uses UTF-8 as its default encoding.

Sometimes a user uploads files whose names aren't UTF-8 encoded, and those names get imported into the database anyway. The problem is that the characters that are not UTF-8 encoded come out scrambled and do not display as they should in the table columns.

I was thinking of adding the following to the PHP code before import:

// $content holds the uploaded filename as raw bytes.
$output = $content;
if (!mb_check_encoding($content, 'UTF-8')) {
    $output = mb_convert_encoding($content, 'UTF-8');
}

Does this look like good practice, and will the user's client display the result correctly if I return UTF-8 as the output? Could mb_convert_encoding lose or corrupt any bytes?

Thanks

If you're going to convert an encoding, you need to know what you're converting from. You can check whether a string is valid UTF-8, but if the check fails you still have no clue what encoding it actually is. Omitting the $from_encoding parameter from mb_convert_encoding just makes it assume a preset encoding for that parameter (mbstring's internal encoding, which normally defaults to UTF-8), but that doesn't mean that $content actually is in that encoding.

In other words: if you don't know what encoding a string is in, you cannot meaningfully convert it to anything else either, and just trying to convert it from ¯\_(ツ)_/¯ is a crapshoot with the result being equally likely to be something useful and utter garbage.
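To make that concrete, here is a minimal sketch of what guessing looks like. The filename and the three candidate encodings are made up for illustration. The point is only that the same bytes convert "successfully" under every assumption, and nothing in the result tells you which assumption was right.

```php
<?php
// Hypothetical input: "café.txt" as saved by a Latin-1 client.
// The byte 0xE9 on its own is not valid UTF-8.
$content = "caf\xE9.txt";

var_dump(mb_check_encoding($content, 'UTF-8')); // bool(false)

// The same bytes, converted under three different guesses about the source.
$fromLatin1   = mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');
$fromCyrillic = mb_convert_encoding($content, 'UTF-8', 'ISO-8859-5');
$fromGreek    = mb_convert_encoding($content, 'UTF-8', 'ISO-8859-7');

// Each result is valid UTF-8, but each contains a different character
// where the 0xE9 byte was. Only the Latin-1 guess gives back "café.txt".
echo $fromLatin1, "\n";
echo $fromCyrillic, "\n";
echo $fromGreek, "\n";
```

All three calls return without error, so a successful conversion is no evidence that the guess was correct.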

If you encounter unknown encodings, you only have a few choices:

Reject the input value.
Test whether it's one of a handful of other expected encodings and then explicitly convert from your best guess; but that is pretty much a crapshoot as well.
Just use bin2hex or something similar on the value, essentially giving up on trying to interpret it correctly, but still preserving some semblance of the original value.
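The three choices above could be combined roughly like this. It is a sketch, not a recommendation: the function name normalize_filename, the $candidates list, the $rejectUnknown flag and the 'hex:' prefix are all assumptions made for the example.

```php
<?php
// Hypothetical helper illustrating the three options for non-UTF-8 input.
function normalize_filename(string $raw, array $candidates = [], bool $rejectUnknown = false): string
{
    // Already valid UTF-8: nothing to convert.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }

    // Option 1: reject the input value.
    if ($rejectUnknown) {
        throw new InvalidArgumentException('Filename is not valid UTF-8');
    }

    // Option 2: try a handful of expected encodings and convert explicitly
    // from the first one the bytes are valid in. This is still a guess.
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($raw, $encoding)) {
            return mb_convert_encoding($raw, 'UTF-8', $encoding);
        }
    }

    // Option 3: give up on interpreting the bytes and store them as hex.
    return 'hex:' . bin2hex($raw);
}

// Usage with a made-up Latin-1 filename:
echo normalize_filename("caf\xE9.txt", ['ISO-8859-1']), "\n"; // converted from the guess
echo normalize_filename("caf\xE9.txt"), "\n";                 // hex fallback
```

Note that single-byte encodings such as ISO-8859-1 accept nearly any byte sequence, so the first one in $candidates will almost always "match". The order of the list is effectively the guess.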