I'm looking for a PHP function, library, or suggestion for validating a form input value when a customer copies and pastes a broken UTF-8 string.
Scenario:
So on the server side, I'm receiving a "correct", valid UTF-8 value.
Is there a way to detect this somehow so I can show an error message?
Test examples:
Because there is no generic way of figuring out whether a value is wrong, I ended up matching against the regexp [A-Za-z0-9\ -.,] and showing a warning message to the user if the value does not match.
(suggested by @soheyl)
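A minimal sketch of that whitelist check, for illustration only. The function name `is_plain_input` is hypothetical. Note that inside a character class, `\ -.` is read as a range from space to period, which also admits characters such as `!` and `#`; this sketch assumes a literal hyphen was intended and moves it to the end of the class.

```php
<?php
// Hypothetical helper: returns true when the value contains only
// ASCII letters, digits, spaces, periods, commas and hyphens.
function is_plain_input(string $value): bool
{
    // \A and \z anchor the match to the whole string, so a single
    // character outside the whitelist makes the check fail.
    // The hyphen is escaped and placed last so it is matched literally.
    return preg_match('/\A[A-Za-z0-9 .,\-]*\z/', $value) === 1;
}

// Valid UTF-8 bytes that render as mojibake ("Ã©cole").
$pasted = "\xC3\x83\xC2\xA9cole";

if (!is_plain_input($pasted)) {
    echo "Warning: the value contains unexpected characters.\n";
}
```

The trade-off is that legitimate non-ASCII input (accented names, for example) triggers the warning too, so this only fits fields where plain ASCII is expected.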
First of all, a UTF-8 string can't really be "broken". The string can contain bytes that are not valid UTF-8, which makes it seem broken, while it is actually just in a different encoding.
PHP has a function to check what kind of encoding is used for a given string:
string mb_detect_encoding ( string $str [, mixed $encoding_list = mb_detect_order() [, bool $strict = false ]] )
source: http://php.net/manual/en/function.mb-detect-encoding.php.
But it only reports which encoding the given string appears to be in, so all you can check is whether the expected encoding was used.
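As a rough sketch of such a check, assuming the mbstring extension is loaded: `mb_detect_encoding` in strict mode returns `false` when the string is not valid in any listed encoding, and `mb_check_encoding` answers the yes/no question directly. The function name `is_valid_utf8` is made up for this example.

```php
<?php
// Hypothetical helper: true when the bytes form well-formed UTF-8.
function is_valid_utf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

// "\xE9" is "é" in ISO-8859-1, but on its own it is not valid UTF-8.
$latin1 = "caf\xE9";
// The same word encoded as UTF-8.
$utf8 = "caf\xC3\xA9";

var_dump(is_valid_utf8($latin1));
var_dump(is_valid_utf8($utf8));

// Strict detection limited to UTF-8: yields false for $latin1.
var_dump(mb_detect_encoding($latin1, ['UTF-8'], true));
```

This catches input whose bytes are in the wrong encoding. It will not flag text that was mis-decoded earlier and then re-encoded, because that text is well-formed UTF-8 by the time it reaches the server.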
Hope this helps.