Validating broken copy-pasted user input

I'm looking for a PHP function, a library, or any suggestions for validating a form input value when a customer copies and pastes a broken UTF-8 string.

Scenario:

  1. The customer is asked to enter a street name.
  2. He opens a broken third-party page where his address is stored.
  3. He copies a broken UTF-8 string from it (see the examples below).
  4. He pastes this string into the input field and clicks Submit.

So on the server side, I receive a "correct", valid UTF-8 value.

Is there a way to catch this so that I can show an error message?

Test examples:

  • "At’s ‘em"
  • "Bo�kowski"

Because there is no generic way to tell whether a value is wrong, I ended up matching the input against the regexp [A-Za-z0-9 .,-] and showing a warning message to the user if it does not match.
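A minimal sketch of that whitelist check, assuming the hyphen in the character class is meant as a literal hyphen rather than as a range. The function name isPlainStreetName is hypothetical, not part of any library.

```php
<?php
// Hypothetical helper: returns true when the value consists only of
// ASCII letters, digits, spaces, dots, commas and hyphens.
function isPlainStreetName(string $value): bool
{
    // The hyphen is placed last in the class so it is a literal, not a range.
    // \A and \z anchor the match to the whole string, including any trailing newline.
    return preg_match('/\A[A-Za-z0-9 .,-]+\z/', $value) === 1;
}

var_dump(isPlainStreetName('Baker Street 221B'));   // bool(true)
var_dump(isPlainStreetName("Bo\u{FFFD}kowski"));    // bool(false)
```

Note that a whitelist like this also rejects legitimate non-ASCII names such as "Bońkowski", which is why it fits a warning better than a hard error.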

(suggested by @soheyl)

First of all, a UTF-8 string can't be broken. The string can contain non-UTF-8 characters, which makes it seem 'broken', when it is really just in a different encoding.
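To illustrate how such a string arises, here is a rough sketch that assumes the third-party page read UTF-8 bytes as Windows-1252 (an assumption, the question does not say which encoding was involved). The helper looksLikeMojibake is hypothetical and only a heuristic: it flags the replacement character U+FFFD, and it flags text that turns back into valid multi-byte UTF-8 when the assumed mistake is reversed.

```php
<?php
// Hypothetical helper, not a built-in. A heuristic that can misfire.
function looksLikeMojibake(string $value): bool
{
    // U+FFFD is what decoders emit for bytes they could not decode.
    if (strpos($value, "\u{FFFD}") !== false) {
        return true;
    }

    // Reverse a possible "UTF-8 bytes read as Windows-1252" mistake.
    $bytes = mb_convert_encoding($value, 'Windows-1252', 'UTF-8');

    // If the result still contains non-ASCII bytes and is valid UTF-8,
    // the input was probably decoded with the wrong encoding once.
    return preg_match('/[\x80-\xFF]/', $bytes) === 1
        && mb_check_encoding($bytes, 'UTF-8');
}

// "’" (U+2019) is the bytes E2 80 99 in UTF-8. Reading those bytes as
// Windows-1252 yields the three characters seen in the first test example.
$mojibake = mb_convert_encoding("At\u{2019}s", 'UTF-8', 'Windows-1252');

var_dump(looksLikeMojibake($mojibake));            // bool(true)
var_dump(looksLikeMojibake("Bo\u{FFFD}kowski"));   // bool(true)
var_dump(looksLikeMojibake('Baker Street'));       // bool(false)
```

A check like this only covers one specific mix-up, so it is probably best used to trigger a "please check your input" warning rather than to reject the value outright.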

PHP has a function to check what kind of encoding is used for a given string:

string mb_detect_encoding ( string $str [, mixed $encoding_list = mb_detect_order() [, bool $strict = false ]] )

source: http://php.net/manual/en/function.mb-detect-encoding.php.

But it only reports which encoding the given string is in, so all you can check is whether the expected encoding was used.
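As a small sketch of that limitation, the snippet below runs both test examples from the question through mb_detect_encoding and through mb_check_encoding (a related mbstring function, swapped in here because it answers the yes/no question directly). The wrapper isValidUtf8 is a hypothetical name. Both strings pass, because the damage happened before they were re-encoded as UTF-8.

```php
<?php
// Hypothetical wrapper around the built-in mb_check_encoding().
function isValidUtf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

// The two test examples, written with escapes so the source file's
// own encoding does not matter.
$samples = [
    "At\u{00E2}\u{20AC}\u{2122}s \u{00E2}\u{20AC}\u{02DC}em",
    "Bo\u{FFFD}kowski",
];

foreach ($samples as $sample) {
    var_dump(isValidUtf8($sample));                       // bool(true)
    var_dump(mb_detect_encoding($sample, 'UTF-8', true)); // string(5) "UTF-8"
}
```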

Hope this helps.