I want to normalize keywords by stripping non-alphanumeric characters, while still respecting Unicode.

Here's what I have:

$keyword = trim($keyword);
$keyword = normalizer_normalize($keyword, Normalizer::FORM_KD);
$keyword = preg_replace('/[^\p{L}\p{N} ]/u', '', $keyword);
$keyword = normalizer_normalize($keyword, Normalizer::FORM_KC);

My question is whether this will work. Are there languages where it will remove important characters, or fail to remove unimportant ones?

I want just words: no symbols or punctuation. Numbers are OK.

I don't know what Marks are, and I'm not sure whether I should be filtering other categories of numbers. What's a letter number? (From: http://us3.php.net/manual/en/regexp.reference.unicode.php )

My biggest question is: I want to remove vowels from Hebrew letters, but not remove diacritics from European letters. Will the normalization step do this properly?

Edit: When I tested this, it removed diacritics from European letters. I then used KC for the first normalization and removed the second, and it seemed to work right. However, I only tested European letters and Hebrew; I don't know how to check other languages.
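
For reference, the revised version described in the edit can be sketched roughly as below. The function name normalize_keyword is made up for illustration, the intl extension is assumed to be loaded (for normalizer_normalize), and this has only been checked against European and Hebrew input, so treat it as a sketch rather than a verified solution for every script.

```php
<?php
// Hypothetical helper name; a sketch of the revised pipeline, not a definitive implementation.
// Assumes the intl extension is available for normalizer_normalize().
function normalize_keyword(string $keyword): string
{
    $keyword = trim($keyword);

    // NFKC composes base letter + diacritic into one precomposed letter where
    // such a letter exists (e.g. "é"), so the filter below sees it as \p{L}.
    // Hebrew vowel points have no precomposed form that normalization produces,
    // so they stay as separate combining marks.
    $composed = normalizer_normalize($keyword, Normalizer::FORM_KC);
    if ($composed === false) {
        return ''; // input was not valid UTF-8
    }

    // Keep only letters, numbers, and spaces; remaining combining marks,
    // punctuation, and symbols are dropped.
    return preg_replace('/[^\p{L}\p{N} ]/u', '', $composed) ?? '';
}

// Usage: a decomposed "e" + combining acute is composed before filtering.
$european = normalize_keyword("cafe\u{0301}!");
// Usage: Hebrew "shalom" written with vowel points.
$hebrew = normalize_keyword("\u{05E9}\u{05B8}\u{05C1}\u{05DC}\u{05D5}\u{05B9}\u{05DD}");
```

With the original KD form, the first call would lose the accent, because the decomposed combining mark is not a letter and the filter strips it.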

You can find everything you need to know about the meaning of the Unicode properties here