I have a PHP variable that contains text in more than one language. For example:
$variable="This is sample text I am storing in the variable. இதன் கூடவே மற்ற மொழி எழுத்துக்களும் உள்ளன";
So $variable contains both English and another language (Tamil in the above example).
Now I need to wrap the Tamil text in a tag with a class, like this:
$variable="This is sample text I am storing in the variable. <span class='tamil'>இதன் கூடவே மற்ற மொழி எழுத்துக்களும் உள்ளன</span>";
How can I skip the English letters and punctuation and wrap the whole sentence or paragraph written in the other language in a <span>?
As Filype mentioned, we can use the Unicode range for this.
The function below also handles alternating input such as 'English' -> 'Tamil' -> 'English' -> 'Tamil'. Note that it pulls any whitespace that follows the Tamil text into the span.
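/**
 * Wraps each run of Tamil text (including trailing whitespace) in a span.
 *
 * @param string $str Input UTF-8 encoded string.
 * @return string
 */
function encapsulate_tamil($str)
{
    // A Tamil character followed by any mix of Tamil characters and whitespace.
    return preg_replace('/[\x{0B80}-\x{0BFF}][\x{0B80}-\x{0BFF}\s]*/u',
        '<span class=\'tamil\'>$0</span>', $str);
}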
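If the whitespace inside the span is a problem, one possible variation is to require every match to end on a Tamil character. The sketch below is only an illustration: the function name encapsulate_tamil_trimmed is made up for this example, and it assumes the same U+0B80 to U+0BFF range used above.

```php
<?php
/**
 * Hypothetical variant of encapsulate_tamil(): wraps each run of Tamil
 * words in a span but leaves the surrounding whitespace outside the tag.
 *
 * @param string $str Input UTF-8 encoded string.
 * @return string
 */
function encapsulate_tamil_trimmed($str)
{
    // One or more Tamil characters, optionally followed by further
    // whitespace-separated Tamil words. Whitespace is only consumed when
    // more Tamil text follows it, so a match always ends on a Tamil character.
    $pattern = '/[\x{0B80}-\x{0BFF}]+(?:\s+[\x{0B80}-\x{0BFF}]+)*/u';

    return preg_replace($pattern, '<span class=\'tamil\'>$0</span>', $str);
}

echo encapsulate_tamil_trimmed('Hello தமிழ் மொழி world தமிழ் again');
// Hello <span class='tamil'>தமிழ் மொழி</span> world <span class='tamil'>தமிழ்</span> again
```

Punctuation between Tamil words (a comma, for instance) still splits the text into separate spans; adding those characters to the separator group would be one way to handle that.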
There's a Unicode range that you can use to build a regex that finds Tamil characters in your text: http://unicode.org/charts/PDF/U0B80.pdf
In PHP (PCRE) the code points are written as \x{...} and the pattern needs the /u modifier:
[\x{0B80}-\x{0BFA}]+
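As an alternative to spelling out the code-point range, PCRE also accepts Unicode script properties, so \p{Tamil} can stand in for the explicit range when the /u modifier is set. A minimal sketch, where contains_tamil is a hypothetical helper name used only for this illustration:

```php
<?php
/**
 * Hypothetical helper: reports whether a UTF-8 string contains at least
 * one character belonging to the Tamil script.
 *
 * @param string $text Input UTF-8 encoded string.
 * @return bool
 */
function contains_tamil($text)
{
    // \p{Tamil} is a Unicode script property; it needs the /u modifier.
    return preg_match('/\p{Tamil}/u', $text) === 1;
}

var_dump(contains_tamil('plain English'));    // bool(false)
var_dump(contains_tamil('mixed தமிழ் text')); // bool(true)
```

The script property only matches characters actually assigned to the Tamil script, whereas the explicit range also covers the unassigned code points in the block.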
I have put together a playground for this example so that you can adapt it to what you need.
The following is not gold-plated code, but I hope it gets you started.
<?php
$variable="This is sample text I am storing in the variable. இதன் கூடவே மற்ற மொழி எழுத்துக்களும் உள்ளன";
echo add_tamil_class($variable);

/**
 * Adds an HTML span tag around each run of Tamil text using a regex.
 */
function add_tamil_class($text) {
    // Match Tamil words separated only by whitespace and wrap each run in
    // place, so multiple spaces, repeated words and English text between
    // Tamil passages are all handled.
    return preg_replace(
        '/[\x{0B80}-\x{0BFA}]+(?:\s+[\x{0B80}-\x{0BFA}]+)*/u',
        '<span class=\'tamil\'>$0</span>',
        $text
    );
}