Add a span or other tag around other-language text within a phrase

I have a PHP variable that holds text in more than one language. For example:

$variable="This is sample text I am storing in the variable. இதன் கூடவே மற்ற மொழி எழுத்துக்களும் உள்ளன";

So $variable contains both English and text in another language (Tamil in the example above).

Now I need to wrap the Tamil text in a tag with a class, such as:

$variable="This is sample text I am storing in the variable. <span class='tamil'>இதன் கூடவே மற்ற மொழி எழுத்துக்களும் உள்ளன</span>";

How can I skip the English letters and punctuation and wrap each complete sentence or paragraph in the other language in a <span>?

As Filype mentioned, we can use the Unicode range for this.

The function below should match even in cases like 'English' -> 'Tamil' -> 'English' -> 'Tamil', though it will also wrap any whitespace that follows a Tamil run into the span.

/**
 * @param String $str Input UTF-8 encoded string.
 */
function encapsulate_tamil($str)
{
   return preg_replace('/[\x{0B80}-\x{0BFF}][\x{0B80}-\x{0BFF}\s]*/u',
      '<span class=\'tamil\'>$0</span>', $str);
}
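To make the whitespace behaviour concrete, here is a minimal sketch that applies the same pattern inline to an interleaved string. The sample input, the variable names and the second tidy-up pass are my own assumptions, not part of the answer above; the second pass simply moves whitespace sitting right before a closing </span> to just after it.

```php
<?php
// Same pattern as encapsulate_tamil() above, applied inline so this runs on its own.
$pattern = '/[\x{0B80}-\x{0BFF}][\x{0B80}-\x{0BFF}\s]*/u';

// Hypothetical sample: English -> Tamil -> English -> Tamil.
$input = 'Hello தமிழ் மொழி world தமிழ்';

$wrapped = preg_replace($pattern, '<span class=\'tamil\'>$0</span>', $input);
echo $wrapped, PHP_EOL;
// Hello <span class='tamil'>தமிழ் மொழி </span>world <span class='tamil'>தமிழ்</span>

// Assumed tidy-up step: move whitespace that ended up just inside
// a closing tag to just outside it.
$tidied = preg_replace('/(\s+)<\/span>/u', '</span>$1', $wrapped);
echo $tidied, PHP_EOL;
// Hello <span class='tamil'>தமிழ் மொழி</span> world <span class='tamil'>தமிழ்</span>
```

Note that the tidy-up pass would also touch any </span> already present in the input, so it is only safe when the text contains no other markup.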

There's a Unicode range that you can use to build a regex; it will help you find Tamil characters in your text: http://unicode.org/charts/PDF/U0B80.pdf

[\x{0B80}-\x{0BFA}]+

I have put together a playground for this example so that you can improve it to do what you need to do.

http://regex101.com/r/wT8hP4

The following is not gold-plated code, but I hope it gets you started.

<?php

$variable="This is sample text I am storing in the variable. இதன் கூடவே மற்ற மொழி எழுத்துக்களும் உள்ளன";

echo add_tamil_class($variable);

/**
 * Adds an HTML span tag around Tamil text using a regex.
 */
function add_tamil_class($text) {

    // Match each run of Tamil words (Tamil block, U+0B80..U+0BFF) together
    // with the whitespace between them, and wrap the run in place. Replacing
    // in place also handles text where English and Tamil alternate.
    return preg_replace(
        '/[\x{0B80}-\x{0BFF}]+(?:\s+[\x{0B80}-\x{0BFF}]+)*/u',
        '<span class=\'tamil\'>$0</span>',
        $text
        );
}
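
If more than one language needs this treatment, PCRE's Unicode script properties (for example \p{Tamil}) can stand in for a hard-coded code point range. The following is only a sketch under that assumption: wrap_script() and its parameters are hypothetical names, not something from the answers above, and the script name must be one that your PCRE build recognises.

```php
<?php
/**
 * Hypothetical helper: wraps runs of text written in one Unicode script.
 * $script is a PCRE script name such as 'Tamil'; $class is the CSS class.
 */
function wrap_script($text, $script, $class)
{
    // Accept only plain script names so the pattern cannot be altered.
    if (!preg_match('/^[A-Za-z_]+$/', $script)) {
        throw new InvalidArgumentException('Unexpected script name');
    }

    // One or more characters of the script, optionally followed by
    // whitespace-separated groups of further characters of that script.
    $run = '\p{' . $script . '}+';

    return preg_replace_callback(
        '/' . $run . '(?:\s+' . $run . ')*/u',
        function ($m) use ($class) {
            return "<span class='" . htmlspecialchars($class, ENT_QUOTES) . "'>"
                . $m[0] . '</span>';
        },
        $text
    );
}

echo wrap_script('This is sample text. இதன் கூடவே மற்ற மொழி', 'Tamil', 'tamil');
```

Punctuation such as a full stop is not part of the Tamil script, so a Tamil sentence ending in "." gets the span closed before the full stop, and a second sentence after it gets its own span.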