PHP Unicode string extraction - How to split a string at the position of the first non-alphabetic character and store the parts in two variables?

I have a .txt file containing a list of thousands of English words along with their meanings in Urdu. The file structure is shown below. Each line starts with a word, followed by its translation in Unicode characters.

dict.txt (encoding: UTF-8)

 Sony     سونی (sōnī)
 South Ossetia   جنوبی اوسیتیا (janūbī osetiyā)
 flower (ur-Arab'کھلنا) (unicode'(kʰilnā))
 fly    اڑنا (uṛnā)
 fog    کوہرا (m) (kuhrā)
 .
 .

Note: There are no extra spaces to the right of the words Sony, fly, fog, etc.; I added them for clarity.

So far I have done this:

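$file = fopen("dict.txt", 'r');
if ($file) {
    // fgets() returns false at end of file, so compare strictly
    while (($line = fgets($file)) !== false) {
        $word = '';
        $def = '';
        // want to extract the "word" and its "definition" from $line
    }
    // close the handle only if it was opened successfully
    fclose($file);
}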

Now I want to split every line of the file into two variables, $word and $def, so that I can store them in a database for further use.

I tried using preg_match() and list() + explode() myself, but I am something of a newbie, so my solutions do not work. I also tried searching Google but did not find a satisfactory answer.

What I want to do:

{

if a character other than a-z/A-Z or a space is found, break the string there; store the left part in variable $word and the right part in $def.

}

Thanks in advance.

If the format is always [english][urdu]([pronunciation]), this should do quite well:

preg_match('/^([\w\s]+)([\W\s]+)\((.+)\)$/', $line, $matches);
echo "English: $matches[1], Urdu: $matches[2], pronunciation: $matches[3]";

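[\w\s]+ matches a run of word and whitespace characters, [\W\s]+ matches a run of non-word and whitespace characters ("word" here means ASCII letters, digits and _, as long as the pattern is used without the u modifier), and the .+ between the escaped parentheses captures the rest. See http://rubular.com/r/eHUQFczLah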
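
As a rough sketch of the same three-part idea, the version below spells out the ASCII letter class explicitly so it can run with the u modifier on UTF-8 input. The function name parseEntry and the array keys are made up for illustration, and it assumes each line really follows the [english][urdu]([pronunciation]) format; lines that deviate from it may be split oddly or return null.

```php
<?php
// Hypothetical helper (not from the original answer): parse one line of
// dict.txt into its three parts, or return null if the line does not fit.
function parseEntry(string $line): ?array
{
    // Group 1: ASCII letters and spaces (the English word).
    // Group 2: anything that is neither an ASCII letter nor "(" (the Urdu text).
    // Group 3: whatever sits between the first "(" after that and the final ")".
    $pattern = '/^([A-Za-z ]+)([^A-Za-z(]+)\((.+)\)$/u';

    // trim() removes the newline that fgets() leaves at the end of the line.
    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null;
    }

    return [
        'english'       => trim($m[1]),
        'urdu'          => trim($m[2]),
        'pronunciation' => $m[3],
    ];
}

$entry = parseEntry("Sony     سونی (sōnī)\n");
if ($entry !== null) {
    echo "English: {$entry['english']}, Urdu: {$entry['urdu']}, pronunciation: {$entry['pronunciation']}\n";
}
```

Returning null for lines that do not match makes it easy to skip or log them inside the while loop instead of storing half-parsed rows.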

How about:

$arr = array(
"Sony     سونی (sōnī)",
"South Ossetia   جنوبی اوسیتیا (janūbī osetiyā)",
"flower (ur-Arab'کھلنا) (unicode'(kʰilnā))",
"fly    اڑنا (uṛnā)",
"fog    کوہرا (m) (kuhrā)"
);

foreach($arr as $val) {
    $list = preg_split('/([\w\s]+)(.+)/', $val, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY );
    print_r($list);
}

output:

Array
(
    [0] => Sony     
    [1] => سونی (sōnī)
)
Array
(
    [0] => South Ossetia   
    [1] => جنوبی اوسیتیا (janūbī osetiyā)
)
Array
(
    [0] => flower 
    [1] => (ur-Arab'کھلنا) (unicode'(kʰilnā))
)
Array
(
    [0] => fly    
    [1] => اڑنا (uṛnā)
)
Array
(
    [0] => fog    
    [1] => کوہرا (m) (kuhrā)
)
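
If the padding spaces should not end up in the word, a variant that may be worth trying is to split only once, just before the first character that is neither an ASCII letter nor whitespace. This is a minimal sketch; the function name splitAtFirstNonLetter is hypothetical, and it assumes the English part consists only of a-z, A-Z and spaces, as described in the question.

```php
<?php
// Hypothetical helper: split one dictionary line into [word, definition].
function splitAtFirstNonLetter(string $line): array
{
    $line = trim($line); // drop the trailing newline left by fgets()

    // \s* consumes any whitespace before the split point; the lookahead
    // (?=...) finds the first character that is not an ASCII letter or
    // whitespace without consuming it. The limit of 2 stops after one split.
    $parts = preg_split('/\s*(?=[^A-Za-z\s])/u', $line, 2);

    if ($parts === false) {
        // preg_split() failed (for example on invalid UTF-8); keep the raw line.
        return [$line, ''];
    }

    // A line with no definition yields one element, so pad to two.
    return array_pad($parts, 2, '');
}

list($word, $def) = splitAtFirstNonLetter("fog    کوہرا (m) (kuhrā)");
echo $word, ' | ', $def, "\n";
```

Because the lookahead matches a position rather than a character, this also works when there is no space at all between the word and its definition.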