I have a .txt file containing a list of thousands of English words along with their meanings in Urdu. The file structure is shown below. Each line starts with a word, followed by its translation in Unicode characters.
dict.txt (encoding UTF-8)
Sony سونی (sōnī)
South Ossetia جنوبی اوسیتیا (janūbī osetiyā)
flower (ur-Arab'کھلنا) (unicode'(kʰilnā))
fly اڑنا (uṛnā)
fog کوہرا (m) (kuhrā)
.
.
Note: There are no spaces to the right of the words Sony, fly, fog, etc. I added them for clarity.
So far I have done this:
$file = fopen("dict.txt", 'r');
if ($file) {
    while (($lines = fgets($file)) !== false) {
        $word = '';
        $def = '';
        // want to extract "word" and its "definition" from $lines
    }
    fclose($file);
}
Now I want to split every line of the file into two variables, $word and $def, so that I can store them in a database for further use.
I tried using preg_match() and list() + explode(), but I am a newbie and my solutions do not work. I also tried searching Google but did not find a satisfactory answer.
What I want to do:
{
If a character other than a-z/A-Z or a space is found, break the string there; store the left part in the variable $word and the right part in $def.
}
Thanks in advance.
If the format is always [english][urdu]([pronunciation]), this should do quite well:
preg_match('/^([\w\s]+)([\W\s]+)\((.+)\)$/', $line, $matches);
echo "English: $matches[1], Urdu: $matches[2], pronunciation: $matches[3]";
[\w\s]+ matches "word and space characters", [\W\s]+ is "non-word and space characters" ("word" means letters, digits, and _), and .+ between parentheses is the rest. See http://rubular.com/r/eHUQFczLah
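As a rough sketch of how this pattern could be used on the sample lines from the question: the helper name parse_entry is made up for illustration, and the sketch assumes the pattern is used without the u modifier, exactly as written above, so that the Urdu bytes count as non-word characters.

```php
<?php
// Hypothetical helper: applies the pattern above to one line of dict.txt.
// Returns [english, urdu, pronunciation], or null if the line does not fit.
function parse_entry(string $line): ?array
{
    $pattern = '/^([\w\s]+)([\W\s]+)\((.+)\)$/';
    if (!preg_match($pattern, trim($line), $matches)) {
        return null;
    }
    // The first two groups keep the separating spaces, so trim them.
    return [trim($matches[1]), trim($matches[2]), $matches[3]];
}

$entry = parse_entry("fly اڑنا (uṛnā)\n");
if ($entry !== null) {
    list($word, $urdu, $pronunciation) = $entry;
    echo "English: $word, Urdu: $urdu, pronunciation: $pronunciation\n";
}
```

Lines with extra parentheses, such as the fog and flower entries, do not follow the assumed format, so the groups may not line up as expected for them. It is worth checking such lines separately.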
How about:
$arr = array(
    "Sony سونی (sōnī)",
    "South Ossetia جنوبی اوسیتیا (janūbī osetiyā)",
    "flower (ur-Arab'کھلنا) (unicode'(kʰilnā))",
    "fly اڑنا (uṛnā)",
    "fog کوہرا (m) (kuhrā)"
);
foreach ($arr as $val) {
    $list = preg_split('/([\w\s]+)(.+)/', $val, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    print_r($list);
}
Output:
Array
(
[0] => Sony
[1] => سونی (sōnī)
)
Array
(
[0] => South Ossetia
[1] => جنوبی اوسیتیا (janūbī osetiyā)
)
Array
(
[0] => flower
[1] => (ur-Arab'کھلنا) (unicode'(kʰilnā))
)
Array
(
[0] => fly
[1] => اڑنا (uṛnā)
)
Array
(
[0] => fog
[1] => کوہرا (m) (kuhrā)
)
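If only the two parts are needed, the rule described in the question (break at the first character that is not a-z, A-Z, or a space) can also be sketched without a regular expression. This is only an illustration: split_entry is a made-up name, and it assumes every entry starts with an English word written in plain ASCII letters.

```php
<?php
// Hypothetical helper: splits a dictionary line at the first byte that is
// not an ASCII letter or a space.
function split_entry(string $line): array
{
    $line = trim($line);
    $allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ';
    // strspn() returns the length of the leading run made only of $allowed.
    // It counts bytes, which is safe here because every byte of a multi-byte
    // UTF-8 character lies outside the ASCII range.
    $length = strspn($line, $allowed);
    $word = trim(substr($line, 0, $length));
    $def = trim(substr($line, $length));
    return [$word, $def];
}

$samples = [
    "Sony سونی (sōnī)",
    "South Ossetia جنوبی اوسیتیا (janūbī osetiyā)",
];
foreach ($samples as $line) {
    list($word, $def) = split_entry($line);
    echo "$word => $def\n";
}
```

The same call could go inside the fgets() loop from the question, with $word and $def then passed on to the database. Unlike the preg_split() version, the word comes back without its trailing space.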