I'm working on a web app that uses scraping to harvest its data. I have run into a roadblock: I'm unsure how to write a regular expression to extract the data I need.
I need to extract the distance and grade from a string like the following.
"The Bet with the Tote 525 (A6) 525y"
The grade is the "A6" and the distance is the "525y".
Every now and again, the string contains another set of brackets that needs to be ruled out. For example, in this string:
"The Bet with the Tote (Starter race) Some more info (A6) 525y"
I will need the second set of brackets. The grade and distance are always appended to the end of the description so will always be at the end of the string.
I have tried simply using substr() to take a fixed number of characters from the end of the string, but every now and again the distance is set to something like "525yH", which completely throws it out. For that reason, I would guess that a regular expression would be the best option.
Any help greatly appreciated.
Dan
If the data pattern is fixed, why not use explode()?
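<?php
$str = "The Bet with the Tote 525 (A6) 525y";
// Split on spaces; the last two tokens are the grade and the distance.
$strArr = explode(" ", $str);
$arrCount = count($strArr);
$data1 = $strArr[$arrCount - 1]; // distance, e.g. "525y"
$data2 = $strArr[$arrCount - 2]; // grade with brackets, e.g. "(A6)"
echo $data1, " , ", $data2;
?>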
Try with:
/.*\((.*?)\)\W+(.*)$/
Thanks to the updated question, it's as simple as:
preg_match('/(\(\w+\)) (\w+?)H?$/', $str, $matches);
Usage:
$str = "The Bet with the Tote 525 (A6) 525y";
preg_match('/(\(\w+\)) (\w+?)H?$/', $str, $matches);
print_r($matches);
outputs:
Array
(
[0] => (A6) 525y
[1] => (A6)
[2] => 525y
)
or:
$str = "The Bet with the Tote (Starter race) Some more info (A6) 525y";
preg_match('/(\(\w+\)) (\w+?)H?$/', $str, $matches);
print_r($matches);
outputs:
Array
(
[0] => (A6) 525y
[1] => (A6)
[2] => 525y
)
Although I personally prefer the elegance of the explode method, it would then require an extra condition and possibly an extra operation to remove the trailing H.
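If the explode approach is preferred anyway, the trailing H can be dropped with rtrim() and the brackets with trim(). This is only a minimal sketch: it assumes the suffix is only ever a capital H, that the grade contains no spaces, and that the string has at least two space-separated tokens. The helper name parseTail is made up for illustration.

```php
<?php
// Hypothetical helper: read the grade and distance from the last two tokens.
function parseTail(string $description): array
{
    $parts = explode(' ', trim($description));
    // The last two tokens are e.g. "(A6)" and "525y" (or "525yH").
    [$grade, $distance] = array_slice($parts, -2);
    return [
        'grade'    => trim($grade, '()'),    // "(A6)"  -> "A6"
        'distance' => rtrim($distance, 'H'), // "525yH" -> "525y"
    ];
}

print_r(parseTail('The Bet with the Tote (Starter race) Some more info (A6) 525yH'));
```

Whether stripping the H is actually wanted depends on what it means in the source data; if it carries information, keep the raw token instead.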
You could try:
\(([^)]+)\) (\d+y.?)$
which is a little more specific.
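To see how a pattern of this shape behaves on the "525yH" case from the question, here is a rough check. It assumes the outer brackets are meant literally, so they are written as \( and \); the sample strings are the two from the question plus a "525yH" variant. Note that .? keeps the trailing letter inside the captured distance rather than dropping it.

```php
<?php
// Assumption: the outer brackets are literal, so they are escaped here.
$pattern = '/\(([^)]+)\) (\d+y.?)$/';

$samples = [
    'The Bet with the Tote 525 (A6) 525y',
    'The Bet with the Tote (Starter race) Some more info (A6) 525y',
    'The Bet with the Tote 525 (A6) 525yH',
];

$results = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $m)) {
        // $m[1] is the grade without brackets, $m[2] is the distance.
        $results[] = [$m[1], $m[2]];
        echo $m[1], ' , ', $m[2], PHP_EOL;
    }
}
```

Because the pattern is anchored with $, the earlier "(Starter race)" group cannot match: it is not followed by a distance at the end of the string.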
Since
The grade and distance are always appended to the end of the description so will always be at the end of the string.
Something like the following, without regex, might work. That is, assuming your above statement is correct.
$text = "The Bet with the Tote (Starter race) Some more info (A6) 525y";
array_slice(explode(" ", $text), -2, 2);
//returns
Array
(
[0] => (A6)
[1] => 525y
)
$str = 'The Bet with the Tote 525 (A6) 525y';
preg_match_all('/.*\((?P<grade>.+?)\)\s(?P<distance>.+?)$/', $str, $matches);
var_dump($matches);
array(5) {
[0]=>
array(1) {
[0]=>
string(35) "The Bet with the Tote 525 (A6) 525y"
}
["grade"]=>
array(1) {
[0]=>
string(2) "A6"
}
[1]=>
array(1) {
[0]=>
string(2) "A6"
}
["distance"]=>
array(1) {
[0]=>
string(4) "525y"
}
[2]=>
array(1) {
[0]=>
string(4) "525y"
}
}
So you can access the grade and distance via $matches['grade'][0] and $matches['distance'][0] (preg_match_all() wraps each group's results in an array).
Your second string...
The Bet with the Tote (Starter race) Some more info (A6) 525y
array(5) {
[0]=>
array(1) {
[0]=>
string(61) "The Bet with the Tote (Starter race) Some more info (A6) 525y"
}
["grade"]=>
array(1) {
[0]=>
string(2) "A6"
}
[1]=>
array(1) {
[0]=>
string(2) "A6"
}
["distance"]=>
array(1) {
[0]=>
string(4) "525y"
}
[2]=>
array(1) {
[0]=>
string(4) "525y"
}
}
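Since only one grade and distance are expected per string, preg_match() would give a flatter result than preg_match_all(). A possible wrapper, sketched under the assumption that the grade never contains a closing bracket and the distance never contains whitespace; extractGradeAndDistance is an illustrative name, not an existing function:

```php
<?php
// Illustrative helper; returns null when the string has no "(grade) distance" tail.
function extractGradeAndDistance(string $description): ?array
{
    // Anchored at the end, so earlier bracketed text such as "(Starter race)" is skipped.
    $pattern = '/\((?P<grade>[^)]+)\)\s+(?P<distance>\S+)$/';
    if (preg_match($pattern, $description, $m) !== 1) {
        return null;
    }
    return ['grade' => $m['grade'], 'distance' => $m['distance']];
}

var_dump(extractGradeAndDistance('The Bet with the Tote (Starter race) Some more info (A6) 525y'));
var_dump(extractGradeAndDistance('No grade here'));
```

Returning null for a non-matching string lets the scraper skip or log rows whose description does not follow the expected layout.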