I have a MySQL text field in an online diary, which sometimes contains text like D<num> <tag>, for example D109 MU.
Those references can appear in any part of the field - so might be:
D109 MU, worked from home today
Walked the dog, later took the kids to swimming. D110 MU. Went to the gym in the evening for the 9th time this month.
I have worked out an SQL query to pull out the entries which include the D<num> <tag> content. For example, by going to this URL:
example.com/tidy.php?v1=7346&v2=90000&tag=MU
The querystring data is used to get the data out of the field:
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
if (!empty($_GET['v1'])) {
$v1 = $purifier->purify($_GET['v1']);
}
if (!empty($_GET['v2'])) {
$v2 = $purifier->purify($_GET['v2']);
}
if (!empty($_GET['tag'])) {
$tag = $purifier->purify($_GET['tag']);
}
$sql = "select id, post_date, post_content from tbl_log_days where id between :v1 and :v2 and post_content REGEXP :exp ";
$stmt = $pdo->prepare($sql);
$stmt->bindParam(':v1', $v1);
$stmt->bindParam(':v2', $v2);
$stmt->bindValue(":exp" , "D[0-9]+ $tag", PDO::PARAM_STR);
$stmt->execute();
That works okay - so I get the relevant post_content entries.
However, I am struggling to work out the syntax to pull out only the number from the D part of the content.
I have got this far:
while ($row = $stmt->fetch()){
$id = $row['id'];
$dt = $row['post_date'];
$pc = $row['post_content'];
preg_match_all('/\d+/', $pc, $matches);
$number = implode(' ', $matches[0]);
echo "$number <hr>";
}
The trouble is that the content often includes multiple numbers, but I only want the number that appears between the D and the tag value. So for D109 MU, I'd want to extract 109, and for the second example, I'd want to extract 110 from D110 MU, but ignore the 9 that appears later in that same field.
How could I achieve that?
You don't say whether MU is a reliable string to match, so I'm leaving it out. Match the D, restart the full-string match with \K, then match one or more digits.
Code: (Demo) (Regex101 Demo)
$string = 'D109 MU, worked from home today
Walked the dog, later took the kids to swimming. D110 MU. Went to the gym in the evening for the 9th time this month.';
var_export(preg_match_all('~D\K\d+~', $string, $out) ? $out[0] : 'fail');
Output:
array (
0 => '109',
1 => '110',
)
Extension: If you need to increase the pattern accuracy by adding the known tag value, you can add the $tag variable to the pattern as a lookahead.
Code: (Demo)
$tag = "MU";
$string = 'D109 MU, worked from home today
Walked the dog, later took the kids to swimming. D110 MU. Went to the gym in the evening for the 9th time this month.';
var_export(preg_match_all("~D\K\d+(?= $tag)~", $string, $out) ? $out[0] : 'fail');
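Because $tag comes from the query string in the question, it may be worth escaping it before it goes into the pattern. The following is a minimal sketch of the same lookahead with preg_quote() applied; the hard-coded $tag value and the $pattern and $numbers variable names are only for illustration.

```php
<?php
// Hypothetical input: in the question, $tag is read from $_GET['tag'].
$tag = 'MU';
$string = 'D109 MU, worked from home today
Walked the dog, later took the kids to swimming. D110 MU. Went to the gym in the evening for the 9th time this month.';

// preg_quote() escapes regex metacharacters in $tag; the second argument
// also escapes the ~ character used as the pattern delimiter.
$pattern = '~D\K\d+(?= ' . preg_quote($tag, '~') . ')~';

// Fall back to an empty array when nothing matches.
$numbers = preg_match_all($pattern, $string, $out) ? $out[0] : [];
var_export($numbers);
```

With a plain tag such as MU the pattern behaves the same as the one above; the escaping only matters if a tag ever contains characters like . or +.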
Furthermore, if your strings only contain one qualifying <num>, then preg_match() will suffice.
Code: (Demo)
$tag = "MU";
$strings = [
'D109 MU, worked from home today',
'Walked the dog, later took the kids to swimming. D110 MU. Went to the gym in the evening for the 9th time this month.'
];
foreach ($strings as $string) {
echo "\n---\n", preg_match("~D\K\d+(?= $tag)~", $string, $out) ? $out[0] : 'fail';
}
Output:
---
109
---
110
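To tie this back to the fetch loop in the question, here is a rough sketch of the same preg_match() call applied row by row. The $rows array stands in for what $stmt->fetch() would return, and its ids and dates are made up for the example.

```php
<?php
// Stand-in for the database rows; the ids and dates are invented.
$rows = [
    ['id' => 1, 'post_date' => '2023-01-01', 'post_content' => 'D109 MU, worked from home today'],
    ['id' => 2, 'post_date' => '2023-01-02', 'post_content' => 'Walked the dog, later took the kids to swimming. D110 MU. Went to the gym in the evening for the 9th time this month.'],
];

$tag = 'MU'; // in the question this comes from the query string
$pattern = '~D\K\d+(?= ' . preg_quote($tag, '~') . ')~';

$numbers = [];
foreach ($rows as $row) {
    // preg_match() returns 1 on a match; $m[0] then holds only the digits,
    // because \K discards the leading D from the reported match.
    if (preg_match($pattern, $row['post_content'], $m)) {
        $numbers[$row['id']] = $m[0];
    }
}

foreach ($numbers as $id => $number) {
    echo "$id: $number <hr>";
}
```

Rows without a qualifying D<num> <tag> are simply skipped, so the 9 in "9th time" never reaches the output.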
Assuming that the tag is always MU:
$re = '/D(\d+) MU/'; // \d+ requires at least one digit; \d* would also match "D MU"
// If the tag is not always MU but is always two uppercase letters, use this instead:
// $re = '/D(\d+) [A-Z]{2}/';
$str = 'D109 MU, worked from home today
Walked the dog, later took the kids to swimming. D110 MU. Went to the gym in the evening for the 9th time this month.';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
$matches will contain the numbers you need: each entry holds the full match at index 0 and the captured number at index 1. The output is as below:
array(2) {
[0]=>
array(2) {
[0]=>
string(7) "D109 MU"
[1]=>
string(3) "109"
}
[1]=>
array(2) {
[0]=>
string(7) "D110 MU"
[1]=>
string(3) "110"
}
}
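If only the numbers are wanted rather than the nested result, one option is to pull index 1 out of each match set with array_column(). This is a small sketch on the same sample string; the $numbers variable name is just for illustration.

```php
<?php
$re = '/D(\d+) MU/';
$str = 'D109 MU, worked from home today
Walked the dog, later took the kids to swimming. D110 MU. Went to the gym in the evening for the 9th time this month.';

// PREG_SET_ORDER groups the results per match: [0] full match, [1] captured digits.
preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Collect index 1 (the capture group) from every match set.
$numbers = array_column($matches, 1);
var_export($numbers);
```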