I have the text of book pages that may have footnotes at the end of the string, as in the following example:
والخاتِم بكسر التاء اسم فاعل، فكأنه قد جاء آخر الرسل، والخاتَم بفتح التاء اسم آلة، كأنه قد ختمت به الرسالة.
__________
(1) - سورة الأحزاب آية : 43.
(2) - سورة البقرة آية : 157.
(3) - سورة الأنعام آية : 17.
(4) - سورة الكهف آية : 19.
The separator line in the sample is made of the character _ (I call it Kashida; it is not the dash -), which in Latin script is called an underscore. What I need is to match the four lines, or any number of lines, under that separator line.
What I have tried only matches the first line under the separator: /_.*\n*(.*)/gum
The only way I have found to get them all is to repeat the pattern portion \n*(.*) n times, where n is the number of footnote lines (four in the example), and that is not a practical solution.
You can utilize the \G anchor here:
preg_match_all('~(?:\G(?!^)|_)\R+\K[^\n]+~', $str, $matches);
print_r($matches[0]);
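To make it easier to see what each part of that pattern does, here is a minimal annotated sketch. The sample string is made up for illustration, and the character class is widened to [^\r\n] on the assumption that the input might use Windows line endings.

```php
<?php
// Hypothetical sample page: body text, a separator line of underscores, then footnotes.
$str = "Body text of the page.\n__________\n(1) - First note.\n(2) - Second note.\n(3) - Third note.";

// (?:\G(?!^)|_)  start either where the previous match ended (but not at the
//                very start of the string) or at an underscore
// \R+            one or more line breaks of any kind
// \K             drop everything matched so far from the reported match
// [^\r\n]+       the footnote line itself
preg_match_all('~(?:\G(?!^)|_)\R+\K[^\r\n]+~', $str, $matches);

print_r($matches[0]);
```

Note that any underscore directly followed by a line break can start a chain of matches, so this sketch assumes the body text contains no such underscore before the separator.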
It is not easy to match the separator line and then every footnote line after it with a single pattern. What you can do instead is capture everything after the separator line, and then match each footnote within that capture.
You can do that with these two patterns:
/_{4,}.+/gums
/(\(.*?\.)*/gums
I hope that is good enough for you.
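A rough sketch of how those two steps might look in PHP follows. The /g flag is demo-tool notation that PHP's preg functions do not accept; preg_match_all plays that role. The sample input is invented, and the outer ( )* of the second pattern is dropped here as an assumption, since it would also allow empty matches.

```php
<?php
// Hypothetical sample input.
$page = "Body text.\n__________\n(1) - First note.\n(2) - Second note.";

// Step 1: grab the separator and everything after it (s lets . cross newlines).
$tail = '';
if (preg_match('/_{4,}.+/us', $page, $m)) {
    $tail = $m[0];
}

// Step 2: within that tail, pick out each "(n) ... ." entry.
preg_match_all('/\(.*?\./us', $tail, $notes);

print_r($notes[0]);
```

This relies on every footnote starting with "(" and ending at its first full stop, which holds for the sample in the question but may not hold for footnotes that contain a period in the middle.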
I just tested this successfully:
$text = "_________
Line 1
Line 2
Line 3
";
$matches = array();
$pattern = '/_+\n(.+)/s'; // s to have . match newlines.
// Change \n to \r\n if appropriate
// Extract all footnotes
preg_match($pattern, $text, $matches);
$footnotes = $matches[1]; // $matches[0] is the whole matched string,
// $matches[1] is the part within ()
$matches = array();
$pattern = '/(.+)/'; // Don't match newlines here
// Extract individual footnotes
preg_match_all($pattern, $footnotes, $matches);
foreach ($matches[0] as $match) { // preg_match_all returns multi-dimensional array
// Do something with each footnote
}
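The same two-step idea could be wrapped in a small function, sketched below. The name extract_footnotes is hypothetical, and \R is used instead of \n on the assumption that the line-ending style of the input is not known in advance. It also returns an empty array when there is no separator line.

```php
<?php
// Hypothetical helper name; returns one array entry per footnote line.
function extract_footnotes(string $text): array
{
    // \R matches \n, \r\n, or \r, so the line-ending style does not matter.
    if (!preg_match('/_+\R(.+)/s', $text, $m)) {
        return []; // no separator line, so no footnotes
    }
    // Split the captured block on line breaks and drop empty pieces.
    return preg_split('/\R/', trim($m[1]), -1, PREG_SPLIT_NO_EMPTY);
}

$text = "Body.\r\n_________\r\nLine 1\r\nLine 2\r\nLine 3\r\n";
print_r(extract_footnotes($text));
```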