I'm using this regex to extract salaries from a file containing many different lines of text:
/(£|\€|GBP)\s*?(.)*?(pro-rata|\x28pro-rata\x29)/i
The test cases look like this:
"Relevant Quantity Surveying construction drawings Salary: £36,999 (pro-rata) in accordance with standard construction terms ..."
It matches, but every match of the parenthesized 'pro-rata' is missing the closing parenthesis, i.e.:
£36,999 (pro-rata
Any ideas what's wrong with this?
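One way to narrow this down is to run the same pattern in a different backtracking engine. The sketch below uses Python's `re` module. This is an assumption: Python's engine is not PCRE, but it resolves lazy quantifiers and alternation the same way for this pattern. With the lazy `(.)*?` the match includes the closing parenthesis. Switching to a greedy `(.)*` reproduces the truncated result, because backtracking from the end of the line stops at the rightmost position where an alternative succeeds, which is the `p` of `pro-rata` rather than the `(`. In PHP, the `U` (ungreedy) modifier inverts quantifier greediness and would have this effect, though the question does not say whether it was set.

```python
import re
from typing import Optional

# Sample line taken from the question.
TEXT = ("Relevant Quantity Surveying construction drawings Salary: "
        "£36,999 (pro-rata) in accordance with standard construction terms ...")

# The question's pattern in Python syntax: no /.../i delimiters, and
# case-insensitivity is passed as a flag. \x28 and \x29 are "(" and ")".
LAZY = r"(£|€|GBP)\s*?(.)*?(pro-rata|\x28pro-rata\x29)"

# Identical except that (.)* is greedy instead of lazy.
GREEDY = r"(£|€|GBP)\s*?(.)*(pro-rata|\x28pro-rata\x29)"


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the whole first match of pattern in text, or None."""
    m = re.search(pattern, text, re.IGNORECASE)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(first_match(LAZY, TEXT))    # £36,999 (pro-rata)
    print(first_match(GREEDY, TEXT))  # £36,999 (pro-rata
```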
I tried using
/(£|\€|GBP)\s*?(.)*?(pro-rata|\(pro-rata\))/i
and it works for me as expected.
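Here `\(` and `\)` are ordinary escaped literals, equivalent to `\x28` and `\x29`. The sketch below applies this version to several lines, as you would when reading a file line by line. It uses Python's `re` as a stand-in for PHP's PCRE. The sample lines and the `extract_salaries` helper are hypothetical and are there only for illustration.

```python
import re

# The pattern from this answer in Python syntax; \( and \) match literal parentheses.
PATTERN = re.compile(r"(£|€|GBP)\s*?(.)*?(pro-rata|\(pro-rata\))", re.IGNORECASE)


def extract_salaries(lines):
    """Yield the full matched text for every salary found in the given lines.

    Hypothetical helper: `lines` can be any iterable of strings, e.g. an open
    text file (a file name such as "jobs.txt" would be illustrative only).
    """
    for line in lines:
        for m in PATTERN.finditer(line):
            yield m.group(0)


# Hypothetical sample lines standing in for the file's contents.
SAMPLE = [
    "Salary: £36,999 (pro-rata) in accordance with standard construction terms ...",
    "Salary: GBP 28,000 pro-rata, fixed term",
    "No salary on this line",
]

if __name__ == "__main__":
    for salary in extract_salaries(SAMPLE):
        print(salary)
```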
This should be:
/(£|\€|GBP)\s*?(.)*?(\x28pro-rata\x29|pro-rata)/i
The problem is that you have (.)*?, which will match any characters up to pro-rata, including a (, which means it matches the first alternative in your group, pro-rata, and leaves the closing ) out of the match.
Note: this behavior appears to differ for some PHP versions (possibly depending on the version of PCRE it was compiled with).
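The general rule behind listing `\x28pro-rata\x29` first is that a backtracking engine tries alternatives from left to right and keeps the first one that succeeds at a given position. When one alternative is a prefix of another, the longer one should therefore come first. The small illustration below uses Python's `re` as a stand-in for PCRE, since both try alternatives in order. In the question's pattern the two alternatives begin at different characters, so which one wins also depends on where the preceding `(.)*?` stops.

```python
import re


def first_alternative(pattern: str, text: str) -> str:
    """Return what re.match consumes at the start of text ("" if nothing)."""
    m = re.match(pattern, text)
    return m.group(0) if m else ""


# "pro-rata" is a prefix of "pro-rata)", so with the short alternative first
# the engine is satisfied before it ever tries the longer one.
SHORT_FIRST = r"pro-rata|pro-rata\)"
LONG_FIRST = r"pro-rata\)|pro-rata"

if __name__ == "__main__":
    print(first_alternative(SHORT_FIRST, "pro-rata) ..."))  # pro-rata
    print(first_alternative(LONG_FIRST, "pro-rata) ..."))   # pro-rata)
```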
I rewrote your pattern a bit:
/(£|\€|GBP)\s*?([0-9,]*)?\s*?(\x28pro-rata\x29|pro-rata)/i
For examples and an explanation of why, take a look at: http://regex101.com/r/mP4hX4
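In this version, `[0-9,]*` cannot consume a `(`, and the second `\s*?` only consumes whitespace. The pattern therefore has no way to skip over the `(` and stop at the bare `pro-rata` inside it. The second group also captures just the amount. Below is a Python sketch of the same pattern. It assumes Python's `re` behaves like PCRE here, and the `/.../i` delimiters become the `re.IGNORECASE` flag. The `salary_parts` helper is hypothetical.

```python
import re

# The rewritten pattern in Python syntax; group 2 holds the digits and commas.
PATTERN = re.compile(r"(£|€|GBP)\s*?([0-9,]*)?\s*?(\x28pro-rata\x29|pro-rata)",
                     re.IGNORECASE)

TEXT = ("Relevant Quantity Surveying construction drawings Salary: "
        "£36,999 (pro-rata) in accordance with standard construction terms ...")


def salary_parts(text: str):
    """Return (currency, amount, pro-rata term) for the first match, or None."""
    m = PATTERN.search(text)
    return m.groups() if m else None


if __name__ == "__main__":
    print(salary_parts(TEXT))  # ('£', '36,999', '(pro-rata)')
```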