After a few hours of experimenting with http://www.phpliveregex.com/ I have become rather stuck. I am looking for a regular expression that would match the following examples:
arrname = array('blackberry', 'apple', 'orange', 'mandarin');
arrname = array('****11111', '2%%%2', '3$$$$33', '444£££44');
So essentially it boils down to the following pattern:
[arrname = array('] [any characters] [', '] [any characters] [', '] [any characters] [', '] [any characters] [');]
Here "any characters" means literally anything (letters, numbers, symbols) in any order, with a length of at least 1 character, hence the need for a regular expression using the preg_match() function.
My trouble is making the regular expression match the pattern stated above (repeated below).
[arrname = array('] [any characters] [', '] [any characters] [', '] [any characters] [', '] [any characters] [');]
UPDATE:
Having tried to implement preg_match(), I've failed and am obviously missing something really stupid (errors listed below). Any ideas?
First (using double quotes on the expression)
$pattern = "arrname = array\('([^']+)', '([^']+)', '([^']+)', '([^']+)'\);";
preg_match($pattern, $data, $matches);
Gives me the error Warning: preg_match(): Delimiter must not be alphanumeric or backslash
Second (using single quote on the expression)
$pattern = 'arrname = array\('([^']+)', '([^']+)', '([^']+)', '([^']+)'\);';
preg_match($pattern, $data, $matches);
Gives me the error Parse error: syntax error, unexpected '('
As an aside, I will describe the regex engine's behaviour with two different patterns, step by step. Keep in mind that this is only a condensed representation; in real life, strings are processed character by character. The goal is to show the path the regex engine takes.
string: arr = array('cherry', 'apple');
pattern 1: arr = array\('(.+)', '(.+)'\);
1 | arr = array('cherry', 'apple'); | arr = array\('
2 | arr = array('cherry', 'apple'); | arr = array\('(.+)
The greedy .+ has consumed everything up to the end of the string. Since there is no ' after the ;, the regex engine must backtrack character by character to find a match. For each backtrack position, the rest of the pattern is tested. This is the reason why I count each backtrack position as a step.
3 | arr = array('cherry', 'apple'); | arr = array\('(.+)
4 | arr = array('cherry', 'apple'); | arr = array\('(.+)
5 | arr = array('cherry', 'apple'); | arr = array\('(.+)
The ' is found; the regex engine stops backtracking and continues.
6 | arr = array('cherry', 'apple'); | arr = array\('(.+)',
There is no , after the ', so the regex engine resumes backtracking to find another '.
7 | arr = array('cherry', 'apple'); | arr = array\('(.+)
...
13 | arr = array('cherry', 'apple'); | arr = array\('(.+)
Another ' is found.
14 | arr = array('cherry', 'apple'); | arr = array\('(.+)'
But it is not followed by a , either.
15 | arr = array('cherry', 'apple'); | arr = array\('(.+)
16 | arr = array('cherry', 'apple'); | arr = array\('(.+)
17 | arr = array('cherry', 'apple'); | arr = array\('(.+)
Another ' is found, this time followed by , '
18 | arr = array('cherry', 'apple'); | arr = array\('(.+)', '
19 | arr = array('cherry', 'apple'); | arr = array\('(.+)', '(.+)
There is no ' after the ;, ...
20 | arr = array('cherry', 'apple'); | arr = array\('(.+)', '(.+)
21 | arr = array('cherry', 'apple'); | arr = array\('(.+)', '(.+)
22 | arr = array('cherry', 'apple'); | arr = array\('(.+)', '(.+)
The ' is found, followed by all the literals at the end of the pattern.
23 | arr = array('cherry', 'apple'); | arr = array\('(.+)', '(.+)'\);
pattern 2: arr = array\('([^']+)', '([^']+)'\);
1 | arr = array('cherry', 'apple'); | arr = array\('
2 | arr = array('cherry', 'apple'); | arr = array\('([^']+)
Now the regex engine is forced to stop before the ', since the character class matches every character except the '.
3 | arr = array('cherry', 'apple'); | arr = array\('([^']+)', '
4 | arr = array('cherry', 'apple'); | arr = array\('([^']+)', '([^']+)
The regex engine stops before the ' for the same reason.
5 | arr = array('cherry', 'apple'); | arr = array\('([^']+)', '([^']+)'\);
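To make the comparison concrete, here is a minimal PHP sketch that runs both patterns through preg_match(). The / delimiters and the variable names ($greedy, $negated, $three) are my own additions, not part of the walkthrough, and the three-item string is a hypothetical input added only to show a side effect of the greedy version.

```php
<?php
// Sample string from the walkthrough.
$data = "arr = array('cherry', 'apple');";

// preg_match() expects the pattern to be wrapped in delimiters ("/" here).
// Double quotes are used for the PHP string because the pattern itself
// contains single quotes; "\\(" produces the regex escape \( .
$greedy  = "/arr = array\\('(.+)', '(.+)'\\);/";         // pattern 1
$negated = "/arr = array\\('([^']+)', '([^']+)'\\);/";   // pattern 2

preg_match($greedy, $data, $m1);
preg_match($negated, $data, $m2);

// On this input both patterns capture the same two items.
print_r(array_slice($m1, 1));   // cherry, apple
print_r(array_slice($m2, 1));   // cherry, apple

// Hypothetical input with three items instead of two.
$three = "arr = array('a', 'b', 'c');";

// The greedy version still matches, but group 1 spans a quote boundary.
preg_match($greedy, $three, $m3);
echo $m3[1], "\n";              // a', 'b

// The negated class cannot cross a quote, so the match is rejected.
$found = preg_match($negated, $three, $m4);
echo $found, "\n";              // 0
```

The same delimiter fix should apply to the four-item arrname pattern from the question: without delimiters, preg_match() treats the leading "a" as the delimiter, which is what triggers the "Delimiter must not be alphanumeric or backslash" warning. The parse error in the second attempt comes from the unescaped single quotes inside a single-quoted PHP string.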
Depending on how specific you need to be, this is a quick and clean way to go: /'([^']+)'/. That looks for anything between single quotes that isn't a single quote.
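As a rough sketch of how that short pattern could be used, the snippet below collects every quoted item with preg_match_all(). The choice of preg_match_all() over preg_match() and the variable names are my assumptions; the input is the first example line from the question.

```php
<?php
// First example line from the question.
$data = "arrname = array('blackberry', 'apple', 'orange', 'mandarin');";

// Find every run of non-quote characters enclosed in single quotes.
// $matches[0] holds the full matches (with quotes),
// $matches[1] holds the first capture group (without quotes).
$count = preg_match_all("/'([^']+)'/", $data, $matches);

$items = $matches[1];
print_r($items);   // blackberry, apple, orange, mandarin
echo $count, "\n"; // 4
```

Note that this version does not check the surrounding "arrname = array(...);" text or the number of items, so it is only suitable when that looser match is acceptable.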