I have a series of strings in the form:
{ method_name { $key1 = 'quoted value' , $key2 = __('literal value'); }}
// Missing method_name and final semi-colon
// Still valid
{{ $key1 = 'quoted value' , $key2 = __('literal value') }}
// Optional key values
{ method_name { $key1 = , $key2 = __('literal value'); }}
{ method_name { $key1, $key2 = __('literal value'); }}
// Any number of values
{ method_name { $key1 = 'quoted value' , $keyN = 3.14; }}
Currently, I use a series of preg_split and trim calls. This is part of a custom template engine where method_name tells the parser which method to call, and the $key = value pairs are passed to that method as an array. These strings are embedded in an HTML template, and that DOM structure may be repeated. Think of it as a table with each row/column having a different value. The keys are the column details (name, sortable, etc.) and the method fills in the details of the cell.
The problem I'm having is speed.
Q1. How can I do this with a single expression?
Q2. Will I gain any speed?
Q3. Provided I cache the result, is readability preferred over a somewhat complicated regex?
Q4. Is there any way I can restructure the strings for a performance boost?
Ideally, I'd like to scan the string only once, convert it to PHP code, and do an eval each time it needs to be used.
I would perhaps use a regex like this (I found some parts to simplify from the one in the comments):
(?:\{ (?:(?<method>.+?)\s+\{)?|\G)[,\s]*(?<key>\$\w+)(?: = (?<value>[^,\n;}]*))?
The named capture groups are self-explanatory, but here's a breakdown:
(?:
  \{
  (?:
    (?<method>.+?)      # Captures everything until the next { for the method
    \s+\{
  )?                    # All this optional
|
  \G                    # Or \G anchor, which allows successive matches of multiple key/value pairs
)
[,\s]*                  # Any spaces and commas
(?<key>\$\w+)           # Capture of key with format $\w+
(?: =
  (?<value>[^,\n;}]*)   # Capture of value
)?                      # All this optional
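To show how the pattern could be used, here is a minimal sketch that runs it over one expression at a time with preg_match_all and collects the method name and the key/value pairs. The function name parseExpression and the shape of the returned array are my own assumptions, not part of your engine. Values are trimmed because the value group also picks up trailing spaces.

```php
<?php
// Hypothetical helper: parses ONE template expression with the pattern above.
// Returns ['method' => string|null, 'args' => [key => value|null]].
function parseExpression(string $expr): array
{
    $pattern = '/(?:\{ (?:(?<method>.+?)\s+\{)?|\G)[,\s]*(?<key>\$\w+)(?: = (?<value>[^,\n;}]*))?/';

    // PREG_SET_ORDER gives one array per key/value pair, in order.
    preg_match_all($pattern, $expr, $matches, PREG_SET_ORDER);

    $method = null;
    $args = [];
    foreach ($matches as $m) {
        // The method group only participates in the first match, if at all.
        if (($m['method'] ?? '') !== '') {
            $method = trim($m['method']);
        }
        // A key without " = ..." has no value group at all, so it becomes null.
        $args[$m['key']] = isset($m['value']) ? trim($m['value']) : null;
    }

    return ['method' => $method, 'args' => $args];
}

$parsed = parseExpression("{ method_name { \$key1 = 'quoted value' , \$key2 = __('literal value'); }}");
var_export($parsed);
```

Note that the value group stops at the first comma, semicolon, closing brace or newline, so a quoted value containing one of those characters would be cut short. The sketch also assumes each expression is passed in separately rather than matched against the whole template in one go.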
Your performance concerns may be misplaced. You seem to assume that the regex matching is what takes a long time, presumably because the program as a whole runs longer than you would like.
Do not optimize your regular expressions unless and until you have found that they are actually the cause of your speed problems. To find out whether that is the case, use a code profiler such as Xdebug to analyze your program and produce a report that shows which parts are slow.
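As a rough first check before setting up a profiler, you could time the parsing step by hand. This is only a sketch, assuming PHP 7.4 or later (for hrtime and arrow functions); timeCall, parseTemplate and the sample template are hypothetical stand-ins for your own code, and hand timing is far coarser than a real profiler report.

```php
<?php
// Runs $fn once and returns [its result, elapsed time in milliseconds].
function timeCall(callable $fn): array
{
    $start = hrtime(true);                       // monotonic clock, in nanoseconds
    $result = $fn();
    $elapsedMs = (hrtime(true) - $start) / 1e6;  // nanoseconds to milliseconds
    return [$result, $elapsedMs];
}

// Hypothetical stand-in for the real preg_split/trim parsing step:
// it simply counts the $key tokens in the template.
function parseTemplate(string $template): int
{
    return (int) preg_match_all('/\$\w+/', $template);
}

// 1000 copies of a sample expression, one per line.
$template = str_repeat('{ method_name { $key1 = 1 , $key2 = 3.14; }}' . "\n", 1000);

[$keys, $parseMs] = timeCall(fn () => parseTemplate($template));
printf("parsed %d keys in %.3f ms\n", $keys, $parseMs);
```

Comparing that figure with the total request time gives a first idea of whether the parsing is worth looking at.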
You might find that your program takes 20 seconds to run, and of that time, 2 seconds are spent in the regexes. Even if you cut the execution time of the regex matching in half, you would only save 1 second, or 5% of the run time.
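The arithmetic behind that example can be written out as a small sketch. The function name secondsSaved is made up for illustration, and the figures are the ones from the paragraph above.

```php
<?php
// Seconds saved overall when one part of the run becomes $speedup times faster.
function secondsSaved(float $partSeconds, float $speedup): float
{
    return $partSeconds - $partSeconds / $speedup;
}

$total = 20.0;  // whole run, in seconds
$regex = 2.0;   // time spent in the regexes, in seconds

$saved = secondsSaved($regex, 2.0);  // regexes made twice as fast
printf("saved %.1f s, i.e. %.0f%% of the run\n", $saved, 100 * $saved / $total);
// prints "saved 1.0 s, i.e. 5% of the run"
```

Even if the regexes took no time at all, the saving in this example could never exceed 2 seconds, or 10% of the run.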
It is premature optimization to try to speed up code without knowing which part of it takes the most time.