Regex pattern to return an array like this

I want to extract an array from a string, the way WordPress shortcodes work, but I want the array to look like the example below.

I have this string:

$str = 'codes example : [code lang="php"]<?php  echo "Hello Wold" ; ?>[/code]  [code lang="html"]<b>Hello</b>[/code]' ;

and I want the returned array to contain:

array(
   array(
     'code' => '[code lang="php"]<?php  echo "Hello Wold" ; ?>[/code]' ,
     'function' => 'code' ,
     'attr' => array( 'lang' => 'php' ) ,
     'value' => '<?php  echo "Hello Wold" ; ?>'
   ) ,
   array(
     'code' => '[code lang="html"]<b>Hello</b>[/code]' ,
     'function' => 'code' ,
     'attr' => array( 'lang' => 'html' ) ,
     'value' => '<b>Hello</b>'
   )
)

I tried to do it using preg_match_all.

I used this pattern: /[[a-z]{3,}+ *[a-z]{2,}=(.*)+ *](.*)[\/[a-z]{3,}]/U

and the result was:

Array ( [0] => Array ( [0] => [link href="http://www.php.net" text="php"][/link] [1] => [code lang="php"][/code] [2] => [code lang="html"]Hello[/code] ) [1] => Array ( [0] => " [1] => " [2] => " ) [2] => Array ( [0] => [1] => [2] => Hello ) )

You can try something like this:

preg_match_all(
    '#(?P<block>\[(?P<tag>[a-z]{3,})\s*(?P<attr>[a-z_-]+="[^\]]+")*\](?P<content>((?!\[/(?P=tag)).)*)\[/(?P=tag)\])#',
    'codes example : [code lang="php" test="true"]<?php  echo "Hello Wold" ; ?>[/code] [code lang="js"]console.log(\'yeah!\')[/code] [noattr]no attr content[/noattr]',
    $matches,
    PREG_SET_ORDER
);
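// Keep only the named groups and drop their numeric duplicates.
$keep = array_flip(array('block', 'tag', 'attr', 'content'));
foreach ($matches as &$match) {
    $match = array_intersect_key($match, $keep);
}
unset($match); // break the reference left behind by the foreach
print_r($matches);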

The result should be:

Array
(
    [0] => Array
        (
            [block] => [code lang="php" test="true"]<?php  echo "Hello Wold" ; ?>[/code]
            [tag] => code
            [attr] => lang="php" test="true"
            [content] => <?php  echo "Hello Wold" ; ?>
        )

    [1] => Array
        (
            [block] => [code lang="js"]console.log('yeah!')[/code]
            [tag] => code
            [attr] => lang="js"
            [content] => console.log('yeah!')
        )

    [2] => Array
        (
            [block] => [noattr]no attr content[/noattr]
            [tag] => noattr
            [attr] =>
            [content] => no attr content
        )

)
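
The question asks for 'attr' as an array, while the pattern above returns it as one string. Here is a minimal sketch of one way to split it, assuming every attribute has the form name="value" with double quotes. The name parse_shortcode_attrs is a hypothetical helper, not something from the answer.

```php
<?php
// Hypothetical helper: turns 'lang="php" test="true"' into
// array('lang' => 'php', 'test' => 'true').
// Assumption: values are double-quoted and contain no escaped quotes.
function parse_shortcode_attrs($attrString)
{
    $attrs = array();
    // Each pair is name="value"; the value ends at the next double quote.
    if (preg_match_all('/([a-z_-]+)="([^"]*)"/i', $attrString, $pairs, PREG_SET_ORDER)) {
        foreach ($pairs as $pair) {
            $attrs[$pair[1]] = $pair[2];
        }
    }
    return $attrs;
}

print_r(parse_shortcode_attrs('lang="php" test="true"'));
```

Calling it on each match's 'attr' string inside the foreach would give the nested array shape from the question.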

You should write a parser. This may seem incredibly complex but actually it's very simple. You only need to keep track of a couple of things.

Outline:

  • Read the string character-by-character
  • If you see a [, record that you saw it; you are now looking for a ].
  • If you see a " before the ], find the closing " first, so that a ] inside a quoted value is ignored.
  • When you reach the ], you know the 'function' and the 'attr'.
  • When you find the matching [/function], you know the 'value'.

With these simple checks you can build a list of tokens, like your example output.
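
The outline above can be sketched roughly as follows. This is only an illustration under stated assumptions: parse_shortcodes is a hypothetical name, attribute values are double-quoted, and nested tags with the same name are not handled.

```php
<?php
// Sketch of a small shortcode tokenizer (hypothetical function name).
function parse_shortcodes($str)
{
    $tokens = array();
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        if ($str[$i] !== '[') {          // ordinary text: skip it
            $i++;
            continue;
        }
        // Saw "[": look for the matching "]", ignoring any "]" inside quotes.
        $inQuotes = false;
        $end = -1;
        for ($j = $i + 1; $j < $len; $j++) {
            if ($str[$j] === '"') {
                $inQuotes = !$inQuotes;
            } elseif ($str[$j] === ']' && !$inQuotes) {
                $end = $j;
                break;
            }
        }
        if ($end === -1) {
            break;                       // unterminated tag: stop scanning
        }
        $head = substr($str, $i + 1, $end - $i - 1);   // e.g. code lang="php"
        $nameLen = strcspn($head, " \t");
        $function = substr($head, 0, $nameLen);
        $closeTag = '[/' . $function . ']';
        $closePos = ($function === '' || $function[0] === '/')
            ? false
            : strpos($str, $closeTag, $end + 1);
        if ($closePos === false) {       // not an opening tag we can close
            $i = $end + 1;
            continue;
        }
        // Split the rest of the header into name="value" pairs.
        $attr = array();
        $rest = substr($head, $nameLen);
        while (($eq = strpos($rest, '="')) !== false) {
            $close = strpos($rest, '"', $eq + 2);
            if ($close === false) {
                break;
            }
            $attr[trim(substr($rest, 0, $eq))] = substr($rest, $eq + 2, $close - $eq - 2);
            $rest = substr($rest, $close + 1);
        }
        $stop = $closePos + strlen($closeTag);
        $tokens[] = array(
            'code'     => substr($str, $i, $stop - $i),
            'function' => $function,
            'attr'     => $attr,
            'value'    => substr($str, $end + 1, $closePos - $end - 1),
        );
        $i = $stop;                      // continue after the closing tag
    }
    return $tokens;
}
```

Run on the string from the question, this should produce two tokens whose keys match the requested layout.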

You'll want to use named groups: http://www.regular-expressions.info/named.html

Excerpt:

(?P<name>group) captures the match of group into the backreference "name"

EDIT: so you need to insert the named group idea into your regex.
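
As a small illustration of the named-group idea, here is a sketch against a single tag. The pattern and the group names function, attr and value are assumptions chosen to mirror the keys in the question; this is not the asker's original regex.

```php
<?php
$str = '[code lang="php"]echo 1;[/code]';

// (?P<function>...), (?P<attr>...) and (?P<value>...) are named groups.
// (?P=function) is a backreference, so the closing tag must repeat the
// name captured by the opening tag.
$pattern = '#\[(?P<function>[a-z]{3,})(?P<attr>[^\]]*)\](?P<value>.*?)\[/(?P=function)\]#s';

if (preg_match($pattern, $str, $m)) {
    echo $m['function'], "\n";    // code
    echo trim($m['attr']), "\n";  // lang="php"
    echo $m['value'], "\n";       // echo 1;
}
```

The matches are then available by name rather than by position, which makes it easier to map them onto the array keys from the question.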