RegEx: replacing matching parentheses in a nested structure [closed]

How can I replace a pair of matching opening/closing parentheses when the opening parenthesis follows the keyword array? Can regular expressions help with this type of problem?

To be more specific, I'd like to solve this using either JavaScript or PHP.

// input
$data = array(
    'id' => nextId(),
    'profile' => array(
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    )
);

// desired output
$data = [
    'id' => nextId(),
    'profile' => [
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ]
];

Tim Pietzcker gave the .NET counting version.
It has the same elements as the PCRE (PHP) version below.

All the caveats are the same. In particular, non-array parentheses must
be balanced, because they share the same closing parenthesis with the array( delimiter.
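
To illustrate why the plain parentheses matter, here is a minimal JavaScript sketch that does the conversion in one pass with a stack instead of a recursive regex. This swaps in a different technique from the PCRE pattern below. The function name convertArrays is my own, and the sketch assumes no parentheses occur inside strings or comments. Every ")" has to be paired with whichever opener came last, so a single stray "(" would shift all the pairings after it.

```javascript
// Hypothetical helper (not from the answers): rewrite `array( ... )` as `[ ... ]`.
// Assumes parentheses never appear inside string literals or comments.
function convertArrays(src) {
  const token = /\barray\s*\(|[()]/g; // `array(` is tried before a bare paren
  const stack = [];                   // true = opened by `array(`, false = plain `(`
  let out = "";
  let last = 0;
  let m;
  while ((m = token.exec(src)) !== null) {
    out += src.slice(last, m.index);  // copy the text between tokens unchanged
    last = token.lastIndex;
    if (m[0] === ")") {
      if (stack.length === 0) throw new Error(`Unbalanced ')' at ${m.index}`);
      out += stack.pop() ? "]" : ")"; // close whatever was opened last
    } else if (m[0] === "(") {
      stack.push(false);
      out += "(";
    } else {
      stack.push(true);
      out += "[";
    }
  }
  if (stack.length > 0) throw new Error("Unclosed '(' in input");
  return out + src.slice(last);
}

console.log(convertArrays("$a = array(1, f(2), array(3));"));
// prints: $a = [1, f(2), [3]];
```

The word boundary keeps names such as my_array( from being treated as the keyword.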

All text must be parsed (or should be).
The outer groups 1, 2, 3, 4 let you take off the parts:
 1: CONTENT
 2: CORE-1 array()
 3: CORE-2 any ()
 4: EXCEPTIONS

Each match gives you exactly one of these outer groups; they are mutually exclusive.

The trick is to define a PHP function parse( core ) that parses the CORE.
Inside that function is the while ( regex.search( core ) ) { .. } loop.

Each time either the CORE-1 or the CORE-2 group matches, call parse( core ) again, passing
it the contents of that core's group.

And inside the loop, just take off content and assign it to the hash.
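
The loop-and-recurse idea can be sketched in JavaScript roughly as follows. JavaScript regexes have no (?&core) recursion, so this sketch swaps in a hypothetical findClose helper that locates the matching ")" by counting. The node shape ({ content, children }) and the names parse and findClose are my own assumptions, and an unmatched ")" simply stays in the content instead of being reported as an exception.

```javascript
// Hypothetical helper: index of the ')' that closes the '(' at openIdx, or -1.
function findClose(src, openIdx) {
  let depth = 0;
  for (let i = openIdx; i < src.length; i++) {
    if (src[i] === "(") depth++;
    else if (src[i] === ")" && --depth === 0) return i;
  }
  return -1;
}

// Sketch of parse( core ): take off CONTENT, recurse into each CORE.
function parse(core) {
  const node = { content: "", children: [] };
  const opener = /\barray\s*\(|\(/g;
  let pos = 0;
  let m;
  while ((m = opener.exec(core)) !== null) {
    const open = opener.lastIndex - 1; // both alternatives end with '('
    const close = findClose(core, open);
    if (close < 0) throw new Error(`Unbalanced '(' at ${open}`);
    node.content += core.slice(pos, m.index);      // CONTENT before the core
    const kind = m[0] === "(" ? "paren" : "array"; // CORE-2 or CORE-1
    node.children.push({ kind, ...parse(core.slice(open + 1, close)) });
    pos = opener.lastIndex = close + 1;            // resume after the ')'
  }
  node.content += core.slice(pos);                 // trailing CONTENT
  return node;
}

console.log(JSON.stringify(parse("x = array(1, f(2), array(3)) ;"), null, 2));
```

Each level keeps only its own content, and nested cores end up as children, much like the nested hash in the Perl example further down.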

Obviously, the group 1 construct, which calls (?&content), should be replaced
with constructs that capture your hash-like variable data.

At a detailed level, this can be very tedious.
Usually you have to account for every single character to parse
the entire thing correctly.

(?is)(?:((?&content))|(?>\barray\s*\()((?=.)(?&core)|)\)|\(((?=.)(?&core)|)\)|(\barray\s*\(|[()]))(?(DEFINE)(?<core>(?>(?&content)|(?>\barray\s*\()(?:(?=.)(?&core)|)\)|\((?:(?=.)(?&core)|)\))+)(?<content>(?>(?!\barray\s*\(|[()]).)+))

Expanded

 # 1:  CONTENT
 # 2:  CORE-1
 # 3:  CORE-2
 # 4:  EXCEPTIONS

 (?is)

 (?:
      (                                  # (1), Take off   CONTENT
           (?&content) 
      )
   |                                   # OR -----------------------------
      (?>                                # Start 'array('
           \b array \s* \(
      )
      (                                  # (2), Take off   'array( CORE-1 )'
           (?= . )
           (?&core) 
        |  
      )
      \)                                 # End ')'
   |                                   # OR -----------------------------
      \(                                 # Start '('
      (                                  # (3), Take off   '( any CORE-2 )'
           (?= . )
           (?&core) 
        |  
      )
      \)                                 # End ')'
   |                                   # OR -----------------------------
      (                                  # (4), Take off   Unbalanced or Exceptions
           \b array \s* \(
        |  [()] 
      )
 )

 # Subroutines
 # ---------------

 (?(DEFINE)

      # core
      (?<core>
           (?>
                (?&content) 
             |  
                (?> \b array \s* \( )
                # recurse core of  array()
                (?:
                     (?= . )
                     (?&core) 
                  |  
                )
                \)
             |  
                \(
                # recurse core of any  ()
                (?:
                     (?= . )
                     (?&core) 
                  |  
                )
                \)
           )+
      )

      # content 
      (?<content>
           (?>
                (?!
                     \b array \s* \(
                  |  [()] 
                )
                . 
           )+
      )
 )

Output

 **  Grp 0           -  ( pos 0 , len 11 ) 
some_var =   
 **  Grp 1           -  ( pos 0 , len 11 ) 
some_var =   
 **  Grp 2           -  NULL 
 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 

-----------------------

 **  Grp 0           -  ( pos 11 , len 153 ) 
array(
    'id' => nextId(),
    'profile' => array(
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ) 
)  
 **  Grp 1           -  NULL 
 **  Grp 2           -  ( pos 17 , len 146 ) 

    'id' => nextId(),
    'profile' => array(
       'name' => 'Hugo Hurley',
       'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
    ) 

 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 

-------------------------------------

 **  Grp 0           -  ( pos 164 , len 3 ) 
;

 **  Grp 1           -  ( pos 164 , len 3 ) 
;

 **  Grp 2           -  NULL 
 **  Grp 3           -  NULL 
 **  Grp 4 [core]    -  NULL 
 **  Grp 5 [content] -  NULL 

An earlier incarnation of this approach, written for a different problem, gives an idea of usage:

 # Perl code:
 # 
 #     use strict;
 #     use warnings;
 #     
 #     use Data::Dumper;
 #     
 #     $/ = undef;
 #     my $content = <DATA>;
 #     
 #     # Set the error mode on/off here ..
 #     my $BailOnError = 1;
 #     my $IsError = 0;
 #     
 #     my $href = {};
 #     
 #     ParseCore( $href, $content );
 #     
 #     #print Dumper($href);
 #     
 #     print "\n\n";
 #     print "\nBase======================\n";
 #     print $href->{content};
 #     print "\nFirst======================\n";
 #     print $href->{first}->{content};
 #     print "\nSecond======================\n";
 #     print $href->{first}->{second}->{content};
 #     print "\nThird======================\n";
 #     print $href->{first}->{second}->{third}->{content};
 #     print "\nFourth======================\n";
 #     print $href->{first}->{second}->{third}->{fourth}->{content};
 #     print "\nFifth======================\n";
 #     print $href->{first}->{second}->{third}->{fourth}->{fifth}->{content};
 #     print "\nSix======================\n";
 #     print $href->{six}->{content};
 #     print "\nSeven======================\n";
 #     print $href->{six}->{seven}->{content};
 #     print "\nEight======================\n";
 #     print $href->{six}->{seven}->{eight}->{content};
 #     
 #     exit;
 #     
 #     
 #     sub ParseCore
 #     {
 #         my ($aref, $core) = @_;
 #         my ($k, $v);
 #         while ( $core =~ /(?is)(?:((?&content))|(?><!--block:(.*?)-->)((?&core)|)<!--endblock-->|(<!--(?:block:.*?|endblock)-->))(?(DEFINE)(?<core>(?>(?&content)|(?><!--block:.*?-->)(?:(?&core)|)<!--endblock-->)+)(?<content>(?>(?!<!--(?:block:.*?|endblock)-->).)+))/g )
 #         {
 #            if (defined $1)
 #            {
 #              # CONTENT
 #                $aref->{content} .= $1;
 #            }
 #            elsif (defined $2)
 #            {
 #              # CORE
 #                $k = $2; $v = $3;
 #                $aref->{$k} = {};
 #      #         $aref->{$k}->{content} = $v;
 #      #         $aref->{$k}->{match} = $&;
 #                
 #                my $curraref = $aref->{$k};
 #                my $ret = ParseCore($aref->{$k}, $v);
 #                if ( $BailOnError && $IsError ) {
 #                    last;
 #                }
 #                if (defined $ret) {
 #                    $curraref->{'#next'} = $ret;
 #                }
 #            }
 #            else
 #            {
 #              # ERRORS
 #                print "Unbalanced '$4' at position = ", $-[0];
 #                $IsError = 1;
 #     
 #                # Decide to continue here ..
 #                # If BailOnError is set, just unwind recursion. 
 #                # -------------------------------------------------
 #                if ( $BailOnError ) {
 #                   last;
 #                }
 #            }
 #         }
 #         return $k;
 #     }
 #     
 #     #================================================
 #     __DATA__
 #     some html content here top base
 #     <!--block:first-->
 #         <table border="1" style="color:red;">
 #         <tr class="lines">
 #             <td align="left" valign="<--valign-->">
 #         <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
 #         <!--hello--> <--again--><!--world-->
 #         some html content here 1 top
 #         <!--block:second-->
 #             some html content here 2 top
 #             <!--block:third-->
 #                 some html content here 3 top
 #                 <!--block:fourth-->
 #                     some html content here 4 top
 #                     <!--block:fifth-->
 #                         some html content here 5a
 #                         some html content here 5b
 #                     <!--endblock-->
 #                 <!--endblock-->
 #                 some html content here 3a
 #                 some html content here 3b
 #             <!--endblock-->
 #             some html content here 2 bottom
 #         <!--endblock-->
 #         some html content here 1 bottom
 #     <!--endblock-->
 #     some html content here1-5 bottom base
 #     
 #     some html content here 6-8 top base
 #     <!--block:six-->
 #         some html content here 6 top
 #         <!--block:seven-->
 #             some html content here 7 top
 #             <!--block:eight-->
 #                 some html content here 8a
 #                 some html content here 8b
 #             <!--endblock-->
 #             some html content here 7 bottom
 #         <!--endblock-->
 #         some html content here 6 bottom
 #     <!--endblock-->
 #     some html content here 6-8 bottom base
 # 
 # Output >>
 # 
 #     Base======================
 #     some html content here top base
 #     
 #     some html content here1-5 bottom base
 #     
 #     some html content here 6-8 top base
 #     
 #     some html content here 6-8 bottom base
 #     
 #     First======================
 #     
 #         <table border="1" style="color:red;">
 #         <tr class="lines">
 #             <td align="left" valign="<--valign-->">
 #         <b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
 #         <!--hello--> <--again--><!--world-->
 #         some html content here 1 top
 #         
 #         some html content here 1 bottom
 #     
 #     Second======================
 #     
 #             some html content here 2 top
 #             
 #             some html content here 2 bottom
 #         
 #     Third======================
 #     
 #                 some html content here 3 top
 #                 
 #                 some html content here 3a
 #                 some html content here 3b
 #             
 #     Fourth======================
 #     
 #                     some html content here 4 top
 #                     
 #                 
 #     Fifth======================
 #     
 #                         some html content here 5a
 #                         some html content here 5b
 #                     
 #     Six======================
 #     
 #         some html content here 6 top
 #         
 #         some html content here 6 bottom
 #     
 #     Seven======================
 #     
 #             some html content here 7 top
 #             
 #             some html content here 7 bottom
 #         
 #     Eight======================
 #     
 #                 some html content here 8a
 #                 some html content here 8b
 #         

How about the following (using the .NET regex engine):

resultString = Regex.Replace(subjectString, 
    @"\barray\(            # Match 'array('
    (                      # Capture in group 1:
     (?>                   # Start an atomic group:
      (?:                  # Either match
       (?!\barray\(|[()])  # only if we're not before another array or parens
       .                   # any character
      )+                   # once or more
     |                     # or
      \( (?<Depth>)        # match '(' (and increase the nesting counter)
     |                     # or
      \) (?<-Depth>)       # match ')' (and decrease the nesting counter).
     )*                    # Repeat as needed.
     (?(Depth)(?!))        # Assert that the nesting counter is at zero.
    )                      # End of capturing group.
    \)                     # Then match ')'.", 
    "[$1]", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

This regex matches array(...) where ... may contain anything except another array(...), so it only matches the most deeply nested occurrences. It does allow other nested (and correctly balanced) parentheses within the ..., but it does not check whether those are semantic parentheses or whether they sit inside strings or comments.

In other words, something like

array(
   'name' => 'Hugo ((( Hurley',
   'numbers' => (4 + 8 + 15 + 16 + 23 + 42) / 108
)

would not be matched, because the three unbalanced parentheses inside the string literal are counted as nesting.

You need to apply that regex iteratively until it no longer modifies its input; for your example, two iterations would suffice.
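
Since the question mentions JavaScript, here is a rough sketch of the same innermost-first, repeat-until-stable idea there. JavaScript has no balancing groups, so this is only an approximation under a stated assumption: plain parentheses inside an innermost array(...) are nested at most two levels deep. The names PLAIN, NEST1, NEST2, INNERMOST and convertIteratively are mine, and parentheses inside strings or comments are still not handled.

```javascript
// One character that is neither a paren nor the start of another `array(`.
const PLAIN = String.raw`(?!\barray\s*\()[^()]`;
// Plain parentheses nested one and two levels deep (assumed maximum depth).
const NEST1 = String.raw`\((?:${PLAIN})*\)`;
const NEST2 = String.raw`\((?:${PLAIN}|${NEST1})*\)`;
// An `array( ... )` that contains no further `array(`.
const INNERMOST = new RegExp(
  String.raw`\barray\s*\(((?:${PLAIN}|${NEST2})*)\)`,
  "g"
);

// Re-apply the replacement until a pass leaves the string unchanged.
function convertIteratively(src) {
  let prev;
  do {
    prev = src;
    src = src.replace(INNERMOST, "[$1]");
  } while (src !== prev);
  return src;
}

console.log(convertIteratively("$a = array(1, f(2), array(3));"));
// prints: $a = [1, f(2), [3]];
```

Each pass converts the arrays that have no array( left inside them, so the loop works from the inside out; the final pass only confirms that nothing changed.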