How to manipulate a complex string in PHP?

I am trying to group a bunch of text from a string and build an array from it.

The string is something like this:

<em>string</em>  and the <em>test</em> here.  
tableBegin rowNumber:2, columnNumber:2  11 22 33 44 tableEnd  
<em>end</em> text here

I was hoping to get an array like the following:

array (0 => '<em>string</em>  and the <em>test</em> here.',
         1=>'rowNumber:2',
         2=>'columnNumber:2',
         3=>'11',
         4=>'22',
         5=>'33',
         6=>'44',
         7=>'<em>end</em> text here')

11, 22, 33 and 44 are the table cell data the user enters. I want each of them to get its own index, while the rest of the text stays together.

tableBegin and tableEnd are just markers that delimit the table cell data.

Any help or tips? Thanks a lot!

You may try the following. Note that it needs PHP 5.3+ because it uses an anonymous function:

$string = '<em>string</em>  and the <em>test</em> here.  
tableBegin rowNumber:2, columnNumber:2  11 22 33 44 tableEnd
SOme other text
tableBegin rowNumber:3, columnNumber:3  11 22 33 44 55 tableEnd
<em>end</em> text here';

$array = array();
preg_replace_callback('#tableBegin\s*(.*?)\s*tableEnd\s*|.*?(?=tableBegin|$)#s', function($m)use(&$array){
    if(isset($m[1])){ // Group 1 is set only when the table alternative matched
        // Split the table content on runs of whitespace and/or commas and append the pieces
        $array = array_merge($array, preg_split('#[\s,]+#s', $m[1]));
    }else{ // Otherwise keep the matched text as a single element
        if($m[0] !== ''){ // Skip the empty match at the end of the string (a plain "0" is still kept)
            $array[] = $m[0];
        }
    }
}, $string);
print_r($array);

Output

Array
(
    [0] => <em>string</em>  and the <em>test</em> here.  

    [1] => rowNumber:2
    [2] => columnNumber:2
    [3] => 11
    [4] => 22
    [5] => 33
    [6] => 44
    [7] => SOme other text

    [8] => rowNumber:3
    [9] => columnNumber:3
    [10] => 11
    [11] => 22
    [12] => 33
    [13] => 44
    [14] => 55
    [15] => <em>end</em> text here
)

Regex explanation

  • tableBegin : match the literal text tableBegin
  • \s* : match zero or more whitespace characters
  • (.*?) : match everything lazily (as little as possible) and capture it in group 1
  • \s* : match zero or more whitespace characters
  • tableEnd : match the literal text tableEnd
  • \s* : match zero or more whitespace characters
  • | : or
  • .*?(?=tableBegin|$) : match everything up to the next tableBegin or the end of the string
  • The s modifier : make the dot also match newlines
Here is the ugly way to do it, if you can't find a regex guru out there.

So, this is your text

$string =   "<em>string</em>  and the <em>test</em> here.  
tableBegin rowNumber:2, columnNumber:2  11 22 33 44 tableEnd  
<em>end</em> text here";

And this is my code

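// Split on single spaces; consecutive spaces produce empty elements,
// so the indexes below are hard-coded for this exact string.
$E = explode(' ', $string);
$A =  $E[0].$E[1].$E[2].$E[3].$E[4].$E[5]; // text before the table (joined without spaces)
$B =  $E[17].$E[18].$E[19];                // text after the table (joined without spaces)
$All = [$A, $E[8],$E[9], $E[11], $E[12], $E[13], $E[14], $B]; // short array syntax needs PHP 5.4+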

print_r($All);

And this is the output

Array
(
    [0] => stringandthetesthere.
    [1] => rowNumber:2,
    [2] => columnNumber:2
    [3] => 11
    [4] => 22
    [5] => 33
    [6] => 44
    [7] => endtexthere
)

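Of course, the <em> tags won't be visible in the browser unless you view the page source. Note that the indexes are hard-coded for this exact string, so the text parts lose their spaces and rowNumber:2, keeps its trailing comma.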
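If you want to stay regex-free but not depend on fixed indexes, something along these lines might work. It is only a sketch under the assumption that the string contains a single tableBegin ... tableEnd block; the function name splitAroundTable is hypothetical, and it swaps explode for strpos, substr and strtok.

```php
<?php
// Hypothetical helper; assumes at most one tableBegin ... tableEnd block.
function splitAroundTable($string)
{
    $open  = 'tableBegin';
    $close = 'tableEnd';

    $begin = strpos($string, $open);
    $end   = ($begin === false) ? false : strpos($string, $close, $begin);
    if ($end === false) {
        return array(trim($string)); // no complete table block found
    }

    $start  = $begin + strlen($open);
    $before = trim(substr($string, 0, $begin));
    $table  = substr($string, $start, $end - $start);
    $after  = trim(substr($string, $end + strlen($close)));

    // strtok skips empty tokens, so repeated spaces and the comma are harmless.
    $cells = array();
    $delimiters = " ,\t\r\n";
    for ($tok = strtok($table, $delimiters); $tok !== false; $tok = strtok($delimiters)) {
        $cells[] = $tok;
    }

    return array_merge(array($before), $cells, array($after));
}

$string = "<em>string</em>  and the <em>test</em> here.  
tableBegin rowNumber:2, columnNumber:2  11 22 33 44 tableEnd  
<em>end</em> text here";

print_r(splitAroundTable($string));
```

Unlike the explode version, this keeps the spaces inside the surrounding text and drops the comma after rowNumber:2, but it would need a loop around the strpos calls to handle more than one table.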