I have seen many questions (before you go flagging this as a duplicate) on how to do this, but for some reason my code isn't producing the output I expect:
// $delimiters wanted: ', ' | '; ' | ',' | ';' | ' , ' | ', and ' | ' and ' | ',and '
$str = 'Name 1, Name 2; Name 3;Name4 , Name 5,Name 6, and Name 7,and Name 8 and Name 9';
$delimiter = array(
', ',
'; ',
';',
',',
' , ',
', and ',
' and ',
',and '
);
$str_new = explode( $delimiter[0], str_replace($delimiter, $delimiter[0], $str) );
However, when I output the array, I get this:
<?php foreach($str_new as $new) { echo 'a' . $new; } ?>
Array (
[0] => Name 1
[1] => Name 2
[2] => Name 3
[3] => // WHY IS THIS EMPTY?
[4] => Name 4
...
)
So is there a better way to match the delimiters I have listed?
I'd use a regexp like this in your case:
preg_split('/,? ?and | ?[,;] ?/', $str)
You may also want to replace the spaces with \s if other whitespace characters may appear (like TAB, for example), or even use \s* instead of " ?" to cover the case of multiple spaces.
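To check the pattern against the string from the question, here is a minimal sketch. The second pattern is only one possible way to apply the \s* suggestion; the \b word boundary and the variable names $names and $names_ws are my own additions, not part of the answer above.

```php
<?php
$str = 'Name 1, Name 2; Name 3;Name4 , Name 5,Name 6, and Name 7,and Name 8 and Name 9';

// Pattern from the answer: an optional comma and space before "and ",
// or a comma/semicolon with an optional space on either side.
$names = preg_split('/,? ?and | ?[,;] ?/', $str);

// Assumed variant: \s* tolerates tabs and repeated spaces, and \b keeps the
// "and" inside a word such as "Ferdinand" from being treated as a delimiter.
$names_ws = preg_split('/,?\s*\band\s+|\s*[,;]\s*/', $str);

print_r($names);
// Both calls give: Name 1, Name 2, Name 3, Name4, Name 5, Name 6, Name 7, Name 8, Name 9
```

Note that the original pattern has no word boundary, so a name ending in "and" followed by a space would be split as well; whether that matters depends on your data.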
Have you tried something like this from php.net?
<?php
//$delimiters has to be an array
//$string has to be a string
function multiexplode ($delimiters,$string) {
$ready = str_replace($delimiters, $delimiters[0], $string);
$launch = explode($delimiters[0], $ready);
return $launch;
}
$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
$exploded = multiexplode(array(",",".","|",":"),$text);
print_r($exploded);
?>
Or something like Split String by Multiple Delimiters in PHP
In your code, between "Name 6, and Name 7", first the "," gets replaced, then the " and ".
Therefore you end up with this string:
Name 1, Name 2, Name 3, Name4, Name 5, Name 6, , Name 7, Name 8, Name 9
Hence, the empty value...
Clean your result array before outputting and you should be fine:
$str_out = array_filter($str_new);
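A slightly more defensive cleanup could look like the sketch below. It assumes you also want to strip stray spaces left over from the replacements; the trim step, the callback and array_values are my additions. Without a callback, array_filter drops every falsy value (including the string "0") and keeps the original keys, which is why the sketch filters on the empty string only and reindexes afterwards.

```php
<?php
// Same input and delimiter list as in the question.
$str = 'Name 1, Name 2; Name 3;Name4 , Name 5,Name 6, and Name 7,and Name 8 and Name 9';
$delimiter = array(', ', '; ', ';', ',', ' , ', ', and ', ' and ', ',and ');

$str_new = explode($delimiter[0], str_replace($delimiter, $delimiter[0], $str));

// Trim each entry, drop only the truly empty strings, then reindex from 0.
$str_out = array_values(array_filter(
    array_map('trim', $str_new),
    function ($name) {
        return $name !== '';
    }
));

print_r($str_out);
// Name 1, Name 2, Name 3, Name4, Name 5, Name 6, Name 7, Name 8, Name 9
```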
The problem with your approach is that you are trying to solve this the wrong way. Even if you manage to create a list of delimiters, what happens if you later need to separate the words by another character, let's say a '$' sign?
You should implement a tokenizer/lexer which reads the input char by char and distinguishes between whitespace, terminal and non-terminal symbols/characters. The lexer would then generate a sequence of tokens, e.g.
STRING-SYMBOL:'NAME1'
COMMA-SYMBOL
AND-SYMBOL
STRING-SYMBOL:'NAME2'
SEMICOLON-SYMBOL
STRING-SYMBOL:'NAME3'
AND-SYMBOL
...
EOF-SYMBOL
You then simply filter out any non-STRING-SYMBOL symbols (or you combine strings using the AND-SYMBOL). This is (imho) the only rock-solid solution. It is also very easy to extend and to generalize: once you have written a nice tokenizer/lexer, you can use this approach for almost any string-analysis problem.
Writing a tokenizer is generally very simple: It scans the input char by char and first categorizes the char. It implements a simple state machine to collect characters which will form a symbol.
You may try to implement this using a regex, which should be possible as well. Either way, the tokenizer will generate a list of tokens (or will retrieve the next one upon request). The last token it retrieves is the EOF-SYMBOL, indicating that the input sequence has been fully traversed.
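As a rough illustration of the idea, here is a minimal sketch of such a tokenizer. The function name tokenize, the token layout (a two-element array of type and value) and the rule that the lowercase word "and" is the only keyword are all assumptions made for this example; the token names follow the list above, with the comma token spelled COMMA-SYMBOL.

```php
<?php
// Hypothetical helper: scans the input char by char and returns a token list.
function tokenize(string $input): array
{
    $tokens = array();
    $word = '';        // characters of the word currently being scanned
    $words = array();  // words collected for the pending STRING-SYMBOL
    $length = strlen($input);

    // Go one position past the end so the last word is flushed as well.
    for ($i = 0; $i <= $length; $i++) {
        $char = $i < $length ? $input[$i] : null;

        // Ordinary character: keep collecting the current word.
        if ($char !== null && $char !== ',' && $char !== ';' && !ctype_space($char)) {
            $word .= $char;
            continue;
        }

        // A word just ended: "and" is a separator, anything else belongs to a name.
        $isAnd = ($word === 'and');
        if ($word !== '' && !$isAnd) {
            $words[] = $word;
        }
        $word = '';

        // Whitespace after an ordinary word does not end the name.
        if (!$isAnd && $char !== null && ctype_space($char)) {
            continue;
        }

        // Separator or end of input: emit the pending name first.
        if ($words) {
            $tokens[] = array('STRING-SYMBOL', implode(' ', $words));
            $words = array();
        }
        if ($isAnd) {
            $tokens[] = array('AND-SYMBOL', null);
        }
        if ($char === ',') {
            $tokens[] = array('COMMA-SYMBOL', null);
        } elseif ($char === ';') {
            $tokens[] = array('SEMICOLON-SYMBOL', null);
        } elseif ($char === null) {
            $tokens[] = array('EOF-SYMBOL', null);
        }
    }

    return $tokens;
}

$str = 'Name 1, Name 2; Name 3;Name4 , Name 5,Name 6, and Name 7,and Name 8 and Name 9';

// Keep only the STRING-SYMBOL values, as described above.
$names = array();
foreach (tokenize($str) as $token) {
    if ($token[0] === 'STRING-SYMBOL') {
        $names[] = $token[1];
    }
}

print_r($names);
// Name 1, Name 2, Name 3, Name4, Name 5, Name 6, Name 7, Name 8, Name 9
```

Supporting another separator, such as the '$' sign mentioned earlier, would then only need one more branch in the scanner rather than a longer delimiter list.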