To split a string into words and punctuation, I came up with:
<?php
preg_match_all('/(\w)|(,.!?;)/', "I'm a little teapot, short and stout.", $matches);
print_r($matches[0]);
I expected this to separate each word (\w) and each of the specified punctuation characters (,.!?;), giving for example: ["I'm", "a", "little", "teapot", ",", "short", "and", "stout", "."]
Instead I get:
Array
(
[0] => I
[1] => m
[2] => a
[3] => l
[4] => i
[5] => t
[6] => t
[7] => l
[8] => e
[9] => t
[10] => e
[11] => a
[12] => p
[13] => o
etc...
What am I doing wrong here?
Thanks in advance.
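Try this. It matches whole words and runs of punctuation, though it still splits "I'm" at the apostrophe, because ' is not a word character: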
([\w]+)|[,.!?;]+
I'd also recommend an online regex tester, which is very useful for experimenting with patterns like this.
You have two faults:
1. \w matches only a single character. To match several in a row, use \w+. Furthermore, \w matches only letters, digits, and the underscore. If you want to match other characters such as ', you need to include them in a character class: [\w']
2. (,.!?;) is a group, not a character class, so it matches a sequence (a comma, any one character, an optional !, then a semicolon) rather than any one of those characters. To match any single one of them, use [,.!?;]
The correct regex is:
'/[\w\']+|[,.!?;]/'
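As a quick check, here is a minimal sketch that runs the pattern above against the sentence from the question. The variable names ($text, $tokens) are only illustrative.

```php
<?php
// Sample sentence from the question.
$text = "I'm a little teapot, short and stout.";

// Either a run of word characters and apostrophes,
// or a single punctuation mark from the listed set.
preg_match_all('/[\w\']+|[,.!?;]/', $text, $matches);

// $matches[0] holds the full matches in order of appearance.
$tokens = $matches[0];
print_r($tokens);
// Tokens: I'm, a, little, teapot, ",", short, and, stout, "."
```

Whitespace is never matched by either alternative, so it simply drops out of the result.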
If you want to be more permissive, use Unicode character properties instead. The pattern below allows letters, numbers, combining marks, dash punctuation, and the apostrophe in words, and matches any punctuation character as punctuation:
'/[\p{L}\p{N}\p{M}\p{Pd}\']+|\p{P}/u'
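The following sketch tries the Unicode approach on a non-ASCII sentence. The sample text is my own, not from the question, and the file is assumed to be saved as UTF-8. The properties are written in the braced form \p{...} because PCRE accepts the brace-less shorthand only for single-letter properties: a two-letter property such as Pd has to be written as \p{Pd}, otherwise it is read as \pP followed by a literal d.

```php
<?php
// Illustrative sample (an assumption, not from the question); must be valid UTF-8.
$text = "C'est l'été, n'est-ce pas?";

// Words: letters, numbers, combining marks, dash punctuation, apostrophe.
// Punctuation: any single character with the Unicode P (punctuation) property.
// The /u modifier makes PCRE treat pattern and subject as UTF-8.
preg_match_all('/[\p{L}\p{N}\p{M}\p{Pd}\']+|\p{P}/u', $text, $matches);

$tokens = $matches[0];
print_r($tokens);
// Tokens: C'est, l'été, ",", n'est-ce, pas, "?"
```

Because the word alternative is tried first, a hyphen or apostrophe inside a word stays attached to it, while other punctuation becomes a separate token.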
You may want to try something like:
/([^,.!?; ]+)|([,.!?;])/
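A minimal sketch of this negated-class approach on the sentence from the question, assuming the punctuation alternative is meant as a character class [,.!?;] (any one of those characters) rather than a literal sequence. The variable names are illustrative.

```php
<?php
// Sample sentence from the question.
$text = "I'm a little teapot, short and stout.";

// Group 1: a run of anything that is not listed punctuation or a space.
// Group 2: a single punctuation mark from the listed set.
preg_match_all('/([^,.!?; ]+)|([,.!?;])/', $text, $matches);

// $matches[0] holds the full matches; [1] and [2] hold the per-group captures.
$tokens = $matches[0];
print_r($tokens);
// Tokens: I'm, a, little, teapot, ",", short, and, stout, "."
```

Note that only the space character is excluded from words here, so tabs and newlines would end up inside tokens. Replacing the space with \s in the negated class would treat all whitespace as a separator.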