I have a string like this
$string = "Hello; how are you;Hey, I am fine";
$new = preg_split("/;\w/", $string);
print_r($new);
I am trying to split the string only when there is no white-space between the words and ";". But when I do this, I lose the H from Hey. It's probably because the split happens through the recognition of ;H. Could someone tell me how to prevent this?
My output:
$array = [
0 => [
0 => 'Hello; how are you ',
1 => 0,
],
1 => [
0 => 'ey, I am fine',
1 => 21,
],
]
You might use a word boundary \b
:
$string = "Hello; how are you;Hey, I am fine";
$new = preg_split("/\b;\b/", $string);
print_r($new);
Or a negative lookahead and negative lookbehind
You are capturingthe \w
in your regex.You dont want that. Therefore, do this:
$new = preg_split("/;(?=\w)/", $string);
A capture group is defined in brackets, but the ?= means match but don't capture.
Check it out here https://3v4l.org/Q77LZ
Use split with this regex ;(?=\w)
then you will not lose the H
Lookarounds cost more steps. In terms of pattern efficiency, a word boundary is better and maintains the intended "no-length" character consumption.
In well-formed English, you won't ever have to check for a space before a semi-colon, so only 1 word boundary seems sufficient (I don't know if malformed English is possible because it is not represented in your sample string).
If you want to acquire the offset value, preg_split()
has a flag for that.
Code: (Demo)
$string = "Hello; how are you;Hey, I am fine";
$new = preg_split("/;\b/", $string, -1, PREG_SPLIT_OFFSET_CAPTURE);
var_export($new);
Output:
array (
0 =>
array (
0 => 'Hello; how are you',
1 => 0,
),
1 =>
array (
0 => 'Hey, I am fine',
1 => 19,
),
)