I've been wondering, is it possible to group every 2 words using regex? For 1 word I use this:
((?:\w'|\w|-)+)
This works great, but I need it for 2 words (or even more later on).
But if I use this one:
((?:\w'|\w|-)+) ((?:\w'|\w|-)+)
it makes groups of 2, but not the way I want. And when it encounters a special character, it starts over.
Let me give you an example:
If I use it on this text: This is an . example text using & my / Regex expression
It makes these groups: "This is", "example text", "regex expression"
and I want groups like this: "This is", "is an", "an example", "example text", "text using", "using my", "my regex", "regex expression"
It is okay if it resets after a period, so that it won't match "hello . guys" together, for example.
Is this even possible to accomplish? I've just started experimenting with regex, so I don't quite know what is possible with it.
If this isn't possible could you point me in a direction that I should take with my problem?
Thanks in advance!
Try this:
$samp = "This is an . example text using & my / Regex expression";
// remove everything except letters and spaces
$samp = preg_replace('/[^A-Z ]/i', "", $samp);
// collapse runs of spaces into a single space and trim the ends
$samp = trim(preg_replace('/ +/', " ", $samp));
// split the sentence into words
$jk = explode(" ", $samp);
$i = sizeof($jk);
// combine each word with the one that follows it
$array = array();
for ($j = 0; $j < $i - 1; $j++)
{
    $array[] = $jk[$j] . " " . $jk[$j + 1];
}
print_r($array);
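The question mentions that it is fine for the grouping to reset after a period. The loop above does not do that, so here is a minimal sketch of one way to add it, assuming a plain "." is the only boundary that matters. The function name word_pairs_per_sentence is made up for illustration.

```php
<?php
// Hypothetical helper: builds overlapping word pairs, restarting at each period.
function word_pairs_per_sentence(string $text): array
{
    $pairs = array();
    // Split on periods so that no pair spans a sentence boundary.
    foreach (explode('.', $text) as $segment) {
        // Keep letters and spaces only, as in the code above.
        $clean = preg_replace('/[^A-Z ]/i', '', $segment);
        // Split on runs of whitespace and drop empty entries.
        $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
        for ($k = 0; $k < count($words) - 1; $k++) {
            $pairs[] = $words[$k] . ' ' . $words[$k + 1];
        }
    }
    return $pairs;
}

print_r(word_pairs_per_sentence('This is an . example text using & my / Regex expression'));
```

With the sample sentence this should give "This is", "is an", then start again with "example text" and so on, so "an example" is not produced.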
EDIT
In reply to your follow-up question:
I've changed the regex to "/[^A-Z0-9-' ]/i" so it doesn't mess up words like 'you're' and '9-year-old', for example. But now, when there is a separate - or ' in my text, it is treated as a separate word. I know why it does this, but is it preventable?
Change the regex like this:
$samp = preg_replace('/[^A-Z0-9 ]+[^A-Z0-9\'-]/i', "", $samp);
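Another way to look at the stray - and ' problem is to extract the words instead of deleting the unwanted characters. The sketch below is an alternative to the replacement above, not a fix of it: it assumes a word is a run of letters or digits that may contain ' or - only between two such runs. The name extract_words is hypothetical.

```php
<?php
// Hypothetical helper: extracts words, allowing ' and - only inside a word.
function extract_words(string $text): array
{
    // One alphanumeric run, optionally followed by ('|-) plus another run, repeated.
    preg_match_all("/[A-Z0-9]+(?:['-][A-Z0-9]+)*/i", $text, $matches);
    return $matches[0];
}

print_r(extract_words("you're a 9-year-old - 'so' what"));
```

A lone - or a quote at the start or end of a word never matches, so it cannot turn into a word of its own. The resulting array can be fed straight into the pairing loop.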
First, strip out non-word characters (replace \W with ''). Then perform your match. Many problems can be made simpler by breaking them down. Regexes are no exception.
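For the matching step, note that an ordinary pattern consumes both words, so the second word of one pair cannot become the first word of the next. One possible workaround is to put the capture groups inside a lookahead. This is only a sketch: it assumes the text has already been reduced to \w-only words separated by single spaces, and overlapping_pairs is a made-up name.

```php
<?php
// Hypothetical helper: returns overlapping two-word groups from cleaned text.
function overlapping_pairs(string $clean): array
{
    // The lookahead captures two words without consuming them,
    // so the next match can begin at the second word.
    preg_match_all('/\b(?=(\w+) (\w+))/', $clean, $m);
    // $m[1] holds the first words, $m[2] the second words.
    return array_map(function ($a, $b) {
        return "$a $b";
    }, $m[1], $m[2]);
}

print_r(overlapping_pairs('This is an example'));
```

Words that contain ' or - would be cut at those characters by \w, so they would need a wider character class.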
Alternatively, strip out non-word characters, condense whitespace into single spaces, then use explode on space and array_chunk to group your words into pairs.
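A rough sketch of that second route, with the stripping and the whitespace condensing merged into one replacement for brevity. The function name chunked_pairs is hypothetical.

```php
<?php
// Hypothetical helper: groups words into consecutive pairs with array_chunk.
function chunked_pairs(string $text): array
{
    // Replace every run of non-word characters with one space, then trim the ends.
    $clean = trim(preg_replace('/\W+/', ' ', $text));
    $words = explode(' ', $clean);
    // array_chunk yields consecutive, non-overlapping groups of two words.
    return array_map(function ($chunk) {
        return implode(' ', $chunk);
    }, array_chunk($words, 2));
}

print_r(chunked_pairs('This is an . example text using & my / Regex expression'));
```

Keep in mind that array_chunk does not overlap: you get "This is", "an example", "text using" and so on, and the last group holds a single word when the word count is odd. If every word has to start a pair, a loop over neighbouring words is needed instead.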
Regex is overkill for this. Simply collect the words, then create the pairs:
$a = array('one', 'two', 'three', 'four');
$pairs = array();
$prev = null;
foreach ($a as $word) {
    if ($prev !== null) {
        $pairs[] = "$prev $word";
    }
    $prev = $word;
}
Live demo: http://ideone.com/8dqAkz
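Since the question asks about "even more words later on", the same idea can be extended to groups of any size. This is a sketch only, and word_ngrams is a name invented here. It uses array_slice rather than tracking the previous word.

```php
<?php
// Hypothetical helper: returns every run of $n consecutive words as one string.
function word_ngrams(array $words, int $n): array
{
    $groups = array();
    if ($n < 1) {
        return $groups; // a group size below 1 makes no sense
    }
    // Stop as soon as fewer than $n words remain.
    for ($i = 0; $i + $n <= count($words); $i++) {
        $groups[] = implode(' ', array_slice($words, $i, $n));
    }
    return $groups;
}

print_r(word_ngrams(array('one', 'two', 'three', 'four'), 3));
```

With $n = 2 it returns the same pairs as the loop above.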