I need a systematic way of replacing each word in a string separately by providing my own input for each word. I want to do this on the command line.
So the program reads in a string, and asks me what I want to replace the first word with, and then the second word, and then the third word, and so on, until all words have been processed.
The sentences in the string have to remain well-formed, so the algorithm should take care not to mess up punctuation and spacing.
Is there a proper way to do this?
Given some text:
$subject = <<<TEXT
I need a systematic way of replacing each word in a string separately by providing my own input for each word. I want to do this on the command line.
So the program reads in a string, and asks me what I want to replace the first word with, and then the second word, and then the third word, and so on, until all words have been processed.
The sentences in the string have to remain well-formed, so the algorithm should take care not to mess up punctuation and spacing.
Is there a proper way to do this?
TEXT;
You first tokenize the string into word tokens and "everything else" tokens (call the latter fill). Regular expressions are helpful for that:
$pattern = '/(?P<fill>\W+)?(?P<word>\w+)?/';
$r = preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
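To get a feel for what the pattern captures, here is a minimal sketch on a short string. The helper name tokenize and the sample input are made up for illustration, and the sketch leaves out PREG_OFFSET_CAPTURE to keep the match arrays simple:

```php
<?php
// Hypothetical helper: split a string into 'word' and 'fill' tokens
// using the pattern from above (without offset capture).
function tokenize(string $subject): array
{
    $pattern = '/(?P<fill>\W+)?(?P<word>\w+)?/';
    preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

    $tokens = [];
    foreach ($matches as $matched) {
        foreach (['fill', 'word'] as $type) {
            // A group that did not take part in the match is missing or empty.
            if (isset($matched[$type]) && $matched[$type] !== '') {
                $tokens[] = ['type' => $type, 'string' => $matched[$type]];
            }
        }
    }
    return $tokens;
}

$sample = 'Hi, there!';
$tokens = tokenize($sample);
foreach ($tokens as $token) {
    printf("%-4s '%s'\n", $token['type'], $token['string']);
}

// Concatenating all tokens restores the input unchanged.
$rebuilt = implode('', array_column($tokens, 'string'));
```

The sample splits into four tokens (Hi, then the comma and space, then there, then the exclamation mark). Since every character is either a word character or not, nothing is lost between tokens.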
The job is now to convert the matches into a more useful data structure, such as an array of tokens and an index of all words used:
$tokens = array(); # token stream
$tokenIndex = 0;
$words = array(); # index of words
foreach($matches as $matched)
{
    foreach($matched as $type => $match)
    {
        if (is_numeric($type)) continue; # keep only the named groups (fill, word)
        list($string, $offset) = $match;
        if ($offset < 0) continue; # group did not take part in this match
        $token = new stdClass;
        $token->type = $type;
        $token->offset = $offset;
        $token->length = strlen($string);
        if ($token->type === 'word')
        {
            if (!isset($words[$string]))
            {
                $words[$string] = array('string' => $string, 'tokens' => array());
            }
            $words[$string]['tokens'][] = &$token;
            # share the string with the word index, so one change updates all tokens
            $token->string = &$words[$string]['string'];
        } else {
            $token->string = $string;
        }
        $tokens[$tokenIndex] = &$token;
        $tokenIndex++;
        unset($token); # break the reference before the next iteration
    }
}
As an example, you can then output all words:
# list all words
foreach($words as $word)
{
    printf("Word '%s' used %d time(s)\n", $word['string'], count($word['tokens']));
}
With the sample text, this gives:
Word 'I' used 3 time(s)
Word 'need' used 1 time(s)
Word 'a' used 4 time(s)
Word 'systematic' used 1 time(s)
Word 'way' used 2 time(s)
Word 'of' used 1 time(s)
Word 'replacing' used 1 time(s)
Word 'each' used 2 time(s)
Word 'word' used 5 time(s)
Word 'in' used 3 time(s)
Word 'string' used 3 time(s)
Word 'separately' used 1 time(s)
Word 'by' used 1 time(s)
Word 'providing' used 1 time(s)
Word 'my' used 1 time(s)
Word 'own' used 1 time(s)
Word 'input' used 1 time(s)
Word 'for' used 1 time(s)
Word 'want' used 2 time(s)
Word 'to' used 5 time(s)
Word 'do' used 2 time(s)
Word 'this' used 2 time(s)
Word 'on' used 2 time(s)
Word 'the' used 7 time(s)
Word 'command' used 1 time(s)
Word 'line' used 1 time(s)
Word 'So' used 1 time(s)
Word 'program' used 1 time(s)
Word 'reads' used 1 time(s)
Word 'and' used 5 time(s)
... (and so on)
Then you do the job on the word tokens only. For example, replacing one string with another:
# change one word (and to AND)
$words['and']['string'] = 'AND';
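This single assignment changes every occurrence because each word token holds a PHP reference to the string in the word index rather than a copy. A stripped-down sketch of that mechanism, with made-up variable names:

```php
<?php
// Word index with one entry, shaped like the one the tokenizer loop builds.
$words = ['and' => ['string' => 'and']];

// Two tokens that stand for two occurrences of the same word.
// Both properties are bound by reference to the same index entry.
$first = new stdClass;
$first->string = &$words['and']['string'];
$second = new stdClass;
$second->string = &$words['and']['string'];

// Changing the index entry once is visible through both tokens.
$words['and']['string'] = 'AND';

echo $first->string, ' ', $second->string, "\n"; // prints "AND AND"
```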
Finally you concatenate the tokens into a single string:
# output the whole text
foreach($tokens as $token) echo $token->string;
With the sample text, this gives:
I need a systematic way of replacing each word in a string separately by providing my own input for each word. I want to do this on the command line.
So the program reads in a string, AND asks me what I want to replace the first word with, AND then the second word, AND then the third word, AND so on, until all words have been processed.
The sentences in the string have to remain well-formed, so the algorithm should take care not to mess up punctuation AND spacing.
Is there a proper way to do this?
Job done. Make sure that word tokens are only replaced with valid word tokens: tokenize the user input as well, and report an error if it is not a single word token (that is, if it does not match the word pattern).
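One possible way to do that check is sketched below. The function names isSingleWord and askReplacement are hypothetical, and the choice to keep the original word on an empty line or at end of input is an assumption, not something the approach requires:

```php
<?php
// Same word definition as the tokenizer pattern; the D modifier keeps
// "$" from matching before a trailing newline.
function isSingleWord(string $input): bool
{
    return preg_match('/^\w+$/D', $input) === 1;
}

// Hypothetical prompt helper: asks on $out, reads from $in, and repeats
// until the answer is a single word. An empty line or end of input
// keeps the original word (an assumption made for this sketch).
function askReplacement($in, $out, string $word): string
{
    while (true) {
        fwrite($out, "Replace '$word' with (empty keeps it): ");
        $line = fgets($in);
        if ($line === false) {
            return $word;
        }
        $line = trim($line);
        if ($line === '') {
            return $word;
        }
        if (isSingleWord($line)) {
            return $line;
        }
        fwrite($out, "Error: '$line' is not a single word, try again.\n");
    }
}

// On the command line this could be used as:
// $words['and']['string'] = askReplacement(STDIN, STDOUT, 'and');
```

Passing the streams in as parameters keeps the helper usable both with STDIN/STDOUT and with in-memory streams.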
This is quite simple once you know the basics of command-line programming with PHP, for which there are lots of tutorials.
In general, the core is a continuous loop that keeps asking you for words. In each iteration you then just call str_replace(), which covers the basics of what you need.
Don't forget to implement a trick to break the loop, like typing exit or using some special command depending on your need.
I don't think the idea here is to reply with a full code example, right? That would completely answer the question, but it would also turn it into something like a script request.
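As a rough starting point only, the loop described above could look something like this. The function name replaceLoop, the prompts, and the exit keyword are assumptions chosen for the sketch:

```php
<?php
// Hypothetical sketch: keep asking for a word and its replacement until
// the user types "exit" or the input ends. $in and $out are stream
// resources, e.g. STDIN and STDOUT when run from the command line.
function replaceLoop(string $text, $in, $out): string
{
    while (true) {
        fwrite($out, "Word to replace (or 'exit'): ");
        $search = fgets($in);
        if ($search === false) {
            break; // end of input
        }
        $search = trim($search);
        if ($search === 'exit') {
            break; // the "trick" to leave the loop
        }
        if ($search === '') {
            continue; // nothing to search for, ask again
        }

        fwrite($out, "Replace '$search' with: ");
        $replace = fgets($in);
        if ($replace === false) {
            break;
        }
        $text = str_replace($search, trim($replace), $text);
    }
    return $text;
}

// On the command line this could be used as:
// echo replaceLoop('cats and dogs', STDIN, STDOUT), "\n";
```

Keep in mind that str_replace() works on substrings, so replacing a would also change the a inside and. Working on word tokens avoids that.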