PHP: replace common words in my file

I've tried to make a tool where you enter a website and, when you click the submit button, it fetches all the text with cURL.

After all the cURLing, stripping the tags, and counting the words, I end up with an array named $frequency. If I echo it inside <pre> tags, it shows me everything just fine! (NOTE: I'm placing the contents in a file, $homepage = file_get_contents($file); and this is what I work with in my code. I don't know if this matters or not.)

However, I don't really care if the word "or" is seen 200 times on a website; I only want the important words. So I have made an array with all the common words, which eventually ends up in the $common_words variable. But I can't seem to find a way to replace every word in $frequency with "" if it also appears in $common_words.

I've found this piece of code after some research:

$string = 'sand band or nor and where whereabouts foo';
$wordlist = array("or", "and", "where");

foreach ($wordlist as &$word) {
    $word = '/\b' . preg_quote($word, '/') . '\b/';
}

$string = preg_replace($wordlist, '', $string);
var_dump($string);

If I copy and paste this, it works fine, removing or, and, and where from the string. But replacing $string with $frequency, or replacing $wordlist with $common_words, will either not work or throw an error like: Delimiter must not be alphanumeric or backslash

I hope I've formulated my question properly. If not, please tell me!

Thanks in advance

EDIT: Alright, I've narrowed down the problem a lot. First of all, I forgot the & inside the foreach ($wordlist as &$word) {

But since it was counting all the words, the words that have been replaced are all still counted. See these 2 screenshots to see what I mean: http://imgur.com/oqqZR3h,xHEZKRz#0
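
For anyone hitting the same error, here is a minimal sketch of what the missing & does. The variable $byValue is only there for illustration. Without the reference, the loop rewrites a copy of each word, so the array still holds bare words like "or", and preg_replace() then rejects them because a bare word has no regex delimiters.

```php
<?php
$wordlist = array("or", "and");

// By value: $word is a copy, so $wordlist itself is left untouched.
foreach ($wordlist as $word) {
    $word = '/\b' . preg_quote($word, '/') . '\b/';
}
$byValue = $wordlist; // still the bare words, which are not valid patterns

// By reference: each element is rewritten in place into a delimited pattern.
foreach ($wordlist as &$word) {
    $word = '/\b' . preg_quote($word, '/') . '\b/';
}
unset($word); // break the reference so later code cannot overwrite the last element

var_dump($byValue, $wordlist);
```

The unset($word) after the loop is optional here, but it avoids surprises if $word is reused later in the script.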

If I understand this correctly, you want to know how many occurrences each word has, ignoring the so-called common words.

Assuming that $url is the page you will be running against and $common_words is your common words array, here is what you can do:

// Get the page contents and strip the HTML tags
$contents = strip_tags( file_get_contents($url) );

// Split the contents into words. Each whole match also carries the trailing
// non-word character, so the clean words are in capture group 1
preg_match_all("/([\w]+[']?[\w]*)\W/", $contents, $words);

$common_words = array('or', 'and', 'I', 'where');

// Drop the common words (case-sensitive comparison), then count occurrences
$frequency = array_count_values(array_diff($words[1], $common_words));
unset($words); // Release all that memory

var_dump($frequency);

At this point you will have an associative array that maps each non-common word to its number of occurrences.
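
If $frequency has already been built with the common words still in it, they can also be dropped afterwards by key. This is only a sketch: the sample counts below are made up for illustration, and the comparison is case-sensitive.

```php
<?php
// Hypothetical sample data standing in for the real counts.
$frequency = array('php' => 3, 'or' => 200, 'and' => 150, 'curl' => 2);
$common_words = array('or', 'and', 'where');

// array_flip() turns the word list into keys, so array_diff_key() can
// remove every entry of $frequency whose key is a common word.
$important = array_diff_key($frequency, array_flip($common_words));

arsort($important); // most frequent words first
print_r($important);
```

Working on the keys avoids re-scanning the page text, which is handy when the counts come from an earlier step.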

UPDATE

A bit more about the RegEx. We need to match a word. The easiest way possible is: (\w+). But that won't match words like I've or haven't (notice the '). That was my reason for making it more complicated. Also, \w doesn't match dashes, as in words like 6-year-old.

So I created a subgroup which should match word characters, including dashes and single quotes inside a word.

(?:\w'\w|\w|-)

The ?: at the beginning makes the group non-capturing, so it is not included in the results. That is because all I am doing is grouping the options for word contents. To match an entire word, the RegEx will match one or more of the subgroup above:

((?:\w'\w|\w|-)+)

So the RegEx preg_match_all() line should be:

preg_match_all("/((?:\w'\w|\w|-)+)/", $contents, $words);
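
As a quick check, here is the pattern run against a made-up sample sentence (the sentence is only an assumption for illustration). The words end up in capture group 1, $words[1].

```php
<?php
// Hypothetical input containing an apostrophe word and a dashed word.
$contents = "I've seen a 6-year-old who hasn't slept";

preg_match_all("/((?:\w'\w|\w|-)+)/", $contents, $words);

// $words[1] holds one entry per matched word.
print_r($words[1]);
```

Note that a dash standing on its own between spaces would also match as a "word" with this pattern, so the result may need a small cleanup step depending on the input.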

Hope this helps.

I changed $wordlist to $mywordlist, and it still works!

<?php
$string = 'sand band or nor and where whereabouts foo';
$wordlist = array("or", "and", "where");
$mywordlist=array("sand","band");
foreach ($mywordlist as &$word) {
    $word = '/\b' . preg_quote($word, '/') . '\b/';
}

$string = preg_replace($mywordlist, '', $string);
var_dump($string);
?>

I suppose you can simply do it like this:

$common_words = "foo baq etc etc";

$str = "foo bar baz"; // input

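foreach (explode(" ", $common_words) as $word) {
    // strtr() with string arguments maps single characters, so it cannot
    // delete a word; a word-boundary regex removes whole words only
    $str = preg_replace('/\b' . preg_quote($word, '/') . '\b/', '', $str);
}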