Keyword density calculation for keyword phrases?

When calculating the keyword density of a single keyword in a string of content, the formula is pretty straightforward: kwd = (keyword count / total word count) * 100

However, what should the formula be when we are looking for keyword density of a keyword phrase?

For example, how would you calculate the keyword density of the phrase "blue widgets" in the following string?

$myContent = "Blue widgets in a field
of widgets blue makes for lots of widgets, true. But
if a widget is blue, is it still a
\"blue widget\" or just a lone widget in a sea
of blue?";

Here's my current function:

function my_keyword_density($post)
{
    $word_count = my_word_count($post);
    $keyword_count = my_keyword_count($post);
    $density = ($keyword_count / $word_count) * 100;
    $density = number_format($density, 1);
    return $density;
}

How can I get a count of the number of words in the keyword phrase?

You can try something like this:

$tot_words = str_word_count($myContent);
$keyword_count = preg_match_all("/\bblue widgets\b/msiU", $myContent, $res);
$kwd = ($keyword_count / $tot_words) * 100;

If you need to customize what is considered a "word", you can pass an extra parameter to the str_word_count function; see the manual page. Then just add error checking where needed and it should work. As for the formula, I'd use something like this:

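    $search_words = str_word_count("blue widgets");
    // Count each phrase occurrence as one word: drop the extra ($search_words - 1) words per match.
    $kwd = ($keyword_count / ($tot_words - $keyword_count * ($search_words - 1))) * 100;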

This way every occurrence of a multi-word key phrase is treated as if it were a single word. Hope it helps.
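
To make the idea above concrete, here is a minimal sketch that wraps it in one function. The name my_phrase_density and the choice to allow any whitespace between the phrase's words are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original post): density of a phrase,
// counting each matched occurrence as a single word.
function my_phrase_density(string $content, string $phrase): float
{
    $phrase_words = str_word_count($phrase);
    $total_words  = str_word_count($content);
    if ($phrase_words === 0 || $total_words === 0) {
        return 0.0;
    }

    // Quote each word and allow any whitespace (including newlines) between them.
    $parts = array_map(
        function ($word) { return preg_quote($word, '/'); },
        preg_split('/\s+/', trim($phrase))
    );
    $pattern = '/\b' . implode('\s+', $parts) . '\b/i';
    $matches = (int) preg_match_all($pattern, $content);

    // Each match collapses $phrase_words words into one.
    $adjusted_total = $total_words - $matches * ($phrase_words - 1);
    if ($adjusted_total <= 0) {
        return 0.0;
    }

    return round($matches / $adjusted_total * 100, 1);
}
```

Called as my_phrase_density($myContent, "blue widgets") on the string from the question, only the opening "Blue widgets" counts as a match, since "widgets blue" and "blue widget" are not the exact phrase.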

Perhaps

kwd = (num key phrase occurrences * num words in key phrase / total word count) * 100
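
For illustration, one reading of this idea is that every occurrence contributes all of its words to the count, so the density becomes occurrences × words in phrase / total words × 100. A minimal PHP sketch under that assumption (the function name my_phrase_coverage is made up here):

```php
<?php
// Hypothetical helper (not from the original post): share of the text's words
// that belong to an occurrence of the phrase, as a percentage.
function my_phrase_coverage(string $content, string $phrase): float
{
    $phrase_words = str_word_count($phrase);
    $total_words  = str_word_count($content);
    if ($phrase_words === 0 || $total_words === 0) {
        return 0.0;
    }

    // Case-insensitive, whole-word match of the literal phrase.
    $pattern = '/\b' . preg_quote($phrase, '/') . '\b/i';
    $matches = (int) preg_match_all($pattern, $content);

    return round($matches * $phrase_words / $total_words * 100, 1);
}
```

Unlike a single-keyword density, this value grows with the length of the phrase, so it is best compared only between phrases of the same length.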

Your example seems to imply that you want to take each keyword into account as well as the keyword phrase. In that case, you might use a weighted formula:

kwd = α*kwd("blue widgets") + (1-α)*(kwd("blue")+kwd("widgets"))

α=1 gives the most conservative measure (only the phrase "blue widgets" is relevant),
α=0 gives the most liberal measure (both "blue" and "widgets" anywhere in the text are relevant).
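
A rough sketch of the weighted formula in PHP, assuming kwd(x) is the simple density (whole-word, case-insensitive matches / total words × 100). The function name my_weighted_density and the $alpha parameter are hypothetical names chosen for this example.

```php
<?php
// Hypothetical helper (not from the original post): blend the phrase density
// with the summed densities of the phrase's individual words.
function my_weighted_density(string $content, string $phrase, float $alpha): float
{
    if ($alpha < 0.0 || $alpha > 1.0) {
        throw new InvalidArgumentException('alpha must be between 0 and 1');
    }
    $total_words = str_word_count($content);
    if ($total_words === 0 || str_word_count($phrase) === 0) {
        return 0.0;
    }

    // Simple density: whole-word, case-insensitive matches per 100 words.
    $density = function (string $term) use ($content, $total_words): float {
        $pattern = '/\b' . preg_quote($term, '/') . '\b/i';
        return (int) preg_match_all($pattern, $content) * 100 / $total_words;
    };

    // Sum of kwd(word) over each word of the phrase.
    $word_sum = 0.0;
    foreach (str_word_count($phrase, 1) as $word) {
        $word_sum += $density($word);
    }

    return $alpha * $density($phrase) + (1 - $alpha) * $word_sum;
}
```

With $alpha = 1 only exact occurrences of the phrase count; with $alpha = 0 the result is the sum of the single-word densities, so "blue" and "widgets" count wherever they appear.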