Say I have a list of genres that looks something like:
$genres = array(
'soul',
'soul jazz',
'blues',
'jazz blues',
'rock',
'indie',
'cool jazz',
'rock-blues');
...And so on, for 762 values. How can I organize these genres into categories?
For example, I would want the Blues category to contain 'blues', 'jazz blues', and 'rock-blues'. I would want the Jazz category to contain 'soul jazz', 'jazz blues', and 'cool jazz'.
Any and all help is appreciated.
Using preg_match with word boundaries would be one of the best ways to solve your problem.
<?php
$categories = array("blues", "jazz");
$genres = array("soul", "soul jazz", "blues", "jazz blues", "rock", "indie", "cool jazz", "rock-blues");
$arr = array();
$others = array();
foreach ($genres as $genre) {
    $matched = false;
    foreach ($categories as $category) {
        // \b matches the category only as a whole word;
        // preg_quote() escapes any regex metacharacters in the category name
        if (preg_match("/\\b" . preg_quote($category, "/") . "\\b/", $genre)) {
            $arr[$category][] = $genre;
            $matched = true;
        }
    }
    // A genre that matches no category goes into "others"
    if (!$matched) {
        $others[] = $genre;
    }
}
ksort($arr);
$arr["others"] = $others;
unset($genre, $matched, $category, $others);
print_r($arr);
?>
The result will be:
Array
(
[blues] => Array
(
[0] => blues
[1] => jazz blues
[2] => rock-blues
)
[jazz] => Array
(
[0] => soul jazz
[1] => jazz blues
[2] => cool jazz
)
[others] => Array
(
[0] => soul
[1] => rock
[2] => indie
)
)
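With 762 genres and a longer category list, the same whole-word idea can also be done without building one regex per category. The following is only a sketch, not the code above: group_by_words() is a made-up helper name, it assumes words are separated by whitespace or hyphens only, it lowercases genres before comparing, and it assumes every category is a single lowercase word (so something like "hip hop" would need the preg_match version).

```php
<?php
// Hypothetical helper: groups genres by the category words they contain.
function group_by_words(array $genres, array $categories): array
{
    $wanted = array_flip($categories);               // category => index, for fast lookups
    $groups = array_fill_keys($categories, array()); // every category starts empty
    $groups['others'] = array();

    foreach ($genres as $genre) {
        // Split on whitespace and hyphens so "rock-blues" yields "rock" and "blues".
        $words = preg_split('/[\s-]+/', strtolower($genre), -1, PREG_SPLIT_NO_EMPTY);
        // Keep only the words that are also category names.
        $hits = array_intersect_key(array_flip($words), $wanted);

        if (!$hits) {
            $groups['others'][] = $genre;
            continue;
        }
        foreach (array_keys($hits) as $category) {
            $groups[$category][] = $genre;
        }
    }
    return $groups;
}

$genres = array('soul', 'soul jazz', 'blues', 'jazz blues', 'rock', 'indie', 'cool jazz', 'rock-blues');
print_r(group_by_words($genres, array('blues', 'jazz')));
```

For the eight sample genres this puts the same genres in blues, jazz and others as the result shown above. Unlike the version above, a category that matches nothing still appears as an empty array.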
Given some seeds:
$seeds = array('blues','jazz',...);
Then just assign each genre to its nearest seed:
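foreach ($genres as $v) {
    $similarity = 0;
    $k = 0; // falls back to the first seed when no seed shares any characters
    foreach ($seeds as $kk => $vv) {
        // similar_text() returns the number of matching characters
        $current = similar_text($v, $vv);
        if ($current > $similarity) {
            $similarity = $current;
            $k = $kk;
        }
    }
    // Key the result by seed name so the groups read as [blues], [jazz]
    $categories[$seeds[$k]][] = $v;
}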
At this point you have your $genres labeled in $categories:
Array
(
[blues] => Array
(
[0] => soul
[1] => blues
[2] => jazz blues
[3] => rock
[4] => indie
[5] => rock-blues
)
[jazz] => Array
(
[0] => soul jazz
[1] => cool jazz
)
)
Tested code at codepad: http://codepad.org/HCPcO4Iy
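PS. Clearly, if you have those two seeds (blues and jazz) and then have to cluster the genre "jazz blues", it gets assigned to only one of them, with no real logic behind the choice: above it lands in blues simply because that seed shares more characters with it. Likewise, a genre that shares nothing with any seed (like "rock") falls into the first seed.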
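One possible way around that limitation, shown only as a sketch: similar_text() can also report a percentage through its optional third argument, so a genre could be placed under every seed that scores above a cutoff, and under "others" when none does. The helper name group_by_similarity() and the 50 percent default are assumptions made for illustration; the cutoff would need tuning against the real 762 values, since a long genre name scores low against a short seed.

```php
<?php
// Hypothetical helper; the 50.0 default cutoff is an arbitrary assumption to tune.
function group_by_similarity(array $genres, array $seeds, float $minPercent = 50.0): array
{
    $groups = array();
    foreach ($genres as $genre) {
        $matched = false;
        foreach ($seeds as $seed) {
            // The third argument receives the similarity as a percentage (0 to 100).
            similar_text($genre, $seed, $percent);
            if ($percent >= $minPercent) {
                $groups[$seed][] = $genre; // a genre may land under several seeds
                $matched = true;
            }
        }
        if (!$matched) {
            $groups['others'][] = $genre;  // nothing was similar enough
        }
    }
    return $groups;
}

$genres = array('soul', 'soul jazz', 'blues', 'jazz blues', 'rock', 'indie', 'cool jazz', 'rock-blues');
print_r(group_by_similarity($genres, array('blues', 'jazz')));
```

With the eight sample genres and the 50 percent cutoff, "jazz blues" ends up under both blues and jazz, while "soul", "rock" and "indie" go to others instead of being forced into blues.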