tb_content
(left) and tb_word
(right) :
===================================== ================================
|id|sentence |sentence_id|content_id| |id|word|sentence_id|content_id|
===================================== ================================
| 1|sentence1| 0 | 1 | | 1| a | 0 | 1 |
| 2|sentence2| 1 | 1 | | 2| b | 0 | 1 |
| 3|sentence5| 0 | 2 | | 3| c | 1 | 1 |
| 4|sentence6| 1 | 2 | | 4| a | 1 | 1 |
| 5|sentence7| 2 | 2 | | 5| e | 1 | 1 |
===================================== | 6| f | 0 | 2 |
| 7| g | 1 | 2 |
| 8| h | 1 | 2 |
| 9| i | 1 | 2 |
|10| f | 2 | 2 |
|11| h | 2 | 2 |
|12| f | 2 | 2 |
================================
I need to check if every sentence consist of words that owned by other sentences in every content_id
.
for example :
Check for the content_id
= 1
they are sentence1
and sentence2
. from tb_word
, we can see that sentence1
and sentence2
consist of the same word a
. if the number of a
in two sentences is >=2
, then a
will be the result. So if I print the result, it must be : 00Array ( [0] => a [1] => b) 01Array ( [3] => a ) 10Array ( [3] => a )11Array ( [0] => c [1] => a [2] => e)
where 00
means sentence_id
= 0
and sentence_id
= 0
first, I make functionTotal
to count how many sentence
that owned by every content_id
:
$total = array();
$sql = mysql_query('select content_id, count(*) as RowAmount
from tb_content Group By contente_id') or die(mysql_error());
while ($row = mysql_fetch_array($sql)) {
$total[] = $row['RowAmount'];
}
return $total;
From that function I get the value of $total
and from that I need to check the similarity of some words (from tb_word
) between all the possibilities of 2 sentence
foreach ($total as $content_id => $totals){
for ($x=0; $x <= ($totals-1); $x++) {
for ($y=0; $y <= ($totals-1); $y++) {
$shared = getShared($x, $y);
}
}
the function of getShared
is :
function getShared ($x, $y){
$token = array();
$shared = array();
$i = 0;
if ($x == $y) {
$query = mysql_query("SELECT word FROM `tb_word`
WHERE sentence_id ='$x' ");
while ($row = mysql_fetch_array($query)) {
$shared[$i] = $row['word'];
$i++;
}
} else {
$query = mysql_query("SELECT word, count(word) as jml
FROM `tb_word` WHERE sentence_id ='$x'
OR sentence_id ='$y'
GROUP BY word ");
while ($row = mysql_fetch_array($query)) {
$jml = $row['jml'];
$token[$i] = $row['word'];
if ($jml >= 2) {
$shared[$i] = $token[$i];
}
$i++;
}
But the result I get is still wrong. the result still mix between different content_id
. the result must be group by content_id
also. sorry for my bad english and my bad explanation. cmiiw, please help me.. thank you :)
This one can be actually done by DBMS itself, two steps in one query. First, you make a self join in order to prepare sentence combinations within the same content:
SELECT a.content_id,
a.sentence_id AS sentence_id_1,
b.sentence_id AS sentence_id_2
FROM tb_content AS a
JOIN tb_content AS b
ON ( a.content_id = b.content_id
AND a.sentence_id <= b.sentence_id )
The "<=" will keep same sentence joins, like "1-1" or "2-2", and yet avoid bidirectional repetitions, like "1-2" and "2-1". Next you can join the above result with words and count the number of occurances. Like that:
SELECT s.content_id,
s.sentence_id_1,
s.sentence_id_2,
c.word,
Count(*) AS jml
FROM (SELECT a.content_id,
a.sentence_id AS sentence_id_1,
b.sentence_id AS sentence_id_2
FROM tb_content AS a
JOIN tb_content AS b
ON ( a.content_id = b.content_id
AND a.sentence_id <= b.sentence_id )) AS s
JOIN tb_word AS c
ON ( s.content_id = c.content_id
AND ( c.sentence_id = s.sentence_id_1
OR c.sentence_id = s.sentence_id_2 ) )
GROUP BY s.content_id,
s.sentence_id_1,
s.sentence_id_2,
c.word
HAVING Count(*) >= 2;
The result of the above query will give you the container, sentences 1 and 2, the word, and the number of occurances (which is 2 or more). All you need now is collecting the result into the array which as I see you already know to do.
Let me know, if I missunderstood your goal.
How about simply SELECT content_id, word, COUNT(*) as num_appearing FROM tb_word GROUP BY content_id, word
?
EDIT: I see the complexity now: your main issue is that the getShared()
function has two sentence IDs passed to it, but no content_id
to know which content is being analyzed. You're also assuming that content_id
and sentence_id
numbers are consecutive and start at zero. My code doesn't assume that, and pulls those IDs directly from the database.
<?php
$rs = mysql_query("SELECT * FROM tb_content");
$content = array();
while ($row = mysql_fetch_assoc($rs)) {
if (!isset($content[$row['content_id']])) $content[$row['content_id']] = array();
$content[$row['content_id']][] = $row['sentence_id'];
}
foreach($content as $content_id => $sentences) {
foreach($sentences as $sentence_id) {
foreach($sentences as $compare) {
$shared = getShared($content_id, $sentence_id, $compare);
}
}
}
function getShared($cid, $s1, $s2) {
$rs = mysql_query("SELECT `word`, COUNT(*) AS 'num' FROM `tb_word` WHERE `content_id`={$cid} AND `sentence_id` IN ({$s1}, {$s2}) GROUP BY `word`");
$out = array();
while ($row = mysql_fetch_assoc($rs)) {
if ($rs['num'] >= 2) $out[$rs['word']] = $rs['num'];
}
return $out;
}