My project purpose: There are 4 files, each with a different number of lines. Each line consists of a single word or a few words. For each of these files, I want to find which other file has the most words in common with it.
File 4 lines: C, E, F, A
Output:
My logic:
I'd like to know whether this is the right way to approach the problem, or whether there is a better way to think about it.
Edit: I forgot to mention that I will be using PHP.
This should be easy to do with array_intersect(): intersect one file's lines with each other file's lines and count the result.
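A minimal sketch of the array_intersect() idea, assuming each file has already been read into an array of lines (for example with file() and the FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES flags). The file contents and the bestMatches() helper are hypothetical, made up for illustration.

```php
<?php
// Hypothetical contents standing in for the four files.
$files = [
    '1.txt' => ['A', 'B', 'C'],
    '2.txt' => ['C', 'D', 'A'],
    '3.txt' => ['D', 'E', 'G'],
    '4.txt' => ['C', 'E', 'F', 'A'],
];

// For each file, return [other file with the most shared words, count].
function bestMatches(array $files): array
{
    $result = [];
    foreach ($files as $name => $words) {
        $words = array_unique($words); // count each distinct word once
        $bestName = null;
        $bestCount = -1;
        foreach ($files as $other => $otherWords) {
            if ($other === $name) {
                continue; // never compare a file with itself
            }
            // array_intersect() keeps the values of $words that also occur in $otherWords
            $common = count(array_intersect($words, $otherWords));
            if ($common > $bestCount) { // strict >, so the first file wins a tie
                $bestName = $other;
                $bestCount = $common;
            }
        }
        $result[$name] = [$bestName, $bestCount];
    }
    return $result;
}

print_r(bestMatches($files));
```

With only a handful of files, comparing every pair directly like this is cheap. A count of 0 for the best match means the file shares nothing with any other file.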
You should sort the arrays first. Then, to count the number of common lines between array1 and array2, keep two index counters, i1 and i2, and walk through both arrays in step.
Pseudo code:
i1 = i2 = result = 0
while (i1 < array1.length && i2 < array2.length)
    if array1[i1] == array2[i2]
        ++i1; ++i2
        ++result
    else if array1[i1] < array2[i2]
        ++i1
    else
        ++i2
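The sort-and-merge pseudocode translates to PHP roughly as follows. This is a sketch; the countCommonSorted() name is made up. Matching duplicates pair by pair means that if a word appears twice in both arrays, it counts twice.

```php
<?php
// Count common lines between two lists by sorting both and
// advancing two indices in step (a merge-style walk).
function countCommonSorted(array $array1, array $array2): int
{
    // Sort as plain strings so the order matches strcmp() below;
    // sort() also re-indexes the arrays from 0.
    sort($array1, SORT_STRING);
    sort($array2, SORT_STRING);
    $i1 = 0;
    $i2 = 0;
    $result = 0;
    $n1 = count($array1);
    $n2 = count($array2);
    while ($i1 < $n1 && $i2 < $n2) {
        $cmp = strcmp($array1[$i1], $array2[$i2]);
        if ($cmp === 0) {        // same word: count it, advance both
            ++$i1;
            ++$i2;
            ++$result;
        } elseif ($cmp < 0) {    // array1's word is smaller: advance i1
            ++$i1;
        } else {                 // array2's word is smaller: advance i2
            ++$i2;
        }
    }
    return $result;
}

echo countCommonSorted(['C', 'E', 'F', 'A'], ['A', 'B', 'C']), "\n"; // 2
```

SORT_STRING and strcmp() are used together on purpose: PHP's default sort and the < operator compare numeric-looking strings as numbers, which could make the sort order and the comparison in the loop disagree.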
I learned PHP from interesting situations like this. Keep on learning.
<?php
// put all files in the same directory as this script
// and list their names in this array
$files = array('1.txt','2.txt','3.txt','4.txt');
$words = array();
$data = '';
$delimiter = "\n"; // change this to "\r\n" if the files use Windows line endings
// iterate through the files and build one combined word list
foreach($files as $file){
    // file_get_contents() also copes with empty files, unlike fread($fh, filesize($file))
    $data .= $delimiter.file_get_contents($file);
}
// assuming one word per line, as in your example
$lines = explode($delimiter,$data);
foreach($lines as $line){
    $line = trim($line);
    if($line === '') continue; // skip blank lines (empty() would also skip "0")
    $words[$line] = ($words[$line] ?? 0) + 1; // count occurrences without @-suppressed notices
}
var_dump($words);
/* *
* according to your example:
*
array(7) {
["A"]=>
int(3)
["B"]=>
int(1)
["C"]=>
int(4)
["D"]=>
int(2)
["E"]=>
int(3)
["F"]=>
int(2)
["G"]=>
int(1)
}
*/
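The word count above says how many files contain each word, but not which ones, so it cannot tell which file overlaps most with which. One way to extend it is to record the files each word appears in (an inverted index) and then score every pair of files that share a word. This is a sketch with made-up contents standing in for 1.txt to 4.txt; the mostCommonPartner() helper is hypothetical, and real data could be loaded with file($name, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES).

```php
<?php
// Hypothetical contents standing in for the files' lines.
$files = [
    '1.txt' => ['A', 'B', 'C'],
    '2.txt' => ['C', 'D', 'A'],
    '3.txt' => ['D', 'E', 'G'],
    '4.txt' => ['C', 'E', 'F', 'A'],
];

// Returns file name => [best partner file, number of shared words],
// or null when the file shares no words with any other file.
function mostCommonPartner(array $files): array
{
    // Inverted index: word => [file name => true]
    $index = [];
    foreach ($files as $name => $lines) {
        foreach ($lines as $line) {
            $line = trim($line);
            if ($line === '') {
                continue;
            }
            $index[$line][$name] = true; // repeated words count once per file
        }
    }

    // Each word found in both file a and file b adds 1 to the (a, b) score.
    $common = [];
    foreach ($index as $fileSet) {
        $names = array_keys($fileSet);
        foreach ($names as $a) {
            foreach ($names as $b) {
                if ($a !== $b) {
                    $common[$a][$b] = ($common[$a][$b] ?? 0) + 1;
                }
            }
        }
    }

    // Pick the highest-scoring partner; array_search() returns the first on ties.
    $result = [];
    foreach (array_keys($files) as $name) {
        $scores = $common[$name] ?? [];
        if ($scores === []) {
            $result[$name] = null;
            continue;
        }
        $best = max($scores);
        $result[$name] = [array_search($best, $scores, true), $best];
    }
    return $result;
}

print_r(mostCommonPartner($files));
```

The work here grows with how many files share each word rather than with every possible pair of files, which matters little for 4 files but helps if the number of files grows.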