Need help with algorithm and implementation - comparing 2 files [closed]

My project purpose: There are 4 files. Each of them has a different number of lines. Each line consists of a single word or a few words. For each of these files, I want to find which other file shares the most words with it.

  • e.g. (',' stands for a new line)
  • Input:
  • File 1 Lines : A,B,C,D
  • File 2 Lines : C,D,E,F
  • File 3 Lines : A,E,C,G
  • File 4 Lines : C,E,F,A

  • Output :

  • File 1 : Maximum common words is 2 and they are in the files : File 2 (C,D), File 3 (A,C) and File 4 (C,A).
  • File 2 : Maximum common words is 3 and they are in the files : File 4 (C,E,F).
  • File 3 : Maximum common words is 3 and they are in the files : File 4 (C,E,A).
  • File 4 : Maximum common words is 3 and they are in the files : File 2 (C,E,F) and File 3 (C,E,A).

My logic :

  1. Start
  2. Read each line from the file and store it in memory as a 1-D array (e.g. array1[0] = "A", array1[1] = "B", and so on).
  3. Since there are 4 files, I create 4 arrays, array1 to array4. Each of them holds the contents of its corresponding file.
  4. Compare the first word in the first array with the first word in the second array.
  5. Compare the first word in the first array with the second word in the second array, and so on until the end of the second array.
  6. Continue this until the last word in the last array.
  7. Whenever I find a match, I note it by incrementing a counter variable by 1.
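
As a rough illustration of steps 4 to 7, a minimal PHP sketch of the comparison for one pair of arrays might look like the following. The function name countCommonNaive and the sample arrays are hypothetical, and the sketch assumes no word is repeated within a single file.

```php
<?php
// Brute-force comparison following the numbered steps:
// compare every word of the first array with every word of the second.
// countCommonNaive is a hypothetical helper name, not part of PHP.
function countCommonNaive(array $array1, array $array2)
{
    $matches = 0;
    foreach ($array1 as $word1) {
        foreach ($array2 as $word2) {
            if ($word1 === $word2) {
                $matches++;   // step 7: increment the counter on a match
                break;        // stop scanning so this word is counted once
            }
        }
    }
    return $matches;
}

// Sample data from the example above (File 1 and File 2).
$array1 = array('A', 'B', 'C', 'D');
$array2 = array('C', 'D', 'E', 'F');
echo countCommonNaive($array1, $array2), "\n"; // prints 2 (C and D)
```

Each pair of files costs roughly count($array1) * count($array2) comparisons, which is fine for small files but grows quickly with longer ones.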

I would like to know whether this is the right way to approach this problem.

Or is there a better way to think about it?

Edit: 1. Forgot to add, I will be using PHP.

This should be easy to do with array_intersect, which returns the values of one array that also appear in another.
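
A minimal sketch of how array_intersect could be applied to the example from the question. The function name bestMatches and the in-memory $files array are assumptions for illustration; in practice each word list could be loaded with file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES).

```php
<?php
// Hypothetical in-memory stand-ins for the four files in the question.
$files = array(
    'File 1' => array('A', 'B', 'C', 'D'),
    'File 2' => array('C', 'D', 'E', 'F'),
    'File 3' => array('A', 'E', 'C', 'G'),
    'File 4' => array('C', 'E', 'F', 'A'),
);

// For each file, find the largest overlap and every other file that reaches it.
// bestMatches is a hypothetical helper name, not a PHP built-in.
function bestMatches(array $files)
{
    $result = array();
    foreach ($files as $name => $words) {
        $best = 0;
        $matches = array();
        foreach ($files as $otherName => $otherWords) {
            if ($otherName === $name) {
                continue; // never compare a file with itself
            }
            // array_intersect keeps the values of $words that also occur in $otherWords;
            // array_unique guards against a word repeated inside one file.
            $common = array_values(array_unique(array_intersect($words, $otherWords)));
            $count = count($common);
            if ($count > $best) {
                $best = $count;
                $matches = array($otherName => $common); // new maximum, reset the list
            } elseif ($count === $best && $count > 0) {
                $matches[$otherName] = $common;          // tie with the current maximum
            }
        }
        $result[$name] = array('max' => $best, 'files' => $matches);
    }
    return $result;
}

foreach (bestMatches($files) as $name => $info) {
    echo $name . ': max ' . $info['max'] . ' with ' . implode(', ', array_keys($info['files'])) . "\n";
}
```

With the sample data this reports a maximum of 2 for File 1 (tied across Files 2, 3 and 4) and a maximum of 3 for the other three files.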

You should sort the arrays first. Then, to count the number of common lines between array1 and array2, walk both arrays with two counters, i1 and i2, each starting at 0.

Pseudo code:

i1 = 0; i2 = 0; result = 0
while (i1 < array1.length && i2 < array2.length)
  if array1[i1] == array2[i2]
    ++i1; ++i2
    ++result
  else if array1[i1] < array2[i2]
    ++i1
  else
    ++i2
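
The pseudocode above could be written in PHP roughly as follows. This is only a sketch: the function name countCommonSorted is hypothetical, and it assumes no word is repeated within a single array.

```php
<?php
// Count values present in both lists by walking two sorted copies in step.
// countCommonSorted is a hypothetical helper name, not a PHP built-in.
function countCommonSorted(array $a, array $b)
{
    // Arrays are passed by value, so sorting here does not touch the caller's data.
    sort($a, SORT_STRING);
    sort($b, SORT_STRING);

    $i = 0;
    $j = 0;
    $common = 0;
    $lenA = count($a);
    $lenB = count($b);

    while ($i < $lenA && $j < $lenB) {
        $cmp = strcmp($a[$i], $b[$j]);
        if ($cmp === 0) {
            $common++;   // same word in both arrays
            $i++;
            $j++;
        } elseif ($cmp < 0) {
            $i++;        // $a[$i] sorts first, so it cannot appear later in $b
        } else {
            $j++;        // $b[$j] sorts first, so it cannot appear later in $a
        }
    }
    return $common;
}

// File 1 versus File 2 from the question.
echo countCommonSorted(array('A', 'B', 'C', 'D'), array('C', 'D', 'E', 'F')), "\n"; // prints 2
```

After sorting, each pair of arrays is compared in a single pass instead of checking every word against every other word.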

I learned PHP from interesting situations like this. Keep on learning.

// put all files in same directory as this script
// put file names in this array
$files = array('1.txt','2.txt','3.txt','4.txt');
$words = array();
$data = '';

$delimiter = "\n";  // change this to "\r\n" if the files use Windows line endings
// iterate through the files and build one combined word list
foreach($files as $file){
    // file_get_contents avoids fread() failing on an empty file (length 0)
    $data .= $delimiter . file_get_contents($file);
}
// assuming one word (or phrase) per line, as in the example in your question
$lines = explode($delimiter,$data);

foreach($lines as $line){
    $line = trim($line);
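    if ($line === '') continue;  // skip blank lines (empty() would also skip a line containing "0")
    $words[$line] = isset($words[$line]) ? $words[$line] + 1 : 1;  // count without needing @ to suppress notices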
}

var_dump($words);
/* *
 * according to your example:
 *
array(7) {
  ["A"]=>
  int(3)
  ["B"]=>
  int(1)
  ["C"]=>
  int(4)
  ["D"]=>
  int(2)
  ["E"]=>
  int(3)
  ["F"]=>
  int(2)
  ["G"]=>
  int(1)
} 
*/