使用PHP,如果ID列重复,如何删除CSV行?

My goal is to with PHP, remove the entire row of a CSV-file if duplicate values appears in a certain column, in this example ID-column. I naturally want to keep the first row where the duplicated ID appears (see example below).

I don't want to create a new CSV-file, I want to open the file, remove what needs to be removed, and overwrite the current file.

I also want to store how many rows that has been removed in variable.

Input (notice duplicate ID of 3): file.csv

ID,Date,Name,Age
1,12/3/13,John Doe ,23
2,12/3/19,Jane Doe ,21
3,12/4/19,Jane Doe ,19
3,12/3/18,John Doe  ,33
4,12/3/19,Jane Doe ,21

Expected output: file.csv

ID,Date,Name,Age
1,12/3/13,John Doe ,23
2,12/3/19,Jane Doe ,21
3,12/4/19,Jane Doe ,19
4,12/3/19,Jane Doe ,21

And then also be able to: echo $removedRows; that will output: 1 How to accomplish this?


I've managed to get this in a new file, but I just want to overwrite the current file and i dont know why i got the " " around name column:

ID,Date,Name,Age
1,12/3/13,"John Doe ",23
2,12/3/19,"Jane Doe ",21
3,12/4/19,"Jane Doe ",19
4,12/3/19,"Jane Doe ",21

With the following code:

$input_filename = 'file.csv';

// Move the csv-file to 'newfile' directory
copy($input_filename, 'newfile/'.$input_filename);

$output_filename = 'newfile/'.$input_filename;

$input_file = fopen($input_filename, 'r');
$output_file = fopen($output_filename, 'w');

$IDs = array();

// Read the header
$headers = fgetcsv($input_file, 1000);
fputcsv($output_file, $headers);

// Flip it so it becomes name => ID
$headers = array_flip($headers);

// Read every row
while (($row = fgetcsv($input_file, 1000)) !== FALSE)
{
    $ID = $row[$headers['ID']];
    // Do we already have this ID?
    if (isset($IDs[$ID]))
        continue;

    // Mark this ID as being found
    $IDs[$ID] = true;
    // Write it to the output
    fputcsv($output_file, $row);
}

Because you cannot read from a file and write to it at the same time, I advise you to write your data to another file and then move this file to your source file, something like:

$input_filename = 'file.csv';
$output_filename = 'newfile/' . $input_filename;

// Copy the csv-file to 'newfile' directory
copy($input_filename, $output_filename);
$input_file = fopen($input_filename, 'r');
$output_file = fopen($output_filename, 'w');

$IDs = array();

// Read the header
$headers = fgetcsv($input_file, 1000);
fputcsv($output_file, $headers);

// Flip it so it becomes name => ID
$headers = array_flip($headers);

// Deleted rows counter
$rows_deleted = 0;
// Read every row
while (($row = fgetcsv($input_file, 1000)) !== FALSE)
{
    $ID = $row[$headers['ID']];
    // Do we already have this ID?
    if (isset($IDs[$ID])) {
        // row skipped - therefore it is deleted
        $rows_deleted++;
        continue;
    }

    // Mark this ID as being found
    $IDs[$ID] = true;
    // Write it to the output
    fputcsv($output_file, $row);
}

// Now we should move output file to input one
rename($output_filename, $input_filename);

echo "Deleted: " . $rows_deleted;

As for " around your data - this is a result of fputcsv. This is for safety reasons. Imagine if your data will be not

3,12/4/19,Jane Doe ,19

but

3,12/4/19,Jane, Doe ,19

And you want to consider Jane, Doe as one element. That's why with " it's more clear for a parser how to treat your rows:

3,12/4/19,"Jane, Doe ",19    // here `Jane, Doe` is one element

Usually, wrapping data in " does not affect parsing generated csv. But if you're sure that you don't want quotes - you can pass more arguments to fputcsv, though I'm not sure if it will work with empty value as enclosure argument.