My goal is to use PHP to remove an entire row from a CSV file when a duplicate value appears in a certain column, in this example the ID column. I want to keep the first row in which a duplicated ID appears (see the example below).
I don't want to create a new CSV file. I want to open the file, remove what needs to be removed, and overwrite the current file.
I also want to store the number of removed rows in a variable.
Input:
ID,Date,Name,Age
1,12/3/13,John Doe ,23
2,12/3/19,Jane Doe ,21
3,12/4/19,Jane Doe ,19
3,12/3/18,John Doe ,33
4,12/3/19,Jane Doe ,21
Desired output:
ID,Date,Name,Age
1,12/3/13,John Doe ,23
2,12/3/19,Jane Doe ,21
3,12/4/19,Jane Doe ,19
4,12/3/19,Jane Doe ,21
I also want to be able to run: echo $removedRows;
which should output: 1. How can I accomplish this?
I've managed to get this result in a new file, but I want to overwrite the current file instead, and I don't know why I got the quotes (" ") around the Name column:
ID,Date,Name,Age
1,12/3/13,"John Doe ",23
2,12/3/19,"Jane Doe ",21
3,12/4/19,"Jane Doe ",19
4,12/3/19,"Jane Doe ",21
With the following code:
$input_filename = 'file.csv';

// Copy the csv-file to the 'newfile' directory
copy($input_filename, 'newfile/'.$input_filename);
$output_filename = 'newfile/'.$input_filename;

$input_file = fopen($input_filename, 'r');
$output_file = fopen($output_filename, 'w');

$IDs = array();

// Read the header
$headers = fgetcsv($input_file, 1000);
fputcsv($output_file, $headers);

// Flip it so it becomes name => column index
$headers = array_flip($headers);

// Read every row
while (($row = fgetcsv($input_file, 1000)) !== FALSE)
{
    $ID = $row[$headers['ID']];

    // Do we already have this ID?
    if (isset($IDs[$ID]))
        continue;

    // Mark this ID as being found
    $IDs[$ID] = true;

    // Write it to the output
    fputcsv($output_file, $row);
}
Overwriting a file while you are still reading from it is error-prone, so I advise you to write your data to another file and then move that file over your source file, something like:
$input_filename = 'file.csv';
// Temporary output file; the 'newfile' directory must already exist
$output_filename = 'newfile/' . $input_filename;

$input_file = fopen($input_filename, 'r');
// Mode 'w' creates the output file (or truncates it), so no copy() is needed
$output_file = fopen($output_filename, 'w');

$IDs = array();

// Read the header and copy it to the output
$headers = fgetcsv($input_file, 1000);
fputcsv($output_file, $headers);

// Flip it so it becomes name => column index
$headers = array_flip($headers);

// Deleted rows counter
$rows_deleted = 0;

// Read every row
while (($row = fgetcsv($input_file, 1000)) !== FALSE)
{
    $ID = $row[$headers['ID']];

    // Do we already have this ID?
    if (isset($IDs[$ID])) {
        // Row skipped, therefore it is deleted
        $rows_deleted++;
        continue;
    }

    // Mark this ID as being found
    $IDs[$ID] = true;

    // Write it to the output
    fputcsv($output_file, $row);
}

// Close both handles before replacing the file
fclose($input_file);
fclose($output_file);

// Now move the output file over the input one
rename($output_filename, $input_filename);

echo "Deleted: " . $rows_deleted;
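If you would rather not use a temporary file at all, another option is to read every row into memory first and then rewrite the same file. This is only a minimal sketch, assuming the file fits comfortably in memory. The function name dedupeCsvInPlace and its parameters are hypothetical, not part of the code above:

```php
<?php
// Hypothetical helper: keeps the first row for each value of $column,
// rewrites $path in place, and returns the number of rows removed.
function dedupeCsvInPlace(string $path, string $column = 'ID'): int
{
    $handle = fopen($path, 'r+');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Locate the column by its header name
    $headers = fgetcsv($handle, 0, ',', '"', '\\');
    $index = is_array($headers) ? array_search($column, $headers, true) : false;
    if ($index === false) {
        fclose($handle);
        throw new RuntimeException("Column $column not found");
    }

    $seen = [];
    $kept = [$headers];
    $removed = 0;

    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // blank line, not counted as a removed row
        }
        $id = $row[$index] ?? '';
        if (isset($seen[$id])) {
            $removed++; // duplicate value: drop this row
            continue;
        }
        $seen[$id] = true;
        $kept[] = $row;
    }

    // Empty the file and write back only the rows we kept
    rewind($handle);
    ftruncate($handle, 0);
    foreach ($kept as $row) {
        fputcsv($handle, $row, ',', '"', '\\');
    }
    fclose($handle);

    return $removed;
}
```

With this in place, $removedRows = dedupeCsvInPlace('file.csv'); followed by echo $removedRows; would print the number of dropped rows. The trade-off compared with the temporary-file approach is that a crash between truncating and rewriting would lose data, so the rename approach is safer for files that matter.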
As for the " characters around your data: this is a result of fputcsv. It encloses any field that contains the delimiter, the enclosure character, a line break, or whitespace such as a space, and your Name values contain a space. This is for safety reasons. Imagine that your data is not
3,12/4/19,Jane Doe ,19
but
3,12/4/19,Jane, Doe ,19
and you want Jane, Doe to be treated as one element. With " it is clearer to a parser how to treat your rows:
3,12/4/19,"Jane, Doe ",19 // here `Jane, Doe` is one element
Usually, wrapping data in " does not affect parsing of the generated CSV. But if you're sure that you don't want quotes, you can pass more arguments to fputcsv, though I'm not sure whether it will work with an empty value as the enclosure argument.
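On the enclosure question: as far as I know, fputcsv expects the enclosure to be exactly one character, so passing an empty string is rejected rather than turning quoting off. If you really need unquoted output, one workaround is to write the line yourself. The sketch below is only an illustration; putUnquotedCsvLine is a hypothetical helper, and it refuses fields that would be ambiguous without quotes:

```php
<?php
// Hypothetical helper: joins fields with commas and no enclosure.
// Only safe when no field contains a comma, a double quote, or a line break.
function putUnquotedCsvLine($handle, array $fields): void
{
    foreach ($fields as $field) {
        if (strpbrk((string) $field, ",\"\r\n") !== false) {
            throw new InvalidArgumentException("Field needs quoting: $field");
        }
    }
    fwrite($handle, implode(',', $fields) . "\n");
}

// Compare both writers on an in-memory stream
$out = fopen('php://memory', 'w+');
$row = ['3', '12/4/19', 'Jane Doe ', '19'];

fputcsv($out, $row, ',', '"', '\\'); // the field containing a space gets enclosed
putUnquotedCsvLine($out, $row);      // the same row written as-is

rewind($out);
$written = stream_get_contents($out);
fclose($out);

echo $written;
```

The first line of output has quotes around the name and the second does not. Since any standard CSV parser reads both lines the same way, keeping fputcsv and its quotes is usually the simpler choice.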