Good evening, I have a csv file of 400mb and I have to load it in a mysql database. the csv file is "irregular" because it contains information such as:
user|email|password
user|password|otherdata
I have made a script in php-cli to read the file line by line and take the information I need: username and password. l 'username is always the first record. to know what 'is the password I look at the length of the string.
I run the script five hours ago and still has not finished loading all the data in the database.
how can I do to improve the performance of this script?
<?php
$fileHandle = fopen("C:/Users/AT/Documents/Backup/forumusers.csv", "r");
$mysqlHandle = mysql_connect("localhost", "root", "");
mysql_select_db("testbackupboard");
while(!feof($fileHandle))
{
$fileRow = fgets($fileHandle);
$line2Record = explode("|", $fileRow);
foreach ($line2Record as $rowRecord)
{
if (strlen($rowRecord) == 40)
{
$datatoMysql[0] = $rowRecord; // password hash
}
}
$datatoMysql[1] = $line2Record[0]; // username
$execQuery = mysql_query("INSERT INTO forumusers (username, hash) VALUES ('".mysql_real_escape_string(utf8_encode($datatoMysql[1]))."', '".mysql_real_escape_string(utf8_encode($datatoMysql[0]))."')");
if($execQuery)
{
print"Record ".$rowRecord[1]." ".$rowRecord[0]." loaded into db
";
}
else
{
die(mysql_error());
}
}
fclose($fileHandle);
?>
How about fgetcsv()?
$row = 1;
if (($handle = fopen("test.csv", "r")) !== FALSE) {
while (($data = fgetcsv($handle, 1000, "|")) !== FALSE) {
$num = count($data);
echo "<p> $num fields in line $row: <br /></p>
";
$row++;
for ($c=0; $c < $num; $c++) {
echo $data[$c] . "<br />
";
}
}
fclose($handle);
}
CREATE TABLE tmp
with 4 columns: id
AUTO_INCREMENT, user
, col2
, col3
. This is not the final table.LOAD DATA
for all the rowsCREATE TABLE final (user, email, password, otherdata, PRIMARY KEY(user))
INSERT INTO final (user, email, password, otherdata) SELECT a.user, a.col2 AS email, a.col3 AS password, b.col3 AS otherdata FROM tmp AS a JOIN tmp AS b ON b.id = a.id+1 WHERE (a.id % 2) = 1;
(Caveat: This will not work if
auto_increment_increment
is not 1, such as on a Galera-based cluster. Instead, you would need an@variable
to do the sequencing.)