I have a big data set into MySQL (users, companies, contacts)? about 1 million records.
And now I need to make import new users, companies, contacts from import file (csv) with about 100000 records. I records from file has all info for all three essences (user, company, contacts). Moreover on production i can't use LOAD DATA (just do not have so many rights :( ).
So there are three steps which should be applied to that data set. - compare with existing DB data - update it (if we will find something on previous step) - and insert new, records
I'm using php on server for doing that. I can see two approaches:
which approach is more efficient ? by CPU, memory or time usage
Can I use transactions ? or it will slow down whole production system ?
Thanks.
CPU time/time there won't be much in it, although reading the whole file will be slightly faster. However, for such a large data set, the additional memory required to read all records into memory will vastly outstrip the time advantage - I would definitely process one line at a time.
Did you know that phpMyAdmin has that nifty feature of "resumable import" for big SQL files ?
Just check "Allow interrupt of import" in the Partial Import section. And voila, PhpMyAdmin will stop and loop until all requests are executed.
It may be more efficient to just "use the tool" rather than "reinvent the wheel"
I think, 2nd approach is more acceptable: