Optimal/most efficient way to update a database

I have a data set with more than 10,000 records (this will grow in the future), structured as below:

[
  ['name' => 'name1', 'url' => 'url1', 'visit' => 120],
  ['name' => 'name2', 'url' => 'url2', 'visit' => 250],
  ..........
]

It is possible to have duplicate values for the key combination (name, url). In such cases I need to sum the visit counts of all records that share the same name and url.

Finally I want to insert these values into a database. I see two methods for doing this:

  1. Create another array with one entry per unique (name, url) combination and the summed visit count (see the sketch after this list).
  2. Update/insert the database for each record inside a loop.
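
For reference, here is a minimal sketch of the first method (in-memory aggregation), assuming $records is a plain PHP array shaped like the example above:

<?php
// Sketch of method 1: collapse duplicates in memory, keyed by (name, url).
// Assumes $records is the array of rows shown above.
$aggregated = [];
foreach ($records as $record) {
    $key = $record['name'] . '|' . $record['url'];
    if (isset($aggregated[$key])) {
        // Duplicate (name, url): add the visit count to the existing entry.
        $aggregated[$key]['visit'] += $record['visit'];
    } else {
        $aggregated[$key] = $record;
    }
}
// $aggregated now holds one row per unique (name, url) pair,
// ready for a batch insert.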

What is the optimal solution, or is there a better way to do this?

I know the first method will have memory issues for a large data set. The second method involves many database hits, and I would like to know the disadvantage(s) of following it.

Any help or insight would be appreciated.

I do some big database updates like this myself and have spent ages trying different solutions.

Instead of:

  1. Check if the record exists, e.g. SELECT COUNT(id) FROM data WHERE name='name' AND url='url'
  2. If not found, insert the record.
  3. If found, add the visit count to the existing sum (this whole pattern is sketched below).
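
In PHP that check-then-write pattern looks roughly like this (a sketch, assuming a table named data with columns id, name, url, visit, and an open PDO connection in $pdo):

<?php
// Sketch of the check-then-write pattern: up to two queries per record.
// Assumes $pdo is an existing PDO connection and $records is the input array.
$check  = $pdo->prepare('SELECT COUNT(id) FROM data WHERE name = ? AND url = ?');
$insert = $pdo->prepare('INSERT INTO data (name, url, visit) VALUES (?, ?, ?)');
$update = $pdo->prepare('UPDATE data SET visit = visit + ? WHERE name = ? AND url = ?');

foreach ($records as $r) {
    $check->execute([$r['name'], $r['url']]);
    if ((int) $check->fetchColumn() === 0) {
        $insert->execute([$r['name'], $r['url'], $r['visit']]); // not found: insert
    } else {
        $update->execute([$r['visit'], $r['name'], $r['url']]); // found: sum
    }
}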

I would try this instead:

  1. Set a unique key on the name and url columns of your data table.
  2. Try a normal insert and check whether it succeeds.
  3. On an unsuccessful result (a row already exists for that name and url, since the combination must be unique), add the visit count to the existing row instead (see the sketch after this list).
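
On MySQL, steps 2 and 3 can even collapse into a single statement per record with INSERT ... ON DUPLICATE KEY UPDATE (my naming of the concrete syntax; the steps above describe the same idea in general terms). A sketch, again assuming a data table with a unique key on (name, url) and an open PDO connection in $pdo:

<?php
// Step 1 is done once in SQL, e.g.:
//   ALTER TABLE data ADD UNIQUE KEY uniq_name_url (name, url);
// Steps 2-3 in one statement: insert, and on a duplicate-key hit sum the visits.
// Note: VALUES() is deprecated in MySQL 8.0.20+, but still works there.
$stmt = $pdo->prepare(
    'INSERT INTO data (name, url, visit) VALUES (?, ?, ?)
     ON DUPLICATE KEY UPDATE visit = visit + VALUES(visit)'
);

foreach ($records as $r) {
    // One round trip per record; no prior SELECT needed.
    $stmt->execute([$r['name'], $r['url'], $r['visit']]);
}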