I am trying to do the following.
I am consulting an external database using a web service. What the web service does is bring me all the products from an ERP system my client uses. As the server and the connection are not really fast, what I decided to do is basically synchronize the database on my web server and handle most operations there, so that the website can run smoothly.
Everything works fine I just need one last step to guarantee that the inventory on the website matches the one available on the ERP. The only issue comes when they (the client) deletes something on the ERP system.
At the moment I am thinking what would be the ideal strategy (least resource and time consuming) to remove products from my Products table if I don't receive them in the web service result.
So I basically have the following process:
foreach
cycle and have the following subprocess.product_id
is present.So, I was thinking of doing the following, but I do not think it's the ideal way:
SELECT * FROM Products
and generate an array that has all the products.foreach
cycle in the resulting array and in each cycle scan the ERP array to check if the specific product exists. If not I delete it, if yes, I continue with the next product.Now considering that after all the previous steps this would involve a couple of nested foreach
I am a little worried that it might consume too much memory and also take longer to process.
I was thinking that maybe something like array_diff
or array map
could solve the issue, but I am not really experienced with these functions, and the structure of the two arrays differs a lot, so I am not sure if it would work that easily.
What would you guys recommend?
It's actually quite simple:
SELECT id FROM Products
Then you have an array of your product Ids, for example:
[123,5679,345]
Then as you go and do your updates or inserts, remove the id from the array.
[for updates]I query my database to check if product_id is present.
This is redundant now.
There are a few ways to remove the value from the array (when you do an update), this is the way I would probably do it.
if(false !== ($index = array_search($data['product_id'],$myids))){
//note the !== type comparison because array_search can return 0 for the first index, we must check for boolean false.
//find the index of the product id in our list of id's from local DB
unset($myids[$index]);
//If our incoming product_id is in the local list we Do Update
}else{
//Otherwise we Do Insert
}
As I mentioned above when doing your updates/inserts, You no longer have to check if the ID exists, because you already know this by having an array of IDs from the database. This alone saves you (n) queries (apx 600).
Then its very simple if you have ids left over.
//I wouldn't normally concatenate variables into SQL, in this case it's a list of int IDs from the database.
//you can of course come up with a loop to make it a prepared statement if you wish, but for the sake of simplistically, I'll leave that as an exercise for another day..
'DELETE FROM Products WHERE id IN('.implode(',', $myids).')'
And because you unset these when Updating, then the only thing left is Products that no longer exist.
Conclusion:
You have no choice (other then doing on duplicate key query, or ignoring exceptions) then to pull out the product Ids. You're already doing this on a row by row basis. So we can effectively kill 2 birds with one stone.
If you need more data then just the ID, for example you check that the product was changed before doing an update. Then pull that data out, but I would recommend using PDO and the FETCH_GROUP option. I wont go into the specifics of that but to say it lets you easily build your array this way:
[{product_id} => [ {product_name}, {product_price} etc..]];
Basically the product_id, is the key with a nested array of the row data, this will make lookup easier.
This way you can look it up like this.
//then instead of array_search
//if(false !== ($index = array_search($data['product_id'],$myids))){
if(isset($myids[$data['product_id']])){
unset($myids[$data['product_id']]);
//do your checks, then your update
}else{
//do inserts
}
References:
http://php.net/manual/en/function.array-search.php
array_search — Searches the array for a given value and returns the first corresponding key if successful
WARNING This function may return Boolean FALSE, but may also return a non-Boolean value which evaluates to FALSE. Please read the section on Booleans for more information. Use the === operator for testing the return value of this function.
UPDATE
There is one other really good way to do this, and that is to add a field called sync_date
, now when you do your insert or update then set the sync_date
to the current data.
This way when you are done, those products with an older sync date then today can be deleted. In this case it's best to cache the time when doing it so you know the exact time.
$time = data('Y-m-d H:i:s'); //or time() if you prefer timestamp
//use this same variable for the whole coarse of the script.
Then you can do
'DELETE from products WHERE sync_time != $time'
This may actually be a bit better because it has more utility. When was the last time it was ran, Now you know.