Writing an XML feed to a database: how do I safely delete old records and update with new ones?

I am writing information from an XML feed to a database for use on our site. We have found the XML feeds can be inconsistent, so writing the info to the database has been a good solution for us.

Ideally I want to cron a file once a day that parses the XML and then writes it to the database. What methodology should I use to eliminate the previous day's data, since I no longer need it once the cron job runs and the new daily records are imported?

Bad:

cron file -> delete old records -> write new records

What if the XML is not quite right or there is a problem with the script? Then we have blown away the old data and can't get any new data at the moment.

If the XML info is bad, at least I can write some PHP on the front end to still display the older data, perhaps with the dates adjusted.

What type of checks and fail-safes would be best for my application? I need to update the records each day, but I should only delete the old records if I know for sure we have good new data to import.

I would suggest a backup in the form of a MySQL dump. Essentially, the dump is a snapshot of the database at a given time, so if you start the process and something goes wrong, you can revert to the point it was at before you started. The workflow would be something along the lines of:

Create dump -> try {Delete old records -> Create new records } catch (Load dump back into database)
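Here is a minimal PHP sketch of that workflow. The database name, credentials, the inventory table, and the parse_feed() helper are all assumptions for illustration; substitute your own.

    <?php
    // Sketch of the dump -> try { delete + insert } catch { restore } workflow.
    // Credentials, the `inventory` table, and parse_feed() are placeholders.
    $dumpFile = '/tmp/pre_import_backup.sql';

    // 1. Snapshot the database before touching anything.
    exec('mysqldump -u dbuser -pdbpass mydb > ' . escapeshellarg($dumpFile), $out, $status);
    if ($status !== 0) {
        die("Backup failed, aborting import.\n");
    }

    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'dbuser', 'dbpass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    try {
        // 2. Delete old records and write the new ones.
        $pdo->exec('DELETE FROM inventory');
        $insert = $pdo->prepare(
            'INSERT INTO inventory (distributor, sku, qty) VALUES (?, ?, ?)'
        );
        foreach (parse_feed('feed.xml') as $row) {   // hypothetical XML parser
            $insert->execute([$row['distributor'], $row['sku'], $row['qty']]);
        }
    } catch (Exception $e) {
        // 3. Something went wrong -- reload the snapshot taken in step 1.
        exec('mysql -u dbuser -pdbpass mydb < ' . escapeshellarg($dumpFile));
        die('Import failed, database restored from dump: ' . $e->getMessage() . "\n");
    }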

If you are using MySQL, more information on dumps can be found at: http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html

Most other databases have some form of dump as well.

Create a GUID for your table by hashing a couple of the fields together - whichever ones are persistent between updates. For example, if you are updating inventory you might use the distributor and SKU as the input for your GUID.
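As a rough sketch (the table and column names here are just assumptions for illustration), the GUID can simply be an MD5 of the persistent fields and act as the table's primary key:

    <?php
    // Sketch: derive a stable GUID from fields that persist between updates.
    // Table and column names (inventory, distributor, sku, qty) are assumptions.
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'dbuser', 'dbpass');

    $pdo->exec('
        CREATE TABLE IF NOT EXISTS inventory (
            guid        CHAR(32)     NOT NULL PRIMARY KEY,
            distributor VARCHAR(100) NOT NULL,
            sku         VARCHAR(100) NOT NULL,
            qty         INT          NOT NULL
        )
    ');

    // The same distributor + SKU always hashes to the same GUID,
    // so re-importing an item always targets the same row.
    $distributor = 'ACME';        // example values
    $sku         = 'WIDGET-42';
    $guid = md5($distributor . '|' . $sku);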

Then when you update, just use a MySQL REPLACE query to exchange the old data for the new data (see REPLACE in the MySQL manual).
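A sketch of the import loop with REPLACE, assuming the inventory table and $pdo connection from the sketch above, and that $rows holds the parsed feed data:

    <?php
    // REPLACE deletes any existing row with the same primary key (the GUID)
    // and inserts the new one, so each daily run swaps old data for new.
    $stmt = $pdo->prepare(
        'REPLACE INTO inventory (guid, distributor, sku, qty) VALUES (?, ?, ?, ?)'
    );
    foreach ($rows as $row) {   // $rows = records parsed from the XML feed
        $guid = md5($row['distributor'] . '|' . $row['sku']);
        $stmt->execute([$guid, $row['distributor'], $row['sku'], $row['qty']]);
    }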

Or use an INSERT ... ON DUPLICATE KEY UPDATE query.
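The equivalent sketch with INSERT ... ON DUPLICATE KEY UPDATE, which updates the existing row in place instead of deleting and re-inserting it (same assumed table and connection as above):

    <?php
    // If a row with this GUID already exists, only the listed columns are
    // updated; otherwise a new row is inserted.
    $stmt = $pdo->prepare(
        'INSERT INTO inventory (guid, distributor, sku, qty)
         VALUES (?, ?, ?, ?)
         ON DUPLICATE KEY UPDATE qty = VALUES(qty)'
    );
    $stmt->execute([$guid, $distributor, $sku, 17]);   // 17 = example quantity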

The nice thing about this is that if your script fails for some reason, you can safely run it again without pushing extra rows into your table.

If you are worried about bad XML data being pushed into your database, just validate each record before inserting it into your table, and skip anything that shouldn't go in.
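For example, a minimal validation pass might look like this (the field names and rules are assumptions; adapt them to whatever your feed actually contains):

    <?php
    // Reject records that are missing required fields or have a bad quantity.
    function isValidRecord(array $row): bool
    {
        return isset($row['distributor'], $row['sku'], $row['qty'])
            && $row['distributor'] !== ''
            && $row['sku'] !== ''
            && is_numeric($row['qty']);
    }

    foreach ($rows as $row) {
        if (!isValidRecord($row)) {
            continue;   // anything that shouldn't go in just gets skipped
        }
        // ... REPLACE / INSERT the row as shown above ...
    }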

You might also want to take an SQL backup at the beginning of the script; if your table somehow gets really messed up, you can always go back and restore from a safe backup.