There is an XML file of data that is released every 10 minutes on a server, and I will be using a cron job to fetch and parse it for my site. I'll take the info, save it to a MySQL database and then display it on my site. I have a question about best practice when doing this.
The file is about 200 - 300 KB so it's not very large but I had two ideas on how to do this:
1) Just use simplexml_load_file() to load the file and parse the info.
2) Use cURL to grab the file and save it to my server and then do the parse from my server locally.
I'm curious what best practice is and what would be the most efficient. With simplexml_load_file(), is the file loaded locally and then parsed, or is it loaded several times over as you go through the data? If it's just loaded once, I suppose that would be the best bet. One of my concerns is that I don't want to bog down the server that I'm grabbing the XML file from every time my cron job runs. I imagine it wouldn't since it's such a small file, but I'm trying to just grab the file at the intervals and then do what needs to be done with the data in the best possible way.
I'm trying to wrap my head around how these functions work. Let me know if you need any more clarification on the question. I appreciate the help!
Both will work fine. Normally, on a file that small I'd probably do what you're doing now. That being said, if it is time-sensitive and on a cron job, I'd do something a little different.
Pull the file over to your server and save a hash value of it. If the new file has a different hash than the previous one, parse it; otherwise rerun the script in 30 seconds. If the cron job runs every 8-9 minutes, you're good to within +/- 2 minutes.
That way you don't run the risk of having the cron job run 30 seconds early and fall 9:30 minutes behind.
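The hash check described above could look roughly like this. It is only a sketch: the helper name `feedHasChanged()`, the SHA-256 choice, the hash file path and the feed URL in the usage comment are all assumptions of mine, not anything from your setup.

```php
<?php
// Hypothetical helper: returns true when $xml differs from the copy seen
// on the previous run, and stores the new hash in $hashFile for next time.
function feedHasChanged(string $xml, string $hashFile): bool
{
    $newHash = hash('sha256', $xml);

    // On the very first run there is no stored hash yet.
    $oldHash = is_file($hashFile)
        ? trim((string) file_get_contents($hashFile))
        : '';

    if ($oldHash === $newHash) {
        return false; // same content as last time, nothing to parse
    }

    file_put_contents($hashFile, $newHash);
    return true;
}

// Usage sketch (URL and path are placeholders):
// $xml = file_get_contents('https://example.com/feed.xml');
// if ($xml !== false && feedHasChanged($xml, '/tmp/feed.sha256')) {
//     $doc = simplexml_load_string($xml);
//     // ... update MySQL here ...
// } else {
//     // unchanged: wait 30 seconds and try again
// }
```

Storing only the hash keeps the check cheap; you could just as well keep the whole previous file and compare that, since it's only a few hundred KB.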
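To answer your question, "With simplexml_load_file(), is the file loaded locally and then parsed or is loaded several times over as you go through the data?": it is loaded once. The whole document is pulled to your server in a single request and parsed into memory, so looping over the data afterwards doesn't touch the remote server again.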
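To make the "fetched once, then parsed" point concrete, here is a rough sketch that separates the two steps: one cURL request to get the XML as a string, then simplexml_load_string() on that string. The function name `parseFeed()`, the `<item>`/`<title>` element names and the URL are made up for illustration; your feed's structure will differ.

```php
<?php
// Parses XML that is already in memory; no network access happens here.
// <item> and <title> are hypothetical element names.
function parseFeed(string $xml): array
{
    // Collect parse errors quietly instead of emitting PHP warnings.
    libxml_use_internal_errors(true);

    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        libxml_clear_errors();
        return []; // not well-formed XML
    }

    $titles = [];
    foreach ($doc->item as $item) {
        $titles[] = (string) $item->title;
    }
    return $titles;
}

// One HTTP request per cron run (URL is a placeholder):
// $ch = curl_init('https://example.com/feed.xml');
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $xml = curl_exec($ch);
// curl_close($ch);
// if ($xml !== false) {
//     $titles = parseFeed($xml);
// }
```

Either way the remote server sees a single GET per run, so for a 200-300 KB file the choice between simplexml_load_file() and cURL is mostly about how much control you want over timeouts and error handling.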
Hope that helps. :)
Edit: For a more in-depth explanation of what is going on, you can search for "http stateless get request". It's a ton to get your head around, and the more I learn the more questions I have ;) but it'll explain what's going on when your script makes a request to GET the XML (or other MIME type) file from another server.