How to speed up a process or break it into multiple parts: RSS, cURL, PHP

I'm experimenting with an RSS reader/fetcher I'm writing at the moment. Everything is going smoothly except one thing: it's terribly slow.

Let me explain:

  1. I fetch the list of RSS feeds from the database.
  2. I iterate over every feed in this list, open it with cURL and parse it with SimpleXMLElement.
  3. I check the descriptions and titles of these feeds against a given keyword, to see whether the item is already in the database or not.
  4. If it's not, I add it to the database.

For now I am looping through 11 feeds, which gives me a page load time of 18 seconds. This is without updating the database. When some new articles are found, it goes up to 22 seconds (on localhost).

On a live web server, my guess is that this will be even slower, and it may exceed the execution time limit PHP is configured with.

So my question is: what are your suggestions to improve the speed? And if that is not possible, what's the best way to break this down into multiple executions, say 2 feeds at a time? I'd like to keep it all automated; I don't want to click after every 2 feeds.

Hope you guys have some good suggestions for me!

If you want a code example, let me know and I'll paste some.

Thanks!

I would suggest using a cron job or a daemon that automatically synchronizes the feeds with your database by running a PHP script. That removes the delay from the user's perspective. Run it every hour, or at whatever interval suits you.

First, though, you should try to figure out which parts of the process are actually slow. Without the code it's hard to tell what could be wrong.

Possible issues could be:

  • The remote servers (which host the feeds) are slow
  • Your local server's internet connection
  • Your server's hardware
  • And obviously the code
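To narrow down which of these is the culprit, it may help to time the fetch and the parse separately for each feed. The following is only a minimal sketch: `timeStep` and `profileFeed` are hypothetical helper names, and `$fetch` and `$parse` stand in for whatever cURL and SimpleXMLElement code you already have.

```php
<?php
// Hypothetical helper: runs $step and returns [result, elapsed seconds].
function timeStep(callable $step): array
{
    $start = microtime(true);
    $result = $step();
    return [$result, microtime(true) - $start];
}

// Hypothetical helper: times the fetch and the parse of one feed separately.
// $fetch receives the URL and returns the raw feed body;
// $parse receives that body and returns the parsed items.
function profileFeed(string $url, callable $fetch, callable $parse): array
{
    list($body, $fetchSeconds) = timeStep(function () use ($fetch, $url) {
        return $fetch($url);
    });
    list($items, $parseSeconds) = timeStep(function () use ($parse, $body) {
        return $parse($body);
    });

    return [
        'url'   => $url,
        'fetch' => $fetchSeconds, // network time: remote server and connection
        'parse' => $parseSeconds, // local time: your code and hardware
    ];
}
```

If the `fetch` numbers dominate, the time is going to the remote servers or the connection rather than to your own code; if `parse` dominates, the code or the hardware is the place to look.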

Here are some suggestions.

  • First, separate the data fetching and crunching from displaying web pages to the user. You can do this by moving the fetching and crunching into a script that is executed by a cron job or that runs as a daemon (i.e. runs continuously).
  • Second, you can set a sensible minimum interval between fetches of the same feed, so that your script does not have to loop through every feed on each run.
  • Third, you should probably look into using a feed parsing library, like MagpieRSS, rather than SimpleXML.
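The first two suggestions can be combined roughly as follows: a script run from cron picks only the feeds that have not been fetched recently, and only a small batch of them per run. This is a sketch under assumptions, not a drop-in solution: the function name `selectDueFeeds` and the row shape (an `id`, a `url`, and a `last_fetched` Unix timestamp that is `null` for never-fetched feeds) are hypothetical and would need to match your own table.

```php
<?php
// Assumed row shape (hypothetical):
//   ['id' => int, 'url' => string, 'last_fetched' => int|null]
// Returns at most $batchSize feeds whose last fetch is at least
// $minInterval seconds before $now, oldest first.
function selectDueFeeds(array $feeds, int $now, int $minInterval, int $batchSize): array
{
    // Keep feeds that were never fetched or whose last fetch is old enough.
    $due = array_filter($feeds, function (array $feed) use ($now, $minInterval) {
        return $feed['last_fetched'] === null
            || $now - $feed['last_fetched'] >= $minInterval;
    });

    // Oldest first; never-fetched feeds (null) sort to the front.
    usort($due, function (array $a, array $b) {
        return ($a['last_fetched'] ?? 0) <=> ($b['last_fetched'] ?? 0);
    });

    // Limit the work done in a single run.
    return array_slice($due, 0, $batchSize);
}
```

With a batch size of 2 and a cron job that runs every few minutes, each run stays short and all feeds are still visited in turn without any clicking. After processing a feed, the script would update its `last_fetched` value so the next run moves on to the following ones.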