I've got a rather large PHP web app which gets its products from numerous other suppliers through their APIs, each usually responding with a large XML document to parse. Currently there are 20 suppliers, but that number is due to rise even further.
Our current setup uses curl_multi to make the requests, and this takes about 30-40 seconds to complete, which is too long. The script runs in the background while the front end polls the database for results and displays them as they come in.
To improve this process we were thinking of using a job server to run the requests in the background, with each supplier request being a separate job. We've seen beanstalkd and Gearman mentioned.
So are we looking in the right direction; is a job server the right way to go? We're doing some promotion soon, so we may get 200+ users searching 30 suppliers at the same time, and the right choice needs to scale well if we have to load balance.
Any advice is gratefully received.
You can use Beanstalkd, as you can customize the priority of jobs and the TTR (time to run; the default is 60 seconds, but for your scenario you should increase it). There is also a nice admin console panel for Beanstalkd.
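As a rough illustration, here is a minimal sketch using the Pheanstalk client (`composer require pda/pheanstalk`, v4-style API); the tube name, payload shape, and TTR value are assumptions for illustration, not details of your setup:

```php
<?php
// Minimal Beanstalkd sketch via Pheanstalk. The tube name 'supplier-search',
// the payload shape, and the 120-second TTR are illustrative assumptions.
require 'vendor/autoload.php';

use Pheanstalk\Pheanstalk;

$pheanstalk = Pheanstalk::create('127.0.0.1');

// Producer: one job per supplier, so a slow supplier never blocks the others.
$supplierIds = ['supplier_a', 'supplier_b']; // placeholder supplier list
$searchTerm  = 'example query';

$pheanstalk->useTube('supplier-search');
foreach ($supplierIds as $supplierId) {
    $pheanstalk->put(
        json_encode(['supplier' => $supplierId, 'query' => $searchTerm]),
        1024, // priority: lower numbers are served first
        0,    // delay before the job becomes ready
        120   // TTR raised above the 60-second default for slow supplier APIs
    );
}

// Worker (a separate long-running process): reserve a job, call the supplier
// API, write the results to the database, then delete the job.
$pheanstalk->watch('supplier-search');
$pheanstalk->ignore('default');

$job     = $pheanstalk->reserve();
$payload = json_decode($job->getData(), true);
// ... fetch and parse the supplier XML, store rows for the front end to poll ...
$pheanstalk->delete($job);
```

If a worker crashes or exceeds the TTR, Beanstalkd releases the job back to the tube automatically, which is the main reason to size the TTR above your slowest expected supplier response.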
You should also keep the parallel curl_multi requests rather than replace them. To benefit from HTTP keep-alive, maintain a pool of cURL handles and keep them warm (see high-performance cURL tips), and consider tuning the Linux network stack as well.
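Here is one way that pool might look; the supplier URLs and options are placeholders. The key point is that the easy handles are created once, added to a multi handle for each parallel round, and removed but never closed, so libcurl can reuse the open connections on the next search:

```php
<?php
// Sketch of a warm cURL handle pool, one reusable handle per supplier.
// URLs are placeholders; reusing the easy handles between rounds lets
// libcurl keep the TCP/TLS connections open (HTTP keep-alive).

$urls = [
    'supplier_a' => 'https://api.supplier-a.example/search',
    'supplier_b' => 'https://api.supplier-b.example/search',
];

// Build the pool once (e.g. at worker start-up), not on every request.
$pool = [];
foreach ($urls as $name => $url) {
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_TCP_KEEPALIVE  => 1, // send TCP keepalive probes on idle connections
    ]);
    $pool[$name] = $ch;
}

// Each search round: run all warm handles in parallel on a multi handle.
$mh = curl_multi_init();
foreach ($pool as $ch) {
    curl_multi_add_handle($mh, $ch);
}

do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh); // block until activity instead of busy-waiting
    }
} while ($running && $status === CURLM_OK);

$responses = [];
foreach ($pool as $name => $ch) {
    $responses[$name] = curl_multi_getcontent($ch);
    // Remove from the multi handle but do NOT curl_close() the easy handle:
    // closing it would drop the kept-alive connection for the next round.
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);
```

This pairs naturally with the job-server approach: each worker process owns its own handle pool, so connections stay warm across the jobs it consumes.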
If you run this in the cloud, use multiple micro machines rather than one heavy machine, as total throughput is often better when the load is spread across several smaller instances.