Need guidance on scaling out in PHP

I have a class which makes use of regular expressions for Natural Language Processing, and the time spent processing the large amount of data it is fed does not look promising.

I'm looking into scaling it out, i.e. having a means of doing things in parallel, which I have no experience with yet.

I was hoping someone could explain what I am getting myself into, and the pros and cons of doing this in PHP. It would also help if you could point me to good resources on scaling in general, or better yet, on scaling in PHP. Thanks.

EDIT:

foreach ($sentences as $sentence) { 
  // for each sentence check if a keyword or any of its synonyms
  // appear together with any sentiment applicable to the keyword
  foreach ($this->keywords as $keyword => $synonyms) {              
    foreach ($this->sentiments[$keyword] as $sentiment => $weight) {
      $match = $this->check($sentence, $synonyms, $sentiment);
    }
  }
}

// regex part of the code
$keywords = implode('|', $keywords);
$pattern = "/(\b$sentiment\b(.*|\s)\b($keywords)\b|\b($keywords)\b(.*|\s)\b$sentiment\b)/i";

preg_match_all($pattern, $sentence, $matches);
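Before scaling out, it may be worth making sure the pattern isn't rebuilt on every `check()` call. A minimal sketch of compiling one pattern per keyword/sentiment pair up front (the `buildPatterns` function and the toy data below are hypothetical, but the arrays mirror the shape of `$this->keywords` and `$this->sentiments`):

```php
<?php
// Hypothetical sketch: compile one pattern per keyword/sentiment pair once,
// instead of rebuilding the alternation inside the per-sentence loop.
function buildPatterns(array $keywords, array $sentiments): array
{
    $patterns = [];
    foreach ($keywords as $keyword => $synonyms) {
        // preg_quote guards against regex metacharacters in the data
        $alts = implode('|', array_map(
            fn ($s) => preg_quote($s, '/'), $synonyms
        ));
        foreach ($sentiments[$keyword] as $sentiment => $weight) {
            $s = preg_quote($sentiment, '/');
            $patterns[$keyword][$sentiment] =
                "/\b$s\b.*\b($alts)\b|\b($alts)\b.*\b$s\b/i";
        }
    }
    return $patterns;
}

// Toy data in the same shape as $this->keywords / $this->sentiments
$patterns = buildPatterns(
    ['phone' => ['phone', 'handset']],
    ['phone' => ['love' => 1.0]]
);
echo preg_match($patterns['phone']['love'], 'I love this phone');
```

With the patterns precompiled, the inner loops only call `preg_match` instead of doing string assembly for every sentence.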

PHP may not be a great choice for this type of application. It's a rather high-level language, and with that comes overhead that can slow down any significant amount of processing.

Now, if you want to stick with PHP, you can do it with some sort of job-management system. There are existing tools you could use, such as Gearman or even Hadoop: you break your data down into chunks and feed them to the workers. With those tools you can scale your processing across one or more servers.
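As a rough sketch of the Gearman route (assuming the pecl gearman extension and a gearmand server on the default port; the job name `analyze_chunk` and the chunk size are made up for illustration):

```php
<?php
// Sketch: split the corpus into chunks and submit each as a background job.
// Assumes ext-gearman and a local gearmand; 'analyze_chunk' is a job name
// that a separate worker process would register via addFunction().
$allSentences = range(1, 1200);              // placeholder for the corpus
$chunks = array_chunk($allSentences, 500);   // up to 500 sentences per job

if (extension_loaded('gearman')) {
    $client = new GearmanClient();
    $client->addServer();                    // 127.0.0.1:4730 by default
    foreach ($chunks as $chunk) {
        $client->addTaskBackground('analyze_chunk', json_encode($chunk));
    }
    $client->runTasks();                     // fire off all queued jobs
}
echo count($chunks);
```

Each worker process would run the sentence/keyword/sentiment loops from the question over its own chunk, so adding servers just means starting more workers.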

If you use Amazon Web Services, you may want to look at Elastic MapReduce and see if it fits your needs.

Apache Hadoop MapReduce jobs are a neat fit for this kind of work. It's a little more effort up front, but I think you'll find it to be a good solution. With Hadoop, you can just as easily run your computation on 1 node or on 30.
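You don't necessarily have to rewrite the matching logic in Java, either: Hadoop Streaming lets the mapper be any script that reads lines on STDIN and writes key/value pairs on STDOUT. A minimal mapper sketch in PHP (the `emitMatches` helper and the toy pattern are hypothetical), emitting one tab-separated `key 1` pair per match so a reducer can sum the counts:

```php
<?php
// Hypothetical Hadoop Streaming mapper helper: given one input line and a
// map of compiled patterns, return the "key\t1" pairs that should be emitted.
function emitMatches(string $line, array $patterns): array
{
    $out = [];
    foreach ($patterns as $key => $pattern) {
        if (preg_match($pattern, $line)) {
            $out[] = "$key\t1";   // standard streaming key/value format
        }
    }
    return $out;
}

$patterns = ['phone:love' => '/\blove\b.*\bphone\b/i']; // toy pattern

// In the real streaming job, sentences arrive one per line on STDIN:
// while (($line = fgets(STDIN)) !== false) {
//     foreach (emitMatches($line, $patterns) as $kv) { echo $kv, "\n"; }
// }
echo implode("\n", emitMatches('I love this phone', $patterns));
```

The reducer side is then a trivial sum-by-key, and Hadoop handles splitting the input and distributing it across however many nodes you run.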