正确使用Factory方法/模式和动态加载

I have created a library to parse different type of data. This is how the library is used.

$parser = DataParser::factory($text);
$data = $parser->parse();
foreach($data as $name=>$datum)
    echo "name: $name
Data: ".$datum."
";

Here $data is an instance of NamedDataCollection. And here are the class definitions

interface Parsable{
    // @returns NamedDataCollection
    function parse();
    // @returns bool. True if it can parse the data.
    function can_parse();
}

class AParser implements Parsable{
    public function parse(){
       $this->check_header();
       $this->check_content();
       $this->parse_content();
       ...
    }
    public function can_parse(){
       $this->check_header();
       $this->check_content();
       ...
    }
}

class BParser implements Parsable{
    public function parse(){
       ...
    }
    public function can_parse(){
       ...
    }
}

class ParserRegistry{
    // @param $dir path where all the parsers are stored
    public function __construct($dir){
    }
    // @returns all the available parsers by scanning files. 
    public function get(){
    }
}

class DataParser{
    public function factory($data){
       $instance = null;
       $pr= new ParserRegistry("/some/path");
       foreach($pr->get() as $pname){
           $rc = new ReflectionClass($pname);
           $instance = $rc->newInstance($data);
           if($instance->can_parse())
               return $instance;
       }
       throw new ParserException("No parser found", 1);
    }
}

Is it proper factory method usage?

One thing that is bugging me is the can_parse method. It seems there are lots of instances of parsers are created just to test if it can parse the text. Is there any better way to do it? I know I can use static method. But most parsers uses some parts of parse method (see AParser definition). If I make it static it can not use those methods.

Note: The example code represents the pattern I am following. But the class names and the scenario are just example. Providing a ready-made solution for the example will not solve my real problem.

As you write yourself that you don't want to instantiate (load definitions of) all parser classes just to find out which one to take, you need to change your design.

First of all I highly suggest you let your parsers just do the parsing - and not do the parsing and the sensing of data. One object, one job that is. Keep it simple.

What actually does the data type sniffing (to say which parser do use) is within the factory. That is probably a bit much for the factory method. Even it needs to decide which concrete classname to instantiate, it must not mean that it needs to encapsulate the whole logic for that. Also it is likely that this will change over time because you will add parsers over time.

Therefore the next improvement is probably to have a factory method object instead of a factory method. That is, you instantiate the factory which allows you to replace it over time with new factory types.

You should be able to do this step by step.

  1. Remove the can_parse method from the interfaces. If a parser want's to use it internally, make it private. But it does not belong into the interface (enforced as abstract) of any concrete Parser.
  2. Make DataParser and actual ParserFactory object. Instantiate it and pass it along where it is needed. Rename the factory method to makeParser so it's more clear what is happening.
  3. Turn that factory object into an interface and rename the class into a concrete name implementing that interface.

You can now re-implement the ParserFactory::makeParser method as needed over time. Just create a new factory object if you need it. If that changes often, you might want to make the decision injectable to the factory as well but I would defer that step. What you do right now might just be the best way, I would first of all only move it out of the parsers. E.g. test against a list of potential classes and catching any ParseExceptions that signal that a parser can not (successfully) parse the data. Keep your software working, just put the things a bit apart from each other. Afterwards you can create a second factory object that does things differently, all you need to do is to change the object, the rest of your application (read: all existing parsers) can stay and don't need no changes because of that.

This should leave you all the freedom you need to experiment which is the best way to decide how to instantiate (make) a concrete parser.

Generally speaking it looks correct. Factory produces an object based on input. I'm not sure about can_parse(). I thought the decision should be made by factory instead of delegating it to particular objects.

Have a look here http://sourcemaking.com/design_patterns/factory_method