Passing additional parameters to injected services so they are ready to work

I'm building a crawler that finds documents in one system and stores them for later synchronization. The crawler is built by the ServiceContainer, which resolves all dependencies in the crawler's constructor. Maybe my understanding of, and/or my approach to, this architecture is completely wrong, but I will try to describe my structure. My so-called "services" are actually just classes registered as singletons in the ServiceContainer. I made this decision to make sure "separation of concerns" (SoC) is not violated.

So, the crawler depends on:

  • a SOAP service for connecting to the source system
  • a "NodeService" that tells the crawler whether the data (node) returned from SOAP is a document or a directory (more types are possible, which is why an explicit comparison is necessary here)
  • a "DocumentService" that does the work on any returned document.

The "DocumentService" can tell the crawler whether the document is new, or whether any of its data has changed since the last time it was seen.

Therefore, the "DocumentService" in turn depends on two services: "MetaDataService" and "DocumentVersionService".

The crawler's crawl method runs recursively, and in each iteration the returned "node" is passed to the "NodeService" and the "DocumentService", which were injected into the crawler's constructor.

// CRAWLER
    public function __construct(
        SOAPService $css,
        NodeService $ncs,
        DocumentService $ds
    ) {
        $this->css = $css;
        $this->node = $ncs;
        $this->document = $ds;
    }

    public function init($entryPoint)
    {
        $this->setEntryPoint($entryPoint);
        $this->setDocType();
        if ($this->isNotRunning())
        {
            $this->currentProcess = Crawl::create(['doc_type' => $this->docType]);
            $this->processingTime = microtime(true);
            Document::where('document_type', $this->docType)->update(['seen' => 0]);
        }
        else
        {
            throw new CrawlerAlreadyRunningException();
        }
    }

    public function crawlDirectory($path = array())
    {
        if (empty($path)) {
            $path[] = $this->docType;
        }
        $children = $this->css->getChildNodes($this->entryPoint);
        foreach ($children as $child)
        {
            $this->node->set($child);
            if($this->node->isDirectory())
            {
                $newPath = array_merge($path, array($child->Name));
                $this->entryPoint = $child->ID;
                $this->crawlDirectory($newPath);
            }
            else if ($this->node->isDocument())
            {
                $this->document->set($child);
                $this->document->toIndex($this->docType, $path);

                if ($this->document->isNew())
                {
                    $this->document->toQueue('new');
                }
                if ($this->document->hasChangedMeta())
                {
                    $this->document->toQueue('meta');
                }
                if ($this->document->hasNewVersion())
                {
                    $this->document->toQueue('version');
                }
            }
        }
    }

The "DocumentService" knows where in the "node" the version and metadata information is stored and pushes it into the next injected services, "MetaDataService" and "DocumentVersionService". See "SoC" above: the "DocumentService" doesn't have to know how the data for versions or metadata is actually structured or how to extract it (or whether any other rules should be applied). It only throws a pile of data into another service and gets the result. Given that, maybe even picking out the respective data is a violation of SoC, and the whole "node" should be pushed into the nested services instead.

class DocumentService
{
    private $node; // assigned in set()
    private $documentVersion;
    private $currentDocument;
    private $documentMeta;
    private $attributeMap;
    private $versionAttributes;
    private $possibleArrays;
    private $sync;

    function __construct(
        DocumentVersionService $dvs,
        MetaDataService $mds
    )
    {
        $this->documentVersion = $dvs;
        $this->documentMeta = $mds;
        $this->attributeMap = config('soap.attributeMaps.documents.attributes');
        $this->versionAttributes = config('soap.attributeMaps.documents.version');
        $this->sync = false;
    }

    public function set($node)
    {
        $this->node = $node;
        $this->documentVersion->set($node->VersionInfo->Versions);
        $this->documentMeta->set($node->Metadata->AttributeGroups);
        $this->initDocument();
    }

    // ... (remaining methods, e.g. initDocument(), omitted)
}

Because these services are injected via the constructor, I cannot pass any additional arguments to them. Therefore I added a "set" method to each of them to bring the respective service into the right state for the operations of the current iteration.

My problem is that if, somewhere in this chain of recursive iterations, the set methods of the services are not called (for whatever reason), the services' data is not updated and they can no longer return a reliable result.

So each time a service is used (maybe later by someone else, somewhere else), its methods have to be called in exactly the right order.
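
The failure mode can be reproduced in isolation. The following is only a minimal sketch: StatefulMetaService is a hypothetical stand-in with a made-up title() method, not the real "MetaDataService", and the single instance plays the role of the singleton the container would hand out.

```php
<?php
// Hypothetical stand-in for one of the singleton services (assumption, not the real class).
class StatefulMetaService
{
    private $attributes = [];

    // Brings the service into the state for the current node.
    public function set(array $attributes)
    {
        $this->attributes = $attributes;
    }

    // Reads from whatever state the last set() call left behind.
    public function title()
    {
        return isset($this->attributes['title']) ? $this->attributes['title'] : null;
    }
}

// One shared instance, as a singleton binding would provide.
$service = new StatefulMetaService();

// Iteration 1: set() is called, so the answer belongs to Document A.
$service->set(['title' => 'Document A']);
$first = $service->title();

// Iteration 2: set() is forgotten for the next document.
// The service silently answers with the data of Document A again.
$second = $service->title();
```

Nothing in the second call signals that the answer belongs to the previous node, which is exactly what makes the call order so fragile.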

I was thinking about making all methods in these services private and calling them indirectly via the magic __call() method. Only the set() method would be public. Within the __call() method I could then make sure that the service is prepared to run the requested action.
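
A rough sketch of that idea, under assumptions: GuardedVersionService and its latest() method are made up for illustration and are not the real "DocumentVersionService". PHP invokes __call() when an inaccessible (here: private) method is called from outside the class, so the guard runs before every action.

```php
<?php
// Hypothetical service used only to illustrate the __call() guard.
class GuardedVersionService
{
    private $versions = null;

    // The only public way to put state into the service.
    public function set(array $versions)
    {
        $this->versions = $versions;
    }

    // Runs for every call to a private or non-existent method from outside.
    public function __call($method, $arguments)
    {
        if (!method_exists($this, $method)) {
            throw new BadMethodCallException("Unknown method: $method");
        }
        if ($this->versions === null) {
            throw new LogicException("set() must be called before $method()");
        }
        // Inside the class the private method is accessible.
        return call_user_func_array([$this, $method], $arguments);
    }

    // Private, so outside callers always go through __call().
    private function latest()
    {
        return end($this->versions);
    }
}

$service = new GuardedVersionService();

try {
    $service->latest();
    $guarded = false;
} catch (LogicException $e) {
    $guarded = true; // the guard fired because set() was never called
}

$service->set(['1.0', '1.1']);
$latest = $service->latest(); // last element of the versions passed to set()
```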

Facepalm
As I'm writing this, I realize that even this approach can't meet my requirement of having the right, current data in each iteration. I could only check whether any data is present in the service instance; in other words, whether it has ever been initialized before.

Conclusion
I think I'm missing some very fundamental aspects of this architecture. I would be very glad if anyone could give me some advice on how this would be done "the right way".