I'm building a document-sharing platform, and to attract as many users as possible I want to seed it with 10,000 documents before launch. All of the documents are PDF files. I'm working with Symfony2, but I doubt that changes the problem much: how can I automatically extract the metadata I need from these documents (for example, the title and the first 100 words for the description) and insert it into my database? (In my case that means hydrating my entities, but I already know how to do that part.)
I guess a crawler is what I'm looking for, but I have no idea where to find one or how to make it work.
Thanks in advance!
Well, as you don't have a real question:
When you have done all this and then run into a specific problem, ask a real question ;)
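As a rough starting point for the two fields named in the question (title and the first 100 words), here is a minimal sketch. It rests on several assumptions: it is written in Python rather than PHP purely for illustration, it assumes the third-party `pypdf` package is installed, and the function names `first_words`, `describe_pdf` and `describe_folder` are hypothetical. Falling back to the file name when a PDF has no title is also just one possible choice.

```python
from pathlib import Path


def first_words(text, limit=100):
    """Return the first `limit` whitespace-separated words of `text`."""
    return " ".join(text.split()[:limit])


def describe_pdf(path, limit=100):
    """Return a title and a short description for one PDF file.

    Assumption: the third-party `pypdf` package is available.
    """
    # Imported lazily so that first_words() is usable without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(str(path))

    # The metadata may be missing entirely, or present without a title.
    meta = reader.metadata
    title = (meta.title if meta is not None else None) or Path(path).stem

    # Read pages only until enough text has been collected.
    chunks = []
    word_count = 0
    for page in reader.pages:
        page_text = page.extract_text() or ""
        chunks.append(page_text)
        word_count += len(page_text.split())
        if word_count >= limit:
            break

    return {"title": title, "description": first_words(" ".join(chunks), limit)}


def describe_folder(folder, limit=100):
    """Yield one record per PDF found directly inside `folder`."""
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        yield describe_pdf(pdf_path, limit)
```

Each record from `describe_folder` could then be written out (for example as JSON) and imported on the Symfony2 side to hydrate the entities. Scanned PDFs that contain only images will yield an empty description with this approach, since no OCR is involved.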