Has anyone ever used the Wikipedia Data Extraction framework? I need to use it for work.
Could you also suggest other tools for extracting information from web pages?
Thanks!
When you say Wikipedia Data Extraction, I assume you're referring to the software DBpedia uses to transform Wikipedia XML dumps into the DBpedia data dumps? Have you considered using the DBpedia dumps themselves?
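If you do end up processing the raw XML dumps yourself, the structure is fairly simple to walk with standard tooling. Here's a minimal sketch using Python's standard library — note the namespace URI is version-specific (I'm assuming export-0.10 here; check the root element of your actual dump), and for full-size dumps you'd want `ET.iterparse` instead of loading everything into memory:

```python
import xml.etree.ElementTree as ET

# Tiny inline sample mimicking the MediaWiki dump structure.
# Real dumps carry a versioned namespace; adjust NS to match your dump.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Belgium</title>
    <revision><text>Belgium is a country in Western Europe.</text></revision>
  </page>
  <page>
    <title>Brussels</title>
    <revision><text>Brussels is the capital of Belgium.</text></revision>
  </page>
</mediawiki>"""

NS = "{http://www.mediawiki.org/xml/export-0.10/}"

def page_titles(xml_text):
    """Return the title of every <page> element in a dump fragment."""
    root = ET.fromstring(xml_text)
    return [page.find(f"{NS}title").text for page in root.iter(f"{NS}page")]

print(page_titles(SAMPLE))  # ['Belgium', 'Brussels']
```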
Tools for extracting information from web pages cover a very broad space. What kind of information do you want to extract? Is it from semi-structured content (e.g. tables) or unstructured text (e.g. prose)? Are you interested in metadata such as page title and author, or lower-level concepts such as named entities?
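To illustrate the semi-structured case: scraping a table out of a page can be done with nothing but the standard library. This is just a sketch (for real work you'd more likely reach for a dedicated parser like BeautifulSoup or lxml); the sample HTML is invented for the example:

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect rows of cell text from <table> markup."""
    def __init__(self):
        super().__init__()
        self.rows = []        # completed rows
        self._row = None      # cells of the row being built
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

HTML = """<table>
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td>Belgium</td><td>Brussels</td></tr>
</table>"""

parser = TableExtractor()
parser.feed(HTML)
print(parser.rows)  # [['Country', 'Capital'], ['Belgium', 'Brussels']]
```

The answers to the clarifying questions above really do change which tool is appropriate: table extraction, boilerplate removal, and named-entity recognition are served by very different libraries.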
(I would have left these clarifying questions as comments on the question, but my account level doesn't allow it.)