There are a bunch of HTML text extraction tools out there, mostly for Java or Python. The one I come across most often is boilerpipe. There are a few APIs here and there, and some seem to work pretty well. Does anyone know of anything in PHP that does this?
You could try phpQuery, a jQuery-style library for selecting DOM nodes in PHP.
DOMDocument is a class available in PHP, provided you have libxml support. It can parse HTML documents and lets you iterate over the nodes or issue XPath queries (through DOMXPath) to find specific nodes in the DOM tree. This is the ideal method.
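As a rough illustration of the DOMDocument approach, here is a minimal sketch that pulls the text of every paragraph out of a page. The sample markup and the function name extractParagraphText() are made up for this example, and treating the <p> elements as "the content" is an assumption; a real page will probably need a more selective XPath expression.

```php
<?php
// Hypothetical sample input; in practice this would come from a file or an HTTP response.
$html = '<html><head><title>Demo</title><style>p{color:red}</style></head>'
      . '<body><div id="nav">Home | About</div>'
      . '<p>First paragraph.</p><script>var x = 1;</script>'
      . '<p>Second <b>paragraph</b>.</p></body></html>';

// Hypothetical helper: returns the text of each <p> element as an array of strings.
function extractParagraphText(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is often malformed; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $paragraphs = [];

    // Assumption: paragraphs carry the content; nav, script and style are skipped
    // simply because they are not <p> elements.
    foreach ($xpath->query('//p') as $node) {
        // textContent flattens nested tags such as <b>; collapse runs of whitespace.
        $text = trim(preg_replace('/\s+/', ' ', $node->textContent));
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }

    return $paragraphs;
}

print_r(extractParagraphText($html));
```

Swapping '//p' for another XPath expression (for example one that targets a specific container) is the usual way to adapt this to a given site.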
Or, if the markup is simple and uniform enough, you can use preg_match() to extract the text with regular expressions.
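For the regular expression route, a small sketch along these lines might do for something as predictable as the page title. The function name extractTitle() and the sample markup are hypothetical, and the pattern assumes a single, well-formed <title> element; anything messier is better handled with a DOM parser.

```php
<?php
// Hypothetical sample input.
$html = '<html><head><title>  Example Page </title></head><body>...</body></html>';

// Hypothetical helper: returns the page title, or null if no <title> element is found.
function extractTitle(string $html): ?string
{
    // s: let "." match newlines; i: match <TITLE> as well.
    // The lazy (.*?) stops at the first closing tag.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/si', $html, $matches) === 1) {
        // Drop any stray tags, decode entities such as &amp;, and trim whitespace.
        $text = html_entity_decode(strip_tags($matches[1]), ENT_QUOTES | ENT_HTML5, 'UTF-8');
        return trim($text);
    }

    return null;
}

var_dump(extractTitle($html));
```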