I have an article containing text and multiple images, and I need to extract the images and the text separately.
My current code returns only the last image in the article:
preg_match('/<img.+src=[\'"](?P<src>.+?)[\'"].*>/i', $article, $img);
How can I select all the images, and do the reverse to get just the text?
Thank you.
You can use the DOM for that:
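$imgSrc = array();
$txt = '';

$dom = new DOMDocument();
// @ suppresses the warnings that loadHTML() emits for imperfect markup
@$dom->loadHTML($article);

// Collect the src attribute of every <img> element
$imgs = $dom->getElementsByTagName('img');
foreach ($imgs as $img) {
    $imgSrc[] = $img->getAttribute('src');
}

// Collect every text node that is not a direct child of a <script> or <style> element
$xpath = new DOMXPath($dom);
$textNodes = $xpath->query('//*[not(self::script) and not(self::style)]/text()');
foreach ($textNodes as $textNode) {
    $tmp = trim($textNode->textContent);
    // Compare with '' because empty() would also discard the text "0"
    $txt .= ($tmp === '') ? '' : $tmp . PHP_EOL;
}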
XPath query details:
// means anywhere in the DOM tree
* means all tag nodes
[.....] defines a condition
not(self::script): the current node must not be a script node
text() returns the text node
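To see the query at work, here is a minimal sketch that runs it on a small hand-written fragment. The helper name extractText and the sample markup in $sample are made up for illustration, and libxml_use_internal_errors() is used here in place of the @ operator as one possible way to silence parser warnings.

```php
<?php
// Hypothetical helper: returns the non-empty text nodes of an HTML string,
// skipping the contents of <script> and <style> elements.
function extractText(string $html): array
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $lines = array();
    foreach ($xpath->query('//*[not(self::script) and not(self::style)]/text()') as $node) {
        $tmp = trim($node->textContent);
        if ($tmp !== '') {
            $lines[] = $tmp;
        }
    }
    return $lines;
}

// Made-up sample: two paragraphs, one script and one image
$sample = '<div><p>Hello</p><script>var x = 1;</script><img src="a.png"><p>World</p></div>';
$lines = extractText($sample);
print_r($lines);
```

With this sample, $lines should hold only "Hello" and "World": the script body is excluded because its parent fails the condition in the square brackets.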
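// Remove every <img> tag to keep the rest of the article (other markup is left in place)
$text = preg_replace('/<img\b[^>]*>/i', '', $article);
// Collect every image; [^>] keeps each match inside a single tag
preg_match_all('/<img\b[^>]*?\ssrc=[\'"](?P<src>[^\'"]+)[\'"][^>]*>/i', $article, $images);
// use $images['src'] and $text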
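A note on greediness: a pattern that begins with <img.+src= lets .+ run as far as it can, so when the article sits on a single line it reaches the last src, which is why only the last image is reported. The sketch below compares that pattern with a tag-bounded one. The sample $article and the variable names $greedy and $bounded are made up for illustration, and the bounded pattern is one possible rewrite rather than the only one; it assumes no attribute value contains a > character.

```php
<?php
// Made-up sample article with two images on a single line
$article = '<p>Intro</p><img src="a.png" alt="A"><p>Middle</p><img class="x" src="b.jpg"><p>End</p>';

// Greedy pattern from the question: ".+" swallows everything up to the last src
preg_match_all('/<img.+src=[\'"](?P<src>.+?)[\'"].*>/i', $article, $greedy);

// Tag-bounded pattern: [^>] cannot cross the end of a tag
preg_match_all('/<img\b[^>]*?\ssrc=[\'"](?P<src>[^\'"]+)[\'"][^>]*>/i', $article, $bounded);

// Article without the image tags (other markup is kept)
$text = preg_replace('/<img\b[^>]*>/i', '', $article);

print_r($greedy['src']);
print_r($bounded['src']);
echo $text, PHP_EOL;
```

On this sample the greedy pattern finds only b.jpg, while the bounded pattern finds both a.png and b.jpg. For real-world markup the DOM approach is usually more robust than either regular expression.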