I'm trying to find all words within a block of HTML. From reading the manual, I thought this was possible using the find('text')
function, but I can't get it to return anything.
Can anyone tell me what I'm doing wrong?
require_once __DIR__ . '/simple_html_dom.php';
$html = str_get_html("<html><body><div><p><span>Hello to the <b>World</b></span></p><p> again</p></div></body></html>");
foreach($html->find('text') as $element) {
    echo $element->plaintext . '<br>';
}
What I'm ultimately trying to do is find every piece of text and its starting position within the HTML. For this particular example, the result would look like this:
[
    0 => [
        'word' => 'Hello to the ',
        'pos' => 27
    ],
    1 => [
        'word' => 'World',
        'pos' => 43
    ],
    2 => [
        'word' => ' again',
        'pos' => 66
    ]
]
So can someone explain to me what I'm doing wrong with Simple HTML DOM and help me figure out the starting position of each word? Or suggest another tool I should use?
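You can use the built-in functions strip_tags and preg_match_all to extract the position of each word. Note that the offsets are relative to the text with the tags stripped, not to the original HTML: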
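$str = "<html><body><div><p><span>Hello to the <b>World</b></span></p><p> again</p></div></body></html>";
// Build an alternation of the words in the stripped text, e.g. /Hello|to|the|World|again/
$find = '/'.str_replace(' ','|',strip_tags($str)).'/';
// PREG_OFFSET_CAPTURE returns each match together with its byte offset in the subject
preg_match_all($find, strip_tags($str), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);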
Result:
Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => Hello
                    [1] => 0
                )

            [1] => Array
                (
                    [0] => to
                    [1] => 6
                )

            [2] => Array
                (
                    [0] => the
                    [1] => 9
                )

            [3] => Array
                (
                    [0] => World
                    [1] => 13
                )

            [4] => Array
                (
                    [0] => again
                    [1] => 19
                )

        )

)
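The offsets in this output are measured in the stripped text, whereas the question asks for positions inside the original HTML. As a rough sketch of one way to get those, the same PREG_OFFSET_CAPTURE flag can be applied to the raw markup, capturing every run of text that sits between a ">" and the next "<". The function name findTextPositions is made up for this illustration and is not part of Simple HTML DOM. The sketch assumes simple markup with no ">" inside attribute values, no comments or scripts, and single-byte characters (the offsets are byte offsets). The expected positions in the question appear to be counted from 1, so 1 is added to the 0-based offset.

```php
<?php
// Hypothetical helper for illustration only: returns each text run found
// between tags, together with its 1-based position in the original HTML.
function findTextPositions(string $html): array
{
    $result = [];
    // Group 1 captures one or more non-"<" characters located between
    // a closing ">" and the next opening "<".
    preg_match_all('/>([^<]+)</', $html, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[1] as [$text, $offset]) {
        // PREG_OFFSET_CAPTURE offsets are 0-based bytes; shift to 1-based.
        $result[] = ['word' => $text, 'pos' => $offset + 1];
    }
    return $result;
}

$str = "<html><body><div><p><span>Hello to the <b>World</b></span></p><p> again</p></div></body></html>";
$positions = findTextPositions($str);
print_r($positions);
```

For the sample string this yields 'Hello to the ' at 27, 'World' at 43 and ' again' at 66, matching the array in the question. For real-world HTML a proper parser is likely the safer choice, since a regular expression like this one does not understand attributes, comments, or entities.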