Find all words (or sentences) in HTML

I'm trying to find all words within a block of HTML. From reading the manual, I thought this was possible with the find('text') function, but I can't get it to return anything.

Can anyone tell me what I'm doing wrong?

require_once __DIR__ . '/simple_html_dom.php';

$html = str_get_html("<html><body><div><p><span>Hello to the <b>World</b></span></p><p> again</p></div></body></html>");

foreach($html->find('text') as $element) {
    echo $element->plaintext . '<br>';
}
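On the "another tool" part of the question: PHP ships with the DOM extension, and an XPath query for //text() lists the text nodes of a document. The following is a minimal sketch, assuming the DOM extension is enabled (it is in most PHP builds). It does not explain why find('text') returned nothing, and it does not report positions: as far as I know, DOMNode only exposes a line number (getLineNo()), not a character offset into the source.

```php
<?php
// Minimal sketch using PHP's bundled DOM extension instead of Simple HTML DOM.
$source = "<html><body><div><p><span>Hello to the <b>World</b></span></p><p> again</p></div></body></html>";

$doc = new DOMDocument();
// Collect libxml's warnings about imperfect HTML instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$texts = [];
// "//text()" selects every text node anywhere in the document.
foreach ($xpath->query('//text()') as $node) {
    // Skip whitespace-only nodes, which real-world markup is full of.
    if (trim($node->nodeValue) !== '') {
        $texts[] = $node->nodeValue;
    }
}

foreach ($texts as $text) {
    echo $text . '<br>';
}
```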

What I'm ultimately trying to do is find every piece of text and its starting position within the HTML. For this example, the result would look like this (positions counted from 1):

[
    0 => [
        'word' => 'Hello to the ',
        'pos' => 27
    ],
    1 => [
        'word' => 'World',
        'pos' => 43
    ],
    2 => [
        'word' => ' again',
        'pos' => 66
    ]
]

So can someone explain to me what I'm doing wrong with Simple HTML DOM and help me figure out the starting position of each word? Or suggest another tool I should use?
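For the position part, one option that needs no parser at all is to split the raw HTML on tags with preg_split and ask PCRE for the offset of every remaining piece. This is a minimal sketch under loud assumptions: the markup is simple (no comments, no script/style bodies, no ">" inside attribute values), entities such as &amp; are left undecoded, and the offsets are 0-based byte offsets (not character offsets, which differ for multibyte UTF-8). The helper name textRunsWithOffsets is hypothetical, and the 'word'/'pos' keys simply mirror the array in the question.

```php
<?php
// Minimal sketch: the text between tags, with each run's byte offset in the HTML.
// Assumes simple markup: no comments, no <script>/<style> bodies and no ">"
// inside attribute values (a regex is not a general HTML parser).
function textRunsWithOffsets(string $html): array // hypothetical helper name
{
    // The tags themselves are discarded (no PREG_SPLIT_DELIM_CAPTURE);
    // PREG_SPLIT_OFFSET_CAPTURE turns each piece into [text, offset];
    // PREG_SPLIT_NO_EMPTY drops the empty strings between adjacent tags.
    $pieces = preg_split('/<[^>]*>/', $html, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);

    $runs = [];
    foreach ($pieces as [$text, $offset]) {
        $runs[] = ['word' => $text, 'pos' => $offset];
    }
    return $runs;
}

$html = "<html><body><div><p><span>Hello to the <b>World</b></span></p><p> again</p></div></body></html>";
print_r(textRunsWithOffsets($html));
```

For the sample string this yields the runs "Hello to the ", "World" and " again" at 0-based offsets 26, 42 and 65; add 1 to each if you want the 1-based numbers 27, 43 and 66 from the question.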

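You can use the built-in functions strip_tags and preg_match_all (with the PREG_OFFSET_CAPTURE flag) to extract the position of each word. Note that the offsets you get back are positions in the tag-stripped text, not in the original HTML.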

$str = "<html><body><div><p><span>Hello to the <b>World</b></span></p><p> again</p></div></body></html>";
// Remove the tags, then capture each run of non-whitespace with its offset in $text.
$text = strip_tags($str);
preg_match_all('/\S+/', $text, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);

Result:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => Hello
                    [1] => 0
                )

            [1] => Array
                (
                    [0] => to
                    [1] => 6
                )

            [2] => Array
                (
                    [0] => the
                    [1] => 9
                )

            [3] => Array
                (
                    [0] => World
                    [1] => 13
                )

            [4] => Array
                (
                    [0] => again
                    [1] => 19
                )

        )

)
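If you need each word's offset in the original HTML rather than in the strip_tags() output, one way is to split the HTML on tags first and then run the same \S+ match inside each text run, adding the run's own offset. A minimal sketch, assuming the same simple markup as any regex approach (no comments, no script/style bodies, no ">" inside attributes); offsets are 0-based bytes, and a word broken by a tag (for example Wor<b>ld</b>) comes out as two words. The function name wordOffsetsInHtml is hypothetical.

```php
<?php
// Minimal sketch: like the strip_tags() answer, but each word's offset is
// measured in the original HTML string. Same assumptions as any regex-based
// approach: no comments, <script>/<style> bodies or ">" inside attributes.
function wordOffsetsInHtml(string $html): array // hypothetical helper name
{
    $result = [];
    // Text between tags, each piece paired with its byte offset in $html.
    $runs = preg_split('/<[^>]*>/', $html, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);
    foreach ($runs as [$text, $runOffset]) {
        // Words in this run, with offsets relative to the start of the run.
        preg_match_all('/\S+/', $text, $m, PREG_OFFSET_CAPTURE);
        foreach ($m[0] as [$word, $wordOffset]) {
            $result[] = ['word' => $word, 'pos' => $runOffset + $wordOffset];
        }
    }
    return $result;
}

$str = "<html><body><div><p><span>Hello to the <b>World</b></span></p><p> again</p></div></body></html>";
print_r(wordOffsetsInHtml($str));
```

For the sample string, Hello, to, the, World and again land at 26, 32, 35, 42 and 66, so substr($str, $pos, strlen($word)) gives back each word.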