Using regular expressions to find duplicate words on a webpage (beginner)

I'm trying to figure out a way to use regular expressions to find duplicate words on a webpage. I'm completely clueless, and I apologise in advance if I'm using incorrect terminology.

So far I've found the following regular expressions. They work well, but only on words that appear consecutively (e.g. hello hello), not on words that are placed in different parts of the webpage or separated by another word (e.g. hello food hello).

\b(\w+)(\s+\1\b)*

\b(\w+(?:\s*\w*))\s+\1\b

I would be super grateful to anyone who can help. I realise I might not be in the right place, since I'm basically a noob.

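Capture a word (surrounded by word boundaries) in a group, then backreference it inside a lookahead, allowing any run of characters in between. Note that the dot does not match line breaks by default, so enable the single-line (dotall) flag if the text spans several lines: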

\b(\w+)\b(?=.*\b\1\b)

https://regex101.com/r/TcS1UW/3
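As a rough illustration of how that pattern could be applied from Java, here is a minimal sketch. The class name DuplicateWords and the method findRepeated are made up for this example, and Pattern.DOTALL is an assumption added so that the dot in the lookahead can cross line breaks.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
public class DuplicateWords {

    // Same pattern as above; DOTALL (an assumption) lets '.' match newlines too.
    private static final Pattern REPEATED =
            Pattern.compile("\\b(\\w+)\\b(?=.*\\b\\1\\b)", Pattern.DOTALL);

    // Returns each word that appears again somewhere later in the text.
    public static Set<String> findRepeated(String text) {
        Set<String> repeated = new LinkedHashSet<>();
        Matcher matcher = REPEATED.matcher(text);
        while (matcher.find()) {
            // Group 1 is the captured word that the lookahead found again.
            repeated.add(matcher.group(1));
        }
        return repeated;
    }

    public static void main(String[] args) {
        System.out.println(findRepeated("hello food hello\nworld food"));
    }
}
```

The match is case-sensitive as written, so Hello and hello count as different words. Because the lookahead rescans the rest of the text for every word, this is likely to get slow on long pages, where counting words in a map tends to be the more practical route.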

I would use Jsoup to get the text from the webpage. Then you could keep track of the counts using a HashMap, and then search the map for any number of occurrences you want:

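    import java.util.HashMap;
    import java.util.Map;
    import org.jsoup.Jsoup;

    String url = "https://en.wikipedia.org/wiki/Jsoup";

    // Fetch the page and extract its visible text (get() can throw IOException).
    String body = Jsoup.connect(url).get().body().text();

    Map<String,Integer> counts = new HashMap<>();

    // Split on any run of whitespace so repeated spaces don't yield empty tokens.
    for ( String word : body.split("\\s+") )
    {
        counts.merge(word, 1, Integer::sum);
    }
    // Report every token that occurs at least twice.
    for ( Map.Entry<String,Integer> entry : counts.entrySet() )
    {
        if ( entry.getValue() >= 2 )
        {
            System.out.println(entry.getKey() + " occurs " + entry.getValue() + " times.");
        }
    }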

You may need to clean up the map to get rid of some entries that aren't words, but this will get you most of the way.
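One possible way to do that cleanup is to normalise the tokens before counting them. The sketch below is only an illustration: the class name WordCounts and the method countWords are invented here, and it assumes a "word" is a run of letters, digits, or apostrophes compared case-insensitively. It takes plain text as input, so it would be fed the string that Jsoup returns.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of the original answer.
public class WordCounts {

    // Counts words after lower-casing and stripping punctuation.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Split on runs of anything that is not a letter, digit, or apostrophe.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+");
        for (String token : tokens) {
            // A leading separator produces one empty token; skip it.
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        countWords("Hello, food! hello... (Food)").forEach((word, n) -> {
            if (n >= 2) {
                System.out.println(word + " occurs " + n + " times.");
            }
        });
    }
}
```

With this normalisation, "Hello," and "hello..." land in the same map entry instead of being counted as two separate keys.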