I'm trying to find uppercase words in a given piece of text. The words must follow one another to be considered, and there must be at least 4 of them.
I have an "almost" working regex, but it captures much more: [A-Z]*(?: +[A-Z]*){4,}. The match also includes spaces at the start or the end of those words (like a boundary).
I have a playground if you want to test it out: https://regex101.com/r/BmXHFP/2
Is there a way to make the regex in the example capture only the words in the first sentence? The language I'm using is Go, and it has no look-behind/ahead.
In your regex, you just need to change the second * to a +:
[A-Z]*(?: +[A-Z]+){4,}
With (?: +[A-Z]*), you are matching "a space followed by 0+ letters", so you are also matching bare spaces. When you replace the * with a +, spaces are matched only if uppercase letters follow them.
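To see the difference in Go, here is a minimal sketch that runs both patterns with the standard regexp package. The sample string is made up for illustration (it is not the text from the regex101 link), and the variable names are my own.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// original is the pattern from the question.
	original = regexp.MustCompile(`[A-Z]*(?: +[A-Z]*){4,}`)
	// fixed is the pattern from this answer: the second * became a +.
	fixed = regexp.MustCompile(`[A-Z]*(?: +[A-Z]+){4,}`)
)

// sample is a hypothetical input: six uppercase words followed by lowercase text.
const sample = "THIS IS A SAMPLE OF WORDS and then lowercase text"

func main() {
	// %q quotes each match, so leading or trailing spaces become visible.
	fmt.Printf("original: %q\n", original.FindAllString(sample, -1))
	fmt.Printf("fixed:    %q\n", fixed.FindAllString(sample, -1))
}
```

On this input the original pattern should keep the space after WORDS in its match, while the fixed one stops at the last uppercase letter.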
Replace the *s with +s, and your regex only matches the words in the first sentence.
The * quantifier also matches the empty string. If you look at your regex and ignore both [A-Z]*, all that remains is a sequence of spaces. Using + makes sure that every run of spaces is followed by at least one uppercase character.
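A small Go sketch of that point, using made-up inputs and my own variable names: the original pattern is satisfied by a run of spaces alone, while the version with both + quantifiers is not. Making the first word mandatory also means a match can no longer start with spaces.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// original is the pattern from the question.
	original = regexp.MustCompile(`[A-Z]*(?: +[A-Z]*){4,}`)
	// strict has both * quantifiers replaced by +.
	strict = regexp.MustCompile(`[A-Z]+(?: +[A-Z]+){4,}`)
)

func main() {
	// Hypothetical input: four spaces and no uppercase letters at all.
	spaces := "a    b"
	fmt.Printf("%q\n", original.FindString(spaces)) // the four spaces
	fmt.Println(strict.MatchString(spaces))         // false

	// Hypothetical input: five uppercase words padded with spaces.
	padded := "  ONE TWO THREE FOUR FIVE  "
	fmt.Printf("%q\n", strict.FindString(padded)) // no surrounding spaces

	// With a mandatory first word, {4,} asks for four MORE words.
	fmt.Println(strict.MatchString("ONE TWO THREE FOUR")) // false
}
```

Note that once the first word is mandatory, {4,} requires five words in total. If a run of exactly four words should also qualify, {3,} is probably the count you want.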