PHP regex to find an uppercase sentence in an HTML tag

I am trying to create a regex to find an uppercase sentence inside an HTML tag. Here is an example:

<span style="font-family:Arial; font-size:11pt; font-weight:bold">RESSONÂNCIA MAGNÉTICA</span></p>

This is the regex I have so far: ^<span style="font-family:Arial; font-size:11pt; font-weight:bold">+[A-Z]+<\/span><\/p>

However, it is not working properly: it fails to match spaces and accented letters.

You're using [A-Z], which only matches the ASCII letters A to Z. This can be solved using Unicode categories:

  1. Use \p{Lu} to match characters with the Uppercase_Letter Unicode property.
  2. For \p{Lu} to work, add the /u (Unicode) modifier to your pattern.
  3. Don't forget to include spaces (your example has one).

This will match what you want: [\p{Lu} ]+

Code:

preg_replace('/^<span style="font-family:Arial; font-size:11pt; font-weight:bold">([\p{Lu} ]+)<\/span><\/p>/u', '$1', $input_lines);
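To check the pattern against the sample line from the question, here is a minimal sketch. The helper name extractUppercaseText is made up for illustration, and the example assumes the input is UTF-8 with precomposed accented letters:

```php
<?php
// Hypothetical helper (not from the original answer): returns the uppercase
// text inside the exact span from the question, or null when nothing matches.
function extractUppercaseText(string $html): ?string
{
    // Single quotes keep the backslashes intact for PCRE; "#" as the delimiter
    // avoids having to escape the "/" in the closing tags.
    $pattern = '#^<span style="font-family:Arial; font-size:11pt; font-weight:bold">([\p{Lu} ]+)</span></p>#u';

    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

$line = '<span style="font-family:Arial; font-size:11pt; font-weight:bold">RESSONÂNCIA MAGNÉTICA</span></p>';
var_dump(extractUppercaseText($line)); // string "RESSONÂNCIA MAGNÉTICA"
```

Note that in a double-quoted PHP string "\1" is read as an octal escape (the character with code 1), so the replacement is safer written as '$1' or '\1' in single quotes.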


You seem to have a very specific case in mind. @Mariano pointed out a neat, Unicode-safe way to match uppercase characters (nice work!), but maybe coming at this a little differently will help.

You mentioned wanting uppercase sentences... I assume that means more than uppercase letters: punctuation and all manner of other characters are okay too. Maybe think about what isn't okay? If the only thing not allowed inside that tag is lowercase letters, your match (inside the tag) could be [^a-z]+, which matches anything that isn't a lowercase letter from a to z.

preg_replace('/^<span style="font-family:Arial; font-size:11pt; font-weight:bold">([^a-z]+)<\/span><\/p>/u', '$1', $input_lines);

And if you want to grab the contents of any span, you could use something like this:

preg_replace('/^<span[^>]+>([^a-z]+)<\/span>/u', '$1', $input_lines);

Or to handle lowercase letters with accents:

preg_replace('/^<span[^>]+>([^\p{Ll}]+)<\/span>/u', '$1', $input_lines);
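The difference between the two negated classes shows up as soon as an accented lowercase letter sneaks in. The sketch below is only an illustration: spanContentMatches and the sample strings are made up, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: true when the span's content consists only of
// characters from the given character class.
function spanContentMatches(string $class, string $html): bool
{
    $pattern = '#^<span[^>]+>(' . $class . '+)</span>#u';

    return preg_match($pattern, $html) === 1;
}

$mixed = '<span class="t">RESSONâNCIA</span>';      // contains a lowercase "â"
$upper = '<span class="t">RESSONÂNCIA 3.0!</span>'; // uppercase plus digits and punctuation

var_dump(spanContentMatches('[^a-z]', $mixed));    // bool(true): "â" is outside a-z, so it slips through
var_dump(spanContentMatches('[^\p{Ll}]', $mixed)); // bool(false): "â" is a Lowercase_Letter
var_dump(spanContentMatches('[^\p{Ll}]', $upper)); // bool(true): digits and punctuation are allowed
```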

I suggested using \p{Lu} in a previous answer, but you're probably not interested in matching Greek, Cyrillic, German special characters, or whatever else the Uppercase_Letter category matches.

Keep it simple:

Just add the special characters you want inside the character class, and keep the /u modifier so each accented letter is read as a single UTF-8 character. For example, assuming it's Portuguese you're matching:

[A-ZÁÂÃÀÇÉÊÍÓÔÕÚ ]+
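As a rough sketch of how that class behaves in PHP, the snippet below anchors it to the whole string. The function name isPortugueseUppercase is an assumption for illustration, and the letter list is the one guessed above, not a complete Portuguese alphabet:

```php
<?php
// Hypothetical helper: true when $text contains only the listed uppercase
// letters and spaces.
function isPortugueseUppercase(string $text): bool
{
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8,
    // so "Â" counts as one character instead of two separate bytes.
    return preg_match('/^[A-ZÁÂÃÀÇÉÊÍÓÔÕÚ ]+$/u', $text) === 1;
}

var_dump(isPortugueseUppercase('RESSONÂNCIA MAGNÉTICA')); // bool(true)
var_dump(isPortugueseUppercase('Ressonância'));           // bool(false): lowercase letters
var_dump(isPortugueseUppercase('ÜBER'));                  // bool(false): "Ü" is not in the list
```

Unlike \p{Lu}, this version rejects uppercase letters you did not list, which is the trade-off being suggested here.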