I want to match all the text in a piece of HTML code: only the text itself, including any punctuation characters, but without HTML tags, URLs, etc.

example:

<div class="description">Boys loving girls</div>

match result:

Boys loving girls

example:

<div class="description">
guys loving girls! 
</div><br />

match result:

guys loving girls!

my attempt:

(?!.*(?:http:\/\/))^[a-z0-9():+,\-.@;\$_\!*\'%\?\säüöß%]+

Please read "How do you parse and process HTML/XML in PHP?" to learn more about parsing HTML content.

You should not use regex for this kind of task.
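
For comparison, here is a minimal sketch of the parser-based route using PHP's built-in DOMDocument. The helper name extractText is hypothetical, and the encoding prefix assumes the input is UTF-8. Note that this only strips the markup; it does not filter out URLs that appear inside the text itself.

```php
<?php
// Hypothetical helper (not from the linked question): returns the visible
// text of an HTML fragment, with surrounding whitespace trimmed.
function extractText(string $html): string
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of emitting PHP warnings
    // for incomplete or sloppy fragments.
    $previous = libxml_use_internal_errors(true);

    // Assumption: the input is UTF-8. The XML declaration prefix is a common
    // workaround so that characters such as ä, ö, ü, ß are decoded correctly.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The parser wraps fragments in <html><body>; read the text from <body>.
    $body = $dom->getElementsByTagName('body')->item(0);

    return $body !== null ? trim($body->textContent) : '';
}

echo extractText('<div class="description">Boys loving girls</div>'), "\n";
```

With the two examples from the question, this should yield the plain text with the tags removed and the leading and trailing whitespace trimmed.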


If you want to use regex anyway, then try the following regex pattern:

$pattern = '/^(?!.*(?:https?|ftp):\/\/)(?:[^>]*>|)\s*([^<]+)(?:<.*|)\s*$/';
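
A short sketch of how this pattern could be applied with preg_match. The helper name matchText is hypothetical. The capture group can include trailing whitespace and newlines, so the result is passed through trim(). Because the pattern has no multiline flag, the URL check only looks at the first line of the input.

```php
<?php
// Pattern copied from the answer above.
$pattern = '/^(?!.*(?:https?|ftp):\/\/)(?:[^>]*>|)\s*([^<]+)(?:<.*|)\s*$/';

// Hypothetical helper: returns the captured text, or null when the input
// does not match (for example, when its first line contains a URL).
function matchText(string $pattern, string $html): ?string
{
    if (preg_match($pattern, $html, $m) === 1) {
        // Group 1 holds the text between the first tag and the next "<".
        return trim($m[1]);
    }
    return null;
}

var_dump(matchText($pattern, '<div class="description">Boys loving girls</div>'));
var_dump(matchText($pattern, "<div class=\"description\">\nguys loving girls! \n</div><br />"));
var_dump(matchText($pattern, '<div>see http://example.com</div>'));
```

The pattern captures only the first run of text after the first tag, so nested or repeated elements are better handled with an HTML parser.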