
I'm trying to find the month in a German text (in an HTML file).

March is written "März".

I want to be sure that I catch it, so I check for:

Marz, März, M&auml;rz

I tried to use this code

if(preg_match("/ma?ä?(&auml;)?rz/i", $title))
    return 3;

It works fine for the first two, but not for &auml;. What did I do wrong?

(The HTML and my PHP files are encoded in UTF-8.)

Why not just try

(Marz|März|M&auml;rz)
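
A minimal PHP sketch of that alternation, assuming the title is a UTF-8 string as in the question. The helper name isMarch() is made up for illustration; the u modifier is added so that PCRE treats the pattern and the subject as UTF-8 rather than as raw bytes.

```php
<?php
// Hypothetical helper: returns true if $title contains one of the three spellings.
function isMarch(string $title): bool
{
    // i = case-insensitive, u = interpret pattern and subject as UTF-8.
    // (?:...) groups the alternatives without capturing them.
    return preg_match('/M(?:a|ä|&auml;)rz/iu', $title) === 1;
}

var_dump(isMarch('Konzert im März'));      // literal umlaut
var_dump(isMarch('Konzert im M&auml;rz')); // HTML entity
var_dump(isMarch('Konzert im Mai'));       // a different month
```

The function only reports whether a match exists; returning 3 for the month number, as in the question, is a one-line change at the call site.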

If it's just for searching purposes but not for returning the actual position of the word, you could normalize the search string using html_entity_decode() and iconv():

$string = html_entity_decode($string, ENT_QUOTES, "utf-8");
$string = iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $string);

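// then search for "Marz"
// (what //TRANSLIT turns ä into depends on the iconv implementation and the locale, so check it on your system)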
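
Here is a small sketch of the same decode-then-search idea. Because the output of //TRANSLIT varies between systems, this version swaps iconv() for an explicit strtr() table covering only the German umlauts; that substitution is my own assumption, not part of the answer above. The function names are hypothetical, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: decode entities, then fold German umlauts to ASCII.
function normalizeForSearch(string $html): string
{
    // Turn entities such as &auml; into real UTF-8 characters.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // strtr() with an array replaces whole byte sequences, so UTF-8 keys are safe.
    return strtr($text, [
        'ä' => 'a', 'ö' => 'o', 'ü' => 'u',
        'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U',
        'ß' => 'ss',
    ]);
}

// Hypothetical helper: case-insensitive search on the normalized text.
function containsMarch(string $html): bool
{
    return stripos(normalizeForSearch($html), 'marz') !== false;
}

var_dump(containsMarch('Konzert im M&auml;rz'));
```

As the answer says, this is only suitable for detecting the word: positions in the normalized string no longer line up with the original HTML.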

You have to first decode the entities, then use a comparison that works with the Unicode Collation Algorithm. For example, this works in Perl:

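use utf8;   # the string literals below contain UTF-8 characters
use Unicode::Collate;

# level => 1 compares base letters only; index() requires normalization => undef
my $Collator = Unicode::Collate->new(normalization => undef, level => 1);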
my $str = "Ich muß Perl studieren.";
my $sub = "MÜSS";
my $match;
if (my($pos,$len) = $Collator->index($str, $sub)) {
    $match = substr($str, $pos, $len);
}

Whether strings with and without marks match depends on the comparison level you ask for. At level 1 only base letters are compared, so differences in accents and case are ignored.

How you perform basic Unicode operations like this in PHP I do not know, but I figure there must be a corresponding library, given how necessary these types of things are.
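
For what it's worth, PHP's intl extension has a Collator class that can do a primary-strength comparison. The sketch below assumes intl is installed and uses a made-up helper name. Collator offers no substring search like Perl's index(), so this version splits the text into words and compares each one, which is a simplification of the approach above.

```php
<?php
// Hypothetical helper; requires the intl extension (class Collator).
function containsWordIgnoringMarks(string $html, string $word, string $locale = 'de_DE'): bool
{
    // Decode entities first, as described above.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    $collator = new Collator($locale);
    // PRIMARY strength compares base letters only, ignoring accents and case.
    $collator->setStrength(Collator::PRIMARY);

    // Split on anything that is not a letter; u makes PCRE read the text as UTF-8.
    // preg_split() returns false on invalid UTF-8, hence the fallback to [].
    $tokens = preg_split('/[^\p{L}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    foreach ($tokens as $token) {
        if ($collator->compare($token, $word) === 0) {
            return true;
        }
    }
    return false;
}

if (extension_loaded('intl')) {
    var_dump(containsWordIgnoringMarks('Konzert im M&auml;rz 2011', 'Marz'));
}
```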

ä takes more than one byte in UTF-8, so a ? placed directly after it applies only to its last byte (unless the pattern uses the u modifier). Put it in a group instead:

preg_match("/ma?(ä)?(&auml;)?rz/i", $title);

You can see it here.

Besides, Keng's approach is better.
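
To illustrate the byte issue, here is a short sketch comparing an ungrouped ä? with and without the u modifier. The function names are made up, and the file is assumed to be saved as UTF-8, where ä is the two bytes 0xC3 0xA4.

```php
<?php
// Without u: "ä?" is read as byte 0xC3 followed by an OPTIONAL byte 0xA4,
// so the first byte of ä is still required in the subject.
function matchesBytewise(string $s): bool
{
    return preg_match('/ma?ä?rz/i', $s) === 1;
}

// With u: ä is a single character, so ? makes all of it optional.
function matchesUtf8(string $s): bool
{
    return preg_match('/ma?ä?(?:&auml;)?rz/iu', $s) === 1;
}

foreach (['Marz', 'März', 'M&auml;rz'] as $title) {
    printf("%-10s bytewise=%d utf8=%d\n", $title, matchesBytewise($title), matchesUtf8($title));
}
```

Grouping the character, as in the pattern above, and adding the u modifier both fix the quantifier; the u modifier also makes case-insensitive matching aware of non-ASCII letters.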