I'm struggling to match a few movie titles I have in a weird format. Some of the titles are wrapped in quotes, some begin with # or $, and most of them end with the release year (otherwise ????).
I'm trying to replace this:
"Ein Engel für alle" (2005) {Katzenjammer (#2.5)} ????
#"Sospecha" (1963) {El caso del viejo del Tibet} 1963
MTV Europe Music Awards 1998 (1998) (TV) 1998
"Hotel Cæsar" (1998) {(#12.26)} 1998
$Am Rande - Sechs Kapitel über AIDS in der Ukraine (2006) 2006
...to this:
Ein Engel für alle, ????
Sospecha, 1963
MTV Europe Music Awards 1998, 1998
Hotel Cæsar, 1998
Am Rande - Sechs Kapitel über AIDS in der Ukraine, 2006
...and if possible, get the release year somehow. In the example I just put a comma, but if you can't get the release date just leave it and I'll get it another way.
I'm a complete newbie in regular expressions but I still tried to do it with no luck. If anyone can give me a hand I'd really appreciate it!
Edit
To make it less confusing:
Remove everything wrapped in () or {}.
Remove the $ or # at the beginning of the string.
If the title is wrapped in quotes, remove them.
Then either leave it like this, or use some kind of grouping to capture the release date at the end of the string into a separate variable.
Hope this helps :)
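The regexp should be
$regexp = '/[\W]*([\w\- üæöä]+)[\W^-].*([\d?]{4})/u';
Group 1 is the title and group 2 is the trailing year (or ????). Note that preg_match() needs the delimiters, and the u modifier so that the non-ASCII letters are treated as whole characters.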
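A minimal sketch of how that pattern could be applied to one line. The `/` delimiters, the `u` modifier, the `trim()` call and the variable names are assumptions added here for illustration, not part of the answer above; the input is assumed to be UTF-8.

```php
<?php
// Assumption: source file and input are UTF-8, hence the /u modifier.
$regexp = '/[\W]*([\w\- üæöä]+)[\W^-].*([\d?]{4})/u';
$line = '"Hotel Cæsar" (1998) {(#12.26)} 1998';

$title = null;
$year = null;
if (preg_match($regexp, $line, $m)) {
    $title = trim($m[1]); // group 1 can end with a space for unquoted titles
    $year = $m[2];        // last four characters made of digits or ?
}
echo "$title, $year\n";
```

Because the title group is a fixed character class, a title containing any character outside it (a colon or an accented letter that is not listed, for example) would be cut short, so the class would have to be extended for real data.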
Try this:
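$data = '"Ein Engel für alle" (2005) {Katzenjammer (#2.5)} ????';
$year = null;
$title = null;
// The trailing year is either four digits or the placeholder ????
if (preg_match('#(\d{4}|\?{4})$#', $data, $matches))
{
    $year = $matches[1];
}
// Skip an optional leading $ or #, then take the quoted title,
// or everything up to the first " (YYYY)" when there are no quotes
if (preg_match('#^[$\#]?(?:"(.*?)"|(.*?))\s+\(\d{4}\)#', $data, $matches))
{
    // Group 2 is only set when the title was not quoted
    $title = (isset($matches[2]) && $matches[2] !== '') ? $matches[2] : $matches[1];
}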
Edited my answer to fit your needs. ;)
You could use this script:
<?php
$inputs = Array(
    '"Ein Engel für alle" (2005) {Katzenjammer (#2.5)} ????',
    '#"Sospecha" (1963) {El caso del viejo del Tibet} 1963',
    'MTV Europe Music Awards 1998 (1998) (TV) 1998',
    '"Hotel Cæsar" (1998) {(#12.26)} 1998',
    '$Am Rande - Sechs Kapitel über AIDS in der Ukraine (2006) 2006'
);
foreach ($inputs as $input) {
    $matches = Array();
    // No space is required right after the inner (YYYY), so a line with
    // nothing between it and the trailing year still matches.
    if (!preg_match('/^(?:\$|#)?(?:"(.+?)"|(.+?)) \(\d{4}\).* (\d{4}|\?{4})$/', $input, $matches))
        continue;
    // Only one of the two title groups is non-empty, so concatenating is safe.
    print $matches[1] . $matches[2] . ", " . $matches[3] . "\n";
}
?>
Ein Engel für alle, ????
Sospecha, 1963
MTV Europe Music Awards 1998, 1998
Hotel Cæsar, 1998
Am Rande - Sechs Kapitel über AIDS in der Ukraine, 2006
This should fit your given rules precisely and accurately (though it does not use your proposed methodological steps, which do not really fit a pattern matching solution).
Let's take a closer look at that regex:
/                 # starting delimiter
^                 # start-of-input
(?:\$|#)?         # optional $ or # (but don't capture)
(?:               # (don't capture the outer group)
"(.+?)"|(.+?)     # title either in quotes or not
)
[ ]\(\d{4}\)      # a space (shown as [ ]) and the inner date, which delimits the title when the title has no quotes
.*                # any other inner fluff
[ ](\d{4}|\?{4})  # a space, then either four digits or four question marks
$                 # the end-of-input must immediately follow
/                 # ending delimiter
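The breakdown above can also be written directly in code with the `x` modifier, which lets the comments live inside the pattern. This is only a sketch: the function name `parseTitleLine`, the named groups and the returned array shape are hypothetical, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: returns ['title' => ..., 'year' => ...] or null.
function parseTitleLine(string $line): ?array
{
    // x = ignore whitespace and allow comments, u = treat input as UTF-8.
    $pattern = '~
        ^[$\#]?                                 # optional leading $ or #
        (?: "(?<quoted>.+?)" | (?<plain>.+?) )  # title, quoted or not
        [ ]\(\d{4}\)                            # space + inner date
        .*                                      # any other inner fluff
        [ ](?<year>\d{4}|\?{4})                 # space + year or ????
        $~xu';
    if (!preg_match($pattern, $line, $m)) {
        return null; // line does not follow the expected layout
    }
    // Exactly one of the two title groups is non-empty.
    $title = $m['quoted'] !== '' ? $m['quoted'] : $m['plain'];
    return ['title' => $title, 'year' => $m['year']];
}

$parsed = parseTitleLine('#"Sospecha" (1963) {El caso del viejo del Tibet} 1963');
echo $parsed['title'] . ', ' . $parsed['year'] . "\n";
```

The sketch asks for only one space before the trailing year, so a line with nothing between the inner date and the trailing year still matches.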
$string = '"Ein Engel für alle" (2005) {Katzenjammer (#2.5)} ????
"Sospecha" (1963) {El caso del viejo del Tibet} 1963
MTV Europe Music Awards 1998 (1998) (TV) 1998
"Hotel Cæsar" (1998) {(#12.26)} 1998
Am Rande - Sechs Kapitel über AIDS in der Ukraine (2006) 2006';
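// Capture everything before each " (digits)" as the title, and the digits as the year
preg_match_all('#(.*?) \(([0-9]+)\)#i', $string, $matches);
$count = count($matches[0]);
for ($i = 0; $i < $count; $i++) {
    // Strip quotes and any leading # or $ from the captured title
    $title = preg_replace('#["\#\$]#us', '', $matches[1][$i]);
    echo "$title, {$matches[2][$i]}"."<br />";
}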
Result:
Ein Engel für alle, 2005
Sospecha, 1963
MTV Europe Music Awards 1998, 1998
Hotel Cæsar, 1998
Am Rande - Sechs Kapitel über AIDS in der Ukraine, 2006
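For comparison, the rules listed in the question's edit can also be applied one at a time instead of with a single pattern. The following is a minimal sketch under that assumption; the function name `cleanTitle` and the returned array shape are hypothetical. Unlike the snippet above, it takes the year from the end of the line, so `????` is preserved.

```php
<?php
// Hypothetical helper applying the question's rules step by step.
function cleanTitle(string $line): array
{
    // Split off the trailing release year (four digits or ????), if present.
    $year = null;
    if (preg_match('/\s(\d{4}|\?{4})$/', $line, $m)) {
        $year = $m[1];
        $line = substr($line, 0, -strlen($m[0]));
    }
    // Remove {...} first (it may contain parentheses), then (...).
    $line = preg_replace('/\{[^{}]*\}/', '', $line);
    $line = preg_replace('/\([^()]*\)/', '', $line);
    // Remove a leading $ or #, then surrounding whitespace.
    $line = trim(preg_replace('/^[$#]/', '', $line));
    // If the title is wrapped in quotes, remove them.
    $line = preg_replace('/^"(.*)"$/', '$1', $line);
    return ['title' => $line, 'year' => $year];
}

$result = cleanTitle('"Ein Engel für alle" (2005) {Katzenjammer (#2.5)} ????');
echo $result['title'] . ', ' . $result['year'] . "\n";
```

One consequence of following the rules literally is that a title which itself contains parentheses would lose that part, which the single-pattern approaches avoid by stopping at the first parenthesized year.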