I'm trying to decode the Content-Disposition header (from curl) to get the filename, using the following regular expression:
<?php
$str = 'attachment;filename="unnamed.jpg";filename*=UTF-8\'\'unnamed.jpg\'';
preg_match('/^.*?filename=(["\'])([^"\']+)\1/m', $str, $matches);
print_r($matches);
This matches when the filename is in single or double quotes, but it fails when there are no quotes around the filename (which can happen):
$str = 'attachment;filename=unnamed.jpg;filename*=unnamed.jpg';
Right now I'm using two regular expressions with an if-else, but is it possible to do this with a single regex? I'm asking mainly to improve my own understanding of regex.
I would use the branch reset feature (?|...|...|...), which gives a more readable pattern and avoids creating a capture group for the quotes. In a branch-reset group, the capture groups in each alternative share the same numbers:
if ( preg_match('~filename=(?|"([^"]*)"|\'([^\']*)\'|([^;]*))~', $str, $match) )
echo $match[1], PHP_EOL;
Whichever alternative succeeds, the capture is always in group 1.
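As a quick check, here is a small sketch that runs the branch-reset pattern against all three quoting styles. The helper name `filenameFromDisposition` and the sample headers are made up for illustration; only the pattern itself comes from the answer above.

```php
<?php
// Hypothetical helper: returns the filename parameter, or null if none is found.
function filenameFromDisposition(string $header): ?string
{
    // Branch reset: every alternative stores its capture in group 1.
    $pattern = '~filename=(?|"([^"]*)"|\'([^\']*)\'|([^;]*))~';
    return preg_match($pattern, $header, $m) ? $m[1] : null;
}

// Sample headers (assumed for this example): double-quoted, single-quoted, unquoted.
$headers = [
    'attachment;filename="unnamed.jpg"',
    "attachment;filename='unnamed.jpg'",
    'attachment;filename=unnamed.jpg;filename*=unnamed.jpg',
];
foreach ($headers as $h) {
    echo filenameFromDisposition($h), PHP_EOL; // unnamed.jpg each time
}
```

Because the group number never changes, the calling code needs no extra logic to find out which alternative matched.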
One approach is to use an alternation in a single regex to match either a single/double quoted filename, or a filename which is completely unquoted. Note that one side effect of this approach is that we introduce more capture groups into the regex. So we need a bit of extra logic to handle this.
<?php
$str = 'attachment;filename=unnamed.jpg;filename*=UTF-8\'\'unnamed.jpg\'';
$result = preg_match('/^.*?filename=(?:(?:(["\'])([^"\']+)\1)|([^"\';]+))/m',
    $str, $matches);
print_r($matches);
// Quoted match: groups 1 and 2 are set (count 3). Unquoted match: group 3 is set (count 4).
$index = count($matches) == 3 ? 2 : 3;
if ($result) {
    echo $matches[$index];
} else {
    echo "filename not found";
}
?>
You could make your capturing group optional, (["\'])?, and likewise the backreference, \1?. Then add a non-capturing group at the end of the regex, (?:;|$), which checks for either a semicolon or the end of the line:
^.*?filename=(["\'])?([^"\']+)\1?(?:;|$)
$str = 'attachment;filename=unnamed.jpg;filename*=UTF-8\'\'unnamed.jpg\'';
preg_match('/^.*?filename=(["\'])?([^"\']+)\1?(?:;|$)/m', $str, $matches);
print_r($matches);
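One thing to watch with this pattern: the class [^"\']+ is greedy and also matches semicolons, so on an unquoted header with no quote character later in the line, group 2 can run through to the end of the line. The sketch below shows this on the unquoted string from the question, next to a variant that also excludes `;` from the class. That variant is only one possible adjustment, not part of the original answer, and the variable names are made up for illustration.

```php
<?php
// Pattern from the answer above, plus a hypothetical variant that also
// stops the filename at a semicolon.
$original = '/^.*?filename=(["\'])?([^"\']+)\1?(?:;|$)/m';
$stricter = '/^.*?filename=(["\'])?([^"\';]+)\1?(?:;|$)/m';

// Unquoted header with no quote character anywhere after the filename.
$header = 'attachment;filename=unnamed.jpg;filename*=unnamed.jpg';

preg_match($original, $header, $a);
preg_match($stricter, $header, $b);

echo $a[2], PHP_EOL; // unnamed.jpg;filename*=unnamed.jpg
echo $b[2], PHP_EOL; // unnamed.jpg
```

The trade-off is that with the stricter class, a quoted filename that itself contains a semicolon would be cut off at that semicolon.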
You can also use \K to reset the starting point of the reported match, and then match one or more characters that are neither a double quote nor a semicolon with [^";]+. This returns only the filename.
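$strings = [
    'attachment;filename="unnamed.jpg";filename*=UTF-8\'\'unnamed.jpg\'',
    'attachment;filename=unnamed.jpg;filename*=unnamed.jpg',
];
foreach ($strings as $string) {
    preg_match('/^.*?filename="?\K[^";]+/m', $string, $matches);
    print_r($matches);
}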
Just to put my two cents in - you could use a conditional regex:
filename=(['"])?(?(1)(.+?)\1|([^;]+))
filename= # match filename=
(['"])? # capture " or ' into group 1, optional
(?(1) # if group 1 was set ...
(.+?)\1 # ... then match up to \1
| # else
([^;]+) # not a semicolon
)
Afterwards, you need to check whether group 2 or group 3 was set.
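In PHP, that check could look roughly like the sketch below. The helper name `filenameViaConditional` is made up for illustration. It relies on `preg_match` (with default flags) leaving trailing unmatched groups out of the result array and reporting earlier unmatched groups as empty strings.

```php
<?php
// Hypothetical helper built around the conditional pattern above.
function filenameViaConditional(string $header): ?string
{
    $pattern = '~filename=([\'"])?(?(1)(.+?)\1|([^;]+))~';
    if (!preg_match($pattern, $header, $m)) {
        return null;
    }
    // Quoted:   group 2 holds the name and group 3 is absent from $m.
    // Unquoted: group 2 is an empty string and group 3 holds the name.
    return $m[3] ?? $m[2];
}

echo filenameViaConditional('attachment;filename="unnamed.jpg"'), PHP_EOL;
echo filenameViaConditional('attachment;filename=unnamed.jpg;filename*=unnamed.jpg'), PHP_EOL;
```

Both calls print unnamed.jpg. The null-coalescing operator stands in for the "group 2 or 3" check, so the caller gets a single value back.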
Alternatively, go for @Casimir's answer using the (often overlooked) branch reset.