I will be receiving input which will be ANY ONE of the following. I need to devise a regex which can handle any of these and extract the validS3bucketname and validS3resourcename.
http://s3.amazonaws.com/validS3bucketname/validS3resourcename
https://s3.amazonaws.com/validS3bucketname/validS3resourcename
http://validS3bucketname.s3.amazonaws.com/validS3resourcename
https://validS3bucketname.s3.amazonaws.com/validS3resourcename
validS3bucketname and validS3resourcename are valid S3 values, which may include spaces and other special characters. I don't know exactly what S3 allows in file names.
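The following will match both URL styles. The bucket name is captured in group 1 for the path style and in group 2 for the subdomain style:
/:\/\/s3\.amazonaws\.com\/([^\/]+)|:\/\/([^\/]+)\.s3\.amazonaws\.com\//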
This simple function should wrap it nicely:
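function getS3Info($url) {
    // Group 1: bucket (path style); group 2: bucket (subdomain style); group 3: resource.
    // The resource is captured with (.+) so that names containing slashes are kept whole.
    $pattern = '/:\/\/(?:s3\.amazonaws\.com\/([^\/]+)|([^\/]+)\.s3\.amazonaws\.com)\/(.+)/';
    if (!preg_match($pattern, $url, $a)) {
        return false;
    }
    // preg_match() sets a skipped group to '' when a later group matched,
    // so isset($a[2]) is always true here; compare against '' instead.
    $bucket = $a[2] !== '' ? $a[2] : $a[1];
    $resource = $a[3];
    return array('bucket' => $bucket, 'resource' => $resource);
}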
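One thing to watch with a two-alternative pattern like this: preg_match() reports a group that did not take part in the match as an empty string whenever a later group did match, so isset() alone cannot tell the two URL styles apart. A minimal sketch of one way around that, assuming PHP 7.2 or later for the PREG_UNMATCHED_AS_NULL flag. The function name getS3InfoStrict is hypothetical, and the pattern uses # as the delimiter purely to avoid escaping slashes.

```php
<?php
// Hypothetical variant; assumes PHP 7.2+ for PREG_UNMATCHED_AS_NULL.
function getS3InfoStrict(string $url)
{
    // Group 1: bucket (path style); group 2: bucket (subdomain style); group 3: resource.
    $pattern = '#://(?:s3\.amazonaws\.com/([^/]+)|([^/]+)\.s3\.amazonaws\.com)/(.+)#';
    if (!preg_match($pattern, $url, $a, PREG_UNMATCHED_AS_NULL)) {
        return false;
    }
    // With the flag set, the group from the alternative that was not taken
    // is null, so ?? falls through to the group that actually matched.
    $bucket = $a[2] ?? $a[1];
    return array('bucket' => $bucket, 'resource' => $a[3]);
}
```

Calling getS3InfoStrict() with any of the four URL forms from the question should return an array with 'bucket' and 'resource' keys, and false for a URL that is not on s3.amazonaws.com.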
Maybe these can be combined. Take it as inspiration:
First case:
^https?://s3\.amazonaws\.com\/([^/]+)/(.+)$
Second case:
^https?://([^/]+)\.s3\.amazonaws\.com\/(.+)$
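Rather than merging the two expressions into one, they can also simply be tried in turn, which keeps the group numbering identical in both cases. A rough sketch under that assumption; the function name parseS3Url is hypothetical, and # is used as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical helper: tries the two patterns above one after the other.
function parseS3Url(string $url)
{
    $patterns = array(
        // First case: the bucket is the first path segment.
        '#^https?://s3\.amazonaws\.com/([^/]+)/(.+)$#',
        // Second case: the bucket is the subdomain before .s3.amazonaws.com.
        '#^https?://([^/]+)\.s3\.amazonaws\.com/(.+)$#',
    );
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $url, $m)) {
            // In both patterns group 1 is the bucket and group 2 the resource.
            return array('bucket' => $m[1], 'resource' => $m[2]);
        }
    }
    return false; // neither URL style matched
}
```

Because the resource is captured with (.+), a name such as dir/file name.txt comes back whole, slashes and spaces included. If the input is percent-encoded (for example %20 for a space), the caller would still need to decode it, for instance with rawurldecode().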