preg_replace to replace cid: and append text to the start and end of src

I have an HTML string. For the purposes of this question, let's say the string is:

<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">dflkjdslkjdsfldskfjdlfkjdlfksdjfflkdsjfdlkdfdjflkdfjdlkjfkdlfjdljfldjfldjflkdjjfkd<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">hkjhkhkhkhkhkjhjkhhkjhkjhkjhkjhjkhkjhkjhkhkjhkjhjkhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjh<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" alt="Variety 008 (893 x 799) (223 x 200)" height="200" width="223">dsjhfdsjfdjflsjflkjdflkjffldskjfdljdlfkjflkdjflkdjfdslkjfkds

Now let's look at the part of the string I need to work on. This is what Gmail saves the image name as inside src="":

cid:image001.jpg@01D05CBF.CF7A44B0

The class I use downloads the attachment and saves it under a name built like this:

$cid = 'image001.jpg@01D05CBF.CF7A44B0'; // stored without the "cid:" prefix, as in the saved name below
$filename = $mail_id . '_' . $cid . '_' . $image_id;

So the actual image name is something like this: 308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg

Now my aim is to replace all of these occurrences:

cid:image001.jpg@01D05CBF.CF7A44B0

with

attachments/308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg

Essentially: strip out the cid: prefix, prepend attachments/, $mail_id and _ to the start of the string, and append _image001.jpg to the end.
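The transformation for a single src value can be sketched as a small helper. The function name cid_to_attachment_path is hypothetical; it assumes the saved files follow the attachments/{mail_id}_{content-id}_{image name} scheme described above, with the image name being the text before the @.

```php
<?php
// Hypothetical helper: map one "cid:" src value to the saved attachment path,
// assuming files are stored as attachments/{mail_id}_{content-id}_{image name}.
function cid_to_attachment_path(string $src, string $mail_id): string
{
    // Leave anything that is not a cid: reference untouched.
    if (strncmp($src, 'cid:', 4) !== 0) {
        return $src;
    }

    // Strip the "cid:" prefix, e.g. "image001.jpg@01D05CBF.CF7A44B0".
    $content_id = substr($src, 4);

    // The image name is everything before the "@", e.g. "image001.jpg".
    $image_name = explode('@', $content_id, 2)[0];

    return 'attachments/' . $mail_id . '_' . $content_id . '_' . $image_name;
}

echo cid_to_attachment_path('cid:image001.jpg@01D05CBF.CF7A44B0', '308907'), "\n";
// attachments/308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg
```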

Keep in mind that the HTML string may contain many of these embedded cid: sources.
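To check how many cid: sources an email actually contains, preg_match_all() can list them. The HTML below is a shortened stand-in (the second image name is made up for illustration), and [^"]+ keeps each match inside one quoted attribute.

```php
<?php
// Shortened stand-in for the real email body.
$html = '<img src="cid:image001.jpg@01D05CBF.CF7A44B0">text'
      . '<img src="cid:image002.png@01D05CBF.CF7A44B0">more text';

// Group 1 captures the Content-ID without the "cid:" prefix.
$count = preg_match_all('/src="cid:([^"]+)"/', $html, $matches);

echo $count, "\n"; // number of cid: sources found
foreach ($matches[1] as $content_id) {
    echo $content_id, "\n";
}
```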

I'm not very good with regex, so I'm doing this in baby steps. First I'm trying to figure out how to replace cid:image001.jpg@01D05CBF.CF7A44B0 with attachments/308907_image001.jpg@01D05CBF.CF7A44B0, and then I'll try to figure out how to append _image001.jpg to the end.

I managed to build a regex that matches the whole image tag, and running it in http://www.regexr.com/ shows the cid: value highlighted in element [1].

I tried something like the following, but it just returns an empty string. The logic seems to work in the regex tool, so I can't figure out why it isn't working. Maybe it's because the regex has three groups and I need to access element [1] to get the cid: value; I'm not sure:

$string = preg_replace('/(<img\b\s+.*?src=\")(.*?cid:.*?)(\">)/g', 'attachments/'.$mail_id.'_', $html);
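A likely reason for the empty string: PHP's preg_* functions have no g modifier, because preg_replace() already replaces every match unless a $limit is passed. With /g, PHP emits the warning "Unknown modifier 'g'" and preg_replace() returns null, which prints as an empty string. The sketch below drops the g and uses a capture group so that only the literal cid: is replaced; $mail_id and the shortened $html are stand-ins for the real values.

```php
<?php
$mail_id = '308907';
$html    = '<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" height="200">';

// A trailing "g" is not a valid PHP modifier: preg_replace() warns and returns null.
var_dump(@preg_replace('/cid:/g', '', 'cid:x')); // NULL

// No "g" needed: all matches are replaced by default.
// Group 1 captures everything from "<img" up to and including src=",
// and ${1} writes it back, so only the "cid:" after it is replaced.
$result = preg_replace('/(<img\b[^>]*?\bsrc=")cid:/', '${1}attachments/' . $mail_id . '_', $html);

echo $result, "\n";
```

Because group 1 is written back unchanged, the image001.jpg@01D05CBF.CF7A44B0 part after cid: is left as it was.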

But the problem here is that I only need to replace cid: with attachments/308907_; I don't want to replace the image001.jpg@01D05CBF.CF7A44B0 part.

I am also not sure of the best way to append _image001.jpg at the end. If it were just one replacement, I could do something like this:

$current_image_name = 'attachments/308907_image001.jpg@01D05CBF.CF7A44B0';
$new_image_name = 'attachments/308907_image001.jpg@01D05CBF.CF7A44B0_image001.jpg';

// str_replace() returns the new string, so assign the result back
$html = str_replace($current_image_name, $new_image_name, $html);

But because there could be lots of these in an email, I don't think that approach will work, and it may perform poorly since some emails can be large.

Ideally there would be a way to do the append in the same preg_replace call rather than making extra calls afterwards.
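One possible direction, sketched under assumptions: preg_replace_callback() can strip, prepend and append in a single pass over the HTML, so no follow-up str_replace() calls are needed. The function name rewrite_cid_sources is hypothetical, the attachments/{mail_id}_{content-id}_{image name} scheme is taken from the question, and the pattern assumes src values are double-quoted and contain an @. Whether this is measurably faster on large emails has not been benchmarked here.

```php
<?php
// Hypothetical helper: rewrite every src="cid:..." in one pass.
function rewrite_cid_sources(string $html, string $mail_id): string
{
    return preg_replace_callback(
        // Group 1: Content-ID without "cid:"; group 2: image name (text before "@").
        '/src="cid:(([^"@]+)@[^"]*)"/',
        function (array $m) use ($mail_id): string {
            return 'src="attachments/' . $mail_id . '_' . $m[1] . '_' . $m[2] . '"';
        },
        $html
    );
}

$html = '<img src="cid:image001.jpg@01D05CBF.CF7A44B0">a'
      . '<img src="cid:image001.jpg@01D05CBF.CF7A44B0">b';
$out = rewrite_cid_sources($html, '308907');
echo $out, "\n";
```

The callback sees each match array $m, so it is also a natural place for extra checks (for example, skipping a Content-ID that has no saved file) if those are needed.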

I'm happy to work out the actual code myself if someone can point me in the right direction and give me some hints on the best way to achieve this.

Try this:

$re = "/src=\\\"cid:(.*?)@(.*?)\\\"/s"; 
$str = "<img id=\"Picture_x0020_1\" src=\"cid:image001.jpg@01D05CBF.CF7A44B0\" alt=\"Variety 008 (893 x 799) (223 x 200)\" height=\"200\" width=\"223\">dflkjdslkjdsfldskfjdlfkjdlfksdjfflkdsjfdlkdfdjflkdfjdlkjfkdlfjdljfldjfldjflkdjjfkd<img id=\"Picture_x0020_1\" src=\"cid:image001.jpg@01D05CBF.CF7A44B0\" alt=\"Variety 008 (893 x 799) (223 x 200)\" height=\"200\" width=\"223\">hkjhkhkhkhkhkjhjkhhkjhkjhkjhkjhjkhkjhkjhkhkjhkjhjkhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjh<img id=\"Picture_x0020_1\" src=\"cid:image001.jpg@01D05CBF.CF7A44B0\" alt=\"Variety 008 (893 x 799) (223 x 200)\" height=\"200\" width=\"223\">dsjhfdsjfdjflsjflkjdflkjffldskjfdljdlfkjflkdjflkdjfdslkjfkds"; 
$subst = "src=\"attachments/".$mail_id."_$1@$2_$1\""; // $1 = image name, $2 = text after "@"

$result = preg_replace($re, $subst, $str);
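A self-contained run of the same approach on a shortened input. The input string is a stand-in, and $mail_id follows the question's variable name. Single quotes keep PHP from trying to interpolate anything, so $1 and $2 reach preg_replace() as backreferences (in a double-quoted string they also survive, because PHP variable names cannot start with a digit).

```php
<?php
$mail_id = '308907';
$re      = '/src="cid:(.*?)@(.*?)"/s';
$str     = '<img id="Picture_x0020_1" src="cid:image001.jpg@01D05CBF.CF7A44B0" height="200">text';

// $1 = image name, $2 = text between "@" and the closing quote.
$subst   = 'src="attachments/' . $mail_id . '_$1@$2_$1"';

$result = preg_replace($re, $subst, $str);
echo $result, "\n";
```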


Update: how the pattern works.

Pattern: /src=\"cid:(.*?)@(.*?)\"/s
- src= matches the characters src= literally
- \" matches the character " literally
- cid: matches the characters cid: literally

Next, we have to capture the image name from the string so that we can prepend and append it in the output string. The image name sits between cid: and @.

Therefore cid:(.*?)@ captures the image name. This is the first capturing group in the pattern, so the image name is stored in $1. If you use preg_match instead, it will be in $match[1].

Then we need the string between @ and the closing ". That is the second capturing group, @(.*?)", which is referred to as $2 in the preg_replace replacement string. The trailing s modifier lets . also match newline characters.

In preg_replace, the whole match is available as $0 and the captured groups as $1, $2 and so on in the replacement string. In preg_match, they are stored in $match[0], $match[1] and so on, where $match is the array (any name you choose) passed as the third argument.
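The preg_match() side of that description, as a small sketch on a single stand-in attribute:

```php
<?php
$src = 'src="cid:image001.jpg@01D05CBF.CF7A44B0"';

// preg_match() fills $match by reference when the pattern matches.
if (preg_match('/src="cid:(.*?)@(.*?)"/s', $src, $match)) {
    echo $match[0], "\n"; // the whole match
    echo $match[1], "\n"; // first group: the image name
    echo $match[2], "\n"; // second group: the text after "@"
}
```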