Regular expression to find images in HTML (golang)

I'm parsing an XML RSS feed from a couple of different sources and I want to find the images in the HTML.

I did some research and found a regex that I think might work:

/<img[^>]+src="?([^"\s]+)"?\s*\/>/g

but I have trouble using it in Go. I get errors because I don't know how to make Go search with that expression.

I tried using it as a string, but it doesn't escape properly with either single or double quotes. I also tried using it bare, exactly as written above, and that gives me an error too.

Any ideas?

Using a proper HTML parser is always better for parsing HTML; however, a cheap, hackish regex can also work fine. Here's an example:

var imgRE = regexp.MustCompile(`<img[^>]+\bsrc=["']([^"']+)["']`)
// If your img tags are well formed with double quotes, use this instead; it's more efficient.
// var imgRE = regexp.MustCompile(`<img[^>]+\bsrc="([^"]+)"`)
func findImages(htm string) []string {
    imgs := imgRE.FindAllStringSubmatch(htm, -1)
    out := make([]string, len(imgs))
    for i := range out {
        out[i] = imgs[i][1]
    }
    return out
}
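
For reference, here is a rough sketch of the pattern from the question itself used in Go. The `/.../g` wrapper is JavaScript regex-literal syntax, so it is dropped; a backtick raw string avoids the quoting trouble, and the `-1` argument to `FindAllStringSubmatch` plays the role of the `g` flag. The names `srcRE` and `imageSources` and the sample input are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// srcRE is the question's pattern without the JavaScript-style /.../g
// delimiters. Inside a backtick raw string the double quotes and the
// backslashes need no extra escaping.
// (srcRE and imageSources are hypothetical names used only in this sketch.)
var srcRE = regexp.MustCompile(`<img[^>]+src="?([^"\s]+)"?\s*/>`)

// imageSources returns the first capture group (the src value) of every match.
// Passing -1 asks for all matches, which is what the /g flag does in JavaScript.
func imageSources(htm string) []string {
	var out []string
	for _, m := range srcRE.FindAllStringSubmatch(htm, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	// Made-up sample input with two self-closing img tags.
	fmt.Println(imageSources(`<p><img class="a" src="one.png" /><img alt="" src="two.jpg"/></p>`))
}
```

Note that this pattern only matches self-closing tags ending in `/>`, so a plain `<img src="x">` is skipped; the `imgRE` pattern above does not have that restriction.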


Sorry, I haven't worked with Go before, but this seems to work. I tried it at https://tour.golang.org/welcome/1.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Sample input with one single-quoted and one double-quoted src attribute.
    myString := `<img src='img1single.jpg'><img src="img2double.jpg">`
    myRegex := regexp.MustCompile(`<img[^>]+\bsrc=["']([^"']+)["']`)
    // -1 returns all matches; index 1 of each match is the captured src value.
    imgTags := myRegex.FindAllStringSubmatch(myString, -1)
    for _, tag := range imgTags {
        fmt.Println(tag[1])
    }
}

I suggest using HtmlAgilityPack (a .NET library) to parse any DOM/XML-like content.

Load the document with:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(sourceHtml); 

Then query it with an XPath expression. A regex is fine, but handling groups and similar issues makes the job more complex.

doc.DocumentNode.SelectSingleNode(XPath here)      

or, for example:

doc.DocumentNode.SelectNodes("//img")  // this should return all img tags

I suggest this because the RSS feed seems to serve some HTML content. So fetch the XML, parse it with an XML document parser, pull out the HTML content you need, and then get all the images from it this way. I'm leaving this here as a more general answer.
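
Since the question is about Go, the same two-step idea (decode the XML first, then look for images in the embedded HTML) could be sketched roughly like this with the standard `encoding/xml` and `regexp` packages. This assumes the HTML lives in each item's `description` element, which varies between feeds; the `rss` struct, `feedImages`, and `sampleFeed` are hypothetical names and data for illustration only.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"regexp"
)

// rss mirrors only the parts of an RSS 2.0 document used in this sketch.
// Assumption: the HTML is carried in <description>; some feeds use other elements.
type rss struct {
	Items []struct {
		Title       string `xml:"title"`
		Description string `xml:"description"`
	} `xml:"channel>item"`
}

// Same pattern as in the regex answer above.
var imgRE = regexp.MustCompile(`<img[^>]+\bsrc=["']([^"']+)["']`)

// feedImages decodes the feed and returns every img src found in the
// item descriptions, in document order.
func feedImages(feed []byte) ([]string, error) {
	var doc rss
	if err := xml.Unmarshal(feed, &doc); err != nil {
		return nil, err
	}
	var out []string
	for _, it := range doc.Items {
		for _, m := range imgRE.FindAllStringSubmatch(it.Description, -1) {
			out = append(out, m[1])
		}
	}
	return out, nil
}

// sampleFeed is made-up data: one item uses CDATA, the other escaped HTML.
const sampleFeed = `<?xml version="1.0"?>
<rss version="2.0"><channel>
<item><title>First</title><description><![CDATA[<p><img src="a.png"></p>]]></description></item>
<item><title>Second</title><description>&lt;img alt='x' src='b.jpg'&gt;</description></item>
</channel></rss>`

func main() {
	imgs, err := feedImages([]byte(sampleFeed))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(imgs)
}
```

Decoding the XML first means CDATA sections and escaped entities such as `&lt;` are already turned back into plain HTML before the regex runs.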

Edit after the comment: if you just need a regex, my pattern is

 <img.+?src=[\"'](.+?)[\"'].*?>

for the input

<img src='img1single.jpg'>
<img src="img2double.jpg">

and the result seems fine. In .NET, you read each captured value in a foreach loop over the matches via

.Groups[1].Value

regards.