I am getting this type of response from the URL that I am hitting, and I need to parse it to get the desired HTML:
this=ajax({"htmlInfo":"SOME-HTML", "otherInfo": "Blah Blah", "moreInfo": "Bleh Bleh"})
As shown above, the response has three key-value pairs, and I need the value of "htmlInfo" ("SOME-HTML"). How can I get that? The main problem is that "SOME-HTML" contains escape characters. Below is the kind of content that will be present:
\u003Cdiv class=\u0022container columns-2\u0022\u003E \u003Csection class=\u0022col-main\u0022\u003E \u003Cdiv class=\u0027visor-article-list list list-view-recent\u0027 \u003E \u003Cdiv class=\u0027grid_item visor-article-teaser list_default\u0027 \u003E \u003Ca class=\u0027grid_img\u0027 href=\u0027/manUnited-is-the-best\u0027\u003E \u003Cimg src=\u0022http://www.xyz.com/sites//files/styles/w400h22
Can anyone please help me with this? I am not sure how to tackle it.
Thanks in advance.
The easiest way is to extract the JSON and then unmarshal it into a struct. The \uXXXX parts are JSON Unicode escape sequences, which json.Unmarshal decodes into the actual characters for you:
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Data follows the structure of the JSON data in the response
type Data struct {
	HTMLInfo  string `json:"htmlInfo"`
	OtherInfo string `json:"otherInfo"`
	MoreInfo  string `json:"moreInfo"`
}

func main() {
	// input is an example of the raw response data. It's probably a []byte if
	// you got it from ioutil.ReadAll(resp.Body). The literal is split with +
	// for readability only; the \n sequences are JSON newline escapes, because
	// a JSON string may not contain raw line breaks.
	input := []byte(`this=ajax({"htmlInfo":"\u003Cdiv class=\u0022container columns-2\u0022\u003E\n` +
		`\u003Csection class=\u0022col-main\u0022\u003E\n` +
		`\u003Cdiv class=\u0027visor-article-list list list-view-recent\u0027 \u003E\n` +
		`\u003Cdiv class=\u0027grid_item visor-article-teaser list_default\u0027 \u003E\n` +
		`\u003Ca class=\u0027grid_img\u0027 href=\u0027/manUnited-is-the-best\u0027\u003E\n` +
		`\u003Cimg src=\u0022http://example.com/sites//files/styles/w400h22", "otherInfo": "Blah Blah", "moreInfo": "Bleh Bleh"})`)

	// First we want to extract the data json using regex with a capture group.
	// The (?s) flag lets "." match newlines too, in case the real response
	// spans several lines.
	dataRegex, err := regexp.Compile(`(?s)ajax\((.*)\)`)
	if err != nil {
		fmt.Println("regex failed to compile:", err)
		return
	}

	// FindSubmatch should return two matches:
	// 0: The full match
	// 1: The contents of the capture group (what we want)
	matches := dataRegex.FindSubmatch(input)
	if len(matches) != 2 {
		fmt.Println("incorrect number of match results:", len(matches))
		return
	}
	dataJSON := matches[1]

	// Since the data is in JSON format, we can unmarshal it into a struct. If
	// you don't care at all about the fields other than "htmlInfo", you can
	// omit them from the struct.
	data := &Data{}
	if err := json.Unmarshal(dataJSON, data); err != nil {
		fmt.Println("failed to unmarshal data json:", err)
		return
	}

	// You now have access to the "htmlInfo" property
	fmt.Println("HTML INFO:", data.HTMLInfo)
}
This will produce:
HTML INFO: <div class="container columns-2">
<section class="col-main">
<div class='visor-article-list list list-view-recent' >
<div class='grid_item visor-article-teaser list_default' >
<a class='grid_img' href='/manUnited-is-the-best'>
<img src="http://example.com/sites//files/styles/w400h22
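If you would rather not use a regex, the same extraction can be sketched with plain byte searches. This is only a minimal sketch: it assumes the response always wraps the JSON in a single pair of parentheses, as in this=ajax({...}), and the names payload and extractHTML are made up for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// payload (hypothetical name) keeps only the field we care about;
// json.Unmarshal ignores JSON keys that have no matching struct field.
type payload struct {
	HTMLInfo string `json:"htmlInfo"`
}

// extractHTML (hypothetical helper) strips the wrapper around the JSON by
// taking everything between the first '(' and the last ')', then decodes it.
func extractHTML(raw []byte) (string, error) {
	start := bytes.IndexByte(raw, '(')
	end := bytes.LastIndexByte(raw, ')')
	if start < 0 || end <= start {
		return "", errors.New("no parenthesised payload found")
	}

	var p payload
	if err := json.Unmarshal(raw[start+1:end], &p); err != nil {
		return "", fmt.Errorf("decoding payload: %w", err)
	}
	return p.HTMLInfo, nil
}

func main() {
	// Shortened sample input, not the real response.
	raw := []byte(`this=ajax({"htmlInfo":"\u003Cp class=\u0022x\u0022\u003Ehi\u003C/p\u003E", "otherInfo": "Blah Blah"})`)

	html, err := extractHTML(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(html) // prints: <p class="x">hi</p>
}
```

Searching for the last ')' rather than the first means parentheses inside the HTML itself do not cut the payload short.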