Splitting an element on line breaks with GoQuery

I'm trying to get content from a page with GoQuery, but for some reason I can't split it on line breaks (br).

The HTML looks like this:

<ul>
    <li>I'm skipped</li>

    <li> 
        Text Into  - <p>Whatever</p>
        <p>
            Line 1<br />
            Line 2<br />
            Line 3<br />
            Line 4<br />
            Line N
        </p>
    </li> 
</ul>

Go code:

doc, err := goquery.NewDocumentFromReader(res.Body)
if err != nil {
    panic(err)
}

doc.Find("ul").Each(func(i int, s *goquery.Selection) {

    str := s.Find("li p").Next().Text()

    fmt.Println(str, "--")

})

For some reason I'm not able to get each line in the p tag, separated by a break, as a single item. The output of the code above is:

Line1Line2Line3Line4LineN--

But the output I'm trying to achieve should look like this:

Line1--
Line2--
Line3--
Line4--
LineN--

Since I'm a Go newbie, please let me know in a comment if something is not clear, and I will try to explain it as well as I can.

Thanks.

From the documentation for .Text():

Text gets the combined text contents of each element in the set of matched elements, including their descendants.

So what you actually want to do is get the contents and then filter out any br tags. As dave's answer states, there are newline characters in there, so I've also trimmed those:

package main

import (
    "fmt"
    "github.com/PuerkitoBio/goquery"
    "strings"
)

var input string = `
<ul>
    <li>I'm skipped</li>

    <li> 
        Text Into  - <p>Whatever</p>
        <p>
            Line 1<br />
            Line 2<br />
            Line 3<br />
            Line 4<br />
            Line N
        </p>
    </li> 
</ul>
`

func main() {
    doc, err := goquery.NewDocumentFromReader(strings.NewReader(input))
    if err != nil {
        panic(err)
    }

    doc.Find("ul").Each(func(i int, s *goquery.Selection) {

        p := s.Find("li p").Next()
        p.Contents().Each(func(i int, s *goquery.Selection) {
            if !s.Is("br") {
                fmt.Println(strings.TrimSpace(s.Text()), "--")
            }

        })

    })
}

Produces:

Line 1 --
Line 2 --
Line 3 --
Line 4 --
Line N --

Okay, I managed to find one solution. I'm not sure if it's the right way to go, so if someone has something better, please share it.

Basically, I store the contents of li p as HTML and then use strings.Split to break it on each br tag. Since strings.Split returns a slice of strings, I just loop over it.

title, err := s.Find("li p").Next().Html()
if err != nil {
    panic(err)
}

splittedTitles := strings.Split(title, "<br/>")

for _, str := range splittedTitles {
    fmt.Println(str, "--")
}
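
If the markup might contain `<br>` or `<br />` as well as `<br/>`, a regular expression split is a bit more forgiving than a fixed separator. Here is a minimal sketch using only the standard library. It assumes the input is the string returned by Html(); splitOnBr and brTag are hypothetical names, not part of goquery. It also trims the whitespace around each piece and drops empty ones:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// brTag matches <br>, <br/> and <br /> in any letter case.
// It does not match br tags that carry attributes.
var brTag = regexp.MustCompile(`(?i)<br\s*/?>`)

// splitOnBr is a hypothetical helper: it splits an HTML fragment on br tags
// and returns the trimmed, non-empty pieces.
func splitOnBr(fragment string) []string {
	var lines []string
	for _, part := range brTag.Split(fragment, -1) {
		if trimmed := strings.TrimSpace(part); trimmed != "" {
			lines = append(lines, trimmed)
		}
	}
	return lines
}

func main() {
	// Stand-in for the string returned by s.Find("li p").Next().Html().
	fragment := "\n    Line 1<br/>\n    Line 2<br />\n    Line 3<br>\n    Line N\n"
	for _, line := range splitOnBr(fragment) {
		fmt.Println(line, "--")
	}
}
```

Note that the pieces are still HTML, so entities such as `&amp;` stay escaped. If that matters, filtering the node contents is probably the safer route.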

I ran the code you showed, and I am getting newlines in the string. Assuming you are on the latest version of goquery, you should be as well, unless your HTML is not

<p>
    Line 1<br />
    Line 2<br />
    Line 3<br />
    Line 4<br />
    Line N
</p>

but is actually something like:

<p>
    Line 1<br />Line 2<br />Line 3<br />Line 4<br />Line N
</p>

(Keep in mind that when you open Chrome dev tools, for example, they probably display it as the former, even if the actual source is the latter.)

In that case, this is the expected behaviour, as the same test in jQuery shows:

let html_1 = $(`<p>
        Line 1<br />
        Line 2<br />
        Line 3<br />
        Line 4<br />
        Line N
    </p>`);

let html_2 = $(`<p>
        Line 1<br />Line 2<br />Line 3<br />Line 4<br />Line N
    </p>`);
    
console.log({html1: html_1.text(), html2: html_2.text()});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

To resolve, you could probably just do:

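p := s.Find("li p").Next()
html, err := p.Html() // Html returns (string, error), so it cannot be used inline
if err != nil {
    panic(err)
}
text := p.SetHtml(strings.Replace(html, "<br/>", "<br/>\n", -1)).Text()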

You may have to experiment with <br/>, <br /> or <br>, though, as I'm not sure how it will be rendered.


I think it would be better to just replace <br/> with ' ' or '--' before you call the .Text() method.

    // html is the result of `.Html()` method
    str := strings.Replace(html, "<br/>", "\n", -1)
    doc, err := goquery.NewDocumentFromReader(strings.NewReader(str))
    if err != nil {
        return ""
    }
    return doc.Text()
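
Once the breaks are newlines, the returned text can be split into individual items. Here is a rough sketch with only the standard library; nonEmptyLines is a hypothetical helper name, and the sample string just stands in for whatever doc.Text() returns:

```go
package main

import (
	"fmt"
	"strings"
)

// nonEmptyLines is a hypothetical helper: it splits text on newlines,
// trims surrounding whitespace, and drops lines that end up empty.
func nonEmptyLines(text string) []string {
	var lines []string
	for _, line := range strings.Split(text, "\n") {
		if trimmed := strings.TrimSpace(line); trimmed != "" {
			lines = append(lines, trimmed)
		}
	}
	return lines
}

func main() {
	// Stand-in for the value returned by doc.Text() after the <br/> replacement.
	text := "\n    Line 1\n\n    Line 2\n\n    Line N\n"
	for _, line := range nonEmptyLines(text) {
		fmt.Println(line, "--")
	}
}
```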