How to escape CSS

The text stored in the database also includes CSS styling:

<p>ABC&nbsp;&nbsp;| Min. XYZ&nbsp;
<style type="text/css"><!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}-->
</style>
<span data-sheets-userformat="{&quot;2&quot;:3011,&quot;3&quot;:{&quot;1&quot;:0},&quot;4&quot;:[null,2,16777215],&quot;9&quot;:1,&quot;10&quot;:1,&quot;11&quot;:4,&quot;12&quot;:0,&quot;14&quot;:[null,2,0]}" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;PQR&quot;}" style="font-size: 10pt; font-family: Arial; color: rgb(0, 0, 0); text-align: center;">PQR</span></p>

To get rid of &nbsp; I used html.UnescapeString(), and that part works fine.

When the text is fetched from the database, I want to display it in this format: ABC | Min. XYZ PQR

But the actual result (after html.UnescapeString()) is:

ABC | Min. XYZ
<style type="text/css">
    <!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}-->
</style>
<span data-sheets-userformat="{&quot;2&quot;:3011,&quot;3&quot;:{&quot;1&quot;:0},&quot;4&quot;:[null,2,16777215],&quot;9&quot;:1,&quot;10&quot;:1,&quot;11&quot;:4,&quot;12&quot;:0,&quot;14&quot;:[null,2,0]}" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;PQR&quot;}" style="font-size: 10pt; font-family: Arial; color: rgb(0, 0, 0); text-align: center;">PQR</span></p>

This seems simple, but it requires three steps:

  1. Strip all HTML tags, such as <p> and <style type="text/css">, including the CSS inside <style>
  2. Unescape HTML entities such as &nbsp;
  3. Replace newlines, runs of spaces, and non-breaking spaces (U+00A0) with single spaces
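Steps 2 and 3 depend on each other: html.UnescapeString decodes &nbsp; to the non-breaking space character U+00A0, not to an ordinary space, so the decoded text still has to be normalized. Below is a minimal standard-library sketch of just those two steps, assuming the tags have already been stripped; unescapeAndCondense is a hypothetical name, not part of any library.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// unescapeAndCondense is a hypothetical helper for steps 2 and 3 only;
// it assumes the HTML tags have already been stripped (step 1).
func unescapeAndCondense(s string) string {
	// Step 2: &nbsp; decodes to U+00A0 (non-breaking space), not to " ".
	s = html.UnescapeString(s)
	// Step 3: strings.Fields splits on unicode.IsSpace, which includes U+00A0,
	// newlines and ASCII spaces; rejoining with " " leaves single spaces.
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	fmt.Println(html.UnescapeString("&nbsp;") == "\u00a0")                    // true
	fmt.Println(unescapeAndCondense("ABC&nbsp;&nbsp;| Min. XYZ&nbsp;\n\nPQR")) // ABC | Min. XYZ PQR
}
```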

You can do this with github.com/microcosm-cc/bluemonday and the standard html and strings packages:

package main

import (
    "fmt"
    "html"
    "strings"

    "github.com/microcosm-cc/bluemonday"
)

func main() {
    // Your input text
    input := `<p>ABC&nbsp;&nbsp;| Min. XYZ&nbsp;
<style type="text/css"><!--td {border: 1px solid #ccc;}br {mso-data-placement:same-cell;}-->
</style>
<span data-sheets-userformat="{&quot;2&quot;:3011,&quot;3&quot;:{&quot;1&quot;:0},&quot;4&quot;:[null,2,16777215],&quot;9&quot;:1,&quot;10&quot;:1,&quot;11&quot;:4,&quot;12&quot;:0,&quot;14&quot;:[null,2,0]}" data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;PQR&quot;}" style="font-size: 10pt; font-family: Arial; color: rgb(0, 0, 0); text-align: center;">PQR</span></p>`

    // Strip all HTML tags; bluemonday also drops the contents of <style> elements
    p := bluemonday.StrictPolicy()
    output := p.Sanitize(input)

    // Unescape HTML entities such as &nbsp; (which becomes U+00A0)
    output = html.UnescapeString(output)

    // Condense whitespace: strings.Fields splits on any Unicode space, including U+00A0
    output = strings.Join(strings.Fields(strings.TrimSpace(output)), " ")

    fmt.Println(output)
}

output is now ABC | Min. XYZ PQR

For the last step, strings.Fields is cleaner than a regexp. In Go's regexp syntax, \s matches only ASCII whitespace ([\t\n\f\r ]), not non-breaking spaces (U+00A0), so a regexp version also needs \p{Zs}:

// Leading and trailing spaces
output = regexp.MustCompile(`^[\s\p{Zs}]+|[\s\p{Zs}]+$`).ReplaceAllString(output, "")
// Middle spaces: collapse every run (even a single newline or U+00A0) to one ASCII space
output = regexp.MustCompile(`[\s\p{Zs}]+`).ReplaceAllString(output, " ")

See more on matching whitespace here: How to remove redundant spaces/whitespace from a string in Golang?

Finally, the steps above can be combined into a single function, as in github.com/grokify/gotilla/html/htmlutil:

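package htmlutil

import (
    "html"
    "strings"

    "github.com/microcosm-cc/bluemonday"
)

var bluemondayStrictPolicy = bluemonday.StrictPolicy()

// HTMLToTextCondensed strips HTML tags (and <style> contents), unescapes
// HTML entities, and condenses all whitespace to single spaces.
func HTMLToTextCondensed(s string) string {
    return strings.Join(
        strings.Fields(
            strings.TrimSpace(
                html.UnescapeString(
                    bluemondayStrictPolicy.Sanitize(s),
                ),
            ),
        ),
        " ",
    )
}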
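If adding bluemonday as a dependency is not an option, the same three steps can be roughly approximated with the standard library alone. This swaps in a different technique (regular expressions, not an HTML parser), so treat it as a sketch under assumptions: the input is well-formed, no > appears inside attribute values, and <style>/<script> elements are not nested or left unclosed. naiveHTMLToText and the variable names are hypothetical.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

var (
	// Whole <style>/<script> elements, including their contents.
	// (?is): case-insensitive, and . also matches newlines.
	skipElems = regexp.MustCompile(`(?is)<style\b.*?</style\s*>|<script\b.*?</script\s*>`)
	// Any remaining tag; assumes no '>' appears inside attribute values.
	anyTag = regexp.MustCompile(`<[^>]*>`)
)

// naiveHTMLToText is a hypothetical, regexp-based approximation of
// bluemonday.StrictPolicy + html.UnescapeString + strings.Fields.
func naiveHTMLToText(s string) string {
	s = skipElems.ReplaceAllString(s, "")
	s = anyTag.ReplaceAllString(s, "")
	// Unescape only after stripping, so "&lt;p&gt;" stays visible text.
	s = html.UnescapeString(s)
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	input := "<p>ABC&nbsp;&nbsp;| Min. XYZ&nbsp;\n" +
		"<style type=\"text/css\"><!--td {border: 1px solid #ccc;}-->\n</style>\n" +
		"<span style=\"font-size: 10pt;\">PQR</span></p>"
	fmt.Println(naiveHTMLToText(input)) // ABC | Min. XYZ PQR
}
```

Where the input can contain arbitrary HTML, the bluemonday version is the safer choice, since it uses a real HTML tokenizer.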