I have a string like the one below. What is the best way to strip the HTML
content from it?
s := "<b>John</b> Thank you."
The result should be Thank you.
The simplest way to slice the string is to search for a specific substring (in your case "</b>")
and split on it, keeping the part that follows.
Example:
package main

import (
	"fmt"
	"strings"
)

func main() {
	html := "<b>John</b> Thank you."
	fmt.Println(html)
	// Keep everything after the closing tag; index 1 panics if "</b>" is absent.
	thanks := strings.Split(html, "</b>")[1]
	fmt.Println(thanks)
}
Result: Thank you.
Playground link: https://play.golang.org/p/yOc3G0YeNTe
Also consider strings.TrimSpace
to remove the leading space that is left after the split:
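package main

import (
	"fmt"
	"strings"
)

func main() {
	html := "<b>John</b> Thank you."
	fmt.Println(html)
	// Keep everything after the closing tag; index 1 panics if "</b>" is absent.
	thanks := strings.Split(html, "</b>")[1]
	fmt.Println(thanks)
	// Trim the leading and trailing whitespace from the remainder.
	cleanThanks := strings.TrimSpace(thanks)
	fmt.Println(cleanThanks)
}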
Result:
Thank you.
Thank you.
Playground link: https://play.golang.org/p/S7BRM7jOvtL
Note that you should verify that the string contains "</b>" before indexing the result;
if it does not, you will get panic: runtime error: index out of range
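One way to guard against that panic is to check how many pieces the split produced before indexing. The following is a minimal sketch, not the only way to do it; textAfter is a hypothetical helper name introduced here for illustration, and it uses strings.SplitN so the input is split only at the first "</b>".

```go
package main

import (
	"fmt"
	"strings"
)

// textAfter (hypothetical helper) returns the trimmed text that follows the
// first occurrence of sep. The boolean is false when sep is absent, so the
// caller never indexes past the end of the slice.
func textAfter(s, sep string) (string, bool) {
	parts := strings.SplitN(s, sep, 2)
	if len(parts) < 2 {
		return "", false
	}
	return strings.TrimSpace(parts[1]), true
}

func main() {
	for _, in := range []string{"<b>John</b> Thank you.", "no markup here"} {
		if text, ok := textAfter(in, "</b>"); ok {
			fmt.Printf("%q -> %q\n", in, text)
		} else {
			fmt.Printf("%q does not contain </b>\n", in)
		}
	}
}
```

Returning a boolean alongside the string lets the caller decide what a missing tag means, instead of crashing at the index expression.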
First of all, please resist using regular expressions. Bad things can happen.
On a more serious note, if you can't trust the HTML content, I suggest using something like bluemonday, which is the kind of sanitizer you would want in production.
For a simpler approach that gets something working quickly, you could either use another library such as grokify/html-strip-tags-go, which will suit your needs, or, as in Eitam's answer, roll your own by splitting the string.
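If you do roll your own, a slightly more general variant than splitting on one known tag is to skip everything between '<' and '>'. This is only a naive sketch using the standard library: stripTags is a hypothetical name, and the function does not understand comments, script or style bodies, entities, or a '>' inside an attribute value, so it is not a substitute for a sanitizer on untrusted input. Note also that it keeps the text inside the tags, so the example string yields "John Thank you." rather than "Thank you.".

```go
package main

import (
	"fmt"
	"strings"
)

// stripTags (hypothetical helper) drops every character from '<' up to and
// including the next '>' and keeps everything else.
// Naive sketch only: not safe for untrusted HTML.
func stripTags(s string) string {
	var b strings.Builder
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case r == '>' && inTag:
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return strings.TrimSpace(b.String())
}

func main() {
	fmt.Println(stripTags("<b>John</b> Thank you."))
}
```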
Best of luck!