I'm stress testing (with loader.io) this type of code in Go to create an array of 100 items along with some other basic variables and parse them all in the template:
package main
import (
"html/template"
"net/http"
)
var templates map[string]*template.Template
// Load templates on program initialisation
func init() {
if templates == nil {
templates = make(map[string]*template.Template)
}
templates["index.html"] = template.Must(template.ParseFiles("index.html"))
}
func handler(w http.ResponseWriter, r *http.Request) {
type Post struct {
Id int
Title, Content string
}
var Posts [100]Post
// Fill posts
for i := 0; i < 100; i++ {
Posts[i] = Post{i, "Sample Title", "Lorem Ipsum Dolor Sit Amet"}
}
type Page struct {
Title, Subtitle string
Posts [100]Post
}
var p Page
p.Title = "Index Page of My Super Blog"
p.Subtitle = "A blog about everything"
p.Posts = Posts
tmpl := templates["index.html"]
tmpl.ExecuteTemplate(w, "index.html", p)
}
func main() {
http.HandleFunc("/", handler)
http.ListenAndServe(":8888", nil)
}
My test with Loader is using 5k concurrent connections/s through 1 minute. The problem is, just after a few seconds after starting the test, I get a high average latency (almost 10s) and as a result 5k successful responses and the test stops because it reaches the Error Rate of 50% (timeouts).
On the same machine, PHP gives 50k+.
I understand that it's not Go performance issue, but probably something related to html/template. Go can easily manage hard enough calculations a lot faster than anything like PHP of course, but when it comes to parsing the data to the template, why is it so awful?
Any workarounds, or probably I'm just doing it wrong (I'm new to Go)?
P.S. Actually even with 1 item it's exactly the same... 5-6k and stopping after huge amount of timeouts. But that's probably because the array with posts is staying of the same length.
My template code (index.html):
{{ .Title }}
{{ .Subtitle }}
{{ range .Posts }}
{{ .Title }}
{{ .Content }}
{{ end }}
Here's the profiling result of github.com/pkg/profile:
root@Test:~# go tool pprof app /tmp/profile311243501/cpu.pprof
Possible precedence issue with control flow operator at /usr/lib/go/pkg/tool/linux_amd64/pprof line 3008.
Welcome to pprof! For help, type 'help'.
(pprof) top10
Total: 2054 samples
97 4.7% 4.7% 726 35.3% reflect.Value.call
89 4.3% 9.1% 278 13.5% runtime.mallocgc
85 4.1% 13.2% 86 4.2% syscall.Syscall
66 3.2% 16.4% 75 3.7% runtime.MSpan_Sweep
58 2.8% 19.2% 1842 89.7% text/template.(*state).walk
54 2.6% 21.9% 928 45.2% text/template.(*state).evalCall
51 2.5% 24.3% 53 2.6% settype
47 2.3% 26.6% 47 2.3% runtime.stringiter2
44 2.1% 28.8% 149 7.3% runtime.makeslice
40 1.9% 30.7% 223 10.9% text/template.(*state).evalField
These are profiling results after refining the code (as suggested in the answer by icza):
root@Test:~# go tool pprof app /tmp/profile501566907/cpu.pprof
Possible precedence issue with control flow operator at /usr/lib/go/pkg/tool/linux_amd64/pprof line 3008.
Welcome to pprof! For help, type 'help'.
(pprof) top10
Total: 2811 samples
137 4.9% 4.9% 442 15.7% runtime.mallocgc
126 4.5% 9.4% 999 35.5% reflect.Value.call
113 4.0% 13.4% 115 4.1% syscall.Syscall
110 3.9% 17.3% 122 4.3% runtime.MSpan_Sweep
102 3.6% 20.9% 2561 91.1% text/template.(*state).walk
74 2.6% 23.6% 337 12.0% text/template.(*state).evalField
68 2.4% 26.0% 72 2.6% settype
66 2.3% 28.3% 1279 45.5% text/template.(*state).evalCall
65 2.3% 30.6% 226 8.0% runtime.makeslice
57 2.0% 32.7% 57 2.0% runtime.stringiter2
(pprof)
There are two main reasons why the equivalent application using html/template
is slower than PHP variant.
First of all html/template
provides more functionality than the PHP. The main difference is that html/template
will automatically escape variables using correct escaping rules (HTML, JS, CSS, etc) depending on their location in the resulting HTML output (which I think is quite cool!).
Secondly html/template
rendering code heavily uses reflection and methods with variable number of arguments and they are just not as fast as statically compiled code.
Under the hood the following template
{{ .Title }}
{{ .Subtitle }}
{{ range .Posts }}
{{ .Title }}
{{ .Content }}
{{ end }}
is converted to something like
{{ .Title | html_template_htmlescaper }}
{{ .Subtitle | html_template_htmlescaper }}
{{ range .Posts }}
{{ .Title | html_template_htmlescaper }}
{{ .Content | html_template_htmlescaper }}
{{ end }}
Calling html_template_htmlescaper
using reflection in a loop kills performance.
Having said all that this micro-benchmark of html/template
shouldn't be used to decide whether to use Go or not. Once you add code to work with the database to the request handler I suspect that template rendering time will hardly be noticeable.
Also I am pretty sure that over time both Go reflection and html/template
package will become faster.
If in a real application you will find that html/template
is a bottleneck it still possible to switch to text/template
and supply it with already escaped data.
You are working with arrays and structs, both which are non-pointer types, nor are they descriptors (like slices or maps or channels). So passing them always creates a copy of the value, assigning an array value to a variable copies all the elements. This is slow and gives a huge amount of work to the GC.
Also you are utilizing only 1 CPU core. To utilize more, add this to your main()
function:
func main() {
runtime.GOMAXPROCS(runtime.NumCPU())
http.HandleFunc("/", handler)
log.Fatal(http.ListenAndServe(":8888", nil))
}
Edit: This was only the case prior to Go 1.5. Since Go 1.5 runtime.NumCPU()
is the default.
var Posts [100]Post
An array with space for 100 Post
s is allocated.
Posts[i] = Post{i, "Sample Title", "Lorem Ipsum Dolor Sit Amet"}
You create a Post
value with a composite literal, then this value is copied into the i
th element in the array. (redundant)
var p Page
This creates a variable of type Page
. It is a struct
, so its memory is allocated which also contains a field Posts [100]Post
so another array of 100
elements is allocated.
p.Posts = Posts
This copies 100
elements (a hundred structs)!
tmpl.ExecuteTemplate(w, "index.html", p)
This creates a copy of p
(which is of type Page
), so another array of 100
posts is created and elements from p
are copied, then it is passed to ExecuteTemplate()
.
And since Page.Posts
is an array, most likely when it is processed (iterated over in the template engine), a copy will be made from each element (haven't checked - not verified).
Some things to speed up your code:
func handler(w http.ResponseWriter, r *http.Request) {
type Post struct {
Id int
Title, Content string
}
Posts := make([]*Post, 100) // A slice of pointers
// Fill posts
for i := range Posts {
// Initialize pointers: just copies the address of the created struct value
Posts[i]= &Post{i, "Sample Title", "Lorem Ipsum Dolor Sit Amet"}
}
type Page struct {
Title, Subtitle string
Posts []*Post // "Just" a slice type (it's a descriptor)
}
// Create a page, only the Posts slice descriptor is copied
p := Page{"Index Page of My Super Blog", "A blog about everything", Posts}
tmpl := templates["index.html"]
// Only pass the address of p
// Although since Page.Posts is now just a slice, passing by value would also be OK
tmpl.ExecuteTemplate(w, "index.html", &p)
}
Please test this code and report back your results.
PHP isn't answering 5000 requests concurrently. The requests are being multiplexed to a handful of processes for serial execution. This makes more efficient use of both CPU and memory. 5000 concurrent connections may make sense for a message broker or similar, doing limited processing of small pieces of data, but it makes little sense for any service doing real I/O or processing. If your Go app is not behind a proxy of some type that will limit the number of concurrent requests, you will want to do so yourself, perhaps at the beginning of your handler, using a buffered channel or a wait group, a la https://blakemesdag.com/blog/2014/11/12/limiting-go-concurrency/.
html/template
is slow because it uses reflection, which isn't optimized for speed yet.
Try quicktemplate as a workaround of slow html/template
. Currently quicktemplate
is more than 20x faster than html/template
according to the benchmark from its' source code.