I have a toy web app which is very cpu intensive
func PerfServiceHandler(w http.ResponseWriter, req *http.Request)
{
start := time.Now()
w.Header().Set("Content-Type", "application/json")
x := 0
for i := 0; i < 200000000; i++ {
x = x + 1
x = x - 1
}
elapsed := time.Since(start)
w.Write([]byte(fmt.Sprintf("Time Elapsed %s", elapsed)))
}
func main()
{
http.HandleFunc("/perf", PerfServiceHandler)
http.ListenAndServe(":3000", nil)
}
The above function takes about 120 ms to execute. But when I do a load test this app with 500 concurrent users(siege -t30s -i -v -c500 http://localhost:3000/perf) the results I got
Can someone answer my queries below:-
Environment:-
Go - go1.4.1 linux/amd64
OS - Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u2 x86_64 GNU/Linux
Processor - 2.6Ghz (Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz)
RAM - 64 GB
OS Parameters -
nproc - 32
cat /proc/sys/kernel/threads-max - 1031126
ulimit -u - 515563
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 515563
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 515563
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Multiple goroutines can correspond to a single os thread. The design is described here: https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit, which references this paper: http://supertech.csail.mit.edu/papers/steal.pdf.
On to the questions:
Even when 500 concurrent requests arrive at the server the number of OS threads were still stuck at 35 OS threads [...] Can someone explain me this behaviour?
Since you set GOMAXPROCS to the # of CPUs go will only run that many goroutines at a time.
One thing that may be a little confusing is that goroutines aren't always running (sometimes they are "busy"). For example if you read a file, while the OS is doing that work the goroutine is busy and the scheduler will pick up another goroutine to run (assuming there is one). Once the file read is complete that goroutine goes back into the list of "runnable" goroutines.
The creation of OS level threads is handled by the scheduler and there are additional complexities around system-level calls. (Sometimes you need a real, dedicated thread. See: LockOSThread) But you shouldn't expect a ton of threads.
Can the no. of OS threads be increased somehow (from OS or from GOlang)?
I think using LockOSThread
may result in the creation of new threads, but it won't matter:
Will this improve the performance if no. of OS threads are increased?
No. Your CPU is fundamentally limited in how many things it can do at once. Goroutines work because it turns out most operations are IO bound in some way, but if you are truly doing something CPU bound, throwing more threads at the problem won't help. In fact it will probably make it worse, since there is overhead involved in switching between threads.
In other words Go is making the right decision here.
Can someone suggest some other ways of optimizing this app?
for i := 0; i < 200000000; i++ {
x = x + 1
x = x - 1
}
I take it you wrote this code just to make the CPU do a lot of work? What does the actual code look like?
Your best bet will be finding a way to optimize that code so it needs less CPU time. If that's not possible (its already highly optimized), then you will need to add more computers / CPUs to the mix. Get a better computer, or more of them.
For multiple computers you can put a load balancer in front of all your machines and that should scale pretty easily.
You may also benefit by pulling this work off of the webserver and moving it to some backend system. Consider using a work queue.