I have a Go binary that I'm trying to profile, and I'm getting surprising results. The code has the following (truncated) in main.go
, and the rest of the code is in the package monte
:
package main
import (
"monte"
"runtime/pprof"
)
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
flag.Parse()
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
}
monte.ExpensiveOperation()
pprof.StopCPUProfile()
}
I build my executable with go build src/main.go
, then run it with ./main -cpuprofile=monte.prof
. When I examine the output with go tool pprof main monte.prof
, I get the following output:
(pprof) top10 --cum
Total: 346 samples
280 80.9% 80.9% 343 99.1% time.Time.Format
0 0.0% 80.9% 341 98.6% runtime.interhash
0 0.0% 80.9% 341 98.6% runtime.main
0 0.0% 80.9% 251 72.5% strconv.Unquote
13 3.8% 84.7% 31 9.0% strconv.roundShortest
11 3.2% 87.9% 18 5.2% strconv.fmtE
9 2.6% 90.5% 9 2.6% runtime.markallocated
1 0.3% 90.8% 8 2.3% math/rand.Float64
2 0.6% 91.3% 8 2.3% runtime.FixAlloc_Free
7 2.0% 93.4% 8 2.3% time.nextStdChunk
The function that has the largest cumulative time is time.Time.Format
, which seems wrong to me (shouldn't it be main
?) and there's no mention of monte
at all, despite the fact that the "expensive operation" takes about 10 seconds to complete, plenty of time for the sampler to see it. If I pass the --focus=monte
flag to go tool pprof
, no samples are shown at all. I assume I'm missing some flag somewhere; does anyone have any ideas? Thanks!
It looks like a CPU-only profiler, so if your ExpensiveOperation
takes time by doing I/O or sleep or something like that, it will be invisible. (That's the problem with "cpu" profilers.)
In terms of what the numbers mean, it looks like there were 346 total samples. It's not too surprising that the numbers are a bit squishy, depending on how the profiler works, but if it were a true stack sampler the numbers would mean this:
341/346 samples had main
and interhash
on the stack. You would expect all samples to have main on the stack, but that's the squishy part.
343/346 samples had Format
on the stack. (Why there's more of those than main
, who knows?)
251/346 samples had Unquote
on the stack. So of those 251 samples, they also probably had main
, interhash
, and Format
on the stack.
With this kind of detective work, you can slowly begin to piece together what the samples are telling you.
Of course, if you could actually see the stack samples, you wouldn't have to see very many of them before you would know exactly what's going on.
Your main
package doesn't compile and you don't provide source code for your monte
package. Therefore, we can't reproduce your results. which is the first step in debugging.
Here's some working source code and results.
package monte
func ExpensiveOperation() {
var t float64
for i := int64(0); i < 1e10; i++ {
t += 1
}
}
.
package main
import (
"flag"
"log"
"monte"
"os"
"runtime/pprof"
)
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
flag.Parse()
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
}
monte.ExpensiveOperation()
pprof.StopCPUProfile()
}
.
(pprof) top10 --cum
Total: 1166 samples
0 0.0% 0.0% 1166 100.0% main.main
1166 100.0% 100.0% 1166 100.0% monte.ExpensiveOperation
0 0.0% 100.0% 1166 100.0% runtime.main
0 0.0% 100.0% 1166 100.0% schedunlock
UPDATE:
The code at github.com/haldean/monte
doesn't provide meaningful results. It only takes time real 0m0.506s and only takes 48 samples.
(pprof) top10 --cum
Total: 48 samples
0 0.0% 0.0% 45 93.8% main.main
0 0.0% 0.0% 45 93.8% monte.(*Scene).Evaluate
0 0.0% 0.0% 45 93.8% monte.(*Scene).Render
0 0.0% 0.0% 45 93.8% monte.(*Scene).SetColor
0 0.0% 0.0% 45 93.8% runtime.main
0 0.0% 0.0% 45 93.8% schedunlock
0 0.0% 0.0% 44 91.7% monte.(*Scene).RayCast
4 8.3% 8.3% 31 64.6% runtime.new
13 27.1% 35.4% 27 56.2% runtime.mallocgc
3 6.2% 41.7% 26 54.2% monte.(*Scene).DirectionAt