Please see the following pprof session. In treesort.add, line 42, there is an int comparison that appears to account for 64% of all CPU time. In the disassembly, the corresponding instruction is "MOVQ 0x30(SP), DX". Why is it so slow?
File: treesort_bench.test.exe
Type: cpu
Time: Sep 7, 2018 at 3:15pm (EDT)
Duration: 2.60s, Total samples = 2.43s (93.44%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 10
Showing nodes accounting for 2.41s, 99.18% of 2.43s total
Dropped 2 nodes (cum <= 0.01s)
flat flat% sum% cum cum%
2.40s 98.77% 98.77% 2.42s 99.59% gopl.io/ch4/treesort.add
0.01s 0.41% 99.18% 0.02s 0.82% runtime.mallocgc
0 0% 99.18% 0.26s 10.70% gopl.io/ch4/treesort.Sort
0 0% 99.18% 0.25s 10.29% gopl.io/ch4/treesort_bench.BenchmarkSort
0 0% 99.18% 0.26s 10.70% gopl.io/ch4/treesort_bench.run
0 0% 99.18% 0.02s 0.82% runtime.newobject
0 0% 99.18% 0.22s 9.05% testing.(*B).launch
0 0% 99.18% 0.02s 0.82% testing.(*B).run1.func1
0 0% 99.18% 0.25s 10.29% testing.(*B).runN
(pprof) list add
Total: 2.43s
ROUTINE ======================== gopl.io/ch4/treesort.add in go\src\gopl.io\ch4\treesort\sort.go
2.40s 4.45s (flat, cum) 183.13% of Total
. . 30: values = appendValues(values, t.right)
. . 31: }
. . 32: return values
. . 33:}
. . 34:
90ms 90ms 35:func add(t *tree, value int) *tree {
. . 36: if t == nil {
. . 37: // Equivalent to return &tree{value: value}.
. 20ms 38: t = new(tree)
. . 39: t.value = value
. . 40: return t
. . 41: }
1.55s 1.55s 42: flag := value < t.value
. . 43: if flag {
. 240ms 44: t.left = add(t.left, value)
. . 45: } else {
630ms 2.42s 46: t.right = add(t.right, value)
. . 47: }
130ms 130ms 48: return t
. . 49:}
. . 50:
. . 51://!-
(pprof) disasm add
Total: 2.43s
ROUTINE ======================== gopl.io/ch4/treesort.add
2.40s 5.08s (flat, cum) 209.05% of Total
50ms 50ms 4fcb66: MOVQ 0(AX), CX ;gopl.io/ch4/treesort.add sort.go:42
1.48s 1.48s 4fcb69: MOVQ 0x30(SP), DX
20ms 20ms 4fcb6e: CMPQ CX, DX
. . 4fcb71: JGE 0x4fcbbb ;sort.go:43
Why is “MOVQ 0x30(SP), DX” slow?
You have provided insufficient evidence to show that the instruction is slow.
MOVQ (Move Quadword) is an instruction from the Intel 64 and IA-32 architectures instruction set. See the Intel® 64 and IA-32 Architectures Software Developer Manuals.
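The MOVQ 0x30(SP), DX instruction moves 8 bytes from the stack slot at offset 0x30 from SP into the DX register. Since the preceding MOVQ 0(AX), CX loads through the pointer in AX, which is most likely t.value, this stack slot most likely holds the value argument of add.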
Performance measurement, like any other scientific endeavor, relies on reproducible results. You have provided insufficient information to reproduce your results. For example, where is the code for treesort_bench.test.exe? What processor, how much memory, and what operating system are you using?
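As a rough sketch of the environment details that help others reproduce a measurement, a short Go program along these lines prints the Go version, operating system, architecture, and logical CPU count. The helper name envInfo is made up for illustration. It does not report the processor model or memory size, which would still need to be stated by hand.

```go
package main

import (
	"fmt"
	"runtime"
)

// envInfo is a hypothetical helper (not part of the question's code) that
// collects the basic facts a reader needs to compare benchmark results.
func envInfo() map[string]string {
	return map[string]string{
		"go":     runtime.Version(),             // Go toolchain version
		"GOOS":   runtime.GOOS,                  // target operating system
		"GOARCH": runtime.GOARCH,                // target architecture
		"cpus":   fmt.Sprint(runtime.NumCPU()),  // logical CPUs visible to the process
	}
}

func main() {
	info := envInfo()
	// Print in a fixed order so the output is easy to paste into a question.
	for _, k := range []string{"go", "GOOS", "GOARCH", "cpus"} {
		fmt.Printf("%-7s %s\n", k, info[k])
	}
}
```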
I've tried, but I'm unable to reproduce your results. Add your code and the steps to reproduce your results to your question.
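For illustration, a self-contained reproduction might look something like the sketch below. The tree type, add, appendValues, and Sort follow the gopl.io/ch4/treesort code shown in the listing. The benchmark body, the input size, and the random input are assumptions, because the question does not show how treesort_bench generates its data. The input distribution matters: already-sorted input would produce a degenerate, list-like tree and a very different profile from random input.

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

type tree struct {
	value       int
	left, right *tree
}

// add inserts value into the binary tree rooted at t and returns the root.
func add(t *tree, value int) *tree {
	if t == nil {
		t = new(tree)
		t.value = value
		return t
	}
	if value < t.value {
		t.left = add(t.left, value)
	} else {
		t.right = add(t.right, value)
	}
	return t
}

// appendValues appends the elements of t to values in order.
func appendValues(values []int, t *tree) []int {
	if t != nil {
		values = appendValues(values, t.left)
		values = append(values, t.value)
		values = appendValues(values, t.right)
	}
	return values
}

// Sort sorts values in place using an unbalanced binary tree.
func Sort(values []int) {
	var root *tree
	for _, v := range values {
		root = add(root, v)
	}
	appendValues(values[:0], root)
}

// benchmarkSort is an assumed benchmark: 10000 pseudo-random ints with a
// fixed seed, so that every run sorts the same input.
func benchmarkSort(b *testing.B) {
	rng := rand.New(rand.NewSource(1))
	input := make([]int, 10000)
	for i := range input {
		input[i] = rng.Intn(1 << 20)
	}
	buf := make([]int, len(input))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		copy(buf, input) // Sort works in place, so restore the input each time
		Sort(buf)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println(testing.Benchmark(benchmarkSort))
}
```

Posting a program like this, together with the exact commands used to collect the profile, lets others run the same measurement on their own machines.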