I have a function that I would like to provide an assembly implementation for on the amd64 architecture. For the sake of discussion, let's suppose it's an Add function, although the real one is more complicated. I have the assembly version working, but my question concerns getting the godoc to display correctly. I have a feeling this is currently impossible, but I wanted to seek advice.
Some more details:
- The assembly implementation uses specific instructions (BMI2) and therefore can only be used following a CPUID capability check.

The implementation is structured like this gist. At a high level:
- In the generic (non-amd64 case) the function is defined by delegating to addGeneric.
- In the amd64 case the function is actually a variable, initially set to addGeneric but replaced by addAsm in the init function if a cpuid check passes.

This approach works. However, the godoc output is crappy because in the amd64 case the function is actually a variable. Note that godoc appears to pick up the same build tags as the machine it's running on. I'm not sure what godoc.org would do.
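For readers who cannot open the gist, the amd64 structure described above might look roughly like this single-file sketch. It is an illustration under assumptions, not the gist itself: hasBMI2 is a hypothetical placeholder for the CPUID check, and addAsm is given a pure-Go body here so the file runs on its own, whereas in the real package it would be a body-less declaration backed by a .s file.

```go
package main

import "fmt"

// addGeneric is the portable pure-Go implementation.
func addGeneric(x, y uint64) uint64 { return x + y }

// addAsm stands in for the assembly routine. In the real package this would
// be declared without a body and implemented in an amd64 .s file.
func addAsm(x, y uint64) uint64 { return x + y }

// hasBMI2 is a hypothetical placeholder for the CPUID capability check.
// It always reports false here so the sketch has no external dependencies.
func hasBMI2() bool { return false }

// Add returns the sum of x and y.
//
// Because Add is declared with var, godoc lists it under the package's
// variables rather than its functions, which is the problem described above.
var Add = addGeneric

func init() {
	// Swap in the fast implementation only when the CPU supports it.
	if hasBMI2() {
		Add = addAsm
	}
}

func main() {
	fmt.Println(Add(2, 3)) // prints 5
}
```

Callers use Add exactly as they would a function, which is why the approach works at run time even though the documentation suffers.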
Alternatives considered:
- The Add function delegates to addImpl. Then we pull a similar trick to replace addImpl in the amd64 case. The problem with this is that (in my experiments) Go doesn't seem to be able to inline the call, so the assembly is now wrapped in two function calls. Since the assembly is so small already, this has a noticeable impact on performance.
- In the amd64 case we define a plain function Add that has the useAsm check inside it and calls one of addGeneric and addAsm depending on the result. This would have an even worse impact on performance.

So I guess the questions are:
See math.Sqrt for an example of how to do this.
To handle the cpuid check, set a package variable in init() and conditionally jump based on that variable in the assembly implementation.
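The control flow this suggests can be sketched in plain Go. This is only a model of the idea: in the layout described here, Add would be a body-less Go declaration with its doc comment, and the branch on the package variable would be a compare-and-jump inside the amd64 .s file rather than a Go if statement. The names useAsm, hasBMI2, and addFast are assumptions made for illustration.

```go
package main

import "fmt"

// useAsm records whether the fast path may be used. It is set once in init.
var useAsm bool

// hasBMI2 is a hypothetical placeholder for the CPUID capability check.
func hasBMI2() bool { return false }

func init() {
	useAsm = hasBMI2()
}

// addGeneric is the portable fallback implementation.
func addGeneric(x, y uint64) uint64 { return x + y }

// addFast stands in for the BMI2 instruction sequence.
func addFast(x, y uint64) uint64 { return x + y }

// Add returns the sum of x and y.
//
// Add is a real function on every architecture, so godoc documents it as one.
// In the assembly version, this body would live in the .s file: test useAsm,
// jump to the fallback if it is false, otherwise run the fast instructions.
func Add(x, y uint64) uint64 {
	if useAsm {
		return addFast(x, y)
	}
	return addGeneric(x, y)
}

func main() {
	fmt.Println(Add(2, 3)) // prints 5
}
```

Written in Go like this, the sketch resembles the second alternative in the question. The difference in the assembly version is that the check and the fast path sit in the same routine, so no additional Go-level call wraps the assembly.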