We know that passing parameters by pointer can avoid copying data, which benefits performance. Nevertheless, there are always some edge cases we need to be concerned about.
Let’s take this as an example:
|
|
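As a minimal sketch of the kind of code under discussion, the two methods could look like the following. Only the names vec, addv, addp, and the parameter u come from the text; the struct layout (four float64 fields named x, y, z, w) is an assumption made for illustration.

```go
package main

import "fmt"

// vec is a hypothetical 4-component vector. The field names and the
// float64 element type are assumptions, not taken from the article.
type vec struct {
	x, y, z, w float64
}

// addv receives u by value and returns a new vec by value, so both the
// argument and the result are conceptually copied.
func (v vec) addv(u vec) vec {
	return vec{v.x + u.x, v.y + u.y, v.z + u.z, v.w + u.w}
}

// addp receives u by pointer and updates the receiver in place.
func (v *vec) addp(u *vec) *vec {
	v.x, v.y, v.z, v.w = v.x+u.x, v.y+u.y, v.z+u.z, v.w+u.w
	return v
}

func main() {
	a := vec{1, 2, 3, 4}
	b := vec{4, 5, 6, 7}
	fmt.Println(a.addv(b))   // a itself is left unchanged
	fmt.Println(*a.addp(&b)) // a is modified in place
}
```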
Which vector addition runs faster?
Intuitively, we might expect vec.addp to be faster than vec.addv because its parameter u is passed as a pointer, so no data should be copied, whereas vec.addv copies data both when passing the argument and when returning the result.
However, if we do a micro-benchmark:
|
|
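A micro-benchmark along these lines could be sketched as follows. This version is an assumption-laden stand-in, not the article's listing: it uses testing.Benchmark so that it runs as an ordinary program, whereas a benchmark meant for go test would live in a _test.go file as a BenchmarkXxx function. The vec layout, the helper names benchAddv and benchAddp, and the sink variable are all hypothetical.

```go
package main

import (
	"fmt"
	"testing"
)

// vec layout is assumed: four float64 fields.
type vec struct{ x, y, z, w float64 }

func (v vec) addv(u vec) vec {
	return vec{v.x + u.x, v.y + u.y, v.z + u.z, v.w + u.w}
}

func (v *vec) addp(u *vec) *vec {
	v.x, v.y, v.z, v.w = v.x+u.x, v.y+u.y, v.z+u.z, v.w+u.w
	return v
}

// sink keeps the final result alive so the loops are not trivially dead code.
var sink vec

// benchAddv alternates the operands so each iteration depends on the last.
func benchAddv(b *testing.B) {
	v1, v2 := vec{1, 2, 3, 4}, vec{4, 5, 6, 7}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if i%2 == 0 {
			v1 = v1.addv(v2)
		} else {
			v2 = v2.addv(v1)
		}
	}
	sink = v1
}

// benchAddp performs the same work through the pointer-based method.
func benchAddp(b *testing.B) {
	v1, v2 := &vec{1, 2, 3, 4}, &vec{4, 5, 6, 7}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if i%2 == 0 {
			v1 = v1.addp(v2)
		} else {
			v2 = v2.addp(v1)
		}
	}
	sink = *v1
}

func main() {
	// testing.Benchmark picks b.N automatically and returns timing data.
	fmt.Println("addv:", testing.Benchmark(benchAddv))
	fmt.Println("addp:", testing.Benchmark(benchAddp))
}
```

The absolute numbers depend on the machine and the Go version, so they are only meaningful when compared against each other on the same setup.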
And run as follows:
|
|
benchstat will then give you the following result:
|
|
Why is this happening?
Inlining Optimization
This is all because of compiler optimization, and mostly because of inlining.
If we disable inlining for addv and addp:
|
|
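One way to prevent the compiler from inlining a specific function is the //go:noinline directive, placed directly above the function declaration. The sketch below assumes that approach and the same hypothetical four-field vec layout; building with -gcflags=-l would be an alternative that turns inlining off for the whole package.

```go
package main

import "fmt"

// vec layout is assumed: four float64 fields.
type vec struct{ x, y, z, w float64 }

// The directive must sit immediately above the declaration, with no
// blank line in between, and no space after the slashes.
//
//go:noinline
func (v vec) addv(u vec) vec {
	return vec{v.x + u.x, v.y + u.y, v.z + u.z, v.w + u.w}
}

//go:noinline
func (v *vec) addp(u *vec) *vec {
	v.x, v.y, v.z, v.w = v.x+u.x, v.y+u.y, v.z+u.z, v.w+u.w
	return v
}

func main() {
	a := vec{1, 2, 3, 4}
	b := vec{4, 5, 6, 7}
	// Behavior is unchanged; only the generated code differs, because
	// every call now goes through a real function call.
	fmt.Println(a.addv(b))
	fmt.Println(*a.addp(&b))
}
```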
Then run the benchmark again and compare the performance with the previous run:
|
|
The inlining optimization transforms the vec.addv call:
|
|
into a direct assignment statement:
|
|
And in the case of vec.addp, it transforms the call:
|
|
into a direct manipulation:
|
|
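To make the transformation concrete, the sketch below writes out by hand what each call is conceptually replaced with once the method body is inlined. This is an illustration of the idea, not compiler output; the vec layout and the helper names (callAddv, inlinedAddv, callAddp, inlinedAddp) are hypothetical.

```go
package main

import "fmt"

// vec layout is assumed: four float64 fields.
type vec struct{ x, y, z, w float64 }

func (v vec) addv(u vec) vec {
	return vec{v.x + u.x, v.y + u.y, v.z + u.z, v.w + u.w}
}

func (v *vec) addp(u *vec) *vec {
	v.x, v.y, v.z, v.w = v.x+u.x, v.y+u.y, v.z+u.z, v.w+u.w
	return v
}

// callAddv uses the method call as written in the source.
func callAddv(v, u vec) vec {
	v = v.addv(u)
	return v
}

// inlinedAddv is the hand-written equivalent: a direct assignment.
func inlinedAddv(v, u vec) vec {
	v = vec{v.x + u.x, v.y + u.y, v.z + u.z, v.w + u.w}
	return v
}

// callAddp uses the pointer method call as written in the source.
func callAddp(v, u *vec) *vec {
	v = v.addp(u)
	return v
}

// inlinedAddp is the hand-written equivalent: the fields are updated
// directly through the pointer, with no call in between.
func inlinedAddp(v, u *vec) *vec {
	v.x, v.y, v.z, v.w = v.x+u.x, v.y+u.y, v.z+u.z, v.w+u.w
	return v
}

func main() {
	a, b := vec{1, 2, 3, 4}, vec{4, 5, 6, 7}
	fmt.Println(callAddv(a, b) == inlinedAddv(a, b))

	p, q := a, a // separate copies, since addp mutates its receiver
	fmt.Println(*callAddp(&p, &b) == *inlinedAddp(&q, &b))
}
```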
Addressing Modes
If we check the compiled assembly, the reason quickly becomes clear:
|
|
The dumped assembly code is as follows:
|
|
The addv implementation uses values from the previous stack frame and writes the result directly to the return value, whereas addp needs MOVQ instructions that copy the parameters into registers (e.g., the pointers into AX and CX) and then writes the result back when returning. Therefore, with inlining disabled, addv is slower than addp because of their different memory access patterns.
Conclusion
Is pass-by-value always faster than pass-by-pointer? We could test this further. This time, though, we need a generator to produce all possible cases. Here is how we could do it:
|
|
If we generate our test code and perform the same benchmark procedure again:
|
|
We could go even further and try a version that disables inlining:
|
|
Eventually, we end up with the following results:
TLDR: The figure above shows when you should pass by value and when you should pass by pointer. If you are certain that your code won’t produce any escaping variables and the size of your argument is smaller than 4*4 = 16 bytes, then you should go for pass-by-value; otherwise, you should keep using pointers.
Further Reading Suggestions
- Changkun Ou. Conduct Reliable Benchmarking in Go. March 26, 2020. https://golang.design/s/gobench
- Dave Cheney. Mid-stack inlining in Go. May 2, 2020. https://dave.cheney.net/2020/05/02/mid-stack-inlining-in-go
- Dave Cheney. Inlining optimisations in Go. April 25, 2020. https://dave.cheney.net/2020/04/25/inlining-optimisations-in-go
- MOVSD. Move or Merge Scalar Double-Precision Floating-Point Value. Last access: 2020-10-27. https://www.felixcloutier.com/x86/movsd
- ADDSD. Add Scalar Double-Precision Floating-Point Values. Last access: 2020-10-27. https://www.felixcloutier.com/x86/addsd
- MOVQ. Move Quadword. Last access: 2020-10-27. https://www.felixcloutier.com/x86/movq