Which add implementation do you think performs better, vec1 or vec2?
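For reference, here is a minimal sketch of what the two implementations might look like. It assumes that vec1 is the pass-by-value variant and vec2 is the pass-by-pointer variant, and that both hold four float64 fields; the field names and layout are illustrative assumptions, not taken from the original listing.

```go
package main

import "fmt"

// vec1 is assumed to be the pass-by-value variant.
type vec1 struct{ x, y, z, w float64 }

// add takes its argument by value and returns a new vec1;
// the receiver is left unchanged.
func (v vec1) add(u vec1) vec1 {
	return vec1{v.x + u.x, v.y + u.y, v.z + u.z, v.w + u.w}
}

// vec2 is assumed to be the pass-by-pointer variant.
type vec2 struct{ x, y, z, w float64 }

// add updates the receiver in place and returns it,
// only so that calls can be chained.
func (v *vec2) add(u *vec2) *vec2 {
	v.x, v.y, v.z, v.w = v.x+u.x, v.y+u.y, v.z+u.z, v.w+u.w
	return v
}

func main() {
	a := vec1{1, 2, 3, 4}
	fmt.Println(a.add(a).add(a)) // chained value calls; a itself is unchanged

	b, c := &vec2{1, 2, 3, 4}, &vec2{1, 1, 1, 1}
	fmt.Println(*b.add(c).add(c)) // chained pointer calls; b is mutated in place
}
```

Both variants support chaining, but only the pointer version mutates its receiver.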
The answer is that pass-by-value is faster. The reason is inlining, not escape analysis as many might guess. The pointer implementation returns a pointer only to support method chaining; the returned pointer already lives on the stack, so nothing escapes to the heap. Test results:
[Table: benchmark results for the vec1 and vec2 add methods]
A practical example: changing from pass-by-pointer to pass-by-value brought a 6–8% performance improvement in a simple rasterizer (see https://github.com/changkun/ddd/commit/60fba104c574f54e11ffaedba7eaa91c8401bce4).
Furthermore, we might ask: is pass-by-value still faster without inlining? We can try adding the //go:noinline compiler directive to both add methods. The results without inlining (old) compared with inlining (new) are:
[Table: benchmark comparison, without inlining (old) vs. with inlining (new)]
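One way to reproduce this kind of comparison is sketched below. It is not the original benchmark: the type vec, the method names addv, addp, addvNoinline and addpNoinline, and the sink variable are all hypothetical, and it uses testing.Benchmark so that it runs as an ordinary program. The timings depend on the machine and the Go version, so no numbers are shown.

```go
package main

import (
	"fmt"
	"testing"
)

type vec struct{ x, y, z, w float64 }

// Variants the compiler is free to inline.
func (v vec) addv(u vec) vec {
	return vec{v.x + u.x, v.y + u.y, v.z + u.z, v.w + u.w}
}

func (v *vec) addp(u *vec) *vec {
	v.x, v.y, v.z, v.w = v.x+u.x, v.y+u.y, v.z+u.z, v.w+u.w
	return v
}

// Same bodies, but the directive forbids inlining.

//go:noinline
func (v vec) addvNoinline(u vec) vec {
	return vec{v.x + u.x, v.y + u.y, v.z + u.z, v.w + u.w}
}

//go:noinline
func (v *vec) addpNoinline(u *vec) *vec {
	v.x, v.y, v.z, v.w = v.x+u.x, v.y+u.y, v.z+u.z, v.w+u.w
	return v
}

// sink keeps results live so the loops are not optimized away.
var sink vec

func main() {
	benchmarks := []struct {
		name string
		fn   func(b *testing.B)
	}{
		{"value/inline", func(b *testing.B) {
			v, u := vec{}, vec{1, 1, 1, 1}
			for i := 0; i < b.N; i++ {
				v = v.addv(u)
			}
			sink = v
		}},
		{"pointer/inline", func(b *testing.B) {
			v, u := &vec{}, &vec{1, 1, 1, 1}
			for i := 0; i < b.N; i++ {
				v.addp(u)
			}
			sink = *v
		}},
		{"value/noinline", func(b *testing.B) {
			v, u := vec{}, vec{1, 1, 1, 1}
			for i := 0; i < b.N; i++ {
				v = v.addvNoinline(u)
			}
			sink = v
		}},
		{"pointer/noinline", func(b *testing.B) {
			v, u := &vec{}, &vec{1, 1, 1, 1}
			for i := 0; i < b.N; i++ {
				v.addpNoinline(u)
			}
			sink = *v
		}},
	}
	for _, bm := range benchmarks {
		// testing.Benchmark picks b.N automatically and returns a printable result.
		fmt.Printf("%-18s %s\n", bm.name, testing.Benchmark(bm.fn))
	}
}
```

Keeping the inlinable and //go:noinline variants in the same program makes it possible to compare all four cases in a single run, without rebuilding with different directives.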
So the next question is: without inlining, why is the pointer version faster? Read more at https://changkun.de/blog/posts/pointers-might-not-be-ideal-for-parameters/