
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Dynamic loader optimization for ARM

https://doi.org/10.15514/ISPRAS-2016-28(1)-4

Abstract

The paper discusses an approach to optimizing external calls in position-independent code: the callee address is loaded from the Global Offset Table (GOT) directly at the call site, which avoids the Procedure Linkage Table (PLT). Normally the Linux toolchain creates the PLT both in the main executable (which consists of position-dependent code and has to rely on the PLT mechanism to make external calls) and in shared libraries, where the PLT serves to implement lazy binding of dynamic symbols but is not otherwise required. However, calls via the PLT incur overhead from an extra jump instruction and poorer instruction cache locality. On some architectures, the binary interface of PLT calls also constrains compiler optimization at the call site. This overhead can be avoided by loading the callee address from the GOT at the call site and performing an indirect call, although doing so prevents lazy symbol resolution and may increase code size. We implement this code generation variant in the GCC compiler for the x86 and ARM architectures. On ARM, loading the callee address from the GOT at the call site normally requires a complex sequence with three load instructions. To improve on that, we propose new relocation types that allow the PC-relative address of a given GOT slot to be built with a movw/movt instruction pair, and we implement these relocation types in GCC and Binutils (assembler and linker) for both ARM and Thumb-2 modes. Our evaluation shows that the proposed optimization improves performance on both x86 (by up to 12% for Clang/LLVM built with multiple shared libraries, on large translation units) and ARM (by up to 7% for SQLite, averaged over several tests), even though code size on ARM also grows by 13-15%.
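As a rough illustration of the two calling schemes contrasted in the abstract, the C sketch below models them with ordinary function pointers. It is not the code that GCC or the linker emits, and it is not taken from the paper: the names got_plt_abs, got_abs, plt_stub_abs, lazy_resolve_abs, bind_now, call_via_plt and call_via_got are hypothetical stand-ins, and the standard abs() function plays the role of an external callee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*abs_fn)(int);

/* Counts how many times the modeled lazy resolver ran (illustration only). */
int lazy_resolutions = 0;

static int lazy_resolve_abs(int x);

/* Hypothetical stand-ins for two GOT slots that refer to abs(). */
static abs_fn got_plt_abs = lazy_resolve_abs; /* lazily bound, used by the stub */
static abs_fn got_abs = NULL;                 /* eagerly bound, used without a stub */

/* Models lazy binding: patch the slot on first use, then forward the call. */
static int lazy_resolve_abs(int x)
{
    lazy_resolutions++;
    got_plt_abs = abs;
    return got_plt_abs(x);
}

/* Models a PLT stub: one extra hop that jumps through the slot. */
static int plt_stub_abs(int x)
{
    return got_plt_abs(x);
}

/* PLT-style call: the call site targets the stub, not the callee. */
int call_via_plt(int x)
{
    return plt_stub_abs(x);
}

/* Models the dynamic loader filling the slot before any call happens. */
void bind_now(void)
{
    got_abs = abs;
}

/* No-PLT-style call: load the address at the call site, call indirectly. */
int call_via_got(int x)
{
    abs_fn target = got_abs;
    assert(target != NULL); /* the slot must already be bound */
    return target(x);
}
```

In this model, the first call_via_plt call runs the resolver once and later calls go straight through the patched slot, while call_via_got skips the stub entirely but only works after bind_now() has run. That mirrors the trade-off stated above: one hop fewer per call, at the cost of giving up lazy symbol resolution.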

About the Authors

E. A. Kudryashov
Institute for System Programming of the Russian Academy of Sciences
Russian Federation


D. M. Melnik
Institute for System Programming of the Russian Academy of Sciences
Russian Federation


A. V. Monakov
Institute for System Programming of the Russian Academy of Sciences
Russian Federation


References

1. J. Levine. Linkers and Loaders. Morgan Kaufmann, p. 256, October 1999.

2. J. Greenhalgh. [AArch64] Tighten direct call pattern to repair -fno-plt. https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00152.html.

3. A. Monakov. PIC calls without PLT, generic implementation. https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00225.html.

4. D. Melnik. Developing Interblock Combine Pass in GCC. GNU Tools Cauldron 2013. https://gcc.gnu.org/wiki/cauldron2013.

5. ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition, Section A2.8.6.60, 04 June 2009.

6. SQLite Website. http://www.sqlite.org/about.html.



For citations:


Kudryashov E.A., Melnik D.M., Monakov A.V. Dynamic loader optimization for ARM. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2016;28(1):63-80. (In Russ.) https://doi.org/10.15514/ISPRAS-2016-28(1)-4



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)