
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Evaluating a number of cache coherency misses based on a statistical model

https://doi.org/10.15514/ISPRAS-2015-27(4)-3

Abstract

False sharing occurs when threads executing in parallel update distinct variables that reside in the same cache line. This paper proposes estimating the number of false-sharing cache misses through code instrumentation and post-mortem trace analysis. The probability of a false-sharing miss (defined as a memory write issued by one thread between two consecutive memory accesses issued by another thread) is computed from the collected event trace, in which each event is a memory access with a timestamp. The tracer is implemented as a GCC compiler pass, while the post-mortem analyzer is a separate application that takes as input the traces collected while running the target application on sample input data. The approach slows the program down by roughly a factor of 10; the slowdown depends on the sampling probability but not on the cache line size.
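To make the definition in the abstract concrete, the following minimal sketch counts, in a complete trace, the accesses for which another thread wrote to the same cache line since the accessing thread's previous access to that line. It is an illustration only, not the paper's statistical estimator: the `Event` record, its field names, and the function `count_interleaved_writes` are hypothetical, the trace is assumed to be unsampled, and each event is assumed to carry a precomputed cache line index.

```python
from collections import namedtuple

# Hypothetical trace record (an assumption; the paper's trace format is not
# described in the abstract): timestamp, thread id, cache line index, and
# whether the access is a write.
Event = namedtuple("Event", "timestamp thread line is_write")


def count_interleaved_writes(trace):
    """Count accesses where a different thread wrote to the same cache line
    after the accessing thread's previous access to that line."""
    last_access = {}  # (thread, line) -> timestamp of that thread's previous access
    last_write = {}   # line -> {thread: timestamp of that thread's latest write}
    misses = 0
    for ev in sorted(trace, key=lambda e: e.timestamp):
        prev = last_access.get((ev.thread, ev.line))
        if prev is not None:
            writes = last_write.get(ev.line, {})
            # A write by any other thread later than our previous access
            # matches the abstract's definition of a miss.
            if any(t != ev.thread and ts > prev for t, ts in writes.items()):
                misses += 1
        last_access[(ev.thread, ev.line)] = ev.timestamp
        if ev.is_write:
            last_write.setdefault(ev.line, {})[ev.thread] = ev.timestamp
    return misses


# Thread 1 writes to line 0 between two reads of line 0 by thread 0.
example = [
    Event(1, 0, 0, False),
    Event(2, 1, 0, True),
    Event(3, 0, 0, False),
    Event(4, 0, 0, False),
]
print(count_interleaved_writes(example))  # prints 1
```

Because the sketch works at cache line granularity, it does not separate false sharing from true sharing of the same variable; doing so would require the byte addresses of the accesses, which this hypothetical record omits.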

About the Author

Evgeny Velesevich
ISP RAS
Russian Federation





For citations:


Velesevich E. Evaluating a number of cache coherency misses based on a statistical model. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2015;27(4):39-48. (In Russ.) https://doi.org/10.15514/ISPRAS-2015-27(4)-3



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)