Proceedings of the Institute for System Programming of the RAS

Analyzing Hot Bugs in the Linux Kernel by Clustering Fixing Commit Messages

https://doi.org/10.15514/ISPRAS-2023-35(3)-16

Abstract

A vast amount of information circulates in system software environments, so it is crucial to use that information to improve how these systems work. One such system is the Linux kernel, which not only ships with fully open source code but also keeps a comprehensive development history in its git repository. There, every logical code change is accompanied by a message written by the developer in natural language. When processing the repository data, we focus on commits whose messages describe bug fixes, since analyzing their text can help identify the most common types of bugs. Building on our previous work, in this paper we propose applying data analysis methods to this task. To that end, we propose several methods for processing messages in git repositories and use automated methods to identify common bugs in them. By computing distances between bug-fix messages, converting the messages into vectors, and grouping them into clusters, we can then efficiently classify and identify the most frequently occurring bugs. We apply our approach to several important parts of the Linux kernel, which gives insight into what is happening with bugs in its various subsystems. As a result, we present a summary of bug fixes in the following parts of the Linux kernel: kernel, sched, mm, net, irq, x86, and Arm64.
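The pipeline outlined in the abstract (vectorize bug-fix messages, measure distances between them, group them into clusters) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names are hypothetical, the sample messages are invented, the smoothed TF-IDF weighting is just one common variant, and a simple greedy "leader" clustering stands in for whichever clustering algorithms the authors actually use.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Lowercase a commit message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", message.lower())


def tfidf_vectors(messages):
    """Turn each message into a sparse TF-IDF vector (dict: token -> weight)."""
    docs = [Counter(tokenize(m)) for m in messages]
    n = len(docs)
    # Document frequency: in how many messages each token occurs.
    df = Counter(tok for doc in docs for tok in doc)
    return [
        {tok: cnt * (math.log((1 + n) / (1 + df[tok])) + 1.0)
         for tok, cnt in doc.items()}
        for doc in docs
    ]


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors (0 = same direction)."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # an empty message is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def cluster(messages, threshold=0.6):
    """Greedy leader clustering (illustrative stand-in, not the paper's method).

    Each message joins the first cluster whose first member (the "leader")
    lies within `threshold`; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of message indices.
    """
    vectors = tfidf_vectors(messages)
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine_distance(vectors[members[0]], vec) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    # Invented sample subjects, in the style of kernel fix commits.
    sample = [
        "fix null pointer dereference in foo_probe",
        "fix memory leak in bar_init error path",
        "fix null pointer dereference in baz_open",
        "fix memory leak in qux_remove",
    ]
    for members in cluster(sample):
        print([sample[i] for i in members])
```

On real data, the input would more plausibly be commit subjects extracted from the repository history and filtered for fix-related wording. The threshold of 0.6 is an arbitrary choice for this toy sample and would need tuning.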

About the authors

Sergey Mikhailovich STAROLETOV
Polzunov Altai State Technical University
Russia

Candidate of Physical and Mathematical Sciences, Associate Professor.



Nikita Aleksandrovich STAROVOYTOV
Polzunov Altai State Technical University
Russia

Master's student and teaching assistant at the Department of Applied Mathematics.



Nikolay Andreevich GOLOVNEV
Polzunov Altai State Technical University
Russia

Master's student at the Department of Applied Mathematics.






For citation:


STAROLETOV S.M., STAROVOYTOV N.A., GOLOVNEV N.A. Analyzing Hot Bugs in the Linux Kernel by Clustering Fixing Commit Messages. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2023;35(3):215-242. (In Russian.) https://doi.org/10.15514/ISPRAS-2023-35(3)-16



Content is available under the Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)