
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Generating and Debugging Java Code Using LLMs Based on Associative Recurrent Memory

https://doi.org/10.15514/ISPRAS-2025-37(5)-13

Abstract

Automatic code generation by large language models (LLMs) has achieved significant success, yet it still faces challenges when dealing with complex and large codebases, especially in languages such as Java. The limited context windows of LLMs and the complexity of debugging generated code are key obstacles. This paper presents an approach aimed at improving Java code generation and debugging. We propose using the Associative Recurrent Memory Transformer (ARMT) model, which extends the context window and offers enhanced memory capabilities, to address two tasks: 1) selecting the most relevant snippets from the existing codebase for generating new code; 2) selecting the most significant parts of stack traces and runtime data for iterative debugging. This approach is integrated into an iterative debugging loop, implemented in our system under development, "JavaCapsule" (inspired by PyCapsule for Python), which compiles code and runs tests in a controlled Docker environment using Gradle. We expect the proposed method to improve the accuracy and relevance of generated Java code, particularly in large projects, and to streamline the automated debugging process. Benchmarks such as JavaBench further underscore the need for such focused advancements. This paper is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University).
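
The sketch below illustrates, in broad strokes, the kind of compile-test-repair cycle the abstract describes: it runs Gradle's test task inside a disposable Docker container and, on failure, keeps only the tail of the build output (a crude stand-in for the ARMT-based selection of relevant stack-trace fragments) before handing it to a repair step. All identifiers here (DebugLoop, runGradleInDocker, llmRepair, MAX_ITERATIONS, the gradle:8-jdk17 image) are illustrative assumptions, not the actual JavaCapsule API.

    import java.io.IOException;
    import java.nio.file.Path;
    import java.util.concurrent.TimeUnit;

    // Minimal sketch of an iterative compile-test-repair loop.
    // Assumes Docker is installed and on PATH; requires Java 16+.
    public class DebugLoop {

        private static final int MAX_ITERATIONS = 5; // assumed retry budget

        public static void main(String[] args) throws Exception {
            Path project = Path.of(args.length > 0 ? args[0] : ".");
            for (int i = 0; i < MAX_ITERATIONS; i++) {
                ProcessResult result = runGradleInDocker(project);
                if (result.exitCode() == 0) {
                    System.out.println("Build and tests passed after " + (i + 1) + " iteration(s).");
                    return;
                }
                // Stand-in for ARMT-based selection: keep only the tail of the
                // output, where Gradle prints the failing tests and stack traces.
                String relevant = tail(result.output(), 4000);
                llmRepair(project, relevant); // placeholder for the LLM repair step
            }
            System.out.println("Gave up after " + MAX_ITERATIONS + " iterations.");
        }

        // Runs `gradle test` in a disposable container mounted over the project.
        static ProcessResult runGradleInDocker(Path project)
                throws IOException, InterruptedException {
            Process p = new ProcessBuilder(
                    "docker", "run", "--rm",
                    "-v", project.toAbsolutePath() + ":/home/gradle/project",
                    "-w", "/home/gradle/project",
                    "gradle:8-jdk17",           // assumed base image
                    "gradle", "test", "--console=plain")
                    .redirectErrorStream(true)  // merge stderr into stdout
                    .start();
            // Reading to EOF blocks until the container process exits.
            String output = new String(p.getInputStream().readAllBytes());
            boolean finished = p.waitFor(10, TimeUnit.MINUTES);
            return new ProcessResult(finished ? p.exitValue() : -1, output);
        }

        static String tail(String s, int maxChars) {
            return s.length() <= maxChars ? s : s.substring(s.length() - maxChars);
        }

        static void llmRepair(Path project, String failureContext) {
            // In the paper's approach, ARMT would select the relevant code and
            // stack-trace fragments, and an LLM would rewrite the failing files.
            System.out.println("Would send to LLM:\n" + failureContext);
        }

        record ProcessResult(int exitCode, String output) {}
    }

In the full system described in the abstract, the selection step would be performed by ARMT over the whole codebase and runtime data rather than by simple output truncation.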

About the Authors

Vladimir Igorevich VASILEVSKIY
HSE University
Russian Federation

A research assistant at the Cloud and Mobile Technologies Laboratory of the Faculty of Computer Science, HSE University. His research interests include large language models, code generation and debugging, long sequence processing, and compilers.



Dmitry Vladimirovich ALEXANDROV
HSE University
Russian Federation

Professor in the Department of Software Engineering, Faculty of Computer Science, National Research University “Higher School of Economics”. He is also the Head of the Research and Educational Laboratory of Cloud and Mobile Technologies. His research interests include methods and technologies of artificial intelligence, machine learning and data analysis, iOS development, mobile application development, software development, indoor navigation, databases, and game development.



References

1. Cao J., Chen Z., Wu J., Cheung S., Xu C. JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models. arXiv preprint arXiv:2406.12902, 2024.

2. Adnan M., Xu Z., Kuhn C. C. N. Large Language Model Guided Self-Debugging Code Generation. arXiv preprint arXiv:2502.02928, 2025.

3. Zhong L., Wang Z., Shang J. LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step. arXiv preprint arXiv:2402.16906, 2024.

4. Bulatov A., Kuratov Y., Burtsev M. S. Recurrent Memory Transformer. Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 11079-11091.

5. Rodkin I., Kuratov Y., Bulatov A., Burtsev M. Associative Recurrent Memory Transformer. In Proc. of the ICML 2024 Next Generation of Sequence Modeling Architectures Workshop, 2024.

6. Kuratov Y., Bulatov A., Anokhin P., Rodkin I., Sorokin D., Sorokin A., Burtsev M. BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack. arXiv preprint arXiv:2406.10149, 2024.

7. Chen M., Tworek J., Jun H., Yuan Q., Pinto H. P. D. O., Kaplan J., ... Brockman G. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.

8. Li R., Allal L. B., Zi Y., Muennighoff N., Kocetkov D., Mou C., ... Li J. StarCoder: may the source be with you! arXiv preprint arXiv:2305.06161, 2023.

9. Hui B., Yang J., Cui Z., Yang J., Liu D., Zhang L., ... Lin J. Qwen2.5-Coder Technical Report. arXiv preprint arXiv:2409.12186, 2024.

10. Gu A., Dao T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv preprint arXiv:2312.00752, 2023.

11. Peng B., Alcaide E., Anthony Q., Albalak A., Arcadinho S., Cao H., ... Zhu R. J. RWKV: Reinventing RNNs for the Transformer Era. arXiv preprint arXiv:2305.13048, 2023.

12. Lewis P., Perez E., Piktus A., Petroni F., Karpukhin V., Goyal N., ... Kiela D. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 9459-9474.

13. Ren S., Zhou D., Zhang S., Liu S., Chen Y., Sun H., ... Liu Y. CodeBLEU: a method for automatic evaluation of code synthesis. arXiv preprint arXiv:2009.10297, 2020.



For citations:


VASILEVSKIY V.I., ALEXANDROV D.V. Generating and Debugging Java Code Using LLMs Based on Associative Recurrent Memory. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(5):173-182. https://doi.org/10.15514/ISPRAS-2025-37(5)-13



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)