
Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)


Automatic Code Review Generation: Instruction Evolution and Intelligent Filtering

https://doi.org/10.15514/ISPRAS-2025-37(4)-22

Abstract

Code review is essential for software quality but labor-intensive in distributed teams. Current automated comment generation systems often rely on evaluation metrics focused on textual similarity. These metrics fail to capture the core goals of code review, such as identifying bugs and security flaws and improving code reliability. Semantically equivalent comments can receive low scores if worded differently, and inaccurate suggestions can confuse developers. This work aims to develop an automated code review generator focused on producing highly relevant and applicable feedback for code changes. The approach leverages Large Language Models, moving beyond basic generation. The core methodology involves the systematic design and incremental application of sophisticated prompt engineering strategies. Key strategies include step-by-step reasoning instructions, providing the model with relevant examples (few-shot learning), enforcing structured output formats, and expanding contextual understanding. Crucially, a dedicated intelligent filtering stage is introduced: an LLM-as-a-Judge technique acts as an evaluator to rigorously rank generated comments and filter out irrelevant, redundant, or misleading suggestions before presenting results. The approach was implemented and tested using the Qwen/Qwen2.5-Coder-32B-Instruct model. Evaluation by original code authors demonstrated significant improvements. The optimal prompt strategy yielded a 2.5-fold increase in the proportion of applicable reviews (reaching 37%) and a 1.6-fold increase in good comments (reaching 61%) compared to a baseline. Providing examples enhanced comment quality, and the evaluator filter proved highly effective in boosting output precision. These results represent a substantial advance towards generating genuinely useful, actionable feedback. The approach significantly enhances the practical utility and user experience of automated code review tools for software developers by prioritizing relevance and applicability.
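The evaluator-filter stage described in the abstract can be illustrated with a minimal sketch. All names here (ReviewComment, judge_score, filter_comments, the judge prompt, the score scale and threshold) are hypothetical illustrations, not the paper's implementation; the real judge_score would send the prompt, diff, and candidate comment to an LLM and parse its structured JSON reply, whereas here a trivial heuristic stub stands in so the sketch runs end-to-end.

```python
import json
from dataclasses import dataclass


# Hypothetical structured-output schema for one generated review comment.
@dataclass
class ReviewComment:
    line: int
    category: str  # e.g. "bug", "security", "style"
    text: str


# Hypothetical LLM-as-a-Judge instruction; the model is asked for JSON only.
JUDGE_PROMPT = (
    "You are a strict code-review evaluator. Rate how relevant and applicable "
    'the comment is to the diff on a 1-5 scale. Answer with JSON: {"score": N}.'
)


def judge_score(diff: str, comment: ReviewComment) -> int:
    """Stand-in for an LLM-as-a-Judge call.

    A real system would send JUDGE_PROMPT plus the diff and comment to the
    model and parse the JSON score from its reply; this stub uses a trivial
    heuristic so the example is runnable offline.
    """
    return 5 if comment.category in {"bug", "security"} else 2


def filter_comments(diff, comments, threshold=4):
    """Rank candidate comments by judge score and drop low-scoring ones."""
    scored = [(judge_score(diff, c), c) for c in comments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score >= threshold]


# Candidate comments as the generator might emit them in structured JSON.
raw = (
    '[{"line": 10, "category": "bug", "text": "Possible null dereference."},'
    ' {"line": 12, "category": "style", "text": "Rename variable x."}]'
)
comments = [ReviewComment(**d) for d in json.loads(raw)]
kept = filter_comments("<diff>", comments)
print([c.text for c in kept])  # only comments the judge rates highly survive
```

Parsing the generator's output into a fixed schema before judging is what makes the filtering step mechanical: each candidate is scored independently, and only suggestions above the threshold reach the developer.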

About the Author

Vladimir Vladimirovich KACHANOV
Ivannikov Institute for System Programming of the Russian Academy of Sciences
Russian Federation

Postgraduate student of the Department of Intelligent Systems at MIPT (National Research University); programmer in the System Programming Department at the Ivannikov Institute for System Programming since 2021. Research interests: software engineering, machine learning, natural language processing.



References

1. Siow, J. K., Gao, C., Fan, L., Chen, S., & Liu, Y. (2020). Core: Automating review recommendation for code changes. 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 284-295.

2. Tufano, R., Pascarella, L., Tufano, M., Poshyvanyk, D., & Bavota, G. (2021). Towards automating code review activities. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), pp. 163-174.

3. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, vol. 21(140), pp. 1-67.

4. Tufano, R., Masiero, S., Mastropaolo, A., Pascarella, L., Poshyvanyk, D., Bavota, G. (2022). Using pre-trained models to boost code review automation. Proceedings of the 44th International Conference on Software Engineering, pp. 2291-2302.

5. Li, Z., Lu, S., Guo, D., Duan, N., Jannu, S., Jenks, G., Sundaresan, N. (2022). Automating code review activities by large-scale pre-training. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1035-1047.

6. Li, L., Yang, L., Jiang, H., Yan, J., Luo, T., Hua, Z., Zuo, C. (2022). AUGER: automatically generating review comments with pre-training models. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1009-1021.

7. Hong, Y., Tantithamthavorn, C., Thongtanunam, P., Aleti, A. (2022). CommentFinder: a simpler, faster, more accurate code review comments recommendation. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 507-519.

8. Liu, C., Lin, H. Y., Thongtanunam, P. (2025). Too noisy to learn: Enhancing data quality for code review comment generation. In 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR), pp. 236-248.

9. Yu, Y., Zhang, L., Rong, G., Shen, H., Zhang, J., Yan, H., Tian, Z. (2024). Distilling Desired Comments for Enhanced Code Review with Large Language Models. Available at: https://arxiv.org/abs/2412.20340.

10. Lu, J., Li, X., Hua, Z., Yu, L., Cheng, S., Yang, L., Zuo, C. (2025). DeepCRCEval: Revisiting the evaluation of code review comment generation. International Conference on Fundamental Approaches to Software Engineering, pp. 43-64.

11. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, vol. 35, pp. 24824-24837.

12. Sun, T., Xu, J., Li, Y., Yan, Z., Zhang, G., Xie, L., Sui, K. (2025). BitsAI-CR: Automated Code Review via LLM in Practice. Available at: https://arxiv.org/abs/2501.15134.

13. Hui, B., Yang, J., Cui, Z., Yang, J., Liu, D., Zhang, L., Liu, T., Zhang, J., Yu, B., Lu, K., Dang, K. (2024). Qwen2.5-Coder technical report. arXiv preprint arXiv:2409.12186.

14. Structured outputs: Everything you should know. Available at: https://humanloop.com/blog/structured-outputs, accessed 10.06.2025.

15. Structured outputs in LLMs: Definition, techniques, applications, benefits. Available at: https://www.leewayhertz.com/structured-outputs-in-llms/, accessed 10.06.2025.

16. How to return structured data from a model. Available at: https://python.langchain.com/docs/how_to/structured_output/, accessed 10.06.2025.

17. Pydantic. Available at: https://docs.pydantic.dev/latest/, accessed 10.06.2025.

18. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems, vol. 33, pp. 9459-9474.

19. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, vol. 12, pp. 2825-2830.

20. Petrova, P. A., Markov, S. I., Kachanov, V. V. (2024). Creating a dataset for combined classification of source code reviews. Intellectualization of Information Processing: Abstracts of the 15th International Conference, pp. 83-85. (In Russian)

21. LLM-as-a-judge: a complete guide to using LLMs for evaluations. Available at: https://www.evidentlyai.com/llm-guide/llm-as-a-judge, accessed 10.06.2025.

22. Zheng, L., Chiang, W. L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Stoica, I. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, vol. 36, pp. 46595-46623.

23. LLM review prompts. Available at: https://github.com/vova98/llm_review_prompts, accessed 10.06.2025.



For citations:


KACHANOV V.V. Automatic Code Review Generation: Instruction Evolution and Intelligent Filtering. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(4):117-132. (In Russ.) https://doi.org/10.15514/ISPRAS-2025-37(4)-22



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)