Preview

Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Advanced search

Choosing an Isabelle System Interface for Interacting with Large Language Models

https://doi.org/10.15514/ISPRAS-2025-37(6)-48

Abstract

The paper presents a review of existing research on the use of large language models (LLMs) for automating the construction of formal proofs in the Isabelle proof assistant. It examines several levels of interaction between users or LLMs and the proof system. For each level, the paper discusses the advantages and limitations of practical use during routine proof development, as well as the current and potential opportunities for automating the proof process with LLMs. Based on this review, the paper proposes directions for improving the integration of LLM-based proof automation tools into the Isabelle system.

About the Authors

Mikhail Usamovich MANDRYKIN
Institute for System Programming of the Russian Academy of Sciences
Russian Federation

Cand. Sci. (Phys.-Math.), researcher at ISP RAS. Research interests: formal methods, program verification, mathematical logic, functional programming, type systems.



Vadim Sergeevich MUTILIN
Institute for System Programming of the Russian Academy of Sciences, Moscow Institute of Physics and Technology
Russian Federation

Cand. Sci. (Phys.-Math.), leading researcher at ISP RAS and associate professor at Moscow Institute of Physics and Technology. Main research interests: static and dynamic program analysis.



Alexander Konstantinovich PETRENKO
Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University, National Research University, Higher School of Economics
Russian Federation

Dr. Sci. (Phys.-Math.), Prof., Head of Software Engineering Department of ISP RAS, Professor of MSU and the Faculty of Computer Science, NRU HSE. Research their use in software development and verification.



Konstantin Sergeevich SOROKIN
Institute for System Programming of the Russian Academy of Sciences, National Research University, Higher School of Economics
Russian Federation

A researcher at the Compiler Technology Department of ISP RAS. Research interests include compiler technologies, static and dynamic analysis, applications of AI/ML for enhancing productivity of software engineers and QA team members.



Ivan Yurievich TYUKIN
Institute for System Programming of the Russian Academy of Sciences, Skolkovo Institute of Science and Technolo
Russian Federation

Dr.Sci. (Tech.), Prof., principal research scientist of the ISP RAS, Professor of the Centre for Artificial Intelligence at Skolkovo Institute of Science and Technology. Research interests: mathematics of Artificial Intelligence, adaptive, robust and trustworthy Artificial Intelligence.



References

1. Paulson L. C. Isabelle – A Generic Theorem Prover (with a contribution by T. Nipkow). Lecture Notes in Computer Science, vol. 828, 1994.

2. Isabelle Development Team. Isabelle. 1994–2025. Available at: https://isabelle.in.tum.de/, accessed 10.12.2025.

3. Nipkow T., Wenzel M., Paulson L. C. Isabelle/HOL. A Proof Assistant for Higher-Order Logic. Lecture Notes in Computer Science, vol. 2283, 2002.

4. Lincroft G., Minsung Cho, Hough K., Bazzaz M., Bell J. Thirty-Three Years of Mathematicians and Software Engineers: A Case Study of Domain Expertise and Participation in Proof Assistant Ecosystems. In: Proceedings of the 21st International Conference on Mining Software Repositories: MSR '24, 2024, pp. 1–13.

5. Bayer J., David M., Pal A., Stock B. Beginners’ Quest to Formalize Mathematics: A Feasibility Study in Isabelle. In: Proceedings of Intelligent Computer Mathematics, CICM 2019, 2019, pp. 16–27.

6. Koutsoukou-Argyraki A. Formalising Mathematics – in Praxis; A Mathematician’s First Experiences with Isabelle/HOL and the Why and How of Getting Started. Jahresbericht der Deutschen Mathematiker-Vereinigung, vol. 123, 2021, pp. 3–26.

7. Rocq Development Team. The Rocq Prover. 1989–2025. Available at: https://rocq-prover.org/, accessed 10.12.2025.

8. Lean Development Team. Lean. 2015–2025. Available at: https://lean-lang.org/, accessed 10.12.2025.

9. HOL Development Team. HOL Interactive Theorem Prover. 1988–2025. Available at: https://hol-theorem-prover.org/, accessed 10.12.2025.

10. Harrison J. The HOL Light theorem prover, 2025. Available at: https://hol-light.github.io/, accessed 10.12.2025.

11. Megill N., Wheeler D. A. Metamath: a computer language for mathematical proofs, Lulu.com, 2019.

12. Jiang A. Q., Welleck S., Jin Peng Zhou, Lacroix T., Jiacheng Liu, Wenda Li, Jamnik M., Lample G., Yuhuai Wu. Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs. In: Proceedings of The Eleventh International Conference on Learning Representations, 2023.

13. Jiang A. Q., Wenda Li, Han J. M., Yuhuai Wu. LISA: Language models of isabelle proofs. In: Proceedings of the 6th Conference on Artificial Intelligence and Theorem Proving, 2021.

14. Jiang A. Q., Wenda Li, Tworkowski S., Czechowski K., Odrzygóźdź T., Miłoś P., Yuhuai Wu, Jamnik M. Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers. In: NIPS'22: Proceedings of the 36th International Conference on Neural Information Processing Systems, 2022.

15. First E., Rabe M. N., Ringer T., Brun Y. Baldur: Whole-Proof Generation and Repair with Large Language Models. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023:), 2023.

16. Qiyuan Xu, Renxi Wang, Haonan Li, Sanan D., Watt C. IsaMini: Redesigned Isabelle Proof Language for Machine Learning, Препринт (arXiv:2507.18885), 2025.

17. Xiaolin Hu, Qinghua Zhou, Grechuk B., Tyukin I. Y., Sutton O. StepProof: Step-by-step verification of natural language mathematical proofs. In Proceedings: The Thirteenth International Conference on Learning Representations, ICLR 2025, 2025.

18. Bansal K., Loos S. M., Rabe M. N., Szegedy C., Wilcox S. HOList: An Environment for Machine Learning of Higher-Order Theorem Proving. In: Proceedings of the 36-th International Conference on Machine Learning, 2019.

19. Huang D., Dhariwal P., Dawn Song, Sutskever I. GamePad: A Learning Environment for Theorem Proving. In: Proceedings of the International Conference on Learning Representations, 2019.

20. Paliwal A., Loos S., Rabe M., Bansal K., Szegedy C. Graph Representations for Higher-Order Logic and Theorem Proving. In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), 2020.

21. Kaiyu Yang, Swope A. M., Gu A., Chalamala R., Peiyang Song, Shixing Yu, Godil S., Prenger R., Anandkumar A. LeanDojo: Theorem Proving with Retrieval-Augmented Language Models. In: NeurIPS 2023 Datasets and Benchmarks (NIPS’9), article 944, 2023, 40 pages.

22. Del Tredici M., McCarran J., Breen B., Mijares J. A., Yin W. W., Taylor J. M., Koppens F. H. L., Englund D. Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics, Препринт (arXiv:2510.12787), 2025.

23. Paulson L. C. The foundation of a generic theorem prover. Journal of Automated Reasoning, vol. 5, 1989, pp. 363–397.

24. Wenzel M. Isar – A Generic Interpretative Approach to Readable Formal Proof Documents. In Proceedings: Theorem Proving in Higher Order Logics, 1999. pp. 167–183.

25. Huerta y Munive J. J. Reaping the first fruits of the Isabelle/RL project. In: Proceedings of the 10th Conference on Artificial Intelligence and Theorem Proving, 2025.

26. Becker H., Chong N., Dockins R., Grundy J., Hu J. Z. S., Mulder I., Mulligan D. P., Mure P., Paulson L.C., Slind K. I/Q (Isabelle/Q), an Isabelle/jEdit plugin exposing proof editing/exploration capabilities as an MCP server, 2025. Available at: https://github.com/awslabs/AutoCorrode/tree/main/iq, accessed 10.12.2025.

27. Noschinski L., Traut C. Pattern-based Subterm Selection in Isabelle. In: Proceeding of Isabelle workshop, 2014. Также препринт (arXiv:2111.04082).

28. Kappelmann K. Unification Utilities for Isabelle/ML. Archive of Formal Proofs, 2023.

29. Snyder W. E-Unification (Book chapter). A Proof Theory for General Unification, 1991, pp. 49–60.

30. Asperti A., Ricciotti W., Sacerdoti Coen C., Tassi E. Hints in Unification. In: Proceedings of the International Conference on Theorem Proving in Higher Order Logics, 2009. pp. 84–98.

31. Bohua Zhan. AUTO2, A Saturation-Based Heuristic Prover for Higher-Order Logic. In: Proceedings of the International Conference on Interactive Theorem Proving, 2016. pp. 441–456.

32. Unruh Dominique. Without Loss of Generality. Archive of Formal Proofs, 2024.

33. Blanchette J. C., Böhme S., Paulson L. C. Extending Sledgehammer with SMT Solvers. In: Proceedings of the 23rd International Conf. on Automated Deduction (CADE-23), 2011, pp. 116–130.

34. Obua S., Skalberg S. Importing HOL into Isabelle/HOL. In: Proceedings of the Third International Joint Conference on Automated Reasoning, 2006. pp. 298–302.

35. Adams M. The Common HOL Platform. In: Proceedings of the 4th Workshop on Proof eXchange for Theorem Proving (PxTP’15), EPTCS 186, 2015. pp. 42–56.

36. Hurd J. First-order proof tactics in higher-order logic theorem provers. In: Proceedings: Design and Application of Strategies/Tactics in Higher Order Logics, NASA/CP-2003-212448 in NASA Technical Reports, 2003, pp. 56–68.

37. Paulson L. C., Susanto K. W. Source-Level Proof Reconstruction for Interactive Theorem Proving. In: Proceedings: Theorem Proving in Higher Order Logics, 2007. pp. 232–245.

38. The Archive of Formal Proofs. 2004–2025. Available at: https://www.isa-afp.org/, accessed 10.12.2025.

39. Unruh D. Scala-Isabelle, a library that allows to control an Isabelle prover process from a Scala application, 2020–2025. Available at: https://github.com/dominique-unruh/scala-isabelle, accessed 10.12.2025.

40. Jiang A. Q., Wenda Li, Han J. M., Yuhuai Wu. PISA (Portal to ISAbelle). Available at: https://github.com/albertqjiang/Portal-to-ISAbelle, accessed 10.12.2025.

41. Xu Qiyuan. Isabelle REPL. Unofficial support for Isabelle's Read-Eval-Print-Loop, 2024–2025. Available at: https://github.com/xqyww123/Isa-REPL, accessed 10.12.2025.

42. Bartl L., Blanchette J., Nipkow T. Exploiting Instantiations from Paramodulation Proofs in Isabelle/HOL. In: Proceedings: Automated Deduction – CADE 30: 30th International Conference on Automated Deduction, 2025, pp. 573–593.

43. Blanchette J. C., Böhme S., Fleury M., Smolka S. J., Steckermeier A. Semi-intelligible Isar Proofs from Machine-Generated Proofs. Journal of Automated Reasoning, vol. 56, 2015, pp. 155–200.

44. Haftmann F., Wenzel M. Local Theory Specifications in Isabelle/Isar. In: Proceedings: Types for Proofs and Programs, 2008, pp. 153–168.

45. Linzi A. L-Mosaics and Bounded Join-Semilattices in Isabelle/HOL. Препринт (arXiv:2509.19854), 2025.

46. Rao B., Eiers W., Lipizzi C. Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification. In: 19th Conference on Neurosymbolic Learning and Reasoning, Proceedings of Machine Learning Research, vol. 284, 2025, pp. 1–16.

47. Linzi A. Personal AI Research Log. 2025. Available at: https://github.com/linzialessandro/ai-research-log/tree/main/formal-verification, accessed 10.12.2025.

48. Информация об авторах / Information about


Review

For citations:


MANDRYKIN M.U., MUTILIN V.S., PETRENKO A.K., SOROKIN K.S., TYUKIN I.Yu. Choosing an Isabelle System Interface for Interacting with Large Language Models. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(6):27-44. (In Russ.) https://doi.org/10.15514/ISPRAS-2025-37(6)-48



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)