HOREC: a Specialized Regular Expression Compiler for Designing Programmable and Resource-Efficient Hardware Architecture
https://doi.org/10.15514/ISPRAS-2025-37(4)-5
Abstract
The development of programmable and resource-efficient hardware accelerators for regular expression matching is an important research direction in network security, where high throughput for streaming data processing and resilience against ReDoS (Regular Expression Denial of Service) attacks are critical. This paper presents the HOREC compiler, which is used in the high-level design loop of a programmable hardware accelerator. HOREC utilizes a novel extension of a deterministic finite automaton, enabling compact representation of interval quantifiers with a large number of repetitions, typical for rules in intrusion detection systems. Algorithms are described to reduce the number of transitions in instructions and decrease the total number of instructions. The paper presents a software model of the accelerator based on an interpreter for matching compiled patterns and defines a set of parameters for architectural design space exploration of the hardware accelerator, including available memory size, instruction format, and symbols matching modes. Experimental evaluation was performed on 7234 regular expressions extracted from ET OPEN rules. The results demonstrate the high resource efficiency of the proposed solution: up to 7000 expressions were accommodated in a 64K instruction memory. Furthermore, in 60% of cases, the number of transitions per instruction does not exceed 4, and the use of a global symbol set table for 256 frequently used elements allows each program’s local table to be limited to just 10 symbol sets. The presented results confirm HOREC's applicability for hardware implementation of regular expression matching in network security tasks and show the potential of the proposed approach for high-level design of low-hardware-cost accelerators.
About the Author
Petr Nikolayevich SOVIETOVRussian Federation
Cand. Sci. (Tech.), Associate Professor of the Corporate Information Systems Department of the Institute of Information Technologies of the Institute of Information Technologies, Federal State Budgetary Educational Institution of Higher Education “MIREA - Russian Technological University”. Research interests: program synthesis, design automation of specialized processors, compiler writing for specialized processors.
References
1. Turoňová L., Holı́k L., Homoliak I., Lengál O., Veanes M., Vojnar T. Counting in Regexes Considered Harmful: Exposing ReDoS Vulnerability of Nonbacktracking Matchers / 31st USENIX Security Symposium (USENIX Security 22). — Boston, MA: USENIX Association, 2022. — С. 4165–4182.
2. Carloni F., Conficconi D., Moschetto I., Santambrogio M. D. YARB: a Methodology to Characterize Regular Expression Matching on Heterogeneous Systems / 2023 IEEE International Symposium on Circuits and Systems (ISCAS). — IEEE, 2023. — С. 1–5. — DOI: 10.1109/iscas46773.2023.10181547.
3. Sovetov P. N. Development of DSL Compilers for Specialized Processors // Programming and Computer Software. — Pleiades Publishing Ltd, 2021. — Т. 47, вып. 7. — С. 541–554. — DOI: 10.1134/s0361768821070082.
4. Sovietov P. N. Algorithms for Improving the Automatically Synthesized Instruction Set of an Extensible Processor // Programmnaya Ingeneria. — New Technologies Publishing House, 2023. — Т. 14, вып. 5. — С. 225–231. — DOI: 10.17587/prin.14.225-231.
5. Tiny parser combinators library written in Python, Available at: https://github.com/true-grue/peco, accessed 02.08.2025.
6. Antimirov V. Partial derivatives of regular expressions and finite automaton constructions // Theoretical Computer Science. — Elsevier BV, 1996. — Т. 155, вып. 2. — С. 291–319. — DOI: 10.1016/0304-3975(95)00182-4.
7. D’Antoni L., Veanes M. Minimization of symbolic automata / Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. — ACM, 2014. — DOI: 10.1145/2535838.2535849.
8. Holzer M., Jakobi S. Brzozowski’s Minimization Algorithm—More Robust than Expected // Implementation and Application of Automata. — Springer Berlin Heidelberg, 2013. — С. 181–192. — DOI: 10.1007/978-3-642-39274-0_17.
9. Sulzmann M., Lu K. Z. M. Regular expression sub-matching using partial derivatives / Proceedings of the 14th symposium on Principles and practice of declarative programming. — ACM, 2012. — С. 79–90. — DOI: 10.1145/2370776.2370788.
10. Proofpoint Emerging Threats Rules, Available at: https://rules.emergingthreats.net/, accessed 02.08.2025.
11. Le Glaunec A., Kong L., Mamouras K. Regular Expression Matching using Bit Vector Automata // Proceedings of the ACM on Programming Languages. — Association for Computing Machinery (ACM), 2023. — Т. 7, вып. OOPSLA1. — С. 492–521. — DOI: 10.1145/3586044.
12. Smith R., Estan C., Jha S. XFA: Faster Signature Matching with Extended Automata / 2008 IEEE Symposium on Security and Privacy (sp 2008). — IEEE, 2008. — С. 187–201. — DOI: 10.1109/sp.2008.14.
13. Turoňová L., Holík L., Lengál O., Saarikivi O., Veanes M., Vojnar T. Regex matching with counting-set automata // Proceedings of the ACM on Programming Languages. — Association for Computing Machinery (ACM), 2020. — Т. 4, вып. OOPSLA. — С. 1–30. — DOI: 10.1145/3428286.
14. Holík L., Lengál O., Saarikivi O., Turoňová L., Veanes M., Vojnar T. Succinct Determinisation of Counting Automata via Sphere Construction // Programming Languages and Systems. — Springer International Publishing, 2019. — С. 468–489. — DOI: 10.1007/978-3-030-34175-6_24.
15. Rahimi R., Sadredini E., Stan M., Skadron K. Grapefruit: An Open-Source, Full-Stack, and Customizable Automata Processing on FPGAs / 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). — IEEE, 2020. — С. 138–147. — DOI: 10.1109/fccm48280.2020.00027.
16. Wen Z., Kong L., Le Glaunec A., Mamouras K., Yang K. BVAP: Energy and Memory Efficient Automata Processing for Regular Expressions with Bounded Repetitions / Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. — ACM, 2024. — С. 151–166. — DOI: 10.1145/3620665.3640412.
17. Somaini A., Carloni F., Agosta G., Santambrogio M. D., Conficconi D. Combining MLIR Dialects with Domain-Specific Architecture for Efficient Regular Expression Matching / Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization. — ACM, 2025. — С. 255–270. — DOI: 10.1145/3696443.3708916.
18. Gong L., Wang C., Xia H., Chen X., Li X., Zhou X. Enabling Fast and Memory-Efficient Acceleration for Pattern Matching Workloads: The Lightweight Automata Processing Engine // IEEE Transactions on Computers. — Institute of Electrical; Electronics Engineers (IEEE), 2023. — Т. 72, вып. 4. — С. 1011–1025. — DOI: 10.1109/tc.2022.3187338.
19. Zhong J., Chen S., Yu C. Xav: A High-Performance Regular Expression Matching Engine for Packet Processing. — Elsevier BV, 2024. — DOI: 10.2139/ssrn.4760269.
Supplementary files
Review
For citations:
SOVIETOV P.N. HOREC: a Specialized Regular Expression Compiler for Designing Programmable and Resource-Efficient Hardware Architecture. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(4):79-96. (In Russ.) https://doi.org/10.15514/ISPRAS-2025-37(4)-5