Iterative Weak Supervision with LLM-Guided Labeling Function Refinement
https://doi.org/10.15514/ISPRAS-2025-37(6)-20
Abstract
Training high-quality classifiers in domains with limited labeled data remains a fundamental challenge in machine learning. While large language models (LLMs) have demonstrated strong zero-shot capabilities, their use as direct predictors suffers from high inference cost, prompt sensitivity, and limited interpretability. Weak supervision, in contrast, provides a scalable alternative through the aggregation of noisy labeling functions (LFs), but authoring and refining these rules traditionally requires significant manual effort. We introduce LLM-Guided Iterative Weak Labeling (LGIWL), a novel framework that integrates prompting with weak supervision in an iterative feedback loop. Rather than using an LLM for classification, we use it to synthesize and refine labeling functions based on downstream classifier errors. The generated rules are filtered using a small development set and applied to unlabeled data via a generative label model, enabling high-quality training of discriminative classifiers with minimal human annotation. We evaluate LGIWL on a real-world text classification task involving Russian-language customer service dialogues. Our method significantly outperforms keyword-based Snorkel heuristics, zero-shot prompting with GPT-4, and even a supervised CatBoost classifier trained on the full labeled dev set. In particular, LGIWL maintains strong recall while notably improving precision, resulting in a final F1 score of 0.863 with a RuModernBERT classifier, which demonstrates both robustness and practical scalability.
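The feedback loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical: `propose_lfs` stands in for the LLM call by emitting one keyword rule per word of each misclassified example, a simple majority vote replaces the generative label model, and errors of that vote on the dev set stand in for downstream classifier errors.

```python
from collections import Counter
from typing import Callable, List, Tuple

ABSTAIN = -1
LF = Callable[[str], int]
Example = Tuple[str, int]


def keyword_lf(keyword: str, label: int) -> LF:
    """Rule that votes `label` when `keyword` occurs as a word, else abstains."""
    def lf(text: str) -> int:
        return label if keyword in text.lower().split() else ABSTAIN
    lf.__name__ = f"lf_{keyword}_{label}"
    return lf


def propose_lfs(errors: List[Example]) -> List[LF]:
    """Hypothetical stand-in for the LLM: one keyword rule per word of each error."""
    return [keyword_lf(w, y) for text, y in errors for w in set(text.lower().split())]


def filter_lfs(lfs: List[LF], dev: List[Example], min_precision: float) -> List[LF]:
    """Keep only rules whose precision on the dev set reaches the threshold."""
    kept = []
    for lf in lfs:
        fired = [(lf(x), y) for x, y in dev if lf(x) != ABSTAIN]
        if fired and sum(v == y for v, y in fired) / len(fired) >= min_precision:
            kept.append(lf)
    return kept


def majority_vote(lfs: List[LF], text: str) -> int:
    """Simplified aggregation (assumption: replaces the generative label model)."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN


def refine(dev: List[Example], rounds: int = 3, min_precision: float = 0.8) -> List[LF]:
    """Propose rules from current errors, filter them on dev, and repeat."""
    lfs: List[LF] = []
    for _ in range(rounds):
        errors = [(x, y) for x, y in dev if majority_vote(lfs, x) != y]
        if not errors:
            break
        known = {lf.__name__ for lf in lfs}
        for lf in filter_lfs(propose_lfs(errors), dev, min_precision):
            if lf.__name__ not in known:  # skip rules that are already kept
                known.add(lf.__name__)
                lfs.append(lf)
    return lfs


dev_set = [
    ("refund my money please", 1),
    ("thanks great service", 0),
    ("i want a refund now", 1),
    ("great thanks a lot", 0),
]
rules = refine(dev_set)
print(majority_vote(rules, "please process my refund"))
```

In a fuller pipeline, the filtered rules would be applied to unlabeled data, their votes combined by a label model, and the resulting labels used to train the discriminative classifier whose errors drive the next round.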
About the Authors
Artur Dmitrievich SOSNOVIKOV
Russian Federation
Graduate student at the Institute of System Programming since 2023. Research interests: machine learning methods, weakly supervised learning.
Anton Dmitrievich ZEMEROV
Russian Federation
Senior ML Engineer at Tochka Bank. Graduate of the PhysTech School of Applied Mathematics and Informatics at MIPT. Research interests: machine learning methods, natural language processing, large language models.
Denis Yurievich TURDAKOV
Russian Federation
Cand. Sci. (Phys.-Math.), Head of Department at ISP RAS, associate professor of the Department of System Programming at MSU. Research interests: natural language processing, information extraction, big data analysis, social network analysis.
For citations:
SOSNOVIKOV A.D., ZEMEROV A.D., TURDAKOV D.Yu. Iterative Weak Supervision with LLM-Guided Labeling Function Refinement. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(6):65-76. (In Russ.) https://doi.org/10.15514/ISPRAS-2025-37(6)-20






