Knowledge Distillation in Local-Region for Black-Box Adversarial Examples
https://doi.org/10.15514/ISPRAS-2025-37(4)-23
Abstract
The robustness of neural networks to adversarial perturbations in the black-box setting remains a challenging problem. Most existing attack methods require an excessive number of queries to the target model, which limits their practical applicability. In this work, we propose an approach in which a surrogate student model is iteratively trained on failed attack attempts, gradually learning the local behavior of the black-box model. Experiments show that this method significantly reduces the number of required queries while maintaining a high attack success rate.
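The approach described above lends itself to a simple iterative loop: craft a candidate on a white-box student surrogate, query the black box, and, when the attempt fails, distill the returned scores back into the student so that it better matches the target in the local region around the current example. The sketch below is only an illustration of that loop under assumed conditions (PyTorch, score-based access to the black box, a one-step FGSM-style attack on the student); the function and variable names are hypothetical and do not reproduce the authors' implementation.

import torch
import torch.nn.functional as F

def attack_with_local_distillation(blackbox, student, optimizer, x, y,
                                   eps=8 / 255, step_size=2 / 255, max_queries=1000):
    # Hypothetical loop: blackbox(x) is assumed to return logits (score-based
    # access); student is a trainable white-box surrogate with the same input shape.
    queries = 0
    buffer = []          # (input, black-box logits) pairs collected from failed attempts
    x_adv = x.clone()

    while queries < max_queries:
        # 1. Craft the next candidate with a one-step white-box attack on the student.
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(student(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x + (x_adv + step_size * grad.sign() - x).clamp(-eps, eps)).clamp(0, 1)

        # 2. Spend queries on the black box to check the candidate.
        with torch.no_grad():
            logits = blackbox(x_adv)
        queries += x_adv.shape[0]
        if (logits.argmax(dim=1) != y).all():
            return x_adv, queries          # every example is misclassified: success

        # 3. The attempt failed: keep the response and distill it into the student,
        #    refining the surrogate in the local region around the current example.
        buffer.append((x_adv.detach(), logits))
        for xb, tb in buffer:
            optimizer.zero_grad()
            kd_loss = F.kl_div(F.log_softmax(student(xb), dim=1),
                               F.softmax(tb, dim=1), reduction="batchmean")
            kd_loss.backward()
            optimizer.step()

    return x_adv, queries                  # query budget exhausted

In this sketch the whole buffer of failed attempts is replayed at every distillation step, so no black-box response that has already been paid for in queries is discarded; how the actual method schedules distillation and crafts candidates is described in the paper itself.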
About the Authors
Kirill Sergeevich LUKIANOV
Russia
Researcher at the Trusted Artificial Intelligence Research Center of ISP RAS; PhD student at MIPT. Research interests: trusted AI methods, federated learning, multi-objective optimization, AutoML.
Andrey Igorevich PERMINOV
Russia
PhD student at the Institute for System Programming of the RAS. Research interests: neural network data processing, digital image processing, trusted artificial intelligence methods.
Denis Yurievich TURDAKOV
Russia
Candidate of Physical and Mathematical Sciences, head of department at ISP RAS. Research interests: social network analysis, text analysis, information extraction, big data processing, trusted artificial intelligence methods.
Mikhail Aleksandrovich PAUTOV
Russia
PhD in Computer Science, research scientist at AIRI and at the Trusted Artificial Intelligence Research Center of ISP RAS. Research interests: linear algebra, large deviation theory, certified robustness of neural networks, digital watermarking of neural networks.
For citation:
LUKIANOV K.S., PERMINOV A.I., TURDAKOV D.Yu., PAUTOV M.A. Knowledge Distillation in Local-Region for Black-Box Adversarial Examples. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2025;37(4):133-146. (In Russ.) https://doi.org/10.15514/ISPRAS-2025-37(4)-23