Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS)

Is AI Interpretability Safe: the Relationship between Interpretability and Security of Machine Learning Models

https://doi.org/10.15514/ISPRAS-2024-36(5)-9

Abstract

With the growing adoption of interpretable artificial intelligence (AI) models, increasing attention is being paid to issues of trust and security across all types of data. In this work, we focus on the task of graph node classification, one of the most challenging settings. To the best of our knowledge, this is the first study to comprehensively explore the relationship between interpretability and robustness. Our experiments are conducted on citation and purchase graph datasets. We propose methodologies for constructing black-box attacks on graph models based on interpretation results and demonstrate how adding defenses affects the interpretability of AI models.
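
The abstract describes the attack methodology only at a high level. As a rough, purely illustrative sketch of the general idea (not of the authors' exact procedure), the Python fragment below shows how explainer-derived edge importances could steer a black-box structural attack on a target node; the callables model_predict and edge_importance are hypothetical placeholders for query access to the trained classifier and for an interpretation method such as GNNExplainer.

import numpy as np

def interpretation_guided_attack(adj, x, target, model_predict, edge_importance, budget=5):
    """Greedy black-box attack sketch: flip the edges that the explainer
    marks as most influential for the target node's prediction.

    adj             -- dense adjacency matrix, (n, n) numpy array
    x               -- node feature matrix, (n, d) numpy array
    target          -- index of the node under attack
    model_predict   -- black-box callable: (adj, x) -> predicted labels, shape (n,)
    edge_importance -- explainer callable: (adj, x, target) -> dict {(u, v): score}
    budget          -- maximum number of edge flips
    """
    original_label = model_predict(adj, x)[target]
    scores = edge_importance(adj, x, target)
    ranked = sorted(scores, key=scores.get, reverse=True)  # most important edges first

    perturbed = np.array(adj, copy=True)
    for u, v in ranked[:budget]:
        perturbed[u, v] = perturbed[v, u] = 1 - perturbed[u, v]  # flip the edge
        if model_predict(perturbed, x)[target] != original_label:
            break  # misclassification achieved within the budget
    return perturbed

In the setting described above, adj and x would come from a citation or purchase graph; a defended model can be plugged in as model_predict to examine how protection changes both attack success and the resulting explanations.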

About the Authors

Georgii Vladimirovich SAZONOV
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Lomonosov Moscow State University
Russian Federation

Employee of the Information Systems Department of the Ivannikov Institute for System Programming of the Russian Academy of Sciences; master's student at Moscow State University.



Kirill Sergeevich LUKYANOV
Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow Institute of Physics and Technology (National Research University), Research Center for Trusted Artificial Intelligence ISP RAS
Russian Federation

Researcher at the Center for Trusted Artificial Intelligence of the Ivannikov Institute for System Programming of the Russian Academy of Sciences; postgraduate student at Moscow Institute of Physics and Technology.



Serafim Konstantinovich BOYARSKY
Yandex School of Data Analysis
Russian Federation

Student at the Yandex School of Data Analysis and at ITMO University.



Ilya Andreevich MAKAROV
Research Center for Trusted Artificial Intelligence ISP RAS, AIRI
Russian Federation

Senior Research Fellow at the Artificial Intelligence Research Institute (AIRI), Moscow, Russia, where he leads research in Industrial AI.



For citations:


SAZONOV G.V., LUKYANOV K.S., BOYARSKY S.K., MAKAROV I.A. Is AI Interpretability Safe: the Relationship between Interpretability and Security of Machine Learning Models. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2024;36(5):127-142. (In Russ.) https://doi.org/10.15514/ISPRAS-2024-36(5)-9



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)