Machine Learning-Based malicious users’ detection in the VKontakte social network

Denis Igorevich SAMOKHVALOV

doi:10.15514/ISPRAS-2020-32(3)-10

Abstract

This paper presents a machine learning-based approach to detecting malicious users in VKontakte, the largest Russian online social network. An exploratory data analysis was conducted to identify insights and anomalies in a dataset consisting of 42394 malicious and 241035 genuine accounts. A tool for automated collection of information about malicious accounts in VKontakte was also developed and used to collect the dataset described in this research. Baseline feature engineering was performed, and a CatBoost classifier was used to build the classification model. The results show that the model identifies malicious users with an overall AUC score of 0.91, validated with 4-fold cross-validation.
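The evaluation protocol named in the abstract (AUC under 4-fold cross-validation) can be illustrated with a minimal, standard-library sketch. It is not the author's code: the function names, the stratified splitting, and the one-feature placeholder model `fit_mean_difference` are assumptions made for illustration, and in practice the `fit` argument would wrap a trained CatBoost classifier.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Scorer = Callable[[Row], float]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in sample size, fine for a toy example.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int = 4, seed: int = 0) -> List[List[int]]:
    """Split indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


def cross_validated_auc(
    features: Sequence[Row],
    labels: Sequence[int],
    fit: Callable[[Sequence[Row], Sequence[int]], Scorer],
    k: int = 4,
    seed: int = 0,
) -> List[float]:
    """Train on k-1 folds, score the held-out fold, return one AUC per fold."""
    folds = stratified_folds(labels, k=k, seed=seed)
    aucs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        scorer = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [scorer(features[i]) for i in held_out]
        aucs.append(roc_auc([labels[i] for i in held_out], scores))
    return aucs


def fit_mean_difference(train_x: Sequence[Row], train_y: Sequence[int]) -> Scorer:
    """Hypothetical placeholder model: rank by the first feature, oriented so
    that the class with the higher training mean receives higher scores."""
    pos = [x[0] for x, y in zip(train_x, train_y) if y == 1]
    neg = [x[0] for x, y in zip(train_x, train_y) if y == 0]
    direction = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: direction * x[0]


if __name__ == "__main__":
    # Synthetic data only; label 1 stands for "malicious", 0 for "genuine".
    rng = random.Random(42)
    y = [1] * 60 + [0] * 340
    X = [[rng.gauss(2.0 if label else 0.0, 1.0)] for label in y]
    fold_aucs = cross_validated_auc(X, y, fit_mean_difference, k=4)
    print("per-fold AUC:", [round(a, 3) for a in fold_aucs])
    print("mean AUC:", round(sum(fold_aucs) / len(fold_aucs), 3))
```

Stratifying the folds keeps the malicious-to-genuine ratio roughly constant across them, which matters for an imbalanced dataset such as the one described above. Whether the original study stratified its folds is not stated in the abstract.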

Keywords

VKontakte, malicious users, machine learning, social networks, classification models

About the Author

Denis Igorevich SAMOKHVALOV

National Research University Higher School of Economics
Russian Federation
Master's student

For citations:

SAMOKHVALOV D.I. Machine Learning-Based malicious users’ detection in the VKontakte social network. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2020;32(3):109-117. https://doi.org/10.15514/ISPRAS-2020-32(3)-10

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)
