Synthesis of a Machine Learning Model for Detecting Computer Attacks Based on the CICIDS2017 Dataset

Maxim Nikolaevich GORYUNOV; Andrey Georgievich MATSKEVICH; Dmitry Aleksandrovich RYBOLOVLEV

doi:10.15514/ISPRAS-2020-32(5)-6



Abstract

The paper deals with the construction and practical implementation of a computer attack detection model based on machine learning methods. One of the most relevant available public datasets, CICIDS2017, was chosen. Data preprocessing and sampling procedures were developed in detail for this dataset. To reduce computation time, only one class of computer attacks (brute force, XSS, SQL injection) was left in the training set. The feature space construction procedure is described step by step. It made it possible to reduce the dimensionality significantly, from 85 features to the 10 most important ones. The quality of the ten most common machine learning models was assessed on the resulting preprocessed dataset. Among the models (algorithms) that showed the best results (k-nearest neighbors, decision tree, random forest, AdaBoost, logistic regression), the random forest model was chosen because it also had the shortest execution time. A quasi-optimal selection of hyperparameters was carried out, which improved the quality of the model compared with previously published research results. The synthesized attack detection model was tested on real network traffic. The model proved valid only when trained on data collected in the specific network where it is deployed, since the important features depend on the physical structure of the network and the settings of the equipment used. The paper concludes that machine learning methods can be used to detect computer attacks, provided these limitations are taken into account.
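The data-preparation steps summarized in the abstract can be sketched in plain Python. These steps are cleaning the records, keeping only benign traffic and the three attacks named above (which CICIDS2017 labels as web attacks), and cutting the feature space down to the highest-ranked features. This is a minimal illustration under stated assumptions, not the authors' code. The column name `Label`, the `Web Attack` label prefix, the sample rows, and all helper names (`load_rows`, `select_web_attacks`, `top_k_features`, `project`) are hypothetical. The importance scores passed to `top_k_features` are assumed to come from a trained model, for example the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`.

```python
import csv
import io
import math

# Assumed label conventions: the three web attacks share a "Web Attack" prefix,
# but the character between the prefix and the attack name may be mis-encoded
# in the published CSV files, so only the prefix is matched here.
BENIGN_LABEL = "BENIGN"
WEB_ATTACK_PREFIX = "web attack"


def load_rows(text):
    """Parse CSV text into (features, label) pairs.

    Column names are stripped of surrounding whitespace, every non-label column
    is converted to float, and rows containing NaN, Infinity, or non-numeric
    values are dropped.
    """
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    rows = []
    for raw in reader:
        record = dict(zip(header, (value.strip() for value in raw)))
        label = record.pop("Label")  # assumed name of the class column
        try:
            features = {name: float(value) for name, value in record.items()}
        except ValueError:
            continue  # non-numeric cell: skip the row
        if any(math.isnan(x) or math.isinf(x) for x in features.values()):
            continue  # NaN or Infinity: skip the row
        rows.append((features, label))
    return rows


def select_web_attacks(rows):
    """Keep benign flows and web attacks only; encode labels as 0 (benign) / 1 (attack)."""
    selected = []
    for features, label in rows:
        if label.upper() == BENIGN_LABEL:
            selected.append((features, 0))
        elif label.lower().startswith(WEB_ATTACK_PREFIX):
            selected.append((features, 1))
    return selected


def top_k_features(importances, k=10):
    """Return the names of the k features with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def project(rows, names):
    """Reduce each feature dict to a vector over the selected feature names."""
    return [([features[name] for name in names], label) for features, label in rows]


# Tiny synthetic example (not real CICIDS2017 records).
SAMPLE_CSV = """ Destination Port, Flow Duration, Flow Bytes/s, Label
80,1000,250.5,BENIGN
80,2000,Infinity,BENIGN
80,1500,100.0,Web Attack - XSS
22,300,NaN,SSH-Patator
21,400,50.0,FTP-Patator
"""

if __name__ == "__main__":
    data = select_web_attacks(load_rows(SAMPLE_CSV))
    print([label for _, label in data])  # [0, 1]
    scores = {"Flow Duration": 0.6, "Destination Port": 0.1, "Flow Bytes/s": 0.3}
    print(top_k_features(scores, k=2))  # ['Flow Duration', 'Flow Bytes/s']
```

Here rows with NaN or Infinity are dropped rather than imputed. That is one common choice for CICIDS2017, not necessarily the one the authors made. In a full pipeline, the importance scores would come from a model trained on the selected rows. The resulting name list (ten names, matching the abstract) would then be passed to `project` before the final classifier is trained.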

Keywords

information security, intrusion detection system, machine learning, decision tree, random forest, network traffic, computer attack

About the Authors

Maxim Nikolaevich GORYUNOV

The Academy of Federal Security Guard Service of the Russian Federation
Russian Federation
Candidate in Engineering Sciences

Andrey Georgievich MATSKEVICH

The Academy of Federal Security Guard Service of the Russian Federation
Russian Federation
Candidate in Engineering Sciences, Associate Professor

Dmitry Aleksandrovich RYBOLOVLEV

The Academy of Federal Security Guard Service of the Russian Federation
Russian Federation
Candidate in Engineering Sciences



For citations:

GORYUNOV M.N., MATSKEVICH A.G., RYBOLOVLEV D.A. Synthesis of a Machine Learning Model for Detecting Computer Attacks Based on the CICIDS2017 Dataset. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2020;32(5):81-94. (In Russ.) https://doi.org/10.15514/ISPRAS-2020-32(5)-6

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)
