Applying Time Series to The Task of Background User Identification Based on Their Text Data Analysis
https://doi.org/10.15514/ISPRAS-2015-27(1)-8
Abstract
About the Authors
V. Y. Korolev
Russian Federation
Department of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow, 119991, Russia.
A. Y. Korchagin
Russian Federation
Department of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow, 119991, Russia.
I. V. Mashechkin
Russian Federation
Department of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow, 119991, Russia.
M. I. Petrovskiy
Russian Federation
Department of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow, 119991, Russia.
D. V. Tsarev
Russian Federation
Department of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow, 119991, Russia.
References
1. R.V. Yampolskiy, V. Govindaraju, Behavioural biometrics: a survey and classification. International Journal of Biometrics (IJBM), Vol. 1, No. 1, 2008.
2. Vremennoi ryad [Time Series]. March 24 2015. (http://www.machinelearning.ru/wiki/index.php?title=Временной_ряд) (in Russian)
3. I.V. Mashechkin, M.I. Petrovskiy, D.V. Tsarev. Metody vychislenija relevantnosti fragmentov teksta na osnove tematicheskix modelej v zadache avtomaticheskogo annotirovanija [Methods of text fragment relevance estimation based on the topic model analysis in the text summarization problem]. Vychislitel’nye Metody i Programmirovanie [Numerical Methods and Programming], 2013, vol. 14, pp. 91–102. (in Russian)
4. I.V. Mashechkin, M.I. Petrovskiy, D.S. Popov, D.V. Tsarev. Automatic text summarization using latent semantic analysis. Programming and Computer Software, 2011, pp. 299-305.
5. D.V. Tsarev, M.I. Petrovskiy, I.V. Mashechkin. Using NMF-based text summarization to improve supervised and unsupervised classification. 11th International Conference on Hybrid Intelligent Systems (HIS), 2011. Malacca, Malaysia. pp. 185-189.
6. D.V. Tsarev, M.I. Petrovskiy, I.V. Mashechkin. Supervised and Unsupervised Text Classification via Generic Summarization. International Journal of Computer Information Systems and Industrial Management Applications. MIR Labs, Volume 5, 2013, pp. 509-515.
7. I.V. Mashechkin, M.I. Petrovskiy, D.S. Popov, D.V. Tsarev. Applying Text Mining Methods for Data Loss Prevention. Programming and Computer Software. January 2015, Volume 41, Issue 1, pp. 23-30.
8. C.D. Manning, P. Raghavan, H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
9. A. Mirzal. Converged Algorithms for Orthogonal Nonnegative Matrix Factorizations. CoRR abs/1010.5290, 2010.
10. Wei Xu, Xin Liu, Yihong Gong. Document clustering based on non-negative matrix factorization. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, Toronto, Canada, 2003.
11. Chris Ding, Tao Li, Wei Peng, Haesun Park. Orthogonal Nonnegative Matrix Tri-Factorizations for Clustering. SIGKDD, 2006.
12. M.W. Berry, M. Browne, A.N. Langville, V.P. Pauca, R.J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis, pp. 155-173, 2007.
13. J. Yoo, S. Choi. Orthogonal Nonnegative Matrix Factorization: Multiplicative Updates on Stiefel Manifolds. Intelligent Data Engineering and Automated Learning – IDEAL 2008, vol. 5326 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2008, pp. 140–147.
14. C. Meek, D.M. Chickering, D. Heckerman. Autoregressive Tree Models for Time-Series Analysis, 2002. (http://go.microsoft.com/fwlink/?LinkId=45966)
15. Tekhnicheskii spravochnik po algoritmu vremennykh ryadov (Microsoft) [Microsoft Time Series Algorithm Technical Reference]. (http://msdn.microsoft.com/ru-ru/library/bb677216.aspx) (in Russian)
16. T. Hastie, R. Tibshirani, G. Sherlock, M. Eisen, P. Brown, D. Botstein. Imputing Missing Data for Gene Expression Arrays. Technical report, Stanford Statistics Department 1999.
17. O. Troyanskaya. Missing value estimation methods for DNA microarrays. Bioinformatics, vol. 17, no. 6, 2001, pp. 520-525.
18. D.V. Tsarev, R.V. Kurynin, M.I. Petrovskiy, I.V. Mashechkin. Applying non-negative matrix factorization methods to discover user’s resource access patterns for computer security tasks. Proceedings of the 2014 International Conference on Hybrid Intelligent Systems (HIS 2014). IEEE Computer Society [New York], United States, 2014. pp. 43–48.
19. D. Lee, S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401, 1999. pp. 788-791.
20. Enron Email Dataset. March 24 2015. (http://www.cs.cmu.edu/~./enron/)
21. Natural Language Toolkit (NLTK). March 24 2015. (http://www.nltk.org)
22. M. Kendall, A. Stuart. Statisticheskie vyvody i svyazi [Statistical derivations and associations]. M.: Nauka, 1973. (in Russian)
23. Krivaya oshibok [Receiver Operating Characteristic, ROC curve]. March 24 2015. (http://www.machinelearning.ru/wiki/index.php?title=ROC-кривая) (in Russian)
Review
For citations:
Korolev V.Y., Korchagin A.Y., Mashechkin I.V., Petrovskiy M.I., Tsarev D.V. Applying Time Series to The Task of Background User Identification Based on Their Text Data Analysis. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2015;27(1):151-172. (In Russ.) https://doi.org/10.15514/ISPRAS-2015-27(1)-8