Machine Learning Use Cases in Cybersecurity

The use of machine learning in cybersecurity is a difficult problem because advances in the field offer so many opportunities that it is challenging to find exceptional and beneficial use cases for implementation and decision making. Moreover, such technologies can be used by intruders to attack computer systems. The goal of this paper is to explore machine learning usage in cybersecurity and cyberattacks and to provide a model of a machine learning-powered attack.


Introduction
Cybersecurity is gaining more and more attention each year. The number of cyberattacks has significantly increased since 2009 due to the digitalization of everything in the modern world. According to the Gartner Hype Cycle [1], machine learning (ML) is of great interest in the world of technology. ML is concerned with intelligent behaviour in a system, including perception, reasoning, learning, communication and acting in a complex environment [2]. Such widespread interest in ML is due to two critical factors. First, it can automate processes that previously required human participation, for example, control of robotic mechanisms in production (i.e. ML assumes human responsibilities). Second, it can quickly process and analyze huge amounts of information and calculate options using many variables. In these areas, ML provides qualitatively better results compared to humans.
ML has much to offer cybersecurity. Current implementations are widely used in IDS, sandbox systems and many other areas of cybersecurity, from threat intelligence data collection to advanced automated digital forensics. In fact, 71% of US businesses plan to use ML in their cybersecurity tools in 2019 [3], as over one-third (36%) [3] of organizations experienced damaging cyberattacks in 2018. The majority (83%) [3] believes that cybercriminals use ML to attack organizations.
The problem of ML use in cybersecurity is difficult to solve because the advances in the field offer so many opportunities that it is hard to find good and beneficial use cases for implementation and decision making. Moreover, it is difficult to determine how secure a security system used in production is and how to protect the organization from cyberattacks conducted through ML. The main goal of the current work is to explore ML usage in cybersecurity and to research use cases related to the adversary's use of ML in cyberattacks.

Basic definitions
ML is the process by which machines learn from given data, building the logic and predicting output for a given input [4]. ML has three sub-categories: supervised learning, unsupervised learning and reinforcement learning [5]. Supervised learning uses a dataset labelled with the correct answers for study. Such labels identify the characteristics of each sample in the dataset. Once the model is trained, it can start predicting or deciding on new data that is given to it. In unsupervised learning, there is no need for such a labelled set of data. Once the model is given the dataset, it automatically finds patterns and relationships by creating clusters in it. However, this type of learning does not predict anything: when new data is added, the model assigns it to one of the existing clusters or creates a new one. Reinforcement learning is the ability of a system to interact with the environment and identify the best outcome. The system is either rewarded or penalized with a point for a correct or a wrong answer, and the model trains itself based on the positive reward points gained. Similarly, once trained, it is ready to make predictions on new data presented to it.
Deep learning (DL) is a class of ML algorithms [6] that uses multiple layers to progressively extract higher-level features from the raw input. The main differences between ML and DL are as follows: ML algorithms almost always require structured data, whereas DL networks rely on layers of artificial neural networks (ANNs). Often in ML, human intervention is necessary to produce further outputs with more sets of data, while in DL, this is not necessary. One of the central concepts in DL is the ANN. The ANN is a model that is built on the principle of the organization and functioning of the human brain (i.e. networks of nerve cells in a living organism). In other words, a neural network algorithm tries to create a function that maps an input to the desired output. Neural networks (NNs) are typically organized in layers (fig. 1). Layers consist of a number of interconnected 'nodes' that contain an 'activation function'. Patterns are presented to the network via the 'input layer', which communicates to one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'. The hidden layers then link to an 'output layer' where the answer is output. For example, in image processing, lower layers may identify edges, while higher layers may identify concepts relevant to a human, such as digits, letters or faces. If NNs have more than two hidden layers, they are called deep neural networks (DNNs) [7]. DNNs are used for image recognition, speech recognition and other applications.
Moreover, technologies have been created to generate new photographs that look at least superficially authentic to human observers through many realistic characteristics. For example, there is a known attempt to synthesize photographs of cats that misled an expert into thinking they were real ones [8]. This is an example of the technology called generative adversarial network (GAN), an unsupervised ML algorithm built on a combination of two NNs: one network G (generator) generates new examples and one network D (discriminator) tries to classify examples as either real or fake (generated) [9].
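The layered structure described above can be sketched as a forward pass through a tiny network. This is a minimal illustration, not a trained model: the function names, the sigmoid activation and the hand-picked weights are assumptions chosen so that two hidden nodes and one output node reproduce the XOR function.

```python
import math


def sigmoid(x: float) -> float:
    """Activation function applied inside every node."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One layer: each node applies the activation to a weighted sum of its inputs."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the input pattern through the hidden layer(s) to the output layer."""
    values = inputs
    for weights, biases in layers:
        values = dense(values, weights, biases)
    return values


# Hypothetical hand-picked weights (not learned): 2 inputs -> 2 hidden nodes -> 1 output.
LAYERS = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),  # hidden nodes act like OR and NAND
    ([[20.0, 20.0]], [-30.0]),                        # output node acts like AND
]

for pattern in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pattern, round(forward(pattern, LAYERS)[0]))
```

In a real network the weights are not chosen by hand but adjusted during training, and a DNN would stack more hidden layers between input and output.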

Fig. 2. CRISP-DM process of data mining
One of the processes that is inextricably linked with ML and DL is data mining. Data mining identifies new patterns in large datasets by utilizing methods from statistics and database systems [10]. The Cross-Industry Standard Process for Data Mining (CRISP-DM) describes the cross-industry process for data mining [11]. CRISP-DM breaks the process into six main phases: business understanding, data understanding, data preparation, modelling, evaluation and deployment (fig. 2). The first two phases are connected to each other. Their main aim is to determine the goals of the project, set the task for ML and collect data. These aims can be adjusted based on the data. The next phase refers to the process of working with data: cleaning the data, combining the data, if necessary, and formatting the data. In the modelling phase, various modelling techniques are applied to the data. Models are built, and their parameters are adjusted to optimal values. Because different models have special data requirements, we may need to return to the data preparation phase. In the evaluation phase, the model has already been built, and quantitative assessments of its quality have been obtained. Before implementing this model, we need to make sure that we have achieved all business goals. Depending on the requirements, the deployment phase may be simple (e.g. preparation of the final report) or complex (e.g. automation of the data analysis process to solve business problems).
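The phase ordering described above, including the step back from modelling to data preparation, can be written down as a small transition table. This is an illustrative sketch only: the names `Phase`, `TRANSITIONS` and `is_valid_path` are assumptions, and the table contains only the moves mentioned in this section, not the full CRISP-DM diagram.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = "business understanding"
    DATA_UNDERSTANDING = "data understanding"
    DATA_PREPARATION = "data preparation"
    MODELLING = "modelling"
    EVALUATION = "evaluation"
    DEPLOYMENT = "deployment"


# Allowed moves, restricted to those described in the text above.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELLING},
    Phase.MODELLING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed move."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


# A project that goes back to data preparation once before evaluation.
project = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELLING,
    Phase.DATA_PREPARATION,
    Phase.MODELLING,
    Phase.EVALUATION,
    Phase.DEPLOYMENT,
]
print(is_valid_path(project))
```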

Using ML for protection
The scope of ML usage in cybersecurity is huge, starting with identifying anomalies and suspicious or unusual behaviours and ending with detecting zero-day vulnerabilities and patching known ones. Dilek et al. [12] presented the most comprehensive review of applications of ML techniques. Reathi and Malathi [13] presented a set of ML algorithms trained on the NSL-KDD intrusion detection dataset for misuse detection. Meanwhile, Buczak et al. [14] focused on network intrusion detection using ML. Melicher et al. [15] proposed using NNs to check password guessing resistance. They compressed the model to hundreds of kilobytes and developed a client-side JavaScript tool. A similar experiment was conducted by Ciaramella et al. [16]. To proactively check the strength of passwords, they used NNs, such as Multilayer Perceptrons (MLPs) and Single Layer Perceptrons (SLPs). Notably, MLPs provide better results than SLPs on the test datasets. Moreover, a better result is obtained when the number of layers equals 10.
User and entity behaviour analytics (UEBA) uses ML capabilities to analyze behaviour logs and network traffic in real time and respond appropriately in the event of an attack [17]. The response may consist of getting the user to log in again, blocking the attack, or assessing risk levels and alerting the company's information security officers so that they can take the necessary action. Most ML and DL methods, such as ensemble learning, clustering and decision trees [18], are used to detect misuse, anomaly and hybrid cyber intrusions. As mentioned in the Eugene Kaspersky Official Blog [19], Kaspersky detects 99% of cyber threats using ML technology. The interval between the discovery of suspicious behaviour on a protected device and the release of the corresponding new 'tablet' averages 10 minutes. DARPA collaborated with BAE Systems to develop a system that allows sensors to be configured and protective measures to be applied 'at machine speed'. This initiative, called the CHASE program (Cyber Hunting at Scale), seeks to develop automated tools to detect and characterize novel attack vectors, collect the right contextual data, and disseminate protective measures both within and across enterprises [20]. Cyberattacks performed by hacktivists relate to public opinion about high-profile news. Information gathered from social media can help predict such incidents using NLP and ML techniques [21].
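The idea of flagging unusual behaviour can be illustrated with a very small baseline. The sketch below is a plain statistical stand-in (a z-score threshold), not one of the ML methods cited above; the function name, the threshold of 3 standard deviations and the login counts are all hypothetical.

```python
from statistics import mean, stdev


def zscore_anomalies(history, new_values, threshold=3.0):
    """Flag new observations that deviate strongly from the historical baseline."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Degenerate baseline: anything different from the constant value is unusual.
        return [v for v in new_values if v != mu]
    return [v for v in new_values if abs(v - mu) / sigma > threshold]


# Hypothetical data: logins per hour for one user account.
baseline = [10, 12, 11, 9, 10, 11, 12, 10]
observed = [11, 40, 9]
print(zscore_anomalies(baseline, observed))
```

A UEBA product would learn a much richer behaviour profile, but the response logic is the same: an observation far outside the learned baseline triggers re-authentication, blocking or an alert.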
Moreover, we can use ML to identify the author of a program. Rachel Greenstadt and Aylin Caliskan developed a system that can 'deanonymize' programmers [22] by analyzing source code or compiled binary files [23]. This makes identifying the developer of malware much easier.
Another way to monitor systems and networks for malicious activity or policy violation is through the intrusion detection system (IDS). The intrusion prevention system (IPS) is connected with the IDS; together, these systems perform intrusion detection and stop the detected incidents. Both systems use supervised and unsupervised ML techniques to detect point anomalies, contextual anomalies and collective anomalies [24]. A firewall [25] is a network security system that monitors and controls incoming and outgoing network traffic. Firewalls allow or block traffic by comparing its characteristics with predefined patterns (i.e. firewall rules). In their paper, Ucar and Ozhan [26] presented the result of the automatic detection of anomalies in a firewall rule repository based on ML and high-performance computing methods, such as Naive Bayes, kNN, Decision Table and HyperPipes. All six anomalous rules among the 93 given firewall rules were detected by the system and verified as anomalies by the experts.
Firewalls filter the content between servers, and there is also a solution specifically meant for the content of web applications. A web application firewall (WAF) is deployed in front of web applications; it analyzes bi-directional web-based (HTTP) traffic and detects and blocks anything malicious [27]. A WAF prevents vulnerabilities in web applications from being exploited by outside threats. To implement such functionality in a WAF, developers use regular expressions, tokens, behavioural analysis, reputation analysis and ML technologies [27].
Among ML methods, special predictive ones can also be used for data loss/leak prevention (DLP) to reduce the risk of breaches or leaks [28]. DLP software solutions allow us to set business rules that classify confidential and sensitive information so that it cannot be disclosed maliciously or accidentally by unauthorized end users. This process can be done by using supervised learning algorithms and two types of examples: positive examples (i.e. content that needs to be protected) and counterexamples (i.e. documents that are similar to the positive set but should not be protected).
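The use of positive examples and counterexamples can be sketched with a small supervised text classifier. The example below is an assumption-laden illustration: it uses a multinomial naive Bayes model with Laplace smoothing (one possible supervised algorithm, not necessarily the one used by DLP products), and the class name, labels and training documents are invented.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training set: positive examples ("protect") and counterexamples ("public").
DOCS = [
    "quarterly revenue forecast confidential draft",
    "confidential merger plan internal only",
    "customer account numbers confidential export",
    "quarterly revenue press release published",
    "public merger announcement press release",
    "customer newsletter public update",
]
LABELS = ["protect", "protect", "protect", "public", "public", "public"]

model = NaiveBayes().fit(DOCS, LABELS)
print(model.predict("confidential revenue forecast"))
print(model.predict("public press release"))
```

The counterexamples matter because they share vocabulary with the protected documents (revenue, merger, customer); without them the classifier would tend to over-block.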

Using ML in cyberattacks
This section describes how a cyberattack can succeed using ML. Automated vulnerability scanning is one of the most obvious and common tasks in a cyberattack. For example, cross-site request forgery (CSRF) is found in only 5% of applications, as reported in the 2017 OWASP Top 10, because most frameworks include CSRF defences [29]. Accordingly, Calzavara et al. presented Mitch [30], the first ML-based tool for the black-box detection of CSRF, which allowed the identification of 35 new CSRF vulnerabilities on 20 websites from the Alexa Top 10,000 websites and three previously undetected CSRF vulnerabilities on production software already analyzed with the state-of-the-art tool Deemon [31]. Mitch is a binary classifier that labels requests as sensitive or insensitive using a random forest algorithm on a 49-dimensional feature space. Compared to the heuristic classifiers BEAP [32] and CsFire [33], Mitch shows the best F1-score and precision (Table 1).
Marketers use ML methods for profiling. Trustwave released an open source intelligence tool that uses face recognition to automatically track subjects across social media networks [34]. Facial recognition aids this process by removing false positives in the search results, making data review faster for a human operator. Using collected data about the target, an attacker can hook a victim with specially created fake news. ML tools can help identify fake news, but researchers confirm that the best way to do so is for the ML model to learn to create fake news itself [35]. As such, they created a model for controllable text generation called Grover. In the research process, four classes of articles were used: human news, machine news, human propaganda and machine propaganda. Workers on Amazon Mechanical Turk rated each article, including its overall trustworthiness. In the case of propaganda, the score increased from 2.19 (out of 3) on articles created manually to 2.42 on articles created by a machine.
SNAP_R, introduced at DEFCON 24, is the world's first automated end-to-end spear-phishing campaign generator for Twitter [36]. While previous tools were based on models with Markov chains, SNAP_R is based on a recurrent NN with LSTM architecture. Using Twitter as an environment offers some advantages for automatically generating text. For example, limiting the length of a post decreases the probability of grammatical errors. Moreover, Twitter links are often shortened, which allows masking of malicious domains. This, in turn, significantly increased the success rate from 5-14% on Markov chain-based tools [37,38] to 30-66%, which is comparable to the 45% rate for manual spear-phishing [39].
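Precision and F1-score, the metrics used above to compare Mitch with heuristic classifiers, are computed from the counts of true positives, false positives and false negatives of a binary classifier. The sketch below uses hypothetical counts, not figures from the cited evaluation.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 sensitive requests correctly flagged,
# 2 insensitive requests wrongly flagged, 8 sensitive requests missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision penalizes false alarms, recall penalizes missed sensitive requests, and F1 is their harmonic mean, so a classifier cannot score well on F1 by optimizing only one of the two.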
Another example use case for GAN in cybersecurity is the password guessing attack. There is a new way of generating password guesses based on DL and generative adversarial networks known as PassGAN [42]. The key difference in this approach is that NNs do not need a priori knowledge of the structure of passwords, in contrast to approaches based on rules, Markov models [43] and FLA [15]. PassGAN uses the improved training of Wasserstein GANs (IWGAN) of Gulrajani et al. [44] with the ADAM optimizer [45]. The generator and the discriminator in PassGAN are built from ResNets [46]. The architectures of the generator and the discriminator are shown in fig. 4 and fig. 5 [42], while residual block representation is shown in fig.
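For orientation, the discriminator objective used in the improved training of Wasserstein GANs can be written as follows. The notation is not taken from this paper and is stated here as an assumption: D is the discriminator, P_r is the distribution of real samples, P_g is the distribution produced by the generator, x-hat is sampled along straight lines between real and generated samples, and lambda is the gradient-penalty coefficient.

```latex
L = \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right]
  - \mathbb{E}_{x \sim P_r}\left[ D(x) \right]
  + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
    \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right]
```

The first two terms estimate the Wasserstein distance between generated and real samples, while the last term penalizes discriminator gradients whose norm deviates from 1.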
Traditional botnets wait for commands from the command-and-control (C&C) server, but attackers now use automation so that decisions are made independently. Fortinet researchers predicted that cybercriminals will replace botnets with intelligent clusters of compromised devices called hivenets, a type of attack that is able to leverage peer-based self-learning to target vulnerable systems with minimal supervision [52].
In the initial stages of an attack, attackers often face the challenge of bypassing captcha. Suphannee et al. [53] designed a low-cost attack that uses DL technologies for the semantic annotation of images. The system requires about 19 seconds to solve a challenge, with an accuracy of 70.78% for reCaptcha [54] and 83.5% for the Facebook image captcha. The system has to automatically identify which of the given images are semantically similar to the sample image. First, the system collects information for all the images through Google Reverse Image Search (GRIS) [55]; Clarifai [56], which is built on deconvolutional networks [57]; TDL [58], which is based on deep Boltzmann machines [59]; NeuralTalk [60] and Caffe [61]. Next, if a hint is not provided, the system searches for the sample image in the labelled dataset to obtain one, if possible.

Fully ML-powered cyberattack
As mentioned in the previous section, ML-powered cyberattacks are not a hypothetical future concept. This section describes how an automated cyberattack can be carried out using ML. We considered two scenarios for the weaponization and delivery stages. First, in the case of humanless intrusion, attackers can use a similar tool but utilize information provided by Shodan [62] or Mitch [30] instead of features obtained using computer vision. Second, attackers can use social engineering, using the tools for profiling and spear-phishing described in the previous section [34,35] and creating clickbait links to infect the victim [35,36]. For automated exploit generation, adversaries can use the open-source angr framework [63] developed by Shellphish and combine it with MalGAN to bypass defensive systems.
In the post-exploitation stage, attackers can guess stolen passwords using PassGAN [42]. The newest method is using intelligent evasion techniques proposed by Darktrace researchers [64] and further self-propagating with a series of autonomous decisions. It is also possible to turn infected systems into a hivenet [52].
As these examples demonstrate, ML can help hackers in every stage of the attack. Given the advanced level of development of the cybercriminal infrastructure, an advanced attack no longer needs to be hands-on-keyboard, as is the case at present.

Conclusion
When introducing an ML-based system, we should remember that ML is not a panacea. No system is safe. Under certain conditions, ML both protects against vulnerabilities and creates new gaps. ML can be compared to a dog: 'Machine learning can do anything you could train a dog to do, but you're never totally sure what you trained the dog to do'. We should also note the consequences that more active implementation of ML can bring: first, automation and the resulting loss of human jobs and, second, inevitable conflict with the existing legal framework, for example, when using technologies to prevent cybercrime or cyberterrorism.
In such a situation, the accused is implicated in crimes that have not yet been committed, which is not regulated by any legal norm. Moreover, some of the information learned by ML may be private or confidential, which violates laws in some countries. Similarly, poor quality or inadequate quantity of the data on which ML predictions in cybersecurity are based can lead to wrong decisions and irreparable mistakes.

Fig. 1. Neural networks
In most cases, attackers do not know the malware detection algorithm but can figure out the features it uses through carefully designed test cases against the black-box algorithm. MalGAN is a generative adversarial network-based algorithm that generates adversarial malware examples that are able to bypass black-box ML-based detection models. It can decrease the detection rate to nearly zero and make it hard for the retraining-based defensive method against adversarial examples to work [40]. The architecture of MalGAN is shown in fig. 3 [40].

Figure 3. Architecture of MalGAN
The generator takes the malware feature vector and the noise vector to transform the former into its adversarial version. The substitute detector is used to fit the black-box detector and provide gradient information to train the generator. Both nets are represented as multi-layer feed-forward ANNs. Adversarial examples were tested against black-box detectors based on different ML methods trained on 160-dimensional binary feature vectors representing system API calls: random forest, logistic regression, decision trees, support vector machines and multi-layer perceptron, as well as a voting-based ensemble of these algorithms. All these classifiers detect over 90% of the original samples, but random forest and decision trees show the best result, with a detection rate of less than 0.20% on adversarial examples. Anti-malware vendors retrain detectors after exploring such undetected examples, but MalGAN needs only one epoch of retraining to obtain a 0% true positive rate. Kawai et al. later proposed some performance improvements [41].