In recent years, ISP RAS has been developing a system for deductive verification of machine (binary) code. The motivation is clear: modern compilers such as GCC and Clang/LLVM are not free of bugs; therefore, it is worthwhile (at least for safety- and security-critical components) to check the correctness of the generated code. The key feature of the suggested approach is the ability to reuse source-code-level formal specifications (pre- and postconditions, loop invariants, lemma functions, etc.) at the machine code level. The tool is highly automated: provided that the target instruction set is formalized, it disassembles the machine code, extracts its semantics, adapts the high-level specifications, and generates the verification conditions. The system utilizes a number of third-party components, including a source code analyzer (Frama-C), a machine code analyzer (MicroTESK), and an SMT solver (CVC4). The modular design enables replacing one component with another when switching the input format and/or the verification engine. In this paper, we discuss the tool architecture, describe our implementation, and present a case study on verifying the memset C library function.
SharpChecker is an industrial-level static analyzer aimed at detecting various bugs in C# source code. Because the tool is actively developed, it requires more and more precise information about the program environment, especially about the results and side effects of library functions. The paper is devoted to the evolution of the standard library models historically used by SharpChecker and to their advantages and drawbacks. We started with an SQLite database holding the properties of the most important functions, then introduced manually written C# model implementations of frequently used methods to support the states of data containers, and have recently developed a model, built by a preliminary analysis of the library source code, that gathers all significant side effects together with their conditions for almost the whole C# library.
Writing static analyzers is hard due to the many equivalent transformations between program source, intermediate representation, and large formulas in Satisfiability Modulo Theories (SMT) format. Traditional methods such as debugger usage, instrumentation, and logging make developers concentrate on specific minor issues. At the same time, each analyzer architecture imposes a unique view of how to represent the intermediate results required for debugging. Thus, error debugging remains a concern for each static analysis researcher. In this paper, our experience in debugging a work-in-progress industrial static analyzer is presented. We describe several of the most effective techniques from the constructive (code generation), testing (random test case generation), and logging (log fusion and visual representation) groups. Code generation helps avoid issues with copied code; we enhance it with verification of code usage. Goal-driven random test case generation reduces the risk of developing a tool highly biased towards specific syntax constructions by producing verifiable test programs with assertions. Log fusion merges module logs and sets up cross-references between them. The visual representation module shows the combined log, presents the major data structures, and provides health and performance reports in the form of log fingerprints. These methods are implemented on the basis of Equid, the static analysis framework for industrial applications, and are used internally for development purposes; in the paper they are studied and evaluated. The main contributions include a study of failure reasons in the author's project, a set of methods, their implementations, testing results, and two case studies demonstrating the usefulness of the methods.
This article concerns code generation for floating-point arithmetic in the MIPS architecture. The work is part of the «RuC» project and focuses only on code generation for operations on floating-point numbers. The paper does not consider lexical, syntactic, or other kinds of analysis.
In the past ten years, rapid progress has been observed in science and technology through the development of smart mobile devices, workstations, supercomputers, smart gadgets, and network servers. The growth in the number of Internet users and the manifold increase in Internet speed have led to the generation of a huge amount of data, now commonly called «big data». In this scenario, storing and processing data on local servers or personal computers can cause a number of problems that can be solved using distributed computing, distributed data storage, and distributed data transfer. Several cloud service providers, such as Amazon Web Services, Microsoft Azure, and Cloudera, currently address these problems. Distributed computing is supported by powerful data processing centers (DPCs). However, traditional DPCs require expensive equipment, a large amount of energy to run and cool the system, and a large physical area. In addition, such a system must be used constantly to be worthwhile, because keeping it on standby is economically disadvantageous. This article examines the possibility of using a Raspberry Pi and Hadoop cluster for distributed storage and processing of «big data». Such an approach provides low power consumption, a small physical footprint, and fast data processing. Hadoop provides the necessary modules for distributed processing of big data using the MapReduce programming model. Data is stored in the Hadoop Distributed File System (HDFS), which provides more flexibility and greater scalability than a single computer. The proposed hardware and software system based on the Raspberry Pi 3 microcomputer can be used for research and scientific purposes at universities and scientific centers. The considered distributed system is economically efficient in comparison with traditional DPCs. The results of a pilot project using a Raspberry Pi cluster are presented. A distinctive feature of this work is the use of distributed computing systems on single-board microcomputers for academic purposes, for the research and educational tasks of students, with minimal cost and ease of creating and using the system.
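As a concrete illustration of the MapReduce model mentioned above, below is a minimal Hadoop Streaming word-count sketch in Python. It is not the pilot application from the paper; the script names and HDFS paths are placeholders.

```python
#!/usr/bin/env python3
# mapper.py -- emits (word, 1) pairs; Hadoop Streaming feeds each
# input split to this script via standard input.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sums counts per word; Hadoop Streaming delivers the
# mapper output sorted by key, so equal words arrive consecutively.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

The job would be submitted with the streaming jar shipped with Hadoop (its exact path differs between versions), e.g. `hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out`.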
During the development and maintenance of a complex network infrastructure for a big project, developers face a lot of problems. Although there exist plenty of tools that help troubleshoot such problems, their functionality is limited by the API that the Linux kernel provides. Usually, they are narrowly targeted at solving one problem and cannot show a system-wide view of the network stack, which could be helpful in finding the source of a malfunction. This situation could be changed by a new type of tools powered by the Linux kernel's eBPF technology, which provides a flexible and powerful way to run user-defined code inside the kernel. In this paper, an approach to tracing the path of network packets in the Linux kernel using eBPF is described.
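The abstract does not show the tracer itself; as a minimal sketch of the underlying technique, the following BCC (Python) program attaches kprobes to two functions on the IPv4 receive path and prints a line for every packet that passes through them. The choice of probe points is an illustrative assumption, not the set used in the paper.

```python
#!/usr/bin/env python3
# Minimal eBPF packet-path tracing sketch using BCC.
# Requires the BCC toolkit and root privileges.
from bcc import BPF

prog = r"""
int trace_ip_rcv(struct pt_regs *ctx)  { bpf_trace_printk("ip_rcv\n");     return 0; }
int trace_tcp_rcv(struct pt_regs *ctx) { bpf_trace_printk("tcp_v4_rcv\n"); return 0; }
"""

b = BPF(text=prog)
b.attach_kprobe(event="ip_rcv", fn_name="trace_ip_rcv")
b.attach_kprobe(event="tcp_v4_rcv", fn_name="trace_tcp_rcv")

print("Tracing the IPv4 receive path... Ctrl-C to stop")
b.trace_print()  # streams lines from the kernel trace pipe
```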
This paper considers an OpenFlow 1.3 switch based on a programmable network processing unit (NPU). An OpenFlow switch performs flow-entry lookup in a flow table by the values of packet header fields to determine the actions to apply to an incoming packet (classification). In the considered NPU assembly language, the lookup operation may be implemented on the basis of search trees. However, these trees cannot be used directly for OpenFlow classification because of the limitation on the width of compared operands. In this paper, we propose a flow table representation designed for easy translation into NPU search trees. Another goal was to create a compact program that fits in NPU memory. A further NPU limitation requires updating the program after each flow table modification; consequently, the switch must maintain the current flow table state to provide a fast NPU program update. We developed algorithms for incremental update of the flow table representation (flow addition and removal). To evaluate the proposed flow table translation approach, a set of flow tables was translated into NPU assembly language using a simple algorithm (based on related work) and an improved algorithm (our proposal). Evaluation was performed on an NPU simulation model and showed that our approach effectively reduces program size.
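The paper's flow table representation is not reproduced in the abstract; the sketch below only illustrates the general idea of working around the operand-width limitation by decomposing a wide exact-match value into narrow comparisons (assumed 8-bit here) arranged as a tree. The dictionary-based structure and all names are illustrative assumptions, not the representation from the paper.

```python
# Hypothetical sketch: split a wide exact-match value into byte-wide
# chunks so that a width-limited search tree can compare them level by level.

def split_value(value: int, width_bits: int, chunk_bits: int = 8):
    """Split `value` of `width_bits` into big-endian chunks of `chunk_bits`."""
    return [(value >> shift) & ((1 << chunk_bits) - 1)
            for shift in range(width_bits - chunk_bits, -1, -chunk_bits)]

def insert_flow(tree: dict, match_value: int, width_bits: int, actions):
    """Add one exact-match flow entry; each tree level compares one narrow chunk."""
    node = tree
    for chunk in split_value(match_value, width_bits):
        node = node.setdefault(chunk, {})
    node["actions"] = actions

def lookup(tree: dict, packet_value: int, width_bits: int):
    """Walk the tree chunk by chunk; return the actions or None on a miss."""
    node = tree
    for chunk in split_value(packet_value, width_bits):
        if chunk not in node:
            return None
        node = node[chunk]
    return node.get("actions")

table = {}
insert_flow(table, 0x0A000001, 32, ["output:1"])  # match IPv4 dst 10.0.0.1
print(lookup(table, 0x0A000001, 32))              # ['output:1']
print(lookup(table, 0x0A000002, 32))              # None
```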
A lot of people nowadays use online education platforms, most of which run on the free, open-source «Open edX» software platform. Using the logs that the platform provides, we can obtain psychometric characteristics of students, which can be used to improve the presentation of material and thereby increase the quality of online courses. We provide a ready-to-use tool that helps figure out how and for what purposes the log files of «Open edX»-based platforms can be analyzed.
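As a minimal sketch of the kind of log processing involved (not the tool itself), the following Python snippet reads an Open edX tracking log in JSON Lines format and counts event types per user; the file name is a placeholder.

```python
# Count Open edX tracking-log event types per user.
# "username" and "event_type" are standard tracking-log fields.
import json
from collections import Counter, defaultdict

events_per_user = defaultdict(Counter)

with open("tracking.log", encoding="utf-8") as log:
    for line in log:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        user = event.get("username") or "<anonymous>"
        events_per_user[user][event.get("event_type", "unknown")] += 1

for user, counts in events_per_user.items():
    print(user, counts.most_common(5))
```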
Currently, a large number of people use photo hosting services, social networks, online services, and so on. In doing so, users leave a lot of information about themselves on the Internet: photos, comments, geotags, and more. This information can be used to build a system that identifies different target groups of users. These target groups can then be used to run advertising campaigns, build recommendations, and so on. This article discusses a system that identifies users' interests based on their actions in a social network. The following features were selected for analysis: published photos and text, comments on posts, information about favorite publications, and geotags. Identifying target groups required analyzing the images in photos and analyzing the text. Image analysis involves object recognition, while text analysis involves extracting the main topic of the text and determining its sentiment. The analysis results are joined with the rest of the information by a unique identifier, which makes it possible to build a data mart from which target groups can be selected with a simple SQL query.
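The final step can be illustrated with a small, hypothetical sketch: analysis results (detected objects, text topic, sentiment) are joined to profile data by a user identifier, and a target group is then selected with a plain SQL query. The table and column names are assumptions, not the system's actual schema.

```python
# Toy data mart: join analysis results with profiles by user_id,
# then select a target group with a simple SQL query.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE analysis (user_id INTEGER, detected_object TEXT,
                           text_topic TEXT, sentiment REAL);
""")
con.executemany("INSERT INTO profiles VALUES (?, ?)",
                [(1, "Moscow"), (2, "Kazan")])
con.executemany("INSERT INTO analysis VALUES (?, ?, ?, ?)",
                [(1, "bicycle", "sport", 0.8), (2, "cat", "pets", 0.6)])

# Target group: users interested in sport with positive sentiment.
rows = con.execute("""
    SELECT p.user_id, p.city
    FROM profiles p JOIN analysis a ON a.user_id = p.user_id
    WHERE a.text_topic = 'sport' AND a.sentiment > 0.5
""").fetchall()
print(rows)  # [(1, 'Moscow')]
```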
This paper presents a machine learning-based approach for detecting malicious users in the largest Russian online social network, VKontakte. An exploratory data analysis was conducted to determine the insights and anomalies in a dataset consisting of 42394 malicious and 241035 genuine accounts. Furthermore, a tool for automated collection of information about malicious accounts in the VKontakte online social network was developed and used to collect the dataset described in this research. Baseline feature engineering was conducted, and the CatBoost classifier was used to build a classification model. The results show that this model can identify malicious users with an overall AUC score of 0.91, validated with 4-fold cross-validation.
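A minimal sketch of the modeling step, with synthetic placeholder features standing in for the real VKontakte dataset: a CatBoost classifier scored with ROC AUC under 4-fold cross-validation.

```python
# CatBoost classification with 4-fold cross-validated ROC AUC.
import numpy as np
from catboost import CatBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))       # placeholder account features
y = rng.integers(0, 2, size=1000)     # 1 = malicious, 0 = genuine

model = CatBoostClassifier(iterations=200, depth=6, verbose=False)
scores = cross_val_score(model, X, y, cv=4, scoring="roc_auc")
print("AUC per fold:", scores, "mean:", scores.mean())
```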
The current work focuses on the processing of medical images obtained during pathomorphological analysis of a specimen. In particular, algorithms for processing images of cell nuclei obtained by light and confocal microscopy, as well as tissue images obtained by light microscopy, are considered. The applicability of the proposed algorithms and software for detecting pathologies is substantiated.
Smart cities are an umbrella of different technologies for responding to the problem of the growing urban population. Their priority is a strategy of collecting information about the city and using it intelligently to improve the services provided to citizens or to create new ones. Smart cities host various applications, such as weather forecasting, urban monitoring, and pollution monitoring. Traffic is a major challenge for smart cities, and coping with it requires analyzing traffic congestion in the city road network. Data transmission over wireless signals in smart cities is another challenge, because tall buildings and barriers reduce signal power and quality, and the widespread use of wireless signals and equipment may lead to interference and reduced service quality. Therefore, in order to solve the traffic problem, it is necessary to estimate traffic congestion levels by collecting information, especially over wireless channels, so that traffic can be controlled and managed programmatically. In this paper, a vehicle speed performance index was estimated to evaluate the condition of the road network. The study analyzes traffic density for the main network of Hamedan communication routes based on speed data collected by the Hamedan traffic control system. From this analysis, the congestion index and traffic peak hours were determined. The relationship between vehicle speed and traffic congestion was also predicted with a neural network and a genetic algorithm. Congested areas were identified according to vehicle speeds using data from the Hamedan Traffic Control Center.
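A small, hypothetical sketch of the prediction step: a neural network (here scikit-learn's MLPRegressor) fitted on synthetic speed and hour-of-day data to predict a congestion index. The data, column meanings, and model configuration are illustrative assumptions, and the genetic-algorithm tuning mentioned in the abstract is omitted.

```python
# Predict a congestion index from vehicle speed and hour of day.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
hour = rng.integers(0, 24, size=2000)
# Toy speeds: slower during assumed rush hours, plus noise.
speed = 60 - 20 * np.isin(hour, [7, 8, 9, 17, 18, 19]) + rng.normal(0, 3, size=2000)
congestion = np.clip(1.0 - speed / 60.0, 0.0, 1.0)   # toy congestion index

X = np.column_stack([speed, hour])
X_tr, X_te, y_tr, y_te = train_test_split(X, congestion, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(X_tr, y_tr)
print("R^2 on held-out data:", round(model.score(X_te, y_te), 3))
```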
The integration of computing systems and methods into legal activity brings benefits such as resource savings and increased objectivity, completeness, and accuracy of intellectual results. Understanding the main scientific achievements and trends at the intersection of computer and legal sciences brings into focus promising scientific and technological areas of the informatization of law. This review systematizes the most important achievements at the intersection of legal and computer sciences, covering foreign and domestic scientific publications from 1949 to 2020. Research on computer methods and systems, initiated by American jurimetrics, was aimed at storing, indexing, abstracting, and searching legal texts and led to the creation of legal information retrieval systems. Domestic legal cybernetics became a pioneer in the automation of forensic expert examination. Computer modeling of legal reasoning initially built on artificial intelligence techniques, later shifted to legal dialogue and the conflict of legal arguments, and in recent years has been transforming the accumulated experience on the basis of modern computational models, patterns, and architectures. The popularization of decision support systems has produced multi-task systems combining information retrieval, legal reasoning, analytics, prediction, and control. The latest research segments focus on the application of machine learning and big data processing. Almost all successful methodological solutions retain their significance, continuing to be applied directly or as the basis for the further development of computational methods and information systems in legal activity.