A Multilayer Approach to Subgraph Matching in HP-graphs

Visual modeling is widely used nowadays, but the existing modeling platforms cannot meet all user requirements. Visual languages are usually based on graph models, but the graph types used have significant restrictions. A new graph model, called the HP-graph, whose main element is a set of poles, subsets of which are combined into vertices and edges, has been previously presented to solve the problem of insufficient expressiveness of the existing graph models. Transformations and many other operations on visual models face the problem of subgraph matching, which slows down their execution. A multilayer approach to subgraph matching can solve this problem if a modeling system is based on the HP-graph. In this case, the search starts on the higher level of the graph model, where vertices and hyperedges are compared without revealing their structures, and only when a candidate is found does it move to the level of poles, where the decomposed structures are compared. The idea of the multilayer approach is described. A backtracking algorithm based on this approach is presented. The Ullmann algorithm and VF2 are adapted to this approach and analyzed for complexity. The proposed approach incrementally decreases the search field of the backtracking algorithm and helps to decrease its overall complexity. The paper proves that the existing subgraph matching algorithms, except those that modify a graph pattern, can be successfully adapted to the proposed approach.


Introduction
The study of any objects and processes, as well as their design, can hardly be done without modeling; that is why software tools that allow specialists to build various models and formalize descriptions of objects and processes, or use modeling as a method of analysis, are becoming more popular. Models are described and built with the help of a visual modeling language, which is a fixed set of graphical symbols and rules for constructing visual models from these symbols [1]. Visual languages can be represented as various types of graphs, including directed graphs [2], hypergraphs [3], hi-graphs [4], meta-graphs [5] and P-graphs [6]. Previously, a new graph model, called the HP-graph, was proposed as a formalism for representing visual languages [7]. This model unites the expressive possibilities of all the mentioned graph types and, thus, can be used for building more complicated models than those that can be built with the other graph models. The paper [7] proved that this graph model allows the creation of a flexible visual model editor based on it. This model is proposed as a basis for domain-specific modeling, one of the key aspects of which is model transformations. Such transformations allow users to move from one level of abstraction to another (a vertical transformation) or from one modeling language to another (a horizontal transformation) [5]. Different approaches can be used to transform visual models, but the current standard is the algebraic approach, which is based on graph grammars [9]. In this approach, a transformation r = (L, R) includes a left and a right part, where L is a subgraph to be found in a source graph, and R is a subgraph replacing L in the source graph. As for the HP-graph, only the main operations, including adding and removing graph elements and decomposition, were described for this model, and no algorithm was proposed to perform an isomorphic subgraph search.
The structural complexity of the model requires modifying the existing algorithms to adapt them to this model. The HP-graph has a multilayer structure, which consists of the layer of vertices and hyperedges and the layer of poles and links, sets of which are combined into the elements of the former layer. The multilayer structure of the graph model makes it possible to reduce the time complexity of search algorithms. The number of operations can be decreased because the first search and matching is performed on the layer of vertices and hyperedges, and only after a subgraph with the desired characteristics is found does the algorithm move to a more detailed level, where the already selected sets of corresponding poles and ordinary edges are compared. In practice, the task of finding an isomorphic subgraph has a wide range of applications, including chemical compound search [10], social network analysis [11], pattern recognition [12], and protein interaction analysis [13]. However, subgraph matching is a bottleneck in the overall performance of most of these applications because this task is NP-hard [14]. For instance, the node count in protein structure analysis can reach tens of thousands [15]; that is why active efforts are currently being made to find an optimal algorithm for subgraph matching. In visual modeling the problem is the same. The thesis [5] proposes to represent all the models in the form of a single graph, which allows users to maintain links between the models and automatically propagate changes from the source model to the target ones associated with it. For instance, a change in the metamodel of the subject area should be propagated to all the models built on this metamodel. However, storing all the models as a single graph increases the computational complexity of the algorithms on this graph, which requires developing an efficient subgraph search algorithm for the graph model used.
The contributions of this paper are: 1) a new multilayer approach to decrease the complexity of subgraph matching algorithms, 2) a backtracking algorithm based on this approach, 3) applications of this approach in several existing subgraph matching algorithms. The paper is organized as follows. Section 2 discusses related work and the main algorithms for finding subgraph isomorphism. Section 3 presents the proposed graph model, definitions of the HP-subgraph and isomorphism of HP-graphs, and the multilayer approach to subgraph matching. Section 4 introduces a backtracking algorithm based on this approach. Section 5 presents several applications of the approach in the existing subgraph matching algorithms. Section 6 describes the obtained results. Section 7 concludes the paper.

Related work
The problem of subgraph matching has been investigated for many years. Many works, such as [16]-[18], are dedicated to exploring the applicability, time complexity and limitations of the existing subgraph matching algorithms. These algorithms are generally divided into two classes: 1) algorithms that observe many graphs {G1, ..., Gn} and retrieve those which contain a query graph Q, 2) algorithms that observe a single graph G and retrieve all its subgraphs which are isomorphic to a query graph Q. In both classes, algorithms can either return a correct and complete answer (having an exponential time complexity) or return an approximate answer (having a polynomial time complexity). While the complete answers describe all subgraphs exactly isomorphic to a pattern, the approximate answers are generally obtained using specific similarity measures and, thus, may also contain false positive subgraphs. This work belongs to the second class of algorithms. Most of these algorithms use backtracking to move through the built search tree and find an appropriate combination of corresponding vertices of the source graph and the graph-pattern. Algorithms in this class include the Ullmann algorithm [19], VF2 [20] (and also VF2 Plus [21] and VF3 [22]), TurboISO [23], CFL-Match [24], QuickSI [25], SPath [26] and others. These algorithms implement various techniques to decrease the time needed for the matching process.
Exploiting Pruning Rules. The Ullmann algorithm uses a refining procedure on each step of the algorithm by comparing the degrees of corresponding neighbors of the added pair of vertices. VF2 [20] provides feasibility rules that are checked before a vertex is added to a graph-candidate. These rules check the consistency of graph-candidates with this vertex and check for a sufficient number of neighbor vertices of these graph-candidates. SPath [26] uses a neighborhood signature for each vertex to store information about the surrounding vertices.
These signatures are compared with the corresponding signatures of the query graph and are used for search space pruning before subgraph matching. TurboISO [23] compares the quantity of neighborhood labels of corresponding vertices and prunes out unpromising ones. CFL-Match [24] proposes a compact-path-index (CPI) structure presented as a tree, which is built from the source graph vertices with the same labels as the query graph vertices and then refined by exploiting matching operations.
Graph Pattern Modification. The Ullmann algorithm and VF2 [20] do not modify the graph pattern and search for its embeddings in the source graph. SPath [26] changes the way of graph query processing from vertex-at-a-time to path-at-a-time, which tends to be more cost-effective than traditional graph matching methods. TurboISO [23] presents a NEC-tree structure, which merges similar vertices together and presents a query graph as a tree. CFL-Match [24] transforms a query into a set of dense subgraphs, forests, and leaves. The source graph in this algorithm is only probed for non-tree edge validation, whereas the other query parts are checked in the CPI structure.
Optimizing Matching Order. The Ullmann algorithm [19] does not specify the matching order of the vertices, whereas VF2 [20] starts from a random query vertex and then recursively adds those vertices that are connected with the already matched ones. QuickSI [25] exploits an order based on the vertex label frequency, and the algorithm starts the matching process from the least frequent labels. TurboISO [23] implements a concept of candidate region exploration and produces a matching order for every region where a NEC-tree was found. CFL-Match [24] presents all candidates as a CPI structure, where all the pattern embeddings are filtered and validated by traversing this tree structure.
Most theoretical research on this problem was conducted specifically for ordinary graphs [18]; that is why the approaches of these algorithms have to be adapted to the HP-graph model. In particular, this paper presents adaptations of a standard backtracking algorithm for subgraph matching, the Ullmann algorithm [19] and the VF2 algorithm [20], which are optimized for the multilayer structure of this graph model.

Graph-Matching Approach for HP-graphs
Let Pol be the set of all poles of the graph, including external poles and internal poles of vertices and hyperedges. Then, an HP-graph is an ordered triple $G = (P, V, W)$, where $P$ is the set of external poles, $V = \{v_1, \dots, v_m\}$ is a non-empty set of vertices, and $W = \{w_1, \dots, w_l\}$ is a set of hyperedges [7]. An example of the graph model is shown in Fig. 1. Every hyperedge $w$ of the HP-graph $G$ can be represented by ordinary links, which are defined as a set $E_w = \{e_1, \dots, e_n\}$, where every link $e \in E_w$ is a pair of connected poles $(p, r)$, where $p$ is the source pole and $r$ is the target pole of the link. An example of this decomposition is presented in Fig. 2. The hyperedge $w_2$ defines the set $E_{w_2} = \{(p_4, p_8), (p_4, p_6), (p_6, p_8)\}$. Every vertex and hyperedge can also be decomposed into a new HP-graph, which is described in detail in [7].
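The definitions above can be illustrated with a small data structure. The following Python sketch is only one possible representation and is not taken from [7]: the class names `Hyperedge` and `HPGraph`, the method names, and the vertex names `v_a`, `v_b`, `v_c` with their pole ownership are hypothetical, since Fig. 1 is not reproduced here. Only the link set of the hyperedge w2 comes from the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hyperedge:
    name: str
    poles: frozenset   # poles owned by the hyperedge
    links: frozenset   # ordinary links as (source_pole, target_pole) pairs


@dataclass
class HPGraph:
    external_poles: set = field(default_factory=set)
    vertices: dict = field(default_factory=dict)     # vertex name -> set of its poles
    hyperedges: dict = field(default_factory=dict)   # hyperedge name -> Hyperedge

    def add_hyperedge(self, name, links):
        """Register a hyperedge from its decomposition into ordinary links."""
        links = frozenset(links)
        poles = frozenset(p for link in links for p in link)
        self.hyperedges[name] = Hyperedge(name, poles, links)

    def incident_vertices(self, edge_name):
        """Vertices that own at least one pole of the given hyperedge."""
        poles = self.hyperedges[edge_name].poles
        return {v for v, owned in self.vertices.items() if owned & poles}


g = HPGraph()
# Hypothetical ownership of poles by vertices (an assumption for illustration).
g.vertices = {"v_a": {"p4"}, "v_b": {"p6"}, "v_c": {"p8"}}
# Link set of the hyperedge w2 as given in the text.
g.add_hyperedge("w2", [("p4", "p8"), ("p4", "p6"), ("p6", "p8")])
```

Storing the poles of a hyperedge separately from its links keeps the upper layer (which vertices a hyperedge touches) cheap to query without inspecting the link structure.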

Definitions of a Subgraph and Isomorphism
To define subgraph matching operations, a subgraph of the HP-graph must first be defined. An HP-graph $G' = (P', V', W')$ is a subgraph of an HP-graph $G = (P, V, W)$ if [...] and it meets condition (1), which makes transformation operations possible [7]. A subgraph can contain so-called incomplete vertices, whose sets of poles are only part of the sets of poles of the vertices of the original graph: (1) where $V'_{partial} \subseteq V'$ is the set of the incomplete vertices in the graph. To define the isomorphism mapping, it is necessary to establish one-to-one correspondences between elements of the same type in the two graphs that preserve the incidence relations. Thus, two HP-graphs $G = (P, V, W)$ and $G' = (P', V', W')$ are isomorphic iff there exists a bijection f:

A Multilayer Approach to Graph Matching
As the graph model is proposed to store all the models together, search algorithms for this formalism have to be optimized for this task. A possible solution to this problem is to divide the HP-graph into two main levels: the level of vertices and hyperedges, and the level of poles and ordinary links between them. In this case, the search starts on the higher level, and when a candidate is found, it moves to the lower level, where a more detailed comparison of graph elements is performed. Fig. 3(a) illustrates an example of a query graph Q, which is a pattern for subgraph matching in the data graph G from Fig. 1. As can be seen, it contains 4 vertices, 2 hyperedges and 4 poles. Its higher (or first) level is presented in Fig. 3(b). It contains only the 4 vertices and 2 hyperedges, whereas all the poles are eliminated. This layer is compared with the first layer of the graph G (Fig. 4), and when a potential subgraph is found, the matrix of vertex correspondence is built.
Fig. 3. Query graph Q and its first level
The found correspondences between vertices of Q and G can be presented as a set {(v1', v2), (v3', v3), (v2', v4), (v4', v5)}. If a subgraph is found, the algorithm moves to the next level, where the corresponding hyperedges and their poles are compared. All the candidate hyperedges are grouped by their incidence with each other, depending on the poles they consist of. For instance, hyperedges w1' and w2' are presented as a single group because both of them own the pole p3'. Thus, the corresponding pair (w3, w4) is also presented as a single group. All these groups are compared for exact isomorphism on the layer of poles and ordinary links. Fig. 5 demonstrates this layer for the pair of candidate groups (w1', w2') and (w3, w4). All these hyperedges are decomposed, and only their poles and links are considered at this stage. As these graphs are identical, the found correspondences between poles of incident hyperedges of graphs Q and G can be presented as a set {(p3', p9), (p4', p11), (p2', p7), (p1', p4)}. If the validation of this hyperedge group succeeds, the algorithm moves to the next group of hyperedges and validates it, until all the hyperedges are traversed. If a validation fails, the algorithm returns to the upper level and tries to find new pairs of vertices and hyperedges and validate them. Lastly, the algorithm verifies that for every pole of the pattern graph only one pole of the source graph has been found. Otherwise, the found subgraph is considered not isomorphic and the search continues.
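The grouping of hyperedges by shared poles can be sketched as a connected-components computation. The Python sketch below assumes that grouping is transitive (two hyperedges end up in one group if they are linked through a chain of shared poles), which the text does not state explicitly. The function name `group_hyperedges`, the pole sets other than the shared pole p3', and the extra hyperedge `w_x` are hypothetical.

```python
def group_hyperedges(edge_poles):
    """Group hyperedges that share poles, directly or through other hyperedges.

    edge_poles maps a hyperedge name to the set of poles it owns.
    Returns a list of groups, each a sorted list of hyperedge names.
    """
    names = list(edge_poles)
    seen = set()
    groups = []
    for start in names:
        if start in seen:
            continue
        seen.add(start)
        group, stack = [], [start]
        while stack:
            current = stack.pop()
            group.append(current)
            for other in names:
                # Two hyperedges are incident when their pole sets intersect.
                if other not in seen and edge_poles[current] & edge_poles[other]:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


# w1' and w2' share the pole p3' as in the text; the other poles and w_x are made up.
query_edges = {
    "w1'": {"p1'", "p3'"},
    "w2'": {"p3'", "p4'"},
    "w_x": {"p9'"},
}
groups = group_hyperedges(query_edges)
```

Each resulting group can then be validated independently on the layer of poles and links, which keeps the ordinary-graph matching instances small.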

Backtracking Graph Matching Algorithm based on the Multilayer Approach
The algorithm presented in this section is based on the backtracking algorithm presented in [19]. This algorithm traverses a search tree using depth-first search (DFS) until an isomorphic subgraph is found. If a pair of corresponding elements cannot be found at a certain step, the algorithm returns to an earlier step.
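The backtracking scheme can be sketched on an ordinary graph as follows. This is a minimal illustration, not the paper's listing: the candidate matrix is filled with a simple degree comparison, which is only a stand-in assumption for the filling rules of the HP-graph, and the names `build_m0` and `find_subgraph` are hypothetical. Graphs are given as lists of neighbor sets indexed by vertex number.

```python
def build_m0(q_adj, g_adj):
    """Candidate matrix: m0[i][j] == 1 if data vertex j may match query vertex i.

    Assumed stand-in rule: the degree of j must be at least the degree of i.
    """
    return [[1 if len(g_adj[j]) >= len(q_adj[i]) else 0
             for j in range(len(g_adj))]
            for i in range(len(q_adj))]


def find_subgraph(q_adj, g_adj):
    """Depth-first search for the first mapping of query vertices to data vertices."""
    m0 = build_m0(q_adj, g_adj)
    mapping = {}

    def consistent(i, j):
        # Every already matched neighbor of i must map to a neighbor of j.
        return all(mapping[k] in g_adj[j] for k in q_adj[i] if k in mapping)

    def extend(i):
        if i == len(q_adj):
            return True
        for j in range(len(g_adj)):
            if m0[i][j] and j not in mapping.values() and consistent(i, j):
                mapping[i] = j
                if extend(i + 1):
                    return True
                del mapping[i]   # backtrack: undo the pair and try the next one
        return False

    return dict(mapping) if extend(0) else None


# Query: path 0-1-2. Data: path 0-1-2 plus vertex 3 adjacent to 1 and 2.
q_adj = [{1}, {0, 2}, {1}]
g_adj = [{1}, {0, 2, 3}, {1, 3}, {1, 2}]
result = find_subgraph(q_adj, g_adj)
```

The search stops at the first embedding; collecting every complete mapping instead of returning would turn it into a search for all embeddings.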

Listing 1. Pseudocode of the algorithm that matches the corresponding sets of graph elements
At the beginning, this algorithm initializes a matrix $M^0$, which defines possible candidates among the corresponding elements of the graphs. If $m^0_{ij} = 1$, then the i-th element of the first graph is a candidate for isomorphism for the j-th element of the second graph. Otherwise, they cannot form a pair of corresponding elements. At each step, a modification of this matrix is used to determine appropriate pairs of elements. Thus, rules for building this matrix have to be defined for each set of HP-graph elements. For vertex matching, external poles and vertices can be combined into one set and called vertices (for simplification). Thus, the matrix $M^0$ of size $|Q_V \cup Q_P| \times |G_V \cup G_P|$ is filled according to the rule (2); if this condition is not met, $m^0_{ij} = 0$:

Listing 2. Pseudocode of the algorithm that finds an isomorphic subgraph in an HP-graph
The main idea of this algorithm is to incrementally shorten the search field. While the search for vertices traverses all the vertices of the original graph, the search for hyperedges only moves through those hyperedges that are connected with the already chosen vertices and utilizes information about their correspondence with the vertices of the query graph. Pole matching is performed for each group of incident hyperedges, where a significant number of combinations is pruned out by exploiting information about the corresponding vertices and hyperedges. The algorithm also checks and matches the unlinked poles if they exist, which can be done in linear or close to linear time, as all the corresponding vertices are already found. For simplicity, the algorithm is given for finding the first isomorphic subgraph, but it can be transformed to search for all embeddings of a pattern.
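The narrowing of hyperedge candidates by the already matched vertices can be sketched as a simple filter. The Python sketch below is an illustration under assumptions: the function name `candidate_hyperedges` and the incidence sets of the hyperedges are hypothetical, and only the vertex correspondences are taken from the example in Section 3.

```python
def candidate_hyperedges(query_edge_vertices, data_edges, vertex_map):
    """Data hyperedges that touch the images of all vertices of a query hyperedge.

    query_edge_vertices: set of query vertices incident to the query hyperedge.
    data_edges: dict mapping a data hyperedge name to its incident vertices.
    vertex_map: correspondences found on the vertex layer (query -> data).
    """
    required = {vertex_map[v] for v in query_edge_vertices}
    return [name for name, vs in data_edges.items() if required <= vs]


# Vertex correspondences from the example in the text.
vertex_map = {"v1'": "v2", "v3'": "v3", "v2'": "v4", "v4'": "v5"}
# Hypothetical incidence of data hyperedges (not taken from Fig. 1).
data_edges = {
    "w1": {"v1", "v2"},
    "w3": {"v2", "v3", "v4"},
    "w4": {"v4", "v5"},
}
# Hypothetical incidence of the query hyperedge w1'.
candidates = candidate_hyperedges({"v1'", "v3'", "v2'"}, data_edges, vertex_map)
```

Only the hyperedges returned by such a filter need to be decomposed and compared on the layer of poles and links.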

Exploiting Pruning Techniques of the Existing Algorithms
Certain existing techniques can be used to optimize the algorithms. Adapting the main techniques of the existing algorithms to the proposed graph model can demonstrate the possibility of adapting these algorithms as a whole and can improve the efficiency of the algorithm presented above.

Ullmann Algorithm
The Ullmann algorithm [19] is one of the first algorithms for subgraph matching. It uses the backtracking algorithm presented above, and at each step it performs a refinement procedure to prune out unpromising pairs.

The refinement procedure is performed at each node of the search tree. It traverses the matrix M and changes some of its values from ones to zeros. The condition for preserving a 1 is as follows: if a vertex j of the original graph is a candidate for a vertex i of the pattern graph, then each neighbor of the vertex i must have at least one candidate among the neighbors of the vertex j. Otherwise, j cannot be a candidate for the vertex i. This procedure can be implemented for both vertex matching and pole matching to eliminate unpromising element pairs. The refining algorithm for vertices is presented in Listing 3.
Listing 3. Pseudocode of the algorithm that runs refining for vertices of the HP-graph
The algorithm goes through all the neighbors of the current query vertex, which have at least one common hyperedge with this vertex, and checks whether the source graph contains a corresponding neighbor vertex. The algorithm for poles is similar, but poles and ordinary links are used instead of vertices and hyperedges.
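The refinement condition can be sketched in Python as follows. This is a hedged illustration rather than a transcription of Listing 3: the helper names `neighbors` and `refine` are hypothetical, vertices are numbered from zero, and two vertices are treated as neighbors when they share at least one hyperedge, as described above.

```python
def neighbors(edge_vertices):
    """Neighbor sets derived from hyperedge incidence (shared hyperedge = neighbors)."""
    adj = {}
    for vs in edge_vertices.values():
        for v in vs:
            adj.setdefault(v, set()).update(vs - {v})
    return adj


def refine(m, q_adj, g_adj):
    """Ullmann-style refinement of the candidate matrix m, done in place.

    Keeps m[i][j] == 1 only if every neighbor of query vertex i has at least
    one candidate among the neighbors of data vertex j.
    Returns False when some query vertex is left without candidates.
    """
    changed = True
    while changed:
        changed = False
        for i, row in enumerate(m):
            for j, flag in enumerate(row):
                if not flag:
                    continue
                supported = all(
                    any(m[x][y] for y in g_adj.get(j, ()))
                    for x in q_adj.get(i, ())
                )
                if not supported:
                    row[j] = 0
                    changed = True
    return all(any(row) for row in m)


# Query: vertices 0 and 1 joined by one hyperedge.
q_adj = neighbors({"a": {0, 1}})
# Data: vertices 0 and 1 joined by a hyperedge, vertex 2 isolated.
g_adj = neighbors({"x": {0, 1}})
m = [[1, 1, 1], [1, 1, 1]]
ok = refine(m, q_adj, g_adj)
```

The loop repeats until no entry changes, because clearing one candidate can remove the support of another pair.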

VF2 Algorithm
VF2 [20] was proposed for performing subgraph matching on large graphs. An effective representation of data structures and the usage of feasibility rules significantly reduce both the average time complexity of the search and the amount of memory used. The idea of the algorithm is to use special rules, called feasibility rules, at each node of the search tree to evaluate the feasibility of further progress on this branch of the tree before adding a pair of vertices to the graph-candidates. These rules check the consistency of the graph-candidates and whether the graph-candidate has a sufficient number of vertices. If all the checks are passed, the algorithm can move to the next level of the tree. The feasibility rules can be checked on both the vertex and the pole layer. As the pole layer is presented as an ordinary graph, the feasibility rules from [20] can be used without any significant modifications. However, feasibility rules for the vertex layer have to be defined. The first rule checks the consistency of the existing candidate graphs by checking the correctness of connections with the already added vertices. Let coreG be a list of found pair vertices for the graph G and coreQ be a list of found pair vertices for the graph Q. Accordingly, let connG be a list of vertices which already have a pair or have a connection to the current graph-candidate G', and connQ be a similar list for the graph-candidate Q'. Then, the first rule can be presented as follows: where Conn(G, v) is the set of vertices of the candidate-graph G which are connected to the vertex v.
Let PC define the set of vertices that can be connected to the vertex u but are not included in the graph G; then it can be represented as follows: Thus, a new rule, which compares the numbers of newly added connections to the graphs, appears: $|PC(G', n)| \geq |PC(Q', m)|$. The last rule performs a 2-look-ahead in the search process. Let N be the set of vertices which are connected to the target vertex but are not connected to the graph-candidate:

Graph Pattern Modification Algorithms
The usage of algorithms that change a graph pattern, such as TurboISO [23] and CFL-Match [24], is complicated in the presented multilayer approach because these algorithms are designed specifically for ordinary graphs. Their usage on the layer of vertices and hyperedges is a subject for future research, as it requires a reformulation of their main aspects and ideas. Nevertheless, all these algorithms can be successfully used on the layer of poles and links and can find an isomorphic subgraph in the single-layer approach.

Complexity of the Algorithms
The presented algorithms can decrease the complexity of subgraph search by performing matching on different graph layers. The search field shortens at each stage, and the usage of pruning rules can also eliminate unpromising combinations of elements. Table 1 shows the computational complexity of the backtracking algorithm at its main stages. The evaluation of the backtracking algorithms based on the Ullmann refinement is presented in Table 2. The evaluation of the algorithms based on the VF2 approach is demonstrated in Table 3. The modification of the GetAllCandidatePairs procedure according to rules (2-4) slightly increases the worst-case complexity from $N \cdot N!$ to $N^2 \cdot N!$ and the best-case complexity from $N^2$ to $N^3$, but significantly shortens the search field.

Conclusion
This paper proposed a solution to the problem of identifying isomorphic subgraphs in HP-graphs.
The proposed approach is based on performing matching on different layers of the graph model and incrementally shortening the search field at each layer. The designed algorithms for subgraph matching based on the multilayer approach and evaluations of their complexity are presented above. The proposed approach incrementally decreases the search field of the algorithm and helps to decrease its overall complexity. The usage of pruning rules of the existing algorithms can eliminate unpromising candidates at each stage of the proposed algorithm and, thus, significantly reduce the size of the search tree. It is planned to evaluate the actual running time of these algorithms on various data sets and to develop a visual modeling system using the proposed approach to subgraph matching.