Data compression algorithms for flow tables in Network Processor RuNPU

This paper addresses the problem of packet classification within a network processor (NP) architecture that has no separate associative device. By classification we mean the process of identifying a packet by its header. The classification stage requires data structures that store the flow tables. Because the NP considered in our work has no associative memory, flow tables are represented by an assembly language program in the NP, and a flow tables translator is used to translate flow tables into such programs. The main reason for implementing data compression algorithms in the flow tables translator is that modern flow tables can take up to tens of megabytes. In this paper, we describe the following data compression algorithms: optimal rule caching, recursive end-point cutting and common data compression algorithms. The implemented data compression algorithms were evaluated on a simulation model of the NP.


Introduction
At present, software-defined networks (SDN) are in active development and require high-performance switches [1]. The main functional element of a high-performance SDN switch is a programmable network processor (NP). A network processor is a system-on-chip specialized for network packet processing. In this work, we consider a programmable NP, that is, one that supports on-the-fly modification of the packet processing program and of the set of header fields to be processed. In this article, we discuss data compression algorithms for flow tables. Flow tables are needed for the packet classification process. A flow table is a set of rules defined by the OpenFlow protocol, one of the most common protocols for controlling an SDN switch. This paper considers OpenFlow version 1.3 [2]. Each rule contains a match field, which is a bit string by which a packet can be identified, and a set of actions that the NP performs on this packet. Classification is the process of identifying a network packet by its header. This article has the following structure: the second section states the problem, the third section introduces the NP architecture and the flow tables translator, the fourth section describes related work, the fifth section describes the implementation of the data compression algorithms, and the sixth section introduces our evaluation methodology.

The problem
Let us formalise OpenFlow tables. The ordered set of all considered attributes is denoted $I = \{i_1, i_2, \ldots, i_k\}$. Every attribute $i \in I$ is described by a bit string $m_i \in \{0, 1, *\}^{W_i}$, where the symbol $*$ denotes any bit. Wildcards may only form a suffix: if $\exists j$ such that $m_i[j] = *$, then $\forall l > j$, $m_i[l] = *$. The length of the attribute is denoted $len(m_i) = W_i$. A rule $r \in R$ identifies a packet header when the vector of attribute values of the header matches the vector of the rule and the priority of $r$ is the highest among all rules in $R$ whose vectors match the header. The set of rules must satisfy the following constraint: for any two rules $r_1, r_2 \in R$, $r_1 \neq r_2$, if their vectors of values intersect, that is, if there is a set of attribute values that corresponds to the vectors of both rules, then their priorities differ, $p_{r_1} \neq p_{r_2}$.
Let us introduce the function that identifies a network packet header $h$ in a flow table $R$, denoted $R(h)$. It returns the set of actions $A_r$ of the rule $r \in R$ that identifies $h$.
We also need the concept of similar sets of rules $R_1$ and $R_2$. The set $R_1$ is similar to the set $R_2$ when any network packet header that is identified by some rule $r_1 \in R_1$ is also identified by a rule $r_2 \in R_2$, and $A_{r_1} = A_{r_2}$. We need to develop an algorithm for compressing flow tables. This algorithm must translate an input flow table (a set of rules $R$) into a new compressed set of rules $R'$ such that:
1) The set of rules $R'$ is similar to the set of rules $R$.
2) The cardinality of the set $R'$ is lower than the cardinality of the set $R$.
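The two requirements can be illustrated with a minimal Python sketch. It assumes that a rule is a triple of ternary match strings (one per attribute), a priority and a set of actions. The names `Rule`, `classify` and `similar` are hypothetical and are not part of the translator. The similarity check enumerates every possible header, so it is practical only for very short attributes.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Rule:
    match: tuple        # one ternary string per attribute, e.g. ("10*",)
    priority: int
    actions: frozenset


def bits_match(pattern: str, value: str) -> bool:
    """A pattern bit matches when it is '*' or equals the header bit."""
    return len(pattern) == len(value) and all(
        p in ("*", v) for p, v in zip(pattern, value)
    )


def classify(table, header):
    """Return the actions of the highest-priority matching rule, or None."""
    hits = [
        r for r in table
        if all(bits_match(p, v) for p, v in zip(r.match, header))
    ]
    if not hits:
        return None
    return max(hits, key=lambda r: r.priority).actions


def similar(table_a, table_b, widths):
    """Brute force: every header identified by table_a must receive
    the same actions from table_b (assumed reading of 'similar')."""
    per_attribute = [
        ["".join(bits) for bits in product("01", repeat=w)] for w in widths
    ]
    for header in product(*per_attribute):
        actions = classify(table_a, header)
        if actions is not None and actions != classify(table_b, header):
            return False
    return True
```

For example, the rules `10*` and `11*` with equal actions can be replaced by the single rule `1**`: the new table is similar to the old one and has lower cardinality.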

Network processor architecture
The considered NP uses a pipeline architecture, with each pipeline consisting of 10 computing blocks. To avoid complex memory organization, the considered NP has no associative memory and uses the same memory for both commands and data. Each computing block has access to the memory area where the program with its data is located. Each computing block may spend at most 25 clock cycles per packet. Up to 512 kilobytes are available to store the assembly language program that represents the flow tables. Due to the instruction set architecture, there is no separate memory area for data; therefore, the microcode contains all the data required to classify packets.

Flow tables translator
The flow tables translator is a tool that runs on a CPU. It translates flow tables into assembly language programs that can be interpreted by the NP. The translator represents a flow table as a tree structure in which every node can be associated with a table rule. After the tree is built, every node is translated into a part of an assembly language program.
The flow tables translator workflow is as follows:
1) Load a flow table from a file.
2) Check every rule in the table.
3) Build a tree structure from the set of rules.
4) Translate every node into a part of an assembly language program.
5) Combine all translated parts into one assembly language program.
6) Add a header that corresponds to the protocol used.
7) Write the assembly language program to a file.
This tool was implemented in work [3].

Related work
In this section, we review data compression algorithms that are already used in other network processors [4]. To choose algorithms for implementation in the NP, we used the following criteria:
1) Compression rate, which is needed to evaluate algorithm performance.
2) Complexity of the compression algorithm.
3) Usability of compressed flow tables without decompression.
4) The need for the algorithm to use external memory.

Most common data compression algorithms
Data compression algorithms have evolved over the years and are now used in many different ways. In this section, we describe algorithms that compress data in binary format. The best known of them are Huffman coding, JPEG, LZW and ZIP. These algorithms require decompression before the data can be used, which is why we do not use them in our flow tables translator.

Optimal rule caching
The optimal rule caching algorithm is a more specialized data compression algorithm. It is used for table compression in SDN switches [5]. It is based on a search tree structure that is built from rule usage frequencies. There are two trees. The first tree consists of the most frequently used rules and is translated into an assembly language program. The second tree consists of the other rules and is stored in CPU memory.
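The two-level lookup implied by this scheme can be sketched as follows. This is a simplified illustration, not the algorithm from [5]: rules are plain (ternary pattern, actions) pairs held in lists rather than trees, rules are assumed not to overlap so that caching a subset cannot change the result, and the function names are hypothetical.

```python
def matches(pattern: str, header: str) -> bool:
    """Ternary match: '*' in the pattern accepts any header bit."""
    return len(pattern) == len(header) and all(
        p in ("*", h) for p, h in zip(pattern, header)
    )


def lookup(rules, header):
    """Return the actions of the first matching rule, or None."""
    for pattern, actions in rules:
        if matches(pattern, header):
            return actions
    return None


def classify_with_cache(header, np_rules, cpu_rules):
    """Try the frequently used rules held in the NP first and fall back
    to the remaining rules held by the CPU.
    Returns (actions, cpu_called)."""
    actions = lookup(np_rules, header)
    if actions is not None:
        return actions, False
    return lookup(cpu_rules, header), True
```

The second element of the result shows which packets pay the extra cost of a CPU call.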

Recursive end point cutting
The recursive end-point cutting algorithm is based on the HyperSplit tree. Compression is performed by eliminating duplicate rules [6]. This algorithm permits operations on flow tables without fully rebuilding the tree. By rule duplication we mean the following:
• A rule stored in a node duplicates the rules in some leaf nodes (partial duplication).
• A rule stored in a node duplicates the rules in all leaf nodes (full duplication).
This algorithm recursively uses the NewHypersplit tree to remove duplicate rules from the tree currently being built. The deleted duplicate rules are then collected into a second rule table, called a recursive table, to build a second tree. Duplicate rules may still exist in the second tree; some of them are also removed and used to build the third tree. This tree building process is performed recursively while there are duplicate rules in the last tree.

Algorithms comparison
Table 1 compares the data compression algorithms. Each algorithm has its own pros and cons.
1) Optimal rule caching has the highest compression ratio and is quick to implement in the considered NP. However, the need to use external memory imposes additional overhead on the processing of some packets.
2) Recursive end-point cutting has the lowest compression ratio and is more difficult to implement than the optimal rule caching algorithm. On the other hand, this algorithm does not require external memory.
3) Common data compression algorithms have good compression ratios on average, but require data decompression.

Our solution
In this section, we introduce our solution for flow table compression.

Flow table optimization
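First of all, we need to introduce the operation of getting the last important bit of a bit string $m$: $last(m) = j$, where $m[j] \in \{0, 1\}$ and $m[j+1] = *$. We say that two rules $r_1 \in R$ and $r_2 \in R$ of a flow table $R$ are the same if, for every attribute $i$, $last(m_i^{r_1}) = last(m_i^{r_2}) = j$, but $m_i^{r_1}[j] \neq m_i^{r_2}[j]$ and $A_{r_1} = A_{r_2}$, where $A_r$ is the set of actions of rule $r$. To optimize a flow table, we need to remove all same rules.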
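One possible reading of this optimization is the standard aggregation of sibling prefixes, sketched below. The sketch makes several assumptions that are not stated in the text: a rule has a single attribute and is a (pattern, actions) pair, two rules are treated as the same only if their bits before the last important bit coincide, and a pair of same rules is replaced by one rule whose last important bit is turned into a wildcard. All function names are hypothetical.

```python
from itertools import combinations


def last_important_bit(pattern: str) -> int:
    """Index of the last non-wildcard bit, or -1 if all bits are '*'."""
    return len(pattern.rstrip("*")) - 1


def same(rule1, rule2) -> bool:
    """Assumed reading: equal actions, equal position of the last
    important bit, equal bits before it, different bit at it."""
    (p1, a1), (p2, a2) = rule1, rule2
    j = last_important_bit(p1)
    return (
        a1 == a2
        and len(p1) == len(p2)
        and j >= 0
        and j == last_important_bit(p2)
        and p1[:j] == p2[:j]
        and p1[j] != p2[j]
    )


def merge(rule1, rule2):
    """Replace two same rules by one rule with a shorter prefix."""
    pattern, actions = rule1
    j = last_important_bit(pattern)
    return (pattern[:j] + "*" + pattern[j + 1:], actions)


def optimize(rules):
    """Merge pairs of same rules until no such pair is left."""
    rules = list(rules)
    merged_any = True
    while merged_any:
        merged_any = False
        for a, b in combinations(rules, 2):
            if same(a, b):
                rules.remove(a)
                rules.remove(b)
                rules.append(merge(a, b))
                merged_any = True
                break
    return rules
```

Under these assumptions the rules `100*`, `101*` and `11**` with equal actions collapse into the single rule `1***`.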

Main flow table compression algorithm
Let us introduce a packet header distribution $P = \{p_1, p_2, \ldots, p_n\}$, where $p_i$ is the probability that a network packet with header $h_i$ arrives. We need a correction ratio $C(R_1, R_2, P)$, where $R_1$ and $R_2$ are two different flow tables. The correction ratio is the probability that a network packet header arriving according to the distribution $P$ is identified by rules $r_1 \in R_1$ and $r_2 \in R_2$. Based on the input flow table $R$, we look for a minimal set of rules $R'$ with the maximum correction ratio $C(R, R', P)$.
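A minimal sketch of how such a ratio could be computed is given below. It is an assumed interpretation: each flow table is simplified to an exact-match mapping from header to actions, the distribution is a mapping from header to arrival probability, and a header counts towards the ratio only when both tables identify it with the same actions. The name `correction_ratio` is hypothetical.

```python
def correction_ratio(table_a, table_b, distribution):
    """Sum the probabilities of the headers that table_a identifies and
    for which table_b returns the same actions."""
    return sum(
        probability
        for header, probability in distribution.items()
        if header in table_a and table_a[header] == table_b.get(header)
    )
```

With a full table of three rules and a compressed table that keeps the two most probable ones, the ratio equals the combined probability of those two headers.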

Software solution
In this section, we introduce the software workflow of our algorithm. First of all, we need to add new fields to the tree structure nodes:
• The probability of the tree node. It must be filled in if the node contains a rule.
• The sum of the probabilities of the leaf nodes.
Let us introduce the operation that splits the tree:
• Generate a set of tree nodes.
• Sort this set in non-increasing order of probability.
• Create a counter that stores the sum of node probabilities.
• Get the first node, which has the maximum probability.
• Increase the counter by the probability of this node.
• Add this node to a second set and remove it from the first.
• Repeat the last three operations while the counter is less than 0.95.
• Build a tree from the second set of rules.
After performing these operations, we get two sets of nodes. We build the first tree from the second set and the second tree from the first set. After this, we translate the first tree into an assembly language program.
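The split operation can be sketched as follows. This is a minimal sketch that assumes each node is reduced to a (name, probability) pair; the tree construction itself is omitted, and the function name and its `threshold` parameter are hypothetical.

```python
def split_by_probability(nodes, threshold=0.95):
    """Move the most probable nodes into a second set until their
    cumulative probability reaches the threshold.
    Returns (second_set, remaining_first_set)."""
    # Sort in non-increasing order of probability.
    first_set = sorted(nodes, key=lambda node: node[1], reverse=True)
    second_set = []
    counter = 0.0
    while first_set and counter < threshold:
        node = first_set.pop(0)      # node with the maximum probability
        counter += node[1]           # increase the counter
        second_set.append(node)      # move it to the second set
    return second_set, first_set
```

The first tree, which is translated into the assembly language program, would then be built from the first element of the result, and the second tree from the remaining nodes.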

Notation used
Let node1 and node2 be tree vertices and value be some feature value. We use the following notation:
• Tree.root - the root node of the tree Tree.
• node1(value) - the descendant of the node node1 connected to it by an arc with the mark value.
• node1.rules - the set of rules corresponding to the node node1.
• node1.edges - the set of marks of the arcs coming from the node node1.
• node1.prob - the sum of the probabilities of the rules.
• copy(node1, val, node2) - a procedure that adds to the node node1 a descendant with an arc marked val, copying the tree formed by the node node2.
• equals(node1, node2) - a function that returns true if the trees formed by the nodes node1 and node2 are the same, and false otherwise. The comparison takes into account the rule sets and arc labels associated with the nodes.
• same(rule1, rule2) - a function that returns true if the rules are the same.
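This notation maps naturally onto a small data structure. The following Python sketch is one possible encoding and is not the translator's actual implementation: the `children` dictionary and the use of `__call__` to mirror node1(value) are assumptions made for illustration.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Node:
    rules: set = field(default_factory=set)        # node.rules
    children: dict = field(default_factory=dict)   # arc mark -> child Node
    prob: float = 0.0                              # node.prob

    @property
    def edges(self):
        """node.edges: the set of marks of the outgoing arcs."""
        return set(self.children)

    def __call__(self, value):
        """node(value): the descendant reached by the arc marked value."""
        return self.children[value]


def copy(node1, val, node2):
    """Attach a deep copy of the tree rooted at node2 under the arc val."""
    node1.children[val] = deepcopy(node2)


def equals(node1, node2):
    """Compare two trees by rule sets and arc labels, recursively."""
    if node1.rules != node2.rules or node1.edges != node2.edges:
        return False
    return all(equals(node1(v), node2(v)) for v in node1.edges)
```

Because copy makes a deep copy, later changes to one subtree do not affect the tree it was copied from.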

Flow table optimization algorithm
Let us introduce the procedure Same (Listing 1), which returns the set of rules that is the union of the same rules in the rule sets of two nodes. The flow table optimization algorithm is described by the procedure Optimize (Listing 2), where node is a tree node. To optimize a flow table, we apply this procedure to Tree.root.
...
    node.rules += Same(node(val_1), node(val_2))
endif
for all val in node.edges do
    Optimize(node(val))
endfor
Listing 2. Procedure for optimizing the tree

Evaluation

Evaluation methodology
In this section, we describe the evaluation methodology. Flow table compression algorithms can be assessed by evaluating the resulting assembly language program, because the flow tables translator with the implemented data compression algorithms translates a flow table into an assembly language program. We used the following parameters in our evaluation:
• The memory usage of the assembly language program.
• The average number of assembly language instructions required to process one packet.
The described analysis requires doing the following actions for each flow

Memory usage calculation method
The flow tables translator uses trees as an intermediate flow table representation. Each node of the tree is translated into a part of an assembly language program. The assembly language program fully assembled from these parts has $N$ instructions. Every instruction uses 128 bits of memory. Therefore, the memory usage $M$, in bits, can be calculated as $M = 128 \cdot N$.
In our evaluation results, memory usage is reported in Kbytes.
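The calculation can be written out as a short Python sketch. It assumes that one Kbyte is 1024 bytes and reuses the 512-kilobyte program memory limit of the considered NP; the function and constant names are hypothetical.

```python
INSTRUCTION_BITS = 128          # size of one instruction
PROGRAM_MEMORY_KBYTES = 512     # memory available for the program


def memory_usage_kbytes(instruction_count: int) -> float:
    """M = 128 * N bits, converted to Kbytes (1 Kbyte = 1024 bytes)."""
    bits = INSTRUCTION_BITS * instruction_count
    return bits / 8 / 1024


def max_instructions() -> int:
    """Largest program, in instructions, that fits into program memory."""
    return PROGRAM_MEMORY_KBYTES * 1024 * 8 // INSTRUCTION_BITS
```

Under these assumptions a program of 1000 instructions occupies 15.625 Kbytes, and the program memory holds at most 32768 instructions.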

Evaluation data
Several variants of flow tables should be used for the evaluation. These variants cover the most widely used network protocols. In this section, we introduce the flow table templates.
• The first pattern - a flow table rule contains the values of three attributes: an input port number, a destination MAC address and a source MAC address.
• The second pattern - a flow table rule contains the values of two attributes: an IPv4 destination address and an IPv4 source address.
• The third pattern - a flow table rule contains the values of five attributes: an input port number, a destination MAC address, a VLAN ID, an L3-level header ID (EtherType) and a destination IPv4 address.
An example of input data is presented in Listing 4.
Optimal caching has the best compression rate (fig. 1a) but the worst average number of instructions required to process one packet (fig. 1b). This can be explained by the need to use many instructions to make the CPU call.

Future work
In future work, we will refine the evaluation data. We expect lower memory usage with our compression algorithm implemented in the flow tables translator. In the first experiments conducted, we obtained results showing a significant reduction in memory usage with the help of the data compression algorithm. After this, we could examine the possibility of implementing TCAM memory and of using this compression algorithm with it.