Abstract: This paper introduces sparse matrix storage and data preprocessing for linear programming.

This article is shared from "Linear Programming — Sparse Matrix" by Huawei Cloud Community, original author: Bale10.

With the development of AI, the scale of linear programming problems will inevitably keep growing. For large-scale linear programming problems, how to store the data so as to save storage space, avoid wasting resources, and make querying, modifying, inserting, and deleting data convenient and fast is an urgent problem. This paper introduces sparse matrix storage and data preprocessing for linear programming.

## Sparse matrix

The size of an LP is usually determined by the size of the constraint matrix A. Matrix elements are usually stored as 8-byte doubles. Assuming the matrix has m rows and n columns, storing A directly requires 8mn bytes. If A has 10,000 rows and 20,000 columns (not a particularly large problem), then 1.6 GB of memory is needed to store A. On the one hand the memory requirement is high, and on the other hand operating on such a matrix is difficult. A large-scale LP usually contains a large number of zero elements, with only a very small proportion of non-zeros. This property is called sparsity; that is, A is a sparse matrix.
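The memory figures above can be checked with a short calculation. The 0.1% density used below is an assumed example value, not a number from the article:

```python
# Memory needed for a 10,000 x 20,000 LP constraint matrix,
# comparing dense storage with storing only the non-zeros.
m, n = 10_000, 20_000
dense_bytes = 8 * m * n                 # one 8-byte double per entry
print(dense_bytes / 1e9)                # 1.6 GB, as in the text

density = 0.001                         # assumed: 0.1% non-zeros
nz = int(m * n * density)
sparse_bytes = nz * (8 + 4)             # 8-byte value + 4-byte row index
print(sparse_bytes / 1e6)               # 2.4 MB
```

Even with a per-entry index overhead, keeping only the non-zeros shrinks the storage by orders of magnitude.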

## Sparse matrix storage

The data structure design of sparse matrix should consider the following three factors:

- Store only non-zero elements. A good sparse matrix data structure should contain only the non-zero elements of A, not a large number of zeros. The advantages are threefold. First, it saves memory, so that large sparse matrices can fit in main memory. Second, if even the non-zeros alone do not fit in memory, external storage must be used, and accessing data from external storage is generally much slower than from main memory, so in that case too we want to keep only the non-zeros. Third, operations involving zero elements can be skipped entirely, saving significant computation time.
- Generation of non-zero elements (fill-in). During Gaussian elimination (or LU decomposition) of a sparse matrix, elements that were originally zero may become non-zero. Such an element, turned from zero into non-zero during elimination, is called a fill-in element, and the number of fill-ins produced during elimination is called the fill-in count. If a very sparse matrix generates a large number of fill-ins during elimination, its sparsity is lost. Preserving the sparsity of the matrix is therefore a precondition for exploiting it, and making the elimination process produce as few fill-ins as possible is something the algorithm must consider. A sparse matrix is thus a dynamic data structure; in particular, non-zero elements must often be inserted or deleted, so a good sparse matrix data structure must make insertion and deletion easy.
- The data structure of a sparse matrix is closely tied to the elimination algorithm. When designing the data structure, the elimination algorithm must be considered at the same time: the structure should make the implementation of the algorithm as convenient as possible.

## The linear table

The simplest sparse matrix storage scheme is the linear table. To be concrete, take the following sparse matrix A_5 as an example:

Store the non-zero elements in an array CE:

The row number of each non-zero element in CE is recorded in the array iRow:

To give the column number of each non-zero element, a pointer array ICFR is introduced: ICFR(j) is the position in CE of the first non-zero element of column j.

The length of ICFR is N+1; ICFR(N+1) is the position of the last non-zero element of column N plus 1, introduced to make it easy to compute the number of non-zeros in the last column. The storage required by this scheme is 2NZ + N + 1. Its advantages are small storage and a simple structure; its main disadvantage is that insertion and deletion are hard. To insert a non-zero element, every non-zero element after it must be moved down one position, which is very time-consuming.
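A minimal sketch of this column-wise linear table, with the array names CE, iRow, ICFR from the text. Since A_5 itself is not reproduced in this article, the 3×3 matrix below is an assumed stand-in, and indices are 0-based rather than the text's 1-based convention:

```python
# Column-wise linear-table storage of an assumed example matrix
#
#     A = [[4, 0, 1],
#          [0, 3, 0],
#          [2, 0, 5]]
CE   = [4, 2, 3, 1, 5]      # non-zero values, column by column
iRow = [0, 2, 1, 0, 2]      # row index of each entry in CE
ICFR = [0, 2, 3, 5]         # ICFR[j] = position in CE where column j starts;
                            # length n+1, ICFR[n] = total number of non-zeros

def column(j):
    """Return the (row, value) pairs of column j."""
    return [(iRow[k], CE[k]) for k in range(ICFR[j], ICFR[j + 1])]

print(column(2))            # [(0, 1), (2, 5)]
```

With NZ = 5 non-zeros and N = 3 columns, the storage is 2·5 + 3 + 1 = 14 cells, matching the 2NZ + N + 1 formula above.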

To allow insertion of non-zeros, programs commonly allocate a large array in advance. The above stores A by columns; it can of course also be stored by rows. The usual practice is to decompose A as LU, then store the lower triangular matrix L by columns and the upper triangular matrix U by rows.

## Singly linked lists

To overcome the linear table's difficulty with insertion and deletion, a linked-list data structure is introduced. That is, a chain-pointer array LNXT is added to the linear table above, where LNXT(I) gives the position of the non-zero element that follows the I-th one. Meanwhile, an array ICNZ records the number of non-zero elements in each column. This structure is called a singly linked list. Taking the sparse matrix A_5 above as an example again, stored by columns:

As can be seen from the table structure above, each non-zero element of the sparse matrix corresponds to a triple in the table: (row number, value, position of the next element).

The non-zeros of each column are connected by the chain pointers in LNXT. The last non-zero in each column has a chain pointer of 0, indicating the end of the chain. The pointer array ICFR gives the first element of each column.

- Non-zero entries of the same column need not be stored next to each other: since each entry uses LNXT to record the position of the next one, the next entry can sit anywhere in the table.
- To insert or delete an element, no other elements need to move; only pointers change.

The advantage of the linked list is easy insertion and deletion. The disadvantage is that accessing an arbitrary element requires searching down the chain from the first non-zero of its column, always through indirect accesses, so it is slower than access in a linear table.
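The pointer-only insertion can be sketched as follows, continuing the assumed 3×3 example matrix `[[4,0,1],[0,3,0],[2,0,5]]` from before (0-based indices, with -1 marking the end of a chain where the text uses 0 under 1-based indexing):

```python
# Column-linked storage: each non-zero is a triple (row, value, next).
CE   = [4, 2, 3, 1, 5]      # values
iRow = [0, 2, 1, 0, 2]      # row of each value
LNXT = [1, -1, -1, 4, -1]   # chain pointer to the next non-zero in the column
ICFR = [0, 2, 3]            # first non-zero of each column

def insert(col, row, value):
    """Insert a non-zero at the head of column `col`'s chain:
    nothing moves, only pointers change."""
    CE.append(value)
    iRow.append(row)
    LNXT.append(ICFR[col])  # new entry points at the old chain head
    ICFR[col] = len(CE) - 1 # the column now starts at the new entry

def column(j):
    """Walk column j via its chain."""
    out, k = [], ICFR[j]
    while k != -1:
        out.append((iRow[k], CE[k]))
        k = LNXT[k]
    return out

insert(1, 2, 7)             # set A[2][1] = 7
print(column(1))            # [(2, 7), (1, 3)]
```

Note that the new entry simply lands at the end of the arrays; the chain pointers, not physical adjacency, define column order.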

## Double linked list

The doubly linked list extends the singly linked list with: ICOL(I), the column number of the I-th non-zero element; RNXT(I), the row chain pointer of the I-th non-zero, i.e. the position of the next non-zero in the same row; and IRFR(I), the position of the first non-zero element of row I.

This doubly linked structure is very convenient for sparse matrices with asymmetric structure, and it retains the advantage of the singly linked list, namely easy insertion and deletion.

Inserting or deleting an element in a doubly linked list works the same way as in a singly linked list; just note that the operation must update not only the column chain and the free-cell chain but also the row chain.
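The extra row-direction arrays can be sketched for the same assumed 3×3 example `[[4,0,1],[0,3,0],[2,0,5]]`, whose non-zeros CE are stored column by column as before (0-based indices, -1 ends a chain):

```python
# Row-direction arrays a doubly linked storage adds on top of
# the column chains.
CE   = [4, 2, 3, 1, 5]      # values, column by column
ICOL = [0, 0, 1, 2, 2]      # ICOL[i]: column of the i-th non-zero
RNXT = [3, 4, -1, -1, -1]   # RNXT[i]: next non-zero in the same row
IRFR = [0, 2, 1]            # first non-zero of each row

def row(i):
    """Walk row i via the row chain."""
    out, k = [], IRFR[i]
    while k != -1:
        out.append((ICOL[k], CE[k]))
        k = RNXT[k]
    return out

print(row(0))               # [(0, 4), (2, 1)]
```

With both chain directions present, a row and a column can each be traversed without scanning the whole table, which is what makes asymmetric elimination operations convenient.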

Designing the data structure is one of the keys of a sparse matrix algorithm. A good sparse matrix data structure should not only save storage but also make search, insertion, and deletion easy, and facilitate the implementation of the algorithm; the data structure has a fundamental impact on algorithm design. A common approach is to build the sparse matrix as a linked list, perform a symbolic (simulated) Gaussian elimination to insert the fill-in elements and count the total number of non-zeros, and then convert to a linear table for the fast numerical solve.

## Data preprocessing

Linear programming problems keep growing in scale; problems with tens of thousands of constraints and variables are now very common, which raises issues of data storage and processing efficiency. Large-scale problems are generally generated automatically by modeling tools and often contain a large number of redundant constraints and variables, so data preprocessing is necessary before running the algorithm.

### Main purposes

- Remove redundant constraints, reduce the non-zeros of the constraint matrix, improve sparsity, shrink the problem size, and save memory;
- Improve the numerical characteristics of the model and the numerical stability of the problem;
- Detect that the problem is infeasible or unbounded without solving it.
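One of the simplest reductions of this kind, named below as "fixed variables", is to substitute out a variable whose lower and upper bounds coincide: its column is dropped and its contribution moves into the right-hand side. The 2×3 data here is a hypothetical example, not from the article:

```python
# Presolve step: remove a fixed variable (lb[j] == ub[j]).
A  = [[1.0, 2.0, 0.0],
      [0.0, 1.0, 3.0]]
b  = [10.0, 8.0]
lb = [0.0, 4.0, 0.0]
ub = [5.0, 4.0, 9.0]        # variable 1 is fixed at 4

def remove_fixed(A, b, lb, ub):
    """Find one fixed variable, fold it into b, drop its column."""
    for j in range(len(lb)):
        if lb[j] == ub[j]:
            for i in range(len(A)):
                b[i] -= A[i][j] * lb[j]   # move a_ij * x_j to the RHS
                del A[i][j]               # drop the column entry
            del lb[j]; del ub[j]
            return j                      # index of the removed variable
    return None

remove_fixed(A, b, lb, ub)
print(A, b)                 # [[1.0, 0.0], [0.0, 3.0]] [2.0, 4.0]
```

The problem shrinks by one column and the constraint matrix loses the fixed variable's non-zeros, which is exactly the size and sparsity gain listed above.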

### Main challenges

- Improving the performance of the more complex preprocessing methods;
- Deciding the order in which different preprocessing methods are applied, and how many times;
- Deciding when preprocessing should stop;
- Designing a parallel preprocessing framework.

## Preprocessing methods

## Infeasibility

## Empty rows and columns

## Fixed variables

## Forcing rows

## Proportional rows

## Parallel rows

## Dual reductions

## Non-zero cancellation between two rows
