Abstract: This paper introduces sparse matrix storage and data preprocessing for linear programming.

This article is shared from Linear Programming — Sparse Matrices in Huawei Cloud community by Bale10.

With the development of the AI era, it is inevitable that the scale of linear programming problems will become larger and larger. In the face of large-scale linear programming problems, how to store data, save storage space to avoid the waste of resources, and make data query, modification, adding and deleting convenient and fast, is an urgent problem to be solved. This paper introduces sparse matrix storage and data preprocessing for linear programming.

Sparse matrix

The size of LP is usually determined by the size of the constraint matrix A. The elements of the matrix are usually stored as 8-byte double types. Assuming the matrix has M rows and N columns, storing A directly requires 8Mn bytes. If A has 10,000 rows and 20,000 columns (not particularly large), then 1.6G of memory is needed to store A. On the one hand, memory requirements are high, and on the other hand, operation of matrix A is difficult. Large-scale LP usually contains A large number of zero elements and A very small proportion of non-zero elements. This property is called sparsity, that is, A is A sparse matrix.

Sparse matrix storage

The data structure design of sparse matrix should consider the following three factors:

1. Only non-zero elements exist. A good sparse matrix data structure should only contain non-zero elements of A, but not A large number of zero elements. There are three advantages to this. Firstly, memory is saved so that large sparse matrices can be stored in memory. Secondly, if only non-zero memory can not be put, we must use external storage, and the speed of data access from external storage is generally much slower than the data access from memory, therefore, in the use of external storage we also want to only non-zero. Third, operations involving zero elements can not be performed, thus significantly saving computing time.

2. Generation of non-zero elements — Fill in elements, a sparse matrix, in the Process of Gaussian elimination (or LU decomposition), the original zero elements may have to become non-zero elements. In the process of elimination, the non-zero element changed from the zero element is called the filling element; The number of fillers produced during the entire elimination process is called the fillers. If a very sparse matrix generates a large number of fillers after the above elimination operation, the sparsity will disappear. Therefore, maintaining the sparsity of the matrix is the premise of utilizing its sparsity. The algorithm needs to consider how to make the elimination process produce as few fillers as possible. Sparse matrix is a dynamic data structure, especially non-zero elements need to be inserted or deleted. Therefore, a good sparse matrix data structure must be easy to insert or delete.

3. The data structure of sparse matrix is closely related to the elimination algorithm. When considering the data structure of sparse matrix, the elimination algorithm must be considered at the same time, and the data structure must be as convenient as possible for the implementation of its algorithm.

The linear table

The simplest sparse matrix storage scheme is a linear table. For the sake of specificity, let’s take the following sparse matrix A_5 as an example:

Store the non-zero elements as columns in array CE:

The corresponding non-zero line number in CE is recorded in the array IROW:

To give the column number of each non-zero element, a pointer array ICFR is introduced. ICFR(j) represents the position of the first non-zero element in column J in CE.

The length of ICFR is N+1. ICFR(N+1) means that the last element is stored in the position of the last element of the NNN column plus 1. This is introduced to facilitate the calculation of the number of non-zero elements in the last column. This storage scheme requires 2NZ+N+1. Its advantages are small memory, simple structure, single disadvantage is not easy to insert and delete, to insert a non-zero element, the non-zero element located below it must be moved down a position, which is a waste of time.

In order to allow insertion of nonzero elements in a program, it is common to specify a large array. The above is to store A by column, of course, can also be stored by row. The usual way to do this is to decompose A LU, and then store the lower triangle L in columns, and the upper triangle U in rows.

Singly linked lists

In order to overcome the disadvantage that linear table is not easy to insert and delete, the linked list data structure is introduced. That is, add a linked list pointer array LNXT to the above linear table, LNXT(I) represents the position of the ith non-zero element next to the non-zero element, and use array ICNZ to record the number of non-zero elements given to the column. This type of linked list is called single linked list, also known as the list. The above sparse matrix A_5 is still taken as an example and stored in columns:

From the above table structure, it can be seen that for each non-zero element of the sparse matrix, there is a triple (row number, element, position of the next element) in the table, i.e

Each non-zero column is connected by the chain pointer LNXTLNXTLNXT. The last non-zero column is 0, indicating the end of the chain. The pointer array ICFRICFRICFR indicates the position of the first element in each column.

1. Non-zero elements in the same column do not need to be stored next to each other, because LNXTLNXTLNXT indicates the position of the next non-zero element, so the next element can be placed in any position in the table.

2. Insert and delete a element without moving other elements, only need to change the pointer.

The advantage of using a linked list is easy to insert or delete; The disadvantage is that accessing any element in the list needs to start from the first non-zero element of a column, and all need to be accessed indirectly, so it is slower than accessing the linear table.

Double linked list

The double linked list is to increase the ith non-zero column number of ICOL(I) on the basis of the single linked list, RNXT(I) the ith non-zero row chain pointer, that is, the position of the next non-zero element in the row, and the position of the first non-zero element in IRFR(I) the ith row in the list.

This double-linked list data structure is very convenient to deal with the sparse matrix with asymmetric structure, and it has the advantages of single linked list, even for insertion and deletion.

Inserting or deleting a element in a doubly linked list works the same as in a singly linked list, only note that inserting or deleting a element changes not only the chain of columns and empty cells, but also the chain of rows.

The key of sparse matrix algorithm is to design the data structure of sparse matrix. A good sparse matrix data structure should not only save storage, but also be easy to find, insert and delete, and facilitate the implementation of sparse matrix algorithm. Data structure has a fundamental impact on algorithm design. Usually, a sparse matrix is established with a linked list, and a false Gaussian elimination, insertion of filling elements, and the total number of non-zero elements are calculated, and then transformed into a linear table, so as to quickly solve the problem.

Data preprocessing

The scale of the linear programming problem is more and more big, contain tens of thousands of constraint and variable problem is very common, faced with the data storage and handling efficiency, large-scale problems are generally support tools automatically generated by the model, there are often a large number of redundant constraints, and variables, it is necessary to carry out data preprocessing before algorithm.

The main purpose of

1. Reduce redundant constraints, reduce non-zero elements of constraint matrix, improve sparsity, reduce problem size, and save memory;

2. Improve the numerical characteristics of the model and improve the numerical stability of the problem;

3. Determine that the problem has no line solution or is unbounded without solving the problem.

The main challenge

1. Improve the performance of complex preprocessing methods;

2. Determine the action order and action times of different pretreatment methods;

3. Determine when pretreatment stops;

4. Design the preprocessing parallel framework.

Pretreatment method

Is not feasible

Blank lines and blank columns

Fixed variable

Mandatory line

Proportion of line

Parallel lines

Dual regulation

The two rows cancel out

Click follow to learn about the fresh technologies of Huawei Cloud