Random thoughts

Share to Learn, Learn to Share

On Sparse Matrix Storage Schemes (Novice-level)

In this post, I will discuss some simple storage schemas for sparse matrices.
Disclaimer: This informal note is mainly based on my understanding so far, and for my learning purposes. For more accurate and detailed discussion, please refer to your reliable sources.

A sparse matrix is a matrix, which is almost all zeros. In general, a zero can be thought as a meaningless value that we don’t need to store. In other words, storing only nonzero items for a sparse matrix is enough. As a result, we can save a lot of memory space, or time to transfer compressed data over networks, etc. But, keep in mind that there might be a trade-off between memory efficiency and fast/easy access to nonzero items. We will discuss more about this problem in later posts (I hope so!).

Let M denote a sparse matrix of size R x C. Assume that M contains n nonzero items, where n is significantly smaller than R x C. Below are three simple yet useful storage schemes:
1. Coordinate Format (COO)
A list of triplets (r,c,val), where val = M[r][c].

2. Compressed Sparse Row Format (CSR)
Basically, all information is stored in 3 arrays:
* vals: array of nonzero items in left-to-right, then top-to-bottom order. Hence, vals is a length n array.
* col_ind: array of column indices (0-based) of nonzero items. Hence, col_ind is a length n array.
* rs_ind: (index of first nonzero item of each row in vals array). Hence, rs_ind is a length (R+1) array. All nonzero items of M’s i-th row lies in range [rs_ind[i], rs_ind[i+1]-1] of vals array. That is, if we denote le = rs_ind[i], ri = rs_ind[i+1]-1, then vals[le..ri] are all nonzero items in i-th row of M.

3. Compressed Sparse Column Format (CSC)
This format can be thought as CSR with the roles of row and column exchanged. Then, I will skip the explanation here.