In this post, I will discuss some simple storage schemas for sparse matrices.
Disclaimer: This informal note is mainly based on my understanding so far, and for my learning purposes. For more accurate and detailed discussion, please refer to your reliable sources.
A sparse matrix is a matrix, which is almost all zeros. In general, a zero can be thought as a meaningless value that we don’t need to store. In other words, storing only nonzero items for a sparse matrix is enough. As a result, we can save a lot of memory space, or time to transfer compressed data over networks, etc. But, keep in mind that there might be a trade-off between memory efficiency and fast/easy access to nonzero items. We will discuss more about this problem in later posts (I hope so!).
Let M denote a sparse matrix of size R x C
. Assume that M contains n
nonzero items, where n
is significantly smaller than R x C
. Below are three simple yet useful storage schemes:
1. Coordinate Format (COO)
A list of triplets (r,c,val)
, where val = M[r][c]
.
2. Compressed Sparse Row Format (CSR)
Basically, all information is stored in 3 arrays:
* vals
: array of nonzero items in left-to-right, then top-to-bottom order. Hence, vals is a length n
array.
* col_ind
: array of column indices (0-based) of nonzero items. Hence, col_ind is a length n
array.
* rs_ind
: (index of first nonzero item of each row in vals array). Hence, rs_ind is a length (R+1)
array. All nonzero items of M’s i-th row lies in range [rs_ind[i], rs_ind[i+1]-1]
of vals array. That is, if we denote le = rs_ind[i], ri = rs_ind[i+1]-1
, then vals[le..ri]
are all nonzero items in i-th row of M.
3. Compressed Sparse Column Format (CSC)
This format can be thought as CSR with the roles of row and column exchanged. Then, I will skip the explanation here.