MATLAB matrices are in column-major storage

Authors of each and every high-level programming language claim you only have to care the semantics of its statements without worrying about its internal implementation. Unfortunately, this can hardly be true.

Today I was working on some MATLAB codes. One of them scales each row of a sparse matrix to make its 1-norm equal to 1. The matrix I worked on was 20000×20000, with 153306 nonzero entries.

This is the first version:
for k = 1:N
 s = sum(P(k,:));
 if s ~= 0
  P(k,:) = P(k,:)/s;
 end
end
This did not stop in 20 seconds, until I pressed Control-C.

This is the second version:
P = P';
for k = 1:N
 s = sum(P(:,k));
 if s ~= 0
  P(:,k) = P(:,k)/s;
 end
end
P = P';
This finished within one second.

[All entries of matrix P are known to be positive, so I simply use sum to get the 1-norm of a row/column.]

The reason is simple. MATLAB internally stores sparse matrices in compressed column format[1]. Therefore, it is many times more expensive to extract or insert a row than column.

ps. For dense matrices, MATLAB also uses column-major formats, following the FORTRAN convention, though MATLAB is itself written in C (and the GUI in Java) [1].

References

[1] J. Gilbert, C. Moler and R. Schreiber. Sparse matrices in MATLAB: Design and implementation. SIAM Journal on Matrix Analysis, 1992.

1 comment:

  1. S = sum(P,2);
    S(S == 0) = 1;
    P = P./S(:,ones(1,N));


    (PS. On my machine, for N = 7000, non-sparse matrices, your original method took 4.513336 seconds; your second method 3.934098 seconds; this took 1.564700 seconds; all identical output.)

    ReplyDelete