In Section VIII, CONCLUSIONS: on a distributed-memory cluster, the mean speedup of FOP over FO is 1.55x (not 1.72x). In Table I: the tile size for lu is 64^3 (not 64^2), i.e., lu was 3-D tiled (not 2-D tiled).