cholesky-parallelization

Implementation and benchmarking of the Cholesky decomposition using OpenMP, MPI, and CUDA to analyze how different parallel architectures overcome the memory wall and synchronization bottlenecks in dense linear algebra.