What is MPI?
Message Passing Interface (MPI) is a standardized and portable message-passing system designed to function on parallel computing architectures. It’s the de facto standard for distributed memory parallel programming, enabling processes to communicate by sending and receiving messages.
Why Use MPI?
- Scalability: Run programs across hundreds or thousands of processors
- Portability: Code runs on various parallel architectures
- Performance: Efficient communication for high-performance computing
- Flexibility: Supports both distributed and shared memory systems
Core Concepts
1. Initialization and Finalization
Every MPI program must initialize and finalize the MPI environment.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    // Initialize MPI environment
    MPI_Init(&argc, &argv);

    std::cout << "MPI initialized!" << std::endl;

    // Finalize MPI environment
    MPI_Finalize();
    return 0;
}
Compilation: mpic++ program.cpp -o program
Execution: mpirun -np 4 ./program
Or use a Makefile:
CXX = mpic++
CXXFLAGS = -Wall -std=c++17
TARGET = main
SOURCES = main.cpp

all: $(TARGET)

$(TARGET): $(SOURCES)
	$(CXX) $(CXXFLAGS) -o $(TARGET) $(SOURCES)

clean:
	rm -f $(TARGET)

run: $(TARGET)
	mpirun -np 4 ./$(TARGET)
2. Communicators, Rank, and Size
A communicator defines a group of processes that can communicate. Each process has a unique rank (ID) within the communicator.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_size, world_rank;
    // Get total number of processes
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    // Get rank of current process
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    std::cout << "Process " << world_rank << " of " << world_size << std::endl;

    MPI_Finalize();
    return 0;
}
3. Point-to-Point Communication: Send and Receive
The most basic form of communication between two processes.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    if (world_rank == 0) {
        // Process 0 sends data
        int data = 42;
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        std::cout << "Process 0 sent: " << data << std::endl;
    } else if (world_rank == 1) {
        // Process 1 receives data
        int received_data;
        MPI_Recv(&received_data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::cout << "Process 1 received: " << received_data << std::endl;
    }

    MPI_Finalize();
    return 0;
}
Output
mpic++ -Wall -std=c++17 -o main main.cpp
mpirun -np 4 ./main
Process 0 sent: 42
Process 1 received: 42
4. Collective Communication: Broadcast
Broadcast sends data from one process to all other processes.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    int data;
    if (world_rank == 0) {
        data = 100;
        std::cout << "Process 0 broadcasting: " << data << std::endl;
    }

    // Broadcast data from process 0 to all processes
    MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);

    std::cout << "Process " << world_rank << " received: " << data << std::endl;

    MPI_Finalize();
    return 0;
}
5. Collective Communication: Scatter
Scatter distributes different data from one process to all processes.
#include <mpi.h>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    std::vector<int> send_data;
    int recv_data;

    if (world_rank == 0) {
        // Process 0 creates data array
        send_data.resize(world_size);
        for (int i = 0; i < world_size; i++) {
            send_data[i] = i * 10;
        }
    }

    // Scatter data to all processes
    MPI_Scatter(send_data.data(), 1, MPI_INT, &recv_data, 1, MPI_INT, 0, MPI_COMM_WORLD);

    std::cout << "Process " << world_rank << " received: " << recv_data << std::endl;

    MPI_Finalize();
    return 0;
}
make run
mpic++ -Wall -std=c++17 -o main main.cpp
mpirun -np 4 ./main
Process 0 received: 0
Process 1 received: 10
Process 2 received: 20
Process 3 received: 30
6. Collective Communication: Gather
Gather collects data from all processes to one process.
#include <mpi.h>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Each process has its own data
    int send_data = world_rank * world_rank;
    std::vector<int> recv_data;

    if (world_rank == 0) {
        recv_data.resize(world_size);
    }

    // Gather data to process 0
    MPI_Gather(&send_data, 1, MPI_INT, recv_data.data(), 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (world_rank == 0) {
        std::cout << "Process 0 gathered: ";
        for (int i = 0; i < world_size; i++) {
            std::cout << recv_data[i] << " ";
        }
        std::cout << std::endl;
    }

    MPI_Finalize();
    return 0;
}
make run
mpic++ -Wall -std=c++17 -o main main.cpp
mpirun -np 4 ./main
Process 0 gathered: 0 1 4 9
7. Collective Communication: Reduce
Reduce performs a reduction operation (sum, max, min, etc.) across all processes.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Each process contributes its rank
    int local_value = world_rank + 1;
    int sum = 0;

    // Sum all values to process 0
    MPI_Reduce(&local_value, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (world_rank == 0) {
        std::cout << "Sum of all ranks: " << sum << std::endl;
    }

    MPI_Finalize();
    return 0;
}
make run
mpic++ -Wall -std=c++17 -o main main.cpp
mpirun -np 4 ./main
Sum of all ranks: 10
8. Barrier Synchronization
Barriers synchronize all processes, ensuring they all reach the same point before continuing.
#include <mpi.h>
#include <iostream>
#include <unistd.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    std::cout << "Process " << world_rank << " before barrier" << std::endl;

    // Simulate different execution times
    sleep(world_rank);

    // Wait for all processes
    MPI_Barrier(MPI_COMM_WORLD);

    std::cout << "Process " << world_rank << " after barrier" << std::endl;

    MPI_Finalize();
    return 0;
}
make run
mpic++ -Wall -std=c++17 -o main main.cpp
mpirun -np 4 ./main
Process 0 before barrier
Process 1 before barrier
Process 2 before barrier
Process 3 before barrier
Process 0 after barrier
Process 1 after barrier
Process 2 after barrier
Process 3 after barrier
(The exact interleaving of lines may vary between runs, but no "after barrier" line can appear before every process has reached the barrier.)
9. Practical Example: Parallel Array Sum
Combining multiple concepts to compute the sum of a large array in parallel.
#include <mpi.h>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Note: this simple version assumes N is divisible by the
    // number of processes; otherwise the remainder is dropped.
    const int N = 1000;
    std::vector<int> full_array;
    int chunk_size = N / world_size;
    std::vector<int> local_array(chunk_size);

    if (world_rank == 0) {
        // Initialize array on process 0
        full_array.resize(N);
        for (int i = 0; i < N; i++) {
            full_array[i] = i + 1;
        }
    }

    // Scatter array chunks to all processes
    MPI_Scatter(full_array.data(), chunk_size, MPI_INT,
                local_array.data(), chunk_size, MPI_INT,
                0, MPI_COMM_WORLD);

    // Each process computes local sum
    int local_sum = 0;
    for (int val : local_array) {
        local_sum += val;
    }
    std::cout << "Process " << world_rank << " local sum: " << local_sum << std::endl;

    // Reduce to get global sum
    int global_sum = 0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (world_rank == 0) {
        std::cout << "Global sum: " << global_sum << std::endl;
        std::cout << "Expected: " << (N * (N + 1) / 2) << std::endl;
    }

    MPI_Finalize();
    return 0;
}
make run
mpic++ -Wall -std=c++17 -o main main.cpp
mpirun -np 4 ./main
Process 0 local sum: 31375
Global sum: 500500
Expected: 500500
Process 1 local sum: 93875
Process 2 local sum: 156375
Process 3 local sum: 218875
Common MPI Data Types
- MPI_INT - Integer
- MPI_FLOAT - Float
- MPI_DOUBLE - Double
- MPI_CHAR - Character
- MPI_BYTE - Byte
Common MPI Operations (for Reduce)
- MPI_SUM - Sum
- MPI_MAX - Maximum
- MPI_MIN - Minimum
- MPI_PROD - Product
- MPI_LAND - Logical AND
- MPI_LOR - Logical OR
MPI with Only Six Functions
Many parallel programs can be written using:
- MPI_INIT()
- MPI_FINALIZE()
- MPI_COMM_SIZE()
- MPI_COMM_RANK()
- MPI_SEND()
- MPI_RECV()
Best Practices
- Always initialize and finalize: Call MPI_Init and MPI_Finalize
- Check return values: MPI functions return error codes
- Balance workload: Distribute work evenly across processes
- Minimize communication: Communication is expensive
- Use collective operations: More efficient than point-to-point
- Avoid deadlocks: Ensure send/receive operations are properly paired
Conclusion
MPI is a powerful tool for parallel computing, enabling you to harness the power of multiple processors. Start with simple examples and gradually move to more complex applications. The key to effective MPI programming is understanding communication patterns and minimizing overhead.
This blog post covers the essential MPI concepts with practical code examples. Each section demonstrates a different aspect of MPI programming, from basic initialization to complex collective operations and a real-world parallel computation example.