MPI Matrix Multiplication Lab

🔧 System Configuration

Number of Processes: Choose from the options offered for the selected matrix size. Only process counts that divide the matrix size evenly are available, so every process receives the same number of rows.
Matrix Size: Select a 4×4, 6×6, 8×8, or 16×16 matrix. Each size has its own set of valid process counts (listed under Process Count Configuration).
Matrix Values:
- Random Values: System generates random numbers (0-9) for matrices A and B
- Manual Edit: Click on matrix cells to edit values manually for testing
Animation Speed: Control visualization speed (0.25x to 3x) to match your learning pace.
Execution Mode:
- Automatic: Watch the complete parallel algorithm execution
- Step-by-Step: Manually progress through each phase of the algorithm

⚙️ Process Count Configuration

Process counts are dynamically limited based on matrix size to ensure perfect load balancing:

4×4 Matrix: 2 or 4 processes (2 or 1 rows per process)
6×6 Matrix: 2 processes only (3 rows per process)
8×8 Matrix: 2, 4, or 8 processes (4, 2, or 1 rows per process)
16×16 Matrix: 2, 4, 8, or 16 processes (8, 4, 2, or 1 rows per process)

Note: The process count dropdown updates automatically when the matrix size changes, so it shows only valid options.

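Why the divisibility rule? This simulation uses a simple row-wise decomposition in which the rows of the matrix are split evenly among processes. For that to work, the matrix size must be exactly divisible by the process count, so that every process does the same amount of work. The options listed above are narrower than divisibility alone requires: 3 and 6 also divide 6 evenly, but the 6×6 case offers only 2 processes, and every offered count is a power of two.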
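The divisibility check can be sketched in a few lines of C. This is a minimal illustration, not the lab's own code: the function names `rows_per_process` and `print_even_splits` are hypothetical, and the sketch tests divisibility only, so it reports more options for a 6×6 matrix than the lab's dropdown does.

```c
#include <stdio.h>

/* Rows each process receives, or -1 if the rows cannot be split evenly.
   (Hypothetical helper; the name is not taken from the lab's code.) */
int rows_per_process(int matrix_size, int num_processes)
{
    if (matrix_size <= 0 || num_processes <= 0)
        return -1;
    if (matrix_size % num_processes != 0)
        return -1;
    return matrix_size / num_processes;
}

/* Print every process count from 2 up to matrix_size that divides evenly. */
void print_even_splits(int matrix_size)
{
    for (int p = 2; p <= matrix_size; p++) {
        int rows = rows_per_process(matrix_size, p);
        if (rows > 0)
            printf("%d processes -> %d rows each\n", p, rows);
    }
}
```

Calling `rows_per_process(8, 4)` returns 2, matching the 8×8 entry above, while `rows_per_process(6, 4)` returns -1 because 4 does not divide 6.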

📊 MPI Matrix Multiplication Algorithm

Phase 1 - Data Distribution (Scatter):
- Master process (rank 0) divides matrix A into row blocks
- Each worker process receives its assigned rows of A plus the complete matrix B
- Row distribution shown by color coding on matrices
Phase 2 - Parallel Computation:
- Each process computes its assigned rows of result matrix C
- Processes work independently and simultaneously
- Matrix cells light up as calculations complete
Phase 3 - Result Collection (Gather):
- Master process collects computed row blocks from all workers
- Final result matrix C is assembled and displayed
- Performance comparison shows parallel vs sequential timing
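The three phases can be imitated in plain C without an MPI installation. The sketch below runs the simulated ranks one after another in a single program, so it shows the data movement but not real parallelism. The sizes `N` and `P` and all function names are assumptions made for illustration. In an actual MPI program the two copies would typically be collective calls such as `MPI_Scatter` and `MPI_Gather`, with `MPI_Bcast` distributing B; the downloadable code may be organized differently.

```c
#include <string.h>

/* Assumed sizes for this sketch: a 4x4 matrix split over 2 simulated ranks. */
enum { N = 4, P = 2, ROWS = N / P };   /* P must divide N */

/* Phase 2: multiply a block of `rows` rows of A by the full matrix B.
   All matrices are stored row-major in flat arrays. */
static void compute_block(const int *a_rows, const int *b, int *c_rows, int rows)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < N; j++) {
            int sum = 0;
            for (int k = 0; k < N; k++)
                sum += a_rows[i * N + k] * b[k * N + j];
            c_rows[i * N + j] = sum;
        }
    }
}

/* Run the three phases serially, one simulated rank at a time. */
void simulate_row_wise_multiply(const int *a, const int *b, int *c)
{
    for (int rank = 0; rank < P; rank++) {
        int local_a[ROWS * N];
        int local_c[ROWS * N];

        /* Phase 1 (scatter): copy this rank's row block out of A. */
        memcpy(local_a, a + rank * ROWS * N, sizeof local_a);

        /* Phase 2 (compute): needs only the local rows and all of B. */
        compute_block(local_a, b, local_c, ROWS);

        /* Phase 3 (gather): copy the computed rows into their place in C. */
        memcpy(c + rank * ROWS * N, local_c, sizeof local_c);
    }
}
```

Flat row-major arrays are used because a contiguous row block can then be copied in one call, which is also what makes row-wise decomposition convenient for scatter and gather operations.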

🎮 Using the Simulation

Basic Workflow:
1. Configure processes and matrix size
2. Choose random or manual matrix values
3. Select automatic or step-by-step mode
4. Click "Start Simulation" to begin
Manual Matrix Editing:
- Switch to "Manual Edit" mode
- Click any cell in matrices A or B to edit values
- Use "Randomize Matrices" to generate new random values
Step-by-Step Mode:
- Click "Next Step" to progress through each phase
- Read detailed logs for each operation
- Well suited to studying each phase of the algorithm in isolation

📈 Understanding the Visualization

Process Grid: Shows all MPI processes with their current status and assigned work
Process Colors: Each process has a unique color that matches its assigned matrix rows
Matrix Color Coding: Rows in matrices are colored to show which process handles them
Process States:
- Idle: Process waiting for work assignment
- Computing: Process actively calculating matrix operations
- Communicating: Process sending/receiving data
- Completed: Process finished its assigned work
Matrix Cell Animation: Cells light up as calculations complete, showing real-time progress

🚀 Parallel Computing Concepts

Data Parallelism: Same operation (matrix multiplication) applied to different data chunks
Load Balancing: Work distributed evenly among processes (in this simulation, rows per process = matrix_size / num_processes exactly)
Communication Overhead: Time spent sending/receiving data between processes
Scalability: How performance improves as processes are added (ideally, speedup grows linearly with the process count)
Master-Worker Pattern: Rank 0 coordinates work distribution and result collection

📊 Performance Analysis

Speedup: How much faster parallel execution is than sequential execution, S = T_sequential / T_parallel
Efficiency: How well the parallel algorithm uses the available processes, E = S / p for p processes (1.0 is ideal)
Communication vs Computation: Ratio of time spent communicating to time spent calculating
Optimal Process Count: The point beyond which adding more processes no longer improves performance
Matrix Size Impact: Larger matrices typically show better parallel efficiency
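Speedup and efficiency follow the standard definitions, speedup = sequential time / parallel time and efficiency = speedup / process count. A minimal sketch in C, with hypothetical function names and timings in any consistent unit:

```c
/* Speedup: sequential time divided by parallel time.
   t_parallel must be greater than zero. */
double speedup(double t_sequential, double t_parallel)
{
    return t_sequential / t_parallel;
}

/* Efficiency: speedup divided by the number of processes.
   A value of 1.0 means ideal (linear) speedup. */
double efficiency(double t_sequential, double t_parallel, int num_processes)
{
    return speedup(t_sequential, t_parallel) / num_processes;
}
```

With made-up timings of 8.0 for the sequential run and 4.0 for a 4-process run, `speedup` returns 2.0 and `efficiency` returns 0.5: the parallel run is twice as fast but uses only half of the available processing capacity.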

🎯 Recommended Experiments

Basic Parallel Execution:
- Start with 2 processes and 8×8 matrix
- Use automatic mode to see full algorithm flow
- Observe how work is distributed and collected
Scalability Testing:
- 8×8 matrix: Test with 2, 4, and 8 processes
- 16×16 matrix: Test with 2, 4, 8, and 16 processes
- Compare execution times and efficiency
- Find optimal process count for different matrix sizes
Matrix Size Impact:
- Use fixed process count (4) with different matrix sizes
- Observe how parallel efficiency changes with problem size
- Larger matrices should show better speedup
Algorithm Understanding:
- Use step-by-step mode with manual matrix values
- Create simple test cases (like identity matrix)
- Verify results and understand each phase
Communication Analysis:
- Use slow animation speed to observe communication patterns
- Compare communication overhead with different process counts
- Understand when communication becomes a bottleneck
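The identity-matrix test case suggested under Algorithm Understanding can also be checked outside the simulation. The sketch below is a sequential reference in C, assuming a 4×4 matrix; the names `SIZE`, `multiply_sequential`, `set_identity`, and `matrices_equal` are illustrative. Multiplying any matrix A by the identity must give back A, which makes a wrong result easy to spot.

```c
#include <stdbool.h>

enum { SIZE = 4 };   /* matrix dimension assumed for this sketch */

/* Sequential reference product c = a * b for row-major SIZE x SIZE matrices. */
void multiply_sequential(const int *a, const int *b, int *c)
{
    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            int sum = 0;
            for (int k = 0; k < SIZE; k++)
                sum += a[i * SIZE + k] * b[k * SIZE + j];
            c[i * SIZE + j] = sum;
        }
    }
}

/* Fill m with the identity matrix: ones on the diagonal, zeros elsewhere. */
void set_identity(int *m)
{
    for (int i = 0; i < SIZE; i++)
        for (int j = 0; j < SIZE; j++)
            m[i * SIZE + j] = (i == j) ? 1 : 0;
}

/* Return true when x and y match element by element. */
bool matrices_equal(const int *x, const int *y)
{
    for (int i = 0; i < SIZE * SIZE; i++)
        if (x[i] != y[i])
            return false;
    return true;
}
```

A typical check fills B with `set_identity`, calls `multiply_sequential(a, b, c)`, and confirms that `matrices_equal(a, c)` is true. The same A and B can then be entered in Manual Edit mode to compare against the simulation's result.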

💾 MPI Code Download

Complete Implementation: Download working MPI C code for matrix multiplication
Ready to Compile: Includes all necessary MPI functions and proper error handling
Educational Comments: Detailed explanations of each code section
Compilation Instructions: How to compile and run with mpicc and mpirun
Performance Measurements: Built-in timing functions to measure speedup

🔍 Key Observations

Load Distribution: Notice how rows are divided among processes (shown by colors)
Parallel Efficiency: More processes don't always mean faster execution (overhead matters)
Communication Patterns: Master-worker communication in scatter and gather phases
Synchronization: All processes must complete before final result assembly
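Matrix Properties: Each element of C is the dot product of one row of A and one column of B, so an n×n product needs n² dot products of n terms each (n³ multiply-add operations in total)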
Memory Distribution: Each process needs its assigned A rows + complete matrix B
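The memory and work per process can be estimated by counting matrix elements. The sketch below assumes an n×n matrix split evenly over p processes, and additionally counts the rows of C that each process produces, which the list above does not mention. Both function names are hypothetical.

```c
/* Matrix elements one process holds: its rows of A, all of B,
   and the rows of C it computes. Assumes p divides n. */
long elements_stored_per_process(int n, int p)
{
    long rows = n / p;
    long a_part = rows * n;       /* assigned rows of A */
    long b_full = (long)n * n;    /* complete matrix B */
    long c_part = rows * n;       /* computed rows of C */
    return a_part + b_full + c_part;
}

/* Multiply-add operations one process performs:
   (rows assigned) x (n columns of C) x (n terms per dot product). */
long multiply_adds_per_process(int n, int p)
{
    long rows = n / p;
    return rows * n * n;
}
```

For an 8×8 matrix on 4 processes, each process holds 16 + 64 + 16 = 96 elements and performs 128 multiply-adds, so the four processes together perform 512 = 8³. The full copy of B dominates the storage, which is the main cost of this simple row-wise scheme.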

❓ Common Questions

Q: Why use more processes than CPU cores? A: For educational purposes: it makes the effect of communication overhead visible
Q: Why do cells light up at different times? A: Visualizes parallel computation happening simultaneously
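Q: What if processes > matrix rows? A: The simulation does not offer such a configuration. In a row-wise decomposition, the extra processes would receive no rows and remain idle
Q: Why is sequential sometimes faster? A: Small matrices have a high communication-to-computation ratio, so the cost of distributing and collecting data can outweigh the time saved by computing in parallel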
Q: How does this relate to real HPC? A: Same principles apply to supercomputers with thousands of cores
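The communication-to-computation ratio mentioned above can be approximated with a simple counting model. This is a rough sketch under stated assumptions, not a measurement: it counts matrix elements moved to or from one worker (its rows of A in, all of B in, its rows of C out) against the multiply-adds that worker performs, and it ignores message latency and the fact that an element transfer and a multiply-add do not cost the same. The function names are hypothetical.

```c
/* Elements one worker receives or sends: rows of A in, all of B in,
   rows of C out. Assumes p divides n. */
long elements_transferred_per_worker(int n, int p)
{
    long rows = n / p;
    return 2 * rows * n + (long)n * n;
}

/* Multiply-adds one worker performs on its row block. */
long block_multiply_adds(int n, int p)
{
    long rows = n / p;
    return rows * n * n;
}

/* Elements transferred per multiply-add; lower favors parallel execution. */
double comm_to_comp_ratio(int n, int p)
{
    return (double)elements_transferred_per_worker(n, p)
         / (double)block_multiply_adds(n, p);
}
```

Under this model, 4 processes on a 4×4 matrix give a ratio of 24 / 16 = 1.5, while 4 processes on a 16×16 matrix give 384 / 1024 = 0.375. The ratio falls as the matrix grows, which is consistent with larger matrices showing better parallel efficiency.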