System Configuration
- Number of Processes: Choose from the options available for the selected matrix size. Only process counts that divide the matrix size evenly are offered, so every process receives the same number of rows.
- Matrix Size: Select a 4×4, 6×6, 8×8, or 16×16 matrix. Each size has its own set of process counts that split the rows evenly.
- Matrix Values:
  - Random Values: System generates random numbers (0-9) for matrices A and B
  - Manual Edit: Click on matrix cells to edit values manually for testing
- Animation Speed: Control visualization speed (0.25x to 3x) to match your learning pace.
- Execution Mode:
  - Automatic: Watch the complete parallel algorithm execution
  - Step-by-Step: Manually progress through each phase of the algorithm
Process Count Configuration
Process counts are dynamically limited based on matrix size to ensure perfect load balancing:
- 4×4 Matrix: 2 or 4 processes (2 or 1 rows per process)
- 6×6 Matrix: 2 processes only (3 rows per process)
- 8×8 Matrix: 2, 4, or 8 processes (4, 2, or 1 rows per process)
- 16×16 Matrix: 2, 4, 8, or 16 processes (8, 4, 2, or 1 rows per process)
Note: The process count dropdown updates automatically when the matrix size changes, so it shows only valid options.
Why the divisibility rule? This simulation uses a simple row-wise decomposition in which the rows of the matrix are split evenly among processes. For this to work, the matrix size must be exactly divisible by the process count. That guarantees perfect load balancing: every process does the same amount of work.
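The even row split described above can be sketched in a few lines of Python. This is only an illustration of the divisibility check and the resulting row ranges; the function name `row_blocks` is hypothetical and is not taken from the simulation or the downloadable code.

```python
def row_blocks(matrix_size, num_processes):
    """Return (first_row, end_row_exclusive) for each rank under an even row split."""
    if matrix_size % num_processes != 0:
        raise ValueError(
            f"{matrix_size} rows cannot be split evenly among {num_processes} processes"
        )
    rows_per_process = matrix_size // num_processes
    return [
        (rank * rows_per_process, (rank + 1) * rows_per_process)
        for rank in range(num_processes)
    ]


# An 8x8 matrix on 4 processes: every rank gets 2 consecutive rows.
for rank, (start, stop) in enumerate(row_blocks(8, 4)):
    print(f"rank {rank}: rows {start}..{stop - 1}")
```

Calling `row_blocks(6, 4)` raises `ValueError`, which mirrors why the simulation does not offer 4 processes for a 6×6 matrix.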
MPI Matrix Multiplication Algorithm
- Phase 1 - Data Distribution (Scatter):
  - Master process (rank 0) divides matrix A into row blocks
  - Each worker process receives assigned rows + complete matrix B
  - Row distribution shown by color coding on matrices
- Phase 2 - Parallel Computation:
  - Each process computes its assigned rows of result matrix C
  - Processes work independently and simultaneously
  - Matrix cells light up as calculations complete
- Phase 3 - Result Collection (Gather):
  - Master process collects computed row blocks from all workers
  - Final result matrix C is assembled and displayed
  - Performance comparison shows parallel vs sequential timing
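The three phases can be emulated in plain Python to see the data flow. This is a sequential sketch, not MPI code: the function names are hypothetical, and the "processes" are just list entries handled one after another. In a real MPI program these phases are commonly expressed with collective calls such as `MPI_Scatter`, `MPI_Bcast`, and `MPI_Gather`; whether the downloadable code uses exactly those calls is an assumption.

```python
def scatter_rows(a, num_processes):
    """Phase 1: split A into equal row blocks, one per rank (rank order)."""
    rows_per_process = len(a) // num_processes
    return [
        a[rank * rows_per_process:(rank + 1) * rows_per_process]
        for rank in range(num_processes)
    ]


def multiply_block(a_block, b):
    """Phase 2: compute the rows of C that correspond to one block of A."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a_block
    ]


def gather_rows(c_blocks):
    """Phase 3: concatenate the row blocks in rank order to rebuild C."""
    return [row for block in c_blocks for row in block]


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 0, 1, 2], [3, 4, 5, 6]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]

blocks = scatter_rows(a, 2)                       # 2 "processes", 2 rows each
c_blocks = [multiply_block(block, b) for block in blocks]  # every rank sees all of B
c = gather_rows(c_blocks)

# The gathered result matches a plain sequential multiplication.
print(c == multiply_block(a, b))
```

Because each block only needs its own rows of A plus all of B, the blocks can be computed in any order, which is what lets real processes work on them at the same time.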
Using the Simulation
- Basic Workflow:
  - Configure processes and matrix size
  - Choose random or manual matrix values
  - Select automatic or step-by-step mode
  - Click "Start Simulation" to begin
- Manual Matrix Editing:
  - Switch to "Manual Edit" mode
  - Click any cell in matrices A or B to edit values
  - Use "Randomize Matrices" to generate new random values
- Step-by-Step Mode:
  - Click "Next Step" to progress through each phase
  - Read detailed logs for each operation
  - Perfect for understanding the algorithm step-by-step
Understanding the Visualization
- Process Grid: Shows all MPI processes with their current status and assigned work
- Process Colors: Each process has a unique color that matches its assigned matrix rows
- Matrix Color Coding: Rows in matrices are colored to show which process handles them
- Process States:
  - Idle: Process waiting for work assignment
  - Computing: Process actively calculating matrix operations
  - Communicating: Process sending/receiving data
  - Completed: Process finished its assigned work
- Matrix Cell Animation: Cells light up as calculations complete, showing real-time progress
Parallel Computing Concepts
- Data Parallelism: Same operation (matrix multiplication) applied to different data chunks
- Load Balancing: Work is distributed evenly among processes (rows per process = matrix_size / num_processes)
- Communication Overhead: Time spent sending/receiving data between processes
- Scalability: Performance improvement with more processes (ideally linear speedup)
- Master-Worker Pattern: Rank 0 coordinates work distribution and result collection
Performance Analysis
- Speedup: How much faster parallel execution is than sequential execution (sequential time / parallel time)
- Efficiency: How well the parallel algorithm uses the available processes (speedup / number of processes)
- Communication vs Computation: Ratio of time spent communicating vs calculating
- Optimal Process Count: Point where adding more processes doesn't improve performance
- Matrix Size Impact: Larger matrices typically show better parallel efficiency
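Speedup and efficiency follow the standard definitions: speedup is sequential time divided by parallel time, and efficiency is speedup divided by the process count. The sketch below applies them to a set of timings that are invented purely for illustration; they are not measurements from the simulation.

```python
def speedup(t_sequential, t_parallel):
    """Speedup S = T_seq / T_par."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, num_processes):
    """Efficiency E = S / p, where p is the number of processes."""
    return speedup(t_sequential, t_parallel) / num_processes


# HYPOTHETICAL timings in milliseconds, made up for this example only.
t_seq = 80.0
t_par_by_process_count = {2: 50.0, 4: 32.0, 8: 40.0}

for p, t_par in t_par_by_process_count.items():
    s = speedup(t_seq, t_par)
    e = efficiency(t_seq, t_par, p)
    print(f"{p} processes: speedup {s:.2f}, efficiency {e:.3f}")
```

In these made-up numbers the run with 8 processes is slower than the run with 4, which is the kind of pattern to look for when searching for the optimal process count.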
Recommended Experiments
- Basic Parallel Execution:
  - Start with 2 processes and 8×8 matrix
  - Use automatic mode to see full algorithm flow
  - Observe how work is distributed and collected
- Scalability Testing:
  - 8×8 matrix: Test with 2, 4, and 8 processes
  - 16×16 matrix: Test with 2, 4, 8, and 16 processes
  - Compare execution times and efficiency
  - Find optimal process count for different matrix sizes
- Matrix Size Impact:
  - Use fixed process count (4) with different matrix sizes
  - Observe how parallel efficiency changes with problem size
  - Larger matrices should show better speedup
- Algorithm Understanding:
  - Use step-by-step mode with manual matrix values
  - Create simple test cases (like identity matrix)
  - Verify results and understand each phase
- Communication Analysis:
  - Use slow animation speed to observe communication patterns
  - Compare communication overhead with different process counts
  - Understand when communication becomes a bottleneck
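For the Algorithm Understanding experiment, an identity matrix is a convenient test case because the expected result is known in advance: multiplying any matrix A by the identity returns A unchanged. The sketch below checks that property offline; the helper names are hypothetical, and the values of A are arbitrary digits in the 0-9 range used by the simulation.

```python
def identity(n):
    """Build the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain sequential matrix multiplication, C = A x B."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


a = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8], [9, 7, 9, 3]]
i4 = identity(4)

result_right = matmul(a, i4)  # A x I
result_left = matmul(i4, a)   # I x A
print(result_right == a and result_left == a)
```

Entering the same A and an identity B in Manual Edit mode should then produce a result matrix C equal to A, which makes any mismatch easy to spot.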
MPI Code Download
- Complete Implementation: Download working MPI C code for matrix multiplication
- Ready to Compile: Includes all necessary MPI functions and proper error handling
- Educational Comments: Detailed explanations of each code section
- Compilation Instructions: How to compile and run with mpicc and mpirun
- Performance Measurements: Built-in timing functions to measure speedup
Key Observations
- Load Distribution: Notice how rows are divided among processes (shown by colors)
- Parallel Efficiency: More processes don't always mean faster execution (overhead matters)
- Communication Patterns: Master-worker communication in scatter and gather phases
- Synchronization: All processes must complete before final result assembly
- Matrix Properties: Each element of C is the dot product of one row of A and one column of B, so the product requires (rows of A) × (columns of B) dot products
- Memory Distribution: Each process needs its assigned A rows + complete matrix B
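The last two observations can be turned into simple counts. The sketch below tallies how many matrix elements one process holds under the row-wise split (its rows of A, a full copy of B, and its rows of C) and how much arithmetic the full product involves. The function names are hypothetical, and the counts assume square N×N matrices split evenly among the processes.

```python
def elements_held(n, p):
    """Matrix elements stored by one process for an n x n problem on p processes."""
    rows = n // p
    return {"A rows": rows * n, "B (full copy)": n * n, "C rows": rows * n}


def dot_products(n):
    """One dot product per element of C."""
    return n * n


def scalar_multiplications(n):
    """Each of the n*n dot products multiplies n pairs of numbers."""
    return n * n * n


held = elements_held(8, 4)
print(held, "total:", sum(held.values()))
print("dot products:", dot_products(8))
print("scalar multiplications:", scalar_multiplications(8))
```

Note that the full copy of B dominates the per-process storage: adding processes shrinks the A and C blocks but not B.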
Common Questions
- Q: Why use more processes than CPU cores? A: Educational purposes - shows communication overhead effects
- Q: Why do cells light up at different times? A: Visualizes parallel computation happening simultaneously
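- Q: What if processes > matrix rows? A: The simulation never offers such a configuration; in a real MPI run with this row-wise split, the extra processes would have no rows to compute and would remain idle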
- Q: Why is sequential sometimes faster? A: Small matrices have high communication-to-computation ratio
- Q: How does this relate to real HPC? A: Same principles apply to supercomputers with thousands of cores
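The answer about sequential execution sometimes being faster can be made concrete with a rough counting model. The sketch below compares how many matrix elements one worker sends or receives with how many multiply-add operations it performs. This is a deliberate simplification and an assumption of this example: it ignores message latency, network bandwidth, and anything else that affects real timings, and the function name is hypothetical.

```python
def comm_to_comp_ratio(n, p):
    """Elements moved per multiply-add for one worker (n x n matrices, p processes)."""
    rows = n // p
    # Received: its rows of A and all of B. Sent back: its rows of C.
    elements_moved = rows * n + n * n + rows * n
    # Each of its rows * n result elements needs n multiply-adds.
    multiply_adds = rows * n * n
    return elements_moved / multiply_adds


for n in (4, 8, 16):
    ratio = comm_to_comp_ratio(n, 4)
    print(f"{n}x{n} on 4 processes: {ratio:.3f} elements moved per multiply-add")
```

Under this model the ratio falls as the matrix grows, which is consistent with the observation that larger matrices typically show better parallel efficiency.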