Memory Management: NUMA, Huge Pages, and Memory Compaction in Production Systems

Deep technical analysis of advanced memory management techniques in production systems, covering NUMA optimization in MySQL, huge pages in Redis, and memory compaction in Linux kernels with real-world performance benchmarks.

Technology
14 min read

Modern server systems with multiple CPU sockets and large memory configurations present complex memory management challenges that can significantly impact application performance. Non-Uniform Memory Access (NUMA), huge pages, and memory compaction are critical techniques for optimizing memory performance in production environments. This comprehensive analysis examines these advanced memory management concepts with real-world implementations, performance benchmarks, and production deployment strategies.

The NUMA Architecture Challenge

NUMA systems organize memory into multiple nodes, each associated with specific CPU sockets. Accessing local memory is significantly faster than accessing remote memory, creating performance implications for applications that don’t account for NUMA topology.

NUMA Topology and Access Patterns

NUMA Node Structure:

  • Local Memory: Memory attached to the same CPU socket (fastest access)
  • Remote Memory: Memory attached to different CPU sockets (slower access)
  • Interconnect: High-speed interconnects (QPI, UPI) for cross-socket communication
  • NUMA Distance: Latency and bandwidth characteristics between nodes

Access Latency Characteristics:

  • Local Memory: 100-200 nanoseconds access time
  • Remote Memory: 200-400 nanoseconds access time
  • Cross-Socket: Additional latency for cache coherency protocols
  • NUMA Distance Matrix: Quantifies relative access costs between nodes

Memory Bandwidth Considerations:

  • Local Bandwidth: 50-100 GB/s per socket
  • Remote Bandwidth: 20-50 GB/s across sockets
  • Interconnect Bandwidth: Limited by QPI/UPI bandwidth
  • Contention: Shared interconnect can cause bandwidth contention

Real-World NUMA Implementations

MySQL with NUMA Optimization

  • Performance Impact: up to 2-3x improvement reported on large multi-socket hosts once the buffer pool stops spilling onto remote nodes
  • Configuration: innodb_numa_interleave=ON interleaves the InnoDB buffer pool across NUMA nodes at startup
  • Memory Allocation: NUMA-aware memory allocation for buffer pools
  • Thread Affinity: Bind MySQL threads to specific NUMA nodes

Implementation Details:

// MySQL-style NUMA-aware allocation (simplified sketch; needs <numa.h>
// and linking with -lnuma)
void *innodb_numa_alloc(size_t size) {
    // Map the current CPU to its NUMA node via libnuma rather than
    // assuming a fixed cores-per-socket layout.
    int node = numa_node_of_cpu(sched_getcpu());
    void *ptr = numa_alloc_onnode(size, node);
    if (ptr == NULL) {
        // Fall back to interleaving across all nodes.
        ptr = numa_alloc_interleaved(size);
    }
    return ptr;
}

Redis with NUMA Optimization

  • Performance Impact: 30-50% performance improvement
  • Configuration: the allocator is selected at build time (e.g. make MALLOC=jemalloc); there is no runtime NUMA flag
  • Process Binding: run the server under numactl --cpunodebind=<node> --membind=<node> so the mostly single-threaded process and its heap share one node
  • Memory Pool: NUMA-aware memory pools for different data types

Performance Characteristics:

  • Latency: 20-40% reduction in memory access latency
  • Throughput: 30-50% improvement in operations per second
  • CPU Usage: 15-25% reduction in CPU usage
  • Memory Efficiency: Better memory utilization across NUMA nodes

NUMA Optimization Techniques

Memory Allocation Strategies:

  • Local Allocation: Allocate memory on the same NUMA node as the CPU
  • Interleaved Allocation: Distribute memory across all NUMA nodes
  • First-Touch Policy: Allocate memory on the first CPU to access it
  • Bind Policy: Explicitly bind memory to specific NUMA nodes

Thread and Process Affinity:

  • CPU Affinity: Bind threads to specific CPU cores
  • NUMA Affinity: Bind threads to specific NUMA nodes
  • Memory Affinity: Allocate memory on the same NUMA node as threads
  • Load Balancing: Balance load across NUMA nodes

Application-Level Optimizations:

  • Data Locality: Keep related data on the same NUMA node
  • Workload Partitioning: Partition workloads by NUMA node
  • Memory Pooling: Use NUMA-aware memory pools
  • Cache Optimization: Optimize for NUMA-aware cache hierarchies

Huge Pages: Reducing TLB Misses

Translation Lookaside Buffer (TLB) misses are a significant source of performance overhead in memory-intensive applications. Huge pages reduce TLB pressure by mapping larger memory regions with single TLB entries.

TLB and Page Size Impact

TLB Characteristics:

  • TLB Size: 64-128 entries for L1 data TLB, 1024-4096 entries for L2 (unified) TLB
  • Page Size: 4KB standard pages, 2MB huge pages, 1GB gigantic pages
  • TLB Coverage: 256KB-4MB with 4KB pages, 128MB-8GB with 2MB pages
  • Miss Penalty: 10-100 CPU cycles for TLB miss handling

Page Size Trade-offs:

  • Memory Efficiency: 4KB pages provide better memory efficiency
  • TLB Efficiency: 2MB pages provide better TLB efficiency
  • Fragmentation: Larger pages can cause internal fragmentation
  • Allocation: Larger pages require contiguous physical memory

Performance Impact:

  • TLB Misses: 50-90% reduction in TLB misses with huge pages
  • Memory Access: 10-30% improvement in memory access performance
  • Cache Efficiency: Better cache utilization with larger pages
  • Context Switching: Reduced TLB flushing during context switches

Real-World Huge Pages Implementations

Redis with Huge Pages

  • Configuration: controlled at the kernel level via /sys/kernel/mm/transparent_hugepage/enabled; Redis exposes no dedicated huge pages flag
  • Caveat: the Redis documentation cautions that THP can inflate latency during fork-based persistence (RDB save, AOF rewrite), because copy-on-write then duplicates 2MB rather than 4KB pages
  • Performance Impact: 20-40% improvement for steady-state, read-heavy workloads
  • Latency: 15-30% reduction in operation latency outside the persistence path

Implementation Details:

// Enabling THP system-wide; requires root, and Redis merely inherits
// the resulting kernel policy.
void redisEnableTransparentHugePages(void) {
    int fd = open("/sys/kernel/mm/transparent_hugepage/enabled", O_WRONLY);
    if (fd != -1) {
        write(fd, "always", 6);   // "madvise" limits THP to opted-in regions
        close(fd);
    }

    // Allow the kernel to defragment memory when allocating huge pages
    fd = open("/sys/kernel/mm/transparent_hugepage/defrag", O_WRONLY);
    if (fd != -1) {
        write(fd, "always", 6);
        close(fd);
    }
}

PostgreSQL with Huge Pages

  • Configuration: huge_pages = on for shared memory
  • Performance Impact: 15-25% performance improvement
  • Memory Management: Better memory management for large databases
  • Buffer Pool: Huge pages for shared buffer pool

Performance Characteristics:

  • Query Performance: 15-25% improvement in query execution time
  • Memory Access: 20-40% improvement in memory access performance
  • Concurrency: Better performance under high concurrency
  • Scalability: Better scalability with large memory configurations

Huge Pages Optimization Techniques

Transparent Huge Pages (THP):

  • Automatic: Kernel automatically promotes pages to huge pages
  • Configuration: /sys/kernel/mm/transparent_hugepage/enabled
  • Defragmentation: Automatic defragmentation for huge page allocation
  • Monitoring: Monitor huge page usage and effectiveness

Explicit Huge Pages:

  • Pre-allocation: Pre-allocate huge pages at boot time
  • Configuration: hugepagesz=2M hugepages=1024 in kernel parameters
  • Application: Applications explicitly request huge pages
  • Management: Manual management of huge page allocation

Memory Allocation Strategies:

  • Huge Page Pools: Dedicated pools for huge page allocation
  • Fallback: Fallback to regular pages when huge pages unavailable
  • Fragmentation: Handle fragmentation in huge page allocation
  • Monitoring: Monitor huge page usage and performance impact

Memory Compaction: Defragmenting Physical Memory

Memory compaction is a kernel mechanism that defragments physical memory by moving pages to create larger contiguous regions, essential for allocating huge pages and reducing fragmentation.

Memory Fragmentation Problem

Fragmentation Types:

  • External Fragmentation: Free memory exists but not contiguous
  • Internal Fragmentation: Allocated memory larger than requested
  • Page Fragmentation: Scattered pages across physical memory
  • Huge Page Fragmentation: Insufficient contiguous memory for huge pages

Fragmentation Causes:

  • Dynamic Allocation: Frequent allocation and deallocation
  • Page Migration: Pages moved between NUMA nodes
  • Memory Reclaim: Memory reclaim can cause fragmentation
  • Workload Patterns: Specific workload patterns that cause fragmentation

Performance Impact:

  • Allocation Failures: Inability to allocate large contiguous regions
  • Huge Page Failures: Failure to allocate huge pages
  • Memory Pressure: Increased memory pressure due to fragmentation
  • Performance Degradation: Reduced performance due to fragmentation

Real-World Memory Compaction Implementations

Linux Kernel Memory Compaction

  • Compaction Daemon: kcompactd performs background compaction per NUMA node; kswapd wakes it during reclaim, and allocations can also run direct compaction
  • Compaction Zones: Zone-based compaction for different memory types
  • Migration: Page migration for defragmentation
  • Monitoring: /proc/vmstat compaction statistics (compact_stall, compact_success, compact_fail)

Implementation Details:

// Simplified sketch of the kernel's compact_zone(): a migration scanner
// advances from the bottom of the zone while a free-page scanner moves
// down from the top; movable pages are migrated into the freed space
// until the two scanners meet.
static int compact_zone(struct zone *zone, struct compact_control *cc) {
    cc->migrate_pfn = zone->zone_start_pfn;   /* migration scanner: bottom up */
    cc->free_pfn = zone_end_pfn(zone);        /* free scanner: top down */

    while (cc->migrate_pfn < cc->free_pfn) {
        /* Isolate a batch of movable pages near the migration scanner... */
        if (isolate_migratepages(zone, cc) == ISOLATE_NONE)
            break;
        /* ...and move them into free pages taken from the free scanner. */
        migrate_pages(&cc->migratepages, compaction_alloc,
                      compaction_free, (unsigned long)cc,
                      cc->mode, MR_COMPACTION);
    }
    return COMPACT_COMPLETE;
}

MySQL with Memory Compaction

  • Configuration: innodb_buffer_pool_size with huge pages
  • Compaction: Automatic compaction for buffer pool allocation
  • Monitoring: Monitor compaction effectiveness
  • Tuning: Tune compaction parameters for optimal performance

Performance Characteristics:

  • Allocation Success: 90-95% success rate for huge page allocation
  • Compaction Overhead: 5-15% CPU overhead for compaction
  • Memory Efficiency: 10-20% improvement in memory efficiency
  • Performance: 15-25% improvement in overall performance

Memory Compaction Optimization Techniques

Compaction Tuning:

  • Compaction Threshold: Tune when compaction is triggered
  • Compaction Frequency: Control how often compaction runs
  • Compaction Intensity: Control how aggressively compaction runs
  • Compaction Zones: Configure compaction for specific memory zones

Application-Level Optimizations:

  • Memory Pooling: Use memory pools to reduce fragmentation
  • Allocation Patterns: Optimize allocation patterns to reduce fragmentation
  • Memory Reuse: Reuse memory to reduce allocation/deallocation cycles
  • Huge Page Awareness: Design applications to be aware of huge pages

Monitoring and Debugging:

  • Compaction Statistics: Monitor compaction success rates
  • Fragmentation Metrics: Track fragmentation levels
  • Performance Impact: Measure performance impact of compaction
  • Tuning: Use metrics to tune compaction parameters

Performance Analysis and Benchmarks

NUMA Performance Impact

Memory Access Latency:

  • Local Memory: 100-200 nanoseconds
  • Remote Memory: 200-400 nanoseconds
  • NUMA Optimization: 20-40% reduction in average latency
  • Application Impact: 15-30% improvement in application performance

Memory Bandwidth:

  • Local Bandwidth: 50-100 GB/s per socket
  • Remote Bandwidth: 20-50 GB/s across sockets
  • NUMA Optimization: 30-50% improvement in effective bandwidth
  • Scalability: Better scalability with NUMA-aware applications

CPU Utilization:

  • NUMA-Aware: 15-25% reduction in CPU usage
  • Memory Contention: Reduced memory contention across sockets
  • Cache Efficiency: Better cache utilization with NUMA awareness
  • Thread Efficiency: Better thread efficiency with proper affinity

Huge Pages Performance Impact

TLB Performance:

  • TLB Misses: 50-90% reduction in TLB misses
  • TLB Coverage: 4-16x improvement in TLB coverage
  • Memory Access: 10-30% improvement in memory access performance
  • Context Switching: Reduced TLB flushing overhead

Application Performance:

  • Database Performance: 15-25% improvement in database performance
  • Cache Performance: 20-40% improvement in cache performance
  • Memory-Intensive Apps: 30-50% improvement in memory-intensive applications
  • Latency: 15-30% reduction in operation latency

Memory Efficiency:

  • Memory Usage: 10-20% reduction in memory usage
  • Fragmentation: Reduced memory fragmentation
  • Allocation: More efficient memory allocation
  • Reclaim: Better memory reclaim efficiency

Memory Compaction Performance Impact

Allocation Success:

  • Huge Page Allocation: 90-95% success rate
  • Large Allocations: 80-90% success rate for large allocations
  • Fragmentation: 50-70% reduction in fragmentation
  • Memory Pressure: Reduced memory pressure

Compaction Overhead:

  • CPU Overhead: 5-15% CPU overhead for compaction
  • Memory Overhead: 1-5% memory overhead for compaction
  • I/O Overhead: Minimal I/O overhead for compaction
  • Latency: 1-5% increase in allocation latency

Overall Performance:

  • Application Performance: 15-25% improvement in overall performance
  • Memory Efficiency: 10-20% improvement in memory efficiency
  • Scalability: Better scalability with large memory configurations
  • Reliability: Improved reliability with reduced allocation failures

Production Deployment Strategies

NUMA Configuration

Hardware Configuration:

  • NUMA Topology: Understand NUMA topology and distances
  • Memory Configuration: Configure memory per NUMA node
  • CPU Configuration: Configure CPU cores per NUMA node
  • Interconnect: Ensure adequate interconnect bandwidth

Software Configuration:

  • NUMA Policy: Configure NUMA memory allocation policies
  • Thread Affinity: Bind threads to specific NUMA nodes
  • Memory Allocation: Use NUMA-aware memory allocation
  • Monitoring: Monitor NUMA performance and utilization

Application Configuration:

  • NUMA Awareness: Design applications to be NUMA-aware
  • Data Locality: Keep related data on the same NUMA node
  • Workload Partitioning: Partition workloads by NUMA node
  • Performance Tuning: Tune applications for NUMA performance

Huge Pages Configuration

Kernel Configuration:

  • Huge Page Support: Enable huge page support in kernel
  • Transparent Huge Pages: Configure transparent huge pages
  • Huge Page Sizes: Configure available huge page sizes
  • Huge Page Allocation: Configure huge page allocation policies

Application Configuration:

  • Huge Page Requests: Request huge pages in applications
  • Memory Allocation: Use huge page-aware memory allocation
  • Fallback: Implement fallback to regular pages
  • Monitoring: Monitor huge page usage and effectiveness

System Configuration:

  • Huge Page Pools: Configure huge page pools
  • Memory Defragmentation: Configure memory defragmentation
  • Compaction: Configure memory compaction
  • Monitoring: Monitor huge page performance

Memory Compaction Configuration

Kernel Configuration:

  • Compaction Support: Enable memory compaction support
  • Compaction Daemon: Configure compaction daemon
  • Compaction Zones: Configure compaction zones
  • Compaction Parameters: Tune compaction parameters

Application Configuration:

  • Memory Patterns: Optimize memory allocation patterns
  • Memory Pools: Use memory pools to reduce fragmentation
  • Huge Page Awareness: Design for huge page allocation
  • Monitoring: Monitor fragmentation and compaction

System Configuration:

  • Compaction Threshold: Configure compaction thresholds
  • Compaction Frequency: Configure compaction frequency
  • Compaction Intensity: Configure compaction intensity
  • Monitoring: Monitor compaction performance

Monitoring and Troubleshooting

Performance Monitoring

NUMA Monitoring:

  • NUMA Statistics: Monitor NUMA memory access patterns
  • NUMA Distance: Monitor NUMA distance and latency
  • NUMA Utilization: Monitor NUMA node utilization
  • NUMA Performance: Monitor NUMA performance impact

Huge Pages Monitoring:

  • Huge Page Usage: Monitor huge page allocation and usage
  • TLB Performance: Monitor TLB miss rates and performance
  • Memory Access: Monitor memory access performance
  • Application Performance: Monitor application performance impact

Memory Compaction Monitoring:

  • Compaction Statistics: Monitor compaction success rates
  • Fragmentation: Monitor memory fragmentation levels
  • Allocation Success: Monitor allocation success rates
  • Performance Impact: Monitor compaction performance impact

Troubleshooting Common Issues

NUMA Issues:

  • Remote Memory Access: High remote memory access rates
  • NUMA Imbalance: Uneven NUMA node utilization
  • Performance Degradation: Performance degradation due to NUMA
  • Thread Affinity: Incorrect thread affinity configuration

Huge Pages Issues:

  • Allocation Failures: Huge page allocation failures
  • Fragmentation: Memory fragmentation preventing huge page allocation
  • Performance Issues: Performance issues with huge pages
  • Configuration Issues: Incorrect huge page configuration

Memory Compaction Issues:

  • Compaction Failures: Memory compaction failures
  • High Fragmentation: High memory fragmentation levels
  • Allocation Failures: Memory allocation failures
  • Performance Degradation: Performance degradation due to compaction

Future Directions and Research

Emerging Technologies

Memory Technologies:

  • Persistent Memory: Intel Optane and other persistent memory technologies
  • High-Bandwidth Memory: HBM and other high-bandwidth memory technologies
  • Memory Disaggregation: Disaggregated memory architectures
  • Memory Compression: Hardware memory compression

NUMA Evolution:

  • NUMA Topologies: New NUMA topologies and architectures
  • NUMA Optimization: Advanced NUMA optimization techniques
  • NUMA Awareness: Better NUMA awareness in applications
  • NUMA Monitoring: Advanced NUMA monitoring and analysis

Memory Management:

  • Advanced Compaction: More sophisticated memory compaction algorithms
  • Memory Defragmentation: Better memory defragmentation techniques
  • Memory Allocation: Improved memory allocation strategies
  • Memory Monitoring: Better memory monitoring and analysis

Research Areas

Performance Optimization:

  • NUMA Optimization: Advanced NUMA optimization techniques
  • Huge Pages: Better huge page management and allocation
  • Memory Compaction: Improved memory compaction algorithms
  • Memory Allocation: Better memory allocation strategies

Scalability:

  • Large-Scale Systems: Memory management for large-scale systems
  • Distributed Memory: Distributed memory management
  • Memory Sharing: Efficient memory sharing across processes
  • Memory Migration: Dynamic memory migration

Security:

  • Memory Security: Memory security and isolation
  • Memory Encryption: Memory encryption and protection
  • Memory Integrity: Memory integrity checking
  • Memory Access Control: Fine-grained memory access control

Best Practices for Production Deployment

Design Principles

Performance First:

  • Measure: Always measure performance before and after optimization
  • Profile: Use profiling tools to identify memory bottlenecks
  • Optimize: Optimize the most critical memory access patterns first

Reliability:

  • Error Handling: Implement comprehensive error handling for memory operations
  • Testing: Test with realistic workloads and memory pressure scenarios
  • Monitoring: Monitor memory performance and health

Maintainability:

  • Documentation: Document memory configuration and optimization procedures
  • Logging: Implement comprehensive logging for memory operations
  • Debugging: Provide debugging tools and procedures

Implementation Guidelines

Code Quality:

  • Standards: Follow coding standards and best practices
  • Testing: Implement unit tests and integration tests
  • Review: Code review for quality and performance

Configuration:

  • Tuning: Tune memory parameters for optimal performance
  • Validation: Validate memory configuration before deployment
  • Documentation: Document memory configuration options and trade-offs

Operations:

  • Deployment: Automated deployment and rollback procedures
  • Monitoring: Comprehensive monitoring and alerting
  • Maintenance: Regular maintenance and updates

Conclusion

Advanced memory management techniques are essential for achieving optimal performance in modern server systems. NUMA optimization, huge pages, and memory compaction each address specific aspects of memory performance, enabling applications to achieve better performance, lower latency, and higher throughput.

The choice of memory management techniques depends on specific requirements: NUMA optimization for multi-socket systems, huge pages for memory-intensive applications, and memory compaction for systems requiring large contiguous memory allocations. Understanding these techniques and their trade-offs is crucial for building systems that can meet the demanding performance requirements of modern applications.

As hardware continues to evolve with new memory technologies and architectures, advanced memory management techniques will become increasingly important for achieving optimal performance. The future lies in hybrid approaches that combine multiple memory management techniques, providing both high performance and the reliability and security guarantees that production systems require.

Systems Engineering Memory Management NUMA Huge Pages Memory Compaction Linux Kernel Performance Optimization