A public domain, cross-platform, lock-free thread caching memory allocator with 16-byte alignment, implemented in C.
rpmalloc is a high-performance, lock-free memory allocator implemented in C. It provides fast allocation and deallocation for multithreaded applications by using per-thread caches to avoid lock contention. It solves the problem of memory allocation bottlenecks in performance-critical systems.
rpmalloc is aimed at systems programmers, game developers, and embedded engineers building high-performance applications in C or C++ that need efficient, low-latency memory management.
Developers choose rpmalloc for its combination of lock-free design, cross-platform portability, and benchmarked performance advantages over allocators like tcmalloc and ptmalloc3, all in a single, readable source file.
Each thread maintains its own cache, so allocations served from that cache take no lock and do not contend with other threads. This leads to faster allocation in multithreaded applications, as shown in benchmarks against tcmalloc and ptmalloc3.
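The idea of a per-thread cache can be illustrated with a deliberately simplified sketch. This is not rpmalloc's actual implementation: the names `cache_alloc`, `cache_free`, `BLOCK_SIZE`, and `CACHE_LIMIT` are hypothetical, there is only one size class, and the standard `malloc`/`free` stand in for the shared slow path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 64   /* hypothetical single size class */
#define CACHE_LIMIT 32  /* hypothetical cap on cached blocks per thread */

/* A freed block is reused as a node of the thread's free list. */
typedef struct free_block {
    struct free_block *next;
} free_block;

/* One list per thread (C11 _Thread_local): no other thread touches it,
 * so pushing and popping need no lock. */
static _Thread_local free_block *cache_head = NULL;
static _Thread_local size_t cache_count = 0;

/* Fast path: pop from the calling thread's cache.
 * Slow path: fall back to the shared allocator. */
void *cache_alloc(void) {
    if (cache_head != NULL) {
        assert(cache_count > 0);
        free_block *block = cache_head;
        cache_head = block->next;
        cache_count--;
        return block;
    }
    return malloc(BLOCK_SIZE);
}

/* Push the block onto the calling thread's cache, or hand it back to
 * the shared allocator once the cache is full. */
void cache_free(void *ptr) {
    if (ptr == NULL) {
        return;
    }
    if (cache_count >= CACHE_LIMIT) {
        free(ptr);
        return;
    }
    free_block *block = ptr;
    block->next = cache_head;
    cache_head = block;
    cache_count++;
}
```

A block freed by a thread is handed straight back to that thread's next `cache_alloc()` call without touching any shared state, which is the property the per-thread design relies on.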
rpmalloc supports Windows, Linux, macOS, iOS, and Android, and is easily portable to any platform that provides atomic operations and an mmap-style virtual memory API, making it versatile across diverse systems.
All allocations are aligned to 16 bytes by default, with no separate aligned-allocation call needed. This benefits SIMD and other alignment-sensitive operations and simplifies code for high-performance computing.
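What 16-byte alignment means for a pointer can be checked with a small helper. The sketch below is an illustration only: `is_aligned_16` and `alloc_aligned_16` are hypothetical names, and C11 `aligned_alloc` is used as a stand-in so the example compiles without rpmalloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if ptr lies on a 16-byte boundary, i.e. the low four bits
 * of its address are zero. */
int is_aligned_16(const void *ptr) {
    return ((uintptr_t)ptr & (uintptr_t)15) == 0;
}

/* Stand-in allocation with a 16-byte alignment guarantee.
 * aligned_alloc expects the size to be a multiple of the alignment,
 * so the request is rounded up to the next multiple of 16. */
void *alloc_aligned_16(size_t size) {
    assert(size > 0);
    size_t rounded = (size + 15u) & ~(size_t)15u;
    return aligned_alloc(16, rounded);
}
```

With an allocator that aligns every block this way, a buffer can be passed directly to aligned SIMD loads and stores without a separate aligned-allocation path.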
The allocator is implemented in a single C file of roughly 2,200 lines that is designed to be readable and modifiable, reducing complexity for developers who need to understand or customize it.
The code is released into the public domain, with the MIT license available as an alternative, allowing free use and distribution in both commercial and open-source projects.
rpmalloc_initialize must be called before any other rpmalloc function; skipping it results in undefined behavior. This adds integration overhead and leaves room for misuse in large codebases.
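One way to reduce the risk of calling the allocator too early is a thin wrapper that tracks whether setup has run. The sketch below shows only that pattern and makes several assumptions: `guarded_initialize`, `guarded_finalize`, and `guarded_alloc` are hypothetical names, and the standard `malloc` stands in for the real allocation call so the example compiles without rpmalloc. In a real integration, the library's own initialization and finalization calls would go where the comments indicate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper state: 0 = not initialized, 1 = initialized. */
static atomic_int allocator_ready = 0;

/* Call once at program start, before any allocation.
 * The library's one-time setup call would go here. */
int guarded_initialize(void) {
    atomic_store(&allocator_ready, 1);
    return 0;
}

/* Call once at shutdown. The library's teardown call would go here. */
void guarded_finalize(void) {
    assert(atomic_load(&allocator_ready)); /* finalize without initialize */
    atomic_store(&allocator_ready, 0);
}

/* Turns "allocated before initialization" into a visible NULL return
 * instead of undefined behavior. malloc is a stand-in allocator. */
void *guarded_alloc(size_t size) {
    if (!atomic_load(&allocator_ready)) {
        return NULL;
    }
    return malloc(size);
}
```

The check costs one atomic load per allocation, so a wrapper like this is probably better suited to debug builds than to a hot path.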
The library assumes valid inputs and does not guard against errors such as freeing an invalid pointer, which can cause segmentation faults. Any validation has to be done by the caller.
Different size classes allocate from pages that sit far apart in the virtual address space, which can fragment the address space even though internal fragmentation is minimized.
Optimal performance requires tweaking cache limits and understanding size classes, which can be non-trivial and may need per-application tuning, as noted in the CACHE documentation.