Module heapless::pool


A heap-less, interrupt-safe, lock-free memory pool (*)

NOTE: This module is not available on targets that lack CAS operations and whose atomics are not emulated by the atomic_polyfill crate (e.g., MSP430).

(*) Currently, the implementation is only lock-free and Sync on ARMv6, ARMv7-{A,R,M} & ARMv8-M devices

§Examples

The most common way of using this pool is as a global singleton; the singleton mode gives you automatic deallocation of memory blocks on drop.

#![no_main]
#![no_std]

use cortex_m_rt::{entry, exception};
use heapless::{
    pool,
    pool::singleton::{Box, Pool},
};

// instantiate a memory pool of `[u8; 128]` blocks as a global singleton
pool!(
    // attributes can be used here
    // #[link_section = ".ccram.A"]
    A: [u8; 128]
);

#[entry]
fn main() -> ! {
    static mut MEMORY: [u8; 1024] = [0; 1024];

    // increase the capacity of the pool by ~8 blocks
    A::grow(MEMORY);

    // claim a block of memory
    // note that the type is `Box<A>`, and not `Box<[u8; 128]>`
    // `A` is the "name" of the pool
    let x: Box<A, _> = A::alloc().unwrap();
    loop {
        // .. do stuff with `x` ..
    }
}

#[exception]
fn SysTick() {
    // claim a block of memory
    let y = A::alloc().unwrap();

    // .. do stuff with `y` ..

    // return the memory block to the pool
    drop(y);
}
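
Pool::alloc returns None when the pool is observed as exhausted, so robust code will usually want to handle that case instead of unwrapping. A minimal sketch (how to react to exhaustion is application-specific):

fn process() {
    // claim a block if one is available; `alloc` returns `None` when
    // the pool has run out of free memory blocks
    if let Some(block) = A::alloc() {
        // .. do stuff with `block` ..

        // dropping the box returns the memory block to the pool
        drop(block);
    } else {
        // pool exhausted: back off, retry later or report an error
    }
}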

§Portability

This pool internally uses a Treiber stack, which is known to be susceptible to the ABA problem. The only countermeasure against the ABA problem that this implementation currently takes is relying on LL/SC (load-link / store-conditional) instructions being used to implement CAS loops on the target architecture (see the section on ‘Soundness’ for more information). For this reason, Pool only implements Sync when compiling for some ARM cores.

This module requires CAS atomic instructions which are not available on all architectures (e.g. ARMv6-M (thumbv6m-none-eabi) and MSP430 (msp430-none-elf)). These atomics can, however, be emulated with atomic_polyfill; the emulation is gated behind the cas feature, which is enabled by default for thumbv6m-none-eabi and riscv32 targets. MSP430 is currently not supported by atomic_polyfill.

§Soundness

This pool uses a Treiber stack to keep a list of free memory blocks (nodes). Each of these nodes has a pointer to the next node. To claim a memory block we simply pop a node from the top of the stack and use it as a memory block. The pop operation consists of swapping the current head (top) node with the node below it. The Rust code for the pop operation is shown below:

fn pop(&self) -> Option<NonNull<Node<T>>> {
    let fetch_order = ..;
    let set_order = ..;

    // `self.head` has type `AtomicPtr<Node<T>>`
    // where `struct Node<T> { next: AtomicPtr<Node<T>>, data: UnsafeCell<T> }`
    let mut head = self.head.load(fetch_order);
    loop {
        if let Some(nn_head) = NonNull::new(head) {
            let next = unsafe { (*head).next.load(Ordering::Relaxed) };

            // <~ preempted

            match self
                .head
                .compare_exchange_weak(head, next, set_order, fetch_order)
            {
                Ok(_) => break Some(nn_head),
                // head was changed by some interrupt handler / thread
                Err(new_head) => head = new_head,
            }
        } else {
            // stack is observed as empty
            break None;
        }
    }
}
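
For contrast, the push operation links the node being returned in front of the current head. A sketch in the same style as the pop shown above (not the verbatim implementation):

fn push(&self, new: NonNull<Node<T>>) {
    let fetch_order = ..;
    let set_order = ..;

    let mut head = self.head.load(fetch_order);
    loop {
        // link the new node in front of the current head
        unsafe { new.as_ref().next.store(head, Ordering::Relaxed) }

        match self
            .head
            .compare_exchange_weak(head, new.as_ptr(), set_order, fetch_order)
        {
            Ok(_) => break,
            // head was changed by some interrupt handler / thread; retry
            Err(new_head) => head = new_head,
        }
    }
}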

In general, the pop operation is susceptible to the ABA problem. If this operation gets preempted by some interrupt handler somewhere between the head.load and the compare_exchange_weak, and that handler modifies the stack in such a way that the head (top) of the stack remains unchanged, then resuming the pop operation will corrupt the stack.

An example: imagine we are doing a pop on a stack that contains these nodes: A -> B -> C, where A is the head (top), B is next to A and C is next to B. The pop operation will do a CAS(&self.head, A, B) operation to atomically change the head to B iff it currently is A. Now, let’s say a handler preempts the pop operation before the CAS operation starts and it pops the stack twice and then pushes back the A node; now the state of the stack is A -> C. When the original pop operation is resumed it will succeed in doing the CAS operation, setting B as the head of the stack. However, B was used by the handler as a memory block and is no longer a valid free node. As a result the stack, and thus the allocator, is in an invalid state.

However, not all is lost because ARM devices use LL/SC (load-link / store-conditional) operations to implement CAS loops. Let’s look at the actual disassembly of pop for the ARM Cortex-M.

08000130 <<heapless::pool::Pool<T>>::pop>:
 8000130:       6802            ldr     r2, [r0, #0]
 8000132:       e00c            b.n     800014e <<heapless::pool::Pool<T>>::pop+0x1e>
 8000134:       4611            mov     r1, r2
 8000136:       f8d2 c000       ldr.w   ip, [r2]
 800013a:       e850 2f00       ldrex   r2, [r0]
 800013e:       428a            cmp     r2, r1
 8000140:       d103            bne.n   800014a <<heapless::pool::Pool<T>>::pop+0x1a>
 8000142:       e840 c300       strex   r3, ip, [r0]
 8000146:       b913            cbnz    r3, 800014e <<heapless::pool::Pool<T>>::pop+0x1e>
 8000148:       e004            b.n     8000154 <<heapless::pool::Pool<T>>::pop+0x24>
 800014a:       f3bf 8f2f       clrex
 800014e:       2a00            cmp     r2, #0
 8000150:       d1f0            bne.n   8000134 <<heapless::pool::Pool<T>>::pop+0x4>
 8000152:       2100            movs    r1, #0
 8000154:       4608            mov     r0, r1
 8000156:       4770            bx      lr

LDREX (“load exclusive”) is the LL instruction, and STREX (“store exclusive”) is the SC instruction (see 1). On the Cortex-M, STREX will always fail if the processor takes an exception between it and its corresponding LDREX operation (see 2). If STREX fails then the CAS loop is retried (see instruction @ 0x8000146). On single-core systems, preemption is required to run into the ABA problem, and on Cortex-M devices preemption always involves taking an exception. Thus the underlying LL/SC operations prevent the ABA problem on Cortex-M.

In the case of multi-core systems, if any other core successfully does a STREX operation on the head while the current core is somewhere between LDREX and STREX, then the current core will fail its STREX operation.

§x86_64 support / limitations

NOTE: Pool is only Sync on x86_64 and x86 (i686) if the Cargo feature “x86-sync-pool” is enabled

x86_64 support is a gamble. Yes, a gamble. Do you feel lucky enough to use Pool on x86_64?

As it’s not possible to implement ideal LL/SC semantics (*) on x86_64, the architecture is susceptible to the ABA problem described above. To reduce the chances of ABA occurring in practice we use version tags (keyword: IBM ABA-prevention tags). Again, this approach does not fix / prevent / avoid the ABA problem; it only reduces the chance of it occurring in practice, but that chance is not reduced to zero.

How we have implemented version tags: instead of using an AtomicPtr to link the stack Nodes we use an AtomicUsize, where the 64-bit usize is always composed of a monotonically increasing 32-bit tag (the higher bits) and a 32-bit signed address offset. The address of a node is computed by adding the 32-bit offset to an “anchor” address (the address of a static variable that lives somewhere in the .bss linker section). The tag is increased every time a node is popped (removed) from the stack.
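
As an illustration of this encoding, here is a sketch of the tag / offset packing for a 64-bit usize (the field layout follows the description above; this is not the exact code used by heapless):

// high 32 bits: monotonically increasing version tag
// low 32 bits: signed offset of the node from the `ANCHOR` static
fn pack(tag: u32, offset: i32) -> usize {
    ((tag as usize) << 32) | (offset as u32 as usize)
}

fn tag(packed: usize) -> u32 {
    (packed >> 32) as u32
}

fn offset(packed: usize) -> i32 {
    packed as u32 as i32
}

// popping a node bumps the tag; `wrapping_add` makes the wraparound
// discussed below explicit
fn bump_tag(packed: usize) -> usize {
    pack(tag(packed).wrapping_add(1), offset(packed))
}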

To see how version tags can prevent ABA consider the example from the previous section. Let’s start with a stack in this state: (~A, 0) -> (~B, 1) -> (~C, 2), where ~A represents the address of node A as a 32-bit offset from the “anchor” and the second tuple element (e.g. 0) indicates the version of the node. For simplicity, assume a single-core system: thread T1 is performing pop and before CAS(&self.head, (~A, 0), (~B, 1)) is executed a context switch occurs and the core resumes T2. T2 pops the stack twice and pushes A back into the stack; because the pop operation increases the version the stack ends in the following state: (~A, 1) -> (~C, 2). Now if T1 is resumed the CAS operation will fail because self.head is (~A, 1) and not (~A, 0).

When can version tags fail to prevent ABA? Using the previous example: if T2 performs a push followed by a pop (1 << 32) - 1 times before doing its original pop-pop-push operation then ABA will occur, because the version tag of node A will wrap around to its original value of 0 and the CAS operation in T1 will succeed and corrupt the stack.

It does seem unlikely that (1) a thread will perform the above operation and (2) that the above operation will complete within one time slice, assuming time-sliced threads. If you have thread priorities, then the above operation could occur during the lifetime of many high-priority threads if T1 is running at low priority.

Other implementations of version tags use more than 32 bits in their tags (e.g. “Scalable Lock-Free Dynamic Memory Allocation” uses 42-bit tags in its superblocks). In theory, one could use a double-word CAS on x86_64 to pack a 64-bit tag and a 64-bit pointer in a double-word, but this CAS operation is not exposed in the standard library (and I think it’s not available on older x86_64 processors?)

(*) Apparently one can emulate proper LL/SC semantics on x86_64 using hazard pointers (?) – the technique appears to be documented in “ABA Prevention Using Single-Word Instructions”, which is not public AFAICT – but hazard pointers require Thread Local Storage (TLS), which is a non-starter for a no_std library like heapless.

§x86_64 Limitations

NOTE: this limitation does not apply to x86 (32-bit address space). If you run into this issue on an x86_64 processor, try running your code compiled for x86, e.g. cargo run --target i686-unknown-linux-musl

Because stack nodes must be located within ±2 GB of the hidden ANCHOR variable, which lives in the .bss section, Pool may not be able to manage static references created using Box::leak – these heap-allocated chunks of memory may live in a very different address range. When the Pool is unable to manage a node because of its address it will simply discard it: the Pool::grow* methods return the number of new memory blocks added to the pool; if these methods return 0 it means the Pool was unable to manage the memory given to them.
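
In other words, on these targets the return value of the grow methods should be checked. A minimal sketch (what to do on failure is application-specific; this assumes the A pool from the first example):

// detect memory that the pool could not adopt (relevant on x86_64)
fn add_memory(memory: &'static mut [u8]) {
    let blocks_added = A::grow(memory);

    if blocks_added == 0 {
        // `memory` is too far from the pool's hidden ANCHOR variable
        // and was discarded; provide memory from a different region
    }
}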

§References

  1. Cortex-M3 Devices Generic User Guide (DUI 0552A), Section 2.2.7 “Synchronization primitives”
  2. ARMv7-M Architecture Reference Manual (DDI 0403E.b), Section A3.4 “Synchronization and semaphores”
  3. Michael, Maged M. “Scalable Lock-Free Dynamic Memory Allocation”
  4. Michael, Maged M. “Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects”

Enums§

  • Init: Initialized type state
  • Uninit: Uninitialized type state