CVE-2024-41009

Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
17/07/2024
Last modified:
03/11/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

bpf: Fix overrunning reservations in ringbuf

The BPF ring buffer is implemented internally as a power-of-2 sized circular buffer with two logical, ever-increasing counters: consumer_pos is the consumer counter, showing the logical position up to which the consumer has consumed data, and producer_pos is the producer counter, denoting the amount of data reserved by all producers.

Each time a record is reserved, the producer that "owns" the record successfully advances the producer counter. In user space, each time a record is read, the consumer of the data advances the consumer counter once it has finished processing. Both counters are stored in separate pages so that, from user space, the producer counter is read-only and the consumer counter is read-write.

One aspect that simplifies and thus speeds up the implementation of both producers and consumers is that the data area is mapped twice, contiguously back-to-back, in virtual memory. No special measures are needed for samples that wrap around at the end of the circular buffer data area, because the page after the last data page is the first data page again, so the sample still appears completely contiguous in virtual memory.

Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for book-keeping the length and offset; the header is inaccessible to the BPF program. Helpers like bpf_ringbuf_reserve() return `(void *)hdr + BPF_RINGBUF_HDR_SZ` for the BPF program to use. Bing-Jhong and Muhammad reported that it is nevertheless possible to make a second allocated memory chunk overlap the first chunk, so that the BPF program is able to edit the first chunk's header.

For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with a size of 0x4000. Next, consumer_pos is modified to 0x3000 /before/ a call to bpf_ringbuf_reserve() is made. This allocates a chunk A, which is in [0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, let's allocate a chunk B with size 0x3000. This succeeds because consumer_pos was edited ahead of time to pass the `new_prod_pos - cons_pos > rb->mask` check. Chunk B will be in the range [0x3008,0x6010], and the BPF program is able to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data pages. This means that chunk B at [0x4000,0x4008] is chunk A's header. bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk B has modified chunk A's header, bpf_ringbuf_commit() refers to the wrong page and could cause a crash.

Fix it by calculating the oldest pending_pos and checking whether the range from the oldest outstanding record to the newest would span beyond the ring buffer size. If that is the case, the request is rejected. We've tested with the ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh) before and after the fix; while it seems a bit slower on some benchmarks, the difference is not significant enough to matter.
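
The arithmetic in the example above can be replayed with a small position-only model. This is a sketch for illustration, not the kernel code: struct rb_model and the functions rb_init(), record_len(), reserve_unpatched(), reserve_patched() and covers_offset() are hypothetical names, no data pages are modelled, and pending_pos is simply held at the start of the oldest uncommitted record, whereas the kernel fix advances it by walking the record headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HDR_SZ 8u /* size of struct bpf_ringbuf_hdr: two u32 fields */

/* Position-only model of the ring buffer (hypothetical, no data pages). */
struct rb_model {
    uint64_t mask;         /* data size - 1; data size is a power of 2 */
    uint64_t consumer_pos; /* writable from user space */
    uint64_t producer_pos; /* advanced on every successful reservation */
    uint64_t pending_pos;  /* start of the oldest record not yet committed */
};

static struct rb_model rb_init(uint64_t data_size)
{
    /* The masking arithmetic below relies on a power-of-2 size. */
    assert(data_size != 0 && (data_size & (data_size - 1)) == 0);
    struct rb_model rb = { .mask = data_size - 1 };
    return rb;
}

/* Header plus payload, rounded up to a multiple of 8 bytes. */
static uint64_t record_len(uint64_t size)
{
    return (size + HDR_SZ + 7) & ~(uint64_t)7;
}

/* Reservation with only the pre-fix check against consumer_pos. */
static bool reserve_unpatched(struct rb_model *rb, uint64_t size,
                              uint64_t *start)
{
    uint64_t new_prod_pos = rb->producer_pos + record_len(size);

    if (record_len(size) > rb->mask + 1)
        return false;
    if (new_prod_pos - rb->consumer_pos > rb->mask)
        return false;
    *start = rb->producer_pos;
    rb->producer_pos = new_prod_pos;
    return true;
}

/* Reservation that also bounds the span from the oldest pending record. */
static bool reserve_patched(struct rb_model *rb, uint64_t size,
                            uint64_t *start)
{
    uint64_t new_prod_pos = rb->producer_pos + record_len(size);

    if (record_len(size) > rb->mask + 1)
        return false;
    if (new_prod_pos - rb->consumer_pos > rb->mask ||
        new_prod_pos - rb->pending_pos > rb->mask)
        return false;
    *start = rb->producer_pos;
    rb->producer_pos = new_prod_pos;
    return true;
}

/* True if the logical range [start, start + len) maps onto the physical
 * offset off in the data area; assumes len does not exceed the data size. */
static bool covers_offset(const struct rb_model *rb, uint64_t start,
                          uint64_t len, uint64_t off)
{
    return ((off - start) & rb->mask) < len;
}
```

With a data size of 0x4000 and consumer_pos set to 0x3000, two calls to reserve_unpatched() with size 0x3000 both succeed, placing chunk A at 0x0 and chunk B at 0x3008, and covers_offset() confirms that chunk B's writable range starting at 0x3010 reaches physical offset 0x0, which is chunk A's header. Under the same conditions reserve_patched() accepts chunk A and rejects chunk B, because the span from pending_pos (0x0) to the new producer position (0x6010) exceeds the mask of 0x3fff.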

Vulnerable products and versions

CPE                                             From              Up to
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*    5.8 (including)   6.1.97 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*    6.2 (including)   6.6.37 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*    6.7 (including)   6.9.8 (excluding)