CVE-2022-49093
Publication date: 26/02/2025
In the Linux kernel, the following vulnerability has been resolved:

skbuff: fix coalescing for page_pool fragment recycling

Fix a use-after-free when using page_pool with page fragments. We
encountered this problem during normal RX in the hns3 driver:

(1) Initially we have three descriptors in the RX queue. The first one
    allocates PAGE1 through page_pool, and the other two allocate one
    half of PAGE2 each. Page references look like this:

                RX_BD1 _______ PAGE1
                RX_BD2 _______ PAGE2
                RX_BD3 _________/

(2) Handle RX on the first descriptor. Allocate SKB1, which is
    eventually added to the receive queue by tcp_queue_rcv().

(3) Handle RX on the second descriptor. Allocate SKB2 and pass it to
    netif_receive_skb():

    netif_receive_skb(SKB2)
      ip_rcv(SKB2)
        SKB3 = skb_clone(SKB2)

    SKB2 and SKB3 share a reference to PAGE2 through
    skb_shinfo()->dataref. The other ref to PAGE2 is still held by
    RX_BD3:

                      SKB2 ---+- PAGE2
                      SKB3 __/   /
                RX_BD3 _________/
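
The sharing in step (3) can be pictured with a toy reference counter. This is a minimal Python sketch, not kernel code: the names `SharedInfo`, `Skb`, `clone` and `release_data` are hypothetical stand-ins for the data reached via skb_shinfo(), skb_clone() and skb_release_data(), and the model assumes dataref is a plain integer.

```python
class SharedInfo:
    """Toy stand-in for the shared area reached via skb_shinfo()."""

    def __init__(self, frags):
        self.dataref = 1          # one SKB currently points at this data
        self.frags = list(frags)  # names of the pages backing the frags


class Skb:
    """Toy SKB that only tracks which shared area it points at."""

    def __init__(self, name, shinfo):
        self.name = name
        self.shinfo = shinfo

    def clone(self, name):
        # A clone shares the data area, so only dataref grows.
        self.shinfo.dataref += 1
        return Skb(name, self.shinfo)

    def release_data(self):
        # Return the frag pages to release, or [] while another SKB
        # still holds the shared data.
        self.shinfo.dataref -= 1
        if self.shinfo.dataref > 0:
            return []
        return list(self.shinfo.frags)


skb2 = Skb("SKB2", SharedInfo(["PAGE2"]))
skb3 = skb2.clone("SKB3")
first = skb3.release_data()   # data is still held by SKB2
second = skb2.release_data()  # last holder releases the frags
```

In this model, releasing both SKB2 and SKB3 gives up PAGE2 exactly once, which is the behaviour the description expects when no coalescing takes place.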

(3b) Now while handling TCP, coalesce SKB3 with SKB1:

     tcp_v4_rcv(SKB3)
       tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
       kfree_skb_partial(SKB3)
         skb_release_data(SKB3)                // drops one dataref

                      SKB1 _____ PAGE1
                           \____
                      SKB2 _____ PAGE2
                                 /
                RX_BD3 _________/

     In skb_try_coalesce(), __skb_frag_ref() takes a page reference to
     PAGE2, where it should instead have increased the page_pool frag
     reference, pp_frag_count. Without coalescing, when releasing both
     SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now
     when releasing SKB1 and SKB2, two references to PAGE2 will be
     dropped, resulting in underflow.

(3c) Drop SKB2:

     af_packet_rcv(SKB2)
       consume_skb(SKB2)
         skb_release_data(SKB2)                // drops second dataref
           page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count

                      SKB1 _____ PAGE1
                           \____
                                 PAGE2
                                 /
                RX_BD3 _________/

(4) Userspace calls recvmsg(), which copies SKB1 and releases it. Since
    SKB3 was coalesced with SKB1, we release the SKB3 page as well:

    tcp_eat_recv_skb(SKB1)
      skb_release_data(SKB1)
        page_pool_return_skb_page(PAGE1)
        page_pool_return_skb_page(PAGE2)        // drops second pp_frag_count

(5) PAGE2 is freed, but the third RX descriptor was still using it!
    In our case this causes IOMMU faults, but it would silently corrupt
    memory if the IOMMU were disabled.
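
The accounting in steps (1) to (5) can be replayed on a toy model to see where the counts diverge. The following Python sketch is an illustration under stated assumptions, not kernel code: the `simulate` function and its dictionary fields are hypothetical, PAGE2 is assumed to start with one full-page reference and two fragment references, and PAGE1 is left out because it is handled the same way in both runs.

```python
def simulate(coalesce_cloned):
    """Replay steps (1)-(4) on a toy model and report PAGE2's final state.

    coalesce_cloned=True models the old behaviour, where a cloned
    pp_recycle SKB may be coalesced into another pp_recycle SKB.
    """
    # Step (1): PAGE2 is split into two fragments, for RX_BD2 and RX_BD3.
    page2 = {"pp_frag_count": 2, "page_ref": 1, "returned": False}
    rx_bd3_in_use = True  # RX_BD3 has not been processed yet

    def page_pool_return(page):
        # Toy page_pool_return_skb_page(): drop one fragment reference and
        # hand the page back once no fragment references remain.
        page["pp_frag_count"] -= 1
        if page["pp_frag_count"] == 0:
            page["returned"] = True

    # Step (3): SKB2 owns RX_BD2's fragment; SKB3 is a clone sharing it.
    dataref = 2
    skb1_frags = []

    # Step (3b): TCP tries to coalesce SKB3 into SKB1.
    if coalesce_cloned:
        skb1_frags.append(page2)
        page2["page_ref"] += 1   # __skb_frag_ref(): page ref, not pp_frag_count
        dataref -= 1             # kfree_skb_partial(SKB3)
        skb3_queued = False
    else:
        skb3_queued = True       # SKB3 stays on the receive queue by itself

    # Step (3c): SKB2 is consumed.
    dataref -= 1
    if dataref == 0:
        page_pool_return(page2)

    # Step (4): recvmsg() releases SKB1. It is marked pp_recycle, so each
    # of its frags goes through the page_pool return path.
    for page in skb1_frags:
        page_pool_return(page)
    if skb3_queued:
        dataref -= 1             # SKB3 is released as the last data holder
        if dataref == 0:
            page_pool_return(page2)

    return {
        "pp_frag_count": page2["pp_frag_count"],
        "page_ref": page2["page_ref"],
        "freed_while_rx_bd3_uses_it": page2["returned"] and rx_bd3_in_use,
    }


buggy = simulate(coalesce_cloned=True)
fixed = simulate(coalesce_cloned=False)
```

With coalescing allowed, the model ends with pp_frag_count at 0 and a stray full-page reference, so PAGE2 is handed back while RX_BD3 still points at it. With coalescing rejected, one fragment reference remains for RX_BD3.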

Change the logic that checks whether pp_recycle SKBs can be coalesced.
We still reject differing pp_recycle between 'from' and 'to' SKBs, but
in order to avoid the situation described above, we also reject
coalescing when both 'from' and 'to' are pp_recycled and 'from' is
cloned.

The new logic allows coalescing a cloned pp_recycle SKB into a page
refcounted one, because in this case the release in step (4) will drop
the right reference, the one taken by skb_try_coalesce().
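
One way to read the rule described above is as a predicate over three flags. The Python sketch below is an interpretation of the prose, not a copy of the kernel patch: the function name `can_coalesce` and its boolean parameters are hypothetical, and it only covers the pp_recycle part of the decision, not the other conditions skb_try_coalesce() checks.

```python
def can_coalesce(to_pp_recycle, from_pp_recycle, from_cloned):
    """Return True if the pp_recycle rule described above permits coalescing."""
    # A cloned pp_recycle 'from' is treated like a page-refcounted SKB,
    # because coalescing takes a full page reference on its frags.
    from_effective_pp = from_pp_recycle and not from_cloned
    return to_pp_recycle == from_effective_pp


# (to_pp_recycle, from_pp_recycle, from_cloned) -> allowed?
cases = {
    (True, True, True): can_coalesce(True, True, True),      # the failing case
    (False, True, True): can_coalesce(False, True, True),    # newly allowed
    (True, True, False): can_coalesce(True, True, False),
    (False, True, False): can_coalesce(False, True, False),
    (True, False, False): can_coalesce(True, False, False),
    (False, False, False): can_coalesce(False, False, False),
}
```

Under this reading, the scenario from steps (3b) to (5), where both SKBs are pp_recycle and 'from' is cloned, is the one that is now refused.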
Severity CVSS v4.0: Pending analysis
Last modification: 25/03/2025