CVE-2024-50226
Publication date:
09/11/2024
In the Linux kernel, the following vulnerability has been resolved:<br />
<br />
cxl/port: Fix use-after-free, permit out-of-order decoder shutdown<br />
<br />
In support of investigating an initialization failure report [1],<br />
cxl_test was updated to register mock memory-devices after the mock<br />
root-port/bus device had been registered. That led to cxl_test crashing<br />
with a use-after-free bug with the following signature:<br />
<br />
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1<br />
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1<br />
cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0<br />
1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1<br />
[..]<br />
cxld_unregister: cxl decoder14.0:<br />
cxl_region_decode_reset: cxl_region region3:<br />
mock_decoder_reset: cxl_port port3: decoder3.0 reset<br />
2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1<br />
cxl_endpoint_decoder_release: cxl decoder14.0:<br />
[..]<br />
cxld_unregister: cxl decoder7.0:<br />
3) cxl_region_decode_reset: cxl_region region3:<br />
Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI<br />
[..]<br />
RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]<br />
[..]<br />
Call Trace:<br />
<br />
cxl_region_decode_reset+0x69/0x190 [cxl_core]<br />
cxl_region_detach+0xe8/0x210 [cxl_core]<br />
cxl_decoder_kill_region+0x27/0x40 [cxl_core]<br />
cxld_unregister+0x5d/0x60 [cxl_core]<br />
<br />
At 1) a region has been established with 2 endpoint decoders (7.0 and<br />
14.0). Those endpoints share a common switch-decoder in the topology<br />
(3.0). At teardown, 2), decoder14.0 is the first to be removed and hits<br />
the "out of order reset" case in the switch decoder. The effect, though,<br />
is that region3 cleanup is aborted, leaving it intact and still<br />
referencing decoder14.0. At 3) the second attempt to tear down region3<br />
trips over the stale decoder14.0 object, which has long since been<br />
deleted.<br />
<br />
The fix here is to recognize that the CXL specification places no<br />
mandate on in-order shutdown of switch-decoders; it is the driver that<br />
enforces in-order allocation and the hardware that enforces in-order<br />
commit. So, rather than fail and leave objects dangling, always remove them.<br />
<br />
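The failure mode and the fix described above can be illustrated with a minimal user-space sketch. This is not the kernel code: `mock_decoder`, `mock_region`, `mock_detach_abort_on_error()` and `mock_detach_always()` are hypothetical names invented for illustration, and the `reset_fails` flag merely stands in for the "out of order reset" error reported by the switch decoder.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_TARGETS 4

/* Hypothetical stand-ins; not the kernel's region or decoder structures. */
struct mock_decoder {
	int id;
	bool reset_fails; /* models the "out of order reset" error */
};

struct mock_region {
	struct mock_decoder *targets[MOCK_MAX_TARGETS];
	int nr_targets;
};

/*
 * Pre-fix shape: a reset error aborts the detach, so the region keeps
 * its pointer to a decoder that the caller is about to free.
 */
static int mock_detach_abort_on_error(struct mock_region *r, int pos)
{
	struct mock_decoder *d;

	if (pos < 0 || pos >= MOCK_MAX_TARGETS)
		return -1;
	d = r->targets[pos];
	if (!d)
		return 0;
	if (d->reset_fails)
		return -1; /* r->targets[pos] is still set: a future stale pointer */
	r->targets[pos] = NULL;
	r->nr_targets--;
	return 0;
}

/*
 * Post-fix shape: the reference is always dropped; a reset problem is
 * reported to the caller but never leaves the region pointing at the decoder.
 */
static int mock_detach_always(struct mock_region *r, int pos)
{
	struct mock_decoder *d;
	int rc;

	if (pos < 0 || pos >= MOCK_MAX_TARGETS)
		return -1;
	d = r->targets[pos];
	if (!d)
		return 0;
	rc = d->reset_fails ? -1 : 0;
	r->targets[pos] = NULL;
	r->nr_targets--;
	return rc;
}
```

In the first variant the region still holds the decoder pointer after a failed reset, which is the precondition for the use-after-free at step 3). In the second, the pointer is cleared regardless of the reset outcome.
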
In support of making cxl_region_decode_reset() always succeed,<br />
cxl_region_invalidate_memregion() failures are turned into warnings.<br />
Crashing the kernel is acceptable there since system integrity is at<br />
risk if caches cannot be managed around physical address mutation<br />
events like CXL region destruction.<br />
<br />
A new device_for_each_child_reverse_from() is added to clean up<br />
port->commit_end after all dependent decoders have been disabled. In<br />
other words, if decoders are allocated 0->1->2 and disabled 1->2->0, then<br />
port->commit_end only decrements from 2 after 2 has been disabled, and<br />
it decrements all the way to zero since 1 was disabled previously.
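
The commit_end bookkeeping described above can be traced with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: `mock_port`, `mock_commit()` and `mock_reset()` are hypothetical names, the value -1 for "nothing committed" is an assumption of the model, and the downward `while` loop only stands in for the reverse walk over child decoders that device_for_each_child_reverse_from() is said to provide.

```c
#include <stdbool.h>

#define MOCK_NR_DECODERS 3

/* Hypothetical stand-in for a port; not the kernel's port structure. */
struct mock_port {
	bool enabled[MOCK_NR_DECODERS]; /* per-decoder "committed" flag */
	int commit_end;                 /* highest committed id, -1 if none (assumed) */
};

/* Commit stays strictly in order: only decoder commit_end + 1 may commit. */
static bool mock_commit(struct mock_port *port, int id)
{
	if (id < 0 || id >= MOCK_NR_DECODERS || id != port->commit_end + 1)
		return false;
	port->enabled[id] = true;
	port->commit_end = id;
	return true;
}

/*
 * Reset always succeeds. commit_end moves only when the topmost committed
 * decoder is disabled, and then walks down past every decoder that was
 * already disabled out of order.
 */
static void mock_reset(struct mock_port *port, int id)
{
	if (id < 0 || id >= MOCK_NR_DECODERS)
		return;
	port->enabled[id] = false;
	if (id != port->commit_end)
		return; /* out-of-order disable: defer the commit_end update */
	while (port->commit_end >= 0 && !port->enabled[port->commit_end])
		port->commit_end--;
}
```

With decoders committed 0, 1, 2 and then reset in the order 1, 2, 0, the model keeps commit_end at 2 after the first reset, drops it to 0 after the second (skipping the already-disabled decoder 1), and reaches the assumed "empty" value of -1 after the third.
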
Severity CVSS v4.0: Pending analysis
Last modification:
11/12/2024