CVE-2021-46925
Severity CVSS v4.0: Pending analysis
Type: CWE-362 Concurrent Execution using Shared Resource with Improper Synchronization ('Race Condition')
Publication date: 27/02/2024
Last modified: 29/10/2024
Description
In the Linux kernel, the following vulnerability has been resolved:

net/smc: fix kernel panic caused by race of smc_sock

A crash occurs when smc_cdc_tx_handler() tries to access smc_sock
but smc_release() has already freed it.

[ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
[ 4570.696048] #PF: supervisor write access in kernel mode
[ 4570.696728] #PF: error_code(0x0002) - not-present page
[ 4570.697401] PGD 0 P4D 0
[ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
[ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
[ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30

[ 4570.711446] Call Trace:
[ 4570.711746] <IRQ>
[ 4570.711992] smc_cdc_tx_handler+0x41/0xc0
[ 4570.712470] smc_wr_tx_tasklet_fn+0x213/0x560
[ 4570.712981] ? smc_cdc_tx_dismisser+0x10/0x10
[ 4570.713489] tasklet_action_common.isra.17+0x66/0x140
[ 4570.714083] __do_softirq+0x123/0x2f4
[ 4570.714521] irq_exit_rcu+0xc4/0xf0
[ 4570.714934] common_interrupt+0xba/0xe0

Although smc_cdc_tx_handler() checks that the smc connection exists,
smc_release() may already have dismissed and released the smc socket
before smc_cdc_tx_handler() goes on to access it.

smc_cdc_tx_handler()           |smc_release()
if (!conn)                     |
                               |
                               |smc_cdc_tx_dismiss_slots()
                               |      smc_cdc_tx_dismisser()
                               |
                               |sock_put(&smc->sk) <- last sock_put,
                               |                      smc_sock freed
bh_lock_sock(&smc->sk) (panic) |

To make sure we won't receive any CDC messages after we free the
smc_sock, add a refcount on the smc_connection for inflight CDC
messages (posted to the QP but whose CQE has not yet been received),
and don't release the smc_connection until all inflight CDC messages
have completed, whether successfully or with an error.

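The refcount scheme above can be illustrated with a minimal userspace sketch in C. `struct conn_sim`, `cdc_post()`, `cdc_complete()` and `try_release()` are hypothetical names, not the kernel's. The sketch is single-threaded, assumes no new CDC messages are posted once release has started, and only reports whether freeing is allowed instead of waiting for the count to drop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for smc_connection (not the kernel struct). */
struct conn_sim {
    atomic_int cdc_pend;  /* CDC messages posted whose CQE is not yet handled */
    bool sock_freed;      /* true once the socket has been released */
};

static void conn_init(struct conn_sim *c)
{
    atomic_init(&c->cdc_pend, 0);
    c->sock_freed = false;
}

/* Send path: take a reference for each CDC message posted to the QP. */
static void cdc_post(struct conn_sim *c)
{
    atomic_fetch_add(&c->cdc_pend, 1);
}

/* Completion path: runs for every CQE, successful or failed.
 * Returns false if it would have touched an already-freed socket. */
static bool cdc_complete(struct conn_sim *c)
{
    bool sock_alive = !c->sock_freed;
    int prev = atomic_fetch_sub(&c->cdc_pend, 1);
    assert(prev > 0);  /* never complete more messages than were posted */
    return sock_alive;
}

/* Release path: free the socket only when no CDC message is in flight.
 * A release path following the text would wait here; this sketch just
 * reports whether freeing is allowed yet. */
static bool try_release(struct conn_sim *c)
{
    if (atomic_load(&c->cdc_pend) != 0)
        return false;
    c->sock_freed = true;
    return true;
}
```

Replaying the interleaving from the diagram (post a message, attempt release, then deliver the completion) with this counter defers the free until the completion has run, so the handler never observes a freed socket.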
Using a refcount on CDC messages brings another problem: when the link
is going to be destroyed, smcr_link_clear() resets the QP, which then
removes all pending CQEs related to that QP from the CQ. To make sure
all CQEs always come back, so that the refcount on the smc_connection
can always reach 0, smc_ib_modify_qp_reset() was replaced by
smc_ib_modify_qp_error().
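Why resetting the QP breaks the refcount while moving it to the error state does not can be shown with a small simulation. In the RDMA verbs model, a QP in the error state completes its outstanding work requests with a flush error status (`IB_WC_WR_FLUSH_ERR` in the kernel verbs API); the sketch assumes, as the text states, that a reset drops pending completions instead. All `_sim` names are hypothetical.

```c
#include <assert.h>

/* Hypothetical completion statuses and QP transitions for the sketch. */
enum wc_status_sim { WC_SUCCESS_SIM, WC_FLUSH_ERR_SIM };
enum qp_transition_sim { QP_TO_RESET_SIM, QP_TO_ERROR_SIM };

/* Hypothetical link state: one QP/CQ pair and one connection's refcount. */
struct link_sim {
    int posted;     /* WRs posted whose completion has not been delivered */
    int conn_refs;  /* inflight CDC references held on the connection */
    int flushed;    /* completions delivered with a flush error status */
};

static void post_wr(struct link_sim *l)
{
    l->posted++;
    l->conn_refs++;  /* reference taken when the CDC message is posted */
}

/* Completion handler: drops the reference whatever the status. */
static void tx_handler(struct link_sim *l, enum wc_status_sim status)
{
    if (status == WC_FLUSH_ERR_SIM)
        l->flushed++;
    assert(l->conn_refs > 0);
    l->conn_refs--;
}

/* Link teardown: reset drops pending completions, error flushes them. */
static void clear_link(struct link_sim *l, enum qp_transition_sim t)
{
    if (t == QP_TO_ERROR_SIM) {
        while (l->posted > 0) {
            l->posted--;
            tx_handler(l, WC_FLUSH_ERR_SIM);
        }
    } else {
        l->posted = 0;  /* completions vanish, so conn_refs never drops */
    }
}
```

After posting two messages on each of two links and tearing one down with each transition, the reset link is left with `conn_refs == 2`, while the error link ends at 0 after two flushed completions.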
Also, remove the timeout in smc_wr_tx_wait_no_pending_sends(), since we
need to wait for all pending WQEs to complete; otherwise we may
encounter a use-after-free when handling CQEs.

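Waiting for pending sends without a timeout is an ordinary drain-to-zero wait. The sketch below uses a POSIX mutex and condition variable as a userspace stand-in for kernel wait primitives; `struct pend_tracker`, its helpers and `run_drain_demo()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical pending-send tracker (not the kernel's smc_link). */
struct pend_tracker {
    pthread_mutex_t lock;
    pthread_cond_t  drained;
    int             pending;  /* WQEs posted whose CQE has not been handled */
};

static void tracker_init(struct pend_tracker *t, int pending)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->drained, NULL);
    t->pending = pending;
}

/* Completion side: called once per CQE (success or flush error). */
static void tracker_complete_one(struct pend_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    assert(t->pending > 0);
    if (--t->pending == 0)
        pthread_cond_broadcast(&t->drained);
    pthread_mutex_unlock(&t->lock);
}

/* Teardown side: block until every pending send has completed.
 * Deliberately no timeout: returning early would let teardown free
 * memory that a late CQE handler still uses. */
static void tracker_wait_no_pending(struct pend_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    while (t->pending > 0)
        pthread_cond_wait(&t->drained, &t->lock);
    pthread_mutex_unlock(&t->lock);
}

struct completer_arg { struct pend_tracker *t; int n; };

/* Simulated CQ poller delivering n completions from another thread. */
static void *completer(void *p)
{
    struct completer_arg *a = p;
    for (int i = 0; i < a->n; i++)
        tracker_complete_one(a->t);
    return NULL;
}

/* Posts n sends, completes them concurrently, waits for the drain.
 * Returns the pending count seen after the wait, or -1 on error. */
static int run_drain_demo(int n)
{
    struct pend_tracker t;
    struct completer_arg a = { &t, n };
    pthread_t th;
    int left = -1;

    tracker_init(&t, n);
    if (pthread_create(&th, NULL, completer, &a) == 0) {
        tracker_wait_no_pending(&t);
        pthread_join(th, NULL);
        left = t.pending;  /* read after join: no concurrent writer */
    }
    pthread_cond_destroy(&t.drained);
    pthread_mutex_destroy(&t.lock);
    return left;
}
```

A timed variant built on `pthread_cond_timedwait` could return while `pending` is still positive; the caller would then go on to free state that a late completion handler still references, which is the use-after-free the text describes.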
For the IB device removal routine, we need to wait for all QPs on that
device to be destroyed before we can destroy the CQs on the device;
otherwise the refcount on smc_connection won't reach 0 and smc_sock
cannot be released.
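The ordering requirement for device removal can be modelled the same way: flushed completions only reach the handler while the CQ still exists. The `ibdev_sim` model below is hypothetical and synchronous; destroying a QP stands in for moving it to the error state and letting its flushed CQEs be processed.

```c
#include <assert.h>
#include <stdbool.h>

enum teardown_order_sim { DESTROY_CQS_FIRST, DESTROY_QPS_FIRST };

/* Hypothetical device: several QPs sharing one CQ. */
struct ibdev_sim {
    int  qps;          /* live QPs (one per link) */
    int  pend_per_qp;  /* WRs each QP still has in flight */
    int  conn_refs;    /* inflight refs held on connections */
    bool cq_alive;
};

static void ibdev_init(struct ibdev_sim *d, int qps, int pend)
{
    d->qps = qps;
    d->pend_per_qp = pend;
    d->conn_refs = qps * pend;
    d->cq_alive = true;
}

/* Tearing down a QP flushes its WRs into the CQ; each flushed CQE drops
 * one connection ref, but only if the CQ still exists to deliver it. */
static void destroy_one_qp(struct ibdev_sim *d)
{
    assert(d->qps > 0);
    if (d->cq_alive)
        d->conn_refs -= d->pend_per_qp;
    d->qps--;
}

/* Returns the connection refs left once the device is torn down. */
static int ibdev_remove(struct ibdev_sim *d, enum teardown_order_sim order)
{
    if (order == DESTROY_CQS_FIRST)
        d->cq_alive = false;      /* wrong order: CQ gone too early */
    while (d->qps > 0)
        destroy_one_qp(d);        /* "wait for all QPs to be destroyed" */
    d->cq_alive = false;          /* CQ destroyed last in the fixed order */
    return d->conn_refs;
}
```

With two QPs each holding three inflight messages, destroying the CQ first leaves six references that can never be dropped, whereas destroying the QPs first brings the count to zero.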
Impact
Base Score 3.x: 4.70
Severity 3.x: MEDIUM
Vulnerable products and versions
CPE | From | Up to |
---|---|---|
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | 4.11.0 (including) | 5.10.90 (excluding) |
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | 5.11.0 (including) | 5.15.13 (excluding) |