CVE-2024-46797
Publication date:
18/09/2024
In the Linux kernel, the following vulnerability has been resolved:

powerpc/qspinlock: Fix deadlock in MCS queue

If an interrupt occurs in queued_spin_lock_slowpath() after we increment
qnodesp->count and before node->lock is initialized, another CPU might
see stale lock values in get_tail_qnode(). If the stale lock value happens
to match the lock on that CPU, then we write to the "next" pointer of
the wrong qnode. This causes a deadlock as the former CPU, once it becomes
the head of the MCS queue, will spin indefinitely until its "next" pointer
is set by its successor in the queue.
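To make the window concrete, here is a simplified sketch of the two sides of the race. It is modeled on the description above, not on the kernel's actual arch/powerpc/lib/qspinlock.c; the struct layout, the MAX_NODES value, and the function names claim_node()/find_tail_qnode() are illustrative assumptions:

#include <stddef.h>

#define MAX_NODES 4   /* assumed nesting depth (task/soft/hard/NMI) */

struct qspinlock;                 /* opaque: only its address matters here */

struct qnode {
	struct qnode *next;
	struct qspinlock *lock;   /* which lock this slot is queued on */
};

struct qnodes {
	int count;                /* slots in use on this CPU */
	struct qnode nodes[MAX_NODES];
};

/*
 * Enqueue side (sketch of the buggy ordering): the slot is claimed by
 * count++ before its ->lock field is rewritten, so nodes[idx].lock can
 * still hold the lock from a previous acquisition.
 */
static struct qnode *claim_node(struct qnodes *qnodesp, struct qspinlock *lock)
{
	int idx = qnodesp->count++;        /* slot claimed here... */

	/* <-- an interrupt landing here leaves nodes[idx].lock stale */

	qnodesp->nodes[idx].lock = lock;   /* ...but reinitialized only here */
	qnodesp->nodes[idx].next = NULL;
	return &qnodesp->nodes[idx];
}

/*
 * Lookup side: the remote CPU finds the tail qnode by scanning the tail
 * CPU's slots for a matching ->lock. A stale match returns the wrong
 * qnode, and the remote CPU then writes that qnode's ->next.
 */
static struct qnode *find_tail_qnode(struct qnodes *qnodesp, struct qspinlock *lock)
{
	for (int idx = 0; idx < MAX_NODES; idx++)
		if (qnodesp->nodes[idx].lock == lock)
			return &qnodesp->nodes[idx];
	return NULL;
}

This is exactly the pattern in the code-flow diagram further down: the interrupted CPU0 claims nodes[1] for lock A while nodes[0].lock still names A from its earlier critical section, so the scan on CPU1 matches the stale nodes[0] instead of nodes[1].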

Running stress-ng on a 16-core (16EC/16VP) shared LPAR results in
occasional lockups similar to the following:

$ stress-ng --all 128 --vm-bytes 80% --aggressive \
--maximize --oomable --verify --syslog \
--metrics --times --timeout 5m

watchdog: CPU 15 Hard LOCKUP
......
NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490
LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
Call Trace:
  0xc000002cfffa3bf0 (unreliable)
  _raw_spin_lock+0x6c/0x90
  raw_spin_rq_lock_nested.part.135+0x4c/0xd0
  sched_ttwu_pending+0x60/0x1f0
  __flush_smp_call_function_queue+0x1dc/0x670
  smp_ipi_demux_relaxed+0xa4/0x100
  xive_muxed_ipi_action+0x20/0x40
  __handle_irq_event_percpu+0x80/0x240
  handle_irq_event_percpu+0x2c/0x80
  handle_percpu_irq+0x84/0xd0
  generic_handle_irq+0x54/0x80
  __do_irq+0xac/0x210
  __do_IRQ+0x74/0xd0
  0x0
  do_IRQ+0x8c/0x170
  hardware_interrupt_common_virt+0x29c/0x2a0
--- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490
......
NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490
LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90
--- interrupt: 500
  0xc0000029c1a41d00 (unreliable)
  _raw_spin_lock+0x6c/0x90
  futex_wake+0x100/0x260
  do_futex+0x21c/0x2a0
  sys_futex+0x98/0x270
  system_call_exception+0x14c/0x2f0
  system_call_vectored_common+0x15c/0x2ec

The following code flow illustrates how the deadlock occurs.
For the sake of brevity, assume that both locks (A and B) are
contended and we call the queued_spin_lock_slowpath() function.

CPU0                                    CPU1
----                                    ----
spin_lock_irqsave(A)                     |
spin_unlock_irqrestore(A)                |
spin_lock(B)                             |
 |                                       |
 ▼                                       |
id = qnodesp->count++;                   |
(Note that nodes[0].lock == A)           |
 |                                       |
 ▼                                       |
Interrupt                                |
(happens before "nodes[0].lock = B")     |
 |                                       |
 ▼                                       |
spin_lock_irqsave(A)                     |
 |                                       |
 ▼                                       |
id = qnodesp->count++                    |
nodes[1].lock = A                        |
 |                                       |
 ▼                                       |
Tail of MCS queue                        |
 |                               spin_lock_irqsave(A)
 ▼                                       |
Head of MCS queue                        ▼
 |                               CPU0 is previous tail
 ▼                                       |
Spin indefinitely                        ▼
(until "nodes[1].next != NULL")  prev = get_tail_qnode(A, CPU0)
                                         |
                                         ▼
                                 prev == &qnodes[CPU0].nodes[0]
                                 (as qnodes
---truncated---
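The advisory text is truncated above, so the fix itself is not quoted here. Consistent with the bug as described, one remediation pattern is to guarantee that a recycled slot's ->lock can never present a stale value to the tail scan. The sketch below, continuing the structs from the earlier sketch, is an assumption about the shape of such a fix, not the verbatim upstream patch; release_node() and the barrier placement are hypothetical:

/*
 * Release side (assumed remediation pattern, not the verbatim upstream
 * patch): scrub ->lock before handing the slot back, so a later
 * count++ window can expose only NULL, which matches no real lock.
 */
static void release_node(struct qnodes *qnodesp, struct qnode *node)
{
	node->lock = NULL;                  /* stale value can no longer match */
	__asm__ volatile("" ::: "memory");  /* compiler barrier: the store above
	                                       must not sink past the release */
	qnodesp->count--;                   /* slot reusable only from here on */
}

Because the claim-side race is with an interrupt on the same CPU, a compiler barrier is enough to order the local stores; a remote CPU scanning the slots can then at worst observe NULL, which matches no lock.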
Severity CVSS v4.0: Pending analysis
Last modification:
29/09/2024