CVE-2022-49781
Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
01/05/2025
Last modified:
02/05/2025
Description
In the Linux kernel, the following vulnerability has been resolved:

perf/x86/amd: Fix crash due to race between amd_pmu_enable_all, perf NMI and throttling

amd_pmu_enable_all() does:

	if (!test_bit(idx, cpuc->active_mask))
		continue;

	amd_pmu_enable_event(cpuc->events[idx]);

A perf NMI for another event can arrive between these two steps. The
perf NMI handler internally disables and re-enables _all_ events,
including the one that the interrupted amd_pmu_enable_all() was in the
process of enabling. If that unintentionally enabled event has a very
low sampling period, it triggers further NMIs in immediate succession
and gets throttled, and x86_pmu_stop() then clears cpuc->events[idx] and
the corresponding bit in cpuc->active_mask. When amd_pmu_enable_all()
resumes after the NMIs have been handled, it calls
amd_pmu_enable_event() with event=NULL, which crashes the kernel:

  BUG: kernel NULL pointer dereference, address: 0000000000000198
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  [...]
  Call Trace:
   amd_pmu_enable_all+0x68/0xb0
   ctx_resched+0xd9/0x150
   event_function+0xb8/0x130
   ? hrtimer_start_range_ns+0x141/0x4a0
   ? perf_duration_warn+0x30/0x30
   remote_function+0x4d/0x60
   __flush_smp_call_function_queue+0xc4/0x500
   flush_smp_call_function_queue+0x11d/0x1b0
   do_idle+0x18f/0x2d0
   cpu_startup_entry+0x19/0x20
   start_secondary+0x121/0x160
   secondary_startup_64_no_verify+0xe5/0xeb

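The check-then-use window described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: `struct cpu_state`, `model_stop()`, `model_enable_all()`, `throttling_nmi()` and `demo_race()` are hypothetical names invented for the illustration, and the "NMI" is simulated by a callback invoked between the `active_mask` test and the use of `events[idx]`.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COUNTERS 4

struct event { int id; };

/* Hypothetical stand-in for the per-CPU state; only the two fields
 * named in the description are modelled. */
struct cpu_state {
    unsigned long active_mask;
    struct event *events[NUM_COUNTERS];
};

/* Models what the description attributes to x86_pmu_stop() on
 * throttling: the mask bit and the events[] slot are both cleared. */
void model_stop(struct cpu_state *cpuc, int idx)
{
    assert(idx >= 0 && idx < NUM_COUNTERS);
    cpuc->active_mask &= ~(1UL << idx);
    cpuc->events[idx] = NULL;
}

/* Callback that plays the role of the NMI landing in the race window. */
typedef void (*nmi_hook)(struct cpu_state *cpuc, int idx);

/* Mirrors the check-then-use shape of the loop. Instead of dereferencing
 * the pointer, it counts how often the enable step would see NULL. */
int model_enable_all(struct cpu_state *cpuc, nmi_hook nmi)
{
    int null_calls = 0;

    for (int idx = 0; idx < NUM_COUNTERS; idx++) {
        if (!(cpuc->active_mask & (1UL << idx)))
            continue;

        if (nmi)
            nmi(cpuc, idx);   /* simulated NMI between check and use */

        if (cpuc->events[idx] == NULL)
            null_calls++;     /* the real code would crash here */
    }
    return null_calls;
}

/* Simulated NMI sequence that ends with the event being throttled. */
void throttling_nmi(struct cpu_state *cpuc, int idx)
{
    model_stop(cpuc, idx);
}

/* One active event on counter 2, throttled inside the window. */
int demo_race(void)
{
    struct event ev = { .id = 1 };
    struct cpu_state cpuc = { .active_mask = 1UL << 2 };

    cpuc.events[2] = &ev;
    return model_enable_all(&cpuc, throttling_nmi);
}
```

In this model `demo_race()` reports one enable attempt with a NULL event, while the same loop run without the simulated NMI reports none. The model collapses the whole disable/enable/throttle sequence into a single callback, so it shows only the stale-check problem, not the hardware behaviour.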
The amd_pmu_disable_all()/amd_pmu_enable_all() calls inside the perf NMI
handler were recently added as part of BRS enablement, but I'm not sure
whether we really need them. We can just disable BRS at the beginning of
the NMI handler and re-enable it on return. This solves the issue
because the handler no longer enables events whose active_mask bits are
set but which are not yet enabled in the hardware PMU.
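
The difference between the two NMI paths can be sketched in the same user-space style. Again, this is an assumption-laden model rather than the actual patch: `struct pmu_model`, `nmi_toggle_all()`, `nmi_toggle_brs_only()` and `enabled_behind_back()` are hypothetical names, and BRS is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: which events are scheduled, which counters are
 * really enabled in hardware, and whether BRS is currently on. */
struct pmu_model {
    unsigned long active_mask;  /* events scheduled on this CPU */
    unsigned long hw_enabled;   /* counters enabled in hardware */
    bool brs_on;
};

/* Pre-fix NMI path as described: disable everything, handle the sample,
 * then re-enable every event whose active_mask bit is set. */
void nmi_toggle_all(struct pmu_model *p)
{
    p->hw_enabled = 0;
    /* ... sample handling would happen here ... */
    p->hw_enabled = p->active_mask;
}

/* Post-fix NMI path as described: only BRS is switched off on entry and
 * restored on return; the counters are left untouched. */
void nmi_toggle_brs_only(struct pmu_model *p)
{
    bool was_on = p->brs_on;

    p->brs_on = false;
    /* ... sample handling would happen here ... */
    p->brs_on = was_on;
}

/* Sets up an event that is marked active but not yet enabled in
 * hardware, runs the given NMI path, and reports whether the NMI
 * enabled the counter before enable_all got to it. */
bool enabled_behind_back(void (*nmi)(struct pmu_model *), int idx)
{
    struct pmu_model p = {
        .active_mask = 1UL << idx,
        .hw_enabled  = 0,
        .brs_on      = true,
    };

    nmi(&p);
    return ((p.hw_enabled >> idx) & 1UL) != 0;
}
```

With `nmi_toggle_all` the pending counter ends up enabled by the NMI, which is the precondition for the throttling sequence above. With `nmi_toggle_brs_only` it stays disabled until the interrupted enable loop reaches it.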