CVE-2021-47094
Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
04/03/2024
Last modified:
08/04/2025
Description
In the Linux kernel, the following vulnerability has been resolved:

KVM: x86/mmu: Don't advance iterator after restart due to yielding

After dropping mmu_lock in the TDP MMU, restart the iterator during
tdp_iter_next() and do not advance the iterator. Advancing the iterator
results in skipping the top-level SPTE and all its children, which is
fatal if any of the skipped SPTEs were not visited before yielding.

When zapping all SPTEs, i.e. when min_level == root_level, restarting the
iter and then invoking tdp_iter_next() is always fatal if the current gfn
has a valid SPTE, as advancing the iterator results in try_step_side()
skipping the current gfn, which wasn't visited before yielding.
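
The skip described above can be reproduced with a minimal sketch in C. This is not the kernel code: the page-table walk is flattened to an array, and every `sketch_*` name, the `fixed` switch, and the `yield_at` parameter are hypothetical. Only the control flow follows the description: the yield helper records where to resume and sets `yielded`, and the next-step function either restarts and returns (fixed) or restarts and then advances (buggy).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened stand-in for the TDP MMU iterator: it walks
 * an array of "SPTEs" instead of a multi-level page table. */
struct sketch_iter {
    size_t next_gfn;  /* gfn the walk should resume from after a yield */
    size_t gfn;       /* gfn currently being visited */
    size_t nr_gfns;   /* number of entries in the walk */
    bool valid;       /* false once the walk is finished */
    bool yielded;     /* set when the walker dropped the lock */
};

static void sketch_iter_start(struct sketch_iter *it, size_t nr_gfns)
{
    it->next_gfn = 0;
    it->gfn = 0;
    it->nr_gfns = nr_gfns;
    it->valid = nr_gfns > 0;
    it->yielded = false;
}

/* Restart at next_gfn; the entry there has NOT been processed yet. */
static void sketch_iter_restart(struct sketch_iter *it)
{
    it->gfn = it->next_gfn;
    it->valid = it->gfn < it->nr_gfns;
    it->yielded = false;
}

static void sketch_iter_next(struct sketch_iter *it, bool fixed)
{
    if (it->yielded) {
        sketch_iter_restart(it);
        if (fixed)
            return;  /* fixed: visit the restarted entry first */
        /* buggy: fall through and step past the unvisited entry */
    }
    it->gfn++;
    it->valid = it->gfn < it->nr_gfns;
}

/* Pretend to drop the lock: remember the current gfn and flag the yield. */
static bool sketch_cond_resched(struct sketch_iter *it, bool need_resched)
{
    assert(!it->yielded);  /* loose analogue of a WARN on iter->yielded */
    if (!need_resched)
        return false;
    it->next_gfn = it->gfn;
    it->yielded = true;
    return true;
}

/* Zap every entry, yielding once just before yield_at is processed.
 * Returns the number of entries that are still present afterwards. */
static size_t sketch_zap_all(bool *present, size_t n, size_t yield_at,
                             bool fixed)
{
    struct sketch_iter it;
    bool yielded_once = false;
    size_t leaked = 0;

    for (sketch_iter_start(&it, n); it.valid; sketch_iter_next(&it, fixed)) {
        if (!yielded_once && it.gfn == yield_at) {
            yielded_once = true;
            if (sketch_cond_resched(&it, true))
                continue;
        }
        present[it.gfn] = false;
    }
    for (size_t i = 0; i < n; i++)
        leaked += present[i];
    return leaked;
}
```

With four present entries and `yield_at = 2`, `sketch_zap_all()` leaves nothing behind when `fixed` is true, and leaves entry 2 present when `fixed` is false.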

Sprinkle WARNs on iter->yielded being true in various helpers that are
often used in conjunction with yielding, and tag the helper with
__must_check to reduce the probability of improper usage.
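
As a rough illustration of those two guards, the sketch below uses the GCC/Clang `warn_unused_result` attribute, which is what the kernel's `__must_check` expands to in current trees, and a plain counter in place of the kernel's WARN machinery. The `sketch_*` names and the `warnings` field are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Spelled out so the sketch builds outside the kernel tree. */
#define sketch_must_check __attribute__((warn_unused_result))

struct sketch_walk {
    bool yielded;  /* set once the walker has yielded */
    int warnings;  /* stand-in for WARN output */
};

/* Hypothetical yield helper: ignoring the return value draws a
 * compiler warning, because the caller must restart the walk. */
static sketch_must_check bool sketch_yield(struct sketch_walk *w,
                                           bool need_resched)
{
    assert(w);
    if (w->yielded)
        w->warnings++;  /* yielding twice without restarting */
    if (!need_resched)
        return false;
    w->yielded = true;
    return true;
}

/* Hypothetical helper that must not run on a stale position. */
static void sketch_modify(struct sketch_walk *w)
{
    assert(w);
    if (w->yielded)
        w->warnings++;  /* used after a yield without restarting */
}
```

Calling `sketch_modify()` right after a successful `sketch_yield()` bumps `warnings`, which is the kind of misuse the added WARNs are meant to surface.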

Failing to zap a top-level SPTE manifests in one of two ways. If a valid
SPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),
the shadow page will be leaked and KVM will WARN accordingly.

WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm]
RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm]
Call Trace:
kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
kvm_destroy_vm+0x162/0x2a0 [kvm]
kvm_vcpu_release+0x34/0x60 [kvm]
__fput+0x82/0x240
task_work_run+0x5c/0x90
do_exit+0x364/0xa10
? futex_unqueue+0x38/0x60
do_group_exit+0x33/0xa0
get_signal+0x155/0x850
arch_do_signal_or_restart+0xed/0x750
exit_to_user_mode_prepare+0xc5/0x120
syscall_exit_to_user_mode+0x1d/0x40
do_syscall_64+0x48/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae

If kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped by
kvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form of
marking a struct page as dirty/accessed after it has been put back on the
free list. This directly triggers a WARN due to encountering a page with
page_count() == 0, but it can also lead to data corruption and additional
errors in the kernel.

WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171
RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm]
Call Trace:
kvm_set_pfn_dirty+0x120/0x1d0 [kvm]
__handle_changed_spte+0x92e/0xca0 [kvm]
__handle_changed_spte+0x63c/0xca0 [kvm]
__handle_changed_spte+0x63c/0xca0 [kvm]
__handle_changed_spte+0x63c/0xca0 [kvm]
zap_gfn_range+0x549/0x620 [kvm]
kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm]
mmu_free_root_page+0x219/0x2c0 [kvm]
kvm_mmu_free_roots+0x1b4/0x4e0 [kvm]
kvm_mmu_unload+0x1c/0xa0 [kvm]
kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm]
kvm_put_kvm+0x3b1/0x8b0 [kvm]
kvm_vcpu_release+0x4e/0x70 [kvm]
__fput+0x1f7/0x8c0
task_work_run+0xf8/0x1a0
do_exit+0x97b/0x2230
do_group_exit+0xda/0x2a0
get_signal+0x3be/0x1e50
arch_do_signal_or_restart+0x244/0x17f0
exit_to_user_mode_prepare+0xcb/0x120
syscall_exit_to_user_mode+0x1d/0x40
do_syscall_64+0x4d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

Note, the underlying bug existed even before commit 1af4a96025b3 ("KVM:
x86/mmu: Yield in TDU MMU iter even if no SPTES changed") moved calls to
tdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could still
incorrectly advance past a top-level entry when yielding on a lower-level
entry. But with respect to leaking shadow pages, the bug was introduced
by yielding before processing the current gfn.

Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, or
callers could jump to their "retry" label. The downside of that approach
is that tdp_mmu_iter_cond_resched() _must_ be called before anything else
in the loop, and there's no easy way to enforce that requirement.

Ideally, KVM would handle the cond_resched() fully within the iterator
macro (the code is actually quite clean) and avoid this entire class of
bugs, but that is extremely difficult do wh
---truncated---
Impact
Base Score 3.x
7.10
Severity 3.x
HIGH
Vulnerable products and versions
| CPE | From | Up to |
|---|---|---|
| cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:* | 5.10 (including) | 5.15.12 (excluding) |
| cpe:2.3:o:linux:linux_kernel:5.16:rc1:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:5.16:rc2:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:5.16:rc3:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:5.16:rc4:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:5.16:rc5:*:*:*:*:*:* | | |
| cpe:2.3:o:linux:linux_kernel:5.16:rc6:*:*:*:*:*:* | | |
To consult the complete list of CPE names with products and versions, see this page
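
The ranges in the table can be turned into a simple version check. The sketch below is only an approximation under stated assumptions: `struct sketch_kver` and both functions are hypothetical, the `rc` field is consulted only for 5.16, and a matching version number does not by itself show that a given build lacks the fix, since vendor kernels may carry backports.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical version record; rc == 0 means a final release. */
struct sketch_kver {
    int major, minor, patch, rc;
};

/* Three-way compare on major.minor.patch, ignoring rc. */
static int sketch_cmp(struct sketch_kver a, struct sketch_kver b)
{
    if (a.major != b.major)
        return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor)
        return a.minor < b.minor ? -1 : 1;
    if (a.patch != b.patch)
        return a.patch < b.patch ? -1 : 1;
    return 0;
}

/* True if v matches a row of the table: 5.10 (including) up to
 * 5.15.12 (excluding), or one of 5.16-rc1 through 5.16-rc6. */
static bool sketch_is_listed(struct sketch_kver v)
{
    const struct sketch_kver from = { 5, 10, 0, 0 };
    const struct sketch_kver upto = { 5, 15, 12, 0 };

    assert(v.rc >= 0);
    if (v.major == 5 && v.minor == 16 && v.patch == 0)
        return v.rc >= 1 && v.rc <= 6;
    return sketch_cmp(v, from) >= 0 && sketch_cmp(v, upto) < 0;
}
```

For example, 5.15.11 and 5.16-rc3 are reported as listed, while 5.15.12, 5.9.16 and the final 5.16 release are not.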



