CVE-2025-38389
Publication date:
25/07/2025
In the Linux kernel, the following vulnerability has been resolved:<br />
<br />
drm/i915/gt: Fix timeline left held on VMA alloc error<br />
<br />
The following error has been reported sporadically by CI when a test<br />
unbinds the i915 driver on a ring submission platform:<br />
<br />
[239.330153] ------------[ cut here ]------------<br />
[239.330166] i915 0000:00:02.0: [drm] drm_WARN_ON(dev_priv->mm.shrink_count)<br />
[239.330196] WARNING: CPU: 1 PID: 18570 at drivers/gpu/drm/i915/i915_gem.c:1309 i915_gem_cleanup_early+0x13e/0x150 [i915]<br />
...<br />
[239.330640] RIP: 0010:i915_gem_cleanup_early+0x13e/0x150 [i915]<br />
...<br />
[239.330942] Call Trace:<br />
[239.330944] <br />
[239.330949] i915_driver_late_release+0x2b/0xa0 [i915]<br />
[239.331202] i915_driver_release+0x86/0xa0 [i915]<br />
[239.331482] devm_drm_dev_init_release+0x61/0x90<br />
[239.331494] devm_action_release+0x15/0x30<br />
[239.331504] release_nodes+0x3d/0x120<br />
[239.331517] devres_release_all+0x96/0xd0<br />
[239.331533] device_unbind_cleanup+0x12/0x80<br />
[239.331543] device_release_driver_internal+0x23a/0x280<br />
[239.331550] ? bus_find_device+0xa5/0xe0<br />
[239.331563] device_driver_detach+0x14/0x20<br />
...<br />
[357.719679] ---[ end trace 0000000000000000 ]---<br />
<br />
If the test also unloads the i915 module, this is followed by:<br />
<br />
[357.787478] =============================================================================<br />
[357.788006] BUG i915_vma (Tainted: G U W N ): Objects remaining on __kmem_cache_shutdown()<br />
[357.788031] -----------------------------------------------------------------------------<br />
[357.788204] Object 0xffff888109e7f480 @offset=29824<br />
[357.788670] Allocated in i915_vma_instance+0xee/0xc10 [i915] age=292729 cpu=4 pid=2244<br />
[357.788994] i915_vma_instance+0xee/0xc10 [i915]<br />
[357.789290] init_status_page+0x7b/0x420 [i915]<br />
[357.789532] intel_engines_init+0x1d8/0x980 [i915]<br />
[357.789772] intel_gt_init+0x175/0x450 [i915]<br />
[357.790014] i915_gem_init+0x113/0x340 [i915]<br />
[357.790281] i915_driver_probe+0x847/0xed0 [i915]<br />
[357.790504] i915_pci_probe+0xe6/0x220 [i915]<br />
...<br />
<br />
Closer analysis of CI results history has revealed a dependency of the<br />
error on a few IGT tests, namely:<br />
- igt@api_intel_allocator@fork-simple-stress-signal,<br />
- igt@api_intel_allocator@two-level-inception-interruptible,<br />
- igt@gem_linear_blits@interruptible,<br />
- igt@prime_mmap_coherency@ioctl-errors,<br />
which silently trigger the issue; it then surfaces on the first driver<br />
unbind attempt.<br />
<br />
All of the above tests perform actions which are actively interrupted by<br />
signals. Further debugging narrowed the scope down to<br />
DRM_IOCTL_I915_GEM_EXECBUFFER2 and, in particular, to ring_context_alloc(),<br />
which is specific to ring submission.<br />
<br />
On success, that function, or its execlists or GuC submission<br />
equivalent, is supposed to be called only once per GEM context engine,<br />
after which a flag is raised that prevents it from being called<br />
again. The function is expected to unwind its internal errors itself, so<br />
that it can safely be called again after it returns an error.<br />
<br />
In the case of ring submission, the function first gets a reference to the<br />
engine's legacy timeline and then allocates a VMA. If the VMA allocation<br />
fails, e.g. when i915_vma_instance(), called from inside, is interrupted<br />
by a signal, then ring_context_alloc() fails and leaves the timeline<br />
reference held. On the next I915_GEM_EXECBUFFER2 ioctl, another reference to<br />
the timeline is taken, and only that last one is put on successful completion.<br />
As a consequence, the legacy timeline, with its underlying engine status<br />
page's VMA object, is still held and not released on driver unbind.<br />
<br />
Fix this by getting the legacy timeline only after successful allocation<br />
of the context engine's VMA.<br />
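
The ordering problem described above can be illustrated with a small user-space sketch. This is not the i915 code: the structures, the helper names (timeline_get(), timeline_put(), alloc_vma(), context_alloc_buggy(), context_alloc_fixed(), leaked_refs()) and the use of -EINTR as the failure code are all hypothetical stand-ins chosen for illustration. It only models a plain reference counter and a fallible allocation step, assuming one interrupted attempt followed by one successful retry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real i915 structures. */
struct timeline { int refcount; };
struct context  { struct timeline *timeline; bool allocated; };

static struct timeline *timeline_get(struct timeline *tl)
{
    tl->refcount++;
    return tl;
}

static void timeline_put(struct timeline *tl)
{
    assert(tl->refcount > 0);
    tl->refcount--;
}

/* Simulated VMA allocation; fails when "interrupted by a signal". */
static int alloc_vma(bool interrupted)
{
    return interrupted ? -EINTR : 0;
}

/* Buggy order: the reference is taken before the step that can fail. */
static int context_alloc_buggy(struct context *ce, struct timeline *tl,
                               bool interrupted)
{
    int err;

    ce->timeline = timeline_get(tl);
    err = alloc_vma(interrupted);
    if (err)
        return err;             /* reference taken above is never put */

    ce->allocated = true;
    return 0;
}

/* Fixed order: the reference is taken only after the allocation succeeds. */
static int context_alloc_fixed(struct context *ce, struct timeline *tl,
                               bool interrupted)
{
    int err = alloc_vma(interrupted);

    if (err)
        return err;             /* nothing to unwind */

    ce->timeline = timeline_get(tl);
    ce->allocated = true;
    return 0;
}

/*
 * One interrupted attempt, one successful retry, then a single put at
 * teardown. Returns the number of references still held afterwards,
 * or -1 if the simulated calls did not behave as expected.
 */
int leaked_refs(bool use_fix)
{
    struct timeline tl = { .refcount = 0 };
    struct context ce = { .timeline = NULL, .allocated = false };
    int (*alloc)(struct context *, struct timeline *, bool) =
        use_fix ? context_alloc_fixed : context_alloc_buggy;

    if (alloc(&ce, &tl, true) != -EINTR)
        return -1;
    if (alloc(&ce, &tl, false) != 0)
        return -1;

    timeline_put(ce.timeline);
    return tl.refcount;
}
```

In this model, leaked_refs(false) leaves one reference outstanding after teardown, which corresponds to the timeline (and its status page VMA) surviving driver unbind, while leaked_refs(true) leaves none.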
<br />
v2: Add a note on other submission methods (Krzysztof Karas):<br />
Both execlists and GuC submission use lrc_alloc() which seems free<br />
from a similar issue.<br />
<br />
(cherry picked from commit cc43422b3cc79eacff4c5a8ba0d224688ca9dd4f)
Severity CVSS v4.0: Pending analysis
Last modification:
16/12/2025