CVE-2024-53219

Severity CVSS v4.0: Pending analysis
Type: Unavailable / Other
Publication date: 27/12/2024
Last modified: 01/10/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

virtiofs: use pages instead of pointer for kernel direct IO

When trying to insert a 10MB kernel module kept in a virtio-fs with cache
disabled, the following warning was reported:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ......
Modules linked in:
CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
RIP: 0010:__alloc_pages+0x2bf/0x380
......
Call Trace:
? __warn+0x8e/0x150
? __alloc_pages+0x2bf/0x380
__kmalloc_large_node+0x86/0x160
__kmalloc+0x33c/0x480
virtio_fs_enqueue_req+0x240/0x6d0
virtio_fs_wake_pending_and_unlock+0x7f/0x190
queue_request_and_unlock+0x55/0x60
fuse_simple_request+0x152/0x2b0
fuse_direct_io+0x5d2/0x8c0
fuse_file_read_iter+0x121/0x160
__kernel_read+0x151/0x2d0
kernel_read+0x45/0x50
kernel_read_file+0x1a9/0x2a0
init_module_from_file+0x6a/0xe0
idempotent_init_module+0x175/0x230
__x64_sys_finit_module+0x5d/0xb0
x64_sys_call+0x1c3/0x9e0
do_syscall_64+0x3d/0xc0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
......
---[ end trace 0000000000000000 ]---

The warning is triggered as follows:

1) syscall finit_module() handles the module insertion and it invokes
kernel_read_file() to read the content of the module first.

2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
passes it to kernel_read(). kernel_read() constructs a kvec iter by
using iov_iter_kvec() and passes it to fuse_file_read_iter().

3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
fuse_direct_io(). As for now, the maximal read size for kvec iter is
only limited by fc->max_read.
For virtio-fs, max_read is UINT_MAX, so
fuse_direct_io() doesn't split the 10MB buffer. It saves the address and
the size of the 10MB-sized buffer in out_args[0] of a fuse request and
passes the fuse request to virtio_fs_wake_pending_and_unlock().

4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
queue the request. Because virtiofs needs a DMA-able address,
virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for
all fuse args, copies these args into the bounce buffer and passes the
physical address of the bounce buffer to virtiofsd. The total length of
these fuse args for the passed fuse request is about 10MB, so
copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and
it triggers the warning in __alloc_pages():

    if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
        return NULL;

5) virtio_fs_enqueue_req() will retry the memory allocation in a
kworker, but it won't help, because kmalloc() will always return NULL
due to the abnormal size and finit_module() will hang forever.

A feasible solution is to limit the value of max_read for virtio-fs, so
the length passed to kmalloc() will be limited. However, it will affect
the maximal read size for normal reads. Writes initiated from the kernel
on virtio-fs have a similar problem, but there is currently no way to
limit fc->max_write in the kernel.

So instead of limiting both the values of max_read and max_write in
the kernel, introduce use_pages_for_kvec_io in fuse_conn and set it to
true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use
pages instead of a pointer to pass the KVEC_IO data.

After switching to pages for KVEC_IO data, these pages will be used for
DMA through virtio-fs.
If these pages are backed by vmalloc(),
{flush|invalidate}_kernel_vmap_range() are necessary to flush or
invalidate the cache before the DMA operation. So add two new fields in
fuse_args_pages to record the base address of vmalloc area and the
condition indicating whether invalidation is needed. Perform the flush
in fuse_get_user_pages() for write operations and the invalidation in
fuse_release_user_pages() for read operations.

It may seem necessary to introduce another fie
---truncated---
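
The failure in step 4 comes down to allocation-order arithmetic, which the following minimal sketch reproduces outside the kernel. It is illustrative only: the 4 KiB page size and the MAX_PAGE_ORDER value of 10 are assumptions (typical x86-64 defaults, not stated in the advisory), and get_order() here is a plain Python re-implementation of the idea, not the kernel helper.

```python
# Assumptions (not taken from the advisory): 4 KiB pages and
# MAX_PAGE_ORDER = 10, which are common x86-64 defaults.
PAGE_SIZE = 4096
MAX_PAGE_ORDER = 10


def get_order(size: int) -> int:
    """Return the smallest order such that PAGE_SIZE << order >= size."""
    if size <= 0:
        raise ValueError("size must be positive")
    pages = (size + PAGE_SIZE - 1) // PAGE_SIZE  # round up to whole pages
    # bit_length() of (pages - 1) is ceil(log2(pages)) for pages >= 1
    return (pages - 1).bit_length()


def allocation_would_warn(size: int) -> bool:
    """Mirror the quoted check: order > MAX_PAGE_ORDER means failure."""
    return get_order(size) > MAX_PAGE_ORDER


if __name__ == "__main__":
    ten_mb = 10 * 1024 * 1024
    print("pages needed:", (ten_mb + PAGE_SIZE - 1) // PAGE_SIZE)
    print("order needed:", get_order(ten_mb))
    print("exceeds MAX_PAGE_ORDER:", allocation_would_warn(ten_mb))
```

Under those assumptions a 10 MiB request needs an order-12 allocation, while the largest physically contiguous allocation allowed is order 10 (4 MiB). The request therefore fails on every retry, which is consistent with the hang described in step 5 and with the decision to pass the data as individual pages rather than copying it into one contiguous bounce buffer.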

Vulnerable products and versions

CPE                                             From              Up to
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*    5.4 (including)   6.11.11 (excluding)
cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*    6.12 (including)  6.12.2 (excluding)
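
The two ranges above can be checked mechanically. The sketch below is a hypothetical helper, not part of any official tooling: the function names are invented for illustration, and it compares only the upstream major.minor.patch numbers. Distribution kernels often backport fixes without changing the upstream version number, so a match here is a hint to investigate, not proof of exposure.

```python
import re

# Affected ranges copied from the table: (inclusive start, exclusive end).
AFFECTED_RANGES = [
    ((5, 4, 0), (6, 11, 11)),
    ((6, 12, 0), (6, 12, 2)),
]


def parse_version(release: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '6.9.0-rc5+'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_affected(release: str) -> bool:
    """True if the upstream version falls inside one of the listed ranges."""
    version = parse_version(release)
    return any(start <= version < end for start, end in AFFECTED_RANGES)


if __name__ == "__main__":
    for release in ("5.3.18", "6.9.0-rc5+", "6.11.11", "6.12.1", "6.12.2"):
        print(release, is_affected(release))
```

On a running system the input string could come from `platform.release()`, though the caveat about vendor backports applies in full there.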