Instituto Nacional de Ciberseguridad (INCIBE). INCIBE-CERT section

CVE-2023-53178

Severity:
Pending analysis
Type:
Not available / Other
Publication date:
15/09/2025
Last modified:
15/09/2025

Description

In the Linux kernel, the following vulnerability has been resolved:

mm: fix zswap writeback race condition

The zswap writeback mechanism can cause a race condition resulting in memory corruption, where a swapped-out page gets swapped in with data that was written to a different page.

The race unfolds like this:
1. A page with data A and swap offset X is stored in zswap.
2. Page A is removed from the LRU by the zpool driver for writeback in the zswap-shrink work; the data for A is mapped by the zpool driver.
3. A user-space program faults and invalidates page entry A; offset X is considered free.
4. kswapd stores page B at offset X in zswap. (zswap could also be full; if so, page B would be written out to X and step 5 is skipped.)
5. Entry A is replaced by B in tree->rbroot; this doesn't affect the local reference held by the zswap-shrink work.
6. The zswap-shrink work writes back A at X and frees zswap entry A.
7. Swap-in of slot X brings A into memory instead of B.

The fix:
Once the swap page cache has been allocated (case ZSWAP_SWAPCACHE_NEW), the zswap-shrink work checks that the local zswap_entry reference is still the same as the one in the tree. If it is not, the entry has been either invalidated or replaced; in both cases the writeback is aborted because the local entry contains stale data.

Reproducer:
I originally found this by running `stress` overnight to validate my work on the zswap writeback mechanism; it manifested after hours on my test machine. The key to making it happen is having zswap writebacks, so whatever setup pumps /sys/kernel/debug/zswap/written_back_pages should do the trick.

To reproduce this faster on a VM, I set up a system with ~100M of available memory and a 500M swap file; running `stress --vm 1 --vm-bytes 300000000 --vm-stride 4000` then makes it happen in a matter of tens of minutes. One can speed things up even more by swinging /sys/module/zswap/parameters/max_pool_percent up and down between, say, 20 and 1; this makes it reproduce in tens of seconds. It's crucial to set `--vm-stride` to something other than 4096, otherwise `stress` won't realize that memory has been corrupted because all pages would have the same data.
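The seven-step interleaving and the entry check can be replayed deterministically in a small model. The following is a minimal Python sketch, not kernel code: `Entry`, `ToyZswap` and `run_race` are hypothetical names, the dictionaries stand in for tree->rbroot, the swap cache and the swap device, and the model assumes that swap-in consults the swap cache before zswap, which is one plausible reading of how step 7 ends up returning A.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: entries compare by identity, like kernel pointers
class Entry:
    """Toy stand-in for a zswap entry holding one page's data."""
    data: str


class ToyZswap:
    """Simplified, single-threaded model of the writeback path (an assumption,
    not the kernel's data structures)."""

    def __init__(self, check_entry):
        self.check_entry = check_entry  # True models the behaviour after the fix
        self.tree = {}        # swap offset -> Entry (stands in for tree->rbroot)
        self.swap_cache = {}  # swap offset -> page data held in the swap cache
        self.disk = {}        # swap offset -> page data on the swap device

    def store(self, offset, data):
        # Steps 1, 4 and 5: insert an entry, replacing any entry at that offset.
        self.tree[offset] = Entry(data)

    def invalidate(self, offset):
        # Step 3: the swap slot is freed and its entry leaves the tree.
        self.tree.pop(offset, None)

    def begin_writeback(self, offset):
        # Step 2: the shrink work takes a local reference to the entry.
        return self.tree[offset]

    def finish_writeback(self, offset, local):
        # The fix: abort if the tree no longer holds the same entry.
        if self.check_entry and self.tree.get(offset) is not local:
            return False
        # Step 6: fill the swap cache page from the local entry and write it out.
        self.swap_cache[offset] = local.data
        self.disk[offset] = local.data
        if self.tree.get(offset) is local:
            del self.tree[offset]
        return True

    def swapin(self, offset):
        # Step 7: swap cache first, then zswap, then the swap device.
        if offset in self.swap_cache:
            return self.swap_cache[offset]
        entry = self.tree.get(offset)
        return entry.data if entry is not None else self.disk.get(offset)


def run_race(check_entry):
    """Replay steps 1-7 in order and return what swap-in of slot X yields."""
    x = 7  # arbitrary swap offset
    z = ToyZswap(check_entry)
    z.store(x, "A")               # step 1
    local = z.begin_writeback(x)  # step 2
    z.invalidate(x)               # step 3
    z.store(x, "B")               # steps 4 and 5
    z.finish_writeback(x, local)  # step 6
    return z.swapin(x)            # step 7
```

In this model `run_race(False)` returns "A" (the stale data) while `run_race(True)` returns "B", because the identity check notices that the entry in the tree is no longer the one the shrink work holds.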

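The "swing max_pool_percent up and down" step of the reproducer can be scripted. The sketch below is only an illustration: the two paths come from the description above, but the function names, the cycle count and the one-second pause are assumptions, and it presumes root privileges, a mounted debugfs, and that `stress` is already running separately.

```python
import time
from pathlib import Path

# Paths quoted in the description above.
MAX_POOL_PERCENT = Path("/sys/module/zswap/parameters/max_pool_percent")
WRITTEN_BACK_PAGES = Path("/sys/kernel/debug/zswap/written_back_pages")


def read_counter(path=WRITTEN_BACK_PAGES):
    """Return the integer stored in a sysfs/debugfs-style file."""
    return int(Path(path).read_text().strip())


def swing_pool_percent(path=MAX_POOL_PERCENT, high=20, low=1, cycles=10, pause=1.0):
    """Alternate the zswap pool limit between `high` and `low`.

    `cycles` and `pause` are illustrative defaults, not values from the source.
    """
    path = Path(path)
    for _ in range(cycles):
        for value in (high, low):
            path.write_text(f"{value}\n")
            time.sleep(pause)
```

A possible use is to call `read_counter()` before and after `swing_pool_percent()` while `stress` runs; a growing count would indicate that writebacks are actually happening, which the description names as the precondition for the race.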
Impact