CVE-2023-53178
Severity:
Pending analysis
Type:
Not Available / Other type
Publication date:
15/09/2025
Last modified:
15/09/2025
Description
In the Linux kernel, the following vulnerability has been resolved:

mm: fix zswap writeback race condition

The zswap writeback mechanism can cause a race condition resulting in
memory corruption, where a swapped out page gets swapped in with data that
was written to a different page.

The race unfolds like this (see the sketch after the list):
1. a page with data A and swap offset X is stored in zswap
2. page A is removed from the LRU by the zpool driver for writeback in
   zswap-shrink work; the data for A is mapped by the zpool driver
3. a user space program faults and invalidates page entry A; offset X is
   considered free
4. kswapd stores page B at offset X in zswap (zswap could also be full;
   if so, page B is written to X by regular swap I/O, skipping step 5)
5. entry A is replaced by B in tree->rbroot; this doesn't affect the
   local reference held by the zswap-shrink work
6. zswap-shrink work writes back A at X, and frees zswap entry A
7. swapin of slot X brings A into memory instead of B

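The interleaving can be illustrated with a minimal, single-threaded
user-space C sketch (this is not kernel code; struct entry, tree[] and
swap_dev[] are illustrative stand-ins for the zswap entries, tree->rbroot
and the backing swap slots):

    #include <stdio.h>
    #include <string.h>

    #define NSLOTS 4

    struct entry {
        char data[16];                 /* payload the entry decompresses to */
    };

    static struct entry *tree[NSLOTS]; /* stand-in for tree->rbroot */
    static char swap_dev[NSLOTS][16];  /* stand-in for the backing swap slots */

    int main(void)
    {
        struct entry a = { "data-A" };
        struct entry b = { "data-B" };
        int x = 1;                     /* swap offset X */

        /* 1. page with data A at offset X is stored in zswap */
        tree[x] = &a;

        /* 2. zswap-shrink work picks A for writeback, keeps a local reference */
        struct entry *local_ref = tree[x];

        /* 3. user space invalidates entry A; offset X is considered free */
        tree[x] = NULL;

        /* 4.-5. offset X is reused for page B; the tree now points at B, but
         *       the reference held by the shrink work is unaffected */
        tree[x] = &b;

        /* 6. shrink work writes back its stale local reference at X */
        memcpy(swap_dev[x], local_ref->data, sizeof(swap_dev[x]));

        /* 7. swapin of slot X now reads A instead of B: corruption */
        printf("slot X contains \"%s\", expected \"%s\"\n", swap_dev[x], b.data);
        return 0;
    }
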
The fix:
Once the swap page cache has been allocated (case ZSWAP_SWAPCACHE_NEW),
zswap-shrink work simply checks that the local zswap_entry reference is
still the same as the one in the tree. If it is not the same, the entry has
either been invalidated or replaced; in both cases the writeback is
aborted because the local entry contains stale data.

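A minimal sketch of that check, reusing the same illustrative user-space
model as the earlier sketch (writeback_entry and the array names are made
up for illustration; they are not the actual kernel functions):

    #include <stdio.h>
    #include <string.h>

    #define NSLOTS 4

    struct entry {
        char data[16];
    };

    static struct entry *tree[NSLOTS]; /* stand-in for tree->rbroot */
    static char swap_dev[NSLOTS][16];  /* stand-in for the backing swap slots */

    /* Abort the writeback if the tree no longer points at the entry the
     * shrink work is holding: it has been invalidated or replaced, so the
     * locally held data is stale. Per the description above, this check is
     * made once the swap cache page has been allocated (ZSWAP_SWAPCACHE_NEW). */
    static int writeback_entry(struct entry *local_ref, int offset)
    {
        if (tree[offset] != local_ref)
            return -1;                 /* stale entry: skip the write */

        memcpy(swap_dev[offset], local_ref->data, sizeof(swap_dev[offset]));
        return 0;
    }

    int main(void)
    {
        struct entry a = { "data-A" }, b = { "data-B" };
        int x = 1;

        struct entry *local_ref = &a;  /* reference taken before the race */
        tree[x] = &b;                  /* offset X was invalidated and reused for B */

        if (writeback_entry(local_ref, x) < 0)
            printf("stale entry at offset %d, writeback aborted\n", x);
        return 0;
    }
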
Reproducer:
I originally found this by running `stress` overnight to validate my work
on the zswap writeback mechanism; it manifested after hours on my test
machine. The key to making it happen is having zswap writebacks, so
whatever setup pumps /sys/kernel/debug/zswap/written_back_pages should do
the trick.

In order to reproduce this faster on a VM, I set up a system with ~100M of
available memory and a 500M swap file; then running `stress --vm 1
--vm-bytes 300000000 --vm-stride 4000` makes it happen in a matter of tens
of minutes. One can speed things up even more by swinging
/sys/module/zswap/parameters/max_pool_percent up and down between, say, 20
and 1; this makes it reproduce in tens of seconds. It's crucial to set
`--vm-stride` to something other than 4096, otherwise `stress` won't
realize that memory has been corrupted because all pages would have the
same data.
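
For reference, a small illustrative helper (not part of the original
report) that automates the max_pool_percent swing described above; it must
run as root while `stress` runs in another shell. The 20/1 values follow
the suggestion above, while the 5-second interval is an arbitrary choice:

    #include <stdio.h>
    #include <unistd.h>

    /* Path taken from the description above; writing it requires root. */
    static const char *param = "/sys/module/zswap/parameters/max_pool_percent";

    static int write_param(const char *val)
    {
        FILE *f = fopen(param, "w");

        if (!f) {
            perror(param);
            return -1;
        }
        fputs(val, f);
        return fclose(f);
    }

    int main(void)
    {
        /* Alternate between a large and a tiny pool to keep forcing zswap
         * writebacks while `stress` is running. */
        for (;;) {
            if (write_param("20"))
                return 1;
            sleep(5);
            if (write_param("1"))
                return 1;
            sleep(5);
        }
    }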