CVE-2025-37988
Severity CVSS v4.0:
Pending analysis
Type:
Unavailable / Other
Publication date:
20/05/2025
Last modified:
21/05/2025
Description
In the Linux kernel, the following vulnerability has been resolved:<br />
<br />
fix a couple of races in MNT_TREE_BENEATH handling by do_move_mount()<br />
<br />
Normally do_lock_mount(path, _) is locking a mountpoint pinned by<br />
*path and at the time when matching unlock_mount() unlocks that<br />
location it is still pinned by the same thing.<br />
<br />
Unfortunately, for the 'beneath' case it's no longer that simple -<br />
the object being locked is not the one *path points to. It's the<br />
mountpoint of path->mnt. The thing is, without sufficient locking<br />
->mnt_parent may change under us and none of the locks are held<br />
at that point. The rules are:<br />
* mount_lock stabilizes m->mnt_parent for any mount m.<br />
* namespace_sem stabilizes m->mnt_parent, provided that<br />
m is mounted.<br />
* if either of the above holds and refcount of m is positive,<br />
we are guaranteed the same for refcount of m->mnt_parent.<br />
<br />
namespace_sem nests inside inode_lock(), so do_lock_mount() has<br />
to take inode_lock() before grabbing namespace_sem. It does<br />
recheck that path->mnt is still mounted in the same place after<br />
getting namespace_sem, and it does take care to pin the dentry.<br />
The pin is needed, since otherwise a racing mount --move<br />
(or umount) could happen while we were getting locks; in that case<br />
the dentry would no longer be a mountpoint and could've been evicted<br />
on memory pressure along with its inode - not something you want<br />
when grabbing a lock on that inode.<br />
<br />
However, pinning a dentry is not enough - the matching mount is<br />
also pinned only by the fact that path->mnt is mounted on top of it,<br />
and at that point we are not holding any locks whatsoever, so<br />
the same kind of races could end up with all references to<br />
that mount gone just as we are about to enter inode_lock().<br />
If that happens, we are left with the filesystem being shut down while<br />
we are holding a dentry reference on it; results are not pretty.<br />
<br />
What we need to do is grab both dentry and mount at the same time;<br />
that makes inode_lock() safe *and* avoids the problem with fs getting<br />
shut down under us. After taking namespace_sem we verify that<br />
path->mnt is still mounted (which stabilizes its ->mnt_parent) and<br />
check that it's still mounted at the same place. From that point<br />
on to the matching namespace_unlock() we are guaranteed that the<br />
mount/dentry pair we'd grabbed is also pinned by being the mountpoint<br />
of path->mnt, so we can quietly drop both the dentry reference (as<br />
the current code does) and the mount one - it's OK to do under namespace_sem,<br />
since we are not dropping the final refs.<br />
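The pin, recheck, unpin sequence described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: every `toy_*` name is hypothetical, plain integers stand in for reference counts, no real locking is performed, and none of it is the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; NOT the kernel's structures. */
struct toy_dentry { int refcount; };

struct toy_mount {
    int refcount;
    bool mounted;                   /* models "m is mounted" */
    struct toy_mount *parent;       /* models m->mnt_parent */
    struct toy_dentry *mountpoint;  /* models the dentry m is mounted on */
};

/* The mount/dentry pair pinned before sleeping on the inode lock. */
struct toy_pin {
    struct toy_mount *mnt;
    struct toy_dentry *dentry;
};

/* Step 1: grab both references at the same time (in the kernel this
 * would happen under a lock that stabilizes ->mnt_parent). */
static inline struct toy_pin toy_pin_beneath(struct toy_mount *m)
{
    struct toy_pin pin = { m->parent, m->mountpoint };
    pin.mnt->refcount++;
    pin.dentry->refcount++;
    return pin;
}

/* Step 2: with the (modelled) namespace lock held, recheck that m is
 * still mounted, and still mounted at the place we pinned. */
static inline bool toy_still_there(const struct toy_mount *m,
                                   const struct toy_pin *pin)
{
    return m->mounted && m->parent == pin->mnt &&
           m->mountpoint == pin->dentry;
}

/* Step 3: drop our extra references. */
static inline void toy_unpin(struct toy_pin *pin)
{
    assert(pin->mnt->refcount > 0 && pin->dentry->refcount > 0);
    pin->mnt->refcount--;
    pin->dentry->refcount--;
}

/* Quiet case: nothing races. After unpinning, the pair is still held
 * by its original references. Returns true if everything checks out. */
static inline bool toy_scenario_quiet(void)
{
    struct toy_dentry d = { 1 };
    struct toy_mount parent = { 1, true, NULL, NULL };
    struct toy_mount m = { 1, true, &parent, &d };

    struct toy_pin pin = toy_pin_beneath(&m);
    bool ok = toy_still_there(&m, &pin);
    toy_unpin(&pin);
    return ok && parent.refcount == 1 && d.refcount == 1;
}

/* Racing case: m is moved away and the old location loses its other
 * references while we wait. Returns true if our pins kept both objects
 * alive and the recheck noticed the move. */
static inline bool toy_scenario_raced(void)
{
    struct toy_dentry old_d = { 1 }, new_d = { 1 };
    struct toy_mount old_parent = { 1, true, NULL, NULL };
    struct toy_mount new_parent = { 1, true, NULL, NULL };
    struct toy_mount m = { 1, true, &old_parent, &old_d };

    struct toy_pin pin = toy_pin_beneath(&m);

    /* Simulated race: mount --move plus release of the old location. */
    m.parent = &new_parent;
    m.mountpoint = &new_d;
    old_parent.refcount--;
    old_d.refcount--;

    bool alive = old_parent.refcount > 0 && old_d.refcount > 0;
    bool moved = !toy_still_there(&m, &pin);
    toy_unpin(&pin);
    return alive && moved;
}
```

In the racing scenario the extra mount reference is what keeps the old parent's count above zero; pinning the dentry alone would leave the modelled mount with no references at the point where the inode lock is about to be taken.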
<br />
That solves the problem on the do_lock_mount() side; unlock_mount()<br />
also has one, since the dentry is guaranteed to stay pinned only until<br />
the namespace_unlock(). That's easy to fix - just have inode_unlock()<br />
done earlier, while it's still pinned by mp->m_dentry.
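
The ordering point on the unlock side can be illustrated the same way. The sketch below is a hypothetical userspace model, not the kernel's unlock_mount(): the `toy_*` names are invented, and a boolean flag stands in for "the dentry is still pinned because the namespace lock has not been released yet".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration; NOT the kernel's structures. */
struct toy_inode { bool locked; };

struct toy_mountpoint {
    struct toy_inode *inode;
    bool pinned;   /* true while the namespace lock guarantees the pin */
};

/* Models namespace_unlock(): afterwards nothing guarantees that the
 * mountpoint's dentry (and so its inode) stays around. */
static inline void toy_namespace_unlock(struct toy_mountpoint *mp)
{
    mp->pinned = false;
}

/* Models inode_unlock(); reports whether the inode was still
 * guaranteed to exist at the moment we touched it. */
static inline bool toy_inode_unlock(struct toy_mountpoint *mp)
{
    bool safe = mp->pinned;
    assert(mp->inode->locked);
    mp->inode->locked = false;
    return safe;
}

/* Order described as the fix: unlock the inode while still pinned. */
static inline bool toy_unlock_early(struct toy_mountpoint *mp)
{
    bool safe = toy_inode_unlock(mp);
    toy_namespace_unlock(mp);
    return safe;
}

/* Problematic order: the pin guarantee is gone before the inode unlock. */
static inline bool toy_unlock_late(struct toy_mountpoint *mp)
{
    toy_namespace_unlock(mp);
    return toy_inode_unlock(mp);
}
```

Calling `toy_unlock_early()` on a pinned, locked mountpoint reports a safe unlock, while `toy_unlock_late()` reports an unlock performed after the guarantee had already lapsed.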