linux-stable/fs/notify/notification.c

194 lines
5.7 KiB
C
Raw Permalink Normal View History

// SPDX-License-Identifier: GPL-2.0-or-later
fsnotify: unified filesystem notification backend fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-05-21 21:01:20 +00:00
/*
* Copyright (C) 2008 Red Hat, Inc., Eric Paris <eparis@redhat.com>
*/
/*
* Basic idea behind the notification queue: An fsnotify group (like inotify)
* sends the userspace notification about events asynchronously some time after
* the event happened. When inotify gets an event it will need to add that
* event to the group notify queue. Since a single event might need to be on
* multiple group's notification queues we can't add the event directly to each
* queue and instead add a small "event_holder" to each queue. This event_holder
* has a pointer back to the original event. Since the majority of events are
* going to end up on one, and only one, notification queue we embed one
* event_holder into each event. This means we have a single allocation instead
* of always needing two. If the embedded event_holder is already in use by
* another group a new event_holder (from fsnotify_event_holder_cachep) will be
* allocated and used.
*/
fsnotify: unified filesystem notification backend fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-05-21 21:01:20 +00:00
#include <linux/fs.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/module.h>
fsnotify: unified filesystem notification backend fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-05-21 21:01:20 +00:00
#include <linux/mount.h>
#include <linux/mutex.h>
#include <linux/namei.h>
#include <linux/path.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/atomic.h>
fsnotify: unified filesystem notification backend fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-05-21 21:01:20 +00:00
#include <linux/fsnotify_backend.h>
#include "fsnotify.h"
static atomic_t fsnotify_sync_cookie = ATOMIC_INIT(0);
/**
* fsnotify_get_cookie - return a unique cookie for use in synchronizing events.
* Called from fsnotify_move, which is inlined into filesystem modules.
*/
u32 fsnotify_get_cookie(void)
{
return atomic_inc_return(&fsnotify_sync_cookie);
}
EXPORT_SYMBOL_GPL(fsnotify_get_cookie);
fsnotify: do not share events between notification groups Currently fsnotify framework creates one event structure for each notification event and links this event into all interested notification groups. This is done so that we save memory when several notification groups are interested in the event. However the need for event structure shared between inotify & fanotify bloats the event structure so the result is often higher memory consumption. Another problem is that fsnotify framework keeps path references with outstanding events so that fanotify can return open file descriptors with its events. This has the undesirable effect that filesystem cannot be unmounted while there are outstanding events - a regression for inotify compared to a situation before it was converted to fsnotify framework. For fanotify this problem is hard to avoid and users of fanotify should kind of expect this behavior when they ask for file descriptors from notified files. This patch changes fsnotify and its users to create separate event structure for each group. This allows for much simpler code (~400 lines removed by this patch) and also smaller event structures. For example on 64-bit system original struct fsnotify_event consumes 120 bytes, plus additional space for file name, additional 24 bytes for second and each subsequent group linking the event, and additional 32 bytes for each inotify group for private data. After the conversion inotify event consumes 48 bytes plus space for file name which is considerably less memory unless file names are long and there are several groups interested in the events (both of which are uncommon). Fanotify event fits in 56 bytes after the conversion (fanotify doesn't care about file names so its events don't have to have it allocated). A win unless there are four or more fanotify groups interested in the event. The conversion also solves the problem with unmount when only inotify is used as we don't have to grab path references for inotify events. [hughd@google.com: fanotify: fix corruption preventing startup] Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 23:48:14 +00:00
void fsnotify_destroy_event(struct fsnotify_group *group,
struct fsnotify_event *event)
fsnotify: unified filesystem notification backend fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-05-21 21:01:20 +00:00
{
fsnotify: do not share events between notification groups Currently fsnotify framework creates one event structure for each notification event and links this event into all interested notification groups. This is done so that we save memory when several notification groups are interested in the event. However the need for event structure shared between inotify & fanotify bloats the event structure so the result is often higher memory consumption. Another problem is that fsnotify framework keeps path references with outstanding events so that fanotify can return open file descriptors with its events. This has the undesirable effect that filesystem cannot be unmounted while there are outstanding events - a regression for inotify compared to a situation before it was converted to fsnotify framework. For fanotify this problem is hard to avoid and users of fanotify should kind of expect this behavior when they ask for file descriptors from notified files. This patch changes fsnotify and its users to create separate event structure for each group. This allows for much simpler code (~400 lines removed by this patch) and also smaller event structures. For example on 64-bit system original struct fsnotify_event consumes 120 bytes, plus additional space for file name, additional 24 bytes for second and each subsequent group linking the event, and additional 32 bytes for each inotify group for private data. After the conversion inotify event consumes 48 bytes plus space for file name which is considerably less memory unless file names are long and there are several groups interested in the events (both of which are uncommon). Fanotify event fits in 56 bytes after the conversion (fanotify doesn't care about file names so its events don't have to have it allocated). A win unless there are four or more fanotify groups interested in the event. The conversion also solves the problem with unmount when only inotify is used as we don't have to grab path references for inotify events. [hughd@google.com: fanotify: fix corruption preventing startup] Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 23:48:14 +00:00
/* Overflow events are per-group and we don't want to free them */
if (!event || event == group->overflow_event)
fsnotify: unified filesystem notification backend fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-05-21 21:01:20 +00:00
return;
/*
* If the event is still queued, we have a problem... Do an unreliable
* lockless check first to avoid locking in the common case. The
* locking may be necessary for permission events which got removed
* from the list by a different CPU than the one freeing the event.
*/
if (!list_empty(&event->list)) {
spin_lock(&group->notification_lock);
WARN_ON(!list_empty(&event->list));
spin_unlock(&group->notification_lock);
}
group->ops->free_event(group, event);
}
fsnotify: unified filesystem notification backend fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-05-21 21:01:20 +00:00
/*
* Try to add an event to the notification queue.
* The group can later pull this event off the queue to deal with.
* The group can use the @merge hook to merge the event with a queued event.
* The group can use the @insert hook to insert the event into hash table.
* The function returns:
* 0 if the event was added to a queue
* 1 if the event was merged with some other queued event
* 2 if the event was not queued - either the queue of events has overflown
* or the group is shutting down.
fsnotify: unified filesystem notification backend fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-05-21 21:01:20 +00:00
*/
int fsnotify_insert_event(struct fsnotify_group *group,
struct fsnotify_event *event,
int (*merge)(struct fsnotify_group *,
struct fsnotify_event *),
void (*insert)(struct fsnotify_group *,
struct fsnotify_event *))
{
int ret = 0;
struct list_head *list = &group->notification_list;
fsnotify: do not share events between notification groups Currently fsnotify framework creates one event structure for each notification event and links this event into all interested notification groups. This is done so that we save memory when several notification groups are interested in the event. However the need for event structure shared between inotify & fanotify bloats the event structure so the result is often higher memory consumption. Another problem is that fsnotify framework keeps path references with outstanding events so that fanotify can return open file descriptors with its events. This has the undesirable effect that filesystem cannot be unmounted while there are outstanding events - a regression for inotify compared to a situation before it was converted to fsnotify framework. For fanotify this problem is hard to avoid and users of fanotify should kind of expect this behavior when they ask for file descriptors from notified files. This patch changes fsnotify and its users to create separate event structure for each group. This allows for much simpler code (~400 lines removed by this patch) and also smaller event structures. For example on 64-bit system original struct fsnotify_event consumes 120 bytes, plus additional space for file name, additional 24 bytes for second and each subsequent group linking the event, and additional 32 bytes for each inotify group for private data. After the conversion inotify event consumes 48 bytes plus space for file name which is considerably less memory unless file names are long and there are several groups interested in the events (both of which are uncommon). Fanotify event fits in 56 bytes after the conversion (fanotify doesn't care about file names so its events don't have to have it allocated). A win unless there are four or more fanotify groups interested in the event. The conversion also solves the problem with unmount when only inotify is used as we don't have to grab path references for inotify events. [hughd@google.com: fanotify: fix corruption preventing startup] Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 23:48:14 +00:00
pr_debug("%s: group=%p event=%p\n", __func__, group, event);
spin_lock(&group->notification_lock);
if (group->shutdown) {
spin_unlock(&group->notification_lock);
return 2;
}
if (event == group->overflow_event ||
group->q_len >= group->max_events) {
ret = 2;
fsnotify: do not share events between notification groups Currently fsnotify framework creates one event structure for each notification event and links this event into all interested notification groups. This is done so that we save memory when several notification groups are interested in the event. However the need for event structure shared between inotify & fanotify bloats the event structure so the result is often higher memory consumption. Another problem is that fsnotify framework keeps path references with outstanding events so that fanotify can return open file descriptors with its events. This has the undesirable effect that filesystem cannot be unmounted while there are outstanding events - a regression for inotify compared to a situation before it was converted to fsnotify framework. For fanotify this problem is hard to avoid and users of fanotify should kind of expect this behavior when they ask for file descriptors from notified files. This patch changes fsnotify and its users to create separate event structure for each group. This allows for much simpler code (~400 lines removed by this patch) and also smaller event structures. For example on 64-bit system original struct fsnotify_event consumes 120 bytes, plus additional space for file name, additional 24 bytes for second and each subsequent group linking the event, and additional 32 bytes for each inotify group for private data. After the conversion inotify event consumes 48 bytes plus space for file name which is considerably less memory unless file names are long and there are several groups interested in the events (both of which are uncommon). Fanotify event fits in 56 bytes after the conversion (fanotify doesn't care about file names so its events don't have to have it allocated). A win unless there are four or more fanotify groups interested in the event. The conversion also solves the problem with unmount when only inotify is used as we don't have to grab path references for inotify events. [hughd@google.com: fanotify: fix corruption preventing startup] Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 23:48:14 +00:00
/* Queue overflow event only if it isn't already queued */
if (!list_empty(&group->overflow_event->list)) {
spin_unlock(&group->notification_lock);
return ret;
}
event = group->overflow_event;
goto queue;
}
if (!list_empty(list) && merge) {
ret = merge(group, event);
if (ret) {
spin_unlock(&group->notification_lock);
return ret;
}
}
queue:
group->q_len++;
fsnotify: do not share events between notification groups Currently fsnotify framework creates one event structure for each notification event and links this event into all interested notification groups. This is done so that we save memory when several notification groups are interested in the event. However the need for event structure shared between inotify & fanotify bloats the event structure so the result is often higher memory consumption. Another problem is that fsnotify framework keeps path references with outstanding events so that fanotify can return open file descriptors with its events. This has the undesirable effect that filesystem cannot be unmounted while there are outstanding events - a regression for inotify compared to a situation before it was converted to fsnotify framework. For fanotify this problem is hard to avoid and users of fanotify should kind of expect this behavior when they ask for file descriptors from notified files. This patch changes fsnotify and its users to create separate event structure for each group. This allows for much simpler code (~400 lines removed by this patch) and also smaller event structures. For example on 64-bit system original struct fsnotify_event consumes 120 bytes, plus additional space for file name, additional 24 bytes for second and each subsequent group linking the event, and additional 32 bytes for each inotify group for private data. After the conversion inotify event consumes 48 bytes plus space for file name which is considerably less memory unless file names are long and there are several groups interested in the events (both of which are uncommon). Fanotify event fits in 56 bytes after the conversion (fanotify doesn't care about file names so its events don't have to have it allocated). A win unless there are four or more fanotify groups interested in the event. The conversion also solves the problem with unmount when only inotify is used as we don't have to grab path references for inotify events. [hughd@google.com: fanotify: fix corruption preventing startup] Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 23:48:14 +00:00
list_add_tail(&event->list, list);
if (insert)
insert(group, event);
spin_unlock(&group->notification_lock);
wake_up(&group->notification_waitq);
kill_fasync(&group->fsn_fa, SIGIO, POLL_IN);
return ret;
}
void fsnotify_remove_queued_event(struct fsnotify_group *group,
struct fsnotify_event *event)
{
assert_spin_locked(&group->notification_lock);
/*
* We need to init list head for the case of overflow event so that
* check in fsnotify_add_event() works
*/
list_del_init(&event->list);
group->q_len--;
}
/*
* Return the first event on the notification list without removing it.
* Returns NULL if the list is empty.
*/
struct fsnotify_event *fsnotify_peek_first_event(struct fsnotify_group *group)
fsnotify: unified filesystem notification backend fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-05-21 21:01:20 +00:00
{
assert_spin_locked(&group->notification_lock);
fsnotify: unified filesystem notification backend fsnotify is a backend for filesystem notification. fsnotify does not provide any userspace interface but does provide the basis needed for other notification schemes such as dnotify. fsnotify can be extended to be the backend for inotify or the upcoming fanotify. fsnotify provides a mechanism for "groups" to register for some set of filesystem events and to then deliver those events to those groups for processing. fsnotify has a number of benefits, the first being actually shrinking the size of an inode. Before fsnotify to support both dnotify and inotify an inode had unsigned long i_dnotify_mask; /* Directory notify events */ struct dnotify_struct *i_dnotify; /* for directory notifications */ struct list_head inotify_watches; /* watches on this inode */ struct mutex inotify_mutex; /* protects the watches list But with fsnotify this same functionallity (and more) is done with just __u32 i_fsnotify_mask; /* all events for this inode */ struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */ That's right, inotify, dnotify, and fanotify all in 64 bits. We used that much space just in inotify_watches alone, before this patch set. fsnotify object lifetime and locking is MUCH better than what we have today. inotify locking is incredibly complex. See 8f7b0ba1c8539 as an example of what's been busted since inception. inotify needs to know internal semantics of superblock destruction and unmounting to function. The inode pinning and vfs contortions are horrible. no fsnotify implementers do allocation under locks. This means things like f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to GFP_NOFS can be reverted. There are no longer any allocation rules when using or implementing your own fsnotify listener. fsnotify paves the way for fanotify. In brief fanotify is a notification mechanism that delivers the lisener both an 'event' and an open file descriptor to the object in question. This means that fanotify is pathname agnostic. Some on lkml may not care for the original companies or users that pushed for TALPA, but fanotify was designed with flexibility and input for other users in mind. The readahead group expressed interest in fanotify as it could be used to profile disk access on boot without breaking the audit system. The desktop search groups have also expressed interest in fanotify as it solves a number of the race conditions and problems present with managing inotify when more than a limited number of specific files are of interest. fanotify can provide for a userspace access control system which makes it a clean interface for AV vendors to hook without trying to do binary patching on the syscall table, LSM, and everywhere else they do their things today. With this patch series fanotify can be implemented in less than 1200 lines of easy to review code. Almost all of which is the socket based user interface. This patch series builds fsnotify to the point that it can implement dnotify and inotify_user. Patches exist and will be sent soon after acceptance to finish the in kernel inotify conversion (audit) and implement fanotify. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de>
2009-05-21 21:01:20 +00:00
if (fsnotify_notify_queue_is_empty(group))
return NULL;
return list_first_entry(&group->notification_list,
struct fsnotify_event, list);
}
/*
* Remove and return the first event from the notification list. It is the
* responsibility of the caller to destroy the obtained event
*/
struct fsnotify_event *fsnotify_remove_first_event(struct fsnotify_group *group)
{
struct fsnotify_event *event = fsnotify_peek_first_event(group);
if (!event)
return NULL;
pr_debug("%s: group=%p event=%p\n", __func__, group, event);
fsnotify_remove_queued_event(group, event);
return event;
}
/*
* Called when a group is being torn down to clean up any outstanding
* event notifications.
*/
void fsnotify_flush_notify(struct fsnotify_group *group)
{
struct fsnotify_event *event;
spin_lock(&group->notification_lock);
while (!fsnotify_notify_queue_is_empty(group)) {
event = fsnotify_remove_first_event(group);
spin_unlock(&group->notification_lock);
fsnotify: do not share events between notification groups Currently fsnotify framework creates one event structure for each notification event and links this event into all interested notification groups. This is done so that we save memory when several notification groups are interested in the event. However the need for event structure shared between inotify & fanotify bloats the event structure so the result is often higher memory consumption. Another problem is that fsnotify framework keeps path references with outstanding events so that fanotify can return open file descriptors with its events. This has the undesirable effect that filesystem cannot be unmounted while there are outstanding events - a regression for inotify compared to a situation before it was converted to fsnotify framework. For fanotify this problem is hard to avoid and users of fanotify should kind of expect this behavior when they ask for file descriptors from notified files. This patch changes fsnotify and its users to create separate event structure for each group. This allows for much simpler code (~400 lines removed by this patch) and also smaller event structures. For example on 64-bit system original struct fsnotify_event consumes 120 bytes, plus additional space for file name, additional 24 bytes for second and each subsequent group linking the event, and additional 32 bytes for each inotify group for private data. After the conversion inotify event consumes 48 bytes plus space for file name which is considerably less memory unless file names are long and there are several groups interested in the events (both of which are uncommon). Fanotify event fits in 56 bytes after the conversion (fanotify doesn't care about file names so its events don't have to have it allocated). A win unless there are four or more fanotify groups interested in the event. The conversion also solves the problem with unmount when only inotify is used as we don't have to grab path references for inotify events. [hughd@google.com: fanotify: fix corruption preventing startup] Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 23:48:14 +00:00
fsnotify_destroy_event(group, event);
spin_lock(&group->notification_lock);
}
spin_unlock(&group->notification_lock);
}