linux-stable/include/uapi/linux/fs.h
Linus Torvalds 35a891be96 xfs: reflink update for 4.9-rc1
< XFS has gained super CoW powers! >
  ----------------------------------
         \   ^__^
          \  (oo)\_______
             (__)\       )\/\
                 ||----w |
                 ||     ||
 
 Included in this update:
 - unshare range (FALLOC_FL_UNSHARE) support for fallocate
 - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr interface
 - shared extent support for XFS
 - copy-on-write support for shared extents
 - copy_file_range support
 - clone_file_range support (implements reflink)
 - dedupe_file_range support
 - defrag support for reverse mapping enabled filesystems
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX/hrZAAoJEK3oKUf0dfodpwcQAKkTerNPhhDcthqWUJ2+jC7w
 JIuhKUg2GYojJhIJ4+Ue1knmuBeIusda+PzGls+6gdy7GDGdux/esRIJSe1W7A5G
 RNeumiSKVX5iYsZNUEX35O2a/SwUM1Sm5mcIFs4CxUwIRwE/cayNby6vrlVExvz7
 Ns6YYOI2bldUHLsxedg8MLG0it1JGTADB9gwGgb98bxQ3bD/UBn3TF9xTlj+ZH22
 ebnWsogSJOnUigOOSGeaQsmy1pJAhRIhvt+f481KuZak1pdQcK2feL4RcKw0NpNt
 15LCYRqX6RexC684VYgJZxXB4EKyfS2Bma71q41A7dz1x36kw7+wG18xasBqU++p
 GZwwL6si02rIGPMz1oD8xxZ0F97ADCGRmkgUHsCJKbP5UmGiP08K6GEN3osr5hAN
 xAmn9AxcprXVnV3WmnFxpBeWY/qCEsvSQqJuKSThYqAilqUc8wN2u5g/eEpE6mmg
 KEEhzaq5P4ovS/HOIQJWdBu1j5E9Mg2o/ncy87Q6uE+9Fa5AAP6GBWOtGcMwdFSU
 adbN7dqjgoHMyNHFrmePqyJYtOZ2hZovDlVndxnYysl5ZBfiBEEDISmr+x6KcSlo
 3kyOltYQLjEVu1sLOT3COCddn0jt5Lr1QhGeVepnrMlU2E1h4461viCNMDinJRIp
 OYoMOS+J83G2FEFwgXYM
 =Sa+Y
 -----END PGP SIGNATURE-----

Merge tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

    < XFS has gained super CoW powers! >
     ----------------------------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

Pull XFS support for shared data extents from Dave Chinner:
 "This is the second part of the XFS updates for this merge cycle.  This
  pullreq contains the new shared data extents feature for XFS.

  Given the complexity and size of this change I am expecting - like the
  addition of reverse mapping last cycle - that there will be some
  follow-up bug fixes and cleanups around the -rc3 stage for issues that
  I'm sure will show up once the code hits a wider userbase.

  What it is:

  At the most basic level we are simply adding shared data extents to
  XFS - i.e. a single extent on disk can now have multiple owners. To do
  this we have to add new on-disk features to both track the shared
  extents and the number of times they've been shared. This is done by
  the new "refcount" btree that sits in every allocation group. When we
  share or unshare an extent, this tree gets updated.

  Along with this new tree, the reverse mapping tree needs to be updated
  to track each owner or a shared extent. This also needs to be updated
  ever share/unshare operation. These interactions at extent allocation
  and freeing time have complex ordering and recovery constraints, so
  there's a significant amount of new intent-based transaction code to
  ensure that operations are performed atomically from both the runtime
  and integrity/crash recovery perspectives.

  We also need to break sharing when writes hit a shared extent - this
  is where the new copy-on-write implementation comes in. We allocate
  new storage and copy the original data along with the overwrite data
  into the new location. We only do this for data as we don't share
  metadata at all - each inode has it's own metadata that tracks the
  shared data extents, the extents undergoing CoW and it's own private
  extents.

  Of course, being XFS, nothing is simple - we use delayed allocation
  for CoW similar to how we use it for normal writes. ENOSPC is a
  significant issue here - we build on the reservation code added in
  4.8-rc1 with the reverse mapping feature to ensure we don't get
  spurious ENOSPC issues part way through a CoW operation. These
  mechanisms also help minimise fragmentation due to repeated CoW
  operations. To further reduce fragmentation overhead, we've also
  introduced a CoW extent size hint, which indicates how large a region
  we should allocate when we execute a CoW operation.

  With all this functionality in place, we can hook up .copy_file_range,
  .clone_file_range and .dedupe_file_range and we gain all the
  capabilities of reflink and other vfs provided functionality that
  enable manipulation to shared extents. We also added a fallocate mode
  that explicitly unshares a range of a file, which we implemented as an
  explicit CoW of all the shared extents in a file.

  As such, it's a huge chunk of new functionality with new on-disk
  format features and internal infrastructure. It warns at mount time as
  an experimental feature and that it may eat data (as we do with all
  new on-disk features until they stabilise). We have not released
  userspace suport for it yet - userspace support currently requires
  download from Darrick's xfsprogs repo and build from source, so the
  access to this feature is really developer/tester only at this point.
  Initial userspace support will be released at the same time the kernel
  with this code in it is released.

  The new code causes 5-6 new failures with xfstests - these aren't
  serious functional failures but things the output of tests changing
  slightly due to perturbations in layouts, space usage, etc. OTOH,
  we've added 150+ new tests to xfstests that specifically exercise this
  new functionality so it's got far better test coverage than any
  functionality we've previously added to XFS.

  Darrick has done a pretty amazing job getting us to this stage, and
  special mention also needs to go to Christoph (review, testing,
  improvements and bug fixes) and Brian (caught several intricate bugs
  during review) for the effort they've also put in.

  Summary:

   - unshare range (FALLOC_FL_UNSHARE) support for fallocate

   - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
     interface

   - shared extent support for XFS

   - copy-on-write support for shared extents

   - copy_file_range support

   - clone_file_range support (implements reflink)

   - dedupe_file_range support

   - defrag support for reverse mapping enabled filesystems"

* tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
  xfs: convert COW blocks to real blocks before unwritten extent conversion
  xfs: rework refcount cow recovery error handling
  xfs: clear reflink flag if setting realtime flag
  xfs: fix error initialization
  xfs: fix label inaccuracies
  xfs: remove isize check from unshare operation
  xfs: reduce stack usage of _reflink_clear_inode_flag
  xfs: check inode reflink flag before calling reflink functions
  xfs: implement swapext for rmap filesystems
  xfs: refactor swapext code
  xfs: various swapext cleanups
  xfs: recognize the reflink feature bit
  xfs: simulate per-AG reservations being critically low
  xfs: don't mix reflink and DAX mode for now
  xfs: check for invalid inode reflink flags
  xfs: set a default CoW extent size of 32 blocks
  xfs: convert unwritten status of reverse mappings for shared files
  xfs: use interval query for rmap alloc operations on shared files
  xfs: add shared rmap map/unmap/convert log item types
  xfs: increase log reservations for reflink
  ...
2016-10-13 20:28:22 -07:00

332 lines
13 KiB
C

#ifndef _UAPI_LINUX_FS_H
#define _UAPI_LINUX_FS_H
/*
* This file has definitions for some important file table structures
* and constants and structures used by various generic file system
* ioctl's. Please do not make any changes in this file before
* sending patches for review to linux-fsdevel@vger.kernel.org and
* linux-api@vger.kernel.org.
*/
#include <linux/limits.h>
#include <linux/ioctl.h>
#include <linux/types.h>
/*
* It's silly to have NR_OPEN bigger than NR_FILE, but you can change
* the file limit at runtime and only root can increase the per-process
* nr_file rlimit, so it's safe to set up a ridiculously high absolute
* upper limit on files-per-process.
*
* Some programs (notably those using select()) may have to be
* recompiled to take full advantage of the new limits..
*/
/* Fixed constants first: */
#undef NR_OPEN
#define INR_OPEN_CUR 1024 /* Initial setting for nfile rlimits */
#define INR_OPEN_MAX 4096 /* Hard limit for nfile rlimits */
#define BLOCK_SIZE_BITS 10
#define BLOCK_SIZE (1<<BLOCK_SIZE_BITS)
#define SEEK_SET 0 /* seek relative to beginning of file */
#define SEEK_CUR 1 /* seek relative to current file position */
#define SEEK_END 2 /* seek relative to end of file */
#define SEEK_DATA 3 /* seek to the next data */
#define SEEK_HOLE 4 /* seek to the next hole */
#define SEEK_MAX SEEK_HOLE
#define RENAME_NOREPLACE (1 << 0) /* Don't overwrite target */
#define RENAME_EXCHANGE (1 << 1) /* Exchange source and dest */
#define RENAME_WHITEOUT (1 << 2) /* Whiteout source */
struct file_clone_range {
__s64 src_fd;
__u64 src_offset;
__u64 src_length;
__u64 dest_offset;
};
struct fstrim_range {
__u64 start;
__u64 len;
__u64 minlen;
};
/* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
#define FILE_DEDUPE_RANGE_SAME 0
#define FILE_DEDUPE_RANGE_DIFFERS 1
/* from struct btrfs_ioctl_file_extent_same_info */
struct file_dedupe_range_info {
__s64 dest_fd; /* in - destination file */
__u64 dest_offset; /* in - start of extent in destination */
__u64 bytes_deduped; /* out - total # of bytes we were able
* to dedupe from this file. */
/* status of this dedupe operation:
* < 0 for error
* == FILE_DEDUPE_RANGE_SAME if dedupe succeeds
* == FILE_DEDUPE_RANGE_DIFFERS if data differs
*/
__s32 status; /* out - see above description */
__u32 reserved; /* must be zero */
};
/* from struct btrfs_ioctl_file_extent_same_args */
struct file_dedupe_range {
__u64 src_offset; /* in - start of extent in source */
__u64 src_length; /* in - length of extent */
__u16 dest_count; /* in - total elements in info array */
__u16 reserved1; /* must be zero */
__u32 reserved2; /* must be zero */
struct file_dedupe_range_info info[0];
};
/* And dynamically-tunable limits and defaults: */
struct files_stat_struct {
unsigned long nr_files; /* read only */
unsigned long nr_free_files; /* read only */
unsigned long max_files; /* tunable */
};
struct inodes_stat_t {
long nr_inodes;
long nr_unused;
long dummy[5]; /* padding for sysctl ABI compatibility */
};
#define NR_FILE 8192 /* this can well be larger on a larger system */
/*
* These are the fs-independent mount-flags: up to 32 flags are supported
*/
#define MS_RDONLY 1 /* Mount read-only */
#define MS_NOSUID 2 /* Ignore suid and sgid bits */
#define MS_NODEV 4 /* Disallow access to device special files */
#define MS_NOEXEC 8 /* Disallow program execution */
#define MS_SYNCHRONOUS 16 /* Writes are synced at once */
#define MS_REMOUNT 32 /* Alter flags of a mounted FS */
#define MS_MANDLOCK 64 /* Allow mandatory locks on an FS */
#define MS_DIRSYNC 128 /* Directory modifications are synchronous */
#define MS_NOATIME 1024 /* Do not update access times. */
#define MS_NODIRATIME 2048 /* Do not update directory access times */
#define MS_BIND 4096
#define MS_MOVE 8192
#define MS_REC 16384
#define MS_VERBOSE 32768 /* War is peace. Verbosity is silence.
MS_VERBOSE is deprecated. */
#define MS_SILENT 32768
#define MS_POSIXACL (1<<16) /* VFS does not apply the umask */
#define MS_UNBINDABLE (1<<17) /* change to unbindable */
#define MS_PRIVATE (1<<18) /* change to private */
#define MS_SLAVE (1<<19) /* change to slave */
#define MS_SHARED (1<<20) /* change to shared */
#define MS_RELATIME (1<<21) /* Update atime relative to mtime/ctime. */
#define MS_KERNMOUNT (1<<22) /* this is a kern_mount call */
#define MS_I_VERSION (1<<23) /* Update inode I_version field */
#define MS_STRICTATIME (1<<24) /* Always perform atime updates */
#define MS_LAZYTIME (1<<25) /* Update the on-disk [acm]times lazily */
/* These sb flags are internal to the kernel */
#define MS_NOREMOTELOCK (1<<27)
#define MS_NOSEC (1<<28)
#define MS_BORN (1<<29)
#define MS_ACTIVE (1<<30)
#define MS_NOUSER (1<<31)
/*
* Superblock flags that can be altered by MS_REMOUNT
*/
#define MS_RMT_MASK (MS_RDONLY|MS_SYNCHRONOUS|MS_MANDLOCK|MS_I_VERSION|\
MS_LAZYTIME)
/*
* Old magic mount flag and mask
*/
#define MS_MGC_VAL 0xC0ED0000
#define MS_MGC_MSK 0xffff0000
/*
* Structure for FS_IOC_FSGETXATTR[A] and FS_IOC_FSSETXATTR.
*/
struct fsxattr {
__u32 fsx_xflags; /* xflags field value (get/set) */
__u32 fsx_extsize; /* extsize field value (get/set)*/
__u32 fsx_nextents; /* nextents field value (get) */
__u32 fsx_projid; /* project identifier (get/set) */
__u32 fsx_cowextsize; /* CoW extsize field value (get/set)*/
unsigned char fsx_pad[8];
};
/*
* Flags for the fsx_xflags field
*/
#define FS_XFLAG_REALTIME 0x00000001 /* data in realtime volume */
#define FS_XFLAG_PREALLOC 0x00000002 /* preallocated file extents */
#define FS_XFLAG_IMMUTABLE 0x00000008 /* file cannot be modified */
#define FS_XFLAG_APPEND 0x00000010 /* all writes append */
#define FS_XFLAG_SYNC 0x00000020 /* all writes synchronous */
#define FS_XFLAG_NOATIME 0x00000040 /* do not update access time */
#define FS_XFLAG_NODUMP 0x00000080 /* do not include in backups */
#define FS_XFLAG_RTINHERIT 0x00000100 /* create with rt bit set */
#define FS_XFLAG_PROJINHERIT 0x00000200 /* create with parents projid */
#define FS_XFLAG_NOSYMLINKS 0x00000400 /* disallow symlink creation */
#define FS_XFLAG_EXTSIZE 0x00000800 /* extent size allocator hint */
#define FS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
#define FS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
#define FS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
#define FS_XFLAG_DAX 0x00008000 /* use DAX for IO */
#define FS_XFLAG_COWEXTSIZE 0x00010000 /* CoW extent size allocator hint */
#define FS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
/* the read-only stuff doesn't really belong here, but any other place is
probably as bad and I don't want to create yet another include file. */
#define BLKROSET _IO(0x12,93) /* set device read-only (0 = read-write) */
#define BLKROGET _IO(0x12,94) /* get read-only status (0 = read_write) */
#define BLKRRPART _IO(0x12,95) /* re-read partition table */
#define BLKGETSIZE _IO(0x12,96) /* return device size /512 (long *arg) */
#define BLKFLSBUF _IO(0x12,97) /* flush buffer cache */
#define BLKRASET _IO(0x12,98) /* set read ahead for block device */
#define BLKRAGET _IO(0x12,99) /* get current read ahead setting */
#define BLKFRASET _IO(0x12,100)/* set filesystem (mm/filemap.c) read-ahead */
#define BLKFRAGET _IO(0x12,101)/* get filesystem (mm/filemap.c) read-ahead */
#define BLKSECTSET _IO(0x12,102)/* set max sectors per request (ll_rw_blk.c) */
#define BLKSECTGET _IO(0x12,103)/* get max sectors per request (ll_rw_blk.c) */
#define BLKSSZGET _IO(0x12,104)/* get block device sector size */
#if 0
#define BLKPG _IO(0x12,105)/* See blkpg.h */
/* Some people are morons. Do not use sizeof! */
#define BLKELVGET _IOR(0x12,106,size_t)/* elevator get */
#define BLKELVSET _IOW(0x12,107,size_t)/* elevator set */
/* This was here just to show that the number is taken -
probably all these _IO(0x12,*) ioctls should be moved to blkpg.h. */
#endif
/* A jump here: 108-111 have been used for various private purposes. */
#define BLKBSZGET _IOR(0x12,112,size_t)
#define BLKBSZSET _IOW(0x12,113,size_t)
#define BLKGETSIZE64 _IOR(0x12,114,size_t) /* return device size in bytes (u64 *arg) */
#define BLKTRACESETUP _IOWR(0x12,115,struct blk_user_trace_setup)
#define BLKTRACESTART _IO(0x12,116)
#define BLKTRACESTOP _IO(0x12,117)
#define BLKTRACETEARDOWN _IO(0x12,118)
#define BLKDISCARD _IO(0x12,119)
#define BLKIOMIN _IO(0x12,120)
#define BLKIOOPT _IO(0x12,121)
#define BLKALIGNOFF _IO(0x12,122)
#define BLKPBSZGET _IO(0x12,123)
#define BLKDISCARDZEROES _IO(0x12,124)
#define BLKSECDISCARD _IO(0x12,125)
#define BLKROTATIONAL _IO(0x12,126)
#define BLKZEROOUT _IO(0x12,127)
#define BMAP_IOCTL 1 /* obsolete - kept for compatibility */
#define FIBMAP _IO(0x00,1) /* bmap access */
#define FIGETBSZ _IO(0x00,2) /* get the block size used for bmap */
#define FIFREEZE _IOWR('X', 119, int) /* Freeze */
#define FITHAW _IOWR('X', 120, int) /* Thaw */
#define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */
#define FICLONE _IOW(0x94, 9, int)
#define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
#define FIDEDUPERANGE _IOWR(0x94, 54, struct file_dedupe_range)
#define FS_IOC_GETFLAGS _IOR('f', 1, long)
#define FS_IOC_SETFLAGS _IOW('f', 2, long)
#define FS_IOC_GETVERSION _IOR('v', 1, long)
#define FS_IOC_SETVERSION _IOW('v', 2, long)
#define FS_IOC_FIEMAP _IOWR('f', 11, struct fiemap)
#define FS_IOC32_GETFLAGS _IOR('f', 1, int)
#define FS_IOC32_SETFLAGS _IOW('f', 2, int)
#define FS_IOC32_GETVERSION _IOR('v', 1, int)
#define FS_IOC32_SETVERSION _IOW('v', 2, int)
#define FS_IOC_FSGETXATTR _IOR ('X', 31, struct fsxattr)
#define FS_IOC_FSSETXATTR _IOW ('X', 32, struct fsxattr)
/*
* File system encryption support
*/
/* Policy provided via an ioctl on the topmost directory */
#define FS_KEY_DESCRIPTOR_SIZE 8
struct fscrypt_policy {
__u8 version;
__u8 contents_encryption_mode;
__u8 filenames_encryption_mode;
__u8 flags;
__u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
} __packed;
#define FS_IOC_SET_ENCRYPTION_POLICY _IOR('f', 19, struct fscrypt_policy)
#define FS_IOC_GET_ENCRYPTION_PWSALT _IOW('f', 20, __u8[16])
#define FS_IOC_GET_ENCRYPTION_POLICY _IOW('f', 21, struct fscrypt_policy)
/*
* Inode flags (FS_IOC_GETFLAGS / FS_IOC_SETFLAGS)
*
* Note: for historical reasons, these flags were originally used and
* defined for use by ext2/ext3, and then other file systems started
* using these flags so they wouldn't need to write their own version
* of chattr/lsattr (which was shipped as part of e2fsprogs). You
* should think twice before trying to use these flags in new
* contexts, or trying to assign these flags, since they are used both
* as the UAPI and the on-disk encoding for ext2/3/4. Also, we are
* almost out of 32-bit flags. :-)
*
* We have recently hoisted FS_IOC_FSGETXATTR / FS_IOC_FSSETXATTR from
* XFS to the generic FS level interface. This uses a structure that
* has padding and hence has more room to grow, so it may be more
* appropriate for many new use cases.
*
* Please do not change these flags or interfaces before checking with
* linux-fsdevel@vger.kernel.org and linux-api@vger.kernel.org.
*/
#define FS_SECRM_FL 0x00000001 /* Secure deletion */
#define FS_UNRM_FL 0x00000002 /* Undelete */
#define FS_COMPR_FL 0x00000004 /* Compress file */
#define FS_SYNC_FL 0x00000008 /* Synchronous updates */
#define FS_IMMUTABLE_FL 0x00000010 /* Immutable file */
#define FS_APPEND_FL 0x00000020 /* writes to file may only append */
#define FS_NODUMP_FL 0x00000040 /* do not dump file */
#define FS_NOATIME_FL 0x00000080 /* do not update atime */
/* Reserved for compression usage... */
#define FS_DIRTY_FL 0x00000100
#define FS_COMPRBLK_FL 0x00000200 /* One or more compressed clusters */
#define FS_NOCOMP_FL 0x00000400 /* Don't compress */
/* End compression flags --- maybe not all used */
#define FS_ENCRYPT_FL 0x00000800 /* Encrypted file */
#define FS_BTREE_FL 0x00001000 /* btree format dir */
#define FS_INDEX_FL 0x00001000 /* hash-indexed directory */
#define FS_IMAGIC_FL 0x00002000 /* AFS directory */
#define FS_JOURNAL_DATA_FL 0x00004000 /* Reserved for ext3 */
#define FS_NOTAIL_FL 0x00008000 /* file tail should not be merged */
#define FS_DIRSYNC_FL 0x00010000 /* dirsync behaviour (directories only) */
#define FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
#define FS_HUGE_FILE_FL 0x00040000 /* Reserved for ext4 */
#define FS_EXTENT_FL 0x00080000 /* Extents */
#define FS_EA_INODE_FL 0x00200000 /* Inode used for large EA */
#define FS_EOFBLOCKS_FL 0x00400000 /* Reserved for ext4 */
#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
#define FS_INLINE_DATA_FL 0x10000000 /* Reserved for ext4 */
#define FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
#define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */
#define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
#define FS_FL_USER_MODIFIABLE 0x000380FF /* User modifiable flags */
#define SYNC_FILE_RANGE_WAIT_BEFORE 1
#define SYNC_FILE_RANGE_WRITE 2
#define SYNC_FILE_RANGE_WAIT_AFTER 4
/* flags for preadv2/pwritev2: */
#define RWF_HIPRI 0x00000001 /* high priority request, poll if possible */
#define RWF_DSYNC 0x00000002 /* per-IO O_DSYNC */
#define RWF_SYNC 0x00000004 /* per-IO O_SYNC */
#endif /* _UAPI_LINUX_FS_H */