linux-stable

Commit Graph

Author	SHA1	Message	Date
David Sterba	69d2480456	btrfs: use copy_page for copying pages instead of memcpy Use the helper that's possibly optimized for full page copies. Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:43 +02:00
David Sterba	3ffbd68c48	btrfs: simplify pointer chasing of local fs_info variables Functions that get btrfs inode can simply reach the fs_info by dereferencing the root and this looks a bit more straightforward compared to the btrfs_sb(...) indirection. If the transaction handle is available and not NULL it's used instead. Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:43 +02:00
Zhihui Zhang	8f6c72a9e0	Btrfs: free space cache: make sure there is always room for generation number io_ctl_set_generation() assumes that the generation number shares the same page with inline CRCs. Let's make sure this is always true. Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-08-06 13:12:42 +02:00
Omar Sandoval	f7e9e8fc79	Btrfs: stop creating orphan items for truncate Currently, we insert an orphan item during a truncate so that if there's a crash, we don't leak extents past the on-disk i_size. However, since commit `7f4f6e0a3f` ("Btrfs: only update disk_i_size as we remove extents"), we keep disk_i_size in sync with the extent items as we truncate, so orphan cleanup will never have any extents to remove. Don't bother with the superfluous orphan item. Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-05-28 18:23:46 +02:00
David Sterba	c1d7c514f7	btrfs: replace GPL boilerplate by SPDX -- sources Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Signed-off-by: David Sterba <dsterba@suse.com>	2018-04-12 16:29:51 +02:00
Qu Wenruo	43b18595d6	btrfs: qgroup: Use separate meta reservation type for delalloc Before this patch, btrfs qgroup is mixing per-transcation meta rsv with preallocated meta rsv, making it quite easy to underflow qgroup meta reservation. Since we have the new qgroup meta rsv types, apply it to delalloc reservation. Now for delalloc, most of its reserved space will use META_PREALLOC qgroup rsv type. And for callers reducing outstanding extent like btrfs_finish_ordered_io(), they will convert corresponding META_PREALLOC reservation to META_PERTRANS. This is mainly due to the fact that current qgroup numbers will only be updated in btrfs_commit_transaction(), that's to say if we don't keep such placeholder reservation, we can exceed qgroup limitation. And for callers freeing outstanding extent in error handler, we will just free META_PREALLOC bytes. This behavior makes callers of btrfs_qgroup_release_meta() or btrfs_qgroup_convert_meta() to be aware of which type they are. So in this patch, btrfs_delalloc_release_metadata() and its callers get an extra parameter to info qgroup to do correct meta convert/release. The good news is, even we use the wrong type (convert or free), it won't cause obvious bug, as prealloc type is always in good shape, and the type only affects how per-trans meta is increased or not. So the worst case will be at most metadata limitation can be sometimes exceeded (no convert at all) or metadata limitation is reached too soon (no free at all). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2018-03-31 01:41:14 +02:00
Linus Torvalds	b2fe5fa686	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Significantly shrink the core networking routing structures. Result of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf 2) Add netdevsim driver for testing various offloads, from Jakub Kicinski. 3) Support cross-chip FDB operations in DSA, from Vivien Didelot. 4) Add a 2nd listener hash table for TCP, similar to what was done for UDP. From Martin KaFai Lau. 5) Add eBPF based queue selection to tun, from Jason Wang. 6) Lockless qdisc support, from John Fastabend. 7) SCTP stream interleave support, from Xin Long. 8) Smoother TCP receive autotuning, from Eric Dumazet. 9) Lots of erspan tunneling enhancements, from William Tu. 10) Add true function call support to BPF, from Alexei Starovoitov. 11) Add explicit support for GRO HW offloading, from Michael Chan. 12) Support extack generation in more netlink subsystems. From Alexander Aring, Quentin Monnet, and Jakub Kicinski. 13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From Russell King. 14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso. 15) Many improvements and simplifications to the NFP driver bpf JIT, from Jakub Kicinski. 16) Support for ipv6 non-equal cost multipath routing, from Ido Schimmel. 17) Add resource abstration to devlink, from Arkadi Sharshevsky. 18) Packet scheduler classifier shared filter block support, from Jiri Pirko. 19) Avoid locking in act_csum, from Davide Caratti. 20) devinet_ioctl() simplifications from Al viro. 21) More TCP bpf improvements from Lawrence Brakmo. 22) Add support for onlink ipv6 route flag, similar to ipv4, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits) tls: Add support for encryption using async offload accelerator ip6mr: fix stale iterator net/sched: kconfig: Remove blank help texts openvswitch: meter: Use 64-bit arithmetic instead of 32-bit tcp_nv: fix potential integer overflow in tcpnv_acked r8169: fix RTL8168EP take too long to complete driver initialization. qmi_wwan: Add support for Quectel EP06 rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK ipmr: Fix ptrdiff_t print formatting ibmvnic: Wait for device response when changing MAC qlcnic: fix deadlock bug tcp: release sk_frag.page in tcp_disconnect ipv4: Get the address of interface correctly. net_sched: gen_estimator: fix lockdep splat net: macb: Handle HRESP error net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring ipv6: addrconf: break critical section in addrconf_verify_rtnl() ipv6: change route cache aging logic i40e/i40evf: Update DESC_NEEDED value to reflect larger value bnxt_en: cleanup DIM work on device shutdown ...	2018-01-31 14:31:10 -08:00
David Sterba	e43bbe5e16	btrfs: sink unlock_extent parameter gfp_flags All callers pass either GFP_NOFS or GFP_KERNEL now, so we can sink the parameter to the function, though we lose some of the slightly better semantics of GFP_KERNEL in some places, it's worth cleaning up the callchains. Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:19 +01:00
David Sterba	ae0f162534	btrfs: sink gfp parameter to clear_extent_bit All callers use GFP_NOFS, we don't have to pass it as an argument. The built-in tests pass GFP_KERNEL, but they run only at module load time and NOFS works there as well. Signed-off-by: David Sterba <dsterba@suse.com>	2018-01-22 16:08:12 +01:00
Masami Hiramatsu	663faf9f7b	error-injection: Add injectable error types Add injectable error types for each error-injectable function. One motivation of error injection test is to find software flaws, mistakes or mis-handlings of expectable errors. If we find such flaws by the test, that is a program bug, so we need to fix it. But if the tester miss input the error (e.g. just return success code without processing anything), it causes unexpected behavior even if the caller is correctly programmed to handle any errors. That is not what we want to test by error injection. To clarify what type of errors the caller must expect for each injectable function, this introduces injectable error types: - EI_ETYPE_NULL : means the function will return NULL if it fails. No ERR_PTR, just a NULL. - EI_ETYPE_ERRNO : means the function will return -ERRNO if it fails. - EI_ETYPE_ERRNO_NULL : means the function will return -ERRNO (ERR_PTR) or NULL. ALLOW_ERROR_INJECTION() macro is expanded to get one of NULL, ERRNO, ERRNO_NULL to record the error type for each function. e.g. ALLOW_ERROR_INJECTION(open_ctree, ERRNO) This error types are shown in debugfs as below. ==== / # cat /sys/kernel/debug/error_injection/list open_ctree [btrfs] ERRNO io_ctl_init [btrfs] ERRNO ==== Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-01-12 17:33:38 -08:00
Masami Hiramatsu	540adea380	error-injection: Separate error-injection from kprobe Since error-injection framework is not limited to be used by kprobes, nor bpf. Other kernel subsystems can use it freely for checking safeness of error-injection, e.g. livepatch, ftrace etc. So this separate error-injection framework from kprobes. Some differences has been made: - "kprobe" word is removed from any APIs/structures. - BPF_ALLOW_ERROR_INJECTION() is renamed to ALLOW_ERROR_INJECTION() since it is not limited for BPF too. - CONFIG_FUNCTION_ERROR_INJECTION is the config item of this feature. It is automatically enabled if the arch supports error injection feature for kprobe or ftrace etc. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-01-12 17:33:38 -08:00
Josef Bacik	023f46c5b8	btrfs: allow us to inject errors at io_ctl_init This was instrumental in reproducing a space cache bug. Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2017-12-12 09:02:40 -08:00
Josef Bacik	b77000ed55	btrfs: fix deadlock when writing out space cache If we fail to prepare our pages for whatever reason (out of memory in our case) we need to make sure to drop the block_group->data_rwsem, otherwise hilarity ensues. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add label and use existing unlocking code ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-11-27 15:50:07 +01:00
David Sterba	913e153572	btrfs: drop newlines from strings when using btrfs_* helpers The helpers append "\n" so we can keep the actual strings shorter. The extra newline will print an empty line. Some messages have been slightly modified to be more consistent with the rest (lowercase first letter). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-08-16 16:12:02 +02:00
David Sterba	619a974292	btrfs: use clear_page where appropriate There's a helper to clear whole page, with a arch-specific optimized code. The replaced cases do not seem to be in performace critical code, but we still might get some percent gain. Signed-off-by: David Sterba <dsterba@suse.com>	2017-04-18 14:07:26 +02:00
Linus Torvalds	1827adb11a	Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull sched.h split-up from Ingo Molnar: "The point of these changes is to significantly reduce the <linux/sched.h> header footprint, to speed up the kernel build and to have a cleaner header structure. After these changes the new <linux/sched.h>'s typical preprocessed size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K lines), which is around 40% faster to build on typical configs. Not much changed from the last version (-v2) posted three weeks ago: I eliminated quirks, backmerged fixes plus I rebased it to an upstream SHA1 from yesterday that includes most changes queued up in -next plus all sched.h changes that were pending from Andrew. I've re-tested the series both on x86 and on cross-arch defconfigs, and did a bisectability test at a number of random points. I tried to test as many build configurations as possible, but some build breakage is probably still left - but it should be mostly limited to architectures that have no cross-compiler binaries available on kernel.org, and non-default configurations" * 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits) sched/headers: Clean up <linux/sched.h> sched/headers: Remove #ifdefs from <linux/sched.h> sched/headers: Remove the <linux/topology.h> include from <linux/sched.h> sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h> sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h> sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h> sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h> sched/headers: Remove <linux/sched.h> from <linux/sched/init.h> sched/core: Remove unused prefetch_stack() sched/headers: Remove <linux/rculist.h> from <linux/sched.h> sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h> sched/headers: Remove <linux/signal.h> from <linux/sched.h> sched/headers: Remove <linux/rwsem.h> from <linux/sched.h> sched/headers: Remove the runqueue_is_locked() prototype sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h> sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h> sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h> sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h> sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h> sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h> ...	2017-03-03 10:16:38 -08:00
Ingo Molnar	f361bf4a66	sched/headers: Prepare for the reduction of <linux/sched.h>'s signal API dependency Instead of including the full <linux/signal.h>, we are going to include the types-only <linux/signal_types.h> header in <linux/sched.h>, to further decouple the scheduler header from the signal headers. This means that various files which relied on the full <linux/signal.h> need to be updated to gain an explicit dependency on it. Update the code that relies on sched.h's inclusion of the <linux/signal.h> header. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:37 +01:00
Nikolay Borisov	691fa05967	btrfs: all btrfs_delalloc_release_metadata take btrfs_inode Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-28 11:30:07 +01:00
Nikolay Borisov	6ef06d2790	btrfs: Make btrfs_i_size_write take btrfs_inode Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-28 11:30:06 +01:00
Jeff Mahoney	21e75ffe3c	btrfs: btrfs_truncate_free_space_cache always allocates path btrfs_truncate_free_space_cache always allocates a btrfs_path structure but only uses it when the caller passes a block group. Let's move the allocation and free into the conditional. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-17 12:03:56 +01:00
Jeff Mahoney	77ab86bf1c	btrfs: free-space-cache, clean up unnecessary root arguments The free space cache APIs accept a root but always use the tree root. Also, btrfs_truncate_free_space_cache accepts a root AND an inode but the inode always points to the root anyway, so let's just pass the inode. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-17 12:03:56 +01:00
David Sterba	0e8d931a82	btrfs: remove unused parameters from __btrfs_write_out_cache Both unused after the call to update_cache_item has been moved to __btrfs_wait_cache_io. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-17 12:03:55 +01:00
David Sterba	7bf1a15912	btrfs: remove unused parameter from cleanup_write_cache_enospc bitmap_list is unused since the io_ctl framework. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-17 12:03:55 +01:00
David Sterba	1d4805386e	btrfs: make space cache inode readahead failure nonfatal We do a readahead of the free space cache inode to speed things up but the failure is not fatal, like in other readahead cases. Proper reads would need to happen anyway and any errors would be caught there. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-17 12:03:49 +01:00
Nikolay Borisov	4a0cc7ca6c	btrfs: Make btrfs_ino take a struct btrfs_inode Currently btrfs_ino takes a struct inode and this causes a lot of internal btrfs functions which consume this ino to take a VFS inode, rather than btrfs' own struct btrfs_inode. In order to fix this "leak" of VFS structs into the internals of btrfs first it's necessary to eliminate all uses of struct inode for the purpose of inode. This patch does that by using BTRFS_I to convert an inode to btrfs_inode. With this problem eliminated subsequent patches will start eliminating the passing of struct inode altogether, eventually resulting in a lot cleaner code. Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com> [ fix btrfs_get_extent tracepoint prototype ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-14 15:50:51 +01:00
David Sterba	34441361c4	btrfs: opencode chunk locking, remove helpers The helpers are trivial and we don't use them consistently. Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:07:00 +01:00
Jeff Mahoney	2ff7e61e0d	btrfs: take an fs_info directly when the root is not used otherwise There are loads of functions in btrfs that accept a root parameter but only use it to obtain an fs_info pointer. Let's convert those to just accept an fs_info pointer directly. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:59 +01:00
Jeff Mahoney	afdb571890	btrfs: simplify btrfs_wait_cache_io prototype With the exception of the one case where btrfs_wait_cache_io is called without a block group, it's called with the same arguments. The root argument is only used in the special case, so let's factor out the core and simplify the call in the normal case to require a trans, block group, and path. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:59 +01:00
Jeff Mahoney	0b246afa62	btrfs: root->fs_info cleanup, add fs_info convenience variables In routines where someptr->fs_info is referenced multiple times, we introduce a convenience variable. This makes the code considerably more readable. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:59 +01:00
Jeff Mahoney	3796d33535	btrfs: root->fs_info cleanup, lock/unlock_chunks Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:58 +01:00
Jeff Mahoney	27965b6c2c	btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:58 +01:00
Jeff Mahoney	da17066c40	btrfs: pull node/sector/stripe sizes out of root and into fs_info We track the node sizes per-root, but they never vary from the values in the superblock. This patch messes with the 80-column style a bit, but subsequent patches to factor out root->fs_info into a convenience variable fix it up again. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:58 +01:00
Jeff Mahoney	f15376df0d	btrfs: root->fs_info cleanup, io_ctl_init The io_ctl->root member was only being used to access root->fs_info. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:58 +01:00
Jeff Mahoney	5b4aacefb8	btrfs: call functions that overwrite their root parameter with fs_info There are 11 functions that accept a root parameter and immediately overwrite it. We can pass those an fs_info pointer instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:57 +01:00
Christophe JAILLET	4d5106a126	btrfs: remove redundant check of btrfs_iget return value 'btrfs_iget()' can not return NULL, so this test can be removed. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-11-30 13:45:18 +01:00
Domagoj Tršan	0b5e3dafb6	btrfs: change btrfs_csum_final result param type to u8 csum member of struct btrfs_super_block has array type of u8. It makes sense that function btrfs_csum_final should be also declared to accept u8 . I changed the declaration of method void btrfs_csum_final(u32 crc, char result); to void btrfs_csum_final(u32 crc, u8 *result); Signed-off-by: Domagoj Tršan <domagoj.trsan@gmail.com> [ changed cast to u8 at several call sites ] Signed-off-by: David Sterba <dsterba@suse.com>	2016-11-30 13:45:18 +01:00
David Sterba	b159fa2808	btrfs: remove constant parameter to memset_extent_buffer and rename it The only memset we do is to 0, so sink the parameter to the function and simplify all calls. Rename the function to reflect the behaviour. Signed-off-by: David Sterba <dsterba@suse.com>	2016-11-30 13:45:17 +01:00
Jeff Mahoney	ab8d0fc48d	btrfs: convert pr_* to btrfs_* where possible For many printks, we want to know which file system issued the message. This patch converts most pr_* calls to use the btrfs_* versions instead. In some cases, this means adding plumbing to allow call sites access to an fs_info pointer. fs/btrfs/check-integrity.c is left alone for another day. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-09-26 19:37:04 +02:00
Jeff Mahoney	62e855771d	btrfs: convert printk(KERN_* to use pr_* calls This patch converts printk(KERN_* style messages to use the pr_* versions. One side effect is that anything that was KERN_DEBUG is now automatically a dynamic debug message. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-09-26 18:08:44 +02:00
Jeff Mahoney	5d163e0e68	btrfs: unsplit printed strings CodingStyle chapter 2: "[...] never break user-visible strings such as printk messages, because that breaks the ability to grep for them." This patch unsplits user-visible strings. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-09-26 18:08:44 +02:00
Jeff Mahoney	66642832f0	btrfs: btrfs_abort_transaction, drop root parameter __btrfs_abort_transaction doesn't use its root parameter except to obtain an fs_info pointer. We can obtain that from trans->root->fs_info for now and from trans->fs_info in a later patch. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-07-26 13:54:26 +02:00
Jeff Mahoney	3cdde2240d	btrfs: btrfs_test_opt and friends should take a btrfs_fs_info btrfs_test_opt and friends only use the root pointer to access the fs_info. Let's pass the fs_info directly in preparation to eliminate similar patterns all over btrfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-07-26 13:53:16 +02:00
Feifei Xu	b9ef22dedd	Btrfs: self-tests: Support non-4k page size self-tests code assumes 4k as the sectorsize and nodesize. This commit fix hardcoded 4K. Enables the self-tests code to be executed on non-4k page sized systems (e.g. ppc64). Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-02 19:23:14 +02:00
Feifei Xu	0ef6447a3d	Btrfs: Fix integer overflow when calculating bytes_per_bitmap On ppc64, bytes_per_bitmap will be (65536865536). Hence append UL to fix integer overflow. Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-02 19:22:49 +02:00
Feifei Xu	5473e0c426	Btrfs: test_check_exists: Fix infinite loop when searching for free space entries On a ppc64 machine using 64K as the block size, assume that the RB tree at btrfs_free_space_ctl->free_space_offset contains following two entries: 1. A bitmap entry having an offset value of 0 and having the bits corresponding to the address range [128M+512K, 128M+768K] set. 2. An extent entry corresponding to the address range [128M-256K, 128M-128K] In such a scenario, test_check_exists() invoked for checking the existence of address range [128M+768K, 256M] can lead to an infinite loop as explained below: - Checking for the extent entry fails. - Checking for a bitmap entry results in the free space info in range [128M+512K, 128M+768K] beng returned. - rb_prev(info) returns NULL because the bitmap entry starting from offset 0 comes first in the RB tree. - current_node = bitmap node. - while (current_node) tmp = rb_next(bitmap_node);/tmp is extent based free space entry/ Since extent based free space entry's last address is smaller than the address being searched for (i.e. 128M+768K) we incorrectly again obtain the extent node as the "next right node" of the RB tree and thus end up looping infinitely. This patch fixes the issue by checking the "tmp" variable which point to the most recently searched free space node. Reviewed-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Feifei Xu <xufeifei@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-02 19:22:34 +02:00
Nicholas D Steeves	0132761017	btrfs: fix string and comment grammatical issues and typos Signed-off-by: Nicholas D Steeves <nsteeves@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-05-25 22:35:14 +02:00
Kirill A. Shutemov	09cbfeaf1a	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Chris Mason	b28cf57246	Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 Signed-off-by: Chris Mason <clm@fb.com>	2016-01-11 06:08:37 -08:00
Chris Mason	a3058101c1	Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5	2016-01-11 05:59:32 -08:00
David Sterba	20e5506baf	btrfs: constify remaining structs with function pointers * struct extent_io_ops * struct btrfs_free_space_op Signed-off-by: David Sterba <dsterba@suse.com>	2016-01-07 15:01:14 +01:00
Geliang Tang	7ae1681e12	btrfs: use list_for_each_entry_safe in free-space-cache.c Use list_for_each_entry_safe() instead of list_for_each_safe() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-01-07 14:39:09 +01:00
Byongho Lee	ee22184b53	Btrfs: use linux/sizes.h to represent constants We use many constants to represent size and offset value. And to make code readable we use '256 * 1024 * 1024' instead of '268435456' to represent '256MB'. However we can make far more readable with 'SZ_256MB' which is defined in the 'linux/sizes.h'. So this patch replaces 'xxx * 1024 * 1024' kind of expression with single 'SZ_xxxMB' if 'xxx' is a power of 2 then 'xxx * SZ_1M' if 'xxx' is not a power of 2. And I haven't touched to '4096' & '8192' because it's more intuitive than 'SZ_4KB' & 'SZ_8KB'. Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-01-07 14:38:02 +01:00
Chris Mason	bb9d687618	Merge branch 'dev/simplify-set-bit' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 Signed-off-by: Chris Mason <clm@fb.com>	2015-12-23 13:17:42 -08:00
Linus Torvalds	fc315e3e5c	Merge branch 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "A couple of small fixes" * 'for-linus-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: check prepare_uptodate_page() error code earlier Btrfs: check for empty bitmap list in setup_cluster_bitmaps btrfs: fix misleading warning when space cache failed to load Btrfs: fix transaction handle leak in balance Btrfs: fix unprotected list move from unused_bgs to deleted_bgs list	2015-12-18 15:35:08 -08:00
Chris Mason	1d3a5a82fe	Merge branch 'for-chris-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.4	2015-12-15 09:09:59 -08:00
Chris Mason	1b9b922a3a	Btrfs: check for empty bitmap list in setup_cluster_bitmaps Dave Jones found a warning from kasan in setup_cluster_bitmaps() ================================================================== BUG: KASAN: stack-out-of-bounds in setup_cluster_bitmap+0xc4/0x5a0 at addr ffff88039bef6828 Read of size 8 by task nfsd/1009 page:ffffea000e6fbd80 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x8000000000000000() page dumped because: kasan: bad access detected CPU: 1 PID: 1009 Comm: nfsd Tainted: G W 4.4.0-rc3-backup-debug+ #1 ffff880065647b50 000000006bb712c2 ffff88039bef6640 ffffffffa680a43e 0000004559c00000 ffff88039bef66c8 ffffffffa62638d1 ffffffffa61121c0 ffff8803a5769de8 0000000000000296 ffff8803a5769df0 0000000000046280 Call Trace: [<ffffffffa680a43e>] dump_stack+0x4b/0x6d [<ffffffffa62638d1>] kasan_report_error+0x501/0x520 [<ffffffffa61121c0>] ? debug_show_all_locks+0x1e0/0x1e0 [<ffffffffa6263948>] kasan_report+0x58/0x60 [<ffffffffa6814b00>] ? rb_last+0x10/0x40 [<ffffffffa66f8af4>] ? setup_cluster_bitmap+0xc4/0x5a0 [<ffffffffa6262ead>] __asan_load8+0x5d/0x70 [<ffffffffa66f8af4>] setup_cluster_bitmap+0xc4/0x5a0 [<ffffffffa66f675a>] ? setup_cluster_no_bitmap+0x6a/0x400 [<ffffffffa66fcd16>] btrfs_find_space_cluster+0x4b6/0x640 [<ffffffffa66fc860>] ? btrfs_alloc_from_cluster+0x4e0/0x4e0 [<ffffffffa66fc36e>] ? btrfs_return_cluster_to_free_space+0x9e/0xb0 [<ffffffffa702dc37>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffa666a1a1>] find_free_extent+0xba1/0x1520 Andrey noticed this was because we were doing list_first_entry on a list that might be empty. Rework the tests a bit so we don't do that. Signed-off-by: Chris Mason <clm@fb.com> Reprorted-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reported-by: Dave Jones <dsj@fb.com>	2015-12-15 09:09:33 -08:00
Holger Hoffstätte	94356889c4	btrfs: fix misleading warning when space cache failed to load When an inconsistent space cache is detected during loading we log a warning that users frequently mistake as instruction to invalidate the cache manually, even though this is not required. Fix the message to indicate that the cache will be rebuilt automatically. Signed-off-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com> Acked-by: Filipe Manana <fdmanana@suse.com>	2015-12-10 11:38:08 +00:00
David Sterba	ff13db41f1	btrfs: drop unused parameter from lock_extent_bits We've always passed 0. Stack usage will slightly decrease. Signed-off-by: David Sterba <dsterba@suse.com>	2015-12-03 14:30:40 +01:00
Linus Torvalds	ad804a0b2a	Merge branch 'akpm' (patches from Andrew) Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...	2015-11-07 14:32:45 -08:00
Michal Hocko	c62d25556b	mm, fs: introduce mapping_gfp_constraint() There are many places which use mapping_gfp_mask to restrict a more generic gfp mask which would be used for allocations which are not directly related to the page cache but they are performed in the same context. Let's introduce a helper function which makes the restriction explicit and easier to track. This patch doesn't introduce any functional changes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Josef Bacik	0584f718ed	Btrfs: don't do extra bitmap search in one bit case When we make ctl->unit allocations from a bitmap there is no point in searching for the next 0 in the bitmap. If we've found a bit we're done and can just exit the loop. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-10-21 18:55:41 -07:00
Josef Bacik	cef4048370	Btrfs: keep track of largest extent in bitmaps We can waste a lot of time searching through bitmaps when we are heavily fragmented trying to find large contiguous areas that don't exist in the bitmap. So keep track of the max extent size when we do a full search of a bitmap so that next time around we can just skip the expensive searching if our max size is less than what we are looking for. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-10-21 18:55:40 -07:00
Josef Bacik	c759c4e161	Btrfs: don't keep trying to build clusters if we are fragmented If we are extremely fragmented then we won't be able to create a free_cluster. So if this happens set last_ptr->fragmented so that all future allcations will give up trying to create a cluster. When we unpin extents we will unset ->fragmented if we free up a sufficient amount of space in a block group. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-10-21 18:55:39 -07:00
Josef Bacik	d0bd456074	Btrfs: add fragment=* debug mount option In tracking down these weird bitmap problems it was helpful to artificially create an extremely fragmented file system. These mount options let us either fragment data or metadata or both. With these options I could reproduce all sorts of weird latencies and hangs that occur under extreme fragmentation and get them fixed. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-10-21 18:51:43 -07:00
Chris Mason	a0d58e48db	Merge branch 'cleanups/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.4	2015-10-21 18:21:40 -07:00
Geliang Tang	8cd1e73111	btrfs: fix a comment typo Just fix a typo in the code comment. Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: David Sterba <dsterba@suse.com>	2015-10-21 18:28:48 +02:00
David Sterba	9464732266	btrfs: switch message printers to ratelimited variants Signed-off-by: David Sterba <dsterba@suse.com>	2015-10-08 13:04:06 +02:00
Jeff Mahoney	e33e17ee10	btrfs: add missing discards when unpinning extents with -o discard When we clear the dirty bits in btrfs_delete_unused_bgs for extents in the empty block group, it results in btrfs_finish_extent_commit being unable to discard the freed extents. The block group removal patch added an alternate path to forget extents other than btrfs_finish_extent_commit. As a result, any extents that would be freed when the block group is removed aren't discarded. In my test run, with a large copy of mixed sized files followed by removal, it left nearly 2/3 of extents undiscarded. To clean up the block groups, we add the removed block group onto a list that will be discarded after transaction commit. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Tested-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-07-29 08:15:29 -07:00
Filipe Manana	35c766425a	Btrfs: fix mutex unlock without prior lock on space cache truncation If the call to btrfs_truncate_inode_items() failed and we don't have a block group, we were unlocking the cache_write_mutex without having locked it (we do it only if we have a block group). Fixes: `1bbc621ef2` ("Btrfs: allow block group cache writeout outside critical section in commit") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2015-06-02 19:34:34 -07:00
Filipe Manana	e43699d4b4	Btrfs: fix crash after inode cache writeback failure If the writeback of an inode cache failed we were unnecessarilly attempting to release again the delalloc metadata that we previously reserved. However attempting to do this a second time triggers an assertion at drop_outstanding_extent() because we have no more outstanding extents for our inode cache's inode. If we were able to start writeback of the cache the reserved metadata space is released at btrfs_finished_ordered_io(), even if an error happens during writeback. So make sure we don't repeat the metadata space release if writeback started for our inode cache. This issue was trivial to reproduce by running the fstest btrfs/088 with "-o inode_cache", which triggered the assertion leading to a BUG() call and requiring a reboot in order to run the remaining fstests. Trace produced by btrfs/088: [255289.385904] BTRFS: assertion failed: BTRFS_I(inode)->outstanding_extents >= num_extents, file: fs/btrfs/extent-tree.c, line: 5276 [255289.388094] ------------[ cut here ]------------ [255289.389184] kernel BUG at fs/btrfs/ctree.h:4057! [255289.390125] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC (...) [255289.392068] Call Trace: [255289.392068] [<ffffffffa035e774>] drop_outstanding_extent+0x3d/0x6d [btrfs] [255289.392068] [<ffffffffa0364988>] btrfs_delalloc_release_metadata+0x54/0xe3 [btrfs] [255289.392068] [<ffffffffa03b4174>] btrfs_write_out_ino_cache+0x95/0xad [btrfs] [255289.392068] [<ffffffffa036f5c4>] btrfs_save_ino_cache+0x275/0x2dc [btrfs] [255289.392068] [<ffffffffa03e2d83>] commit_fs_roots.isra.12+0xaa/0x137 [btrfs] [255289.392068] [<ffffffff8107d33d>] ? trace_hardirqs_on+0xd/0xf [255289.392068] [<ffffffffa037841f>] ? btrfs_commit_transaction+0x4b1/0x9c9 [btrfs] [255289.392068] [<ffffffff814351a4>] ? _raw_spin_unlock+0x32/0x46 [255289.392068] [<ffffffffa037842e>] btrfs_commit_transaction+0x4c0/0x9c9 [btrfs] (...) Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-05-11 07:59:10 -07:00
Filipe Manana	1d3c61c2eb	Btrfs: fix wrong mapping flags for free space inode We were passing a flags value that differed from the intention in commit `2b10826800` ("Btrfs: don't use highmem for free space cache pages"). This caused problems in a ARM machine, leaving btrfs unusable there. Reported-by: Merlijn Wajer <merlijn@wizzup.org> Tested-by: Merlijn Wajer <merlijn@wizzup.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-05-06 17:06:13 -07:00
Omar Sandoval	b86054540e	btrfs: check io_ctl_prepare_pages return in __btrfs_write_out_cache If io_ctl_prepare_pages fails, the pages in io_ctl.pages are not valid. When we try to access them later, things will blow up in various ways. Also fix the comment about the return value, which is an errno on error, not -1, and update the cases where it was not. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-26 06:27:01 -07:00
Chris Mason	a3bdccc4e6	Btrfs: prevent list corruption during free space cache processing __btrfs_write_out_cache is holding the ctl->tree_lock while it prepares a list of bitmaps to record in the free space cache. It was dropping the lock while it worked on other components, which made a window for free_bitmap() to free the bitmap struct without removing it from the list. This changes things to hold the lock the whole time, and also makes sure we hold the lock during enospc cleanup. Reported-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-24 11:52:25 -07:00
Chris Mason	85db36cfb3	Btrfs: fix inode cache writeout The code to fix stalls during free spache cache IO wasn't using the correct root when waiting on the IO for inode caches. This is only a problem when the inode cache is enabled with mount -o inode_cache This fixes the inode cache writeout to preserve any error values and makes sure not to override the root when inode cache writeout is done. Reported-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-23 17:47:34 -07:00
Chris Mason	1bbc621ef2	Btrfs: allow block group cache writeout outside critical section in commit We loop through all of the dirty block groups during commit and write the free space cache. In order to make sure the cache is currect, we do this while no other writers are allowed in the commit. If a large number of block groups are dirty, this can introduce long stalls during the final stages of the commit, which can block new procs trying to change the filesystem. This commit changes the block group cache writeout to take appropriate locks and allow it to run earlier in the commit. We'll still have to redo some of the block groups, but it means we can get most of the work out of the way without blocking the entire FS. Signed-off-by: Chris Mason <clm@fb.com>	2015-04-10 14:07:22 -07:00
Chris Mason	2b10826800	Btrfs: don't use highmem for free space cache pages In order to create the free space cache concurrently with FS modifications, we need to take a few block group locks. The cache code also does kmap, which would schedule with the locks held. Instead of going through kmap_atomic, lets just use lowmem for the cache pages. Signed-off-by: Chris Mason <clm@fb.com>	2015-04-10 14:07:18 -07:00
Chris Mason	c9dc4c6578	Btrfs: two stage dirty block group writeout Block group cache writeout is currently waiting on the pages for each block group cache before moving on to writing the next one. This commit switches things around to send down all the caches and then wait on them in batches. The end result is much faster, since we're keeping the disk pipeline full. Signed-off-by: Chris Mason <clm@fb.com>	2015-04-10 14:07:11 -07:00
Chris Mason	4c6d1d85ad	btrfs: move struct io_ctl into ctree.h and rename it We'll need to put the io_ctl into the block_group cache struct, so name it struct btrfs_io_ctl and move it into ctree.h Signed-off-by: Chris Mason <clm@fb.com>	2015-04-10 14:07:04 -07:00
Chris Mason	28ed1345a5	btrfs: actively run the delayed refs while deleting large files When we are deleting large files with large extents, we are building up a huge set of delayed refs for processing. Truncate isn't checking often enough to see if we need to back off and process those, or let a commit proceed. The end result is long stalls after the rm, and very long commit times. During the commits, other processes back up waiting to start new transactions and we get into trouble. Signed-off-by: Chris Mason <clm@fb.com>	2015-04-10 14:00:14 -07:00
David Sterba	47c5713f47	btrfs: replace remaining do_div calls with div_u64 variants Switch to div_u64_rem that does type checking and has more obvious semantics than do_div. Signed-off-by: David Sterba <dsterba@suse.cz>	2015-03-03 17:24:01 +01:00
David Sterba	b8b93addde	btrfs: cleanup 64bit/32bit divs, provably bounded values The divisor is derived from nodesize or PAGE_SIZE, fits into 32bit type. Get rid of a few more do_div instances. Signed-off-by: David Sterba <dsterba@suse.cz>	2015-03-03 17:24:00 +01:00
David Sterba	31e818fe73	btrfs: cleanup, use kmalloc_array/kcalloc array helpers Convert kmalloc(nr * size, ..) to kmalloc_array that does additional overflow checks, the zeroing variant is kcalloc. Signed-off-by: David Sterba <dsterba@suse.cz>	2015-03-03 17:23:58 +01:00
David Sterba	f8c269d722	btrfs: cleanup 64bit/32bit divs, compile time constants Switch to div_u64 if the divisor is a numeric constant or sum of sizeof()s. We can remove a few instances of do_div that has the hidden semtantics of changing the 1st argument. Small power-of-two divisors are converted to bitshifts, large values are kept intact for clarity. Signed-off-by: David Sterba <dsterba@suse.cz>	2015-03-03 17:23:57 +01:00
David Sterba	351810c1d2	btrfs: use cond_resched_lock where possible Clean the opencoded variant, cond_resched_lock also checks the lock for contention so it might help in some cases that were not covered by simple need_resched(). Signed-off-by: David Sterba <dsterba@suse.cz>	2015-03-03 17:23:56 +01:00
Gui Hecheng	b76808fc26	btrfs: cleanup init for list in free-space-cache o removed an unecessary INIT_LIST_HEAD after LIST_HEAD o merge a declare & INIT_LIST_HEAD pair into one LIST_HEAD Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2015-02-02 19:25:50 -08:00
Josef Bacik	ce93ec548c	Btrfs: track dirty block groups on their own list Currently any time we try to update the block groups on disk we will walk _all_ block groups and check for the ->dirty flag to see if it is set. This function can get called several times during a commit. So if you have several terabytes of data you will be a very sad panda as we will loop through _all_ of the block groups several times, which makes the commit take a while which slows down the rest of the file system operations. This patch introduces a dirty list for the block groups that we get added to when we dirty the block group for the first time. Then we simply update any block groups that have been dirtied since the last time we called btrfs_write_dirty_block_groups. This allows us to clean up how we write the free space cache out so it is much cleaner. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-01-21 17:36:52 -08:00
Filipe Manana	1edb647bb9	Btrfs: remove non-sense btrfs_error_discard_extent() function It doesn't do anything special, it just calls btrfs_discard_extent(), so just remove it. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-12-10 12:22:32 -08:00
Filipe Manana	a1e7e16ed3	Btrfs: ensure deletion from pinned_chunks list is protected The call to remove_extent_mapping() actually deletes the extent map from the list it's included in - fs_info->pinned_chunks - and that list is protected by the chunk mutex. Therefore make that call while holding the chunk mutex and remove the redundant list delete call because it's a noop. This fixes an overlook of the patch titled "Btrfs: fix race between fs trimming and block group remove/allocation" following the same obvervation from the patch titled "Btrfs: fix unprotected deletion from pending_chunks list". Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-12-10 12:22:28 -08:00
Filipe Manana	946ddbe805	Btrfs: fix memory leak after block remove + trimming There was a free space entry structure memeory leak if a block group is remove while a free space entry is being trimmed, which the following diagram explains: CPU 1 CPU 2 btrfs_trim_block_group() trim_no_bitmap() remove free space entry from block group cache's rbtree do_trimming() btrfs_remove_block_group() btrfs_remove_free_space_cache() add back free space entry to block group's cache rbtree btrfs_put_block_group() (...) btrfs_put_block_group() kfree(bg->free_space_ctl) kfree(bg) The free space entry added after doing the discard of its respective range ends up never being freed. Detected after doing an "rmmod btrfs" after running the stress test recently submitted for fstests: [ 8234.642212] kmem_cache_destroy btrfs_free_space: Slab cache still has objects [ 8234.642657] CPU: 1 PID: 32276 Comm: rmmod Tainted: G W L 3.17.0-rc5-btrfs-next-2+ #1 [ 8234.642660] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 8234.642664] 0000000000000000 ffff8801af1b3eb8 ffffffff8140c7b6 ffff8801dbedd0c0 [ 8234.642670] ffff8801af1b3ed0 ffffffff811149ce 0000000000000000 ffff8801af1b3ee0 [ 8234.642676] ffffffffa042dbe7 ffff8801af1b3ef0 ffffffffa0487422 ffff8801af1b3f78 [ 8234.642682] Call Trace: [ 8234.642692] [<ffffffff8140c7b6>] dump_stack+0x4d/0x66 [ 8234.642699] [<ffffffff811149ce>] kmem_cache_destroy+0x4d/0x92 [ 8234.642731] [<ffffffffa042dbe7>] btrfs_destroy_cachep+0x63/0x76 [btrfs] [ 8234.642757] [<ffffffffa0487422>] exit_btrfs_fs+0x9/0xbe7 [btrfs] [ 8234.642762] [<ffffffff810a76a5>] SyS_delete_module+0x155/0x1c6 [ 8234.642768] [<ffffffff8122a7eb>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 8234.642773] [<ffffffff814122d2>] system_call_fastpath+0x16/0x1b This applies on top (depends on) of my previous patch titled: "Btrfs: fix race between fs trimming and block group remove/allocation" Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-12-02 18:35:09 -08:00
Filipe Manana	55507ce361	Btrfs: fix race between writing free space cache and trimming Trimming is completely transactionless, and the way it operates consists of hiding free space entries from a block group, perform the trim/discard and then make the free space entries visible again. Therefore while a free space entry is being trimmed, we can have free space cache writing running in parallel (as part of a transaction commit) which will miss the free space entry. This means that an unmount (or crash/reboot) after that transaction commit and mount again before another transaction starts/commits after the discard finishes, we will have some free space that won't be used again unless the free space cache is rebuilt. After the unmount, fsck (btrfsck, btrfs check) reports the issue like the following example: * fsck.btrfs output * checking extents checking free space cache There is no free space entry for 521764864-521781248 There is no free space entry for 521764864-1103101952 cache appears valid but isnt 29360128 Checking filesystem on /dev/sdc UUID: b4789e27-4774-4626-98e9-ae8dfbfb0fb5 found 1235681286 bytes used err is -22 (...) Another issue caused by this race is a crash while writing bitmap entries to the cache, because while the cache writeout task accesses the bitmaps, the trim task can be concurrently modifying the bitmap or worse might be freeing the bitmap. The later case results in the following crash: [55650.804460] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC [55650.804835] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop parport_pc parport i2c_piix4 psmouse evdev pcspkr microcode processor i2ccore serio_raw thermal_sys button ext4 crc16 jbd2 mbcache sg sd_mod crc_t10dif sr_mod cdrom crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata virtio_pci virtio_ring virtio scsi_mod e1000 [last unloaded: btrfs] [55650.806169] CPU: 1 PID: 31002 Comm: btrfs-transacti Tainted: G W 3.17.0-rc5-btrfs-next-1+ #1 [55650.806493] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [55650.806867] task: ffff8800b12f6410 ti: ffff880071538000 task.ti: ffff880071538000 [55650.807166] RIP: 0010:[<ffffffffa037cf45>] [<ffffffffa037cf45>] write_bitmap_entries+0x65/0xbb [btrfs] [55650.807514] RSP: 0018:ffff88007153bc30 EFLAGS: 00010246 [55650.807687] RAX: 000000005d1ec000 RBX: ffff8800a665df08 RCX: 0000000000000400 [55650.807885] RDX: ffff88005d1ec000 RSI: 6b6b6b6b6b6b6b6b RDI: ffff88005d1ec000 [55650.808017] RBP: ffff88007153bc58 R08: 00000000ddd51536 R09: 00000000000001e0 [55650.808017] R10: 0000000000000000 R11: 0000000000000037 R12: 6b6b6b6b6b6b6b6b [55650.808017] R13: ffff88007153bca8 R14: 6b6b6b6b6b6b6b6b R15: ffff88007153bc98 [55650.808017] FS: 0000000000000000(0000) GS:ffff88023ec80000(0000) knlGS:0000000000000000 [55650.808017] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [55650.808017] CR2: 0000000002273b88 CR3: 00000000b18f6000 CR4: 00000000000006e0 [55650.808017] Stack: [55650.808017] ffff88020e834e00 ffff880172d68db0 0000000000000000 ffff88019257c800 [55650.808017] ffff8801d42ea720 ffff88007153bd10 ffffffffa037d2fa ffff880224e99180 [55650.808017] ffff8801469a6188 ffff880224e99140 ffff880172d68c50 00000003000000b7 [55650.808017] Call Trace: [55650.808017] [<ffffffffa037d2fa>] __btrfs_write_out_cache+0x1ea/0x37f [btrfs] [55650.808017] [<ffffffffa037d959>] btrfs_write_out_cache+0xa1/0xd8 [btrfs] [55650.808017] [<ffffffffa033936b>] btrfs_write_dirty_block_groups+0x4b5/0x505 [btrfs] [55650.808017] [<ffffffffa03aa98e>] commit_cowonly_roots+0x15e/0x1f7 [btrfs] [55650.808017] [<ffffffff813eb9c7>] ? _raw_spin_lock+0xe/0x10 [55650.808017] [<ffffffffa0346e46>] btrfs_commit_transaction+0x411/0x882 [btrfs] [55650.808017] [<ffffffffa03432a4>] transaction_kthread+0xf2/0x1a4 [btrfs] [55650.808017] [<ffffffffa03431b2>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs] [55650.808017] [<ffffffff8105966b>] kthread+0xb7/0xbf [55650.808017] [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67 [55650.808017] [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0 [55650.808017] [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67 [55650.808017] Code: 4c 89 ef 8d 70 ff e8 d4 fc ff ff 41 8b 45 34 41 39 45 30 7d 5c 31 f6 4c 89 ef e8 80 f6 ff ff 49 8b 7d 00 4c 89 f6 b9 00 04 00 00 <f3> a5 4c 89 ef 41 8b 45 30 8d 70 ff e8 a3 fc ff ff 41 8b 45 34 [55650.808017] RIP [<ffffffffa037cf45>] write_bitmap_entries+0x65/0xbb [btrfs] [55650.808017] RSP <ffff88007153bc30> [55650.815725] ---[ end trace 1c032e96b149ff86 ]--- Fix this by serializing both tasks in such a way that cache writeout doesn't wait for the trim/discard of free space entries to finish and doesn't miss any free space entry. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-12-02 18:35:09 -08:00
Filipe Manana	04216820fe	Btrfs: fix race between fs trimming and block group remove/allocation Our fs trim operation, which is completely transactionless (doesn't start or joins an existing transaction) consists of visiting all block groups and then for each one to iterate its free space entries and perform a discard operation against the space range represented by the free space entries. However before performing a discard, the corresponding free space entry is removed from the free space rbtree, and when the discard completes it is added back to the free space rbtree. If a block group remove operation happens while the discard is ongoing (or before it starts and after a free space entry is hidden), we end up not waiting for the discard to complete, remove the extent map that maps logical address to physical addresses and the corresponding chunk metadata from the the chunk and device trees. After that and before the discard completes, the current running transaction can finish and a new one start, allowing for new block groups that map to the same physical addresses to be allocated and written to. So fix this by keeping the extent map in memory until the discard completes so that the same physical addresses aren't reused before it completes. If the physical locations that are under a discard operation end up being used for a new metadata block group for example, and dirty metadata extents are written before the discard finishes (the VM might call writepages() of our btree inode's i_mapping for example, or an fsync log commit happens) we end up overwriting metadata with zeroes, which leads to errors from fsck like the following: checking extents Check tree block failed, want=833912832, have=0 Check tree block failed, want=833912832, have=0 Check tree block failed, want=833912832, have=0 Check tree block failed, want=833912832, have=0 Check tree block failed, want=833912832, have=0 read block failed check_tree_block owner ref check failed [833912832 16384] Errors found in extent allocation tree or chunk allocation checking free space cache checking fs roots Check tree block failed, want=833912832, have=0 Check tree block failed, want=833912832, have=0 Check tree block failed, want=833912832, have=0 Check tree block failed, want=833912832, have=0 Check tree block failed, want=833912832, have=0 read block failed check_tree_block root 5 root dir 256 error root 5 inode 260 errors 2001, no inode item, link count wrong unresolved ref dir 256 index 0 namelen 8 name foobar_3 filetype 1 errors 6, no dir index, no inode ref root 5 inode 262 errors 2001, no inode item, link count wrong unresolved ref dir 256 index 0 namelen 8 name foobar_5 filetype 1 errors 6, no dir index, no inode ref root 5 inode 263 errors 2001, no inode item, link count wrong (...) Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-12-02 18:35:09 -08:00
Filipe Manana	2000552396	Btrfs: improve free space cache management and space allocation While under random IO, a block group's free space cache eventually reaches a state where it has a mix of extent entries and bitmap entries representing free space regions. As later free space regions are returned to the cache, some of them are merged with existing extent entries if they are contiguous with them. But others are not merged, because despite the existence of adjacent free space regions in the cache, the merging doesn't happen because the existing free space regions are represented in bitmap extents. Even when new free space regions are merged with existing extent entries (enlarging the free space range they represent), we create chances of having after an enlarged region that is contiguous with some other region represented in a bitmap entry. Both clustered and non-clustered space allocation work by iterating over our extent and bitmap entries and skipping any that represents a region smaller then the allocation request (and giving preference to extent entries before bitmap entries). By having a contiguous free space region that is represented by 2 (or more) entries (mix of extent and bitmap entries), we end up not satisfying an allocation request with a size larger than the size of any of the entries but no larger than the sum of their sizes. Making the caller assume we're under a ENOSPC condition or force it to allocate multiple smaller space regions (as we do for file data writes), which adds extra overhead and more chances of causing fragmentation due to the smaller regions being all spread apart from each other (more likely when under concurrency). For example, if we have the following in the cache: * extent entry representing free space range: [128Mb - 256Kb, 128Mb[ * bitmap entry covering the range [128Mb, 256Mb[, but only with the bits representing the range [128Mb, 128Mb + 768Kb[ set - that is, only that space in this 128Mb area is marked as free An allocation request for 1Mb, starting at offset not greater than 128Mb - 256Kb, would fail before, despite the existence of such contiguous free space area in the cache. The caller could only allocate up to 768Kb of space at once and later another 256Kb (or vice-versa). In between each smaller allocation request, another task working on a different file/inode might come in and take that space, preventing the former task of getting a contiguous 1Mb region of free space. Therefore this change implements the ability to move free space from bitmap entries into existing and new free space regions represented with extent entries. This is done when a space region is added to the cache. A test was added to the sanity tests that explains in detail the issue too. Some performance test results with compilebench on a 4 cores machine, with 32Gb of ram and using an HDD follow. Test: compilebench -D /mnt -i 30 -r 1000 --makej Before this change: intial create total runs 30 avg 69.02 MB/s (user 0.28s sys 0.57s) compile total runs 30 avg 314.96 MB/s (user 0.12s sys 0.25s) read compiled tree total runs 3 avg 27.14 MB/s (user 1.52s sys 0.90s) delete compiled tree total runs 30 avg 3.14 seconds (user 0.15s sys 0.66s) After this change: intial create total runs 30 avg 68.37 MB/s (user 0.29s sys 0.55s) compile total runs 30 avg 382.83 MB/s (user 0.12s sys 0.24s) read compiled tree total runs 3 avg 27.82 MB/s (user 1.45s sys 0.97s) delete compiled tree total runs 30 avg 3.18 seconds (user 0.17s sys 0.65s) Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-09-17 13:38:13 -07:00
David Sterba	ed6078f703	btrfs: use DIV_ROUND_UP instead of open-coded variants The form (value + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT is equivalent to (value + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE The rest is a simple subsitution, no difference in the generated assembly code. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-09-17 13:37:17 -07:00
David Sterba	57cdc8db21	btrfs: cleanup ino cache members of btrfs_root The naming is confusing, generic yet used for a specific cache. Add a prefix 'ino_' or rename appropriately. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>	2014-09-17 13:37:09 -07:00
Miao Xie	e570fd27f2	Btrfs: fix broken free space cache after the system crashed When we mounted the filesystem after the crash, we got the following message: BTRFS error (device xxx): block group xxxx has wrong amount of free space BTRFS error (device xxx): failed to load free space cache for block group xxx It is because we didn't update the metadata of the allocated space (in extent tree) until the file data was written into the disk. During this time, there was no information about the allocated spaces in either the extent tree nor the free space cache. when we wrote out the free space cache at this time (commit transaction), those spaces were lost. In fact, only the free space that is used to store the file data had this problem, the others didn't because the metadata of them is updated in the same transaction context. There are many methods which can fix the above problem - track the allocated space, and write it out when we write out the free space cache - account the size of the allocated space that is used to store the file data, if the size is not zero, don't write out the free space cache. The first one is complex and may make the performance drop down. This patch chose the second method, we use a per-block-group variant to account the size of that allocated space. Besides that, we also introduce a per-block-group read-write semaphore to avoid the race between the allocation and the free space cache write out. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:20:54 -07:00
Miao Xie	5349d6c3ff	Btrfs: make free space cache write out functions more readable This patch makes the free space cache write out functions more readable, and beisdes that, it also reduces the stack space that the function -- __btrfs_write_out_cache uses from 194bytes to 144bytes. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-19 14:20:54 -07:00
Chris Mason	d4452bc526	Btrfs: break up __btrfs_write_out_cache to cut down stack usage __btrfs_write_out_cache was one of our stack pigs. This breaks it up into helper functions and slims it down to 194 bytes. Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:55 -07:00
Miao Xie	32d6b47fe6	Btrfs: output warning instead of error when loading free space cache failed If we fail to load a free space cache, we can rebuild it from the extent tree, so it is not a serious error, we should not output a error message that would make the users uncomfortable. This patch uses warning message instead of it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-09 17:20:33 -07:00
Frank Holton	efe120a067	Btrfs: convert printk to btrfs_ and fix BTRFS prefix Convert all applicable cases of printk and pr_* to the btrfs_* macros. Fix all uses of the BTRFS prefix. Signed-off-by: Frank Holton <fholton@gmail.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-01-28 13:20:05 -08:00
Valentina Giusti	f0265bb409	btrfs: remove unused variable from setup_cluster_no_bitmap The variable window_start in setup_cluster_no_bitmap is not used since commit `1bb91902dc` (Btrfs: revamp clustered allocation logic) Signed-off-by: Valentina Giusti <valentina.giusti@microon.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-01-28 13:19:33 -08:00
Dulshani Gunawardhana	678712545b	btrfs: Fix checkpatch.pl warning of spacing issues Fix spacing issues detected via checkpatch.pl in accordance with the kernel style guidelines. Signed-off-by: Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:12:31 -05:00
Josef Bacik	0ef8b72607	Btrfs: return an error from btrfs_wait_ordered_range I noticed that if the free space cache has an error writing out it's data it won't actually error out, it will just carry on. This is because it doesn't check the return value of btrfs_wait_ordered_range, which didn't actually return anything. So fix this in order to keep us from making free space cache look valid when it really isnt. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:07:35 -05:00
Filipe David Borba Manana	7451432394	Btrfs: remove path arg from btrfs_truncate_free_space_cache Not used for anything, and removing it avoids caller's need to allocate a path structure. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 21:51:33 -05:00
Filipe David Borba Manana	53645a91f4	Btrfs: remove duplicated ino cache's inode lookup We're doing a unnecessary extra lookup of the ino cache's inode when we already have it (and holding a reference) during the process of saving the ino cache contents to disk. Therefore remove this extra lookup. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 21:51:24 -05:00
Linus Torvalds	0fbf2cc983	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "These are mostly bug fixes and a two small performance fixes. The most important of the bunch are Josef's fix for a snapshotting regression and Mark's update to fix compile problems on arm" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits) Btrfs: create the uuid tree on remount rw btrfs: change extent-same to copy entire argument struct Btrfs: dir_inode_operations should use btrfs_update_time also btrfs: Add btrfs: prefix to kernel log output btrfs: refuse to remount read-write after abort Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0 Btrfs: don't leak transaction in btrfs_sync_file() Btrfs: add the missing mutex unlock in write_all_supers() Btrfs: iput inode on allocation failure Btrfs: remove space_info->reservation_progress Btrfs: kill delay_iput arg to the wait_ordered functions Btrfs: fix worst case calculator for space usage Revert "Btrfs: rework the overcommit logic to be based on the total size" Btrfs: improve replacing nocow extents Btrfs: drop dir i_size when adding new names on replay Btrfs: replay dir_index items before other items Btrfs: check roots last log commit when checking if an inode has been logged Btrfs: actually log directory we are fsync()'ing Btrfs: actually limit the size of delalloc range Btrfs: allocate the free space by the existed max extent size when ENOSPC ...	2013-09-22 14:58:49 -07:00
Miao Xie	a482039889	Btrfs: allocate the free space by the existed max extent size when ENOSPC By the current code, if the requested size is very large, and all the extents in the free space cache are small, we will waste lots of the cpu time to cut the requested size in half and search the cache again and again until it gets down to the size the allocator can return. In fact, we can know the max extent size in the cache after the first search, so we needn't cut the size in half repeatedly, and just use the max extent size directly. This way can save lots of cpu time and make the performance grow up when there are only fragments in the free space cache. According to my test, if there are only 4KB free space extents in the fs, and the total size of those extents are 256MB, we can reduce the execute time of the following test from 5.4s to 1.4s. dd if=/dev/zero of=<testfile> bs=1MB count=1 oflag=sync Changelog v2 -> v3: - fix the problem that we skip the block group with the space which is less than we need. Changelog v1 -> v2: - address the problem that we return a wrong start position when searching the free space in a bitmap. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-21 11:05:23 -04:00
Linus Torvalds	ac4de9543a	Merge branch 'akpm' (patches from Andrew Morton) Merge more patches from Andrew Morton: "The rest of MM. Plus one misc cleanup" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits) mm/Kconfig: add MMU dependency for MIGRATION. kernel: replace strict_strto() with kstrto() mm, thp: count thp_fault_fallback anytime thp fault fails thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page() thp: do_huge_pmd_anonymous_page() cleanup thp: move maybe_pmd_mkwrite() out of mk_huge_pmd() mm: cleanup add_to_page_cache_locked() thp: account anon transparent huge pages into NR_ANON_PAGES truncate: drop 'oldsize' truncate_pagecache() parameter mm: make lru_add_drain_all() selective memcg: document cgroup dirty/writeback memory statistics memcg: add per cgroup writeback pages accounting memcg: check for proper lock held in mem_cgroup_update_page_stat memcg: remove MEMCG_NR_FILE_MAPPED memcg: reduce function dereference memcg: avoid overflow caused by PAGE_ALIGN memcg: rename RESOURCE_MAX to RES_COUNTER_MAX memcg: correct RESOURCE_MAX to ULLONG_MAX mm: memcg: do not trap chargers with full callstack on OOM mm: memcg: rework and document OOM waiting and wakeup ...	2013-09-12 15:44:27 -07:00
Kirill A. Shutemov	7caef26767	truncate: drop 'oldsize' truncate_pagecache() parameter truncate_pagecache() doesn't care about old size since commit `cedabed49b` ("vfs: Fix vmtruncate() regression"). Let's drop it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-09-12 15:38:02 -07:00
Josef Bacik	b12d6869f6	Btrfs: convert all bug_ons in free-space-cache.c All of these are logic checks to make sure we're not breaking anything, so convert them over to ASSERT(). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:16:33 -04:00
Geert Uytterhoeven	c1c9ff7c94	Btrfs: Remove superfluous casts from u64 to unsigned long long u64 is "unsigned long long" on all architectures now, so there's no need to cast it when formatting it using the "ll" length modifier. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:16:08 -04:00
Josef Bacik	dc11dd5d70	Btrfs: separate out tests into their own directory The plan is to have a bunch of unit tests that run when btrfs is loaded when you build with the appropriate config option. My ultimate goal is to have a test for every non-static function we have, but at first I'm going to focus on the things that cause us the most problems. To start out with this just adds a tests/ directory and moves the existing free space cache tests into that directory and sets up all of the infrastructure. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:15:38 -04:00
Josef Bacik	00361589d2	Btrfs: avoid starting a transaction in the write path I noticed while looking at a deadlock that we are always starting a transaction in cow_file_range(). This isn't really needed since we only need a transaction if we are doing an inline extent, or if the allocator needs to allocate a chunk. So push down all the transaction start stuff to be closer to where we actually need a transaction in all of these cases. This will hopefully reduce our write latency when we are committing often. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-09-01 08:05:05 -04:00
Linus Torvalds	e3a0dd98e1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs update from Chris Mason: "These are the usual mixture of bugs, cleanups and performance fixes. Miao has some really nice tuning of our crc code as well as our transaction commits. Josef is peeling off more and more problems related to early enospc, and has a number of important bug fixes in here too" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (81 commits) Btrfs: wait ordered range before doing direct io Btrfs: only do the tree_mod_log_free_eb if this is our last ref Btrfs: hold the tree mod lock in __tree_mod_log_rewind Btrfs: make backref walking code handle skinny metadata Btrfs: fix crash regarding to ulist_add_merge Btrfs: fix several potential problems in copy_nocow_pages_for_inode Btrfs: cleanup the code of copy_nocow_pages_for_inode() Btrfs: fix oops when recovering the file data by scrub function Btrfs: make the chunk allocator completely tree lockless Btrfs: cleanup orphaned root orphan item Btrfs: fix wrong mirror number tuning Btrfs: cleanup redundant code in btrfs_submit_direct() Btrfs: remove btrfs_sector_sum structure Btrfs: check if we can nocow if we don't have data space Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc Btrfs: use a percpu to keep track of possibly pinned bytes Btrfs: check for actual acls rather than just xattrs when caching no acl Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate Btrfs: optimize reada_for_balance Btrfs: optimize read_block_for_search ...	2013-07-09 12:33:09 -07:00
Wei Yongjun	4b286cd1f5	Btrfs: return error code in btrfs_check_trunc_cache_free_space() Fix to return error code instead always return 0 from function btrfs_check_trunc_cache_free_space(). Introduced by commit `7b61cd9224` (Btrfs: don't use global block reservation for inode cache truncation) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-06-14 11:29:56 -04:00
David Sterba	e6d2960582	btrfs: move ifdef around sanity checks out of init_btrfs_fs Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-06-14 11:29:18 -04:00
David Sterba	905d0f564e	btrfs: add prefix to sanity tests messages And change the message level to KERN_INFO. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-06-14 11:29:17 -04:00
Masanari Iida	8b513d0cf6	treewide: Fix typo in printk Correct spelling typo in various part of drivers Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-05-28 12:02:13 +02:00
Miao Xie	7b61cd9224	Btrfs: don't use global block reservation for inode cache truncation It is very likely that there are lots of subvolumes/snapshots in the filesystem, so if we use global block reservation to do inode cache truncation, we may hog all the free space that is reserved in global rsv. So it is better that we do the free space reservation for inode cache truncation by ourselves. Cc: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-17 21:40:22 -04:00
Josef Bacik	73e1e61fb8	Btrfs: remove warn on in free space cache writeout This catches block groups that are too large to properly cache. We deal with this case fine, so the warning just confuses users. Remove the warning. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-17 21:40:13 -04:00
Eric Sandeen	48a3b6366f	btrfs: make static code static & remove dead code Big patch, but all it does is add statics to functions which are in fact static, then remove the associated dead-code fallout. removed functions: btrfs_iref_to_path() __btrfs_lookup_delayed_deletion_item() __btrfs_search_delayed_insertion_item() __btrfs_search_delayed_deletion_item() find_eb_for_page() btrfs_find_block_group() range_straddles_pages() extent_range_uptodate() btrfs_file_extent_length() btrfs_scrub_cancel_devid() btrfs_start_transaction_lflush() btrfs_print_tree() is left because it is used for debugging. btrfs_start_transaction_lflush() and btrfs_reada_detach() are left for symmetry. ulist.c functions are left, another patch will take care of those. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:55:23 -04:00
Josef Bacik	b50c6e250e	Btrfs: deal with free space cache errors while replaying log So everybody who got hit by my fsync bug will still continue to hit this BUG_ON() in the free space cache, which is pretty heavy handed. So I took a file system that had this bug and fixed up all the BUG_ON()'s and leaks that popped up when I tried to mount a broken file system like this. With this patch we just fail to mount instead of panicing. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:55:20 -04:00
Simon Kirby	c2cf52eb71	Btrfs: Include the device in most error printk()s With more than one btrfs volume mounted, it can be very difficult to find out which volume is hitting an error. btrfs_error() will print this, but it is currently rigged as more of a fatal error handler, while many of the printk()s are currently for debugging and yet-unhandled cases. This patch just changes the functions where the device information is already available. Some cases remain where the root or fs_info is not passed to the function emitting the error. This may introduce some confusion with volumes backed by multiple devices emitting errors referring to the primary device in the set instead of the one on which the error occurred. Use btrfs_printk(fs_info, format, ...) rather than writing the device string every time, and introduce macro wrappers ala XFS for brevity. Since the function already cannot be used for continuations, print a newline as part of the btrfs_printk() message rather than at each caller. Signed-off-by: Simon Kirby <sim@hostway.ca> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:54:23 -04:00
Liu Bo	b0496686ba	Btrfs: cleanup unused arguments of btrfs_csum_data Argument 'root' is no more used in btrfs_csum_data(). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:54:14 -04:00
Josef Bacik	74255aa07d	Btrfs: add some free space cache tests We keep hitting bugs in the tree log replay because btrfs_remove_free_space doesn't account for some corner case. So add a bunch of tests to try and fully test btrfs_remove_free_space since the only time it is called is during tree log replay. These tests all finish successfully, so as we find more of these bugs we need to add to these tests to make sure we don't regress in fixing things. I've hidden the tests behind a Kconfig option, but they take no time to run so all btrfs developers should have this turned on all the time. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-05-06 15:52:54 -04:00
Chris Mason	e942f883bc	Merge branch 'raid56-experimental' into for-linus-3.9 Signed-off-by: Chris Mason <chris.mason@fusionio.com> Conflicts: fs/btrfs/ctree.h fs/btrfs/extent-tree.c fs/btrfs/inode.c fs/btrfs/volumes.c	2013-02-20 14:06:05 -05:00
Josef Bacik	dde5740fdd	Btrfs: relax the block group size limit for bitmaps Dave pointed out that xfstests 273 will tell you that it failed to load the space cache for a block group when it remounts. This is because we run out of space writing out the block group cache. This is ok and is working as it should, but let's try to be a bit nicer. This happens because the block group was 100mb, but bitmap entries cover 128mb, so we were only getting extent entries for this block group, which ended up being too many to fit in the free space cache. So relax the bitmap size requirements to block groups that are at least half the size a bitmap will cover or larger, that way we can still keep the amount of space used in the free space cache low enough to be able to write it out. With this patch I no longer fail to write out the free space cache. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-02-20 12:59:55 -05:00
Chris Mason	0e4e026366	Merge branch 'for-linus' into raid56-experimental Conflicts: fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-02-05 10:04:03 -05:00
David Woodhouse	53b381b3ab	Btrfs: RAID5 and RAID6 This builds on David Woodhouse's original Btrfs raid5/6 implementation. The code has changed quite a bit, blame Chris Mason for any bugs. Read/modify/write is done after the higher levels of the filesystem have prepared a given bio. This means the higher layers are not responsible for building full stripes, and they don't need to query for the topology of the extents that may get allocated during delayed allocation runs. It also means different files can easily share the same stripe. But, it does expose us to incorrect parity if we crash or lose power while doing a read/modify/write cycle. This will be addressed in a later commit. Scrub is unable to repair crc errors on raid5/6 chunks. Discard does not work on raid5/6 (yet) The stripe size is fixed at 64KiB per disk. This will be tunable in a later commit. Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-02-01 14:24:23 -05:00
Josef Bacik	b0175117b9	Btrfs: fix panic when recovering tree log A user reported a BUG_ON(ret) that occured during tree log replay. Ret was -EAGAIN, so what I think happened is that we removed an extent that covered a bitmap entry and an extent entry. We remove the part from the bitmap and return -EAGAIN and then search for the next piece we want to remove, which happens to be an entire extent entry, so we just free the sucker and return. The problem is ret is still set to -EAGAIN so we trip the BUG_ON(). The user used btrfs-zero-log so I'm not 100% sure this is what happened so I've added a WARN_ON() to catch the other possibility. Thanks, Reported-by: Jan Steffens <jan.steffens@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2013-01-24 12:49:49 -05:00
Wang Sheng-Hui	960097622d	Btrfs: use ctl->unit for free space calculation instead of block_group->sectorsize We should use ctl->unit for free space calculation instead of block_group->sectorsize even though for free space use_bitmap or free space cluster we only have sectorsize assigned to ctl->unit currently. Also, we can keep it consisten in code style. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-12-16 20:46:17 -05:00
Wang Sheng-Hui	071401258a	Btrfs: do not warn_on io_ctl->cur in io_ctl_map_page io_ctl_map_page is called by many functions in free-space-cache. In most scenarios, the ->cur is not null, e.g. io_ctl_add_entry. I think we'd better remove the warn_on here. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-12-16 20:46:06 -05:00
Miao Xie	de6c4115a2	Btrfs: fix unnecessary while loop when search the free space, cache When we find a bitmap free space entry, we may check the previous extent entry covers the offset or not. But if we find this entry is also a bitmap entry, we will continue to check the previous entry of the current one by a while loop. It is unnecessary because it is impossible that the extent entry which is in front of a bitmap entry can cover the offset of the entry after that bitmap entry. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2012-12-11 13:31:33 -05:00
Josef Bacik	e6138876ad	Btrfs: cache extent state when writing out dirty metadata pages Everytime we write out dirty pages we search for an offset in the tree, convert the bits in the state, and then when we wait we search for the offset again and clear the bits. So for every dirty range in the io tree we are doing 4 rb searches, which is suboptimal. With this patch we are only doing 2 searches for every cycle (modulo weird things happening). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-10-09 09:15:41 -04:00
Wei Yongjun	ebb3dad435	Btrfs: using for_each_set_bit_from to simplify the code Using for_each_set_bit_from() to simplify the code. spatch with a semantic match is used to found this. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>	2012-10-04 09:39:55 -04:00
Liu Bo	f6175efab1	Btrfs: do not count in readonly bytes If a block group is ro, do not count its entries in when we dump space info. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-23 16:28:03 -04:00
Linus Torvalds	5eecb9cc90	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "I held off on my rc5 pull because I hit an oops during log recovery after a crash. I wanted to make sure it wasn't a regression because we have some logging fixes in here. It turns out that a commit during the merge window just made it much more likely to trigger directory logging instead of full commits, which exposed an old bug. The new backref walking code got some additional fixes. This should be the final set of them. Josef fixed up a corner where our O_DIRECT writes and buffered reads could expose old file contents (not stale, just not the most recent). He and Liu Bo fixed crashes during tree log recover as well. Ilya fixed errors while we resume disk balancing operations on readonly mounts." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: run delayed directory updates during log replay Btrfs: hold a ref on the inode during writepages Btrfs: fix tree log remove space corner case Btrfs: fix wrong check during log recovery Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS Btrfs: resume balance on rw (re)mounts properly Btrfs: restore restriper state on all mounts Btrfs: fix dio write vs buffered read race Btrfs: don't count I/O statistic read errors for missing devices Btrfs: resolve tree mod log locking issue in btrfs_next_leaf Btrfs: fix tree mod log rewind of ADD operations Btrfs: leave critical region in btrfs_find_all_roots as soon as possible Btrfs: always put insert_ptr modifications into the tree mod log Btrfs: fix tree mod log for root replacements at leaf level Btrfs: support root level changes in __resolve_indirect_ref Btrfs: avoid waiting for delayed refs when we must not	2012-07-05 13:06:25 -07:00
Josef Bacik	bdb7d303b3	Btrfs: fix tree log remove space corner case The tree log stuff can have allocated space that we end up having split across a bitmap and a real extent. The free space code does not deal with this, it assumes that if it finds an extent or bitmap entry that the entire range must fall within the entry it finds. This isn't necessarily the case, so rework the remove function so it can handle this case properly. This fixed two panics the user hit, first in the case where the space was initially in a bitmap and then in an extent entry, and then the reverse case. Thanks, Reported-and-tested-by: Shaun Reich <sreich@kde.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com>	2012-07-02 15:39:18 -04:00
Linus Torvalds	1193755ac6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs changes from Al Viro. "A lot of misc stuff. The obvious groups: * Miklos' atomic_open series; kills the damn abuse of ->d_revalidate() by NFS, which was the major stumbling block for all work in that area. * ripping security_file_mmap() and dealing with deadlocks in the area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in general. * ->encode_fh() switched to saner API; insane fake dentry in mm/cleancache.c gone. * assorted annotations in fs (endianness, __user) * parts of Artem's ->s_dirty work (jff2 and reiserfs parts) * ->update_time() work from Josef. * other bits and pieces all over the place. Normally it would've been in two or three pull requests, but signal.git stuff had eaten a lot of time during this cycle ;-/" Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the 'truncate_range' inode method was removed by the VM changes, the VFS update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due to sparse fix added twice, with other changes nearby). * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits) nfs: don't open in ->d_revalidate vfs: retry last component if opening stale dentry vfs: nameidata_to_filp(): don't throw away file on error vfs: nameidata_to_filp(): inline __dentry_open() vfs: do_dentry_open(): don't put filp vfs: split __dentry_open() vfs: do_last() common post lookup vfs: do_last(): add audit_inode before open vfs: do_last(): only return EISDIR for O_CREAT vfs: do_last(): check LOOKUP_DIRECTORY vfs: do_last(): make ENOENT exit RCU safe vfs: make follow_link check RCU safe vfs: do_last(): use inode variable vfs: do_last(): inline walk_component() vfs: do_last(): make exit RCU safe vfs: split do_lookup() Btrfs: move over to use ->update_time fs: introduce inode operation ->update_time reiserfs: get rid of resierfs_sync_super reiserfs: mark the superblock as dirty a bit later ...	2012-06-01 10:34:35 -07:00
Josef Bacik	cd023e7b17	Btrfs: merge contigous regions when loading free space cache When we write out the free space cache we will write out everything that is in our in memory tree, and then we will just walk the pinned extents tree and write anything we see there. The problem with this is that during normal operations the pinned extents will be merged back into the free space tree normally, and then we can allocate space from the merged areas and commit them to the tree log. If we crash and replay the tree log we will crash again because the tree log will try to free up space from what looks like 2 seperate but contiguous entries, since one entry is from the original free space cache and the other was a pinned extent that was merged back. To fix this we just need to walk the free space tree after we load it and merge contiguous entries back together. This will keep the tree log stuff from breaking and it will make the allocator behave more nicely. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:36 -04:00
Josef Bacik	5fd0204355	Btrfs: finish ordered extents in their own thread We noticed that the ordered extent completion doesn't really rely on having a page and that it could be done independantly of ending the writeback on a page. This patch makes us not do the threaded endio stuff for normal buffered writes and direct writes so we can end page writeback as soon as possible (in irq context) and only start threads to do the ordered work when it is actually done. Compression needs to be reworked some to take advantage of this as well, but atm it has to do a find_get_page in its endio handler so it must be done in its own thread. This makes direct writes quite a bit faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-05-30 10:23:33 -04:00
Al Viro	528c032764	btrfs: trivial endianness annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-05-29 23:28:35 -04:00
Linus Torvalds	659e45d8a0	Merge branch 'for-linus-min' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull the minimal btrfs branch from Chris Mason: "We have a use-after-free in there, along with errors when mount -o discard is enabled, and a BUG_ON(we should compile with UP more often)." * 'for-linus-min' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: use commit root when loading free space cache Btrfs: fix use-after-free in __btrfs_end_transaction Btrfs: check return value of bio_alloc() properly Btrfs: remove lock assert from get_restripe_target() Btrfs: fix eof while discarding extents Btrfs: fix uninit variable in repair_eb_io_failure Revert "Btrfs: increase the global block reserve estimates"	2012-04-13 19:41:27 -07:00
Josef Bacik	d53ba47484	Btrfs: use commit root when loading free space cache A user reported that booting his box up with btrfs root on 3.4 was way slower than on 3.3 because I removed the ideal caching code. It turns out that we don't load the free space cache if we're in a commit for deadlock reasons, but since we're reading the cache and it hasn't changed yet we are safe reading the inode and free space item from the commit root, so do that and remove all of the deadlock checks so we don't unnecessarily skip loading the free space cache. The user reported this fixed the slowness. Thanks, Tested-by: Calvin Walton <calvin.walton@kepstin.ca> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-04-12 20:54:01 -04:00
Linus Torvalds	9613bebb22	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes and features from Chris Mason: "We've merged in the error handling patches from SuSE. These are already shipping in the sles kernel, and they give btrfs the ability to abort transactions and go readonly on errors. It involves a lot of churn as they clarify BUG_ONs, and remove the ones we now properly deal with. Josef reworked the way our metadata interacts with the page cache. page->private now points to the btrfs extent_buffer object, which makes everything faster. He changed it so we write an whole extent buffer at a time instead of allowing individual pages to go down,, which will be important for the raid5/6 code (for the 3.5 merge window ;) Josef also made us more aggressive about dropping pages for metadata blocks that were freed due to COW. Overall, our metadata caching is much faster now. We've integrated my patch for metadata bigger than the page size. This allows metadata blocks up to 64KB in size. In practice 16K and 32K seem to work best. For workloads with lots of metadata, this cuts down the size of the extent allocation tree dramatically and fragments much less. Scrub was updated to support the larger block sizes, which ended up being a fairly large change (thanks Stefan Behrens). We also have an assortment of fixes and updates, especially to the balancing code (Ilya Dryomov), the back ref walker (Jan Schmidt) and the defragging code (Liu Bo)." Fixed up trivial conflicts in fs/btrfs/scrub.c that were just due to removal of the second argument to k[un]map_atomic() in commit `7ac687d9e0`. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (75 commits) Btrfs: update the checks for mixed block groups with big metadata blocks Btrfs: update to the right index of defragment Btrfs: do not bother to defrag an extent if it is a big real extent Btrfs: add a check to decide if we should defrag the range Btrfs: fix recursive defragment with autodefrag option Btrfs: fix the mismatch of page->mapping Btrfs: fix race between direct io and autodefrag Btrfs: fix deadlock during allocating chunks Btrfs: show useful info in space reservation tracepoint Btrfs: don't use crc items bigger than 4KB Btrfs: flush out and clean up any block device pages during mount btrfs: disallow unequal data/metadata blocksize for mixed block groups Btrfs: enhance superblock sanity checks Btrfs: change scrub to support big blocks Btrfs: minor cleanup in scrub Btrfs: introduce common define for max number of mirrors Btrfs: fix infinite loop in btrfs_shrink_device() Btrfs: fix memory leak in resolver code Btrfs: allow dup for data chunks in mixed mode Btrfs: validate target profiles only if we are going to use them ...	2012-03-30 12:44:29 -07:00
Jeff Mahoney	79787eaab4	btrfs: replace many BUG_ONs with proper error handling btrfs currently handles most errors with BUG_ON. This patch is a work-in- progress but aims to handle most errors other than internal logic errors and ENOMEM more gracefully. This iteration prevents most crashes but can run into lockups with the page lock on occasion when the timing "works out." Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 11:52:54 +01:00
Jeff Mahoney	d0082371cf	btrfs: drop gfp_t from lock_extent lock_extent and unlock_extent are always called with GFP_NOFS, drop the argument and use GFP_NOFS consistently. Signed-off-by: Jeff Mahoney <jeffm@suse.com>	2012-03-22 01:45:35 +01:00
Linus Torvalds	69a7aebcf0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "It's indeed trivial -- mostly documentation updates and a bunch of typo fixes from Masanari. There are also several linux/version.h include removals from Jesper." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits) kcore: fix spelling in read_kcore() comment constify struct pci_dev * in obvious cases Revert "char: Fix typo in viotape.c" init: fix wording error in mm_init comment usb: gadget: Kconfig: fix typo for 'different' Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c" writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header writeback: fix typo in the writeback_control comment Documentation: Fix multiple typo in Documentation tpm_tis: fix tis_lock with respect to RCU Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c" Doc: Update numastat.txt qla4xxx: Add missing spaces to error messages compiler.h: Fix typo security: struct security_operations kerneldoc fix Documentation: broken URL in libata.tmpl Documentation: broken URL in filesystems.tmpl mtd: simplify return logic in do_map_probe() mm: fix comment typo of truncate_inode_pages_range power: bq27x00: Fix typos in comment ...	2012-03-20 21:12:50 -07:00
Linus Torvalds	855a85f704	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Quoth Chris: "This is later than I wanted because I got backed up running through btrfs bugs from the Oracle QA teams. But they are all bug fixes that we've queued and tested since rc1. Nothing in particular stands out, this just reflects bug fixing and QA done in parallel by all the btrfs developers. The most user visible of these is: Btrfs: clear the extent uptodate bits during parent transid failures Because that helps deal with out of date drives (say an iscsi disk that has gone away and come back). The old code wasn't always properly retrying the other mirror for this type of failure." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits) Btrfs: fix compiler warnings on 32 bit systems Btrfs: increase the global block reserve estimates Btrfs: clear the extent uptodate bits during parent transid failures Btrfs: add extra sanity checks on the path names in btrfs_mksubvol Btrfs: make sure we update latest_bdev Btrfs: improve error handling for btrfs_insert_dir_item callers Btrfs: be less strict on finding next node in clear_extent_bit Btrfs: fix a bug on overcommit stuff Btrfs: kick out redundant stuff in convert_extent_bit Btrfs: skip states when they does not contain bits to clear Btrfs: check return value of lookup_extent_mapping() correctly Btrfs: fix deadlock on page lock when doing auto-defragment Btrfs: fix return value check of extent_io_ops btrfs: honor umask when creating subvol root btrfs: silence warning in raid array setup btrfs: fix structs where bitfields and spinlock/atomic share 8B word btrfs: delalloc for page dirtied out-of-band in fixup worker Btrfs: fix memory leak in load_free_space_cache() btrfs: don't check DUP chunks twice Btrfs: fix trim 0 bytes after a device delete ...	2012-02-24 09:02:53 -08:00
Tsutomu Itoh	a7e221e900	Btrfs: fix memory leak in load_free_space_cache() load_free_space_cache() has forgotten to free path. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>	2012-02-15 16:40:24 +01:00
Masanari Iida	934e7d44b8	btrfs: Fix typo in free-space-cache.c Correct spelling "cace" to "cache" in fs/btrfs/free-space-cache.c Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-02-09 23:09:36 +01:00
Linus Torvalds	67d2433ee7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix reservations in btrfs_page_mkwrite Btrfs: advance window_start if we're using a bitmap btrfs: mask out gfp flags in releasepage Btrfs: fix enospc error caused by wrong checks of the chunk Btrfs: do not defrag a file partially Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c Btrfs: use cluster->window_start when allocating from a cluster bitmap Btrfs: Check for NULL page in extent_range_uptodate btrfs: Fix busyloops in transaction waiting code Btrfs: make sure a bitmap has enough bytes Btrfs: fix uninit warning in backref.c	2012-01-28 17:00:19 -08:00
Josef Bacik	9b23062840	Btrfs: advance window_start if we're using a bitmap If we span a long area in a bitmap we could end up taking a lot of time searching to the next free area if we're searching from the original window_start, so advance window_start in order to make sure we don't do any superficial searching. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:12 -05:00
Josef Bacik	0b4a9d248f	Btrfs: use cluster->window_start when allocating from a cluster bitmap We specifically set window_start in the cluster struct to indicate where the cluster starts in a bitmap, but we've been using min_start to indicate where we're searching from. This is usually the start of the blockgroup, so essentially means we're constantly searching from the start of any bitmap we find, which completely negates all the trouble we go to in order to setup a cluster. So start using window_start to make sure we actually use the area we found. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:11 -05:00
Josef Bacik	357b9784b7	Btrfs: make sure a bitmap has enough bytes We have only been checking for min_bytes available in bitmap entries, but we won't successfully setup a bitmap cluster unless it has at least bytes in the bitmap, so in the common case min_bytes is 4k and we want something like 2MB, so if there are a bunch of bitmap entries with less than 2mb's in them, we'll search all them anyway, which is suboptimal. Fix this check. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-26 15:01:11 -05:00
Linus Torvalds	f9156c7288	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits) Btrfs: use larger system chunks Btrfs: add a delalloc mutex to inodes for delalloc reservations Btrfs: space leak tracepoints Btrfs: protect orphan block rsv with spin_lock Btrfs: add allocator tracepoints Btrfs: don't call btrfs_throttle in file write Btrfs: release space on error in page_mkwrite Btrfs: fix btrfsck error 400 when truncating a compressed Btrfs: do not use btrfs_end_transaction_throttle everywhere Btrfs: add balance progress reporting Btrfs: allow for resuming restriper after it was paused Btrfs: allow for canceling restriper Btrfs: allow for pausing restriper Btrfs: add skip_balance mount option Btrfs: recover balance on mount Btrfs: save balance parameters to disk Btrfs: soft profile changing mode (aka soft convert) Btrfs: implement online profile changing Btrfs: do not reduce profile in do_chunk_alloc() Btrfs: virtual address space subset filter ... Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new mnt_drop_write_file() helper.	2012-01-17 15:49:54 -08:00
Josef Bacik	3f7de037fb	Btrfs: add allocator tracepoints I used these tracepoints when figuring out what the cluster stuff was doing, so add them to mainline in case we need to profile this stuff again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2012-01-16 15:29:42 -05:00
Chris Mason	d756bd2d93	Merge branch 'for-chris' of git://repo.or.cz/linux-btrfs-devel into integration Conflicts: fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-16 15:26:17 -05:00
Li Zefan	7fe1e64150	Btrfs: rewrite btrfs_trim_block_group() There are various bugs in block group trimming: - It may trim from offset smaller than user-specified offset. - It may trim beyond user-specified range. - It may leak free space for extents smaller than specified minlen. - It may truncate the last trimmed extent thus leak free space. - With mixed extents+bitmaps, some extents may not be trimmed. - With mixed extents+bitmaps, some bitmaps may not be trimmed (even none will be trimmed). Even for those trimmed, not all the free space in the bitmaps will be trimmed. I rewrite btrfs_trim_block_group() and break it into two functions. One is to trim extents only, and the other is to trim bitmaps only. Before patching: # fstrim -v /mnt/ /mnt/: 1496465408 bytes were trimmed After patching: # fstrim -v /mnt/ /mnt/: 2193768448 bytes were trimmed And this matches the total free space: # btrfs fi df /mnt Data: total=3.58GB, used=1.79GB System, DUP: total=8.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, DUP: total=205.12MB, used=97.14MB Metadata: total=8.00MB, used=0.00 Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2012-01-11 10:26:48 +08:00
Li Zefan	706efc6630	Btrfs: check the return value of io_ctl_init() It can return -ENOMEM. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2012-01-11 10:26:36 +08:00
Li Zefan	a1ee5a4581	Btrfs: avoid possible NULL deref in io_ctl_drop_pages() If we run into some failure path in io_ctl_prepare_pages(), io_ctl->pages[] array may have some NULL pointers. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2012-01-11 10:26:34 +08:00
Li Zefan	db804f23a7	Btrfs: add pinned extents to on-disk free space cache correctly I got this while running xfstests: [24256.836098] block group 317849600 has an wrong amount of free space [24256.836100] btrfs: failed to load free space cache for block group 317849600 We should clamp the extent returned by find_first_extent_bit(), so the start of the extent won't smaller than the start of the block group. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>	2012-01-11 10:26:31 +08:00
Linus Torvalds	98793265b4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits) Kconfig: acpi: Fix typo in comment. misc latin1 to utf8 conversions devres: Fix a typo in devm_kfree comment btrfs: free-space-cache.c: remove extra semicolon. fat: Spelling s/obsolate/obsolete/g SCSI, pmcraid: Fix spelling error in a pmcraid_err() call tools/power turbostat: update fields in manpage mac80211: drop spelling fix types.h: fix comment spelling for 'architectures' typo fixes: aera -> area, exntension -> extension devices.txt: Fix typo of 'VMware'. sis900: Fix enum typo 'sis900_rx_bufer_status' decompress_bunzip2: remove invalid vi modeline treewide: Fix comment and string typo 'bufer' hyper-v: Update MAINTAINERS treewide: Fix typos in various parts of the kernel, and fix some comments. clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR gpio: Kconfig: drop unknown symbol 'CS5535_GPIO' leds: Kconfig: Fix typo 'D2NET_V2' sound: Kconfig: drop unknown symbol ARCH_CLPS7500 ... Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new kconfig additions, close to removed commented-out old ones)	2012-01-08 13:21:22 -08:00
Alexandre Oliva	1bb91902dc	Btrfs: revamp clustered allocation logic Parameterize clusters on minimum total size, minimum chunk size and minimum contiguous size for at least one chunk, without limits on cluster, window or gap sizes. Don't tolerate any fragmentation for SSD_SPREAD; accept it for metadata, but try to keep data dense. Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2012-01-07 19:15:15 -05:00
Justin P. Mattock	cb54f2571f	btrfs: free-space-cache.c: remove extra semicolon. The patch below removes an extra semicolon. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> CC: Chris Mason <chris.mason@oracle.com> CC: linux-btrfs@vger.kernel.org Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-12-15 16:42:25 +01:00
Alexandre Oliva	b78d09bceb	Btrfs: reset cluster's max_size when creating bitmap The field that indicates the size of the largest contiguous chunk of free space in the cluster is not initialized when setting up bitmaps, it's only increased when we find a larger contiguous chunk. We end up retaining a larger value than appropriate for highly-fragmented clusters, which may cause pointless searches for large contiguous groups, and even cause clusters that do not meet the density requirements to be set up. Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-30 13:43:00 -05:00
Alexandre Oliva	f2d0f6765d	Btrfs: initialize new bitmaps' list We're failing to create clusters with bitmaps because setup_cluster_no_bitmap checks that the list is empty before inserting the bitmap entry in the list for setup_cluster_bitmap, but the list field is only initialized when it is restored from the on-disk free space cache, or when it is written out to disk. Besides a potential race condition due to the multiple use of the list field, filesystem performance severely degrades over time: as we use up all non-bitmap free extents, the try-to-set-up-cluster dance is done at every metadata block allocation. For every block group, we fail to set up a cluster, and after failing on them all up to twice, we fall back to the much slower unclustered allocation. To make matters worse, before the unclustered allocation, we try to create new block groups until we reach the 1% threshold, which introduces additional bitmaps and thus block groups that we'll iterate over at each metadata block request.	2011-11-30 18:46:06 +01:00
Chris Mason	24a7031396	Btrfs: remove free-space-cache.c WARN during log replay The log replay code only partially loads block groups, since the block group caching code is able to detect and deal with extents the logging code has pinned down. While the logging code is pinning down block groups, there is a bogus WARN_ON we're hitting if the code wasn't able to find an extent in the cache. This commit removes the warning because it can happen any time there isn't a valid free space cache for that block group. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-21 14:57:33 -05:00
Josef Bacik	f7d61dcd68	Btrfs: clear pages dirty for io and set them extent mapped When doing the io_ctl helpers to clean up the free space cache stuff I stopped using our normal prepare_pages stuff, which means I of course forgot to do things like set the pages extent mapped, which will cause us all sorts of wonderful propblems. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-11-20 07:42:17 -05:00
Li Zefan	52621cb6ed	Btrfs: avoid unnecessary bitmap search for cluster setup setup_cluster_no_bitmap() searches all the extents and bitmaps starting from offset. Therefore if it returns -ENOSPC, all the bitmaps starting from offset are in the bitmaps list, so it's sufficient to search from this list in setup_cluser_bitmap(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-20 07:42:15 -05:00
Li Zefan	0f0fbf1d0e	Btrfs: fix to search one more bitmap for cluster setup Suppose there are two bitmaps [0, 256], [256, 512] and one extent [100, 120] in the free space cache, and we want to setup a cluster with offset=100, bytes=50. In this case, there will be only one bitmap [256, 512] in the temporary bitmaps list, and then setup_cluster_bitmap() won't search bitmap [0, 256]. The cause is, the list is constructed in setup_cluster_no_bitmap(), and only bitmaps with bitmap_entry->offset >= offset will be added into the list, and the very bitmap that convers offset has bitmap_entry->offset <= offset. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-20 07:42:14 -05:00
Josef Bacik	2f120c05e6	Btrfs: only map pages if we know we need them when reading the space cache People have been running into a warning when loading space cache because the page is already mapped when trying to read in a bitmap. The way we read in entries and pages is kind of convoluted, so fix it so that io_ctl_read_entry maps the entries if it needs to, and if it hits the end of the page it simply unmaps the page. That way we can unconditionally unmap the io_ctl before reading in the bitmap and we should stop hitting these warnings. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-10 20:45:05 -05:00
Josef Bacik	c8174313a8	Btrfs: use the global reserve when truncating the free space cache inode We no longer use the orphan block rsv for holding the reservation for truncating the inode, so instead use the global block rsv and check to make sure it has enough space for us to truncate the space. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-06 03:03:50 -05:00
Chris Mason	1eae31e918	Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN btrfs_remove_free_space needs to make sure to set ret back to a valid return value after setting it to EAGAIN, otherwise we return it to the callers. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-11-06 03:03:47 -05:00
Josef Bacik	016fc6a63e	Btrfs: don't flush the cache inode before writing it I noticed we had a little bit of latency when writing out the space cache inodes. It's because we flush it before we write anything in case we have dirty pages already there. This doesn't matter though since we're just going to overwrite the space, and there really shouldn't be any dirty pages anyway. This makes some of my tests run a little bit faster. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:13:01 -04:00
Josef Bacik	36ba022ac0	Btrfs: seperate out btrfs_block_rsv_check out into 2 different functions Currently btrfs_block_rsv_check does 2 things, it will either refill a block reserve like in the truncate or refill case, or it will check to see if there is enough space in the global reserve and possibly refill it. However because of overcommit we could be well overcommitting ourselves just to try and refill the global reserve, when really we should just be committing the transaction. So breack this out into btrfs_block_rsv_refill and btrfs_block_rsv_check. Refill will try to reserve more metadata if it can and btrfs_block_rsv_check will not, it will only tell you if the factor of the total space is still reserved. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:59 -04:00
Josef Bacik	5b0e95bf60	Btrfs: inline checksums into the disk free space cache Yeah yeah I know this is how we used to do it and then I changed it, but damnit I'm changing it back. The fact is that writing out checksums will modify metadata, which could cause us to dirty a block group we've already written out, so we have to truncate it and all of it's checksums and re-write it which will write new checksums which could dirty a blockg roup that has already been written and you see where I'm going with this? This can cause unmount or really anything that depends on a transaction to commit to take it's sweet damned time to happen. So go back to the way it was, only this time we're specifically setting NODATACOW because we can't go through the COW pathway anyway and we're doing our own built-in cow'ing by truncating the free space cache. The other new thing is once we truncate the old cache and preallocate the new space, we don't need to do that song and dance at all for the rest of the transaction, we can just overwrite the existing space with the new cache if the block group changes for whatever reason, and the NODATACOW will let us do this fine. So keep track of which transaction we last cleared our cache in and if we cleared it in this transaction just say we're all setup and carry on. This survives xfstests and stress.sh. The inode cache will continue to use the normal csum infrastructure since it only gets written once and there will be no more modifications to the fs tree in a transaction commit. Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:54 -04:00
Josef Bacik	549b4fdb8f	Btrfs: check the return value of filemap_write_and_wait in the space cache We need to check the return value of filemap_write_and_wait in the space cache writeout code. Also don't set the inode's generation until we're sure nothing else is going to fail. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:53 -04:00
Josef Bacik	a67509c300	Btrfs: add a io_ctl struct and helpers for dealing with the space cache In writing and reading the space cache we have one big loop that keeps track of which page we are on and then a bunch of sizeable loops underneath this big loop to try and read/write out properly. Especially in the write case this makes things hugely complicated and hard to follow, and makes our error checking and recovery equally as complex. So add a io_ctl struct with a bunch of helpers to keep track of the pages we have, where we are, if we have enough space etc. This unifies how we deal with the pages we're writing and keeps all the messy tracking internal. This allows us to kill the big loops in both the read and write case and makes reviewing and chaning the write and read paths much simpler. I've run xfstests and stress.sh on this code and it survives. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:52 -04:00
Josef Bacik	f75b130e9b	Btrfs: don't skip writing out a empty block groups cache I noticed a slight bug where we will not bother writing out the block group cache's space cache if it's space tree is empty. Since it could have a cluster or pinned extents that need to be written out this is just not a valid test. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:51 -04:00
Josef Bacik	3b16a4e3c3	Btrfs: use the inode's mapping mask for allocating pages Johannes pointed out we were allocating only kernel pages for doing writes, which is kind of a big deal if you are on 32bit and have more than a gig of ram. So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we don't re-enter. Thanks, Reported-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:45 -04:00
Josef Bacik	4a92b1b8d2	Btrfs: stop passing a trans handle all around the reservation code The only thing that we need to have a trans handle for is in reserve_metadata_bytes and thats to know how much flushing we can do. So instead of passing it around, just check current->journal_info for a trans_handle so we know if we can commit a transaction to try and free up space or not. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:44 -04:00
Josef Bacik	c09544e07f	Btrfs: handle enospc accounting for free space inodes Since free space inodes now use normal checksumming we need to make sure to account for their metadata use. So reserve metadata space, and then if we fail to write out the metadata we can just release it, otherwise it will be freed up when the io completes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:42 -04:00
Josef Bacik	300e4f8a56	Btrfs: put the block group cache after we commit the super In moving some enospc stuff around I noticed that when we unmount we are often evicting the free space cache inodes before we do our last commit. This isn't bad, but it makes us constantly have to re-read the inodes back. So instead don't evict the cache until after we do our last commit, this will make things a little less crappy and makes a future enospc change work properly. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:41 -04:00
Josef Bacik	a9b5fcddce	Btrfs: fix call to btrfs_search_slot in free space cache We are setting ins_len to 1 even tho we are just modifying an item that should be there already. This may cause the search stuff to split nodes on the way down needelessly. Set this to 0 since we aren't inserting anything. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:38 -04:00
Josef Bacik	482e6dc526	Btrfs: allow callers to specify if flushing can occur for btrfs_block_rsv_check If you run xfstest 224 it you will get lots of messages about not being able to delete inodes and that they will be cleaned up next mount. This is because btrfs_block_rsv_check was not calling reserve_metadata_bytes with the ability to flush, so if there was not enough space, it simply failed. But in truncate and evict case we could easily flush space to try and get enough space to do our work, so make btrfs_block_rsv_check take a flush argument to pass down to reserve_metadata_bytes. Now xfstests 224 runs fine without all those complaints. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:38 -04:00
Josef Bacik	6ab60601d5	Btrfs: ratelimit the generation printk for the free space cache A user reported getting spammed when moving to 3.0 by this message. Since we switched to the normal checksumming infrastructure all old free space caches will be wrong and need to be regenerated so people are likely to see this message a lot, so ratelimit it so it doesn't fill up their logs and freak them out. Thanks, Reported-by: Andrew Lutomirski <luto@mit.edu> Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:33 -04:00
Josef Bacik	fb25e9141a	Btrfs: use bytes_may_use for all ENOSPC reservations We have been using bytes_reserved for metadata reservations, which is wrong since we use that to keep track of outstanding reservations from the allocator. This resulted in us doing a lot of silly things to make sure we don't allocate a bunch of metadata chunks since we never had a real view of how much space was actually in use by metadata. This passes Arne's enospc test and xfstests as well as my own enospc tests. Hopefully this will get us moving in the right direction. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-10-19 15:12:30 -04:00
Liu Bo	65450aa645	Btrfs: reset to appropriate block rsv after orphan operations While truncating free space cache, we forget to change trans->block_rsv back to the original one, but leave it with the orphan_block_rsv, and then with option inode_cache enable, it leads to countless warnings of btrfs_alloc_free_block and btrfs_orphan_commit_root: WARNING: at fs/btrfs/extent-tree.c:5711 btrfs_alloc_free_block+0x180/0x350 [btrfs]() ... WARNING: at fs/btrfs/inode.c:2193 btrfs_orphan_commit_root+0xb0/0xc0 [btrfs]() Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-09-11 10:52:24 -04:00
Miao Xie	bb3ac5a4df	Btrfs: fix wrong free space information Btrfs subtracted the size of the allocated space twice when it allocated the space from the bitmap in the cluster, it broke the free space information and led to oops finally. And this patch also fixes the bug that ctl->free_space was subtracted without lock. Reported-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-08-16 21:09:31 -04:00
Josef Bacik	a94733d0bc	Btrfs: use find_or_create_page instead of grab_cache_page grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we need GFP_NOFS so we don't deadlock. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-07-27 12:46:43 -04:00
Josef Bacik	2f356126c5	Btrfs: use the normal checksumming infrastructure for free space cache We used to store the checksums of the space cache directly in the space cache, however that doesn't work out too well if we have more space than we can fit the checksums into the first page. So instead use the normal checksumming infrastructure. There were problems with doing this originally but those problems don't exist now so this works out fine. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-07-11 09:58:49 -04:00
Josef Bacik	9b90f51353	Btrfs: make sure to update total_bitmaps when freeing cache V3 A user reported this bug again where we have more bitmaps than we are supposed to. This is because we failed to load the free space cache, but don't update the ctl->total_bitmaps counter when we remove entries from the tree. This patch fixes this problem and we should be good to go again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-25 09:31:06 -04:00
Chris Mason	38e8788066	Btrfs: make sure to recheck for bitmaps in clusters Josef recently changed the free extent cache to look in the block group cluster for any bitmaps before trying to add a new bitmap for the same offset. This avoids BUG_ON()s due covering duplicate ranges. But it didn't go quite far enough. A given free range might span between one or more bitmaps or free space entries. The code has looping to cover this, but it doesn't check for clustered bitmaps every time. This shuffles our gotos to check for a bitmap in the cluster for every new bitmap entry we try to add. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-10 16:36:57 -04:00
Josef Bacik	f6a398298d	Btrfs: fix duplicate checking logic When merging my code into the integration test the second check for duplicate entries got screwed up. This patch fixes it by dropping ret2 and just using ret for the return value, and checking if we got an error before adding the bitmap to the local list. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 16:37:29 -04:00
Josef Bacik	2cdc342c20	Btrfs: fix bitmap regression In cleaning up the clustering code I accidently introduced a regression by adding bitmap entries to the cluster rb tree. The problem is if we've maxed out the number of bitmaps we can have for the block group we can only add free space to the bitmaps, but since the bitmap is on the cluster we can't find it and we try to create another one. This would result in a panic because the total bitmaps was bigger than the max bitmaps that were allowed. This patch fixes this by checking to see if we have a cluster, and then looking at the cluster rb tree to see if it has a bitmap entry and if it does and that space belongs to that bitmap, go ahead and add it to that bitmap. I could hit this panic every time with an fs_mark test within a couple of minutes. With this patch I no longer hit the panic and fs_mark goes to completion. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 16:37:28 -04:00
Josef Bacik	3de85bb95c	Btrfs: noinline the cluster searching functions When profiling the find cluster code it's hard to tell where we are spending our time because the bitmap and non-bitmap functions get inlined by the compiler, so make that not happen. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 15:08:30 -04:00
Josef Bacik	86d4a77ba3	Btrfs: cache bitmaps when searching for a cluster If we are looking for a cluster in a particularly sparse or fragmented block group, we will do a lot of looping through the free space tree looking for various things, and if we need to look at bitmaps we will endup doing the whole dance twice. So instead add the bitmap entries to a temporary list so if we have to do the bitmap search we can just look through the list of entries we've found quickly instead of having to loop through the entire tree again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>	2011-06-08 15:08:28 -04:00
David Sterba	7841cb2898	btrfs: add helper for fs_info->closing wrap checking of filesystem 'closing' flag and fix a few missing memory barriers. Signed-off-by: David Sterba <dsterba@suse.cz>	2011-06-04 08:11:22 -04:00
Chris Mason	4b9465cb9e	Btrfs: add mount -o inode_cache This makes the inode map cache default to off until we fix the overflow problem when the free space crcs don't fit inside a single page. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:47 -04:00
Chris Mason	211f96c24f	Btrfs: make sure we don't overflow the free space cache crc page The free space cache uses only one page for crcs right now, which means we can't have a cache file bigger than the crcs we can fit in the first page. This adds a check to enforce that restriction. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-06-04 08:03:43 -04:00

... 2 3 4 5 6 ...

411 Commits