No description
Find a file
Wengang Wang c3ea265470 ocfs2: initialize ip_next_orphan
commit f5785283dd upstream.

Though problem if found on a lower 4.1.12 kernel, I think upstream has
same issue.

In one node in the cluster, there is the following callback trace:

   # cat /proc/21473/stack
   __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
   ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
   ocfs2_evict_inode+0x152/0x820 [ocfs2]
   evict+0xae/0x1a0
   iput+0x1c6/0x230
   ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
   ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
   ocfs2_dir_foreach+0x29/0x30 [ocfs2]
   ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
   ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
   process_one_work+0x169/0x4a0
   worker_thread+0x5b/0x560
   kthread+0xcb/0xf0
   ret_from_fork+0x61/0x90

The above stack is not reasonable, the final iput shouldn't happen in
ocfs2_orphan_filldir() function.  Looking at the code,

  2067         /* Skip inodes which are already added to recover list, since dio may
  2068          * happen concurrently with unlink/rename */
  2069         if (OCFS2_I(iter)->ip_next_orphan) {
  2070                 iput(iter);
  2071                 return 0;
  2072         }
  2073

The logic thinks the inode is already in recover list on seeing
ip_next_orphan is non-NULL, so it skip this inode after dropping a
reference which incremented in ocfs2_iget().

While, if the inode is already in recover list, it should have another
reference and the iput() at line 2070 should not be the final iput
(dropping the last reference).  So I don't think the inode is really in
the recover list (no vmcore to confirm).

Note that ocfs2_queue_orphans(), though not shown up in the call back
trace, is holding cluster lock on the orphan directory when looking up
for unlinked inodes.  The on disk inode eviction could involve a lot of
IOs which may need long time to finish.  That means this node could hold
the cluster lock for very long time, that can lead to the lock requests
(from other nodes) to the orhpan directory hang for long time.

Looking at more on ip_next_orphan, I found it's not initialized when
allocating a new ocfs2_inode_info structure.

This causes te reflink operations from some nodes hang for very long
time waiting for the cluster lock on the orphan directory.

Fix: initialize ip_next_orphan as NULL.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18 18:27:58 +01:00
arch ARM: 9019/1: kprobes: Avoid fortify_panic() when copying optprobe template 2020-11-18 18:27:56 +01:00
block blk-cgroup: Pre-allocate tree node on blkg_conf_prep 2020-11-10 10:29:05 +01:00
certs Replace magic for trusting the secondary keyring with #define 2018-09-09 19:55:54 +02:00
crypto crypto: algif_skcipher - EBUSY on aio should be an error 2020-10-29 09:07:01 +01:00
Documentation media: videodev2.h: RGB BT2020 and HSV are always full range 2020-11-05 11:06:54 +01:00
drivers mei: protect mei_cl_mtu from null dereference 2020-11-18 18:27:58 +01:00
firmware Fix built-in early-load Intel microcode alignment 2020-01-23 08:20:30 +01:00
fs ocfs2: initialize ip_next_orphan 2020-11-18 18:27:58 +01:00
include can: can_create_echo_skb(): fix echo skb generation: always use skb_clone() 2020-11-18 18:27:54 +01:00
init printk: reduce LOG_BUF_SHIFT range for H8300 2020-11-05 11:06:55 +01:00
ipc ipc/util.c: sysvipc_find_ipc() incorrectly updates position index 2020-05-20 08:17:07 +02:00
kernel futex: Don't enable IRQs unconditionally in put_pi_state() 2020-11-18 18:27:58 +01:00
lib lib/crc32test: remove extra local_irq_disable/enable 2020-11-10 10:29:04 +01:00
mm mm: mempolicy: fix potential pte_unmap_unlock pte error 2020-11-18 18:27:52 +01:00
net cfg80211: regulatory: Fix inconsistent format argument 2020-11-18 18:27:56 +01:00
samples misc: vop: add round_up(x,4) for vring_size to avoid kernel panic 2020-10-29 09:07:17 +01:00
scripts scripts/setlocalversion: make git describe output more reliable 2020-11-05 11:06:51 +01:00
security ima: Don't ignore errors from crypto_shash_update() 2020-10-29 09:07:00 +01:00
sound ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link() 2020-11-18 18:27:53 +01:00
tools perf tools: Add missing swap for ino_generation 2020-11-18 18:27:53 +01:00
usr initramfs: restore default compression behavior 2020-04-13 10:34:19 +02:00
virt KVM: fix overflow of zero page refcount with ksm running 2020-10-01 13:12:33 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: rpm-pkg: keep spec file until make mrproper 2018-02-13 10:19:46 +01:00
.mailmap .mailmap: Add Maciej W. Rozycki's Imagination e-mail address 2017-11-10 12:16:15 -08:00
COPYING
CREDITS MAINTAINERS: update TPM driver infrastructure changes 2017-11-09 17:58:40 -08:00
Kbuild License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
MAINTAINERS MAINTAINERS: Update drm/i915 bug filing URL 2020-02-28 16:36:12 +01:00
Makefile Linux 4.14.206 2020-11-10 21:10:29 +01:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.