linux-stable/fs/erofs
Sandeep Dhavale 3fffb589b9 erofs: add per-cpu threads for decompression as an option
Using a per-cpu thread pool, we can reduce scheduling latency compared
to the workqueue implementation. With this patch, both scheduling latency
and its variation are reduced, because the per-cpu threads are
high-priority kthread_workers.
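The commit does this inside the kernel with kthread_workers. As a rough illustration only, here is a userspace analogue built on POSIX threads instead. This is a swapped-in technique, not the erofs implementation. It uses one worker thread per online CPU, pinned to that CPU, with a best-effort request for the lowest SCHED_FIFO priority, and each work item goes to the worker of the submitting CPU. The sketch assumes Linux with glibc, because pthread_setaffinity_np() and sched_getcpu() are GNU extensions. All type and function names (pcpu_pool, pool_init, pool_queue, pool_destroy) are hypothetical. Raising the priority needs CAP_SYS_NICE or an RLIMIT_RTPRIO allowance; without either, the threads silently stay SCHED_OTHER.

```c
#define _GNU_SOURCE /* for pthread_setaffinity_np(), sched_getcpu(), CPU_SET() */
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

typedef void (*work_fn)(void *arg);
struct work { work_fn fn; void *arg; struct work *next; };
struct pcpu_worker {              /* hypothetical: one dedicated thread per CPU */
	pthread_t thread;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct work *head, *tail;     /* FIFO of pending work items */
	long done;                    /* items completed by this worker */
	int stop;
};
struct pcpu_pool { int nr; struct pcpu_worker *workers; };

static void *worker_main(void *p)
{
	struct pcpu_worker *w = p;
	pthread_mutex_lock(&w->lock);
	for (;;) {
		while (!w->head && !w->stop)
			pthread_cond_wait(&w->cond, &w->lock);
		struct work *wk = w->head;
		if (!wk)                  /* stop requested and queue drained */
			break;
		w->head = wk->next;
		if (!w->head)
			w->tail = NULL;
		pthread_mutex_unlock(&w->lock);
		wk->fn(wk->arg);          /* run the item without holding the lock */
		free(wk);
		pthread_mutex_lock(&w->lock);
		w->done++;
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Start a worker per online CPU; 0 if any started. pool_destroy() either way. */
static int pool_init(struct pcpu_pool *pool)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);
	pool->nr = 0;
	pool->workers = calloc(n > 0 ? n : 1, sizeof(*pool->workers));
	for (int cpu = 0; pool->workers && cpu < n; cpu++) {
		struct pcpu_worker *w = &pool->workers[cpu];
		struct sched_param sp = { .sched_priority = 1 };
		cpu_set_t set;
		pthread_mutex_init(&w->lock, NULL);
		pthread_cond_init(&w->cond, NULL);
		if (pthread_create(&w->thread, NULL, worker_main, w) != 0)
			break;                /* keep whatever workers did start */
		pool->nr++;
		/* Best effort pin + lowest SCHED_FIFO priority; errors ignored. */
		CPU_ZERO(&set);
		CPU_SET(cpu, &set);
		pthread_setaffinity_np(w->thread, sizeof(set), &set);
		pthread_setschedparam(w->thread, SCHED_FIFO, &sp);
	}
	return pool->nr > 0 ? 0 : -1;
}

/* Queue on the worker bound to the caller's CPU; needs pool->nr > 0. */
static int pool_queue(struct pcpu_pool *pool, work_fn fn, void *arg)
{
	struct work *wk = malloc(sizeof(*wk));
	int cpu = sched_getcpu();
	struct pcpu_worker *w = &pool->workers[(cpu < 0 ? 0 : cpu) % pool->nr];
	if (!wk)
		return -1;
	*wk = (struct work){ .fn = fn, .arg = arg, .next = NULL };
	pthread_mutex_lock(&w->lock);
	*(w->tail ? &w->tail->next : &w->head) = wk;  /* append to FIFO */
	w->tail = wk;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
	return 0;
}

/* Let every worker drain its queue and exit; returns items completed. */
static long pool_destroy(struct pcpu_pool *pool)
{
	long done = 0;
	for (int i = 0; i < pool->nr; i++) {
		struct pcpu_worker *w = &pool->workers[i];
		pthread_mutex_lock(&w->lock);
		w->stop = 1;
		pthread_cond_signal(&w->cond);
		pthread_mutex_unlock(&w->lock);
		pthread_join(w->thread, NULL);
		done += w->done;
	}
	free(pool->workers);
	return done;
}
```

Usage: after pool_init(), pool_queue(&pool, fn, arg) called from any thread hands fn(arg) to the worker of the current CPU. pool_destroy() drains the queues and returns the number of items run. The design choice illustrated here is that each CPU has its own dedicated, pre-created thread. A real-time policy lets that thread preempt ordinary SCHED_OTHER tasks as soon as work is queued, which is one plausible reason the latency spread shrinks.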

The results were evaluated on arm64 Android devices running the 5.10 kernel.

The table below shows the resulting improvement in total scheduling
latency for the same app launch benchmark, run for 50 iterations.
Scheduling latency is the time from when the task (workqueue kworker vs.
kthread_worker) became eligible to run until it actually started
running.
+-------------------------+-----------+----------------+---------+
|                         | workqueue | kthread_worker |  diff   |
+-------------------------+-----------+----------------+---------+
| Average (us)            |     15253 |           2914 | -80.89% |
| Median (us)             |     14001 |           2912 | -79.20% |
| Minimum (us)            |      3117 |           1027 | -67.05% |
| Maximum (us)            |     30170 |           3805 | -87.39% |
| Standard deviation (us) |      7166 |            359 |         |
+-------------------------+-----------+----------------+---------+
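The diff column is the relative change against the workqueue baseline, (kthread_worker - workqueue) / workqueue * 100. Below is a small C sketch of that arithmetic. It also includes a hypothetical summarize() helper that shows how the minimum, maximum, average and median rows could be derived from per-run latency samples. The standard-deviation row is left out of this sketch, and the helper names are illustrative, not taken from the benchmark tooling.

```c
#include <stddef.h>
#include <stdlib.h>

/* Relative change of the kthread_worker value vs. the workqueue baseline,
 * in percent; a negative result means lower latency. */
static double percent_diff(double workqueue_us, double kthread_us)
{
	return (kthread_us - workqueue_us) / workqueue_us * 100.0;
}

/* Hypothetical summary of per-run scheduling latency samples (us). */
struct lat_summary { double min, max, mean, median; };

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Sorts samples in place; n must be at least 1. */
static struct lat_summary summarize(double *samples, size_t n)
{
	struct lat_summary s;
	double sum = 0.0;

	qsort(samples, n, sizeof(*samples), cmp_double);
	for (size_t i = 0; i < n; i++)
		sum += samples[i];
	s.min = samples[0];
	s.max = samples[n - 1];
	s.mean = sum / n;
	/* even n: average the two middle samples */
	s.median = n % 2 ? samples[n / 2]
			 : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
	return s;
}
```

For example, percent_diff(15253, 2914) evaluates to about -80.90 and percent_diff(30170, 3805) to about -87.39. These match the Average and Maximum rows to within 0.01 percentage points.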

Background: Boot times and cold app launch benchmarks are very
important to the Android ecosystem, as they directly translate to
responsiveness from the user's point of view. While EROFS provides
many important features such as space savings, we saw a performance
penalty in cold app launch benchmarks in a few scenarios. Analysis
showed that the significant variance came from the scheduling cost,
while the decompression cost stayed roughly the same.

With the per-cpu thread pool, the table above shows that this
variation is reduced by ~80% on average. This problem was discussed
at LPC 2022; the slides and talk are linked at [1].

[1] https://lpc.events/event/16/contributions/1338/

[ Gao Xiang: At the least, we need this until the WQ_UNBOUND workqueue
             issue [2] on many arm64 devices is resolved. ]
[2] https://lore.kernel.org/r/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com

Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230208093322.75816-1-hsiangkao@linux.alibaba.com
2023-02-15 08:11:26 +08:00
compress.h erofs: introduce multi-reference pclusters (fully-referenced) 2022-07-22 21:44:27 +08:00
data.c erofs: simplify iloc() 2023-02-15 08:11:24 +08:00
decompressor.c erofs: support interlaced uncompressed data for compressed files 2022-09-23 10:55:56 +08:00
decompressor_lzma.c erofs: introduce partial-referenced pclusters 2022-09-26 23:55:43 +08:00
dir.c erofs: get rid of debug_one_dentry() 2023-02-15 08:11:23 +08:00
erofs_fs.h erofs: introduce partial-referenced pclusters 2022-09-26 23:55:43 +08:00
fscache.c Changes since the last update: 2022-12-12 20:14:04 -08:00
inode.c erofs: simplify iloc() 2023-02-15 08:11:24 +08:00
internal.h erofs: tidy up internal.h 2023-02-15 08:11:25 +08:00
Kconfig erofs: add per-cpu threads for decompression as an option 2023-02-15 08:11:26 +08:00
Makefile erofs: register fscache volume 2022-05-18 00:11:19 +08:00
namei.c erofs: get rid of erofs_inode_datablocks() 2023-02-15 08:11:24 +08:00
pcpubuf.c erofs: get rid of ->lru usage 2021-10-25 08:22:59 +08:00
super.c erofs: remove linux/buffer_head.h dependency 2023-02-15 08:11:10 +08:00
sysfs.c erofs: fix use-after-free of fsid and domain_id string 2022-11-10 09:53:20 +08:00
utils.c mm: shrinkers: provide shrinkers with names 2022-07-03 18:08:40 -07:00
xattr.c erofs: simplify iloc() 2023-02-15 08:11:24 +08:00
xattr.h erofs: clean up unnecessary code and comments 2022-09-27 17:27:25 +08:00
zdata.c erofs: add per-cpu threads for decompression as an option 2023-02-15 08:11:26 +08:00
zmap.c erofs: get rid of z_erofs_do_map_blocks() forward declaration 2023-02-15 08:11:25 +08:00